RUCL Institutional Repository

Exploring Toxicogenomic Biomarkers using Statistical Models

dc.contributor.advisor Mollah, Md. Nurul Haque
dc.contributor.advisor Begum, Anjuman Ara
dc.contributor.advisor Rahman, Moizur
dc.contributor.author Hasan, Mohammad Nazmol
dc.date.accessioned 2022-08-31T06:40:29Z
dc.date.available 2022-08-31T06:40:29Z
dc.date.issued 2019
dc.identifier.uri http://rulrepository.ru.ac.bd/handle/123456789/817
dc.description This thesis is submitted to the Department of Statistics, University of Rajshahi, Rajshahi, Bangladesh, for the degree of Doctor of Philosophy (PhD) en_US
dc.description.abstract Toxicogenomics combines toxicology with several omics technologies (genomics, transcriptomics, proteomics and metabolomics) to assess the risk that toxins (small molecules, peptides or proteins) and chemical agents (drugs, gasoline, alcohol, pesticides, fuel oil and cosmetics) pose to organisms. By integrating these omics technologies with bioinformatics, toxicogenomics can suggest the molecular mechanism of toxicity. This can reduce the cost in time, labor, compound synthesis and animal use, which are the main limitations of traditional toxicology work. Toxicogenomic studies, like drug discovery and development, have three main objectives: 1) exploring toxicogenomic biomarkers and the toxicity of the doses of chemical compounds (DCCs); 2) exploring co-clusters between correlated genes and DCCs; and 3) detecting significant gene-DCC interactions. In this thesis, we address all of these objectives both in the absence and in the presence of outlying observations in the toxicogenomic dataset. Some online computational tools can explore toxicogenomic biomarker genes based on the t-test and the Mann-Whitney U test. However, these tools cannot identify the significant DCCs that regulate the expression pattern of biomarker genes. To overcome this problem, we describe one-way ANOVA together with Tukey's HSD test (a post-hoc test) (chapter 2) to explore toxicogenomic biomarker genes and the significant toxic DCCs. The biomarker genes identified by the ANOVA approach are functionally annotated and found to be statistically significant for their respective pathways. The toxic DCCs identified by Tukey's HSD test have also been validated against the existing literature. Besides this, given the characteristics of toxicogenomic data, exploring co-clusters between genes and DCCs is another important objective of toxicogenomic studies.
Hierarchical clustering (HC) is a very popular and widely applied data analysis tool; it searches for interesting groups of objects in a dataset based on any combination of distance (euclidean, maximum, manhattan, canberra, minkowski) and HC (ward.D, ward.D2, single, complete, average, mcquitty, median, centroid) methods. However, these distance and clustering methods do not perform equally well in grouping objects for all types of dataset. The performance of some of these combinations is even very poor in some specific fields of study. In this thesis (chapter 3) we have selected the more suitable HC methods ward.D or ward.D2 in combination with the distance methods euclidean, manhattan or minkowski for clustering the genes and DCCs of toxicogenomic data. Furthermore, in chapter 3 we propose an algorithm for co-clustering genes and DCCs based on the HC approach. Although the selected HC approaches together with the proposed co-clustering algorithm can co-cluster the genes and DCCs, these approaches are very sensitive to outlying observations. Therefore, we robustify the selected HC approaches in chapter 4. We observed that the proposed robust HC (RHC) approaches outperform the classical HC approaches. The gene-DCC co-clustering results based on RHC using the co-clustering algorithm (proposed in chapter 3) have been validated against the existing literature. Nonetheless, the classical clustering approaches (e.g. k-means, fuzzy, HC, etc.), including our proposed RHC, use one-way (gene or DCC) information for clustering/co-clustering. Thus, these clustering approaches are not flexible and effective for co-clustering genes and DCCs. On the other hand, the probabilistic hidden variable model (PHVM) uses two-way (gene and DCC) information simultaneously for co-clustering genes and DCCs. Therefore, this approach is more effective and suitable for co-clustering. However, the PHVM approach is not robust against outliers.
To overcome this limitation of PHVM, in this research (chapter 5) we propose a logistic PHVM, called LPHVM, for robust co-clustering of genes and DCCs to discover toxicogenomic biomarkers and their regulatory DCCs. Based on the results of the simulated and real data analyses, we observed that the proposed LPHVM approach performs better than PHVM and the classical co-clustering approaches. To answer toxicogenomic questions, researchers are advised to use different suitable offline and online computational tools, since a single computational algorithm cannot always answer these questions. This is because complex toxicogenomic experiments generate large-scale datasets. For example, the co-clustering approaches based on HC and hidden variable models sometimes cannot separate the significant up-regulatory (UpR) (the gene is up-regulated by the influence of the DCC) and down-regulatory (DnR) (the gene is down-regulated by the influence of the DCC) gene-DCC interactions from the equal-regulatory (EqR) (there is no DCC effect on that gene) interactions within a co-cluster. On the other hand, separating UpR and DnR gene-DCC interactions from the EqR interactions is the cornerstone of toxicogenomic studies as well as of drug discovery and development. Therefore, in chapter 6 we propose MRC and LMRC for the detection of significant gene-DCC interactions, which, to the best of our knowledge, have not yet been considered in toxicogenomic studies or in drug discovery and development. LMRC produces more robust results than MRC in the presence of outlying observations in the data; otherwise the two perform equally. en_US
dc.language.iso en en_US
dc.publisher University of Rajshahi en_US
dc.relation.ispartofseries ;D4385
dc.subject Toxicogenomic en_US
dc.subject Exploring Biomarkers en_US
dc.subject Statistics en_US
dc.title Exploring Toxicogenomic Biomarkers using Statistical Models en_US
dc.type Thesis en_US

