Results 1 - 6 of 6
Multiscale Binarization of Gene Expression Data for Reconstructing Boolean Networks
- IEEE/ACM transactions on computational biology and bioinformatics
, 2011
Cited by 4 (1 self)
Network inference algorithms can assist life scientists in unraveling gene-regulatory systems on a molecular level. In recent years, great attention has been drawn to the reconstruction of Boolean networks from time series. These need to be binarized, as such networks model genes as binary variables (either "expressed" or "not expressed"). Common binarization methods often cluster measurements or separate them according to statistical or information-theoretic characteristics and may require many data points to determine a robust threshold. Yet, time series measurements frequently comprise only a small number of samples. To overcome this limitation, we propose a binarization that incorporates measurements at multiple resolutions. We introduce two such binarization approaches which determine thresholds based on limited numbers of samples and additionally provide a measure of threshold validity. Thus, network reconstruction and further analysis can be restricted to genes with meaningful thresholds. This reduces the complexity of network inference. The performance of our binarization algorithms was evaluated in network reconstruction experiments using artificial data as well as real-world yeast expression time series. The new approaches yield considerably improved correct network identification rates compared to other binarization techniques by effectively reducing the number of candidate networks. Index Terms: Binarization, gene-regulatory networks, Boolean networks, reconstruction.
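To make the idea of a binarization threshold concrete, here is a minimal Python sketch of a generic clustering-style baseline of the kind the abstract contrasts against. It is not the multiscale method proposed in the paper: it simply picks the cut between sorted measurements that minimizes the within-group squared error (a one-dimensional two-means split). The function name `binarize_two_means` and the sample values are illustrative assumptions.

```python
from typing import List, Tuple


def binarize_two_means(values: List[float]) -> Tuple[float, List[int]]:
    """Binarize one gene's measurements with a 1-D two-group split.

    Assumption: this is a simple baseline for illustration, not the
    multiscale binarization described in the abstract.
    """
    if len(values) < 2:
        raise ValueError("need at least two measurements")
    ordered = sorted(values)
    best_cost = float("inf")
    best_threshold = ordered[0]
    # Try every cut between consecutive sorted values.
    for i in range(1, len(ordered)):
        low, high = ordered[:i], ordered[i:]
        mean_low = sum(low) / len(low)
        mean_high = sum(high) / len(high)
        # Within-group sum of squared deviations for this cut.
        cost = sum((v - mean_low) ** 2 for v in low)
        cost += sum((v - mean_high) ** 2 for v in high)
        if cost < best_cost:
            best_cost = cost
            # Place the threshold midway between the two neighbors.
            best_threshold = (ordered[i - 1] + ordered[i]) / 2
    # 1 = "expressed", 0 = "not expressed", in the original time order.
    bits = [1 if v > best_threshold else 0 for v in values]
    return best_threshold, bits


# Hypothetical six-point time series for a single gene.
threshold, bits = binarize_two_means([0.1, 0.2, 0.15, 0.9, 1.0, 0.95])
```

With so few samples, a split like this always returns some threshold, even when the data show no clear step. That is the gap the abstract's "measure of threshold validity" is meant to address.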
Gene clustering methods for time series microarray data
, 2010
Cited by 1 (0 self)
The development of advanced microarray technology over the past two decades constitutes a revolution in genomics. Today, microarrays can measure expression
Genome-wide Measurement of RNA Folding Energies
- Molecular Cell
RNA structural transitions are important in the function and regulation of RNAs. Here, we reveal a layer of transcriptome organization in the form of RNA folding energies. By probing yeast RNA structures at different temperatures, we obtained relative melting temperatures (Tm) for RNA structures in over 4000 transcripts. Specific signatures of RNA Tm demarcated the polarity of mRNA open reading frames and highlighted numerous candidate regulatory RNA motifs in 3′ untranslated regions. RNA Tm distinguished noncoding versus coding RNAs and identified mRNAs with distinct cellular functions. We identified thousands of putative RNA thermometers, and their presence is predictive of the pattern of RNA decay in vivo during heat shock. The exosome complex recognizes unpaired bases during heat shock to degrade these RNAs, coupling intrinsic structural stabilities to gene regulation. Thus, genome-wide structural dynamics of RNA can parse functional elements of the transcriptome and reveal diverse biological insights.
Examining Committee:
, 2009
Data mining techniques, such as clustering, have become a mainstay in many applications such as bioinformatics, geographic information systems, and marketing. Over the last decade, due to new demands posed by these applications, clustering techniques have been significantly adapted and extended. One such extension is the idea of finding clusters in a dataset that preserve information about some auxiliary variable. These approaches guide clustering algorithms, which are traditionally unsupervised learning techniques, with background knowledge of the auxiliary variable. The auxiliary information could be a prior class label attached to the data samples, or it could be the relations between data samples across different datasets. In this dissertation, we consider the latter problem of simultaneously clustering several vector-valued datasets by taking into account the relationships between the data samples. We formulate objective functions that can be used to find clusters that are local in each individual dataset and at the same time maximally similar or dissimilar with respect to clusters across datasets. We introduce diverse applications of these clustering algorithms: (1) time series segmentation, (2) reconstructing temporal models from time series segmentations, (3) simultaneously
Assessing the Evolution of Gene Expression Using Microarray Data
Abstract: Classical studies of the evolution of gene function have predominantly focused on mutations within protein coding regions. With the advent of microarrays, however, it has become possible to evaluate the transcriptional activity of a gene as an additional characteristic of function. Recent studies have revealed an equally important role for gene regulation in the retention and evolution of duplicate genes. Here we review approaches to assessing the evolution of gene expression using microarray data, and discuss potential influences on expression divergence. Currently, there are no established standards on how best to identify and quantify instances of expression divergence. There have also been few efforts to date that incorporate suspected influences into mathematical models of expression divergence. Such developments will be crucial to a comprehensive understanding of the role gene duplications and expression evolution play in the emergence of complex traits and functional diversity. An integrative approach to gene family evolution, including both orthologous and paralogous genes, has the potential to bring strong predictive power both to the functional annotation of extant proteins and to the inference of functional characteristics of ancestral gene family members.
Identification of thresholds for dichotomizing
DNA methylation plays an important role in many biological processes by regulating gene expression. It is commonly accepted that turning on DNA methylation silences the expression of the corresponding genes. While methylation is often described as a binary on-off signal, it is typically measured using beta values derived from either microarray or sequencing technologies, which take continuous values between 0 and 1. To interpret methylation in a binary fashion, appropriate thresholds are needed to dichotomize the continuous measurements. In this paper, we use data from The Cancer Genome Atlas project. For a total of 992 samples across five cancer types, both methylation and gene expression data are available. A bivariate extension of the StepMiner algorithm is used to identify thresholds for dichotomizing both methylation and expression data. A hypergeometric test is applied to identify CpG sites whose methylation status is significantly associated with silencing of the expression of their corresponding genes. The test is performed either on all five cancer types together or on individual cancer types separately. We find that the appropriate thresholds vary across CpG sites. In addition, the negative association between methylation and expression is highly tissue specific.
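The association step described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the paper's pipeline: the thresholds are taken as given rather than derived with the bivariate StepMiner extension, the test is one-sided (enrichment of silenced samples among methylated ones), and the function names and sample values are hypothetical.

```python
from math import comb
from typing import Sequence


def hypergeom_upper_tail(N: int, K: int, n: int, k: int) -> float:
    """P(X >= k) when drawing n of N items, K of which are 'successes'."""
    total = comb(N, n)
    upper = min(K, n)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible overlap counts contribute nothing to the sum.
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / total


def association_p_value(
    beta: Sequence[float],
    expr: Sequence[float],
    beta_threshold: float,
    expr_threshold: float,
) -> float:
    """One-sided test: are methylated samples enriched for silenced expression?

    Assumption: both thresholds are supplied by the caller; the paper
    derives them from the data, which is not reproduced here.
    """
    methylated = [b > beta_threshold for b in beta]
    silenced = [e <= expr_threshold for e in expr]
    N = len(beta)
    K = sum(methylated)          # samples called methylated
    n = sum(silenced)            # samples called silenced
    k = sum(m and s for m, s in zip(methylated, silenced))  # both
    return hypergeom_upper_tail(N, K, n, k)


# Hypothetical CpG site: three methylated/silenced and three
# unmethylated/expressed samples.
p = association_p_value(
    beta=[0.9, 0.8, 0.85, 0.1, 0.2, 0.15],
    expr=[1.0, 2.0, 1.5, 9.0, 8.0, 10.0],
    beta_threshold=0.5,
    expr_threshold=5.0,
)
```

Because the abstract reports that appropriate thresholds vary across CpG sites, a per-site analysis would call a function like this with site-specific thresholds rather than one global cutoff.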