Biclustering algorithms for biological data analysis: a survey.
- IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2004
- Cited by 481 (15 self)
A large number of clustering approaches have been proposed for the analysis of gene expression data obtained from microarray experiments. However, the results of applying standard clustering methods to genes are limited. This limitation is imposed by the existence of a number of experimental conditions where the activity of genes is uncorrelated. A similar limitation exists when clustering of conditions is performed. For this reason, a number of algorithms that perform simultaneous clustering on the row and column dimensions of the gene expression matrix have been proposed to date. This simultaneous clustering, usually called biclustering, seeks to find sub-matrices, that is, subgroups of genes and subgroups of columns, where the genes exhibit highly correlated activities for every condition. Algorithms of this type have also been proposed and used in other fields, such as information retrieval and data mining. In this comprehensive survey, we analyze a large number of existing approaches to biclustering, and classify them in accordance with the type of biclusters they can find, the patterns of biclusters that are discovered, the methods used to perform the search and the target applications.
Discovering Statistically Significant Biclusters in Gene Expression Data
- In Proceedings of ISMB 2002, 2002
- Cited by 302 (4 self)
In gene expression data, a bicluster is a subset of the genes exhibiting consistent patterns over a subset of the conditions. We propose a new method to detect significant biclusters in large expression datasets. Our approach is graph theoretic coupled with statistical modelling of the data. Under plausible assumptions, our algorithm is polynomial and is guaranteed to find the most significant biclusters. We tested our method on a collection of yeast expression profiles and on a human cancer dataset. Cross validation results show high specificity in assigning function to genes based on their biclusters, and we are able to annotate in this way 196 uncharacterized yeast genes. We also demonstrate how the biclusters lead to detecting new concrete biological associations. In cancer data we are able to detect and relate finer tissue types than was previously possible. We also show that the method outperforms the biclustering algorithm of Cheng and Church (2000).
From patterns to pathways: gene expression data analysis comes of age.
- Nature Genetics, 2002
Genome-wide discovery of transcriptional modules from DNA sequence and gene expression
- Bioinformatics, 2003
- Cited by 122 (7 self)
In this paper, we describe an approach for understanding
transcriptional regulation from both gene expression and
promoter sequence data. We aim to identify transcriptional
modules—sets of genes that are co-regulated in a set
of experiments, through a common motif profile. Using
the EM algorithm, our approach refines both the module
assignment and the motif profile so as to best explain
the expression data as a function of transcriptional motifs.
It also dynamically adds and deletes motifs, as required
to provide a genome-wide explanation of the expression
data. We evaluate the method on two Saccharomyces
cerevisiae gene expression data sets, showing that our
approach is better than a standard one at recovering
known motifs and at generating biologically coherent
modules. We also combine our results with binding
localization data to obtain regulatory relationships with
known transcription factors, and show that many of the
inferred relationships have support in the literature.
Enhanced Biclustering on Expression Data
- Proc. of 3rd IEEE Symposium on BioInformatics and BioEngineering (BIBE’03), 2003
- Cited by 87 (0 self)
molecular biology, which provide a powerful tool by which the expression patterns of thousands of genes can be monitored simultaneously and are already producing a huge amount of valuable data. The concept of bicluster was introduced by Cheng and Church (2000) to capture the coherence of a subset of genes and a subset of conditions. A set of heuristic algorithms was also designed to find either one bicluster or a set of biclusters; these algorithms consist of iterations of masking null values and discovered biclusters, coarse and fine node deletion, node addition, and the inclusion of inverted data. These heuristics inevitably suffer from some serious drawbacks. The masking of null values and discovered biclusters with random numbers may result in random interference, which in turn impairs the discovery of high quality biclusters. To address this issue and to further accelerate the biclustering process, we generalize the model of bicluster to incorporate null values and propose a probabilistic algorithm (FLOC) that can discover a set of k possibly overlapping biclusters simultaneously. Furthermore, this algorithm can easily be extended to support additional features that suit different requirements at very little cost. An experimental study on the yeast gene expression data shows that the FLOC algorithm can offer substantial improvements over the previously proposed algorithm.
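To make the notion of bicluster coherence above more concrete, here is a minimal Python sketch of a mean squared residue score, the measure commonly associated with Cheng and Church (2000). The abstract itself does not spell out the score, so the formula, the function name `mean_squared_residue`, and the toy matrix are illustrative assumptions; this is not the FLOC algorithm and it does not handle null values.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the sub-matrix selected by rows and cols.

    Assumed definition: the residue of an entry is
    value - row mean - column mean + overall mean,
    with all means taken inside the selected sub-matrix.
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(sub), len(sub[0])
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue * residue
    return total / (n_rows * n_cols)


# Toy expression matrix with made-up values: rows are genes, columns are conditions.
expression = [
    [1.0, 2.0, 3.0, 9.0],
    [2.0, 3.0, 4.0, 1.0],
    [4.0, 5.0, 6.0, 7.0],
    [8.0, 1.0, 5.0, 2.0],
]

# Genes 0-2 differ only by a constant shift over conditions 0-2.
coherent = mean_squared_residue(expression, rows=[0, 1, 2], cols=[0, 1, 2])
# The full matrix mixes unrelated patterns.
whole = mean_squared_residue(expression, rows=[0, 1, 2, 3], cols=[0, 1, 2, 3])
print(coherent, whole)
```

Under this assumed definition, a lower score means the selected genes rise and fall together across the selected conditions, so the shifted sub-matrix scores essentially zero while the full toy matrix scores noticeably higher.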
Context-Specific Bayesian Clustering for Gene Expression Data
- 2002
- Cited by 64 (4 self)
The recent growth in genomic data and measurements of genome-wide expression patterns allows us to apply computational tools to examine gene regulation by transcription factors.
Biclustering microarray data by Gibbs sampling
- Bioinformatics, 2003
- Cited by 55 (3 self)
Motivation: Gibbs sampling has become a method of choice for the discovery of noisy patterns, known as motifs, in DNA and protein sequences. Because handling noise in microarray data presents similar challenges, we have adapted this strategy to the biclustering of discretized microarray data.
Physical network models and . . .
- 2002
- Cited by 48 (4 self)
We develop a new framework for inferring models of transcriptional regulation. The models in this approach, which we call physical models, are constructed on the basis of verifiable molecular properties of the underlying biological system. The properties include, for example, the existence of protein-protein interactions and whether a DNA binding protein binds to a regulatory region of a specific gene. Each property, either implicated or hypothesized, is included as a variable in the model. The setting of all the variables defines an annotated graph representing the molecular interactions that are present. Some of the available data sources, such as factor-binding data (location data), involve measurements that are directly tied to the variables in the model. Other data sources, such as gene knock-outs, are functional in nature and provide only indirect evidence about the (physical) variables. We solve this data association problem by linking each knock-out effect with a set of causal paths (molecular cascades) that could in principle explain the effect. The optimal setting of all the variables (including the selection of cascades) is found approximately via the max-product algorithm operating on a factor graph. We demonstrate that this approach is capable of predicting gene knock-out effects with a high degree of accuracy in a cross-validation setting. Moreover, the approach implicates likely molecular cascades responsible for each observed knock-out effect. We can easily extend the approach to include other data sources by solving the corresponding data association problems. This includes, for example, time course expression profiles. We also discuss generalizations to represent coordinated regulation and the use of automated experiment design.
A Statistical Framework for Expression-Based Molecular Classification in Cancer
- 2002
- Cited by 47 (4 self)
this paper, our aim is to provide a framework to support this three-faceted enterprise. We propose a probabilistic definition of differential expression in the context of unsupervised classification, and we use it to define molecular profiles, and to assess quantities of potential use in classification, such as the probability that a tumour belongs to a given profile and the probability that two tumours have the same profile. Our long-term goals are (a) to provide tools that will facilitate the use of prior knowledge about gene function in the screening process, in an interactive way, to improve the interpretation and clinical validation of the classification that will ultimately emerge from the analysis, and (b) to capture the potentially categorical nature of differential gene expression, by using latent categorical data that can be interpreted as a gene being turned 'on' or 'off' compared with normal expression.