Giannakis, “Nonparametric basis pursuit via sparse kernel-based learning,” IEEE Signal Process. Mag., 2013
SIMULTANEOUSLY SEGMENTING MULTIPLE GENE EXPRESSION TIME COURSES BY ANALYZING CLUSTER DYNAMICS
Cited by 5 (3 self)
We present a new approach to segmenting multiple time series by analyzing the dynamics of cluster formation and re-arrangement around putative segment boundaries. This approach finds application in distilling large numbers of gene expression profiles into temporal relationships underlying biological processes. By directly minimizing information-theoretic measures of segmentation quality derived from Kullback-Leibler (KL) divergences, our formulation reveals clusters of genes along with a segmentation such that clusters show concerted behavior within segments but exhibit significant re-grouping across segmentation boundaries. The results of the segmentation algorithm can be summarized as Gantt charts revealing temporal dependencies in the ordering of key biological processes. Applications to the yeast metabolic cycle and the yeast cell cycle are described.
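As a point of reference for the KL divergences mentioned in this abstract, the sketch below computes the standard discrete Kullback-Leibler divergence between two distributions. It is not the authors' segmentation objective, which the abstract does not spell out. The function name `kl_divergence` and the example distributions are illustrative assumptions.

```python
import math


def kl_divergence(p, q):
    """Return D_KL(p || q) in nats for two discrete distributions.

    Both inputs are assumed to be sequences of probabilities over the
    same support, each summing to 1 (an assumption of this sketch).
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                # p puts mass where q has none: divergence is infinite.
                return math.inf
            total += pi * math.log(pi / qi)
    return total


# Hypothetical cluster-membership distributions on either side of a boundary.
before = [0.7, 0.2, 0.1]
after = [0.1, 0.2, 0.7]
print(kl_divergence(before, after))
```

A large value for two such distributions would indicate strong re-grouping, while a value of zero means the two distributions are identical.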
EXPLORATORY MODELING OF YEAST STRESS RESPONSE AND ITS REGULATION WITH gCCA AND ASSOCIATIVE CLUSTERING
Cited by 2 (2 self)
We model dependencies between m multivariate continuous-valued information sources by a combination of (i) a generalized canonical correlations analysis (gCCA) to reduce dimensionality while preserving dependencies in m − 1 of them, and (ii) summarizing dependencies with the remaining one by associative clustering. This new combination of methods avoids multiway associative clustering, which would require a multiway contingency table and hence suffer from the curse of dimensionality. The method is applied to summarizing properties of yeast stress by searching for dependencies (commonalities) between the expression of genes of baker’s yeast Saccharomyces cerevisiae under various stressful treatments, and to summarizing stress regulation by finally adding data about transcription factor binding sites.
Deletion of a subgroup of ribosome-related genes minimizes hypoxia-induced changes and confers hypoxia tolerance (Call for Papers: Computational Modeling of Physiological Systems)
, 2011
Effective similarity measures for expression profiles (Bioinformatics)
It is commonly accepted that genes with similar expression profiles are functionally related. However, there are many ways to measure the similarity of expression profiles, and it is not clear a priori which is the most effective. Moreover, so far no clear distinction has been made as to the type of functional link between genes that microarray data suggest. Similarly expressed genes can be part of the same complex as interacting partners; they can participate in the same pathway without interacting directly; they can perform similar functions; or they can simply have similar regulatory sequences. Here we conduct a study of the notion of functional link as implied by expression data. We analyze different similarity measures of gene expression profiles and assess their usefulness and robustness in detecting biological relationships by comparing the similarity scores with results obtained from databases of interacting proteins, promoter signals, and cellular pathways, as well as through sequence comparisons. We also introduce variations on similarity measures that are based on statistical analysis and better discriminate between genes that are functionally close and those that are functionally distant. Our tools can be used to assess other similarity measures for expression profiles, and are accessible at biozon.org/tools/expression/.
Deriving Kripke Structures from Time Series Segmentation Results
Kripke structures are important modeling formalisms to understand the behavior of reactive systems. We present an approach to automatically infer Kripke structures from time series datasets. Our algorithm bridges the continuous world of time profiles and the discrete symbols of Kripke structures by incorporating a segmentation algorithm as an intermediate step. This approach identifies, in an unsupervised manner, the states of the Kripke structure, the transition relation, and the properties (propositions) that hold true in each state. We demonstrate experimental results of our approach to understanding the interplay between key biological processes.
Systematic Evaluation of Scaling Methods for Gene Expression Data
Even after an experimentally prepared gene expression data set has been pre-processed to account for variations in the microarray technology, there may be inconsistencies between the scales of measurements in different conditions. This may happen for reasons such as the accumulation of gene expression data prepared by different laboratories into a single data set. A variety of scaling and transformation methods have been used for addressing these scale inconsistencies in different studies on the analysis of gene expression data sets. However, a quantitative estimation of their relative performance has been lacking. In this paper, we report an extensive evaluation of scaling and transformation methods for their effectiveness with respect to the important problem of protein function prediction. We consider several such commonly used methods for gene expression data, such as z-score scaling, quantile normalization, diff transformation, and two new scaling methods, sigmoid and double sigmoid, that have not been used previously in this domain to the best of our knowledge. We show that the performance of these methods can vary significantly across data sets, but Dsigmoid scaling and z-score transformation generally perform well for the two types of gene expression data, namely temporal and non-temporal, respectively.
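To make two of the scaling methods named in this abstract concrete, here is a minimal sketch of z-score scaling and one plausible form of sigmoid scaling for a single expression profile. The abstract does not define the paper's sigmoid or double sigmoid transforms, so the logistic-of-z-score form below is an assumption, as are the function names `zscore_scale` and `sigmoid_scale` and the example values.

```python
import math
from statistics import mean, pstdev


def zscore_scale(values):
    """Center a profile to mean 0 and scale to unit (population) std dev."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant profile carries no scale information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def sigmoid_scale(values):
    """Squash z-scores into (0, 1) with the logistic function.

    Assumed form for illustration only; the paper's exact definition
    is not given in the abstract.
    """
    return [1.0 / (1.0 + math.exp(-z)) for z in zscore_scale(values)]


# Hypothetical expression values for one gene across three conditions.
profile = [1.0, 2.0, 3.0]
print(zscore_scale(profile))
print(sigmoid_scale(profile))
```

Under this assumed form, the sigmoid variant bounds the influence of outlying measurements, whereas plain z-scores keep them at their full magnitude.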
Examining Committee:
, 2009
Data mining techniques, such as clustering, have become a mainstay in many applications such as bioinformatics, geographic information systems, and marketing. Over the last decade, due to new demands posed by these applications, clustering techniques have been significantly adapted and extended. One such extension is the idea of finding clusters in a dataset that preserve information about some auxiliary variable. These approaches guide clustering algorithms, which are traditionally unsupervised learning techniques, with background knowledge of the auxiliary variable. The auxiliary information could be some prior class label attached to the data samples or it could be the relations between data samples across different datasets. In this dissertation, we consider the latter problem of simultaneously clustering several vector-valued datasets by taking into account the relationships between the data samples. We formulate objective functions that can be used to find clusters that are local in each individual dataset and at the same time maximally similar or dissimilar with respect to clusters across datasets. We introduce diverse applications of these clustering algorithms: (1) time series segmentation, (2) reconstructing temporal models from time series segmentations, (3) simultaneously
Exploring the Yeast Genome with Generalized Singular Value Decomposition
Andrew Ferguson, thesis, Princeton University, June 2008. The method of generalized singular value decomposition (GSVD) is used to identify the principal components separating and shared between DNA microarray time courses of the yeast Saccharomyces cerevisiae under two different experimental conditions. In the first analysis, a comparison is performed between the yeast stress response to hydrogen peroxide (H2O2) and the stress response to the drug menadione (MD). The analysis confirms the similarity between the responses on a genome-wide scale. Furthermore, GSVD is shown to successfully cluster the genes involved in the