H: Clustering of time-course gene expression data using a mixed-effects model with B-splines. Bioinformatics 2003 (0)

by Y Luan, Li

Results 1 - 10 of 138

A quantitative study of gene regulation involved in the immune response of Anopheline mosquitoes: An application of Bayesian hierarchical clustering of curves

by Nicholas A. Heard, Christopher C. Holmes, David A. Stephens - Journal of the American Statistical Association, 2006
Cited by 62 (2 self)
Malaria represents one of the major worldwide challenges to public health. A recent breakthrough in the study of the disease follows the annotation of the genome of the malaria parasite Plasmodium falciparum and the mosquito vector Anopheles. Of particular interest is the molecular biology underlying the immune response system of Anopheles which actively fights against Plasmodium infection. This paper reports a statistical analysis of gene expression time profiles from mosquitoes which have been infected with a bacterial agent. Specifically, we introduce a Bayesian model-based hierarchical clustering algorithm for curve data to investigate mechanisms of regulation in the genes concerned; that is, we aim to cluster genes having similar expression profiles. Genes displaying similar, interesting profiles can then be highlighted for further investigation by the experimenter. We show how our approach reveals structure within the data not captured by other approaches. One of the most pertinent features of the data is the sample size, which records the expression levels of 2771 genes at six time points. Additionally, the time points are unequally spaced and there is expected non-stationary behaviour in the gene profiles. We demonstrate our approach to be readily implementable under these conditions, and highlight some crucial computational savings that can be made in the context of a fully Bayesian analysis.

NONPARAMETRIC FUNCTIONAL DATA ANALYSIS THROUGH BAYESIAN DENSITY ESTIMATION

by Abel Rodríguez, David B. Dunson, Alan E. Gelfand, 2007
Cited by 28 (6 self)
In many modern experimental settings, observations are obtained in the form of functions, and interest focuses on inferences on a collection of such functions. Some examples are conductivity-temperature-depth (CTD) data in oceanography, dose-response models in epidemiology and time-course microarray experiments in biology and medicine. In this paper we propose a hierarchical model that allows us to simultaneously estimate multiple curves nonparametrically by using dependent Dirichlet Process mixtures of Gaussians to characterize the joint distribution of predictors and outcomes. Function estimates are then induced through the conditional distribution of the outcome given the predictors. The resulting approach allows for flexible estimation and clustering, while borrowing information across curves. We also show that the function estimates we obtain are consistent on the space of integrable functions. As an illustration, we consider an application to the analysis of CTD data in the north Atlantic.

Variable Selection in Nonparametric Varying-coefficient Models for Analysis of Repeated Measurements

by Lifeng Wang, Hongzhe Li, Jianhua Z. Huang - Journal of the American Statistical Association, 2008
Cited by 27 (2 self)
Nonparametric varying-coefficient models are commonly used for analyzing data measured repeatedly over time, including longitudinal and functional response data. Although many procedures have been developed for estimating varying coefficients, the problem of variable selection for such models has not been addressed to date. In this article we present a regularized estimation procedure for variable selection that combines basis function approximations and the smoothly clipped absolute deviation penalty. The proposed procedure simultaneously selects significant variables with time-varying effects and estimates the nonzero smooth coefficient functions. Under suitable conditions, we establish the theoretical properties of our procedure, including consistency in variable selection and the oracle property in estimation. Here the oracle property means that the asymptotic distribution of an estimated coefficient function is the same as that when it is known a priori which variables are in the model. The method is illustrated with simulations and two real data examples, one for identifying risk factors in the study of AIDS and one using microarray time-course gene expression data to identify the transcription factors related to the yeast cell-cycle process.

Clustering of Unevenly Sampled Gene Expression Time-Series Data

by C. S. Möller-Levet, F. Klawonn, K.-H. Cho, H. Yin, O. Wolkenhauer, 2003
Cited by 15 (2 self)
Motivation: Time course measurements are becoming a common type of experiment in the use of microarrays. Conventional clustering algorithms based on the Euclidean distance or the Pearson correlation coefficient are not able to include temporal information in the distance metric. The temporal order of the data and the varying length of sampling intervals are important and should be considered in clustering time-series. However, the shortness of gene expression time-series data limits the use of conventional statistical models and techniques for time-series analysis. To address this problem, this paper proposes the Fuzzy Short Time-Series (FSTS) clustering algorithm, which is able to cluster profiles based on the similarity of their relative change of expression level and the corresponding temporal information. One of the major advantages of fuzzy clustering is that genes can belong to more than one group, revealing distinctive features of each gene's function and regulation. Results:

Citation Context

...ce, the authors use one of the largest data sets available. Although cubic splines are used commonly they are not suitable for the short gene expression time-series (de Hoon et al., 2002). Later, in (Luan and Li, 2003), time-course gene expression data is clustered using a mixed-effects model with B-splines. The authors utilized "long" gene expression time-series (12 and 18 time points) and four equally spaced knot...

A linear time biclustering algorithm for time series gene expression data

by Sara C. Madeira, Arlindo L. Oliveira - In Proc. of 5th Workshop on Algorithms in Bioinformatics, 2005
Cited by 14 (2 self)
Several non-supervised machine learning methods have been used in the analysis of gene expression data obtained from microarray experiments. Recently, biclustering, a non-supervised approach that performs simultaneous clustering on the row and column dimensions of the data matrix, has been shown to be remarkably effective in a variety of applications. The goal of biclustering is to find subgroups of genes and subgroups of conditions, where the genes exhibit highly correlated behaviors. In the most common settings, biclustering is an NP-complete problem, and heuristic approaches are used to obtain sub-optimal solutions using reasonable computational resources. In this work, we examine a particular setting of the problem, where we are concerned with finding biclusters in time series expression data. In this setting, we are interested in finding biclusters where the columns are consecutive in time. For this particular version of the problem, we propose an algorithm that finds and reports all relevant biclusters in time linear in the size of the data matrix. This impressive reduction in complexity is obtained by manipulating a discretized version of the data matrix and by using advanced string manipulation techniques based on suffix trees. We report results in both synthetic and real world datasets that show the effectiveness of the approach.

Citation Context

...llowed. Time series expression data are often used to study dynamic biological systems and gene regulatory networks since their analysis can potentially provide more insights about biological systems [8]. In this context, the identification of biological processes that lead to the creation of biclusters, together with their relationship, is crucial for the identification of gene regulatory networks a...

Clustering functional data using wavelets

by Anestis Antoniadis, Xavier Brossat, Jairo Cugliari - In e-book of COMPSTAT, 2010
Cited by 12 (0 self)
Abstract not found

Time-synchronized Clustering of Gene Expression Trajectories

by Rong Tang
Cited by 11 (1 self)
Current clustering methods are routinely applied to gene expression time course data to find genes with similar activation patterns and ultimately to understand the dynamics of biological processes. As the dynamic unfolding of a biological process often involves the activation of genes at different rates, successful clustering in this context requires dealing with varying time and shape patterns simultaneously. This motivates the combination of a novel pairwise warping with a suitable clustering method to discover expression shape clusters. We develop a novel clustering method that combines an initial pairwise curve alignment to adjust for time variation within likely clusters. The cluster-specific time synchronization method shows excellent performance over standard clustering methods in terms of cluster quality measures in simulations and for yeast and human fibroblast datasets. In the yeast example, the discovered clusters have high concordance with the known biological processes. Key Words: Clustering; Gene expression analysis; Microarray; Time warping.

Y (2011) Principal component analysis based methods in bioinformatics studies. Brief Bioinform 12: 714–722

by Shuangge Ma, Ying Dai
Cited by 11 (0 self)
In analysis of bioinformatics data, a unique challenge arises from the high dimensionality of measurements. Without loss of generality, we use genomic study with gene expression measurements as a representative example but note that analysis techniques discussed in this article are also applicable to other types of bioinformatics studies. Principal component analysis (PCA) is a classic dimension reduction approach. It constructs linear combinations of gene expressions, called principal components (PCs). The PCs are orthogonal to each other, can effectively explain variation of gene expressions, and may have a much lower dimensionality. PCA is computationally simple and can be realized using many existing software packages. This article consists of the following parts. First, we review the standard PCA technique and its applications in bioinformatics data analysis. Second, we describe recent ‘non-standard’ applications of PCA, including accommodating interactions among genes, pathways and network modules and conducting PCA with estimating equations as opposed to gene expressions. Third, we introduce several recently proposed PCA-based techniques, including the supervised PCA, sparse PCA and functional PCA. The supervised PCA and sparse PCA have been shown to have better empirical performance than the standard PCA. The functional PCA can analyze time-course gene expression data. Last, we raise the awareness of several critical but unsolved problems related to PCA. The goal of this article is to make bioinformatics researchers aware of the PCA technique and more importantly its most recent development, so that this simple yet effective dimension reduction technique can be better employed in bioinformatics data analysis.

Citation Context

...nly measured at one time point. Another type of studies measure gene expressions consecutively at multiple time points. The data so obtained has been referred to as ‘time-course gene expression data’ [24]. For a specific gene, its expression can no longer be described using a single number Xj. Rather, a function of time Xj(t) is needed, where t denotes time. Consider for example the first PC. With sna...

Mixtures of regression models for time-course gene expression data: evaluation of initialization and random effects

by Theresa Scharl, Bettina Grün, Friedrich Leisch - Bioinformatics, 2010
Cited by 10 (5 self)
This is a preprint of an article that has been accepted for publication in Bioinformatics on 07-Dec-2009. Please use the journal version for citation. Summary: Finite mixture models are routinely applied to time course microarray data. Due to the complexity and size of this type of data the choice of good starting values plays an important role. So far initialization strategies have only been investigated for data from a mixture of multivariate normal distributions. In this work several initialization procedures are evaluated for mixtures of regression models with and without random effects in an extensive simulation study on different artificial datasets. Finally these procedures are also applied to a real dataset from E. coli. Availability: The latest release versions of R packages flexmix, gcExplorer and kernlab are always available from CRAN

Citation Context

...mixtures of mixed-effects models to treat the gene expression level as a continuous function of time without requiring the specification of the functional relationship. These are used with B-splines (Luan and Li, 2003; Bar-Joseph et al., 2003) and smoothing splines (Ma et al., 2006). For smoothing splines the degree of smoothness is chosen automatically by cross-validation. In this study smoothing splines and B-sp...

Independent component analysis-based penalized discriminant method for tumor classification using gene expression data

by De-shuang Huang, Chun-hou Zheng - Bioinformatics
Cited by 9 (2 self)
doi:10.1093/bioinformatics/btl190