An empirical Bayes approach to inferring large-scale gene association networks (2005)

by J Schäfer, K Strimmer
Venue: Bioinformatics

Results 1 - 10 of 237

Single-trial analysis and classification of ERP components -- a tutorial

by Benjamin Blankertz, Steven Lemm, Matthias Treder, Stefan Haufe, Klaus-Robert Müller, 2010
Cited by 74 (13 self)
Analyzing brain states that correspond to event related potentials (ERPs) on a single trial basis is a hard problem due to the high trial-to-trial variability and the unfavorable ratio between signal (ERP) and noise (artifacts and neural background activity). In this tutorial, we provide a comprehensive framework for decoding ERPs, elaborating on linear concepts, namely spatio-temporal patterns and filters as well as linear ERP classification. However, the bottleneck of these techniques is that they require an accurate covariance matrix estimation in high dimensional sensor spaces which is a highly intricate problem. As a remedy, we propose to use shrinkage estimators and show that appropriate regularization of linear discriminant analysis (LDA) by shrinkage yields excellent results for single-trial ERP classification that are far superior to classical LDA classification. Furthermore, we give practical hints on the interpretation of what classifiers learned from the data and demonstrate in particular that the trade-off between goodness-of-fit and model complexity in regularized LDA relates to a morphing between a difference pattern of ERPs and a spatial filter which cancels non task-related brain activity.
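The shrinkage idea mentioned in this abstract can be sketched as follows. This is a minimal illustration, assuming the common form in which the empirical covariance is pulled toward a scaled identity matrix; the function name `shrink_covariance` and the choice to pass the intensity `gamma` by hand (rather than estimate it from data) are assumptions made for the example, not details taken from the abstract.

```python
def shrink_covariance(cov, gamma):
    """Return (1 - gamma) * cov + gamma * nu * I, where nu = trace(cov) / d.

    cov   : square covariance matrix as a list of lists
    gamma : shrinkage intensity in [0, 1], chosen by the caller (assumption)
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    d = len(cov)
    # Average eigenvalue of cov, used as the scale of the identity target.
    nu = sum(cov[i][i] for i in range(d)) / d
    return [
        [(1.0 - gamma) * cov[i][j] + (gamma * nu if i == j else 0.0)
         for j in range(d)]
        for i in range(d)
    ]
```

With `gamma = 0` the empirical covariance is returned unchanged, and with `gamma = 1` only the spherical target remains; a regularized LDA would use the shrunk matrix in place of the empirical one.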

A robust procedure for Gaussian graphical model search from microarray data with p larger than n

by Robert Castelo, Alberto Roverato, Max Chickering - Journal of Machine Learning Research, 2006
Cited by 46 (6 self)
Learning of large-scale networks of interactions from microarray data is an important and challenging problem in bioinformatics. A widely used approach is to assume that the available data constitute a random sample from a multivariate distribution belonging to a Gaussian graphical model. As a consequence, the prime objects of inference are full-order partial correlations, which are partial correlations between two variables given the remaining ones. In the context of microarray data the number of variables exceeds the sample size, and this precludes the application of traditional structure learning procedures because a sampling version of full-order partial correlations does not exist. In this paper we consider limited-order partial correlations, that is, partial correlations computed on marginal distributions of manageable size, and provide a set of rules that allow one to assess the usefulness of these quantities to derive the independence structure of the underlying Gaussian graphical model. Furthermore, we introduce a novel structure learning procedure based on a quantity, obtained from limited-order partial correlations, that we call the non-rejection rate. The applicability and usefulness of the procedure are demonstrated by both simulated and real data.
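As a small illustration of a limited-order partial correlation, the sketch below computes the first-order case (conditioning on a single variable) from three pairwise correlations using the standard textbook formula. The function name is hypothetical, and the example does not implement the non-rejection rate described in the abstract.

```python
import math

def first_order_partial_corr(r_ij, r_ik, r_jk):
    """Partial correlation of variables i and j given one variable k.

    Uses the standard identity
        r_ij.k = (r_ij - r_ik * r_jk) / sqrt((1 - r_ik^2) * (1 - r_jk^2)).
    """
    if abs(r_ik) >= 1.0 or abs(r_jk) >= 1.0:
        raise ValueError("r_ik and r_jk must lie strictly inside (-1, 1)")
    return (r_ij - r_ik * r_jk) / math.sqrt((1.0 - r_ik ** 2) * (1.0 - r_jk ** 2))
```

When the marginal correlation between i and j is fully explained by k (for instance `r_ij = 0.4`, `r_ik = 0.8`, `r_jk = 0.5`), the partial correlation vanishes, which is the kind of evidence for a missing edge that such procedures look for.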

Entropy Inference and the James-Stein Estimator, with Application to Nonlinear Gene Association Networks

by Jean Hausser, Bioinformatics Biozentrum, Korbinian Strimmer, Xiaotong Shen
Cited by 44 (2 self)
We present a procedure for effective estimation of entropy and mutual information from small-sample data, and apply it to the problem of inferring high-dimensional gene association networks. Specifically, we develop a James-Stein-type shrinkage estimator, resulting in a procedure that is highly efficient statistically as well as computationally. Despite its simplicity, we show that it outperforms eight other entropy estimation procedures across a diverse range of sampling scenarios and data-generating models, even in cases of severe undersampling. We illustrate the approach by analyzing E. coli gene expression data and computing an entropy-based gene-association network from gene expression data. A computer program is available that implements the proposed shrinkage estimator. Keywords: entropy, shrinkage estimation, James-Stein estimator, “small n, large p” setting, mutual information, gene association network
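A shrinkage entropy estimate of this kind can be sketched roughly as below. The abstract does not state the formulas, so the uniform shrinkage target, the expression for the intensity `lam`, and the function name are assumptions made for illustration: cell frequencies are shrunk toward the uniform distribution and the result is plugged into the usual entropy formula.

```python
import math

def shrinkage_entropy(counts):
    """Return (entropy in nats, shrinkage intensity) from a list of cell counts.

    Assumed form: shrink maximum-likelihood frequencies toward the uniform
    target 1/p with a data-driven intensity clipped to [0, 1].
    """
    n = sum(counts)
    p = len(counts)
    if n <= 0 or p == 0:
        raise ValueError("need at least one cell and one observation")
    ml = [c / n for c in counts]          # maximum-likelihood frequencies
    target = 1.0 / p                      # uniform shrinkage target (assumption)
    denom = (n - 1) * sum((target - f) ** 2 for f in ml)
    if denom == 0.0:
        lam = 1.0                         # n == 1 or frequencies already uniform
    else:
        lam = (1.0 - sum(f * f for f in ml)) / denom
        lam = min(1.0, max(0.0, lam))
    shrunk = [lam * target + (1.0 - lam) * f for f in ml]
    entropy = -sum(q * math.log(q) for q in shrunk if q > 0.0)
    return entropy, lam
```

For heavily undersampled data the intensity moves toward 1 and the estimate approaches the entropy of the uniform distribution, `log(p)`; with ample data it moves toward 0 and the plain plug-in estimate is recovered.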

Citation Context

...mong many others, methods have been proposed to enable the inference of large-scale correlation networks (Butte et al., 2000) and of high-dimensional partial correlation graphs (Dobra et al., 2004; Schäfer and Strimmer, 2005a; Meinshausen and Bühlmann, 2006), for learning vector-autoregressive (Opgen-Rhein and Strimmer, 2007a) and state space models (Rangel et al., 2004; Lähdesmäki and Shmulevich, 2008), and to reconstru...

Efficient Markov network structure discovery using independence tests

by Facundo Bromberg, Dimitris Margaritis, Vasant Honavar - In Proc SIAM Data Mining, 2006
Cited by 32 (5 self)
We present two algorithms for learning the structure of a Markov network from discrete data: GSMN and GSIMN. Both algorithms use statistical conditional independence tests on data to infer the structure by successively constraining the set of structures consistent with the results of these tests. GSMN is a natural adaptation of the Grow-Shrink algorithm of Margaritis and Thrun for learning the structure of Bayesian networks. GSIMN extends GSMN by additionally exploiting Pearl’s well-known properties of conditional independence relations to infer novel independencies from known independencies, thus avoiding the need to perform these tests. Experiments on artificial and real data sets show GSIMN can yield savings of up to 70% with respect to GSMN, while generating a Markov network with comparable or in several cases considerably improved quality. In addition

Multiple testing and error control in Gaussian graphical model selection

by Mathias Drton, Michael D. Perlman - Statistical Science
Cited by 28 (4 self)
Graphical models provide a framework for exploration of multivariate dependence patterns. The connection between graph and statistical model is made by identifying the vertices of the graph with the observed variables and translating the pattern of edges in the graph into a pattern of conditional independences that is imposed on the variables’ joint distribution. Focusing on Gaussian models, we review classical graphical models. For these models the defining conditional independences are equivalent to vanishing of certain (partial) correlation coefficients associated with individual edges that are absent from the graph. Hence, Gaussian graphical model selection can be performed by multiple testing of hypotheses about vanishing (partial) correlation coefficients. We show and exemplify how this approach allows one to perform model selection while controlling error rates for incorrect edge inclusion. Key words and phrases: Acyclic directed graph, Bayesian network, bidirected graph, chain graph, concentration graph, covariance graph, DAG, graphical model, multiple testing, undirected graph.
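One way to picture edge selection by multiple testing is the sketch below. It assumes Fisher's z-transform with the usual approximate variance 1 / (n - q - 3) for a partial correlation given q variables, and a Bonferroni correction; the abstract does not say which test or correction the authors use, so both choices and all names here are illustrative.

```python
import math

def select_edges(partial_corrs, n, q, alpha=0.05):
    """Keep edges whose partial correlation is significantly non-zero.

    partial_corrs : dict mapping (node_a, node_b) to a sample partial correlation
    n             : sample size
    q             : number of conditioning variables behind each coefficient
    alpha         : family-wise error level, split by Bonferroni (assumption)
    """
    if n - q - 3 <= 0:
        raise ValueError("need n > q + 3 for the Fisher z approximation")
    m = len(partial_corrs)
    selected = []
    for edge, r in partial_corrs.items():
        z = math.sqrt(n - q - 3) * math.atanh(r)      # approx. N(0, 1) under H0
        p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
        if p_value <= alpha / m:
            selected.append(edge)
    return selected
```

Because each retained edge has passed a test at level `alpha / m`, the probability of including any false edge is bounded by `alpha` under the stated approximation.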

Reverse engineering molecular regulatory networks from microarray data with qp-graphs

by Robert Castelo, Alberto Roverato - J Comput Biol, 2009
Cited by 25 (5 self)
Reverse engineering bioinformatic procedures applied to high-throughput experimental data have become instrumental to generate new hypotheses about molecular regulatory mechanisms. This has been particularly the case for gene expression microarray data where a large number of statistical and computational methodologies have been developed in order to assist in building network models of transcriptional regulation. A major challenge faced by every different procedure is that the number of available samples n for estimating the network model is much smaller than the number of genes p forming the system under study. This compromises many of the assumptions on which the statistics of the methods rely, often leading to unstable performance figures. In this work we apply a recently developed novel methodology based on the so-called q-order limited partial correlation graphs, qp-graphs, which is specifically tailored towards molecular network discovery from microarray expression data with p ≫ n. Using experimental and functional annotation data from Escherichia coli, here we show how qp-graphs yield more stable performance figures than other state-of-the-art methods

Inferring gene dependency networks from genomic longitudinal data: a functional data approach

by Rainer Opgen-Rhein, Korbinian Strimmer, 2006
Cited by 24 (2 self)
A key aim of systems biology is to unravel the regulatory interactions among genes and gene products in a cell. Here we investigate a graphical model that treats the observed gene expression over time as realizations of random curves. This approach is centered around an estimator of dynamical pairwise correlation that takes account of the functional nature of the observed data. This makes it possible to extend the graphical Gaussian modeling framework from i.i.d. data to the analysis of longitudinal genomic data. The new method is illustrated by analyzing highly replicated data from a genome experiment concerning the expression response of human T-cells to PMA and ionomicin treatment.

Gaussian process regression bootstrapping: exploring the effects of uncertainty in time course data

by Paul D W Kirk, Michael P H Stumpf (Associate Editor: Dr. Limsoon Wong) - Bioinformatics
Cited by 23 (6 self)
Motivation: Although widely accepted that high throughput biological data are typically highly noisy, the effects that this uncertainty has upon the conclusions we draw from these data are often overlooked. However, in order to assign any degree of confidence to our conclusions, we must quantify these effects. Bootstrap resampling is one method by which this may be achieved. Here we present a parametric bootstrapping approach for time course data, in which Gaussian process regression is used to fit a probabilistic model from which replicates may then be drawn. This approach implicitly allows the time-dependence of the data to be taken into account, and is applicable to a wide range of problems. Results: We apply Gaussian process regression bootstrapping to two data sets from the literature. In the first example, we show how the approach may be used to investigate the effects of data uncertainty upon the estimation of parameters in an ODE model of a cell signalling pathway. Although we find that the parameter estimates inferred from the original data set are relatively robust to data uncertainty, we also identify a distinct second set of estimates. In the second example, we use our method to show that the topology of networks constructed from time-course gene expression data appears to be sensitive to data uncertainty, although there may be individual edges in the network that are robust in light of present data.

Citation Context

...nfer GGMs from time-course gene expression data (according to Opgen-Rhein and Strimmer, 2007), and make use of the package’s capability to calculate an empirical posterior probability, pe(g1,g2) (see Schäfer and Strimmer, 2005), for the existence of the edge between g1 and g2. If pe(g1,g2) > τ, where τ is some pre-specified threshold (cutoff) value for the probability, then an edge is drawn between g1 and g2. 3.2.3 Bootstr...
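The thresholding rule quoted in this context, drawing an edge whenever pe(g1,g2) > τ, can be sketched in a few lines. The dictionary of edge probabilities and the function name are hypothetical inputs for illustration; in practice the probabilities would come from the package the citing paper refers to.

```python
def threshold_edges(edge_probs, tau):
    """Return the sorted list of gene pairs whose probability strictly exceeds tau.

    edge_probs : dict mapping (g1, g2) to an empirical posterior edge probability
    tau        : pre-specified cutoff in [0, 1]
    """
    return sorted(pair for pair, prob in edge_probs.items() if prob > tau)
```

For example, with probabilities 0.95, 0.40 and 0.80 and τ = 0.8, only the first pair is kept, since the comparison is strict.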

Inferring dynamic genetic networks with low order independencies

by Sophie Lèbre, 2008
Cited by 22 (0 self)
Abstract not found

Citation Context

... gene expression data, we have to cope with the opposite situation (n << p). Thus, the growing interest for ‘small n, large p’ furthered the development of numerous alternatives (Schäfer and Strimmer [28] [29], Waddell and Kishino [40] [39], Toh and Horimoto [37] [38], Wu et al. [46], Wang et al. [41]). Even though concentration graphs allow to point out some dependence relationships between genes, t...

High throughput screening of co-expressed gene pairs with controlled False Discovery Rate (FDR) and Minimum Acceptable Strength (MAS)

by Dongxiao Zhu, Alfred O Hero, Zhaohui S Qin, Anand Swaroop - J Comput Biol, 2005
Cited by 22 (9 self)
Many exploratory microarray data analysis tools such as gene clustering and relevance networks rely on detecting pairwise gene co-expression. Traditional screening of pairwise co-expression either controls biological significance or statistical significance, but not both. The former approach does not provide stochastic error control, and the latter approach screens many co-expressions with excessively low correlation. We have designed and implemented a statistically sound two-stage co-expression detection algorithm that controls both statistical significance (false discovery rate, FDR) and biological significance (minimum acceptable strength, MAS) of the discovered co-expressions. Based on estimation of pairwise gene correlation, the algorithm provides an initial co-expression discovery that controls only FDR, which is then followed by a second stage co-expression discovery which controls both FDR and MAS. It also computes and thresholds the set of FDR p-values for each correlation that satisfied the MAS criterion. Using simulated data, we validated asymptotic null distributions of the Pearson and Kendall correlation coefficients and the two-stage error-control procedure; we also compared our two-stage test procedure with another two-stage test procedure using the receiver operating characteristic (ROC) curve. We then used yeast galactose metabolism data to illustrate the advantage of our method for clustering genes and constructing a relevance network. The method has been implemented in an R package “GeneNT” that is freely available from the Comprehensive R Archive Network (CRAN): www.cran.r-project.org/. Key words: false discovery rate confidence interval, relevance network, error control, gene pathway.
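The idea of screening on both statistical and biological significance can be illustrated with a deliberately simplified sketch: a Benjamini-Hochberg step on the p-values, followed by a plain filter on the absolute correlation. This is a stand-in for the general idea, not the two-stage procedure of the paper or of the GeneNT package, and all names and the input layout are assumptions.

```python
def benjamini_hochberg(pvalues, q):
    """Return a list of booleans marking hypotheses rejected at FDR level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * q:
            cutoff_rank = rank            # largest rank passing the BH bound
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected

def screen_pairs(pairs, q=0.05, mas=0.5):
    """Keep gene pairs that pass the FDR step and have |r| >= mas.

    pairs : list of (pair_name, correlation, p_value) tuples (hypothetical layout)
    """
    flags = benjamini_hochberg([p for _, _, p in pairs], q)
    return [name for (name, r, _), keep in zip(pairs, flags)
            if keep and abs(r) >= mas]
```

A pair with a small p-value but a weak correlation (say r = 0.3) survives the first step and is dropped by the second, which is the behaviour the abstract motivates.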

Citation Context

...tial correlation coefficients are used as the dependency measures (Whittaker, 1990). Different groups have developed approaches to infer GGM from small sample size microarray data (Wang et al., 2003; Schafer and Strimmer, 2004; Dobra et al., 2004). Schafer and Strimmer recently presented a procedure that is based on the bootstrap estimator of the partial correlation coefficient (Schafer and Strimmer, 2004). Most of the pai...
