Single-trial analysis and classification of ERP components: a tutorial
, 2010
Cited by 74 (13 self)
Analyzing brain states that correspond to event-related potentials (ERPs) on a single-trial basis is a hard problem due to the high trial-to-trial variability and the unfavorable ratio between signal (ERP) and noise (artifacts and neural background activity). In this tutorial, we provide a comprehensive framework for decoding ERPs, elaborating on linear concepts, namely spatio-temporal patterns and filters as well as linear ERP classification. However, the bottleneck of these techniques is that they require an accurate covariance matrix estimation in high-dimensional sensor spaces, which is a highly intricate problem. As a remedy, we propose to use shrinkage estimators and show that appropriate regularization of linear discriminant analysis (LDA) by shrinkage yields excellent results for single-trial ERP classification that are far superior to classical LDA classification. Furthermore, we give practical hints on the interpretation of what classifiers learned from the data and demonstrate in particular that the trade-off between goodness-of-fit and model complexity in regularized LDA relates to a morphing between a difference pattern of ERPs and a spatial filter which cancels non-task-related brain activity.
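A minimal sketch of LDA with a shrunk covariance matrix, in the spirit of the abstract above. It assumes a scaled-identity shrinkage target and takes the shrinkage intensity `gamma` as a user-supplied value; the abstract does not say how the intensity is chosen, and the function names are illustrative only.

```python
import numpy as np


def shrunk_covariance(X, gamma):
    """Convex mix of the empirical covariance and a scaled identity target."""
    X = np.asarray(X, dtype=float)
    emp = np.cov(X, rowvar=False)          # empirical covariance, shape (d, d)
    d = emp.shape[0]
    nu = np.trace(emp) / d                 # average eigenvalue of emp
    return (1.0 - gamma) * emp + gamma * nu * np.eye(d)


def fit_shrinkage_lda(X_a, X_b, gamma):
    """Return (w, b); x @ w + b > 0 votes for class a, otherwise class b."""
    X_a = np.asarray(X_a, dtype=float)
    X_b = np.asarray(X_b, dtype=float)
    mu_a, mu_b = X_a.mean(axis=0), X_b.mean(axis=0)
    # Pool both classes after removing each class mean (assumed common covariance).
    pooled = np.vstack([X_a - mu_a, X_b - mu_b])
    sigma = shrunk_covariance(pooled, gamma)
    w = np.linalg.solve(sigma, mu_a - mu_b)
    b = -0.5 * w @ (mu_a + mu_b)
    return w, b


def predict(X, w, b):
    """Label 0 for class a and 1 for class b."""
    return np.where(np.asarray(X, dtype=float) @ w + b > 0, 0, 1)
```

With `gamma = 0` this reduces to classical LDA, whose covariance estimate is singular when there are fewer trials than features; any `gamma > 0` makes the estimate invertible.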
A robust procedure for Gaussian graphical model search from microarray data with p larger than n
 Journal of Machine Learning Research
, 2006
Cited by 45 (6 self)
Learning of large-scale networks of interactions from microarray data is an important and challenging problem in bioinformatics. A widely used approach is to assume that the available data constitute a random sample from a multivariate distribution belonging to a Gaussian graphical model. As a consequence, the prime objects of inference are full-order partial correlations, which are partial correlations between two variables given the remaining ones. In the context of microarray data the number of variables exceeds the sample size, and this precludes the application of traditional structure learning procedures because a sampling version of full-order partial correlations does not exist. In this paper we consider limited-order partial correlations, which are partial correlations computed on marginal distributions of manageable size, and provide a set of rules that allow one to assess the usefulness of these quantities to derive the independence structure of the underlying Gaussian graphical model. Furthermore, we introduce a novel structure learning procedure based on a quantity, obtained from limited-order partial correlations, that we call the non-rejection rate. The applicability and usefulness of the procedure are demonstrated by both simulated and real data.
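One way to picture the non-rejection rate is the Monte Carlo sketch below. It assumes that the rate for a pair (i, j) is the fraction of random conditioning subsets of size q for which a test of zero partial correlation is not rejected, and it substitutes a Fisher z-transform test for whatever test the authors use. All names and defaults are hypothetical.

```python
import math
import random

import numpy as np


def partial_correlation(S, i, j, cond):
    """Partial correlation of i and j given cond, from the covariance matrix S."""
    idx = [i, j] + list(cond)
    precision = np.linalg.inv(S[np.ix_(idx, idx)])   # small (q + 2) x (q + 2) block
    return -precision[0, 1] / math.sqrt(precision[0, 0] * precision[1, 1])


def non_rejection_rate(X, i, j, q, n_subsets=100, alpha=0.05, seed=0):
    """Fraction of random size-q conditioning sets that fail to reject rho = 0."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    if q > p - 2 or n <= q + 3:
        raise ValueError("need q <= p - 2 and n > q + 3")
    S = np.cov(X, rowvar=False)
    others = [k for k in range(p) if k not in (i, j)]
    rng = random.Random(seed)
    scale = math.sqrt(n - q - 3)                     # Fisher z standardisation
    not_rejected = 0
    for _ in range(n_subsets):
        cond = rng.sample(others, q)
        r = partial_correlation(S, i, j, cond)
        r = max(-0.999999, min(0.999999, r))         # keep atanh finite
        z = math.atanh(r) * scale
        p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
        if p_value >= alpha:
            not_rejected += 1
    return not_rejected / n_subsets
```

Only (q + 2)-dimensional marginals are inverted, so the computation stays feasible when p exceeds n, as long as n > q + 3. A rate near 0 suggests an edge; a rate near 1 suggests its absence.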
Entropy Inference and the James-Stein Estimator, with Application to Nonlinear Gene Association Networks
Cited by 40 (1 self)
We present a procedure for effective estimation of entropy and mutual information from small-sample data, and apply it to the problem of inferring high-dimensional gene association networks. Specifically, we develop a James-Stein-type shrinkage estimator, resulting in a procedure that is highly efficient statistically as well as computationally. Despite its simplicity, we show that it outperforms eight other entropy estimation procedures across a diverse range of sampling scenarios and data-generating models, even in cases of severe undersampling. We illustrate the approach by analyzing E. coli gene expression data and computing an entropy-based gene-association network from gene expression data. A computer program is available that implements the proposed shrinkage estimator. Keywords: entropy, shrinkage estimation, James-Stein estimator, “small n, large p” setting, mutual information, gene association network
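The sketch below illustrates a James-Stein-type shrinkage estimate of entropy from cell counts. It assumes a uniform shrinkage target and a plug-in shrinkage intensity; neither detail is given in the abstract, so both are assumptions here, and this is not the authors' program.

```python
import math


def shrinkage_frequencies(counts):
    """Shrink maximum-likelihood cell frequencies toward the uniform distribution."""
    n = sum(counts)
    p = len(counts)
    if n <= 0 or p == 0:
        raise ValueError("need at least one observation and one cell")
    ml = [c / n for c in counts]                     # maximum-likelihood frequencies
    target = 1.0 / p                                 # assumed target: uniform
    denom = (n - 1) * sum((target - f) ** 2 for f in ml)
    if denom == 0:
        lam = 1.0                                    # data already uniform, or n == 1
    else:
        lam = (1.0 - sum(f * f for f in ml)) / denom  # assumed plug-in intensity
        lam = min(1.0, max(0.0, lam))
    return [lam * target + (1.0 - lam) * f for f in ml], lam


def entropy(freqs):
    """Plug-in Shannon entropy in nats; empty cells contribute zero."""
    return -sum(f * math.log(f) for f in freqs if f > 0)


def shrinkage_entropy(counts):
    freqs, _ = shrinkage_frequencies(counts)
    return entropy(freqs)
```

Because entropy is concave and the uniform distribution maximizes it, the shrunk estimate is never below the plug-in maximum-likelihood estimate. That is the direction needed to offset the downward bias of the latter in undersampled data.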
Efficient Markov network structure discovery using independence tests
 In Proc SIAM Data Mining
, 2006
Cited by 31 (5 self)
We present two algorithms for learning the structure of a Markov network from discrete data: GSMN and GSIMN. Both algorithms use statistical conditional independence tests on data to infer the structure by successively constraining the set of structures consistent with the results of these tests. GSMN is a natural adaptation of the Grow-Shrink algorithm of Margaritis and Thrun for learning the structure of Bayesian networks. GSIMN extends GSMN by additionally exploiting Pearl’s well-known properties of conditional independence relations to infer novel independencies from known independencies, thus avoiding the need to perform these tests. Experiments on artificial and real data sets show GSIMN can yield savings of up to 70% with respect to GSMN, while generating a Markov network with comparable or in several cases considerably improved quality. In addition
Multiple testing and error control in Gaussian graphical model selection
 Statistical Science
Cited by 30 (4 self)
Graphical models provide a framework for exploration of multivariate dependence patterns. The connection between graph and statistical model is made by identifying the vertices of the graph with the observed variables and translating the pattern of edges in the graph into a pattern of conditional independences that is imposed on the variables’ joint distribution. Focusing on Gaussian models, we review classical graphical models. For these models the defining conditional independences are equivalent to vanishing of certain (partial) correlation coefficients associated with individual edges that are absent from the graph. Hence, Gaussian graphical model selection can be performed by multiple testing of hypotheses about vanishing (partial) correlation coefficients. We show and exemplify how this approach allows one to perform model selection while controlling error rates for incorrect edge inclusion. Key words and phrases: Acyclic directed graph, Bayesian network, bidirected graph, chain graph, concentration graph, covariance graph, DAG, graphical model, multiple testing, undirected graph.
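A minimal sketch of concentration-graph selection by multiple testing, under stated assumptions: n > p + 1, a Fisher z-transform test for each full-order partial correlation, and a Bonferroni correction to bound the chance of any incorrect edge inclusion. The abstract does not name the adjustment the authors use, so Bonferroni and the function names are illustrative choices.

```python
import math

import numpy as np


def partial_correlations(X):
    """Full-order partial correlations from the inverse sample covariance."""
    precision = np.linalg.inv(np.cov(np.asarray(X, dtype=float), rowvar=False))
    d = np.sqrt(np.diag(precision))
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor


def select_edges(X, alpha=0.05):
    """Edges (i, j), i < j, whose partial correlation is significantly nonzero."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    if n <= p + 1:
        raise ValueError("this sketch needs n > p + 1")
    pcor = partial_correlations(X)
    scale = math.sqrt(n - (p - 2) - 3)       # p - 2 variables are conditioned on
    n_tests = p * (p - 1) // 2
    edges = []
    for i in range(p):
        for j in range(i + 1, p):
            r = max(-0.999999, min(0.999999, pcor[i, j]))
            z = math.atanh(r) * scale
            p_value = math.erfc(abs(z) / math.sqrt(2.0))
            if p_value < alpha / n_tests:    # Bonferroni threshold
                edges.append((i, j))
    return edges
```

For data generated along a chain X0 → X1 → X2, the partial correlation of X0 and X2 given X1 vanishes, so with enough samples the sketch should return only the two chain edges.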
Reverse engineering molecular regulatory networks from microarray data with qp-graphs
 J Comput Biol
, 2009
Cited by 24 (5 self)
Reverse engineering bioinformatic procedures applied to high-throughput experimental data have become instrumental to generate new hypotheses about molecular regulatory mechanisms. This has been particularly the case for gene expression microarray data where a large number of statistical and computational methodologies have been developed in order to assist in building network models of transcriptional regulation. A major challenge faced by every different procedure is that the number of available samples n for estimating the network model is much smaller than the number of genes p forming the system under study. This compromises many of the assumptions on which the statistics of the methods rely, often leading to unstable performance figures. In this work we apply a recently developed novel methodology based on the so-called q-order limited partial correlation graphs, qp-graphs, which is specifically tailored towards molecular network discovery from microarray expression data with p ≫ n. Using experimental and functional annotation data from Escherichia coli here we show how qp-graphs yield more stable performance figures than other state-of-the-art methods
Inferring gene dependency networks from genomic longitudinal data: a functional data approach
, 2006
Cited by 24 (2 self)
A key aim of systems biology is to unravel the regulatory interactions among genes and gene products in a cell. Here we investigate a graphical model that treats the observed gene expression over time as realizations of random curves. This approach is centered around an estimator of dynamical pairwise correlation that takes account of the functional nature of the observed data. This allows us to extend the graphical Gaussian modeling framework from i.i.d. data to the analysis of longitudinal genomic data. The new method is illustrated by analyzing highly replicated data from a genome experiment concerning the expression response of human T-cells to PMA and ionomicin treatment.
High throughput screening of coexpressed gene pairs with controlled False Discovery Rate (FDR) and Minimum Acceptable Strength (MAS)
 J Comput Biol
, 2005
Cited by 21 (9 self)
Many exploratory microarray data analysis tools such as gene clustering and relevance networks rely on detecting pairwise gene coexpression. Traditional screening of pairwise coexpression either controls biological significance or statistical significance, but not both. The former approach does not provide stochastic error control, and the latter approach screens many coexpressions with excessively low correlation. We have designed and implemented a statistically sound two-stage coexpression detection algorithm that controls both statistical significance (false discovery rate, FDR) and biological significance (minimum acceptable strength, MAS) of the discovered coexpressions. Based on estimation of pairwise gene correlation, the algorithm provides an initial coexpression discovery that controls only FDR, which is then followed by a second-stage coexpression discovery which controls both FDR and MAS. It also computes and thresholds the set of FDR p-values for each correlation that satisfied the MAS criterion. Using simulated data, we validated asymptotic null distributions of the Pearson and Kendall correlation coefficients and the two-stage error-control procedure; we also compared our two-stage test procedure with another two-stage test procedure using the receiver operating characteristic (ROC) curve. We then used yeast galactose metabolism data to illustrate the advantage of our method for clustering genes and constructing a relevance network. The method has been implemented in an R package “GeneNT” that is freely available from the Comprehensive R Archive Network (CRAN): www.cran.r-project.org/. Key words: false discovery rate confidence interval, relevance network, error control, gene pathway.
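The two-stage idea can be sketched as follows for Pearson correlations. This is an illustration under assumptions, not the GeneNT implementation: stage one applies the Benjamini-Hochberg step-up rule to Fisher z p-values, and stage two keeps a pair only if a confidence interval, widened according to the number of stage-one discoveries, lies entirely outside [-MAS, MAS]. The interval construction and all names are assumptions.

```python
import math
from statistics import NormalDist


def two_stage_screen(correlations, n, fdr=0.05, mas=0.5):
    """correlations maps a pair (i, j) to its sample Pearson correlation.

    Returns (stage1, stage2): pairs passing FDR only, then FDR and MAS.
    """
    se = 1.0 / math.sqrt(n - 3)                      # std. error of Fisher z
    z = {pair: math.atanh(max(-0.999999, min(0.999999, r)))
         for pair, r in correlations.items()}
    p_values = {pair: math.erfc(abs(v) / se / math.sqrt(2.0))
                for pair, v in z.items()}

    # Stage 1: Benjamini-Hochberg step-up on the p-values for rho = 0.
    m = len(p_values)
    ranked = sorted(p_values, key=p_values.get)
    n_stage1 = 0
    for rank, pair in enumerate(ranked, start=1):
        if p_values[pair] <= fdr * rank / m:
            n_stage1 = rank
    stage1 = ranked[:n_stage1]
    if not stage1:
        return [], []

    # Stage 2: adjusted intervals at level 1 - n_stage1 * fdr / m (assumed form).
    adjusted_alpha = n_stage1 * fdr / m
    z_crit = NormalDist().inv_cdf(1.0 - adjusted_alpha / 2.0)
    stage2 = []
    for pair in stage1:
        lower = math.tanh(z[pair] - z_crit * se)
        upper = math.tanh(z[pair] + z_crit * se)
        if lower > mas or upper < -mas:              # interval clears the MAS band
            stage2.append(pair)
    return stage1, stage2
```

A pair with a modest but statistically significant correlation passes stage one and is dropped in stage two, which is the behavior the abstract describes for coexpressions with excessively low correlation.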
Gaussian process regression bootstrapping: exploring the effects of uncertainty in time course data
 Bioinformatics
Cited by 20 (4 self)
Motivation: Although it is widely accepted that high-throughput biological data are typically highly noisy, the effects that this uncertainty has upon the conclusions we draw from these data are often overlooked. However, in order to assign any degree of confidence to our conclusions, we must quantify these effects. Bootstrap resampling is one method by which this may be achieved. Here we present a parametric bootstrapping approach for time-course data, in which Gaussian process regression is used to fit a probabilistic model from which replicates may then be drawn. This approach implicitly allows the time-dependence of the data to be taken into account, and is applicable to a wide range of problems. Results: We apply Gaussian process regression bootstrapping to two data sets from the literature. In the first example, we show how the approach may be used to investigate the effects of data uncertainty upon the estimation of parameters in an ODE model of a cell signalling pathway. Although we find that the parameter estimates inferred from the original data set are relatively robust to data uncertainty, we also identify a distinct second set of estimates. In the second example, we use our method to show that the topology of networks constructed from time-course gene expression data appears to be sensitive to data uncertainty, although there may be individual edges in the network that are robust in light of present data.
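A rough sketch of a parametric bootstrap built on Gaussian process regression. It assumes a zero-mean GP with a squared-exponential kernel and fixed hyperparameters, and it draws replicates at the observed time points with observation noise included. The abstract specifies none of these choices, and a real analysis would presumably fit the hyperparameters to the data.

```python
import numpy as np


def rbf_kernel(t1, t2, length_scale, signal_var):
    """Squared-exponential covariance between two sets of time points."""
    diff = t1[:, None] - t2[None, :]
    return signal_var * np.exp(-0.5 * (diff / length_scale) ** 2)


def gp_bootstrap(t, y, n_replicates, length_scale=1.0, signal_var=1.0,
                 noise_var=0.1, seed=0):
    """Return (posterior_mean, replicates) at the observed time points."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    K = rbf_kernel(t, t, length_scale, signal_var)
    K_noisy = K + noise_var * np.eye(len(t))
    mean = K @ np.linalg.solve(K_noisy, y)           # posterior mean of the curve
    cov = K - K @ np.linalg.solve(K_noisy, K)        # posterior covariance
    # Predictive covariance of fresh noisy measurements; symmetrise for Cholesky.
    pred_cov = cov + noise_var * np.eye(len(t))
    pred_cov = 0.5 * (pred_cov + pred_cov.T)
    chol = np.linalg.cholesky(pred_cov)
    rng = np.random.default_rng(seed)
    draws = rng.standard_normal((n_replicates, len(t)))
    return mean, mean + draws @ chol.T
```

Each row of the returned array is one synthetic time course. Re-running a downstream analysis, such as parameter fitting or network construction, on every row gives a picture of how sensitive its conclusions are to measurement noise.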