Results 1 - 10 of 18
2012a: Causal discovery for climate research using graphical models
J. Climate
Cited by 5 (1 self)
ABSTRACT Causal discovery seeks to recover cause-effect relationships from statistical data using graphical models. One goal of this paper is to provide an accessible introduction to causal discovery methods for climate scientists, with a focus on constraint-based structure learning. Second, in a detailed case study constraint-based structure learning is applied to derive hypotheses of causal relationships between four prominent modes of atmospheric low-frequency variability in boreal winter including the Western Pacific Oscillation (WPO), Eastern Pacific Oscillation (EPO), Pacific-North America (PNA) pattern, and North Atlantic Oscillation (NAO). The results are shown in the form of static and temporal independence graphs also known as Bayesian Networks. It is found that WPO and EPO are nearly indistinguishable from the cause-effect perspective as strong simultaneous coupling is identified between the two. In addition, changes in the state of EPO (NAO) may cause changes in the state of NAO (PNA) approximately 18 (36) days later. These results are not only consistent with previous findings on dynamical processes connecting different low-frequency modes (e.g., interaction between synoptic and low-frequency eddies) but also provide the basis for formulating new hypotheses regarding the time scale and temporal sequencing of dynamical processes responsible for these connections. Last, the authors propose to use structure learning for climate networks, which are currently based primarily on correlation analysis. While correlation-based climate networks focus on similarity between nodes, independence graphs would provide an alternative viewpoint by focusing on information flow in the network.
Reasoning about Independence in Probabilistic Models of Relational Data
, 2013
Cited by 5 (3 self)
The rules of d-separation provide a theoretical and algorithmic framework for deriving conditional independence facts from model structure. However, this theory only applies to Bayesian networks. Many real-world systems are characterized by interacting heterogeneous entities and probabilistic dependencies that cross the boundaries of entities. Consequently, researchers have developed extensions to Bayesian networks that can represent these relational dependencies. We show that the theory of d-separation inaccurately infers conditional independence when applied directly to the structure of probabilistic models of relational data. We introduce relational d-separation, a theory for deriving conditional independence facts from relational models, and we provide a new representation, the abstract ground graph, that enables a sound, complete, and computationally efficient method for answering d-separation queries about relational models.
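For orientation, the classical d-separation criterion that this abstract takes as its starting point can be sketched for an ordinary Bayesian network as follows. This is a minimal illustration only: it is not the relational d-separation or abstract ground graph construction the paper introduces, and the function name `d_separated` and the parent-map representation are assumptions made for the example. It uses the standard route of restricting to the ancestral set, moralizing, and checking undirected reachability.

```python
from collections import deque


def d_separated(parents, xs, ys, zs):
    """Return True if xs and ys are d-separated given zs in a DAG.

    `parents` maps each node to the set of its parents (hypothetical
    representation chosen for this sketch); root nodes may be omitted.
    """
    xs, ys, zs = set(xs), set(ys), set(zs)

    # Step 1: keep only the queried nodes and all of their ancestors.
    keep, stack = set(), list(xs | ys | zs)
    while stack:
        node = stack.pop()
        if node not in keep:
            keep.add(node)
            stack.extend(parents.get(node, ()))

    # Step 2: moralize. Connect each child to its parents and
    # connect every pair of parents that share a child.
    adj = {node: set() for node in keep}
    for child in keep:
        ps = list(parents.get(child, ()))
        for p in ps:
            adj[child].add(p)
            adj[p].add(child)
        for i, p in enumerate(ps):
            for q in ps[i + 1:]:
                adj[p].add(q)
                adj[q].add(p)

    # Step 3: block the conditioning set and search for any
    # remaining undirected path from xs to ys.
    seen = set(xs - zs)
    queue = deque(seen)
    while queue:
        node = queue.popleft()
        if node in ys:
            return False
        for nb in adj[node]:
            if nb not in zs and nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return True


# Collider A -> C <- B with descendant C -> D.
collider = {"C": {"A", "B"}, "D": {"C"}}
print(d_separated(collider, {"A"}, {"B"}, set()))   # marginally independent
print(d_separated(collider, {"A"}, {"B"}, {"D"}))   # conditioning on a descendant of the collider
```

In the collider example, A and B are d-separated marginally but become connected once the collider or one of its descendants is conditioned on, which is the behaviour the moralization step captures.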
Constraint-based causal discovery from multiple interventions over overlapping variable sets. arXiv:1403.2150
, 2014
Cited by 2 (0 self)
Scientific practice typically involves repeatedly studying a system, each time trying to unravel a different perspective. In each study, the scientist may take measurements under different experimental conditions (interventions, manipulations, perturbations) and measure different sets of quantities (variables). The result is a collection of heterogeneous data sets coming from different data distributions. In this work, we present algorithm COmbINE, which accepts a collection of data sets over overlapping variable sets under different experimental conditions; COmbINE then outputs a summary of all causal models indicating the invariant and variant structural characteristics of all models that simultaneously fit all of the input data sets. COmbINE converts estimated dependencies and independencies in the data into path constraints on the data-generating causal model and encodes them as a SAT instance. The algorithm is sound and complete in the sample limit. To account for conflicting constraints arising from statistical errors, we introduce a general method for sorting constraints in order of confidence, computed as a function of their corresponding p-values. In our empirical evaluation, COmbINE outperforms in terms of efficiency the only pre-existing similar algorithm; the latter additionally admits feedback cycles, but does not admit conflicting constraints which hinders the applicability on real data. As a proof-of-concept, COmbINE is employed to co-analyze 4 real, mass-cytometry data sets measuring phosphorylated protein concentrations of overlapping protein sets under 3 different interventions.
Comparison of Statistical Methods for Finding Network Motifs
Cited by 1 (0 self)
There has been much recent interest in systems biology for investigating the structure of gene regulatory systems. One popular approach is by network analysis with Gaussian graphical models (GGMs), which are statistical models associated with undirected graphs, where vertices of the graph represent genes and edges indicate regulatory interactions. Gene expression microarray data allow us to observe the amount of mRNA simultaneously for a large number of genes p under different experimental conditions n, where p is usually much larger than n prohibiting the use of standard methods. In this paper we assess and compare the performance of a number of procedures that have been specifically designed to address this large p – small n issue: G–Lasso estimation (Friedman et al., 2008), Neighbourhood selection (Meinshausen and Bühlmann, 2006), shrinkage estimation using empirical Bayes for model selection (Schäfer and Strimmer, 2005), and PC-algorithm (Kalisch and Bühlmann, 2007). We found that all approaches performed poorly on the benchmark E. coli network. Hence we systematically studied their ability to detect specific recurring regulatory patterns, called network motifs, that are interesting from a biological point of view. We conclude that all methods have difficulty detecting hubs, but the PC-algorithm is most promising.
Dichotomization invariant log-mean linear parameterization for discrete graphical models
, 2013
"... of marginal independence ..."
PC ALGORITHM FOR GAUSSIAN COPULA GRAPHICAL MODELS
Cited by 1 (0 self)
Abstract. The PC algorithm uses conditional independence tests for model selection in graphical modeling with acyclic directed graphs. In Gaussian models, tests of conditional independence are typically based on Pearson correlations, and high-dimensional consistency results have been obtained for the PC algorithm in this setting. We prove that high-dimensional consistency carries over to the broader class of Gaussian copula or nonparanormal models when using rank-based measures of correlation. For graphs with bounded degree, our result is as strong as prior Gaussian results. In simulations, the ‘Rank PC’ algorithm works as well as the ‘Pearson PC’ algorithm for normal data and considerably better for non-normal Gaussian copula data, all the while incurring a negligible increase of computation time. Simulations with contaminated data show that rank correlations can also perform better than other robust estimates considered in previous work when the underlying distribution does not belong to the nonparanormal family.
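As an illustration of what a rank-based measure of correlation can look like in this setting, the sketch below computes Spearman's rho and maps it to a correlation on the latent Gaussian scale. The abstract does not specify the estimator, so the choice of Spearman's rho and the mapping 2 sin(pi/6 * rho), which is the known relation between Spearman's rho and the Pearson correlation for bivariate normal data, are assumptions made here; the function names are hypothetical.

```python
import math


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def latent_correlation(x, y):
    """Map Spearman's rho to the latent Gaussian scale (assumed estimator)."""
    return 2 * math.sin(math.pi / 6 * spearman(x, y))


# A monotone but non-linear relationship: ranks agree perfectly,
# while the raw Pearson correlation is pulled below 1.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [math.exp(v) for v in x]
print(pearson(x, y), latent_correlation(x, y))
```

Because ranks are unchanged by monotone transformations of each variable, the rank-based estimate is unaffected by the exponential distortion in the example, which is the property that makes such estimates suitable for Gaussian copula data.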