## Merging Strategies for Sum-Product Networks: From Trees to Graphs

### Citations

915 | Probabilistic graphical models: principles and techniques
- Koller, Friedman
- 2009
Citation Context ...tion by developing post-processing approaches that induce graph SPNs from tree SPNs by merging similar sub-structures. The key benefits of graph SPNs over tree SPNs include lower computational complexity, which facilitates faster online inference, and better generalization accuracy because of reduced variance, at the cost of a slight increase in the learning time. We demonstrate experimentally that our merging techniques significantly improve the accuracy of tree SPNs, achieving state-of-the-art performance on several real-world benchmark datasets. 1 INTRODUCTION Probabilistic graphical models [8, 17] such as Bayesian and Markov networks are routinely used in a wide variety of application domains such as computer vision and natural language understanding for modeling and reasoning about uncertainty. However, exact inference in them – the task of answering queries given a model – is NP-hard in general and computationally intractable for most real-world models. As a result, approximate inference algorithms such as loopy belief propagation and Gibbs sampling are widely used in practice. However, they can often yield highly inaccurate and high-variance estimates, leading to poor predictive per... |

878 | Approximating discrete probability distributions with dependence trees
- Chow, Liu
- 1968
Citation Context ...ee Markov networks Independence tests Latent Variables Rooshenas and Lowd [26] Tractable Arithmetic Circuits Independence tests Latent variables Table 1: Examples of SPN structure learning approaches in the literature that follow the prescription given in Algorithm 1. The base case is the stopping criterion for the recursive algorithm. [11, 15] stop when only one variable remains and induce a univariate distribution; [25, 28, 24] stop when the entropy of the data is small or use a Bayesian criterion, and induce an SPN corresponding to a tree Markov network at the leaves using the Chow-Liu algorithm [5] (this algorithm runs in polynomial time and yields an optimal tree Markov network according to the maximum likelihood criterion). [26] learns an SPN over observed variables in the base case using the algorithm described in [19]. In the decomposition step, [11, 26, 28] use pairwise variable independence tests (e.g., the G-test) for inducing the product nodes; [15] uses no independence tests and instead assumes that each split decomposes the variables into multiple components; while [25, 24] ignore the decomposition step, inducing only sum nodes. [11, 26, 28] split only over latent variables, [15... |
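
The Chow-Liu step mentioned in this excerpt can be illustrated with a minimal sketch. It assumes complete discrete data given as a list of equal-length tuples, uses empirical (unsmoothed) mutual information, and builds a maximum-weight spanning tree with Prim's procedure. The function names `mutual_information` and `chow_liu_edges` are illustrative, not taken from the cited papers, and parameter estimation for the resulting tree is left out.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(data, i, j):
    """Empirical mutual information (in nats) between columns i and j."""
    n = len(data)
    count_i = Counter(row[i] for row in data)
    count_j = Counter(row[j] for row in data)
    count_ij = Counter((row[i], row[j]) for row in data)
    mi = 0.0
    for (a, b), c in count_ij.items():
        # p(a, b) * log( p(a, b) / (p(a) * p(b)) ), written with raw counts.
        mi += (c / n) * math.log(c * n / (count_i[a] * count_j[b]))
    return mi


def chow_liu_edges(data):
    """Edges of a maximum-weight spanning tree under pairwise mutual information."""
    n_vars = len(data[0])
    weights = {
        (i, j): mutual_information(data, i, j)
        for i, j in combinations(range(n_vars), 2)
    }
    in_tree = {0}  # assumption: grow the tree from variable 0 (Prim's procedure)
    edges = []
    while len(in_tree) < n_vars:
        # Pick the heaviest edge with exactly one endpoint already in the tree.
        best = max(
            (e for e in weights if (e[0] in in_tree) != (e[1] in in_tree)),
            key=weights.get,
        )
        edges.append(best)
        in_tree.update(best)
    return edges


# Toy data: columns 0 and 1 are perfectly correlated, column 2 is independent.
toy_data = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
tree_edges = chow_liu_edges(toy_data)
```

On the toy data, the edge between the two correlated columns is selected first. Computing all pairwise scores is quadratic in the number of variables, which is consistent with the polynomial running time stated in the excerpt.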

286 | On the hardness of approximate reasoning
- Roth
- 1996
Citation Context ...lude in section 5. 2 BACKGROUND Any (discrete) probability distribution over a set of variables V can be expressed using an arithmetic circuit (AC) [7] or a sum-product network (SPN) [23]. The key benefit of SPNs over conventional uncertainty representations such as Bayesian and Markov networks is that in SPNs, common probabilistic inference tasks such as maximum a posteriori (MAP) and posterior marginal (MAR) estimation can be solved in time and space that scale linearly with the size of the representation. In Bayesian and Markov networks, these tasks are known to be NP-hard in general (cf. [27]). The caveat is that SPNs can be exponentially larger than Bayesian and Markov networks; they are often compiled from the latter by running exact probabilistic inference techniques such as variable elimination [4] and AND/OR search [21], in order to facilitate faster online inference. Formally, Definition 1. An SPN [23] is recursively defined as follows: 1. A tractable univariate distribution is an SPN; 2. A product of SPNs defined over different variables is an SPN; and 3. A weighted sum of SPNs with the same scope variables is an SPN. An SPN can be expressed as a rooted directed acyclic gra... |
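
The recursive definition quoted in this excerpt can be made concrete with a small sketch. It assumes discrete variables and represents the three cases of Definition 1 as Python dataclasses; the class names `Leaf`, `Product`, `Sum` and the helpers `scope`, `is_valid`, `evaluate` are hypothetical names chosen for illustration. A single bottom-up pass answers a marginal query, with unobserved variables summed out at the leaves.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass
class Leaf:
    var: str
    probs: Dict[int, float]  # univariate distribution over the values of `var`


@dataclass
class Product:
    children: List["Node"]  # sub-SPNs over different (disjoint) variables


@dataclass
class Sum:
    children: List[Tuple[float, "Node"]]  # (weight, sub-SPN) pairs, same scope


Node = Union[Leaf, Product, Sum]


def scope(node):
    """Set of variables a node is defined over."""
    if isinstance(node, Leaf):
        return frozenset([node.var])
    if isinstance(node, Product):
        return frozenset().union(*(scope(c) for c in node.children))
    return scope(node.children[0][1])


def is_valid(node):
    """Check the scope conditions of the three cases in the definition."""
    if isinstance(node, Leaf):
        return True
    if isinstance(node, Product):
        scopes = [scope(c) for c in node.children]
        disjoint = sum(len(s) for s in scopes) == len(frozenset().union(*scopes))
        return disjoint and all(is_valid(c) for c in node.children)
    kids = [c for _, c in node.children]
    return all(scope(c) == scope(kids[0]) for c in kids) and all(map(is_valid, kids))


def evaluate(node, evidence):
    """Probability of `evidence`; variables missing from it are marginalized."""
    if isinstance(node, Leaf):
        if node.var in evidence:
            return node.probs[evidence[node.var]]
        return sum(node.probs.values())
    if isinstance(node, Product):
        return math.prod(evaluate(c, evidence) for c in node.children)
    return sum(w * evaluate(c, evidence) for w, c in node.children)


def bernoulli(var, p):
    return Leaf(var, {0: 1.0 - p, 1: p})


# A two-component mixture of fully factorized distributions over X and Y.
spn = Sum([
    (0.6, Product([bernoulli("X", 0.9), bernoulli("Y", 0.2)])),
    (0.4, Product([bernoulli("X", 0.1), bernoulli("Y", 0.7)])),
])
```

The recursion visits each node once in a tree SPN. For a graph SPN with shared children, one would presumably cache results per node so that the cost stays linear in the size of the representation.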

145 | Learning with mixtures of trees
- Meilă, Jordan
Citation Context ...ly describes how they differ. Although the structure learning problem is NP-hard in SPNs having only observed variables as well as in SPNs having both observed and latent variables, the parameter (weight) learning problem is easier in the former than the latter. In particular, parameter learning can be done in closed form when the SPN has only observed variables. On the other hand, the optimization problem is non-convex in the presence of latent variables and one has to use iterative algorithms having high computational complexity such as hard and soft EM to solve the non-convex problem (cf. [23, 22, 24]). Thus, although latent variables help yield a more powerful representation, they often significantly increase the learning time. 3 CONVERTING TREE SPNs TO GRAPH SPNs A key problem with existing methods for learning SPNs is that they induce tree models, except at the leaves. It is well known in the probabilistic inference literature [9, 7, 6] that tree SPNs can be exponentially larger than graph SPNs, which are obtained from the former by merging identical sub-SPNs (see Fig. 1(a) and (b)). Thus, converting tree SPNs to graph SPNs is a good idea because graph SPNs can significantly reduce the time r... |

141 | A differential approach to inference in Bayesian networks
- Darwiche
Citation Context ...nce in them – the task of answering queries given a model – is NP-hard in general and computationally intractable for most real-world models. As a result, approximate inference algorithms such as loopy belief propagation and Gibbs sampling are widely used in practice. However, they can often yield highly inaccurate and high variance estimates, leading to poor predictive performance. One approach to tackle the inaccuracy and unreliability of approximate inference is to learn so-called tractable models from data. Examples of such models include thin junction trees [2], arithmetic circuits (ACs) [7], cutset networks [25], probabilistic sentential decision diagrams [16], AND/OR decision diagrams [9, 21] and sum-product networks [23]. Inference in these models is polynomial (often linear) in the size of the model and therefore the complexity and accuracy of inference is no longer an issue. In other words, once an accurate model is learned from data, predictions are guaranteed to be accurate. In this paper, we focus on the NP-hard problem of learning both the structure and parameters of sum-product networks (SPNs) from data. At a high level, an SPN is a rooted directed acyclic graph that re... |

137 | Modeling and Reasoning with Bayesian Networks
- Darwiche
- 2009
Citation Context ...tion by developing post-processing approaches that induce graph SPNs from tree SPNs by merging similar sub-structures. The key benefits of graph SPNs over tree SPNs include lower computational complexity, which facilitates faster online inference, and better generalization accuracy because of reduced variance, at the cost of a slight increase in the learning time. We demonstrate experimentally that our merging techniques significantly improve the accuracy of tree SPNs, achieving state-of-the-art performance on several real-world benchmark datasets. 1 INTRODUCTION Probabilistic graphical models [8, 17] such as Bayesian and Markov networks are routinely used in a wide variety of application domains such as computer vision and natural language understanding for modeling and reasoning about uncertainty. However, exact inference in them – the task of answering queries given a model – is NP-hard in general and computationally intractable for most real-world models. As a result, approximate inference algorithms such as loopy belief propagation and Gibbs sampling are widely used in practice. However, they can often yield highly inaccurate and high-variance estimates, leading to poor predictive per... |

128 | Decomposable negation normal form.
- Darwiche
- 2001
Citation Context ...bserved variables. On the other hand, the optimization problem is non-convex in the presence of latent variables and one has to use iterative algorithms having high computational complexity such as hard and soft EM to solve the non-convex problem (cf. [23, 22, 24]). Thus, although latent variables help yield a more powerful representation, they often significantly increase the learning time. 3 CONVERTING TREE SPNs TO GRAPH SPNs A key problem with existing methods for learning SPNs is that they induce tree models, except at the leaves. It is well known in the probabilistic inference literature [9, 7, 6] that tree SPNs can be exponentially larger than graph SPNs, which are obtained from the former by merging identical sub-SPNs (see Fig. 1(a) and (b)). Thus, converting tree SPNs to graph SPNs is a good idea because graph SPNs can significantly reduce the time required to make predictions. From a learning point of view, graph SPNs can potentially improve the generalization performance by addressing the following issue associated with the LEARNSPN algorithm: as the depth of the node increases (the depth of a node equals the number of sum nodes from the root to the node), the number of training exampl... |
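
To illustrate how a tree SPN shrinks when identical sub-SPNs are merged, here is a minimal sketch. It covers only the exact-identity case mentioned in this excerpt, not the similarity-based merging the paper proposes. It assumes a hypothetical nested-tuple encoding, `("leaf", var, probs)`, `("prod", children)` and `("sum", ((weight, child), ...))`, and the function names are illustrative.

```python
def share_identical(node, cache):
    """Return an equal structure in which equal sub-SPNs are one shared object."""
    kind = node[0]
    if kind == "leaf":
        canonical = node
    elif kind == "prod":
        canonical = ("prod", tuple(share_identical(c, cache) for c in node[1]))
    elif kind == "sum":
        canonical = ("sum", tuple((w, share_identical(c, cache)) for w, c in node[1]))
    else:
        raise ValueError(f"unknown node kind: {kind!r}")
    # The first structure seen with this value becomes the shared representative.
    return cache.setdefault(canonical, canonical)


def count_nodes(node, seen=None):
    """Count distinct node objects, so a shared sub-SPN is counted once."""
    seen = set() if seen is None else seen
    if id(node) in seen:
        return 0
    seen.add(id(node))
    if node[0] == "leaf":
        return 1
    children = node[1] if node[0] == "prod" else [c for _, c in node[1]]
    return 1 + sum(count_nodes(c, seen) for c in children)


def xy_block():
    # The same sub-SPN over X and Y appears under both branches of the tree.
    return ("prod", (("leaf", "X", (0.3, 0.7)), ("leaf", "Y", (0.5, 0.5))))


tree = ("sum", (
    (0.4, ("prod", (("leaf", "Z", (0.9, 0.1)), xy_block()))),
    (0.6, ("prod", (("leaf", "Z", (0.2, 0.8)), xy_block()))),
))
graph = share_identical(tree, {})
```

Written out as a tree, the example has 11 nodes; after sharing, the graph has 8 distinct nodes because the block over X and Y is stored once. Exact matching on floating-point parameters is brittle, which is one reason a distance-based notion of similarity is a natural next step.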

125 | Random search for hyper-parameter optimization
- Bergstra, Bengio
- 2012
Citation Context ...ng bagged ensemble of graph SPNs. As strong baselines, we also compare with five other state-of-the-art tractable model learners: (1) learning sum-product networks with direct and indirect variable interactions (ID-SPN) [26], (2) learning Markov networks using arithmetic circuits (ACMN) [20], (3) learning mixtures of cutset networks (MCNet) [25], (4) learning sum-product networks via an SVD-based algorithm (SPN-SVD) [1] and (5) learning ensembles of cutset networks (ECNet) [24]. In our experiments, we fixed the number of bags to 40 following [24]. Instead of performing a grid search, we performed random search [3] to create a configuration for the models in the ensemble. Each component model was then weighted according to its likelihood on the training set. To get better accuracy, we treated the bagged ensemble of L-SPNs and O-SPNs as an SPN having one latent sum node as the root and each independent component (bag) as its child sub-SPN. The benefit of this approach is that instead of optimizing the local log-likelihood scores of individual SPNs while merging, we can directly optimize the global log-likelihood. Table 3 shows the bagged ensemble scores of L-SPNs and O-SPNs before and after merging as w... |

118 | AND/OR search spaces for graphical models
- Dechter, Mateescu
Citation Context ... intractable for most real-world models. As a result, approximate inference algorithms such as loopy belief propagation and Gibbs sampling are widely used in practice. However, they can often yield highly inaccurate and high variance estimates, leading to poor predictive performance. One approach to tackle the inaccuracy and unreliability of approximate inference is to learn so-called tractable models from data. Examples of such models include thin junction trees [2], arithmetic circuits (ACs) [7], cutset networks [25], probabilistic sentential decision diagrams [16], AND/OR decision diagrams [9, 21] and sum-product networks [23]. Inference in these models is polynomial (often linear) in the size of the model and therefore the complexity and accuracy of inference is no longer an issue. In other words, once an accurate model is learned from data, predictions are guaranteed to be accurate. In this paper, we focus on the NP-hard problem of learning both the structure and parameters of sum-product networks (SPNs) from data. At a high level, an SPN is a rooted directed acyclic graph that represents a joint probability distribution over a large number of random variables, both observed and late... |

73 | Sum-product networks: A new deep architecture.
- Poon, Domingos
- 2011
Citation Context ...models. As a result, approximate inference algorithms such as loopy belief propagation and Gibbs sampling are widely used in practice. However, they can often yield highly inaccurate and high variance estimates, leading to poor predictive performance. One approach to tackle the inaccuracy and unreliability of approximate inference is to learn so-called tractable models from data. Examples of such models include thin junction trees [2], arithmetic circuits (ACs) [7], cutset networks [25], probabilistic sentential decision diagrams [16], AND/OR decision diagrams [9, 21] and sum-product networks [23]. Inference in these models is polynomial (often linear) in the size of the model and therefore the complexity and accuracy of inference is no longer an issue. In other words, once an accurate model is learned from data, predictions are guaranteed to be accurate. In this paper, we focus on the NP-hard problem of learning both the structure and parameters of sum-product networks (SPNs) from data. At a high level, an SPN is a rooted directed acyclic graph that represents a joint probability distribution over a large number of random variables, both observed and latent. It has two types of intern... |

70 | Probabilistic theorem proving.
- Gogate, Domingos
- 2011
Citation Context ...4 -10.54 MSWeb 294 29441 32750 5000 -9.84 -9.78 -9.77 -9.76 -9.22 Book 500 8700 1159 1739 -36.49 -34.25 -36.35 -35.89 -30.18 EachMovie 500 4524 1002 591 -54.70 -50.72 -55.82 -53.07 -51.14 WebKB 839 2803 558 838 -170.27 -150.04 -166.65 -152.82 -150.10 Reuters-52 889 6532 1028 1540 -84.32 -80.66 -86.00 -82.66 -82.10 20NewsGrp. 910 11293 3764 3764 -151.48 -150.80 -158.40 -154.28 -151.47 BBC 1058 1670 225 330 -265.89 -233.26 -244.12 -238.61 -236.82 Ad 1556 2461 327 491 -16.33 -14.58 -15.69 -14.34 -14.36 proaches that search for similarities in sub-SPNs having different (even disjoint) scopes (cf. [12, 14]); analyzing contexts – assignment to variables on the path from the root – of merged sub-SPNs for finding symmetric contexts; directly inducing graph SPNs from data rather than using post-processing schemes; and extending the approach presented in the paper to hybrid domains having both discrete and continuous variables. Acknowledgements This research was funded in part by the DARPA Probabilistic Programming for Advanced Machine Learning Program under AFRL prime contract number FA8750-14-C-0005 and by the NSF award 1528037. The views and conclusions contained in this document are those of the... |

64 | Thin junction trees
- Bach, Jordan
Citation Context ...ertainty. However, exact inference in them – the task of answering queries given a model – is NP-hard in general and computationally intractable for most real-world models. As a result, approximate inference algorithms such as loopy belief propagation and Gibbs sampling are widely used in practice. However, they can often yield highly inaccurate and high variance estimates, leading to poor predictive performance. One approach to tackle the inaccuracy and unreliability of approximate inference is to learn so-called tractable models from data. Examples of such models include thin junction trees [2], arithmetic circuits (ACs) [7], cutset networks [25], probabilistic sentential decision diagrams [16], AND/OR decision diagrams [9, 21] and sum-product networks [23]. Inference in these models is polynomial (often linear) in the size of the model and therefore the complexity and accuracy of inference is no longer an issue. In other words, once an accurate model is learned from data, predictions are guaranteed to be accurate. In this paper, we focus on the NP-hard problem of learning both the structure and parameters of sum-product networks (SPNs) from data. At a high level, an SPN is a rooted... |

51 | Variational approximations between mean field theory and the junction tree algorithm.
- Wiegerinck
- 2000
Citation Context ... distance between two sub-SPNs can be quite hard. For instance, assume that the two sub-SPNs represent Markov networks (MNs) and the junction tree or AND/OR graph search algorithm [9] is used for computing the KL divergence between the probability distributions represented by the two MNs. In this case, the time and space complexity of computation is exponential in the treewidth of the graph obtained by taking a union of the edges of the two MNs. The treewidth of this graph can be quite large (see Fig. 2 for an example). Therefore, we propose to use the following mean-field style approximation [29] of the distance between the two distributions: $D(P \,\|\, Q) \approx \frac{1}{|V|} \sum_{V_i \in V} D\big(P(V_i) \,\|\, Q(V_i)\big)$, where P and Q are two distributions over V and D is a distance function (e.g., KL divergence, relative error, Hellinger distance, etc.). Since single-variable marginal distributions in each sub-SPN can be computed in time that is linear in the number of nodes of the sub-SPN (and in practice can be pre-computed), our proposed distance method is also linear time. Next, we describe our greedy, bottom-up approach for merging similar sub-SPNs of a given SPN S (see Algorithm 2). The algorithm begins by initializi... |
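
The mean-field style approximation in this excerpt averages a per-variable distance over single-variable marginals. A minimal sketch follows, assuming the marginals of both sub-SPNs have already been computed and are passed in as dictionaries from variable name to a probability vector. The function names and the small smoothing constant `eps` in the KL term are assumptions made for illustration, not details from the paper.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions; eps guards against log(0)."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0
    )


def hellinger(p, q):
    """Hellinger distance between two discrete distributions, in [0, 1]."""
    return math.sqrt(
        0.5 * sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    )


def mean_field_distance(p_marginals, q_marginals, dist=kl_divergence):
    """Average of dist(P(V_i), Q(V_i)) over all variables V_i in the shared scope."""
    if not p_marginals or p_marginals.keys() != q_marginals.keys():
        raise ValueError("both distributions must cover the same non-empty scope")
    total = sum(dist(p_marginals[v], q_marginals[v]) for v in p_marginals)
    return total / len(p_marginals)


# Marginals of two hypothetical sub-SPNs over binary variables A and B.
p_marg = {"A": [1.0, 0.0], "B": [0.5, 0.5]}
q_marg = {"A": [0.0, 1.0], "B": [0.5, 0.5]}
d_hellinger = mean_field_distance(p_marg, q_marg, dist=hellinger)
d_self = mean_field_distance(p_marg, p_marg)
```

Here the two sub-SPNs disagree completely on A and agree on B, so the averaged Hellinger distance is 0.5. Given precomputed marginals, the cost is one pass over the variables, in line with the linear-time claim in the excerpt.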

48 | Compiling Bayesian Networks Using Variable Elimination.
- Chavira, Darwiche
- 2007
Citation Context ... over latent or observed variables and product nodes which represent decomposition of variables into independent components. Leaf nodes represent simple distributions over observed variables (e.g., uniform distribution, univariate distributions, etc.). The key advantage of SPNs and other equivalent representations such as ACs over thin-junction trees is that they can be much more compact and are never larger than the latter. This is because they take advantage of various fine-grained structural properties such as determinism, context-specific independence, dynamic variable orderings and caching (cf. [7, 9, 4, 13]). For instance, in some cases, they can represent high-treewidth junction trees using only a handful of sum and product nodes [23]. The literature abounds with algorithms for learning the structure of SPNs and ACs from data (the equivalence between ACs and SPNs was shown by Rooshenas and Lowd [26]; thus, algorithms for learning ACs can be used to learn SPNs and vice versa), starting with the work of Lowd and Domingos [19], who proposed to learn ACs over observed variables by using the AC size as a learning (inductive) bias within a Bayesian network structure learning algorithm, and then compil... |

28 | Learning arithmetic circuits
- Lowd, Domingos
- 2008
Citation Context ...se they take advantage of various fine-grained structural properties such as determinism, context-specific independence, dynamic variable orderings and caching (cf. [7, 9, 4, 13]). For instance, in some cases, they can represent high-treewidth junction trees using only a handful of sum and product nodes [23]. The literature abounds with algorithms for learning the structure of SPNs and ACs from data (the equivalence between ACs and SPNs was shown by Rooshenas and Lowd [26]; thus, algorithms for learning ACs can be used to learn SPNs and vice versa), starting with the work of Lowd and Domingos [19], who proposed to learn ACs over observed variables by using the AC size as a learning (inductive) bias within a Bayesian network structure learning algorithm, and then compiling the induced Bayesian network to an AC. Later, Lowd and Rooshenas [20] extended this algorithm to learn a Markov network having small AC size. The latter performs much better in terms of test-set log-likelihood score than the former because of the increased flexibility afforded by the undirected Markov network structure. A limitation of the two aforementioned approaches for learning ACs is that they do not use latent var... |

22 | Learning the structure of sum-product networks.
- Gens, Domingos
- 2013
Citation Context ...earning problem (a sub-step in structure learning) – the problem of learning the weights or probabilities of a given SPN structure – is much harder in the presence of latent variables. In particular, the optimization problem is non-convex, which necessitates the use of algorithms such as gradient descent and expectation maximization that only converge to a local optimum. However, since learning is often an offline process, this increase in complexity is often not a big issue. The first approach for learning the structure of SPNs having both latent and observed variables is due to Gens and Domingos [11]. An issue with this approach is that it learns only directed trees instead of (directed acyclic) graphs and as a result is unable to fully exploit the power and flexibility of SPNs. To address this limitation, Rahman et al. [25], Vergari et al. [28] and Rooshenas and Lowd [26] proposed to learn a graph SPN over observed variables while Dennis and Ventura [10] proposed to learn a graph SPN over latent variables. A drawback of these approaches is that they are unable to learn a graph SPN over both observed and latent variables. In this paper, we address this limitation. The main idea in our app... |

19 | Learning Markov network structure with decision trees.
- Lowd, Davis
- 2010
Citation Context ... observed variables. In this case, unlike in the general case, we propose to learn both the structure and parameters of the merged sub-SPN (using the merged dataset). This is because both the structure and parameter learning problems in such SPNs can be solved in polynomial time using the Chow-Liu algorithm [5]. 4 EXPERIMENTS 4.1 SETUP We evaluated the impact of merging SPNs on 20 real-world benchmark datasets presented in Table 3. These datasets have been used in numerous previous studies for evaluating the performance of a wide variety of tractable probabilistic graphical model learners (cf. [18, 11, 26, 25, 1, 24]). All datasets are defined over binary variables that take values from the set {0, 1}. The number of variables ranges from 16 to 1556 and the number of training instances ranges from 1600 to 291326. All of our experiments were performed on a quad-core Intel i7 2.7 GHz machine with 16 GB RAM. Each algorithm was given a time bound of 48 hours, after which the algorithm was terminated. 4.2 ALGORITHMS EVALUATED We implemented two variants of SPNs: SPNs in which sum nodes split over value assignments to a latent variable and SPNs in which sum nodes split over value assignments to a heur... |

18 | AND/OR multi-valued decision diagrams (AOMDDs) for graphical models.
- Mateescu, Dechter, et al.
- 2008
Citation Context ... intractable for most real-world models. As a result, approximate inference algorithms such as loopy belief propagation and Gibbs sampling are widely used in practice. However, they can often yield highly inaccurate and high variance estimates, leading to poor predictive performance. One approach to tackle the inaccuracy and unreliability of approximate inference is to learn so-called tractable models from data. Examples of such models include thin junction trees [2], arithmetic circuits (ACs) [7], cutset networks [25], probabilistic sentential decision diagrams [16], AND/OR decision diagrams [9, 21] and sum-product networks [23]. Inference in these models is polynomial (often linear) in the size of the model and therefore the complexity and accuracy of inference is no longer an issue. In other words, once an accurate model is learned from data, predictions are guaranteed to be accurate. In this paper, we focus on the NP-hard problem of learning both the structure and parameters of sum-product networks (SPNs) from data. At a high level, an SPN is a rooted directed acyclic graph that represents a joint probability distribution over a large number of random variables, both observed and late... |

17 | Learning efficient Markov networks
- Gogate, Webb, et al.
- 2010
Citation Context ...f neither the base case nor the conditions for the decomposition step are satisfied, then the algorithm partitions the training instances into clusters of multiple instances, inducing a sum node, and recurses on each part. Several techniques proposed in the literature for learning SPNs (and equivalently ACs) can be understood as special cases of Algorithm 1, with the difference between them being the approaches used at the three steps. Table 1 gives examples

| Reference | Base Case | Decomposition | Splitting |
|---|---|---|---|
| Gens and Domingos [11] | Univariate distribution | Independence tests | Latent variables |
| Gogate et al. [15] | Univariate distribution | Independence assumption | Conjunctive fixed-length features |
| Cutset networks (CNets) [25] | Tree Markov networks | Not used | Observed variables |
| Ensembles of CNets [24] | Tree Markov networks | Not used | Observed and latent variables |
| Vergari et al. [28] | Tree Markov networks | Independence tests | Latent variables |
| Rooshenas and Lowd [26] | Tractable arithmetic circuits | Independence tests | Latent variables |

Table 1: Examples of SPN structure learning approaches in the literature that follow the prescription given in Algorithm 1. The base case is the stopping criterion for the recursive algorithm. [1... |

17 | Learning Markov networks with arithmetic circuits.
- Lowd, Rooshenas
- 2013
Citation Context ...nction trees using only a handful of sum and product nodes [23]. The literature abounds with algorithms for learning the structure of SPNs and ACs from data (the equivalence between ACs and SPNs was shown by Rooshenas and Lowd [26]; thus, algorithms for learning ACs can be used to learn SPNs and vice versa), starting with the work of Lowd and Domingos [19], who proposed to learn ACs over observed variables by using the AC size as a learning (inductive) bias within a Bayesian network structure learning algorithm, and then compiling the induced Bayesian network to an AC. Later, Lowd and Rooshenas [20] extended this algorithm to learn a Markov network having small AC size. The latter performs much better in terms of test-set log-likelihood score than the former because of the increased flexibility afforded by the undirected Markov network structure. A limitation of the two aforementioned approaches for learning ACs is that they do not use latent variables; it turns out that their accuracy can be greatly improved using latent variables. Unfortunately, the parameter learning problem (a sub-step in structure learning) – the problem of learning the weights or probabilities of a given SPN struct... |

16 | Formula-Based Probabilistic Inference.
- Gogate, Domingos
- 2010
Citation Context ... over latent or observed variables and product nodes which represent decomposition of variables into independent components. Leaf nodes represent simple distributions over observed variables (e.g., uniform distribution, univariate distributions, etc.). The key advantage of SPNs and other equivalent representations such as ACs over thin-junction trees is that they can be much more compact and are never larger than the latter. This is because they take advantage of various fine-grained structural properties such as determinism, context-specific independence, dynamic variable orderings and caching (cf. [7, 9, 4, 13]). For instance, in some cases, they can represent high-treewidth junction trees using only a handful of sum and product nodes [23]. The literature abounds with algorithms for learning the structure of SPNs and ACs from data (the equivalence between ACs and SPNs was shown by Rooshenas and Lowd [26]; thus, algorithms for learning ACs can be used to learn SPNs and vice versa), starting with the work of Lowd and Domingos [19], who proposed to learn ACs over observed variables by using the AC size as a learning (inductive) bias within a Bayesian network structure learning algorithm, and then compil... |

13 | Learning sum-product networks with direct and indirect interactions.
- Rooshenas, Lowd
- 2014
Citation Context ... over thin-junction trees is that they can be much more compact and are never larger than the latter. This is because they take advantage of various fine-grained structural properties such as determinism, context-specific independence, dynamic variable orderings and caching (cf. [7, 9, 4, 13]). For instance, in some cases, they can represent high-treewidth junction trees using only a handful of sum and product nodes [23]. The literature abounds with algorithms for learning the structure of SPNs and ACs from data (the equivalence between ACs and SPNs was shown by Rooshenas and Lowd [26]; thus, algorithms for learning ACs can be used to learn SPNs and vice versa), starting with the work of Lowd and Domingos [19], who proposed to learn ACs over observed variables by using the AC size as a learning (inductive) bias within a Bayesian network structure learning algorithm, and then compiling the induced Bayesian network to an AC. Later, Lowd and Rooshenas [20] extended this algorithm to learn a Markov network having small AC size. The latter performs much better in terms of test-set log-likelihood score than the former because of the increased flexibility afforded by the undirected Markov network stru... |

9 | Exploiting Logical Structure in Lifted Probabilistic Inference.
- Gogate, Domingos
- 2010
Citation Context ...4 -10.54 MSWeb 294 29441 32750 5000 -9.84 -9.78 -9.77 -9.76 -9.22 Book 500 8700 1159 1739 -36.49 -34.25 -36.35 -35.89 -30.18 EachMovie 500 4524 1002 591 -54.70 -50.72 -55.82 -53.07 -51.14 WebKB 839 2803 558 838 -170.27 -150.04 -166.65 -152.82 -150.10 Reuters-52 889 6532 1028 1540 -84.32 -80.66 -86.00 -82.66 -82.10 20NewsGrp. 910 11293 3764 3764 -151.48 -150.80 -158.40 -154.28 -151.47 BBC 1058 1670 225 330 -265.89 -233.26 -244.12 -238.61 -236.82 Ad 1556 2461 327 491 -16.33 -14.58 -15.69 -14.34 -14.36 proaches that search for similarities in sub-SPNs having different (even disjoint) scopes (cf. [12, 14]); analyzing contexts – assignment to variables on the path from the root – of merged sub-SPNs for finding symmetric contexts; directly inducing graph SPNs from data rather than using post-processing schemes; and extending the approach presented in the paper to hybrid domains having both discrete and continuous variables. Acknowledgements This research was funded in part by the DARPA Probabilistic Programming for Advanced Machine Learning Program under AFRL prime contract number FA8750-14-C-0005 and by the NSF award 1528037. The views and conclusions contained in this document are those of the... |

5 | Probabilistic sentential decision diagrams.
- Kisa, Broeck, et al.
- 2014
Citation Context ... in general and computationally intractable for most real-world models. As a result, approximate inference algorithms such as loopy belief propagation and Gibbs sampling are widely used in practice. However, they can often yield highly inaccurate and high variance estimates, leading to poor predictive performance. One approach to tackle the inaccuracy and unreliability of approximate inference is to learn so-called tractable models from data. Examples of such models include thin junction trees [2], arithmetic circuits (ACs) [7], cutset networks [25], probabilistic sentential decision diagrams [16], AND/OR decision diagrams [9, 21] and sum-product networks [23]. Inference in these models is polynomial (often linear) in the size of the model and therefore the complexity and accuracy of inference is no longer an issue. In other words, once an accurate model is learned from data, predictions are guaranteed to be accurate. In this paper, we focus on the NP-hard problem of learning both the structure and parameters of sum-product networks (SPNs) from data. At a high level, an SPN is a rooted directed acyclic graph that represents a joint probability distribution over a large number of random... |

3 | Cutset networks: A simple, tractable, and scalable approach for improving the accuracy of Chow-Liu trees.
- Rahman, Kothalkar, et al.
- 2014
Citation Context ...k of answering queries given a model – is NP-hard in general and computationally intractable for most real-world models. As a result, approximate inference algorithms such as loopy belief propagation and Gibbs sampling are widely used in practice. However, they can often yield highly inaccurate and high-variance estimates, leading to poor predictive performance. One approach to tackle the inaccuracy and unreliability of approximate inference is to learn so-called tractable models from data. Examples of such models include thin junction trees [2], arithmetic circuits (ACs) [7], cutset networks [25], probabilistic sentential decision diagrams [16], AND/OR decision diagrams [9, 21] and sum-product networks [23]. Inference in these models is polynomial (often linear) in the size of the model and therefore the complexity and accuracy of inference are no longer an issue. In other words, once an accurate model is learned from data, predictions are guaranteed to be accurate. In this paper, we focus on the NP-hard problem of learning both the structure and parameters of sum-product networks (SPNs) from data. At a high level, an SPN is a rooted directed acyclic graph that represents a joint proba... |

1 | Learning the structure of sum-product networks via an SVD-based algorithm.
- Adel, Balduzzi, et al.
- 2015
Citation Context ...ond contribution of this paper is a thorough experimental evaluation of our proposed merging algorithms on 20 benchmark datasets, all of which were used in several previous studies. Our experiments clearly show that merging always improves the performance of tree SPNs, measured in terms of test-set log-likelihood score and prediction time. We also experimentally compared bagged ensembles of graph SPNs with state-of-the-art approaches such as ensembles of cutset networks [24], sum-product networks with direct and indirect interactions [26], sum-product networks learned via the SVD-based approach [1], arithmetic circuits with Markov networks [20], and mixtures of cutset networks [25] on the same datasets, and found that our new approach yields a better test-set log-likelihood score on 8 out of the 20 datasets with two ties. This clearly demonstrates the power of our new merging algorithms. The rest of the paper is organized as follows. In the next section, we present background on SPNs, related work as well as a generic algorithm for learning tree SPNs. Section 3 describes powerful merging approaches for converting an arbitrary tree SPN to a graph SPN. Experimental results are presented in ... |

1 | Greedy structure search for sum-product networks.
- Dennis, Ventura
- 2015
Citation Context ...inima. However, since learning is often an offline process, this increase in complexity is often not a big issue. The first approach for learning the structure of SPNs having both latent and observed variables is due to Gens and Domingos [11]. An issue with this approach is that it learns only directed trees instead of (directed acyclic) graphs and as a result is unable to fully exploit the power and flexibility of SPNs. To address this limitation, Rahman et al. [25], Vergari et al. [28] and Rooshenas and Lowd [26] proposed to learn a graph SPN over observed variables while Dennis and Ventura [10] proposed to learn a graph SPN over latent variables. A drawback of these approaches is that they are unable to learn a graph SPN over both observed and latent variables. In this paper, we address this limitation. The main idea in our approach is as follows. We first learn a tree SPN over latent and observed nodes using standard algorithms, and then convert the tree SPN to a graph SPN by processing the SPN in a bottom-up fashion, merging two sub-SPNs if the distributions represented by them are similar and defined over the same variables. To convert this idea into a general-purpose algorithm, ... |
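
The bottom-up merging idea in the excerpt above can be illustrated with a small sketch. This is only a toy under stated assumptions, not the paper's algorithm: the node classes (`Leaf`, `Sum`, `Product`), the brute-force `distance` function, the threshold `eps`, and the function name `merge_similar` are all hypothetical, and the similarity test (maximum absolute difference over all assignments to a shared scope) is a stand-in, since the excerpt does not say how similarity is measured.

```python
from itertools import product as cartesian


class Leaf:
    """Bernoulli leaf over one binary variable: P(var = 1) = p."""
    def __init__(self, var, p):
        self.var, self.p, self.children = var, p, []
        self.scope = frozenset([var])

    def value(self, x):
        return self.p if x[self.var] == 1 else 1.0 - self.p


class Sum:
    """Weighted mixture of children that all share the same scope."""
    def __init__(self, weights, children):
        self.weights, self.children = list(weights), list(children)
        self.scope = children[0].scope

    def value(self, x):
        return sum(w * c.value(x) for w, c in zip(self.weights, self.children))


class Product:
    """Product of children defined over disjoint scopes."""
    def __init__(self, children):
        self.children = list(children)
        self.scope = frozenset().union(*(c.scope for c in children))

    def value(self, x):
        result = 1.0
        for c in self.children:
            result *= c.value(x)
        return result


def distance(a, b):
    """Assumed similarity test: max |a(x) - b(x)| over every assignment
    to the shared scope. Exponential in scope size, so toy use only."""
    variables = sorted(a.scope)
    worst = 0.0
    for bits in cartesian([0, 1], repeat=len(variables)):
        x = dict(zip(variables, bits))
        worst = max(worst, abs(a.value(x) - b.value(x)))
    return worst


def merge_similar(root, eps=0.05):
    """Post-order pass that replaces a sub-SPN by an earlier one with the
    same scope and a similar distribution. Returns the number of merges."""
    kept = []      # representative sub-SPNs seen so far
    merged = 0

    def canon(node):
        nonlocal merged
        # Children first, so merging proceeds bottom-up.
        node.children = [canon(c) for c in node.children]
        for rep in kept:
            if rep is node:
                return rep
            if rep.scope == node.scope and distance(rep, node) <= eps:
                merged += 1
                return rep          # parent now shares `rep`: tree becomes a DAG
        kept.append(node)
        return node

    canon(root)
    return merged


# Tree SPN over binary variables "a" and "b"; the two leaves over "b" are close.
root = Sum([0.5, 0.5], [
    Product([Leaf("a", 0.9), Leaf("b", 0.20)]),
    Product([Leaf("a", 0.1), Leaf("b", 0.21)]),
])
n_merged = merge_similar(root)
```

After the call, both product nodes point at the same leaf object for `b`, so the structure is a graph rather than a tree. Because a sub-SPN is only ever swapped for another normalized distribution over the same scope, the merged network still defines a valid distribution.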

1 | Learning ensembles of cutset networks.
- Rahman, Gogate
- 2016
Citation Context ...solve and therefore we develop approximate algorithms for solving them, which is the main contribution of this paper. The second contribution of this paper is a thorough experimental evaluation of our proposed merging algorithms on 20 benchmark datasets, all of which were used in several previous studies. Our experiments clearly show that merging always improves the performance of tree SPNs, measured in terms of test-set log-likelihood score and prediction time. We also experimentally compared bagged ensembles of graph SPNs with state-of-the-art approaches such as ensembles of cutset networks [24], sum-product networks with direct and indirect interactions [26], sum-product networks learned via the SVD-based approach [1], arithmetic circuits with Markov networks [20], and mixtures of cutset networks [25] on the same datasets, and found that our new approach yields a better test-set log-likelihood score on 8 out of the 20 datasets with two ties. This clearly demonstrates the power of our new merging algorithms. The rest of the paper is organized as follows. In the next section, we present background on SPNs, related work as well as a generic algorithm for learning tree SPNs. Section 3 descr... |

1 | Simplifying, regularizing and strengthening sum-product network structure learning.
- Vergari, Mauro, et al.
- 2015
Citation Context ...ates the use of algorithms such as gradient descent and expectation maximization that only converge to a local minimum. However, since learning is often an offline process, this increase in complexity is often not a big issue. The first approach for learning the structure of SPNs having both latent and observed variables is due to Gens and Domingos [11]. An issue with this approach is that it learns only directed trees instead of (directed acyclic) graphs and as a result is unable to fully exploit the power and flexibility of SPNs. To address this limitation, Rahman et al. [25], Vergari et al. [28] and Rooshenas and Lowd [26] proposed to learn a graph SPN over observed variables while Dennis and Ventura [10] proposed to learn a graph SPN over latent variables. A drawback of these approaches is that they are unable to learn a graph SPN over both observed and latent variables. In this paper, we address this limitation. The main idea in our approach is as follows. We first learn a tree SPN over latent and observed nodes using standard algorithms, and then convert the tree SPN to a graph SPN by processing the SPN in a bottom-up fashion, merging two sub-SPNs if the distributions represented ... |