### Citations

11964 | Maximum likelihood from incomplete data via the EM algorithm - Dempster, Laird, et al. - 1977

8903 | Probabilistic reasoning in intelligent systems: networks of plausible inference - Pearl - 1988
Citation Context: ... whose treewidth is bounded by one at each leaf. Inference over them is linear in the size of the network using a straightforward generalization of Pearl and Dechter’s cutset conditioning algorithm (Pearl 1988; Dechter 1990). Unlike other state-of-the-art models such as arithmetic circuits (Darwiche 2003; Lowd and Domingos 2008), sum-product networks (Poon and Domingos 2011), and thin junction trees (Bach a...

3495 | A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences 55(1) - Freund, Schapire - 1997
Citation Context: ..., finds a weighting coefficient η_m for the new model by performing a line search (step 6), and finally updates the model f_m with the new weighting coefficient (step 7). Next, we derive an AdaBoost (Freund and Schapire 1997) style algorithm for learning ECNets by generalizing the weight update rule in step 4 of Algorithm 1. To derive this rule, we rewrite the weight as ω^(i)_{m+1} = 1 / f_m(x^(i)) = 1 / [(1 − η_m) f_{m−1}(x^(i)) + η_m c_m...
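The weight update quoted in this context reweights each example inversely to its likelihood under the new additive model f_m = (1 − η_m) f_{m−1} + η_m c_m. A minimal sketch of that step; the function name and the normalization of the weights are illustrative assumptions, not the paper's actual Algorithm 1:

```python
def boosted_density_update(f_prev, c_new, eta, data):
    """One boosting-style step for density estimation, as quoted above:
    the new model is the convex combination (1 - eta) * f_prev + eta * c_new,
    and example i gets weight 1 / f_new(x_i), so poorly-modeled examples
    receive more attention in the next round. `f_prev` and `c_new` map a
    data point to a probability. A sketch, not the cited implementation."""
    f_new = lambda x: (1.0 - eta) * f_prev(x) + eta * c_new(x)
    weights = [1.0 / f_new(x) for x in data]
    # Normalize so the weights form a distribution over examples
    # (an assumption; the quoted rule is stated before normalization).
    total = sum(weights)
    return f_new, [w / total for w in weights]
```

For instance, an example with low likelihood under `f_new` ends up with a larger normalized weight than a well-modeled one.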

2213 | Experiments with a new boosting algorithm - Freund, Schapire - 1996
Citation Context: ...moved all edges from the Chow-Liu trees whose mutual information was smaller than 0.005. For learning MCNets we generated bootstrap samples from the weighted datasets in both BDE and GBDE algorithms (Freund and Schapire 1996). The parameters of the mixture were then optimized via the EM algorithm using the original training set for 50 iterations or until convergence. Table 1 reports the test set log-likelihood scores ach...
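The resampling step described in this context (bootstrap samples drawn from a weighted dataset) amounts to sampling examples with replacement, with probability proportional to their weights. A minimal sketch; the function name is an assumption:

```python
import random

def weighted_bootstrap(data, weights, seed=None):
    """Draw one bootstrap replicate of `data`, sampling with replacement
    where example i is chosen with probability proportional to weights[i].
    A sketch of the resampling step quoted above, not the cited code."""
    rng = random.Random(seed)
    return rng.choices(data, weights=weights, k=len(data))
```

Examples with zero weight never appear in the replicate, while heavily weighted examples may appear several times.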

993 | A view of the EM algorithm that justifies incremental, sparse, and other variants - Neal, Hinton - 1998
Citation Context: ...thms are likely to yield more accurate models than the conventional EM algorithm, which updates all base models and mixture weights simultaneously, because they will have better convergence properties (Neal and Hinton 1998), smaller computational complexity (since networks will be added sequentially or via bootstrapping), and a superior ability to escape local maxima. As we will describe in the section on...

878 | Approximating discrete probability distributions with dependence trees - Chow, Liu - 1968
Citation Context: ...arameters of the network given data is NP-hard in general, in tree Bayesian networks both structure and parameter learning tasks can be solved in polynomial time using the classic Chow-Liu algorithm (Chow and Liu 1968). The time complexity of this algorithm is O(n²e), where n is the number of variables and e is the number of training examples. A tree Bayesian network represents the following distribution: T(x) = ∏...
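The Chow-Liu procedure referenced in this context computes pairwise empirical mutual information and then takes a maximum spanning tree over it. A minimal sketch for binary data; the Prim-style tree construction is one of several equivalent choices, and this is an illustration, not the cited implementation:

```python
import itertools
import numpy as np

def chow_liu_tree(data):
    """Chow-Liu structure learning over binary variables: estimate the
    pairwise mutual information from an (e, n) 0/1 array, then build a
    maximum spanning tree (Prim's algorithm) over it. Returns the tree
    as a list of (parent, child) edges rooted at variable 0."""
    e, n = data.shape
    mi = np.zeros((n, n))
    for i, j in itertools.combinations(range(n), 2):
        # Empirical joint and marginals of the binary pair (i, j).
        joint = np.zeros((2, 2))
        for a in range(2):
            for b in range(2):
                joint[a, b] = np.mean((data[:, i] == a) & (data[:, j] == b))
        pi, pj = joint.sum(axis=1), joint.sum(axis=0)
        for a in range(2):
            for b in range(2):
                if joint[a, b] > 0:
                    mi[i, j] += joint[a, b] * np.log(joint[a, b] / (pi[a] * pj[b]))
        mi[j, i] = mi[i, j]
    # Maximum spanning tree: greedily attach the out-of-tree variable
    # with the highest mutual information to any in-tree variable.
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        i, j = max(((i, j) for i in in_tree for j in range(n) if j not in in_tree),
                   key=lambda ij: mi[ij])
        edges.append((i, j))
        in_tree.add(j)
    return edges
```

The double loop over pairs is the O(n²e) term quoted above; the spanning-tree step is cheap by comparison.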

281 | Enhancement schemes for constraint processing: backjumping, learning, and cutset decomposition - Dechter - 1990
Citation Context: ...idth is bounded by one at each leaf. Inference over them is linear in the size of the network using a straightforward generalization of Pearl and Dechter’s cutset conditioning algorithm (Pearl 1988; Dechter 1990). Unlike other state-of-the-art models such as arithmetic circuits (Darwiche 2003; Lowd and Domingos 2008), sum-product networks (Poon and Domingos 2011), and thin junction trees (Bach and Jordan 2001...

145 | Learning with mixtures of trees - Meilă, Jordan - 2001
Citation Context: ...rning sum-product networks with direct and indirect variable interactions (ID-SPN), learning Markov networks using arithmetic circuits (ACMN) (Lowd and Rooshenas 2013), learning mixtures of trees (MT) (Meila and Jordan 2001), learning sum-product networks (SPN), and learning latent tree models (LTM). Table 4 reports the performance of these algorithms. These results were taken from (2014; 2014). For bagging and boosting ...

141 | A differential approach to inference in Bayesian networks - Darwiche - 2003

73 | Sum-product networks: A new deep architecture - Poon, Domingos - 2011

64 | Thin junction trees - Bach, Jordan - 2001
Citation Context: ...l 1988; Dechter 1990). Unlike other state-of-the-art models such as arithmetic circuits (Darwiche 2003; Lowd and Domingos 2008), sum-product networks (Poon and Domingos 2011), and thin junction trees (Bach and Jordan 2001) in which learning is computationally expensive, CNets admit tractable learning algorithms assuming that the number of nodes in the OR tree is bounded by a constant. Moreover, fast, heuristic algorit...

28 | Learning arithmetic circuits - Lowd, Domingos - 2008
Citation Context: ...ing a straightforward generalization of Pearl and Dechter’s cutset conditioning algorithm (Pearl 1988; Dechter 1990). Unlike other state-of-the-art models such as arithmetic circuits (Darwiche 2003; Lowd and Domingos 2008), sum-product networks (Poon and Domingos 2011), and thin junction trees (Bach and Jordan 2001) in which learning is computationally expensive, CNets admit tractable learning algorithms assuming that ...

24 | Bottom-up learning of Markov network structure - Davis, Domingos - 2010
Citation Context: ...ted the performance of ECNets on 20 real-world benchmark datasets (Lowd and Davis 2010; Gens and Domingos 2013; Rooshenas and Lowd 2014; Rahman, Kothalkar, and Gogate 2014; Van Haaren and Davis 2012; Davis and Domingos 2010) listed in Table 4. The number of variables ranges from 16 to 1556 and the number of training examples varies from 1.6K to 291K. All variables are binary. We ran all our experiments on a quad-c...

24 | Boosting density estimation - Rosset, Segal - 2002
Citation Context: ...ing-based algorithms for learning CNets from data, leveraging a vast amount of previous work on boosting and bagging algorithms (cf. (Zhou 2012)) as well as their generalizations for density estimation (Rosset and Segal 2002; Ridgeway 2002; Welling, Zemel, and Hinton 2002). Second, we perform and report on a comprehensive empirical evaluation, comparing our new algorithms with several state-of-the-art systems such as sum...

22 | Learning the structure of sum-product networks - Gens, Domingos - 2013

19 | Learning Markov network structure with decision trees - Lowd, Davis - 2010

17 | Learning Markov networks with arithmetic circuits - Lowd, Rooshenas - 2013

15 | Self supervised boosting - Welling, Zemel, et al. - 2002

13 | Learning sum-product networks with direct and indirect interactions - Rooshenas, Lowd - 2014

13 | Markov network structure learning: A randomized feature generation approach - Van Haaren, Davis - 2012

10 | Looking for lumps: boosting and bagging for density estimation - Ridgeway - 2002

9 | Exploiting Logical Structure in Lifted Probabilistic Inference - Gogate, Domingos - 2010
Citation Context: ...to each other. Our results clearly show that our new additive models are quite powerful and superior to state-of-the-art algorithms. Future work includes: learning ensembles of tractable lifted CNets (Gogate and Domingos 2010); adding AND or product nodes to CNets; inducing OR graphs instead of OR trees; etc. Acknowledgments. We gratefully acknowledge the support of the Defense Advanced Research Projects Agency (DARPA) Pr...

7 | Random forests. Machine Learning 45(1):5–32 - Breiman - 2001

3 | Tractable learning for structured probability spaces: A case study in learning preference distributions - Choi, Van den Broeck, et al. - 2015

3 | Cutset networks: A simple, tractable, and scalable approach for improving the accuracy of Chow-Liu trees - Rahman, Kothalkar, et al. - 2014

2 | Sub-quadratic Markov tree mixture learning based on randomizations of the Chow-Liu algorithm - Ammar, Leray, et al. - 2010
Citation Context: ...sen based on the accuracy on the validation set. In bagging MCNets, we randomized the structure using bootstrap replicates and then learned the best parameters for that structure on the training set (Ammar et al. 2010). EM was run for 100 iterations or until convergence and no restarts were performed. The test set log-likelihood scores are given in Table 3. Bag of variable-depth MCNets performs the best out of the...
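With the component structures fixed by bootstrap replicates, the EM run described in this context reduces, for the mixture weights, to alternating responsibility and averaging updates. A sketch assuming the per-component likelihoods of each example are precomputed; this is not the cited implementation:

```python
import numpy as np

def em_mixture_weights(component_liks, iters=100, tol=1e-6):
    """EM for the mixture weights only, holding the component densities
    fixed. `component_liks` is an (e, k) array: the likelihood of example
    i under component m. Runs for `iters` iterations or until the weights
    stop changing, mirroring the stopping rule quoted above."""
    e, k = component_liks.shape
    w = np.full(k, 1.0 / k)  # uniform initialization
    for _ in range(iters):
        # E-step: responsibility of component m for example i.
        resp = component_liks * w
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: new weights are the average responsibilities.
        w_new = resp.mean(axis=0)
        if np.max(np.abs(w_new - w)) < tol:
            w = w_new
            break
        w = w_new
    return w
```

The weights stay normalized by construction, since each responsibility row sums to one.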

1 | Learning accurate cutset networks by exploiting decomposability - Di Mauro, Vergari, et al. - 2015