Results 11–20 of 115
On Sensitivity of the MAP Bayesian Network Structure to the Equivalent Sample Size Parameter
2007
Abstract
Cited by 24 (4 self)
The BDeu marginal likelihood score is a popular model selection criterion for selecting a Bayesian network structure based on sample data. This noninformative scoring criterion assigns the same score to network structures that encode the same independence statements. However, before applying the BDeu score, one must determine a single parameter, the equivalent sample size α. Unfortunately, no generally accepted rule for determining the α parameter has been suggested. This is disturbing, since in this paper we show through a series of concrete experiments that the solution of the network structure optimization problem is highly sensitive to the chosen α parameter value. Based on these results, we give explanations for how and why this phenomenon happens, and discuss ideas for solving this problem.
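The sensitivity discussed above is easy to reproduce: the BDeu score of a single family (child plus parent set) is a sum of log-Gamma terms in which α appears in every term. A minimal sketch in Python; the toy counts and the `bdeu_family_score` helper are illustrative, not from the paper:

```python
import math

def bdeu_family_score(counts, alpha, r, q):
    """Log BDeu marginal likelihood of one family (illustrative helper).

    counts[j][k] = rows with parent configuration j and child state k,
    r = number of child states, q = number of parent configurations,
    alpha = equivalent sample size.
    """
    score = 0.0
    for j in range(q):
        n_j = sum(counts[j])
        score += math.lgamma(alpha / q) - math.lgamma(alpha / q + n_j)
        for k in range(r):
            score += (math.lgamma(alpha / (q * r) + counts[j][k])
                      - math.lgamma(alpha / (q * r)))
    return score

# Toy counts for a binary child with one binary parent: the score (and
# hence which structure wins a comparison) shifts as alpha varies.
counts = [[8, 2], [3, 7]]
for alpha in (0.1, 1.0, 10.0, 100.0):
    print(alpha, bdeu_family_score(counts, alpha, r=2, q=2))
```

Running this shows the score moving by several nats across the α range, which is exactly the kind of sensitivity the paper documents at the level of whole networks.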
An O*(2^n) algorithm for graph coloring and other partitioning problems via inclusion–exclusion
 In Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2006), IEEE
, 2006
Abstract
Cited by 23 (1 self)
We use the principle of inclusion and exclusion, combined with polynomial time segmentation and fast Möbius transform, to solve the generic problem of summing or optimizing over the partitions of n elements into a given number of weighted subsets. This problem subsumes various classical graph partitioning problems, such as graph coloring, domatic partitioning, and MAX k-CUT, as well as machine learning problems like decision graph learning and model-based data clustering. Our algorithm runs in O*(2^n) time, thus substantially improving on the usual O*(3^n)-time dynamic programming algorithm; the notation O* suppresses factors polynomial in n. This result improves, e.g., Byskov's recent record for graph coloring from O*(2.4023^n) to O*(2^n). We note that twenty-five years ago, R. M. Karp used inclusion–exclusion in a similar fashion to reduce the space requirement of the usual dynamic programming algorithms from exponential to polynomial.
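The graph coloring application can be sketched compactly: a graph is k-colorable iff an inclusion–exclusion sum over vertex subsets, driven by independent-set counts, is positive. The sketch below is an illustration of that idea only, not the paper's algorithm; `chromatic_leq` and the bitmask adjacency format are my own naming.

```python
def chromatic_leq(adj, n, k):
    """Test k-colorability in O*(2^n) via inclusion-exclusion.

    adj[v] = bitmask of the neighbours of vertex v.
    """
    # a[S] = number of independent sets (including the empty set)
    # contained in vertex subset S, by a standard subset DP.
    a = [0] * (1 << n)
    a[0] = 1
    for S in range(1, 1 << n):
        v = (S & -S).bit_length() - 1          # lowest vertex in S
        # either v is excluded, or v is included and its neighbours excluded
        a[S] = a[S & ~(1 << v)] + a[S & ~((1 << v) | adj[v])]
    # Inclusion-exclusion: the sum counts ordered k-tuples of independent
    # sets covering all vertices; it is positive iff G is k-colorable.
    total = 0
    for S in range(1 << n):
        sign = -1 if (n - bin(S).count("1")) % 2 else 1
        total += sign * a[S] ** k
    return total > 0

# A triangle needs 3 colours.
adj = [0b110, 0b101, 0b011]
print(chromatic_leq(adj, 3, 2), chromatic_leq(adj, 3, 3))
```

For the triangle the sum evaluates to 0 for k = 2 and to 6 (the number of ordered proper 3-colorings) for k = 3.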
Improving the scalability of optimal Bayesian network learning with external-memory frontier breadth-first branch and bound search
 In Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence
Abstract
Cited by 17 (9 self)
Previous work has shown that the problem of learning the optimal structure of a Bayesian network can be formulated as a shortest path finding problem in a graph and solved using A* search. In this paper, we improve the scalability of this approach by developing a memory-efficient heuristic search algorithm for learning the structure of a Bayesian network. Instead of using A*, we propose a frontier breadth-first branch and bound search that leverages the layered structure of the search graph of this problem so that no more than two layers of the graph, plus solution reconstruction information, need to be stored in memory at a time. To further improve scalability, the algorithm stores most of the graph in external memory, such as hard disk, when it does not fit in RAM. Experimental results show that the resulting algorithm solves significantly larger problems than the current state of the art.
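The layered structure the authors exploit can be illustrated with a toy dynamic program over the order graph: layer ℓ holds all variable subsets of size ℓ, and only two adjacent layers need to live in memory at once. A sketch under assumed interfaces (`local_score` is a hypothetical callback returning the best parent-set cost for a variable drawn from a candidate set; solution reconstruction and the disk-spilling machinery are omitted):

```python
def learn_order_graph(n, local_score):
    """Layered sweep over the order graph, keeping only two layers.

    Returns the cost of an optimal network over variables 0..n-1.
    local_score(v, candidates) -> cost of v's best parent set chosen
    from the frozenset `candidates` (hypothetical scoring callback).
    """
    current = {frozenset(): 0.0}          # layer 0: the empty subset
    for _ in range(n):
        nxt = {}
        for U, g in current.items():
            for v in range(n):
                if v in U:
                    continue
                U2 = U | {v}              # successor in the next layer
                cost = g + local_score(v, U)
                if cost < nxt.get(U2, float("inf")):
                    nxt[U2] = cost
        current = nxt   # the previous layer can now be dropped (or spilled)
    return current[frozenset(range(n))]

# Toy additive score: each variable simply prefers more candidate parents.
score = lambda v, parents: 1.0 / (1 + len(parents))
print(learn_order_graph(3, score))
```

Because each layer only reads the layer before it, the full 2^n-node graph never needs to be resident, which is the property the external-memory variant builds on.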
Finding Optimal Bayesian Network Given a Super-Structure
Abstract
Cited by 17 (0 self)
Classical approaches used to learn Bayesian network structure from data have disadvantages in terms of complexity and lower accuracy of their results. However, a recent empirical study has shown that a hybrid algorithm markedly improves accuracy and speed: it learns a skeleton with an independency test (IT) approach and constrains the directed acyclic graphs (DAGs) considered during the search-and-score phase. Subsequently, we theorize the structural constraint by introducing the concept of super-structure S, which is an undirected graph that restricts the search to networks whose skeleton is a subgraph of S. We develop a super-structure constrained optimal search (COS): its time complexity is upper bounded by O(γ_m^n), where γ_m < 2 depends on the maximal degree m of S. Empirically, complexity depends on the average degree m̃, and sparse structures allow larger graphs to be calculated. Our algorithm is faster than an optimal search by several orders of magnitude and even finds more accurate results when given a sound super-structure. Practically, S can be approximated by IT approaches; the significance level of the tests controls its sparseness, enabling control of the trade-off between speed and accuracy. For incomplete super-structures, a greedily post-processed version (COS+) still significantly outperforms other heuristic searches. Keywords: Bayesian networks, structure learning, optimal search, super-structure, connected subset
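The practical effect of a super-structure is that each variable's candidate parent sets shrink from all subsets of the other n−1 variables to subsets of its neighbours in S. A small illustration (the `candidate_parent_sets` helper is hypothetical, not the paper's interface):

```python
from itertools import combinations

def candidate_parent_sets(v, neighbors_in_S):
    """All parent sets of v permitted by an undirected super-structure S:
    exactly the subsets of v's neighbours in S."""
    ns = sorted(neighbors_in_S)
    return [frozenset(c)
            for r in range(len(ns) + 1)
            for c in combinations(ns, r)]

# With n = 10 variables, an unconstrained search scores 2^9 = 512 parent
# sets per variable; a degree-3 super-structure leaves only 2^3 = 8.
print(len(candidate_parent_sets(0, {1, 2, 3})))
```

This is where the degree-dependent base γ_m < 2 in the complexity bound comes from: sparse super-structures cut the per-variable search space exponentially.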
Learning optimal Bayesian networks using A* search
 In Proceedings of the 22nd International Joint Conference on Artificial Intelligence
, 2011
Abstract
Cited by 16 (7 self)
This paper formulates learning an optimal Bayesian network as a shortest path finding problem. An A* search algorithm is introduced to solve the problem. With the guidance of a consistent heuristic, the algorithm learns an optimal Bayesian network by only searching the most promising parts of the solution space. Empirical results show that the A* search algorithm significantly improves the time and space efficiency of existing methods on a set of benchmark datasets.
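The shortest-path formulation can be sketched as follows: states are subsets of variables already placed in an ordering, an edge adds one variable with its best parent set chosen from the placed variables, and a consistent heuristic relaxes acyclicity by letting every remaining variable pick parents freely. The sketch below is illustrative only; the `local_score` callback and the toy scoring function are my assumptions, not the paper's implementation:

```python
import heapq
from itertools import count

def astar_bn(n, local_score):
    """A* over the order graph of subsets of {0..n-1}.

    local_score(v, candidates) -> best parent-set cost for v from the
    frozenset `candidates` (assumed non-negative and non-increasing as
    candidates are added, which makes h consistent).
    """
    everything = frozenset(range(n))

    def h(U):
        # Relax acyclicity: each remaining variable may use all others.
        return sum(local_score(v, everything - {v})
                   for v in range(n) if v not in U)

    tie = count()                      # tie-breaker so the heap never
    start = frozenset()                # compares frozensets directly
    best_g = {start: 0.0}
    heap = [(h(start), next(tie), start)]
    while heap:
        _, _, U = heapq.heappop(heap)
        g = best_g[U]
        if U == everything:
            return g
        for v in range(n):
            if v in U:
                continue
            U2 = U | {v}
            g2 = g + local_score(v, U)
            if g2 < best_g.get(U2, float("inf")):
                best_g[U2] = g2
                heapq.heappush(heap, (g2 + h(U2), next(tie), U2))

# Toy score: each variable prefers more candidate parents.
score = lambda v, parents: 1.0 / (1 + len(parents))
print(astar_bn(3, score))
```

With a consistent heuristic the first expansion of the goal subset is optimal, so large parts of the 2^n order graph are never generated.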
Learning Optimal Bayesian Networks: A Shortest Path Perspective
2013
Abstract
Cited by 15 (5 self)
In this paper, learning a Bayesian network structure that optimizes a scoring function for a given dataset is viewed as a shortest path problem in an implicit state-space search graph. This perspective highlights the importance of two research issues: the development of search strategies for solving the shortest path problem, and the design of heuristic functions for guiding the search. This paper introduces several techniques for addressing these issues. One is an A* search algorithm that learns an optimal Bayesian network structure by only searching the most promising part of the solution space. The others are mainly two heuristic functions. The first heuristic function represents a simple relaxation of the acyclicity constraint of a Bayesian network. Although admissible and consistent, the heuristic may introduce too much relaxation and result in a loose bound. The second heuristic function reduces the amount of relaxation by avoiding directed cycles within some groups of variables. Empirical results show that these methods constitute a promising approach to learning optimal Bayesian network structures.
The “ideal parent” structure learning for continuous variable networks
 Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence
, 2004
Abstract
Cited by 15 (2 self)
In recent years, there has been growing interest in learning Bayesian networks with continuous variables. Learning the structure of such networks is a computationally expensive procedure, which limits most applications to parameter learning. This problem is even more acute when learning networks with hidden variables. We present a general method for significantly speeding up the structure search algorithm for continuous variable networks with common parametric distributions. Importantly, our method facilitates the efficient addition of new hidden variables into the network structure. We demonstrate the method on several data sets, both for learning structure on fully observable data, and for introducing new hidden variables during structure search.
Factorized Normalized Maximum Likelihood Criterion for Learning Bayesian Network Structures
2008
Abstract
Cited by 13 (4 self)
This paper introduces a new scoring criterion, factorized normalized maximum likelihood, for learning Bayesian network structures. The proposed scoring criterion requires no parameter tuning, and it is decomposable and asymptotically consistent. We compare the new scoring criterion to other scoring criteria and describe its practical implementation. Empirical tests confirm its good performance.
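At the core of the criterion is the normalized maximum likelihood distribution for a multinomial; its normalizer (the parametric complexity) can be computed exactly for small sample sizes by brute force over count vectors, as in this illustrative sketch (the function name is mine, and for practical sample sizes linear-time recurrences exist):

```python
from math import comb

def multinomial_nml_normalizer(n, K):
    """Exact NML parametric complexity C(n, K) of a K-state multinomial
    with n observations:

        C(n, K) = sum over (k_1..k_K), sum k_j = n, of
                  n!/(k_1!..k_K!) * prod_j (k_j/n)^{k_j}

    computed by brute force over compositions (fine for small n).
    """
    def rec(slots, remaining):
        if slots == 1:
            k = remaining
            return (k / n) ** k if k else 1.0
        total = 0.0
        for k in range(remaining + 1):
            term = comb(remaining, k) * ((k / n) ** k if k else 1.0)
            total += term * rec(slots - 1, remaining - k)
        return total
    return rec(K, n)

print(multinomial_nml_normalizer(2, 2))   # → 2.5
```

Conceptually, the factorized NML score of a family is its maximized log-likelihood minus the log of this normalizer, which is how the criterion avoids any tunable prior parameter.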
Finding Low-Entropy Sets and Trees from Binary Data
Abstract
Cited by 12 (1 self)
The discovery of subsets with special properties from binary data has been one of the key themes in pattern discovery. Pattern classes such as frequent itemsets stress the co-occurrence of the value 1 in the data. While this choice makes sense in the context of sparse binary data, it disregards potentially interesting subsets of attributes that have some other type of dependency structure. We consider the problem of finding all subsets of attributes that have low complexity. The complexity is measured by either the entropy of the projection of the data on the subset, or the entropy of the data for the subset when modeled using a Bayesian tree, with downward or upward pointing edges. We show that the entropy measure on sets has a monotonicity property, and thus a levelwise approach can find all low-entropy itemsets. We also show that the tree-based measures are bounded above by the entropy of the corresponding itemset, allowing similar algorithms to be used for finding low-entropy trees. We describe algorithms for finding all subsets satisfying an entropy condition. We give an extensive empirical evaluation of the performance of the methods both on synthetic and on real data. We also discuss the search for high-entropy subsets and the computation of the Vapnik–Chervonenkis dimension of the data.
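The monotonicity property makes the levelwise search concrete: projection entropy can only grow when attributes are added, so once a set exceeds the threshold, all of its supersets can be pruned. An illustrative Apriori-style sketch (function names and the simplified candidate generation are my own, not the paper's algorithm):

```python
from collections import Counter
from math import log2

def entropy(rows, attrs):
    """Empirical entropy (bits) of the data projected on `attrs`."""
    counts = Counter(tuple(r[a] for a in attrs) for r in rows)
    n = len(rows)
    return -sum(c / n * log2(c / n) for c in counts.values())

def low_entropy_sets(rows, n_attrs, tau):
    """Levelwise search for all attribute sets X with H(X) <= tau.

    Correct because H is monotone: H(X ∪ {a}) >= H(X), so the family
    of low-entropy sets is downward closed (Apriori-style pruning).
    """
    result = []
    frontier = [frozenset([a]) for a in range(n_attrs)]
    while frontier:
        frontier = [X for X in frontier
                    if entropy(rows, sorted(X)) <= tau]
        result.extend(frontier)
        # candidates for the next level: one attribute added to a survivor
        frontier = list({X | {a} for X in frontier
                         for a in range(n_attrs) if a not in X})
    return result

# Toy data: attributes 0 and 1 are perfectly correlated, 2 is independent.
rows = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
print(low_entropy_sets(rows, 3, 1.0))
```

On this toy data every singleton has entropy 1 bit and the only larger low-entropy set is {0, 1}, since adding the independent attribute 2 raises the entropy to 2 bits.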
Learning Optimal Bounded Treewidth Bayesian Networks via Maximum Satisfiability
2014
Abstract
Cited by 11 (5 self)
Bayesian network structure learning is the well-known computationally hard problem of finding a directed acyclic graph structure that optimally describes given data. A learned structure can then be used for probabilistic inference. While exact inference in Bayesian networks is in general NP-hard, it is tractable in networks with low treewidth. This provides good motivation for developing algorithms for the NP-hard problem of learning optimal bounded treewidth Bayesian networks (BTW-BNSL). In this work, we develop a novel score-based approach to BTW-BNSL, based on casting BTW-BNSL as weighted partial maximum satisfiability. We demonstrate empirically that the approach scales notably better than a recent exact dynamic programming algorithm for BTW-BNSL.
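The flavour of such an encoding can be shown with a deliberately tiny weighted partial MaxSAT instance: hard clauses capture structural constraints (here a two-variable stand-in for acyclicity), and soft clauses carry the local-score rewards. The brute-force solver and the toy encoding below are illustrative only; the paper targets real MaxSAT solvers and a full bounded-treewidth encoding:

```python
from itertools import product

def max_sat(n_vars, hard, soft):
    """Brute-force weighted partial MaxSAT (for illustration only).

    Every hard clause must be satisfied; the total weight of satisfied
    soft clauses is maximised.  Clauses are lists of literals: +i means
    variable i true, -i means variable i false (1-indexed).
    """
    def sat(clause, assign):
        return any(assign[abs(l)] == (l > 0) for l in clause)

    best, best_assign = -1, None
    for bits in product([False, True], repeat=n_vars):
        assign = {i + 1: b for i, b in enumerate(bits)}
        if not all(sat(c, assign) for c in hard):
            continue                      # hard constraint violated
        w = sum(wt for c, wt in soft if sat(c, assign))
        if w > best:
            best, best_assign = w, assign
    return best, best_assign

# Toy flavour of the encoding: x1 = "edge A->B", x2 = "edge B->A".
hard = [[-1, -2]]                  # acyclicity stand-in: not both directions
soft = [([1], 3), ([2], 2)]        # local-score rewards for each edge
print(max_sat(2, hard, soft))
```

The solver keeps the higher-weight edge (weight 3) and drops the reverse edge, which is the shape of the trade-off a real BTW-BNSL encoding expresses at scale.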