Results 1–10 of 29
Learning Bayesian Network Structure using LP Relaxations
Abstract

Cited by 58 (2 self)
We propose to solve the combinatorial problem of finding the highest scoring Bayesian network structure from data. This structure learning problem can be viewed as an inference problem where the variables specify the choice of parents for each node in the graph. The key combinatorial difficulty arises from the global constraint that the graph structure has to be acyclic. We cast the structure learning problem as a linear program over the polytope defined by valid acyclic structures. In relaxing this problem, we maintain an outer bound approximation to the polytope and iteratively tighten it by searching over a new class of valid constraints. If an integral solution is found, it is guaranteed to be the optimal Bayesian network. When the relaxation is not tight, the fast dual algorithms we develop remain useful in combination with a branch and bound method. Empirical results suggest that the method is competitive or faster than alternative exact methods based on dynamic programming.
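The acyclicity constraint that makes this combinatorial problem hard can be written as a family of linear "cluster" constraints: a choice of one parent set per variable is acyclic exactly when every nonempty cluster of variables contains at least one variable whose chosen parents all lie outside the cluster. A minimal brute-force sketch of that characterization (toy cost tables and hypothetical names; this only illustrates the constraints, not the paper's LP relaxation or dual algorithms):

```python
from itertools import combinations, product

def best_acyclic_via_clusters(variables, local_cost):
    """Pick one parent set per variable with minimal total cost,
    accepting a choice only if every nonempty cluster C contains
    some v whose parents avoid C (equivalent to acyclicity)."""
    def feasible(choice):
        return all(
            any(choice[v].isdisjoint(C) for v in C)
            for k in range(1, len(variables) + 1)
            for C in map(set, combinations(variables, k))
        )
    best = None
    # Enumerate every combination of per-variable parent-set choices.
    for combo in product(*[local_cost[v].items() for v in variables]):
        choice = {v: P for v, (P, _) in zip(variables, combo)}
        cost = sum(c for _, c in combo)
        if feasible(choice) and (best is None or cost < best):
            best = cost
    return best
```

`local_cost[v]` maps candidate parent sets (frozensets) to costs, with lower cost better; exponential enumeration is only viable for tiny toy instances.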
Improving the scalability of optimal Bayesian network learning with external-memory frontier breadth-first branch and bound search
 IN PROCEEDINGS OF THE 27TH CONFERENCE ON UNCERTAINTY IN ARTIFICIAL INTELLIGENCE
Abstract

Cited by 17 (9 self)
Previous work has shown that the problem of learning the optimal structure of a Bayesian network can be formulated as a shortest path finding problem in a graph and solved using A* search. In this paper, we improve the scalability of this approach by developing a memory-efficient heuristic search algorithm for learning the structure of a Bayesian network. Instead of using A*, we propose a frontier breadth-first branch and bound search that leverages the layered structure of the search graph of this problem so that no more than two layers of the graph, plus solution reconstruction information, need to be stored in memory at a time. To further improve scalability, the algorithm stores most of the graph in external memory, such as hard disk, when it does not fit in RAM. Experimental results show that the resulting algorithm solves significantly larger problems than the current state of the art.
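The layered structure being exploited can be sketched in a few lines: layer k of the order graph holds all size-k subsets of variables with the best cost found so far, and only the current layer is needed to build the next one. A toy sweep under those assumptions (hypothetical names and cost tables; it omits the paper's bounds and external-memory machinery):

```python
def frontier_sweep(variables, local_cost):
    """Layer-by-layer sweep of the order graph.  A state is the set
    of variables whose parents are already chosen; expanding a state
    appends one variable using its cheapest parent set drawn from
    the state.  Only the current layer is kept in memory."""
    layer = {frozenset(): 0}
    for _ in range(len(variables)):
        nxt = {}
        for done, g in layer.items():
            for v in set(variables) - done:
                c = g + min(cost for P, cost in local_cost[v].items()
                            if P <= done)
                succ = done | {v}
                if c < nxt.get(succ, float("inf")):
                    nxt[succ] = c
        layer = nxt  # previous layer can now be discarded
    return layer[frozenset(variables)]
```

Each `local_cost[v]` maps candidate parent sets (frozensets) to costs and must contain the empty set so every variable has a fallback choice.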
Learning Optimal Bayesian Networks: A Shortest Path Perspective
, 2013
Abstract

Cited by 15 (5 self)
In this paper, learning a Bayesian network structure that optimizes a scoring function for a given dataset is viewed as a shortest path problem in an implicit state-space search graph. This perspective highlights the importance of two research issues: the development of search strategies for solving the shortest path problem, and the design of heuristic functions for guiding the search. This paper introduces several techniques for addressing the issues. One is an A* search algorithm that learns an optimal Bayesian network structure by only searching the most promising part of the solution space. The others are mainly two heuristic functions. The first heuristic function represents a simple relaxation of the acyclicity constraint of a Bayesian network. Although admissible and consistent, the heuristic may introduce too much relaxation and result in a loose bound. The second heuristic function reduces the amount of relaxation by avoiding directed cycles within some groups of variables. Empirical results show that these methods constitute a promising approach to learning optimal Bayesian network structures.
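The shortest-path view is concrete enough to sketch: a search state is the set of variables whose parents are already chosen, an edge appends one more variable using its cheapest parent set drawn from the current state, and the simple heuristic lets every remaining variable pick its globally cheapest parents while ignoring acyclicity. A minimal A* sketch under those assumptions (toy cost tables and hypothetical names, not the authors' implementation):

```python
import heapq

def best_score(v, allowed, local_cost):
    """Cheapest parent set for v drawn only from `allowed`."""
    return min(c for P, c in local_cost[v].items() if P <= allowed)

def astar_structure(variables, local_cost):
    """A* over the order graph; the goal is the full variable set."""
    full = frozenset(variables)

    # Simple admissible heuristic: each remaining variable picks its
    # globally cheapest parent set, ignoring acyclicity.
    def h(done):
        return sum(min(local_cost[v].values())
                   for v in variables if v not in done)

    start = frozenset()
    open_list = [(h(start), 0, start)]   # (f, g, state)
    g_best = {start: 0}
    while open_list:
        f, g, done = heapq.heappop(open_list)
        if done == full:
            return g  # cost of an optimal structure
        if g > g_best.get(done, float("inf")):
            continue  # stale queue entry
        for v in full - done:
            g2 = g + best_score(v, done, local_cost)
            succ = done | {v}
            if g2 < g_best.get(succ, float("inf")):
                g_best[succ] = g2
                heapq.heappush(open_list, (g2 + h(succ), g2, succ))
```

Costs are negated scores, so lower is better and `min` over parent sets is the locally optimal choice.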
An improved admissible heuristic for learning optimal Bayesian networks
 IN PROCEEDINGS OF THE 28TH CONFERENCE ON UNCERTAINTY IN ARTIFICIAL INTELLIGENCE (UAI-12)
, 2012
Abstract

Cited by 9 (3 self)
Recently two search algorithms, A* and breadth-first branch and bound (BFBnB), were developed based on a simple admissible heuristic for learning Bayesian network structures that optimize a scoring function. The heuristic represents a relaxation of the learning problem such that each variable chooses optimal parents independently. As a result, the heuristic may contain many directed cycles and result in a loose bound. This paper introduces an improved admissible heuristic that tries to avoid directed cycles within small groups of variables. A sparse representation is also introduced to store only the unique optimal parent choices. Empirical results show that the new techniques significantly improved the efficiency and scalability of A* and BFBnB on most of the datasets tested in this paper.
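The sparse representation mentioned above can be illustrated independently of the search: a candidate parent set can never be the best allowed choice if one of its subsets scores at least as well, so dominated sets can be pruned up front. A small sketch of that pruning (hypothetical names; assumes lower cost is better):

```python
def prune_dominated(parent_costs):
    """Keep only parent sets that are not dominated: drop P when some
    proper subset of P already scores at least as well, since P can
    then never be the cheapest choice under any parent restriction."""
    kept = {}
    # Process smaller sets first so every potential dominator of P
    # is already in `kept` when P is examined.
    for P, c in sorted(parent_costs.items(), key=lambda pc: len(pc[0])):
        if all(not (Q <= P and kept[Q] <= c) for Q in kept):
            kept[P] = c
    return kept
```

On realistic score tables most candidate parent sets are dominated, which is what makes the sparse representation pay off.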
Properties of Bayesian Dirichlet scores to learn Bayesian network structures
 In Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI-10)
, 2010
Abstract

Cited by 9 (2 self)
This paper addresses exact learning of Bayesian network structure from data based on the Bayesian Dirichlet score function and its derivations. We describe useful properties that strongly reduce the computational costs of many known methods without losing global optimality guarantees. We show empirically the advantages of the properties in terms of time and memory consumption, demonstrating that state-of-the-art methods, with the use of such properties, might handle larger data sets than those currently possible.
Learning Bounded Treewidth Bayesian Networks using Integer Linear Programming
, 2014
Abstract

Cited by 9 (1 self)
In many applications one wants to compute conditional probabilities given a Bayesian network. This inference problem is NP-hard in general but becomes tractable when the network has low treewidth. Since the inference problem is common in many application areas, we provide a practical algorithm for learning bounded treewidth Bayesian networks. We cast this problem as an integer linear program (ILP). The program can be solved by an anytime algorithm which provides upper bounds to assess the quality of the found solutions. A key component of our program is a novel integer linear formulation for bounding treewidth of a graph. Our tests clearly indicate that our approach works in practice, as our implementation was able to find an optimal or nearly optimal network for most of the data sets.
A space–time tradeoff for permutation problems
 In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms (SODA)
, 2010
Abstract

Cited by 8 (4 self)
Many combinatorial problems—such as the traveling salesman, feedback arc set, cutwidth, and treewidth problems—can be formulated as finding a feasible permutation of n elements. Typically, such problems can be solved by dynamic programming in time and space O*(2^n), by divide and conquer in time O*(4^n) and polynomial space, or by a combination of the two in time O*(4^n · 2^{-s}) and space O*(2^s) for s = n, n/2, n/4, .... Here, we show that one can improve the tradeoff to time O*(T^n) and space O*(S^n) with TS < 4 for any √2 < S < 2. The idea is to find a small family of "thin" partial orders on the n elements such that every linear order is an extension of one member of the family. Our construction is optimal within a natural class of partial order families.
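The baseline O*(2^n) subset dynamic program referred to above is easy to sketch for one of the listed problems, minimum feedback arc set: f(S) is the cheapest cost of ordering the vertices of S first, and appending v after S \ {v} makes backward exactly the arcs into v from still-unplaced vertices. A toy version (hypothetical names; this is the standard baseline, not the paper's improved tradeoff):

```python
from itertools import combinations

def min_feedback_arcs(vertices, arcs):
    """f(S) = cheapest number of backward arcs over orderings that
    place exactly the vertices of S first.  Appending v makes every
    arc (u, v) with u still unplaced a backward arc, and each
    backward arc is counted once, at its head's placement."""
    f = {frozenset(): 0}
    for k in range(1, len(vertices) + 1):
        for S in map(frozenset, combinations(vertices, k)):
            f[S] = min(
                f[S - {v}] + sum(1 for (u, w) in arcs
                                 if w == v and u not in S)
                for v in S
            )
    return f[frozenset(vertices)]
```

Both the table and the iteration range over all 2^n subsets, which is exactly the space cost the paper's thin-partial-order construction trades against time.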
Algorithms and Complexity Results for Exact Bayesian Structure Learning
Abstract

Cited by 8 (3 self)
Bayesian structure learning is the NP-hard problem of discovering a Bayesian network that optimally represents a given set of training data. In this paper we study the computational worst-case complexity of exact Bayesian structure learning under graph-theoretic restrictions on the superstructure. The superstructure (a concept introduced by Perrier, Imoto, and Miyano, JMLR 2008) is an undirected graph that contains as subgraphs the skeletons of solution networks. Our results apply to several variants of score-based Bayesian structure learning where the score of a network decomposes into local scores of its nodes. Results: We show that exact Bayesian structure learning can be carried out in non-uniform polynomial time if the superstructure has bounded treewidth, and in linear time if in addition the superstructure has bounded maximum degree. We complement this with a number of hardness results. We show that both restrictions (treewidth and degree) are essential and cannot be dropped without losing uniform polynomial-time tractability (subject to a complexity-theoretic assumption). Furthermore, we show that the restrictions remain essential if we do not search for a globally optimal network but aim to improve a given network by means of at most k arc additions, arc deletions, or arc reversals (k-neighborhood local search).
Bayesian discovery of multiple Bayesian networks via transfer learning
 In IEEE International Conference on Data Mining
, 2013
Abstract

Cited by 3 (1 self)
Bayesian network structure learning algorithms with limited data are being used in domains such as systems biology and neuroscience to gain insight into the underlying processes that produce observed data. Learning reliable networks from limited data is difficult; therefore, transfer learning can improve the robustness of learned networks by leveraging data from related tasks. Existing transfer learning algorithms for Bayesian network structure learning give a single maximum a posteriori estimate of network models. Yet many other models may be equally likely, so a more informative result is provided by Bayesian structure discovery. Bayesian structure discovery algorithms estimate posterior probabilities of structural features, such as edges. We present transfer learning for Bayesian structure discovery which allows us to explore the shared and unique structural features among related tasks. Efficient computation requires that our transfer learning objective factors into local calculations, which we prove is the case for a broad class of transfer biases. Theoretically, we show the efficiency of our approach. Empirically, we show that compared to single-task learning, transfer learning is better able to positively identify true edges. We apply the method to whole-brain neuroimaging data.
Finding Optimal Bayesian Networks Using Precedence Constraints
, 2013
Abstract

Cited by 2 (0 self)
We consider the problem of finding a directed acyclic graph (DAG) that optimizes a decomposable Bayesian network score. While in a favorable case an optimal DAG can be found in polynomial time, in the worst case the fastest known algorithms rely on dynamic programming across the node subsets, taking time and space 2^n, to within a factor polynomial in the number of nodes n. In practice, these algorithms are feasible for networks of at most around 30 nodes, mainly due to the large space requirement. Here, we generalize the dynamic programming approach to enhance its feasibility in three dimensions: first, the user may trade space against time; second, the proposed algorithms easily and efficiently parallelize onto thousands of processors; third, the algorithms can exploit any prior knowledge about the precedence relation on the nodes. Underlying all these results is the key observation that, given a partial order P on the nodes, an optimal DAG compatible with P can be found in time and space roughly proportional to the number of ideals of P, which can be significantly less than 2^n. Considering sufficiently many carefully chosen partial orders guarantees that a globally optimal DAG will be found. Aside from the generic scheme, we present and analyze concrete tradeoff schemes based on parallel bucket orders.
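The key observation can be sketched directly: run the usual subset recurrence, but expand a set only when adding the next variable respects a given precedence relation, so the table ranges over the ideals of the partial order rather than all 2^n subsets. A toy sketch under those assumptions (hypothetical names and cost tables, not the paper's parallel bucket-order schemes):

```python
def optimal_cost_with_precedence(variables, local_cost, before):
    """Subset DP restricted to ideals of a precedence relation.
    `before` maps v to the set of variables that must precede v,
    so a set S is visited only if it is downward closed."""
    full = frozenset(variables)
    f = {frozenset(): 0}
    frontier = [frozenset()]
    while frontier:                      # one wave per layer size
        nxt = set()
        for S in frontier:
            for v in full - S:
                if not before.get(v, set()) <= S:
                    continue             # adding v violates precedence
                c = f[S] + min(cost for P, cost in local_cost[v].items()
                               if P <= S)
                T = S | {v}
                if c < f.get(T, float("inf")):
                    f[T] = c
                    nxt.add(T)
        frontier = list(nxt)
    return f[full]
```

With an empty `before` this visits all 2^n subsets; a chain of precedence constraints collapses the table toward linearly many ideals.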