Results 1-10 of 141
Factorie: Probabilistic programming via imperatively defined factor graphs
In Advances in Neural Information Processing Systems 22, 2009
Cited by 89 (16 self)
Discriminatively trained undirected graphical models have had wide empirical success, and there has been increasing interest in toolkits that ease their application to complex relational data. The power in relational models is in their repeated structure and tied parameters; at issue is how to define these structures in a powerful and flexible way. Rather than using a declarative language, such as SQL or first-order logic, we advocate using an imperative language to express various aspects of model structure, inference, and learning. By combining the traditional, declarative, statistical semantics of factor graphs with imperative definitions of their construction and operation, we allow the user to mix declarative and procedural domain knowledge, and also gain significant efficiencies. We have implemented such imperatively defined factor graphs in a system we call FACTORIE, a software library for an object-oriented, strongly-typed, functional language. In experimental comparisons to Markov Logic Networks on joint segmentation and coreference, we find our approach to be 3-15 times faster while reducing error by 20-25%, achieving a new state of the art.
Bayesian Fundamentalism or Enlightenment? On the explanatory status and theoretical contributions of Bayesian models of cognition
Behavioral and Brain Sciences, 2011
Cited by 43 (1 self)
To be published in Behavioral and Brain Sciences (in press)
Probabilistically accurate program transformations
In SAS, 2011
Cited by 38 (14 self)
The standard approach to program transformation involves the use of discrete logical reasoning to prove that the transformation does not change the observable semantics of the program. We propose a new approach that, in contrast, uses probabilistic reasoning to justify the application of transformations that may change, within probabilistic accuracy bounds, the result that the program produces. Our new approach produces probabilistic guarantees of the form P(|D| ≥ B) ≤ ε, ε ∈ (0, 1), where D is the difference between the results that the transformed and original programs produce, B is an acceptability bound on the absolute value of D, and ε is the maximum acceptable probability of observing large |D|. We show how to use our approach to justify the application of loop perforation (which transforms loops to execute fewer iterations) to a set of computational patterns.
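To make the guarantee in the abstract above concrete, here is a minimal sketch of loop perforation applied to a mean computation, with an empirical estimate of P(|D| ≥ B). This is an illustration only: the paper derives its bounds by probabilistic reasoning about the program, whereas this sketch simply simulates. The function names, the Gaussian input model, and the stride parameter are all assumptions made for the example.

```python
import random


def mean_full(values):
    """Original computation: visit every element."""
    return sum(values) / len(values)


def mean_perforated(values, stride=2):
    """Perforated computation: visit only every `stride`-th element."""
    sampled = values[::stride]
    return sum(sampled) / len(sampled)


def estimate_violation_probability(bound, trials=2000, n=1000, stride=2, seed=0):
    """Monte Carlo estimate of P(|D| >= bound), where D is the difference
    between the perforated and the original result.

    Assumption: inputs are i.i.d. standard normal draws. This is a
    simulation, not the static analysis described in the paper.
    """
    rng = random.Random(seed)
    violations = 0
    for _ in range(trials):
        values = [rng.gauss(0.0, 1.0) for _ in range(n)]
        d = mean_perforated(values, stride) - mean_full(values)
        if abs(d) >= bound:
            violations += 1
    return violations / trials
```

If the estimate for a chosen B stays below the chosen ε under the assumed input model, the perforated loop would be acceptable in the sense of the abstract; under a different input distribution the estimate may differ.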
Embedded probabilistic programming
In Working conf. on domain specific lang, 2009
Cited by 34 (3 self)
Two general techniques for implementing a domain-specific language (DSL) with less overhead are the finally-tagless embedding of object programs and the direct-style representation of side effects. We use these techniques to build a DSL for probabilistic programming, for expressing countable probabilistic models and performing exact inference and importance sampling on them. Our language is embedded as an ordinary OCaml library and represents probability distributions as ordinary OCaml programs. We use delimited continuations to reify probabilistic programs as lazy search trees, which inference algorithms may traverse without imposing any interpretive overhead on deterministic parts of a model. We thus take advantage of the existing OCaml implementation to achieve competitive performance and ease of use. Inference algorithms can easily be embedded in probabilistic programs themselves.
A Stochastic Memoizer for Sequence Data
Cited by 25 (7 self)
We propose an unbounded-depth, hierarchical, Bayesian nonparametric model for discrete sequence data. This model can be estimated from a single training sequence, yet shares statistical strength between subsequent symbol predictive distributions in such a way that predictive performance generalizes well. The model builds on a specific parameterization of an unbounded-depth hierarchical Pitman-Yor process. We introduce analytic marginalization steps (using coagulation operators) to reduce this model to one that can be represented in time and space linear in the length of the training sequence. We show how to perform inference in such a model without truncation approximation and introduce fragmentation operators necessary to do predictive inference. We demonstrate the sequence memoizer by using it as a language model, achieving state-of-the-art results.
Learning Programs: A Hierarchical Bayesian Approach
Cited by 20 (1 self)
We are interested in learning programs for multiple related tasks given only a few training examples per task. Since the program for a single task is underdetermined by its data, we introduce a nonparametric hierarchical Bayesian prior over programs which shares statistical strength across multiple tasks. The key challenge is to parametrize this multi-task sharing. For this, we introduce a new representation of programs based on combinatory logic and provide an MCMC algorithm that can perform safe program transformations on this representation to reveal shared inter-program substructures.
Productivity and Reuse in Language
2011
Cited by 17 (4 self)
We present a Bayesian model of the mirror image problems of linguistic productivity and reuse. The model, known as Fragment Grammar, is evaluated against several morphological datasets; its performance is compared to competing theoretical accounts including full-parsing, full-listing, and exemplar-based models. The model is able to learn the correct patterns of productivity and reuse for two very different systems: the English past tense, which is characterized by a sharp dichotomy in productivity between regular and irregular forms, and English derivational morphology, which is characterized by a graded cline from very productive (-ness) to very unproductive (-th).
Keywords: Productivity; Reuse; Storage; Computation; Bayesian Model; Past Tense; Derivational Morphology
Static Analysis for Probabilistic Programs: Inferring Whole Program Properties from Finitely Many Paths.
Cited by 16 (1 self)
We propose an approach for the static analysis of probabilistic programs that sense, manipulate, and control based on uncertain data. Examples include programs used in risk analysis, medical decision making and cyber-physical systems. Correctness properties of such programs take the form of queries that seek the probabilities of assertions over program variables. We present a static analysis approach that provides guaranteed interval bounds on the values (assertion probabilities) of such queries. First, we observe that for probabilistic programs, it is possible to conclude facts about the behavior of the entire program by choosing a finite, adequate set of its paths. We provide strategies for choosing such a set of paths and verifying its adequacy. The queries are evaluated over each path by a combination of symbolic execution and probabilistic volume-bound computations. Each path yields interval bounds that can be summed up with a “coverage” bound to yield an interval that encloses the probability of assertion for the program as a whole. We demonstrate promising results on a suite of benchmarks from many different sources including robotic manipulators and medical decision making programs.
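The step in the abstract above where per-path intervals are summed with a coverage bound can be sketched as follows. This is a simplified reading, not the paper's algorithm: it assumes each analyzed path already comes with a lower bound on the probability of taking it and an interval on the joint probability of taking it and satisfying the assertion. The `PathBound` type, its field names, and `combine` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PathBound:
    # Lower bound on the probability that execution follows this path.
    path_prob_lo: float
    # Interval on P(path is taken AND assertion holds).
    assert_lo: float
    assert_hi: float


def combine(paths):
    """Combine per-path intervals into an interval for the whole program.

    The unanalyzed paths carry at most (1 - coverage) probability mass.
    In the worst case the assertion fails on all of them (contributing
    nothing to the lower bound) and in the best case it holds on all of
    them (adding the full uncovered mass to the upper bound).
    """
    coverage_lo = sum(p.path_prob_lo for p in paths)
    lo = sum(p.assert_lo for p in paths)
    hi = sum(p.assert_hi for p in paths) + (1.0 - coverage_lo)
    # Clamp to the valid probability range.
    return max(0.0, lo), min(1.0, hi)
```

Analyzing more paths raises the coverage and tightens the upper end of the interval, which matches the abstract's point about choosing an adequate finite set of paths.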
Measure Transformer Semantics for Bayesian Machine Learning
Cited by 16 (4 self)
The Bayesian approach to machine learning amounts to inferring posterior distributions of random variables from a probabilistic model of how the variables are related (that is, a prior distribution) and a set of observations of variables. There is a trend in machine learning towards expressing Bayesian models as probabilistic programs. As a foundation for this kind of programming, we propose a core functional calculus with primitives for sampling prior distributions and observing variables. We define combinators for measure transformers, based on theorems in measure theory, and use these to give a rigorous semantics to our core calculus. The original features of our semantics include its support for discrete, continuous, and hybrid measures, and, in particular, for observations of zero-probability events. We compile our core language to a small imperative language that has a straightforward semantics via factor graphs, data structures that enable many efficient inference algorithms. We use an existing inference engine for efficient approximate inference of posterior marginal distributions, treating thousands of observations per second for large instances of realistic models.
Samplerank: Learning preference from atomic gradients
In NIPS WS on Advances in Ranking, 2009
Cited by 16 (5 self)
Large templated factor graphs with complex structure that changes during inference have been shown to provide state-of-the-art experimental results on tasks such as identity uncertainty and information integration. However, learning parameters in these models is difficult because computing the gradients requires expensive inference routines. In this paper we propose an online algorithm that instead learns preferences over hypotheses from the gradients between the atomic steps of inference. Although there are a combinatorial number of ranking constraints over the entire hypothesis space, a connection to the frameworks of sampled convex programs reveals a polynomial bound on the number of rankings that need to be satisfied in practice. We further apply ideas of passive-aggressive algorithms to our update rules, enabling us to extend recent work in confidence-weighted classification to structured prediction problems. We compare our algorithm to structured perceptron, contrastive divergence, and persistent contrastive divergence, demonstrating substantial error reductions on two real-world problems (20% over contrastive divergence).