Results 1–10 of 640
Markov Logic Networks
Machine Learning, 2006
Cited by 811 (39 self)
We propose a simple approach to combining first-order logic and probabilistic graphical models in a single representation. A Markov logic network (MLN) is a first-order knowledge base with a weight attached to each formula (or clause). Together with a set of constants representing objects in the domain, it specifies a ground Markov network containing one feature for each possible grounding of a first-order formula in the KB, with the corresponding weight. Inference in MLNs is performed by MCMC over the minimal subset of the ground network required for answering the query. Weights are efficiently learned from relational databases by iteratively optimizing a pseudo-likelihood measure. Optionally, additional clauses are learned using inductive logic programming techniques. Experiments with a real-world database and knowledge base in a university domain illustrate the promise of this approach.
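The grounding semantics described in the abstract can be illustrated with a tiny sketch (our own toy example, not the authors' code): one weighted formula, Smokes(x) => Cancer(x), grounded over two constants. A world's probability is proportional to the exponential of the weighted count of satisfied groundings.

```python
import itertools
import math

# Toy MLN sketch (our own illustration): a single weighted formula,
# Smokes(x) => Cancer(x), grounded over two constants.
constants = ["Anna", "Bob"]
w = 1.5  # weight attached to the formula

def n_satisfied(world):
    """Number of groundings of Smokes(x) => Cancer(x) satisfied in a world."""
    return sum(1 for c in constants
               if (not world[("Smokes", c)]) or world[("Cancer", c)])

def unnormalized(world):
    # P(world) is proportional to exp(sum_i w_i * n_i(world)); one formula here.
    return math.exp(w * n_satisfied(world))

# Enumerate all worlds over the four ground atoms to get the partition function
# (feasible only for toy domains; the paper instead runs MCMC over the minimal
# relevant subnetwork).
atoms = [(p, c) for p in ("Smokes", "Cancer") for c in constants]
worlds = [dict(zip(atoms, vals))
          for vals in itertools.product([False, True], repeat=len(atoms))]
Z = sum(unnormalized(wld) for wld in worlds)

world = {("Smokes", "Anna"): True, ("Cancer", "Anna"): True,
         ("Smokes", "Bob"): False, ("Cancer", "Bob"): False}
p = unnormalized(world) / Z
```

Worlds that satisfy more groundings of heavier formulas receive exponentially more mass; in the limit of an infinite weight, a formula becomes a hard constraint.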
Large margin methods for structured and interdependent output variables
Journal of Machine Learning Research, 2005
Cited by 612 (12 self)
Learning general functional dependencies between arbitrary input and output spaces is one of the key challenges in computational intelligence. While recent progress in machine learning has mainly focused on designing flexible and powerful input representations, this paper addresses the complementary issue of designing classification algorithms that can deal with more complex outputs, such as trees, sequences, or sets. More generally, we consider problems involving multiple dependent output variables, structured output spaces, and classification problems with class attributes. In order to accomplish this, we propose to appropriately generalize the well-known notion of a separation margin and derive a corresponding maximum-margin formulation. While this leads to a quadratic program with a potentially prohibitive, i.e., exponential, number of constraints, we present a cutting plane algorithm that solves the optimization problem in polynomial time for a large class of problems. The proposed method has important applications in areas such as computational biology, natural language processing, information retrieval/extraction, and optical character recognition. Experiments from various domains involving different types of output spaces emphasize the breadth and generality of our approach.
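The central ingredient of the cutting plane approach can be sketched on a toy multiclass problem: a separation oracle finds the most violated margin constraint, argmax_y [Δ(y_i, y) + w·ψ(x_i, y)], for a joint feature map ψ. For brevity, a perceptron-style subgradient step replaces re-solving the quadratic program over the working set; that substitution is ours, not the paper's.

```python
# Hedged sketch: separation oracle plus a simplified update, not the full QP.

def psi(x, y, n_classes):
    """Joint feature map: x copied into the block for class y."""
    f = [0.0] * (n_classes * len(x))
    f[y * len(x):(y + 1) * len(x)] = x
    return f

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def most_violated(w, x, y_true, n_classes):
    """Separation oracle: argmax_y [loss(y_true, y) + w . psi(x, y)]."""
    scores = [(0.0 if y == y_true else 1.0) + dot(w, psi(x, y, n_classes))
              for y in range(n_classes)]
    return scores.index(max(scores))

def train(X, Y, n_classes, epochs=500, lr=0.1):
    w = [0.0] * (n_classes * len(X[0]))
    for _ in range(epochs):
        for x, y in zip(X, Y):
            ybar = most_violated(w, x, y, n_classes)
            if ybar != y:  # a margin constraint is violated: take a step
                g, b = psi(x, y, n_classes), psi(x, ybar, n_classes)
                w = [wi + lr * (gi - bi) for wi, gi, bi in zip(w, g, b)]
    return w

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Y = [0, 1, 2]
w = train(X, Y, n_classes=3)
pred = [max(range(3), key=lambda y: dot(w, psi(x, y, 3))) for x in X]
```

The key property the paper exploits is that only polynomially many such violated constraints need to be added before all remaining constraints are satisfied up to a tolerance.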
Support vector machine learning for interdependent and structured output spaces
In ICML, 2004
Cited by 444 (20 self)
Learning general functional dependencies is one of the main goals in machine learning. Recent progress in kernel-based methods has focused on designing flexible and powerful input representations. This paper addresses the complementary issue of problems involving complex outputs such as multiple dependent output variables and structured output spaces. We propose to generalize multiclass Support Vector Machine learning in a formulation that involves features extracted jointly from inputs and outputs. The resulting optimization problem is solved efficiently by a cutting plane algorithm that exploits the sparseness and structural decomposition of the problem. We demonstrate the versatility and effectiveness of our method on problems ranging from supervised grammar learning and named-entity recognition, to taxonomic text classification and sequence alignment.
New Ranking Algorithms for Parsing and Tagging: Kernels over Discrete Structures, and the Voted Perceptron
2002
Cited by 272 (6 self)
This paper introduces new learning algorithms for natural language processing based on the perceptron algorithm. We show how the algorithms can be efficiently applied to exponential-sized representations of parse trees, such as the "all subtrees" (DOP) representation described by (Bod 98), or a representation tracking all subfragments of a tagged sentence. We give experimental results showing significant improvements on two tasks: parsing Wall Street Journal text, and named-entity extraction from web data.
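The reranking setup behind these algorithms can be sketched as follows (a toy illustration of ours, using averaging as a common stand-in for the voted variant): each sentence has a list of candidate analyses represented by sparse feature counts, standing in for subtree counts, and the perceptron learns to score the gold candidate highest.

```python
from collections import Counter

# Toy averaged-perceptron reranker (our own sketch, not the paper's code).

def score(w, feats):
    return sum(w.get(f, 0.0) * v for f, v in feats.items())

def train(examples, epochs=5):
    """examples: list of (candidates, gold_index); candidates are Counters
    of feature occurrences (standing in for subtree counts)."""
    w, w_sum, t = Counter(), Counter(), 0
    for _ in range(epochs):
        for cands, gold in examples:
            pred = max(range(len(cands)), key=lambda i: score(w, cands[i]))
            if pred != gold:  # standard perceptron update on feature counts
                for f, v in cands[gold].items():
                    w[f] += v
                for f, v in cands[pred].items():
                    w[f] -= v
            for f, v in w.items():  # accumulate for the averaged weights
                w_sum[f] += v
            t += 1
    return {f: v / t for f, v in w_sum.items()}

# Hypothetical rule-occurrence features for two sentences, two candidates each.
examples = [
    ([Counter({"NP->DT NN": 1}), Counter({"NP->NN": 1})], 0),
    ([Counter({"NP->NN": 1}), Counter({"NP->DT NN": 1})], 1),
]
w_avg = train(examples)
```

Because the update only touches features present in the gold and predicted candidates, the same loop scales to exponentially large feature spaces whenever inner products can be computed implicitly via a kernel, which is the paper's point.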
SemiMarkov conditional random fields for information extraction
In Advances in Neural Information Processing Systems 17, 2004
Cited by 248 (10 self)
We describe semi-Markov conditional random fields (semi-CRFs), a conditionally trained version of semi-Markov chains. Intuitively, a semi-CRF on an input sequence x outputs a "segmentation" of x, in which labels are assigned to segments (i.e., subsequences) of x rather than to individual elements x_i of x. Importantly, features for semi-CRFs can measure properties of segments, and transitions within a segment can be non-Markovian. In spite of this additional power, exact learning and inference algorithms for semi-CRFs are polynomial-time, often only a small constant factor slower than conventional CRFs. In experiments on five named entity recognition problems, semi-CRFs generally outperform conventional CRFs.
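The segment-level decoding the abstract alludes to can be sketched as a dynamic program (our own toy illustration; the scoring function and maximum segment length are assumptions): best[j] holds the best score of any segmentation of x[0:j], and each step considers every label for every segment of length up to max_len ending at position j, so segment-level features are free to inspect the whole segment.

```python
# Hedged sketch of semi-Markov Viterbi over segmentations (not the authors' code).

def best_segmentation(x, labels, seg_score, max_len):
    n = len(x)
    best = [(-float("inf"), None)] * (n + 1)
    best[0] = (0.0, None)
    for j in range(1, n + 1):
        for d in range(1, min(max_len, j) + 1):  # segment length d ending at j
            i = j - d
            for y in labels:
                s = best[i][0] + seg_score(x, i, j, y)
                if s > best[j][0]:
                    best[j] = (s, (i, y))
    segs, j = [], n  # backtrack the argmax segmentation
    while j > 0:
        i, y = best[j][1]
        segs.append((i, j, y))
        j = i
    return list(reversed(segs))

# Toy hand-written scorer (an assumption for illustration): reward labeling
# maximal capitalized runs as NAME, everything else as OTHER.
def seg_score(x, i, j, y):
    caps = all(tok[0].isupper() for tok in x[i:j])
    if y == "NAME":
        return (j - i) ** 2 if caps else -(j - i)
    return -(j - i) if caps else (j - i)

x = ["John", "Smith", "lives", "in", "New", "York"]
segs = best_segmentation(x, ["NAME", "OTHER"], seg_score, max_len=3)
```

The double loop over end position and segment length is the source of the "small constant factor" over token-level Viterbi: runtime grows by max_len, not exponentially.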
Hidden Markov Support Vector Machines
2003
Cited by 238 (9 self)
This paper presents a novel discriminative learning technique for label sequences based on a combination of the two most successful learning algorithms, Support Vector Machines and Hidden Markov Models, which we call the Hidden Markov Support Vector Machine.
Efficiently Inducing Features of Conditional Random Fields
2003
Cited by 227 (12 self)
Conditional Random Fields (CRFs) are undirected graphical models, a special case of which corresponds to conditionally-trained finite state machines. A key advantage of CRFs is their great flexibility to include a wide variety of arbitrary, non-independent features of the input. Faced with ...
Learning structured prediction models: a large margin approach
2004
Cited by 225 (8 self)
We consider large margin estimation in a broad range of prediction models where inference involves solving combinatorial optimization problems, for example, weighted graph cuts or matchings. Our goal is to learn parameters such that inference using the model reproduces correct answers on the training data. Our method relies on the expressive power of convex optimization problems to compactly capture inference or solution optimality in structured prediction models. Directly embedding this structure within the learning formulation produces concise convex problems for efficient estimation of very complex and diverse models. We describe experimental results on a matching task, disulfide connectivity prediction, showing significant improvements over state-of-the-art methods.
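The learning goal stated in the abstract can be sketched on a toy bipartite matching task: learn weights w so that the model's highest-scoring matching reproduces the gold matching. The paper casts inference as convex or combinatorial optimization; here brute-force enumeration over permutations stands in for the solver, and a perceptron-style step on the loss-augmented argmax replaces the margin QP. Both simplifications are ours.

```python
import itertools

# Hedged sketch: large-margin learning for matchings, heavily simplified.

def match_score(w, feats, perm):
    """Score of a matching: sum of w . f(i, j) over matched pairs (i, j)."""
    return sum(sum(wk * fk for wk, fk in zip(w, feats[i][j]))
               for i, j in enumerate(perm))

def loss_augmented_argmax(w, feats, gold):
    """Most violated matching: model score plus Hamming loss to the gold."""
    best, best_s = None, -float("inf")
    for perm in itertools.permutations(range(len(feats))):
        s = match_score(w, feats, perm) + sum(p != g for p, g in zip(perm, gold))
        if s > best_s:
            best, best_s = perm, s
    return best

def train(data, n_feats, epochs=20, lr=0.5):
    w = [0.0] * n_feats
    for _ in range(epochs):
        for feats, gold in data:
            pred = loss_augmented_argmax(w, feats, gold)
            if pred != tuple(gold):  # push gold up, violator down
                for i in range(len(gold)):
                    for k in range(n_feats):
                        w[k] += lr * (feats[i][gold[i]][k] - feats[i][pred[i]][k])
    return w

# Hypothetical data: feats[i][j] holds one similarity feature for pairing
# item i with slot j; the gold matching pairs each item with its best slot.
feats = [[[0.9], [0.1], [0.0]],
         [[0.2], [0.8], [0.1]],
         [[0.0], [0.3], [0.7]]]
gold = [0, 1, 2]
w = train([(feats, gold)], n_feats=1)
pred = max(itertools.permutations(range(3)),
           key=lambda p: match_score(w, feats, p))
```

In the paper, the enumeration step is replaced by the problem's native solver (min-cut, matching), embedded compactly in a convex program rather than called inside a loop.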
Conditional models of identity uncertainty with application to noun coreference
In NIPS 17, Lawrence K. Saul, Yair Weiss, and Léon Bottou, Eds., 2005
Cited by 145 (16 self)
Coreference analysis, also known as record linkage or identity uncertainty, is a difficult and important problem in natural language processing, databases, citation matching, and many other tasks. This paper introduces several discriminative, conditional-probability models for coreference analysis, all examples of undirected graphical models. Unlike many historical approaches to coreference, the models presented here are relational: they do not assume that pairwise coreference decisions should be made independently from each other. Unlike other relational models of coreference that are generative, the conditional model here can incorporate a great variety of features of the input without having to be concerned about their dependencies, paralleling the advantages of conditional random fields over hidden Markov models. We present positive results on noun phrase coreference in two standard text data sets.
Discriminative fields for modeling spatial dependencies in natural images
In NIPS, 2003
Cited by 142 (4 self)
In this paper we present Discriminative Random Fields (DRFs), a discriminative framework for the classification of natural image regions that incorporates neighborhood spatial dependencies in the labels as well as the observed data. The proposed model exploits local discriminative models and relaxes the assumption of conditional independence of the observed data given the labels, commonly used in the Markov Random Field (MRF) framework. The parameters of the DRF model are learned using a penalized maximum pseudo-likelihood method. Furthermore, the form of the DRF model allows MAP inference for binary classification problems using graph min-cut algorithms. The performance of the model was verified on synthetic as well as real-world images, where the DRF model outperforms the MRF model.
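The pseudo-likelihood factor used in training can be sketched for the binary case (our own toy illustration; the field layout, feature, and parameter names are assumptions): each site's label conditional depends only on its own features through an association term and on its neighbors' labels through an interaction term.

```python
import math

# Hedged sketch of a local conditional P(y_i | y_N(i), x) for a binary DRF.
# Labels are +/-1; w parameterizes the data-dependent association potential,
# nu the pairwise interaction. Pseudo-likelihood training maximizes the
# product of these local conditionals over all sites.

def local_conditional(yi, i, y, x, neighbors, w, nu):
    def energy(lab):
        assoc = lab * (w[0] + w[1] * x[i])                 # data term
        inter = sum(nu * lab * y[j] for j in neighbors[i])  # label coupling
        return assoc + inter
    return math.exp(energy(yi)) / (math.exp(energy(1)) + math.exp(energy(-1)))

# 1x3 "image": the middle pixel's conditional given its two neighbors.
x = [0.2, -0.1, 0.8]
y = [1, 1, 1]
neighbors = {1: [0, 2]}
p = local_conditional(1, 1, y, x, neighbors, w=(0.0, 1.0), nu=0.5)
```

With both neighbors labeled +1 and nu > 0, the conditional for +1 exceeds one half even though the site's own feature is slightly negative, which is exactly the smoothing effect the interaction potential contributes.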