Results 1–10 of 413
Markov Logic Networks
Machine Learning, 2006
"... We propose a simple approach to combining firstorder logic and probabilistic graphical models in a single representation. A Markov logic network (MLN) is a firstorder knowledge base with a weight attached to each formula (or clause). Together with a set of constants representing objects in the ..."
Abstract

Cited by 816 (39 self)
 Add to MetaCart
(Show Context)
We propose a simple approach to combining first-order logic and probabilistic graphical models in a single representation. A Markov logic network (MLN) is a first-order knowledge base with a weight attached to each formula (or clause). Together with a set of constants representing objects in the domain, it specifies a ground Markov network containing one feature for each possible grounding of a first-order formula in the KB, with the corresponding weight. Inference in MLNs is performed by MCMC over the minimal subset of the ground network required for answering the query. Weights are efficiently learned from relational databases by iteratively optimizing a pseudo-likelihood measure. Optionally, additional clauses are learned using inductive logic programming techniques. Experiments with a real-world database and knowledge base in a university domain illustrate the promise of this approach.
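To make the weighted-formula semantics concrete, here is a minimal sketch (an invented toy, not the authors' implementation): a single weighted clause, Smokes(x) ∧ Friends(x, y) ⇒ Smokes(y), is ground over two constants, and the unnormalized log-probability of a world is the weight times the number of true groundings.

```python
import itertools

# Toy MLN with one weighted clause (invented example, not from the paper):
#   w = 1.5 :  Smokes(x) & Friends(x, y) => Smokes(y)
constants = ["Anna", "Bob"]
weight = 1.5

def grounding_true(world, x, y):
    """Truth value of one grounding of the clause."""
    return (not (world["Smokes", x] and world["Friends", x, y])) \
        or world["Smokes", y]

def log_score(world):
    """Unnormalized log-probability: weight times the count of true groundings."""
    n_true = sum(grounding_true(world, x, y)
                 for x, y in itertools.product(constants, repeat=2))
    return weight * n_true

# One possible world: Anna smokes and is friends with Bob; Bob does not smoke.
world = {
    ("Smokes", "Anna"): True, ("Smokes", "Bob"): False,
    ("Friends", "Anna", "Bob"): True, ("Friends", "Bob", "Anna"): False,
    ("Friends", "Anna", "Anna"): False, ("Friends", "Bob", "Bob"): False,
}
print(log_score(world))  # 4.5: three of the four groundings hold
```
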
Max-margin Markov networks
2003
"... In typical classification tasks, we seek a function which assigns a label to a single object. Kernelbased approaches, such as support vector machines (SVMs), which maximize the margin of confidence of the classifier, are the method of choice for many such tasks. Their popularity stems both from the ..."
Abstract

Cited by 604 (15 self)
 Add to MetaCart
In typical classification tasks, we seek a function which assigns a label to a single object. Kernel-based approaches, such as support vector machines (SVMs), which maximize the margin of confidence of the classifier, are the method of choice for many such tasks. Their popularity stems both from the ability to use high-dimensional feature spaces, and from their strong theoretical guarantees. However, many real-world tasks involve sequential, spatial, or structured data, where multiple labels must be assigned. Existing kernel-based methods ignore structure in the problem, assigning labels independently to each object, losing much useful information. Conversely, probabilistic graphical models, such as Markov networks, can represent correlations between labels, by exploiting problem structure, but cannot handle high-dimensional feature spaces, and lack strong theoretical generalization guarantees. In this paper, we present a new framework that combines the advantages of both approaches: Maximum margin Markov (M3) networks incorporate both kernels, which efficiently deal with high-dimensional features, and the ability to capture correlations in structured data. We present an efficient algorithm for learning M3 networks based on a compact quadratic program formulation. We provide a new theoretical bound for generalization in structured domains. Experiments on the task of handwritten character recognition and collective hypertext classification demonstrate very significant gains over previous approaches.
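The margin being maximized has the familiar structured hinge-loss form; the sketch below (toy numbers, brute-force loss-augmented decoding in place of the paper's compact QP and polynomial-size formulation) computes that loss for a three-position binary chain.

```python
import itertools
import numpy as np

# Structured hinge loss for a length-3 chain with binary labels
# (invented scores; enumeration replaces the paper's QP formulation).
node_scores = np.array([[0.2, 1.0],   # position t: score of label 0 / label 1
                        [0.7, 0.1],
                        [0.3, 0.9]])
edge_scores = np.array([[0.5, 0.0],   # score of transition label i -> label j
                        [0.0, 0.5]])

def score(y):
    return (sum(node_scores[t, y[t]] for t in range(len(y)))
            + sum(edge_scores[y[t], y[t + 1]] for t in range(len(y) - 1)))

def hamming(y, y_true):
    return sum(a != b for a, b in zip(y, y_true))

y_true = (1, 0, 1)
# Loss-augmented decoding: find the labeling that most violates the margin.
# Brute force is only feasible for toys; M3Ns exploit the chain structure.
worst = max(itertools.product([0, 1], repeat=3),
            key=lambda y: score(y) + hamming(y, y_true))
hinge = max(0.0, score(worst) + hamming(worst, y_true) - score(y_true))
print(worst, hinge)  # with these numbers: (0, 0, 0) and about 1.6
```
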
Shallow Parsing with Conditional Random Fields
2003
"... Conditional random fields for sequence labeling offer advantages over both generative models like HMMs and classifiers applied at each sequence position. Among sequence labeling tasks in language processing, shallow parsing has received much attention, with the development of standard evaluati ..."
Abstract

Cited by 581 (8 self)
 Add to MetaCart
Conditional random fields for sequence labeling offer advantages over both generative models like HMMs and classifiers applied at each sequence position. Among sequence labeling tasks in language processing, shallow parsing has received much attention, with the development of standard evaluation datasets and extensive comparison among methods. We show here how to train a conditional random field to achieve performance as good as any reported base noun-phrase chunking method on the CoNLL task, and better than any reported single model. Improved training methods based on modern optimization algorithms were critical in achieving these results. We present extensive comparisons between models and training methods that confirm and strengthen previous results on shallow parsing and training methods for maximum-entropy models.
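The decoding step such a chunker relies on is standard Viterbi over a linear chain; the following sketch is generic (the log-potentials are invented, not the paper's trained chunking weights).

```python
import numpy as np

# Viterbi decoding for a linear-chain model (generic sketch; potentials
# here are made up, standing in for learned CRF feature weights).
def viterbi(node_pot, trans_pot):
    """node_pot: (T, K) log-potentials per position; trans_pot: (K, K)."""
    T, K = node_pot.shape
    delta = np.zeros((T, K))
    back = np.zeros((T, K), dtype=int)
    delta[0] = node_pot[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + trans_pot + node_pot[t][None, :]
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0)
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Three positions, labels {O, B-NP, I-NP}, invented log-potentials.
node_pot = np.log(np.array([[0.2, 0.7, 0.1],
                            [0.3, 0.2, 0.5],
                            [0.6, 0.3, 0.1]]))
trans_pot = np.log(np.array([[0.6, 0.3, 0.1],
                             [0.2, 0.2, 0.6],
                             [0.3, 0.3, 0.4]]))
print(viterbi(node_pot, trans_pot))  # [1, 2, 0], i.e. B-NP I-NP O
```
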
Learning and inferring transportation routines
2004
"... This paper introduces a hierarchical Markov model that can learn and infer a user’s daily movements through the community. The model uses multiple levels of abstraction in order to bridge the gap between raw GPS sensor measurements and high level information such as a user’s mode of transportation ..."
Abstract

Cited by 312 (22 self)
 Add to MetaCart
This paper introduces a hierarchical Markov model that can learn and infer a user's daily movements through the community. The model uses multiple levels of abstraction in order to bridge the gap between raw GPS sensor measurements and high-level information such as a user's mode of transportation or her goal. We apply Rao-Blackwellised particle filters for efficient inference both at the low level and at the higher levels of the hierarchy. Significant locations, such as goals or locations where the user frequently changes mode of transportation, are learned from GPS data logs without requiring any manual labeling. We show how to detect abnormal behaviors (e.g., taking a wrong bus) by concurrently tracking the user's activities with a trained and a prior model. Experiments show that our model is able to accurately predict the goals of a person and to recognize situations in which the user performs unknown activities.
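At the lowest level, the idea of tracking location from noisy GPS-like readings with particle filters can be illustrated by a bare-bones bootstrap filter (a 1-D toy with invented noise parameters; the paper's Rao-Blackwellised, multi-level hierarchical filter is considerably more involved).

```python
import numpy as np

rng = np.random.default_rng(0)

# Bootstrap particle filter for a 1-D position observed through noisy
# sensor readings (predict -> weight -> resample at each time step).
def particle_filter(observations, n_particles=500, motion_sd=1.0, obs_sd=2.0):
    particles = rng.normal(0.0, 5.0, n_particles)  # initial belief
    estimates = []
    for z in observations:
        particles += rng.normal(0.0, motion_sd, n_particles)      # predict
        weights = np.exp(-0.5 * ((z - particles) / obs_sd) ** 2)  # weight
        weights /= weights.sum()
        idx = rng.choice(n_particles, n_particles, p=weights)     # resample
        particles = particles[idx]
        estimates.append(particles.mean())
    return estimates

true_path = np.cumsum(rng.normal(0.5, 1.0, 20))      # simulated movement
readings = true_path + rng.normal(0.0, 2.0, 20)      # noisy sensor
print(particle_filter(readings)[-1], true_path[-1])  # estimate vs. truth
```
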
Semi-Markov conditional random fields for information extraction
In Advances in Neural Information Processing Systems 17, 2004
"... We describe semiMarkov conditional random fields (semiCRFs), a conditionally trained version of semiMarkov chains. Intuitively, a semiCRF on an input sequence x outputs a “segmentation ” of x, in which labels are assigned to segments (i.e., subsequences) of x rather than to individual elements x ..."
Abstract

Cited by 254 (10 self)
 Add to MetaCart
(Show Context)
We describe semi-Markov conditional random fields (semi-CRFs), a conditionally trained version of semi-Markov chains. Intuitively, a semi-CRF on an input sequence x outputs a “segmentation” of x, in which labels are assigned to segments (i.e., subsequences) of x rather than to individual elements xi of x. Importantly, features for semi-CRFs can measure properties of segments, and transitions within a segment can be non-Markovian. In spite of this additional power, exact learning and inference algorithms for semi-CRFs are polynomial-time, often only a small constant factor slower than conventional CRFs. In experiments on five named entity recognition problems, semi-CRFs generally outperform conventional CRFs.
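The key computational object is a Viterbi-style dynamic program over segment boundaries rather than individual positions; the sketch below shows that recursion, with an invented segment scorer standing in for learned semi-CRF features.

```python
# Segment-level Viterbi for a semi-Markov model (sketch; the scorer is
# invented, standing in for learned semi-CRF segment features).
def best_segmentation(x, labels, seg_score, max_len=4):
    """Return the highest-scoring segmentation of x into labeled segments.

    seg_score(x, i, j, y) scores labeling x[i:j] as one segment of class y;
    because it sees the whole segment, it may be non-Markovian inside it.
    """
    n = len(x)
    best = [(-float("inf"), None)] * (n + 1)
    best[0] = (0.0, None)
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            for y in labels:
                s = best[i][0] + seg_score(x, i, j, y)
                if s > best[j][0]:
                    best[j] = (s, (i, y))
    segs, j = [], n                     # backtrack the segment boundaries
    while j > 0:
        i, y = best[j][1]
        segs.append((i, j, y))
        j = i
    return segs[::-1]

# Toy scorer: reward an "ENT" segment exactly covering a capitalized run.
def seg_score(x, i, j, y):
    cap = all(w[0].isupper() for w in x[i:j])
    if y == "ENT":
        return 2.0 * (j - i) if cap else -2.0
    return 0.5 if not cap else -0.5

tokens = "visit New York City today".split()
print(best_segmentation(tokens, ["ENT", "O"], seg_score))
# -> [(0, 1, 'O'), (1, 4, 'ENT'), (4, 5, 'O')]
```
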
Efficiently Inducing Features of Conditional Random Fields
2003
"... Conditional Random Fields (CRFs) are undirected graphical models, a special case of which correspond to conditionallytrained finite state machines. A key advantage of CRFs is their great flexibility to include a wide variety of arbitrary, nonindependent features of the input. Faced with ..."
Abstract

Cited by 233 (12 self)
 Add to MetaCart
Conditional Random Fields (CRFs) are undirected graphical models, a special case of which corresponds to conditionally trained finite state machines. A key advantage of CRFs is their great flexibility to include a wide variety of arbitrary, non-independent features of the input. Faced with ...
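The induction idea the title refers to can be sketched as greedy forward selection: repeatedly add the candidate feature whose inclusion most improves the training objective. The toy below is only the loop structure (hypothetical data; a plain logistic model with exact refitting stands in for the paper's efficient gain approximations on CRFs).

```python
import numpy as np

# Greedy feature induction sketch: at each round, score every candidate
# feature by its gain in log-likelihood and add the best one.
def log_lik(X, y, w):
    z = X @ w
    return float(np.sum(y * z - np.log1p(np.exp(z))))

def fit(X, y, iters=200, lr=0.1):
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        w += lr * X.T @ (y - p) / len(y)   # gradient ascent on log-likelihood
    return w

def induce(candidates, y, k=2):
    chosen, X = [], np.ones((len(y), 1))   # start with a bias column only
    for _ in range(k):
        base = log_lik(X, y, fit(X, y))
        gains = {}
        for name, col in candidates.items():
            if name in chosen:
                continue
            X2 = np.column_stack([X, col])
            gains[name] = log_lik(X2, y, fit(X2, y)) - base
        best = max(gains, key=gains.get)
        chosen.append(best)
        X = np.column_stack([X, candidates[best]])
    return chosen

rng = np.random.default_rng(1)
y = rng.integers(0, 2, 200)
candidates = {
    "useful": y + rng.normal(0, 0.5, 200),  # correlated with the label
    "noise_a": rng.normal(0, 1, 200),
    "noise_b": rng.normal(0, 1, 200),
}
print(induce(candidates, y, k=2))           # "useful" should come first
```
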
Classification in Networked Data: A toolkit and a univariate case study
2006
"... This paper is about classifying entities that are interlinked with entities for which the class is known. After surveying prior work, we present NetKit, a modular toolkit for classification in networked data, and a casestudy of its application to networked data used in prior machine learning resear ..."
Abstract

Cited by 200 (10 self)
 Add to MetaCart
This paper is about classifying entities that are interlinked with entities for which the class is known. After surveying prior work, we present NetKit, a modular toolkit for classification in networked data, and a case study of its application to networked data used in prior machine learning research. NetKit is based on a node-centric framework in which classifiers comprise a local classifier, a relational classifier, and a collective inference procedure. Various existing node-centric relational learning algorithms can be instantiated with appropriate choices for these components, and new combinations of components realize new algorithms. The case study focuses on univariate network classification, for which the only information used is the structure of class linkage in the network (i.e., only links and some class labels). To our knowledge, no previous work has systematically evaluated the power of class linkage alone for classification in machine learning benchmark data sets. The results demonstrate that very simple network-classification models perform quite well, well enough that they should be used regularly as baseline classifiers for studies of learning with networked data. The simplest method (which performs remarkably well) highlights the close correspondence between several existing methods introduced for different purposes, i.e., Gaussian-field classifiers, Hopfield networks, and relational-neighbor classifiers. The case study also shows that there are two sets of techniques that are preferable in different situations, namely when few versus many labels are known initially. We also demonstrate that link selection plays an important role similar to traditional feature selection.
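In the univariate setting, a relational-neighbor-style model needs only links and a few known labels; the sketch below (invented graph, plain neighbor averaging; not NetKit's actual code) iterates a weighted-vote update to a fixed point.

```python
# Relational-neighbor classification with iterative collective inference:
# unknown nodes take the average class-1 probability of their neighbors.
graph = {  # adjacency list (invented toy network)
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e"], "e": ["d", "f"], "f": ["e"],
}
known = {"a": 1.0, "f": 0.0}  # observed probabilities of class 1

# Initialize unknowns to the 0.5 prior, then relax until (near) convergence.
p = {n: known.get(n, 0.5) for n in graph}
for _ in range(50):
    for n in graph:
        if n not in known:
            p[n] = sum(p[m] for m in graph[n]) / len(graph[n])
print({n: round(v, 2) for n, v in p.items()})
```
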
BLOG: Probabilistic models with unknown objects
In IJCAI, 2005
"... This paper introduces and illustrates BLOG, a formal language for defining probability models over worlds with unknown objects and identity uncertainty. BLOG unifies and extends several existing approaches. Subject to certain acyclicity constraints, every BLOG model specifies a unique probability di ..."
Abstract

Cited by 185 (10 self)
 Add to MetaCart
(Show Context)
This paper introduces and illustrates BLOG, a formal language for defining probability models over worlds with unknown objects and identity uncertainty. BLOG unifies and extends several existing approaches. Subject to certain acyclicity constraints, every BLOG model specifies a unique probability distribution over first-order model structures that can contain varying and unbounded numbers of objects. Furthermore, complete inference algorithms exist for a large fragment of the language. We also introduce a probabilistic form of Skolemization for handling evidence.
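The flavor of a model with unknown objects can be conveyed with a hand-rolled generative sampler (plain Python, not the BLOG language itself; the aircraft/blip setup is an invented stand-in for the kind of scenario such models target): the number of objects is itself sampled, and each observation is attributed to one of them.

```python
import numpy as np

rng = np.random.default_rng(2)

# A world with an unknown number of objects and identity uncertainty:
# how many aircraft exist, and which aircraft produced each radar blip,
# are both random.
def sample_world():
    n_aircraft = rng.poisson(3) + 1              # unknown number of objects
    positions = rng.uniform(0, 100, n_aircraft)  # one attribute per object
    blips = []
    for _ in range(rng.poisson(5)):
        source = rng.integers(n_aircraft)        # which object caused it?
        blips.append(positions[source] + rng.normal(0, 1))
    return n_aircraft, positions, blips

n, pos, blips = sample_world()
print(n, len(blips))
```
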
Collective classification in network data
2008
"... Numerous realworld applications produce networked data such as web data (hypertext documents connected via hyperlinks) and communication networks (people connected via communication links). A recent focus in machine learning research has been to extend traditional machine learning classification te ..."
Abstract

Cited by 178 (32 self)
 Add to MetaCart
(Show Context)
Numerous real-world applications produce networked data such as web data (hypertext documents connected via hyperlinks) and communication networks (people connected via communication links). A recent focus in machine learning research has been to extend traditional machine learning classification techniques to classify nodes in such data. In this report, we attempt to provide a brief introduction to this area of research and how it has progressed during the past decade. We introduce four of the most widely used inference algorithms for classifying networked data and empirically compare them on both synthetic and real-world data.
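One standard algorithm in this family is iterative classification (ICA): bootstrap labels for the unknown nodes, then repeatedly re-classify each node from its neighbors' current labels. A toy version follows (invented citation graph; a majority vote stands in for a learned local classifier).

```python
# Iterative classification on a toy citation graph: observed labels are
# held fixed; unknown nodes are re-labeled by neighbor majority vote.
edges = [("p1", "p2"), ("p2", "p3"), ("p3", "p4"), ("p4", "p5"), ("p4", "p6")]
neighbors = {}
for u, v in edges:
    neighbors.setdefault(u, []).append(v)
    neighbors.setdefault(v, []).append(u)

labels = {"p1": "AI", "p5": "DB", "p6": "DB"}        # observed nodes
pred = {n: labels.get(n, "AI") for n in neighbors}   # bootstrap guesses
for _ in range(10):                                  # re-classify until stable
    for n in neighbors:
        if n not in labels:
            votes = [pred[m] for m in neighbors[n]]
            # ties broken toward the alphabetically first label
            pred[n] = max(sorted(set(votes)), key=votes.count)
print(pred)  # p4 flips to DB, pulled over by its two DB neighbors
```
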
Dynamic Conditional Random Fields: Factorized Probabilistic Models for Labeling and Segmenting Sequence Data
In ICML, 2004
"... In sequence modeling, we often wish to represent complex interaction between labels, such as when performing multiple, cascaded labeling tasks on the same sequence, or when longrange dependencies exist. We present dynamic conditional random fields (DCRFs), a generalization of linearchain cond ..."
Abstract

Cited by 171 (13 self)
 Add to MetaCart
(Show Context)
In sequence modeling, we often wish to represent complex interactions between labels, such as when performing multiple, cascaded labeling tasks on the same sequence, or when long-range dependencies exist. We present dynamic conditional random fields (DCRFs), a generalization of linear-chain conditional random fields (CRFs) in which each time slice contains a set of state variables and edges, giving a distributed state representation as in dynamic Bayesian networks (DBNs), and parameters are tied across slices. Since exact ...
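The factorized structure shows up in how a labeling is scored: two interacting label chains over the same sequence, with within-slice edges between them and transition parameters shared across all slices. A scoring-only sketch with invented potentials (real DCRFs additionally need approximate inference, which this omits):

```python
import numpy as np

# Score of a joint labeling under a two-chain, DCRF-style factorization:
# per-slice node potentials, a within-slice edge linking the two chains,
# and transition potentials tied (reused) across every slice.
def score(y_a, y_b, within, trans_a, trans_b, node_a, node_b):
    T = len(y_a)
    s = 0.0
    for t in range(T):
        s += node_a[t][y_a[t]] + node_b[t][y_b[t]]
        s += within[y_a[t], y_b[t]]                # edge between the chains
        if t + 1 < T:
            s += trans_a[y_a[t], y_a[t + 1]]       # tied across all slices
            s += trans_b[y_b[t], y_b[t + 1]]
    return s

K = 2
rng = np.random.default_rng(3)
within, trans_a, trans_b = (rng.normal(size=(K, K)) for _ in range(3))
nodes = rng.normal(size=(2, 4, K))                 # (chain, time, label)
print(score([0, 1, 1, 0], [1, 1, 0, 0],
            within, trans_a, trans_b, nodes[0], nodes[1]))
```
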