Results 1  10
of
226
Unsupervised NamedEntity Extraction from the Web: An Experimental Study
 ARTIFICIAL INTELLIGENCE
, 2005
"... The KNOWITALL system aims to automate the tedious process of extracting large collections of facts (e.g., names of scientists or politicians) from the Web in an unsupervised, domainindependent, and scalable manner. The paper presents an overview of KNOWITALL’s novel architecture and design princip ..."
Abstract

Cited by 364 (39 self)
 Add to MetaCart
(Show Context)
The KNOWITALL system aims to automate the tedious process of extracting large collections of facts (e.g., names of scientists or politicians) from the Web in an unsupervised, domainindependent, and scalable manner. The paper presents an overview of KNOWITALL’s novel architecture and design principles, emphasizing its distinctive ability to extract information without any handlabeled training examples. In its first major run, KNOWITALL extracted over 50,000 facts, but suggested a challenge: How can we improve KNOWITALL’s recall and extraction rate without sacrificing precision? This paper presents three distinct ways to address this challenge and evaluates their performance. Pattern Learning learns domainspecific extraction rules, which enable additional extractions. Subclass Extraction automatically identifies subclasses in order to boost recall. List Extraction locates lists of class instances, learns a “wrapper ” for each list, and extracts elements of each list. Since each method bootstraps from KNOWITALL’s domainindependent methods, the methods also obviate handlabeled training examples. The paper reports on experiments, focused on namedentity extraction, that measure the relative efficacy of each method and demonstrate their synergy. In concert, our methods gave KNOWITALL a 4fold to 8fold increase in recall, while maintaining high precision, and discovered over 10,000 cities missing from the Tipster Gazetteer.
Efficient structure learning of Markov networks using L1regularization
 In NIPS
, 2006
"... Markov networks are widely used in a wide variety of applications, in problems ranging from computer vision, to natural language, to computational biology. In most current applications, even those that rely heavily on learned models, the structure of the Markov network is constructed by hand, due to ..."
Abstract

Cited by 146 (3 self)
 Add to MetaCart
(Show Context)
Markov networks are widely used in a wide variety of applications, in problems ranging from computer vision, to natural language, to computational biology. In most current applications, even those that rely heavily on learned models, the structure of the Markov network is constructed by hand, due to the lack of effective algorithms for learning Markov network structure from data. In this paper, we provide a computationally effective method for learning Markov network structure from data. Our method is based on the use of L1 regularization on the weights of the loglinear model, which has the effect of biasing the model towards solutions where many of the parameters are zero. This formulation converts the Markov network learning problem into a convex optimization problem in a continuous space, which can be solved using efficient gradient methods. A key issue in this setting is the (unavoidable) use of approximate inference, which can lead to errors in the gradient computation when the network structure is dense. Thus, we explore the use of different feature introduction schemes and compare their performance. We provide results for our method on synthetic data, and on two real world data sets: modeling the joint distribution of pixel values in the MNIST data, and modeling the joint distribution of genetic sequence variations in the human HapMap data. We show that our L1based method achieves considerably higher generalization performance than the more standard L2based method (a Gaussian parameter prior) or pure maximumlikelihood learning. We also show that we can learn MRF network structure at a computational cost that is not much greater than learning parameters alone, demonstrating the existence of a feasible method for this important problem. 1
Learning to combine bottomup and topdown segmentation
 in: European Conference on Computer Vision
"... Abstract. Bottomup segmentation based only on lowlevel cues is a notoriously difficult problem. This difficulty has lead to recent topdown segmentation algorithms that are based on classspecific image information. Despite the success of topdown algorithms, they often give coarse segmentations t ..."
Abstract

Cited by 131 (0 self)
 Add to MetaCart
(Show Context)
Abstract. Bottomup segmentation based only on lowlevel cues is a notoriously difficult problem. This difficulty has lead to recent topdown segmentation algorithms that are based on classspecific image information. Despite the success of topdown algorithms, they often give coarse segmentations that can be significantly refined using lowlevel cues. This raises the question of how to combine both topdown and bottomup cues in a principled manner. In this paper we approach this problem using supervised learning. Given a training set of ground truth segmentations we train a fragmentbased segmentation algorithm which takes into account both bottomup and topdown cues simultaneously, in contrast to most existing algorithms which train topdown and bottomup modules separately. We formulate the problem in the framework of Conditional Random Fields (CRF) and derive a feature induction algorithm for CRF, which allows us to efficiently search over thousands of candidate fragments. Whereas pure topdown algorithms often require hundreds of fragments, our simultaneous learning procedure yields algorithms with a handful of fragments that are combined with lowlevel cues to efficiently compute high quality segmentations. 1
BANNER: An executable survey of advances in biomedical named entity recognition
 In Pac Symp Biocomput
, 2008
"... There has been an increasing amount of research on biomedical named entity recognition, the most basic text extraction problem, resulting in significant progress by different research teams around the world. This has created a need for a freelyavailable, open source system implementing the advances ..."
Abstract

Cited by 123 (10 self)
 Add to MetaCart
(Show Context)
There has been an increasing amount of research on biomedical named entity recognition, the most basic text extraction problem, resulting in significant progress by different research teams around the world. This has created a need for a freelyavailable, open source system implementing the advances described in the literature. In this paper we present BANNER, an opensource, executable survey of advances in biomedical named entity recognition, intended to serve as a benchmark for the field. BANNER is implemented in Java as a machinelearning system based on conditional random fields and includes a wide survey of the best techniques recently described in the literature. It is designed to maximize domain independence by not employing brittle semantic features or rulebased processing steps, and achieves significantly better performance than existing baseline systems. It is therefore useful to developers as an extensible NER implementation, to researchers as a standard for comparing innovative techniques, and to biologists requiring the ability to find novel entities in large amounts of text. BANNER is available for download at
Collective multilabel classification
 In Proc. of CIKM
, 2005
"... This Article is brought to you for free and open access by the Computer Science at ScholarWorks@UMass Amherst. It has been accepted for inclusion ..."
Abstract

Cited by 122 (3 self)
 Add to MetaCart
(Show Context)
This Article is brought to you for free and open access by the Computer Science at ScholarWorks@UMass Amherst. It has been accepted for inclusion
Extracting places and activities from gps traces using hierarchical conditional random fields
 International Journal of Robotics Research
, 2007
"... Learning patterns of human behavior from sensor data is extremely important for highlevel activity inference. We show how to extract a person’s activities and significant places from traces of GPS data. Our system uses hierarchically structured conditional random fields to generate a consistent mod ..."
Abstract

Cited by 119 (3 self)
 Add to MetaCart
Learning patterns of human behavior from sensor data is extremely important for highlevel activity inference. We show how to extract a person’s activities and significant places from traces of GPS data. Our system uses hierarchically structured conditional random fields to generate a consistent model of a person’s activities and places. In contrast to existing techniques, our approach takes highlevel context into account in order to detect the significant places of a person. Our experiments show significant improvements over existing techniques. Furthermore, they indicate that our system is able to robustly estimate a person’s activities using a model that is trained from data collected by other persons. 1
Learning the structure of Markov logic networks
 In Proceedings of the 22nd International Conference on Machine Learning
, 2005
"... Markov logic networks (MLNs) combine logic and probability by attaching weights to firstorder clauses, and viewing these as templates for features of Markov networks. In this paper we develop an algorithm for learning the structure of MLNs from relational databases, combining ideas from inductive l ..."
Abstract

Cited by 114 (20 self)
 Add to MetaCart
Markov logic networks (MLNs) combine logic and probability by attaching weights to firstorder clauses, and viewing these as templates for features of Markov networks. In this paper we develop an algorithm for learning the structure of MLNs from relational databases, combining ideas from inductive logic programming (ILP) and feature induction in Markov networks. The algorithm performs a beam or shortestfirst search of the space of clauses, guided by a weighted pseudolikelihood measure. This requires computing the optimal weights for each candidate structure, but we show how this can be done efficiently. The algorithm can be used to learn an MLN from scratch, or to refine an existing knowledge base. We have applied it in two realworld domains, and found that it outperforms using offtheshelf ILP systems to learn the MLN structure, as well as pure ILP, purely probabilistic and purely knowledgebased approaches. 1.
Relational dependency networks
 Journal of Machine Learning Research
, 2007
"... Recent work on graphical models for relational data has demonstrated significant improvements in classification and inference when models represent the dependencies among instances. Despite its use in conventional statistical models, the assumption of instance independence is contradicted by most re ..."
Abstract

Cited by 112 (24 self)
 Add to MetaCart
Recent work on graphical models for relational data has demonstrated significant improvements in classification and inference when models represent the dependencies among instances. Despite its use in conventional statistical models, the assumption of instance independence is contradicted by most relational datasets. For example, in citation data there are dependencies among the topics of a paper’s references, and in genomic data there are dependencies among the functions of interacting proteins. In this paper, we present relational dependency networks (RDNs), graphical models that are capable of expressing and reasoning with such dependencies in a relational setting. We discuss RDNs in the context of relational Bayes networks and relational Markov networks and outline the relative strengths of RDNs—namely, the ability to represent cyclic dependencies, simple methods for parameter estimation, and efficient structure learning techniques. The strengths of RDNs are due to the use of pseudolikelihood learning techniques, which estimate an efficient approximation of the full joint distribution. We present learned RDNs for a number of realworld datasets and evaluate the models in a prediction context, showing that RDNs identify and exploit cyclic relational dependencies to achieve significant performance gains over conventional conditional models. In addition, we use synthetic data to explore model performance under various relational data characteristics, showing that RDN learning and inference techniques are accurate over a wide range of conditions.
Discriminative training of markov logic networks
 In Proc. of the Natl. Conf. on Artificial Intelligence
, 2005
"... Many machine learning applications require a combination of probability and firstorder logic. Markov logic networks (MLNs) accomplish this by attaching weights to firstorder clauses, and viewing these as templates for features of Markov networks. Model parameters (i.e., clause weights) can be lear ..."
Abstract

Cited by 108 (19 self)
 Add to MetaCart
Many machine learning applications require a combination of probability and firstorder logic. Markov logic networks (MLNs) accomplish this by attaching weights to firstorder clauses, and viewing these as templates for features of Markov networks. Model parameters (i.e., clause weights) can be learned by maximizing the likelihood of a relational database, but this can be quite costly and lead to suboptimal results for any given prediction task. In this paper we propose a discriminative approach to training MLNs, one which optimizes the conditional likelihood of the query predicates given the evidence ones, rather than the joint likelihood of all predicates. We extend Collins’s (2002) voted perceptron algorithm for HMMs to MLNs by replacing the Viterbi algorithm with a weighted satisfiability solver. Experiments on entity resolution and link prediction tasks show the advantages of this approach compared to generative MLN training, as well as compared to purely probabilistic and purely logical approaches.