Integer linear programming inference for conditional random fields
- In Proc. of the International Conference on Machine Learning (ICML)
, 2005
Cited by 57 (10 self)
Inference in Conditional Random Fields and Hidden Markov Models is done using the Viterbi algorithm, an efficient dynamic programming algorithm. In many cases, general (non-local and non-sequential) constraints may exist over the output sequence, but cannot be incorporated and exploited in a natural way by this inference procedure. This paper proposes a novel inference procedure based on integer linear programming (ILP) and extends CRF models to naturally and efficiently support general constraint structures. For sequential constraints, this procedure reduces to simple linear programming as the inference process. Experimental evidence is supplied in the context of an important NLP problem, semantic role labeling.
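For readers unfamiliar with the baseline this abstract starts from, the following is a minimal sketch of Viterbi decoding on a linear chain. It is not the paper's ILP procedure; the function name `viterbi` and the score layout (`node_scores[t][k]` for label `k` at position `t`, `trans_scores[j][k]` for moving from label `j` to label `k`) are assumptions made for illustration.

```python
def viterbi(node_scores, trans_scores):
    """Return (best_score, best_path) for a linear chain.

    Hypothetical layout: node_scores[t][k] scores label k at position t,
    trans_scores[j][k] scores the transition from label j to label k.
    """
    n_labels = len(node_scores[0])
    best = list(node_scores[0])  # best score of any path ending in each label
    back = []                    # back-pointers, one list per position after the first
    for t in range(1, len(node_scores)):
        new_best, pointers = [], []
        for k in range(n_labels):
            # Best previous label for arriving at label k.
            j = max(range(n_labels), key=lambda j: best[j] + trans_scores[j][k])
            new_best.append(best[j] + trans_scores[j][k] + node_scores[t][k])
            pointers.append(j)
        best = new_best
        back.append(pointers)
    # Follow back-pointers from the best final label.
    last = max(range(n_labels), key=lambda k: best[k])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return best[last], path
```

Because each step looks only at the previous label, a constraint that ties together distant positions (for example, "label 1 appears at most once in the sequence") has no place in this recursion, which is the limitation the abstract describes.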
Deriving a Large Scale Taxonomy from Wikipedia
, 2007
Cited by 36 (3 self)
We take the category system in Wikipedia as a conceptual network. We label the semantic relations between categories using methods based on connectivity in the network and lexicosyntactic matching. As a result we are able to derive a large scale taxonomy containing a large amount of subsumption, i.e. isa, relations. We evaluate the quality of the created resource by comparing it with ResearchCyc, one of the largest manually annotated ontologies, as well as computing semantic similarity between words in benchmarking datasets.
Guiding semi-supervision with constraint-driven learning
- In Proc. of the Annual Meeting of the ACL
, 2007
Cited by 32 (8 self)
Over the last few years, two of the main research directions in machine learning of natural language processing have been the study of semi-supervised learning algorithms as a way to train classifiers when the labeled data is scarce, and the study of ways to exploit knowledge and global information in structured learning tasks. In this paper, we suggest a method for incorporating domain knowledge in semi-supervised learning algorithms. Our novel framework unifies and can exploit several kinds of task specific constraints. The experimental results presented in the information extraction domain demonstrate that applying constraints helps the model to generate better feedback during learning, and hence the framework allows for high performance learning with significantly less training data than was possible before on these tasks.
Structured learning with approximate inference
- Advances in Neural Information Processing Systems
Cited by 31 (1 self)
In many structured prediction problems, the highest-scoring labeling is hard to compute exactly, leading to the use of approximate inference methods. However, when inference is used in a learning algorithm, a good approximation of the score may not be sufficient. We show in particular that learning can fail even with an approximate inference method with rigorous approximation guarantees. There are two reasons for this. First, approximate methods can effectively reduce the expressivity of an underlying model by making it impossible to choose parameters that reliably give good predictions. Second, approximations can respond to parameter changes in such a way that standard learning algorithms are misled. In contrast, we give two positive results in the form of learning bounds for the use of LP-relaxed inference in structured perceptron and empirical risk minimization settings. We argue that without understanding combinations of inference and learning, such as these, that are appropriately compatible, learning performance under approximate inference cannot be guaranteed.
Piecewise pseudolikelihood for efficient CRF training
- In International Conference on Machine Learning (ICML)
, 2007
Cited by 13 (2 self)
Discriminative training of graphical models can be expensive if the variables have large cardinality, even if the graphical structure is tractable. In such cases, pseudolikelihood is an attractive alternative, because its running time is linear in the variable cardinality, but on some data its accuracy can be poor. Piecewise training (Sutton & McCallum, 2005) can have better accuracy but does not scale as well in the variable cardinality. In this paper, we introduce piecewise pseudolikelihood, which retains the computational efficiency of pseudolikelihood but can have much better accuracy. On several benchmark NLP data sets, piecewise pseudolikelihood has better accuracy than standard pseudolikelihood, and in many cases nearly equivalent to maximum likelihood, with five to ten times less training time than batch CRF training.
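To make the "linear in the variable cardinality" remark concrete, here is a rough sketch of the standard (Besag-style) log pseudolikelihood for a linear chain. It is not the piecewise pseudolikelihood the paper introduces; the function name and the score layout (`node_scores[t][k]`, `trans_scores[j][k]`) are illustrative assumptions.

```python
import math


def log_pseudolikelihood(node_scores, trans_scores, labels):
    """Sum over positions of log p(y_t | observed neighbours) on a linear chain.

    Hypothetical layout: node_scores[t][k] scores label k at position t,
    trans_scores[j][k] scores the transition from label j to label k.
    """
    n_labels = len(trans_scores)
    total = 0.0
    for t, y in enumerate(labels):
        # Score every candidate label at t with both neighbours held fixed,
        # so the per-position cost is linear in the number of labels.
        local = []
        for k in range(n_labels):
            s = node_scores[t][k]
            if t > 0:
                s += trans_scores[labels[t - 1]][k]
            if t + 1 < len(labels):
                s += trans_scores[k][labels[t + 1]]
            local.append(s)
        # Numerically stable log-sum-exp for the local normaliser.
        m = max(local)
        log_z = m + math.log(sum(math.exp(s - m) for s in local))
        total += local[y] - log_z
    return total
```

Each position needs only one pass over the label set, whereas exact likelihood on the same chain would sum over label pairs at every edge.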
Knowing what to believe (when you already know something)
- In Proceedings of the 23rd International Conference on Computational Linguistics
, 2010
Cited by 10 (3 self)
Although much work in NLP has focused on simply determining what a document means, we also must know whether or not to believe it. Fact-finding algorithms attempt to identify the “truth” among competing claims in a corpus, but fail to take advantage of the user’s prior knowledge and presume that truth itself is universal and objective rather than subjective. We introduce a framework for incorporating prior knowledge into any fact-finding algorithm, expressing both general “common-sense” reasoning and specific facts already known to the user as first-order logic and translating this into a tractable linear program. As our results show, this approach scales well to even large problems, both reducing error and allowing the system to determine truth respective to the user rather than the majority. Additionally, we introduce three new fact-finding algorithms capable of outperforming existing fact-finders in many of our experiments.
Piecewise Training for Structured Prediction
- MACHINE LEARNING
Cited by 7 (0 self)
A drawback of structured prediction methods is that parameter estimation requires repeated inference, which is intractable for general structures. In this paper, we present an approximate training algorithm called piecewise training that divides the factors into tractable subgraphs, which we call pieces, that are trained independently. Piecewise training can be interpreted as approximating the exact likelihood using belief propagation, and different ways of making this interpretation yield different insights into the method. We also present an extension to piecewise training, called piecewise pseudolikelihood (PWPL), designed for when variables have large cardinality. On several real-world NLP data sets, piecewise training performs better than Besag’s pseudolikelihood and sometimes comparably to exact maximum likelihood. In addition, PWPL performs similarly to piecewise training and better than standard pseudolikelihood, but is five to ten times more computationally efficient than batch maximum likelihood training.
Statistical LTAG Parsing
, 2006
Cited by 6 (2 self)
First and foremost, I would like to thank my advisor Aravind Joshi for his continuous support and guidance in both academic and daily life ever since my first day at Penn. Many thanks to my dissertation committee members, Mark Johnson, Mitch Marcus, Martha Palmer and Fernando Pereira. Their valuable advice and suggestions helped me to sharpen the focus of this research. I am grateful to Anoop Sarkar and Fei Xia. During my first two years at Penn, I learned a lot of NLP from them. I appreciate enlightening discussions with NLP people at Penn.
Learning Based Java for Rapid Development of NLP Systems
- In LREC
, 2010
Cited by 6 (4 self)
Today’s natural language processing systems are growing more complex with the need to incorporate a wider range of language resources and more sophisticated statistical methods. In many cases, it is necessary to learn a component with input that includes the predictions of other learned components or to assign simultaneously the values that would be assigned by multiple components with an expressive, data-dependent structure among them. As a result, the design of systems with multiple learning components is inevitably quite technically complex, and implementations of conceptually simple NLP systems can be time-consuming and prone to error. Our new modeling language, Learning Based Java (LBJ), facilitates the rapid development of systems that learn and perform inference. LBJ has already been used to build state-of-the-art NLP systems. This paper details recent advancements in the language which generalize its computational model, making a wider class of algorithms available.
Punctuation: Making a Point in Unsupervised Dependency Parsing
Cited by 5 (4 self)
We show how punctuation can be used to improve unsupervised dependency parsing. Our linguistic analysis confirms the strong connection between English punctuation and phrase boundaries in the Penn Treebank. However, approaches that naively include punctuation marks in the grammar (as if they were words) do not perform well with Klein and Manning’s Dependency Model with Valence (DMV). Instead, we split a sentence at punctuation and impose parsing restrictions over its fragments. Our grammar inducer is trained on the Wall Street Journal (WSJ) and achieves 59.5% accuracy out-of-domain (Brown sentences with 100 or fewer words), more than 6% higher than the previous best results. Further evaluation, using the 2006/7 CoNLL sets, reveals that punctuation aids grammar induction in 17 of 18 languages, for an overall average net gain of 1.3%. Some of this improvement is from training, but more than half is from parsing with induced constraints, in inference. Punctuation-aware decoding works with existing (even already-trained) parsing models and always increased accuracy in our experiments.

