Polyhedral outer approximations with application to natural language parsing (2009)

by A Martins, N Smith, E Xing

Summarization with a Joint Model for Sentence Extraction and Compression

by André F. T. Martins, Noah A. Smith
Cited by 9 (0 self)
Text summarization is one of the oldest problems in natural language processing. Popular approaches rely on extracting relevant sentences from the original documents. As a side effect, sentences that are too long but partly relevant are doomed to either not appear in the final summary, or prevent inclusion of other relevant sentences. Sentence compression is a recent framework that aims to select the shortest subsequence of words that yields an informative and grammatical sentence. This work proposes a one-step approach for document summarization that jointly performs sentence extraction and compression by solving an integer linear program. We report favorable experimental results on newswire data.

Learning Efficiently with Approximate Inference via Dual Losses

by Ofer Meshi, David Sontag, Tommi Jaakkola, Amir Globerson
Cited by 8 (3 self)
Many structured prediction tasks involve complex models where inference is computationally intractable, but where it can be well approximated using a linear programming relaxation. Previous approaches to learning for structured prediction (e.g., cutting-plane, subgradient methods, perceptron) repeatedly make predictions for some of the data points. These approaches are computationally demanding because each prediction involves solving a linear program to optimality. We present a scalable algorithm for learning for structured prediction. The main idea is to instead solve the dual of the structured prediction loss. We formulate the learning task as a convex minimization over both the weights and the dual variables corresponding to each data point. As a result, we can begin to optimize the weights even before completely solving any of the individual prediction problems. We show how the dual variables can be efficiently optimized using coordinate descent. Our algorithm is competitive with state-of-the-art methods such as stochastic subgradient and cutting-plane.

Structured Output Learning with Indirect Supervision

by Ming-Wei Chang, Vivek Srikumar, Dan Goldwasser, Dan Roth - Proc. 27th International Conference on Machine Learning, 2010
Cited by 8 (0 self)
We present a novel approach for structure prediction that addresses the difficulty of obtaining labeled structures for training. We observe that structured output problems often have a companion learning problem of determining whether a given input possesses a good structure. For example, the companion problem for the part-of-speech (POS) tagging task asks whether a given sequence of words has a corresponding sequence of POS tags that is "legitimate". While obtaining direct supervision for structures is difficult and expensive, it is often very easy to obtain indirect supervision from the companion binary decision problem. In this paper, we develop a large margin framework that jointly learns from both direct and indirect forms of supervision. Our experiments exhibit the significant contribution of the easy-to-get indirect binary supervision on three important NLP structure learning problems.

More data means less inference: A pseudo-max approach to structured learning

by David Sontag, Ofer Meshi, Tommi Jaakkola, Amir Globerson
Cited by 1 (0 self)
The problem of learning to predict structured labels is of key importance in many applications. However, for general graph structure both learning and inference are intractable. Here we show that it is possible to circumvent this difficulty when the distribution of training examples is rich enough, via a method similar in spirit to pseudo-likelihood. We show that our new method achieves consistency, and illustrate empirically that it indeed approaches the performance of exact methods when sufficiently large training sets are used. Many prediction problems in machine learning applications are structured prediction tasks. For example, in protein folding we are given a protein sequence and the goal is to predict the protein's native structure [14]. In parsing for natural language processing, we are given a sentence and the goal is to predict the most likely parse tree [2]. In these and many other applications, we can formalize the structured prediction problem as taking an input x (e.g., primary sequence, sentence) and predicting y (e.g., structure, parse) according to $y = \arg\max_{\hat{y} \in Y} \theta \cdot \phi(x, \hat{y})$, where φ(x, y) is a function that maps any input and a candidate assignment to a feature vector, Y denotes the space of all possible
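The prediction rule in the excerpt above, choosing the candidate ŷ in Y that maximizes θ · φ(x, ŷ), can be illustrated with a toy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the feature map `phi` (word/tag and tag/tag counts), the weights in `theta`, the two-tag label set, and the use of exhaustive enumeration, which is only feasible for tiny inputs.

```python
from itertools import product


def phi(x, y):
    """Hypothetical feature map: counts of (word, tag) and (tag, next tag) pairs."""
    feats = {}
    for word, tag in zip(x, y):
        key = ("emit", word, tag)
        feats[key] = feats.get(key, 0) + 1
    for prev, cur in zip(y, y[1:]):
        key = ("trans", prev, cur)
        feats[key] = feats.get(key, 0) + 1
    return feats


def score(theta, x, y):
    """Linear score theta . phi(x, y); features missing from theta have weight 0."""
    return sum(theta.get(k, 0.0) * v for k, v in phi(x, y).items())


def predict(theta, x, labels):
    """Brute-force argmax over Y = labels^len(x); exponential in len(x)."""
    return max(product(labels, repeat=len(x)), key=lambda y: score(theta, x, y))


# Illustrative weights and input (assumed values, not from the paper).
theta = {
    ("emit", "dogs", "N"): 1.0,
    ("emit", "bark", "V"): 1.0,
    ("trans", "N", "V"): 0.5,
}
x = ("dogs", "bark")
best = predict(theta, x, ("N", "V"))
print(best, score(theta, x, best))
```

The enumeration makes the cost of exact inference visible: the candidate set grows as |labels|^n, which is why the listed papers turn to relaxations or to methods that avoid full inference during learning.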

Concise Integer Linear Programming Formulations for Dependency Parsing

by unknown authors
We formulate the problem of nonprojective dependency parsing as a polynomial-sized integer linear program. Our formulation is able to handle non-local output features in an efficient manner; not only is it compatible with prior knowledge encoded as hard constraints, it can also learn soft constraints from data. In particular, our model is able to learn correlations among neighboring arcs (siblings and grandparents), word valency, and tendencies toward nearly projective parses. The model parameters are learned in a max-margin framework by employing a linear programming relaxation. We evaluate the performance of our parser on data in several natural languages, achieving improvements over existing state-of-the-art methods.

Approximate Learning for Structured Prediction Problems

by Alex Kulesza
Prediction problems such as image segmentation, sentence parsing, and gene prediction involve complex output spaces for which multiple decisions must be coordinated to achieve optimal results. Unfortunately, this means that there are generally an exponential number of possible predictions for every input. Markov random fields can be used to express structure in these output spaces, reducing the number of model parameters to a manageable size; however, the problem of learning those parameters from a training sample remains NP-hard in general. We review some recent results on approximate learning of structured prediction problems. There are two distinct approaches. In the first, results from the well-studied field of approximate inference are adapted to the learning setting. In the second, learning performance is characterized directly, producing bounds even when the underlying inference method does not offer formal approximation guarantees. While the literature on this topic is still sparse, we review the strengths and weaknesses of current results, and discuss issues

Improving the . . . of Discriminative Learning Methods for Markov Logic Networks

by Tuyen Ngoc Huynh, 2011
Abstract not found