Results 21 - 30
of
94
Efficient parameter estimation for RNA secondary structure prediction
- BIOINFORMATICS
"... Motivation: Accurate prediction of RNA secondary structure from the base sequence is an unsolved computational challenge. The accuracy of predictions made by free energy minimization is limited by the quality of the energy parameters in the underlying free energy model. The most widely used model, t ..."
Abstract
-
Cited by 9 (3 self)
- Add to MetaCart
Motivation: Accurate prediction of RNA secondary structure from the base sequence is an unsolved computational challenge. The accuracy of predictions made by free energy minimization is limited by the quality of the energy parameters in the underlying free energy model. The most widely used model, the Turner99 model, has hundreds of parameters, and so a robust parameter estimation scheme should efficiently handle large data sets with thousands of structures. Moreover, the estimation scheme should also be trained using available experimental free energy data in addition to structural data. Results: In this work, we present constraint generation (CG), the first computational approach to RNA free energy parameter estimation that can be efficiently trained on large sets of structural as well as thermodynamic data. Our constraint generation approach employs a novel iterative scheme, whereby the energy values are first computed as the solution to a constrained optimization problem. Then the newly-computed energy parameters are used to update the constraints on the optimization function, so as to better optimize the energy parameters in the next iteration. Using our method on biologically sound data, we obtain revised parameters for the Turner99 energy model. We show that by using our new parameters, we obtain significant improvements in prediction accuracy over current state-of-the-art methods.
Solving MultiClass Support Vector Machines with LaRank
"... Optimization algorithms for large margin multiclass recognizers are often too costly to handle ambitious problems with structured outputs and exponential numbers of classes. Optimization algorithms that rely on the full gradient are not effective because, unlike the solution, the gradient is not spa ..."
Abstract
-
Cited by 8 (1 self)
- Add to MetaCart
Optimization algorithms for large margin multiclass recognizers are often too costly to handle ambitious problems with structured outputs and exponential numbers of classes. Optimization algorithms that rely on the full gradient are not effective because, unlike the solution, the gradient is not sparse and is very large. The LaRank algorithm sidesteps this difficulty by relying on a randomized exploration inspired by the perceptron algorithm. We show that this approach is competitive with gradient based optimizers on simple multiclass problems. Furthermore, a single LaRank pass over the training examples delivers test error rates that are nearly as good as those of the final solution. 1.
Learning Efficiently with Approximate Inference via Dual Losses
"... Many structured prediction tasks involve complex models where inference is computationally intractable, but where it can be well approximated using a linear programming relaxation. Previous approaches for learning for structured prediction (e.g., cuttingplane, subgradient methods, perceptron) repeat ..."
Abstract
-
Cited by 8 (3 self)
- Add to MetaCart
Many structured prediction tasks involve complex models where inference is computationally intractable, but where it can be well approximated using a linear programming relaxation. Previous approaches for learning for structured prediction (e.g., cuttingplane, subgradient methods, perceptron) repeatedly make predictions for some of the data points. These approaches are computationally demanding because each prediction involves solving a linear program to optimality. We present a scalable algorithm for learning for structured prediction. The main idea is to instead solve the dual of the structured prediction loss. We formulate the learning task as a convex minimization over both the weights and the dual variables corresponding to each data point. As a result, we can begin to optimize the weights even before completely solving any of the individual prediction problems. We show how the dual variables can be efficiently optimized using coordinate descent. Our algorithm is competitive with state-of-the-art methods such as stochastic subgradient and cutting-plane. 1.
Kernelizing the output of tree-based methods
- In Proceedings of the 23rd International Conference on Machine Learning Edited by: Cohen W, Moore A. ACM
, 2006
"... We extend tree-based methods to the prediction of structured outputs using a kernelization of the algorithm that allows one to grow trees as soon as a kernel can be defined on the output space. The resulting algorithm, called output kernel trees (OK3), generalizes classification and regression trees ..."
Abstract
-
Cited by 7 (4 self)
- Add to MetaCart
We extend tree-based methods to the prediction of structured outputs using a kernelization of the algorithm that allows one to grow trees as soon as a kernel can be defined on the output space. The resulting algorithm, called output kernel trees (OK3), generalizes classification and regression trees as well as treebased ensemble methods in a principled way. It inherits several features of these methods such as interpretability, robustness to irrelevant variables, and input scalability. When only the Gram matrix over the outputs of the learning sample is given, it learns the output kernel as a function of inputs. We show that the proposed algorithm works well on an image reconstruction task and on a biological network inference problem. 1.
Domain Adaptation of Natural Language Processing Systems
, 2007
"... My first thanks must go to Fernando Pereira. He was a wonderful advisor, and every aspect of this thesis has benefitted from his insight. At times I was a difficult, even unruly graduate student, and Fernando had patience with all my ideas, whether good or bad. What I’ll miss most, though, is the qu ..."
Abstract
-
Cited by 7 (1 self)
- Add to MetaCart
My first thanks must go to Fernando Pereira. He was a wonderful advisor, and every aspect of this thesis has benefitted from his insight. At times I was a difficult, even unruly graduate student, and Fernando had patience with all my ideas, whether good or bad. What I’ll miss most, though, is the quick trip to Fernando’s office, coming away with new insights on everything from numerical underflow to the state of the academic community in machine learning and NLP. In addition to Fernando, this thesis was shaped by a great committee. Having Ben Taskar as committee chairman has given me the perfect excuse to interrupt his workday with new, ostensibly-thesis-related machine learning ideas. Mark Liberman and Mitch Marcus brought a much-needed linguistic perspective to a thesis on language, and many of the techniques described are based on work by Tong Zhang, who kindly served as my external committee member. Although he didn’t directly serve on my committee, Shai Ben-David got me started on the theoretical aspects of this work, and chapter 4 grew out of work I co-authored with him. I was also fortunate to have a great academic family. With brothers (and one sister!)
Evolutionary Learning with Kernels: A Generic Solution for Large Margin Problems
, 2006
"... In this paper we embed evolutionary computation into statistical learning theory. First, we outline the connection between large margin optimization and statistical learning and see why this paradigm is successful for many pattern recognition problems. We then embed evolutionary computation into the ..."
Abstract
-
Cited by 7 (2 self)
- Add to MetaCart
In this paper we embed evolutionary computation into statistical learning theory. First, we outline the connection between large margin optimization and statistical learning and see why this paradigm is successful for many pattern recognition problems. We then embed evolutionary computation into the most prominent representative of this class of learning methods, namely into Support Vector Machines (SVM). In contrast to former applications of evolutionary algorithms to SVMs we do not only optimize the method or kernel parameters. We rather use both evolution strategies and particle swarm optimization in order to directly solve the posed constrained optimization problem. Transforming the problem into the Wolfe dual reduces the total runtime and allows the usage of kernel functions. Exploiting the knowledge about this optimization problem leads to a hybrid mutation which further decreases convergence time while classification accuracy is preserved. We will show that evolutionary SVMs are at least as accurate as their quadratic programming counterparts on six real-world benchmark data sets. The evolutionary SVM variants frequently outperform their quadratic programming competitors. Additionally, the proposed algorithm is more generic than existing traditional solutions since it will also work for non-positive semidefinite kernel functions and for several, possibly competing, performance criteria.
Discriminative Learning and Spanning Tree Algorithms for Dependency Parsing
, 2006
"... In this thesis we develop a discriminative learning method for dependency parsing using
online large-margin training combined with spanning tree inference algorithms. We will
show that this method provides state-of-the-art accuracy, is extensible through the feature
set and can be implemented effici ..."
Abstract
-
Cited by 7 (0 self)
- Add to MetaCart
In this thesis we develop a discriminative learning method for dependency parsing using
online large-margin training combined with spanning tree inference algorithms. We will
show that this method provides state-of-the-art accuracy, is extensible through the feature
set and can be implemented efficiently. Furthermore, we display the language independent
nature of the method by evaluating it on over a dozen diverse languages as well as show its
practical applicability through integration into a sentence compression system.
We start by presenting an online large-margin learning framework that is a generaliza-
tion of the work of Crammer and Singer [34, 37] to structured outputs, such as sequences
and parse trees. This will lead to the heart of this thesis – discriminative dependency pars-
ing. Here we will formulate dependency parsing in a spanning tree framework, yielding
efficient parsing algorithms for both projective and non-projective tree structures. We will
then extend the parsing algorithm to incorporate features over larger substructures with-
out an increase in computational complexity for the projective case. Unfortunately, the
non-projective problem then becomes NP-hard so we provide structurally motivated ap-
proximate algorithms. Having defined a set of parsing algorithms, we will also define a
rich feature set and train various parsers using the online large-margin learning framework.
We then compare our trained dependency parsers to other state-of-the-art parsers on 14
diverse languages: Arabic, Bulgarian, Chinese, Czech, Danish, Dutch, English, German,
Japanese, Portuguese, Slovene, Spanish, Swedish and Turkish.
Having built an efficient and accurate discriminative dependency parser, this thesis will
then turn to improving and applying the parser. First we will show how additional re-
sources can provide useful features to increase parsing accuracy and to adapt parsers to
new domains. We will also argue that the robustness of discriminative inference-based
learning algorithms lend themselves well to dependency parsing when feature representa-
tions or structural constraints do not allow for tractable parsing algorithms. Finally, we
integrate our parsing models into a state-of-the-art sentence compression system to show
its applicability to a real world problem.
Accurate max-margin training for structured output spaces
- In ICML
, 2008
"... Tsochantaridis et al. (2005) proposed two formulations for maximum margin training of structured spaces: margin scaling and slack scaling. While margin scaling has been extensively used since it requires the same kind of MAP inference as normal structured prediction, slack scaling is believed to be ..."
Abstract
-
Cited by 5 (0 self)
- Add to MetaCart
Tsochantaridis et al. (2005) proposed two formulations for maximum margin training of structured spaces: margin scaling and slack scaling. While margin scaling has been extensively used since it requires the same kind of MAP inference as normal structured prediction, slack scaling is believed to be more accurate and better-behaved. We present an efficient variational approximation to the slack scaling method that solves its inference bottleneck while retaining its accuracy advantage over margin scaling. We further argue that existing scaling approaches do not separate the true labeling comprehensively while generating violating constraints. We propose a new max-margin trainer PosLearn that generates violators to ensure separation at each position of a decomposable loss function. Empirical results on real datasets illustrate that PosLearn can reduce test error by up to 25 % over margin scaling and 10 % over slack scaling. Further, PosLearn violators can be generated more efficiently than slack violators; for many structured tasks the time required is just twice that of MAP inference. 1.
Training Parsers by Inverse Reinforcement Learning
- MACHINE LEARNING
, 2009
"... One major idea in structured prediction is to assume that the predictor computes its output by finding the maximum of a score function. The training of such a predictor can then be cast as the problem of finding weights of the score function so that the output of the predictor on the inputs matche ..."
Abstract
-
Cited by 5 (0 self)
- Add to MetaCart
One major idea in structured prediction is to assume that the predictor computes its output by finding the maximum of a score function. The training of such a predictor can then be cast as the problem of finding weights of the score function so that the output of the predictor on the inputs matches the corresponding structured labels on the training set. A similar problem is studied in inverse reinforcement learning (IRL) where one is given an environment and a set of trajectories and the problem is to find a reward function such that an agent acting optimally with respect to the reward function would follow trajectories that match those in the training set. In this paper we show how IRL algorithms can be applied to structured prediction, in particular to parser training. We present a number of recent incremental IRL algorithms in a unified framework and map them to parser training algorithms. This allows us to recover some existing parser training algorithms, as well as to obtain a new one. The resulting algorithms are compared in terms of their sensitivity to the choice of various parameters and generalization ability on the Penn Treebank WSJ corpus.
Structured log linear models for noise robust speech recognition
- Signal Processing Letters, IEEE
, 2010
"... [ The use of discriminative models for structured classification tasks, such as automatic speech recognition is becoming increasingly popular. The major contribution of this work is we proposed a large margin structured log-linear model for noise robust continuous ASR. 1 An important aspect of log-l ..."
Abstract
-
Cited by 4 (3 self)
- Add to MetaCart
[ The use of discriminative models for structured classification tasks, such as automatic speech recognition is becoming increasingly popular. The major contribution of this work is we proposed a large margin structured log-linear model for noise robust continuous ASR. 1 An important aspect of log-linear models is the form of the features. The features used in our structured log linear model are derived from generative kernels. This provides an elegant way of combining generative and discriminative models to handle time-varying data. Additionally, since the features are based on the generative models, model-based compensation can be easily performed for noise robustness. Third, the designed joint feature space can be decomposed at the arc level. This allows efficient decoding and training with lattices, which is important for any larger vocabulary extensions. Previous work in this area is extended in two important directions. First, instead of using CML training which is commonly used for discriminative models, this paper describes efficient large margin training for sentence-level log linear models based on lattices. Depending on the nature of the joint feature-space and labels, we have proved that this form of model is closely related to structured SVMs and Multiclass SVMs. Second, efficient lattice-based classification of continuous data is also performed incorporating a joint feature space. This novel model combines generative kernels, discriminative models, efficient lattice-based large margin training and modelbased noise compensation. It is evaluated on a noise corrupted continuous digit task: AURORA 2.0. Results on the AURORA 2 demonstrate that modelling the structure information yields significant improvements.]

