Results 1–10 of 176
Regularization paths for generalized linear models via coordinate descent
, 2009
Abstract

Cited by 724 (15 self)
We develop fast algorithms for estimation of generalized linear models with convex penalties. The models include linear regression, two-class logistic regression, and multinomial regression problems while the penalties include ℓ1 (the lasso), ℓ2 (ridge regression) and mixtures of the two (the elastic net). The algorithms use cyclical coordinate descent, computed along a regularization path. The methods can handle large problems and can also deal efficiently with sparse features. In comparative timings we find that the new algorithms are considerably faster than competing methods.
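The per-coordinate update behind these algorithms has a simple closed form in the lasso case (squared-error loss, ℓ1 penalty): a soft-thresholded univariate regression on the partial residual. The sketch below is illustrative, with its own normalization conventions; it is not the authors' glmnet implementation.

```python
import numpy as np

def soft_threshold(z, gamma):
    """Soft-thresholding operator S(z, gamma): the lasso coordinate update."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)

def lasso_cd(X, y, lam, n_iters=100):
    """Cyclical coordinate descent for min_b (1/2n)||y - Xb||^2 + lam*||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n  # per-coordinate curvature
    for _ in range(n_iters):
        for j in range(p):
            r = y - X @ b + X[:, j] * b[j]   # partial residual excluding coordinate j
            rho = X[:, j] @ r / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
    return b
```

With `lam = 0` this reduces to cyclic least squares; large `lam` keeps every coefficient at exactly zero, which is where the speed on sparse problems comes from.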
Dual averaging methods for regularized stochastic learning and online optimization
 In Advances in Neural Information Processing Systems 23
, 2009
Abstract

Cited by 133 (7 self)
We consider regularized stochastic learning and online optimization problems, where the objective function is the sum of two convex terms: one is the loss function of the learning task, and the other is a simple regularization term such as the ℓ1-norm for promoting sparsity. We develop extensions of Nesterov’s dual averaging method that can exploit the regularization structure in an online setting. At each iteration of these methods, the learning variables are adjusted by solving a simple minimization problem that involves the running average of all past subgradients of the loss function and the whole regularization term, not just its subgradient. In the case of ℓ1-regularization, our method is particularly effective in obtaining sparse solutions. We show that these methods achieve the optimal convergence rates or regret bounds that are standard in the literature on stochastic and online convex optimization. For stochastic learning problems in which the loss functions have Lipschitz continuous gradients, we also present an accelerated version of the dual averaging method.
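The "simple minimization problem" has a closed-form solution when the regularizer is the ℓ1-norm: components of the averaged subgradient below the threshold λ are truncated to exactly zero. A minimal sketch, assuming the step-size schedule β_t = γ√t (one common choice, not necessarily the paper's):

```python
import numpy as np

def rda_l1_step(g_avg, t, lam, gamma=1.0):
    """Closed-form l1 dual-averaging update: minimize over w
    <g_avg, w> + lam*||w||_1 + (gamma / sqrt(t)) * (1/2)*||w||^2.
    Coordinates with |g_avg_i| <= lam come out exactly zero."""
    shrunk = np.sign(g_avg) * np.maximum(np.abs(g_avg) - lam, 0.0)
    return -(np.sqrt(t) / gamma) * shrunk
```

Note that truncation is applied to the running average of all past subgradients, not to the current iterate, which is the distinguishing feature of the dual-averaging family.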
Improved part-of-speech tagging for online conversational text with word clusters
 In Proceedings of NAACL
, 2013
Abstract

Cited by 95 (7 self)
We consider the problem of part-of-speech tagging for informal, online conversational text. We systematically evaluate the use of large-scale unsupervised word clustering and new lexical features to improve tagging accuracy. With these features, our system achieves state-of-the-art tagging results on both Twitter and IRC POS tagging tasks; Twitter tagging is improved from 90% to 93% accuracy (more than 3% absolute). Qualitative analysis of these word clusters yields insights about NLP and linguistic phenomena in this genre. Additionally, we contribute the first POS annotation guidelines for such text and release a new dataset of English language tweets annotated using these guidelines. Tagging software, annotation guidelines, and large-scale word clusters are available at:
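One standard way to plug hierarchical word clusters into a tagger is to emit bit-string prefixes of a word's cluster ID at several depths, so rare words share features with common ones. The function and feature names below are hypothetical, not the paper's templates:

```python
def cluster_features(word, clusters, prefixes=(2, 4, 6)):
    """Hierarchical word-cluster features: bit-string prefixes of a word's
    Brown-style cluster ID at several depths. `clusters` maps lowercased
    word -> bit string; unknown words get no cluster features."""
    bits = clusters.get(word.lower())
    if bits is None:
        return []
    return [f"cluster_prefix_{p}={bits[:p]}" for p in prefixes if len(bits) >= p]
```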
More Generality in Efficient Multiple Kernel Learning
Abstract

Cited by 80 (3 self)
Recent advances in Multiple Kernel Learning (MKL) have positioned it as an attractive tool for tackling many supervised learning tasks. The development of efficient gradient-descent-based optimization schemes has made it possible to tackle large-scale problems. Simultaneously, MKL-based algorithms have achieved very good results on challenging real-world applications. Yet, despite their successes, MKL approaches are limited in that they focus on learning a linear combination of given base kernels. In this paper, we observe that existing MKL formulations can be extended to learn general kernel combinations subject to general regularization. This can be achieved while retaining all the efficiency of existing large-scale optimization algorithms. To highlight the advantages of generalized kernel learning, we tackle feature selection problems on benchmark vision and UCI databases. It is demonstrated that the proposed formulation can lead to better results not only as compared to traditional MKL but also as compared to state-of-the-art wrapper and filter methods for feature selection.
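The baseline this work generalizes is the linear combination K = Σ_k d_k K_k with weights constrained to the simplex. A minimal sketch of that ingredient (illustrative only; not the paper's optimization scheme):

```python
import numpy as np

def combined_kernel(kernels, d):
    """Linear MKL combination K = sum_k d_k K_k of base Gram matrices."""
    return sum(w * K for w, K in zip(d, kernels))

def project_simplex(v):
    """Euclidean projection onto {d : d >= 0, sum(d) = 1} -- the usual way
    gradient-based MKL keeps kernel weights valid after each step."""
    v = np.asarray(v, dtype=float)
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)
```

A gradient-descent MKL loop alternates an SVM solve for fixed `d` with a gradient step on `d` followed by `project_simplex`; the paper's point is that richer combinations and regularizers fit the same template.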
Understanding Natural Language Commands for Robotic Navigation and Mobile Manipulation
 In Proceedings of the National Conference on Artificial Intelligence
, 2011
Abstract

Cited by 76 (16 self)
This paper describes a new model for understanding natural language commands given to autonomous systems that perform navigation and mobile manipulation in semi-structured environments. Previous approaches have used models with fixed structure to infer the likelihood of a sequence of actions given the environment and the command. In contrast, our framework, called Generalized Grounding Graphs (G³), dynamically instantiates a probabilistic graphical model for a particular natural language command according to the command’s hierarchical and compositional semantic structure. Our system performs inference in the model to successfully find and execute plans corresponding to natural language commands such as “Put the tire pallet on the truck.” The model is trained using a corpus of commands collected using crowdsourcing. We pair each command with robot actions and use the corpus to learn the parameters of the model. We evaluate the robot’s performance by inferring plans from natural language commands, executing each plan in a realistic robot simulator, and asking users to evaluate the system’s performance. We demonstrate that our system can successfully follow many natural language commands from the corpus.
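The compositional idea — one grounding variable and one factor per linguistic constituent, wired together by the parse — can be sketched with a toy recursive instantiation. This is purely illustrative and is not the paper's actual G³ factorization or inference:

```python
from dataclasses import dataclass, field

@dataclass
class GroundingNode:
    """One linguistic constituent, conceptually paired with a grounding variable."""
    phrase: str
    children: list = field(default_factory=list)

def build_grounding_graph(node, factors=None):
    """Instantiate one factor per constituent, linking its grounding to its
    children's groundings, and recurse -- so the graph's shape follows the
    command's compositional structure rather than a fixed template."""
    if factors is None:
        factors = []
    factors.append((node.phrase, [c.phrase for c in node.children]))
    for c in node.children:
        build_grounding_graph(c, factors)
    return factors
```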
Conditional random fields for activity recognition
 In Proceedings of the Sixth International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS 2007)
, 2007
Cited by 76 (0 self)
Practical very large scale CRFs
 In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL)
Abstract

Cited by 69 (8 self)
Conditional Random Fields (CRFs) are a widely-used approach for supervised sequence labelling, notably due to their ability to handle large description spaces and to integrate structural dependency between labels. Even for the simple linear-chain model, taking structure into account implies a number of parameters and a computational effort that grow quadratically with the cardinality of the label set. In this paper, we address the issue of training very large CRFs, containing up to hundreds of output labels and several billion features. Efficiency stems here from the sparsity induced by the use of an ℓ1 penalty term. Based on our own implementation, we compare three recent proposals for implementing this regularization strategy. Our experiments demonstrate that very large CRFs can be trained efficiently and that very large models are able to improve the accuracy, while delivering compact parameter sets.
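The sparsity mechanism the abstract relies on is generic: a gradient step on the smooth log-likelihood followed by soft-thresholding drives the weights of uninformative features exactly to zero. A minimal proximal-gradient sketch (not one of the three proposals the paper actually compares):

```python
import numpy as np

def prox_l1_step(w, grad, lr, lam):
    """One proximal-gradient step for an l1-penalized objective: a gradient
    step on the smooth loss, then soft-thresholding at lr*lam. Weights whose
    magnitude falls below the threshold become exactly zero, which is what
    makes billion-feature CRF models compact."""
    w = w - lr * grad
    return np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)
```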
An Introduction to Conditional Random Fields
 Foundations and Trends in Machine Learning
, 2012
StatSnowball: a Statistical Approach to Extracting Entity Relationships
 In WWW 2009 (Madrid), Data Mining Track / Statistical Methods Session
, 2009
Abstract

Cited by 56 (2 self)
Traditional relation extraction methods require pre-specified relations and relation-specific human-tagged examples. Bootstrapping systems significantly reduce the number of training examples, but they usually apply heuristic-based methods to combine a set of strict hard rules, which limit the ability to generalize and thus yield low recall. Furthermore, existing bootstrapping methods do not perform open information extraction (Open IE), which can identify various types of relations without requiring pre-specification. In this paper, we propose a statistical extraction framework called Statistical Snowball (StatSnowball), which is a bootstrapping system and can perform both traditional relation extraction and Open IE. StatSnowball uses the discriminative Markov logic networks
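The hard-rule bootstrapping that StatSnowball improves on can be caricatured in a few lines: sentences containing seed pairs become extraction patterns, and patterns extract new pairs. A toy skeleton (illustrative only; real systems score patterns and tuples statistically rather than trusting exact matches, which is precisely the brittleness the abstract criticizes):

```python
import re

def bootstrap(corpus, seeds, rounds=1):
    """Naive Snowball-style loop: turn sentences containing known pairs into
    regex patterns with (\\w+) slots, then match those patterns against the
    corpus to extract new pairs."""
    pairs = set(seeds)
    for _ in range(rounds):
        patterns = set()
        for a, b in list(pairs):
            for sent in corpus:
                if a in sent and b in sent:
                    pat = re.escape(sent)
                    pat = pat.replace(re.escape(a), r"(\w+)", 1)
                    pat = pat.replace(re.escape(b), r"(\w+)", 1)
                    patterns.add(pat)
        for pat in patterns:
            for sent in corpus:
                m = re.fullmatch(pat, sent)
                if m and len(m.groups()) == 2:
                    pairs.add((m.group(1), m.group(2)))
    return pairs
```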
Discriminative Structure and Parameter Learning for Markov Logic Networks
Abstract

Cited by 56 (5 self)
Markov logic networks (MLNs) are an expressive representation for statistical relational learning that generalizes both first-order logic and graphical models. Existing methods for learning the logical structure of an MLN are not discriminative; however, many relational learning problems involve specific target predicates that must be inferred from given background information. We found that existing MLN methods perform very poorly on several such ILP benchmark problems, and we present improved discriminative methods for learning MLN clauses and weights that outperform existing MLN and traditional ILP methods.
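The log-linear semantics that make MLNs generalize graphical models — a world's unnormalized probability is exp of the weighted count of true clause groundings — is compact enough to sketch. This is illustrative, not any particular MLN system's API:

```python
import math

def mln_world_weight(world, weighted_clauses):
    """Unnormalized MLN probability exp(sum_i w_i * n_i(x)), where n_i(x)
    counts the groundings of clause i that are true in world x.
    `world` maps ground-atom names to booleans; each entry of
    `weighted_clauses` is (weight, list of grounding predicates)."""
    total = 0.0
    for weight, groundings in weighted_clauses:
        total += weight * sum(1 for g in groundings if g(world))
    return math.exp(total)
```

For example, the clause Smokes(x) ⇒ Cancer(x) grounded for a single constant A is one predicate over the world; worlds satisfying more weighted groundings get exponentially more mass.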