Results 1–10 of 18
Accelerated gradient method for multi-task sparse learning problem
In Proceedings of the International Conference on Data Mining, 2009
Abstract
Cited by 24 (1 self)
Abstract—Many real-world learning problems can be recast as multi-task learning problems, which exploit correlations among different tasks to obtain better generalization performance than learning each task individually. The feature selection problem in the multi-task setting has many applications in the fields of computer vision, text classification, and bioinformatics. Generally, it can be realized by solving an ℓ1,∞-regularized optimization problem, whose solution automatically yields joint sparsity among the different tasks. However, due to the non-smooth nature of the ℓ1,∞ norm, an efficient training algorithm for solving such problems with general convex loss functions has been lacking. In this paper, we propose an accelerated gradient method based on an "optimal" first-order black-box method due to Nesterov and provide the convergence rate for smooth convex loss functions. For non-smooth convex loss functions, such as the hinge loss, our method still converges fast empirically. Moreover, by exploiting the structure of the ℓ1,∞ ball, we solve the black-box oracle in Nesterov's method by a simple sorting scheme. Our method is suitable for large-scale multi-task learning problems, since it uses only first-order information and is very easy to implement. Experimental results show that our method significantly outperforms state-of-the-art methods in both convergence speed and learning accuracy.
Keywords: multi-task learning; ℓ1,∞ regularization; optimal method; gradient descent
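The ℓ1,∞ projection by sorting that the abstract mentions is the involved part of the method; the accelerated first-order backbone it builds on, however, is compact. Below is a minimal sketch of a Nesterov-style accelerated gradient loop on a smooth toy objective (plain least squares); the function name and the toy problem are ours, and the paper's actual method additionally applies the ℓ1,∞ projection at every iteration.

```python
import numpy as np

def accelerated_gradient(grad, L, x0, iters=500):
    """Nesterov-style accelerated gradient descent for a smooth convex
    objective with an L-Lipschitz gradient (illustrative sketch only;
    the paper's method adds an l1,inf projection step each iteration)."""
    x_prev = x0.copy()
    y = x0.copy()
    t = 1.0
    for _ in range(iters):
        x = y - grad(y) / L                          # gradient step from the lookahead point
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x + ((t - 1.0) / t_next) * (x - x_prev)  # momentum extrapolation
        x_prev, t = x, t_next
    return x_prev

# toy smooth problem: minimize 0.5 * ||A w - b||^2
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 5))
b = rng.standard_normal(30)
L = np.linalg.norm(A.T @ A, 2)  # Lipschitz constant of the gradient
w = accelerated_gradient(lambda v: A.T @ (A @ v - b), L, np.zeros(5))
```

On smooth convex problems this scheme attains the O(1/k²) "optimal" first-order rate the abstract refers to, versus O(1/k) for plain gradient descent.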
Online Learning for Group Lasso
Abstract
Cited by 16 (2 self)
We develop a novel online learning algorithm for the group lasso that efficiently finds the important explanatory factors in a grouped manner. Unlike traditional batch-mode group lasso algorithms, which suffer from inefficiency and poor scalability, our proposed algorithm operates in an online mode and scales well: at each iteration, the weight vector is updated according to a closed-form solution based on the average of previous subgradients. The proposed online algorithm is therefore very efficient and scalable, which is guaranteed by its low worst-case time complexity and memory cost, both of order O(d), where d is the number of dimensions. Moreover, in order to achieve more sparsity at both the group level and the individual feature level, we extend our online system to efficiently solve a number of variants of sparse group lasso models. We also show that the online system is applicable to other group lasso models, such as the group lasso with overlap and the graph lasso. Finally, we demonstrate the merits of our algorithm through experiments on both synthetic and real-world datasets.
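The closed-form per-iteration update described above can be sketched as a group soft-threshold of the running average of subgradients, in the style of regularized dual averaging. The scaling factor sqrt(t)/gamma and all names below are our assumptions for illustration, not taken from the paper.

```python
import numpy as np

def rda_group_update(avg_subgrad, groups, t, lam, gamma=1.0):
    """One closed-form weight update in the dual-averaging style the
    abstract describes: each group of the new weight vector is obtained
    by group-soft-thresholding the average of past subgradients.
    Sketch under an assumed RDA-style step scaling sqrt(t)/gamma."""
    w = np.zeros_like(avg_subgrad)
    for g in groups:                      # g: list of feature indices in one group
        gbar = avg_subgrad[g]
        norm = np.linalg.norm(gbar)
        if norm > lam:                    # group survives thresholding
            w[g] = -(np.sqrt(t) / gamma) * (1.0 - lam / norm) * gbar
        # else: the whole group stays exactly zero -> group-level sparsity
    return w

gbar = np.array([0.05, -0.05, 3.0, 4.0])  # average subgradient so far
w = rda_group_update(gbar, groups=[[0, 1], [2, 3]], t=4, lam=0.1)
```

Note the O(d) cost per update: only the averaged subgradient vector needs to be stored, matching the memory bound quoted in the abstract.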
The Infinite Push: A New Support Vector Ranking Algorithm that Directly Optimizes Accuracy at the Absolute Top of the List
Abstract
Cited by 16 (1 self)
Ranking problems have become increasingly important in machine learning and data mining in recent years, with applications ranging from information retrieval and recommender systems to computational biology and drug discovery. In this paper, we describe a new ranking algorithm that directly maximizes the number of relevant objects retrieved at the absolute top of the list. The algorithm is a support-vector-style algorithm, but due to the different objective, it no longer leads to a quadratic programming problem. Instead, the dual optimization problem involves ℓ1,∞ constraints; we solve this dual problem using the recent ℓ1,∞ projection method of Quattoni et al. (2009). Our algorithm can be viewed as an ℓ∞-norm extreme of the ℓp-norm-based algorithm of Rudin (2009) (albeit in a support vector setting rather than a boosting setting); thus we refer to it as the 'Infinite Push'. Experiments on real-world data sets confirm the algorithm's focus on accuracy at the absolute top of the list.
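The "push" objective the abstract alludes to can be made concrete in its zero-one form: instead of summing ranking errors over all negatives, take the maximum over negatives of how many positives each one outranks. This is an illustrative restatement of the objective, not the convex surrogate the paper actually optimizes.

```python
def infinite_push_loss(pos_scores, neg_scores):
    """Zero-one form of the 'infinite push' objective: the number of
    relevant (positive) items scored at or below the WORST-placed
    negative, i.e. a max over negatives rather than a sum (sketch;
    the paper optimizes a convex surrogate of this quantity)."""
    return max(sum(p <= n for p in pos_scores) for n in neg_scores)

# positives scored [3.0, 2.0, 0.5], negatives [1.0, 2.5]:
# the negative at 2.5 outranks the positives at 2.0 and 0.5
loss = infinite_push_loss([3.0, 2.0, 0.5], [1.0, 2.5])
```

Taking the max (an ℓ∞ aggregation over negatives) is what concentrates the penalty on errors at the absolute top of the list, versus the ℓp aggregation of Rudin's p-norm push.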
Graphical Model Structure Learning with ℓ1-Regularization
2010
Abstract
Cited by 4 (1 self)
This work looks at fitting probabilistic graphical models to data when the structure is not known. The main tools for doing this are ℓ1-regularization and the more general group ℓ1-regularization. We describe limited-memory quasi-Newton methods for solving optimization problems with these types of regularizers, and we examine learning directed acyclic graphical models with ℓ1-regularization, learning undirected graphical models with group ℓ1-regularization, and learning hierarchical log-linear models with overlapping group ℓ1-regularization.
Multi-Task Minimum Error Rate Training for SMT
2011
Abstract
Cited by 1 (0 self)
We present experiments on multi-task learning for discriminative training in statistical machine translation (SMT), extending standard minimum-error-rate training (MERT) with techniques that take advantage of the similarity of related tasks. We apply our techniques to German-to-English translation of patents from 8 tasks defined according to the International Patent Classification (IPC) system. Our experiments show statistically significant gains over task-specific training from techniques that model commonalities through shared parameters. However, more fine-grained combinations of shared parameters with task-specific ones could not be brought to bear on models with a small number of dense features. The software used in the experiments is released as an open-source tool.
unknown title
Abstract
BIOINFORMATICS, Vol. 28, ISMB 2012, pages i127–i136, doi:10.1093/bioinformatics/bts228. Identifying disease-sensitive and quantitative trait-relevant biomarkers from multi-dimensional heterogeneous imaging genetics data via sparse multimodal multi-task learning
Structured Sparsity in Structured Prediction
Abstract
Linear models have enjoyed great success in structured prediction in NLP. While much progress has been made on efficient training with various loss functions, the problem of endowing learners with a mechanism for feature selection remains unsolved. Common approaches employ ad hoc filtering or ℓ1-regularization; both ignore the structure of the feature space, preventing practitioners from encoding structural prior knowledge. We fill this gap by adopting regularizers that promote structured sparsity, along with efficient algorithms to handle them. Experiments on three tasks (chunking, entity recognition, and dependency parsing) show gains in performance, compactness, and model interpretability.
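The canonical building block behind structured-sparsity regularizers like those discussed above is the proximal operator of the (non-overlapping) group penalty, which zeroes whole groups of features at once. This is a generic sketch of that operator; the paper itself handles richer, possibly overlapping group structures.

```python
import numpy as np

def prox_group_lasso(v, groups, tau):
    """Proximal operator of tau * sum_g ||v_g||_2 for disjoint groups:
    each group is shrunk toward zero, and groups whose norm falls below
    tau are zeroed entirely (generic sketch, not the paper's algorithm)."""
    out = np.zeros_like(v, dtype=float)
    for g in groups:
        norm = np.linalg.norm(v[g])
        if norm > tau:
            out[g] = (1.0 - tau / norm) * v[g]  # shrink the surviving group
        # groups with norm <= tau stay zero -> whole feature templates drop out
    return out

z = prox_group_lasso(np.array([3.0, 4.0, 0.2, -0.1]), [[0, 1], [2, 3]], tau=1.0)
```

Grouping weights by feature template (rather than penalizing each weight independently, as ℓ1 does) is what lets the learner encode the structural prior knowledge the abstract refers to.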