A framework for learning predictive structures from multiple tasks and unlabeled data
Journal of Machine Learning Research, 2005
Cited by 443 (3 self)
One of the most important issues in machine learning is whether one can improve the performance of a supervised learning algorithm by including unlabeled data. Methods that use both labeled and unlabeled data are generally referred to as semisupervised learning. Although a number of such methods have been proposed, at the current stage, we still don’t have a complete understanding of their effectiveness. This paper investigates a closely related problem, which leads to a novel approach to semisupervised learning. Specifically, we consider learning predictive structures on hypothesis spaces (that is, what kind of classifiers have good predictive power) from multiple learning tasks. We present a general framework in which the structural learning problem can be formulated and analyzed theoretically, and relate it to learning with unlabeled data. Under this framework, algorithms for structural learning will be proposed, and computational issues will be investigated. Experiments will be given to demonstrate the effectiveness of the proposed algorithms in the semisupervised learning setting.
Regularized multitask learning
2004
Cited by 277 (2 self)
This paper provides a foundation for multi-task learning using reproducing kernel Hilbert spaces of vector-valued functions. In this setting, the kernel is a matrix-valued function. Some explicit examples will be described which go beyond our earlier results in [7]. In particular, we characterize classes of matrix-valued kernels which are linear and are of the dot product or the translation invariant type. We discuss how these kernels can be used to model relations between the tasks and present linear multi-task learning algorithms. Finally, we present a novel proof of the representer theorem for a minimizer of a regularization functional which is based on the notion of minimal norm interpolation.
Convex multitask feature learning
Machine Learning, 2007
Cited by 258 (25 self)
We present a method for learning sparse representations shared across multiple tasks. This method is a generalization of the well-known single-task 1-norm regularization. It is based on a novel non-convex regularizer which controls the number of learned features common across the tasks. We prove that the method is equivalent to solving a convex optimization problem for which there is an iterative algorithm which converges to an optimal solution. The algorithm has a simple interpretation: it alternately performs a supervised and an unsupervised step, where in the former step it learns task-specific functions and in the latter step it learns common-across-tasks sparse representations for these functions. We also provide an extension of the algorithm which learns sparse nonlinear representations using kernels. We report experiments on simulated and real data sets which demonstrate that the proposed method can both improve the performance relative to learning each task independently and lead to a few learned features common across related tasks. Our algorithm can also be used, as a special case, to simply select – not learn – a few common variables across the tasks.
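The abstract does not spell out its regularizer, so the following is only a sketch under an assumption: one standard penalty that generalizes the single-task 1-norm to several tasks is the (2,1)-norm of a weight matrix, which sums the Euclidean norms of its rows. The matrix layout (features as rows, tasks as columns) and the name `l21_norm` are illustrative choices, not taken from the paper.

```python
import numpy as np

def l21_norm(W):
    """(2,1)-norm: sum over features (rows) of the 2-norm across tasks (columns).

    Assumed layout: W[i, t] is the weight of feature i in task t.
    A row that is entirely zero contributes nothing, so a small penalty
    favors few features shared by all tasks.
    """
    return float(np.linalg.norm(W, axis=1).sum())

# Three features, two tasks: feature 1 is unused by both tasks.
W = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [1.0, 0.0]])
print(l21_norm(W))         # 5 + 0 + 1 = 6.0

# With a single task the penalty reduces to the ordinary 1-norm.
w_single = np.array([[3.0], [-2.0], [0.0]])
print(l21_norm(w_single))  # |3| + |-2| + |0| = 5.0
```

The single-column case shows the sense in which such a penalty generalizes 1-norm regularization: each row norm collapses to an absolute value.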
Learning Multiple Tasks with Kernel Methods
Journal of Machine Learning Research, 2005
Cited by 251 (10 self)
Editor: John Shawe-Taylor
We study the problem of learning many related tasks simultaneously using kernel methods and regularization. The standard single-task kernel methods, such as support vector machines and regularization networks, are extended to the case of multitask learning. Our analysis shows that the problem of estimating many task functions with regularization can be cast as a single-task learning problem if a family of multitask kernel functions we define is used. These kernels model relations among the tasks and are derived from a novel form of regularizers. Specific kernels that can be used for multitask learning are provided and experimentally tested on two real data sets. In agreement with past empirical work on multitask learning, the experiments show that learning multiple related tasks simultaneously using the proposed approach can significantly outperform standard single-task learning, particularly when there are many related tasks but few data per task.
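A minimal sketch of the idea of casting multitask learning as a single-task problem: treat each example as a pair (input, task index) and use a kernel on such pairs. The product form below, a task-coupling matrix `B` times a linear base kernel, and all numeric values are assumptions made for illustration; the abstract does not specify the kernels the authors define.

```python
import numpy as np

def multitask_gram(X, tasks, B):
    """Gram matrix of an assumed kernel K((x, s), (z, t)) = B[s, t] * <x, z>.

    X     : (n, d) inputs, pooled over all tasks
    tasks : (n,) integer task index of each row of X
    B     : (T, T) symmetric positive semidefinite task-coupling matrix
    """
    task_part = B[np.ix_(tasks, tasks)]  # B[s_i, s_j] for every pair of examples
    input_part = X @ X.T                 # linear base kernel <x_i, x_j>
    return task_part * input_part        # elementwise product

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
tasks = np.array([0, 0, 1, 1, 2, 2])

# Hypothetical coupling: every pair of tasks shares a common component,
# and each task also has its own independent component.
B = 2.0 * np.ones((3, 3)) + np.eye(3)

G = multitask_gram(X, tasks, B)
print(G.shape)
```

Because `G` is an ordinary positive semidefinite Gram matrix over the pooled examples, any single-task kernel method that accepts a precomputed kernel can be run on it unchanged.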
Multitask feature learning
Advances in Neural Information Processing Systems 19, 2007
Cited by 240 (8 self)
We present a method for learning a low-dimensional representation which is shared across a set of multiple related tasks. The method builds upon the well-known 1-norm regularization problem using a new regularizer which controls the number of learned features common for all the tasks. We show that this problem is equivalent to a convex optimization problem and develop an iterative algorithm for solving it. The algorithm has a simple interpretation: it alternately performs a supervised and an unsupervised step, where in the latter step we learn common-across-tasks representations and in the former step we learn task-specific functions using these representations. We report experiments on a simulated and a real data set which demonstrate that the proposed method dramatically improves the performance relative to learning each task independently. Our algorithm can also be used, as a special case, to simply select – not learn – a few common features across the tasks.
Predictive Representations of State
In Advances in Neural Information Processing Systems 14, 2001
Cited by 223 (41 self)
We show that states of a dynamical system can be usefully represented by multistep, action-conditional predictions of future observations.
Multitask Gaussian Process Prediction
Cited by 144 (6 self)
In this paper we investigate multitask learning in the context of Gaussian Processes (GP). We propose a model that learns a shared covariance function on input-dependent features and a “free-form” covariance matrix over tasks. This allows for good flexibility when modelling inter-task dependencies while avoiding the need for large amounts of data for training. We show that under the assumption of noise-free observations and a block design, predictions for a given task only depend on its target values and therefore a cancellation of inter-task transfer occurs. We evaluate the benefits of our model on two practical applications: a compiler performance prediction problem and an exam score prediction task. Additionally, we make use of GP approximations and properties of our model in order to provide scalability to large data sets.
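The cancellation claim can be checked numerically in a small sketch. It assumes the joint covariance is the Kronecker product of a task matrix `Kf` and an input kernel matrix `Kx`, which is one natural reading of "a shared covariance function on inputs and a free-form covariance matrix over tasks"; the RBF kernel, the function names, and all numbers are illustrative assumptions.

```python
import numpy as np

def rbf(a, b, lengthscale=1.0):
    """Squared-exponential kernel between two 1-D arrays of inputs."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def predict_task(Kf, X, Y, x_star, task, noise=0.0):
    """GP predictive mean for one task at x_star.

    Block design: every task is observed at the same inputs X.
    Y has shape (num_tasks, n); flattening it row by row matches the
    task-major ordering of np.kron(Kf, Kx).
    """
    Kx = rbf(X, X)
    K = np.kron(Kf, Kx) + noise * np.eye(Kf.shape[0] * len(X))
    kx_star = rbf(np.array([x_star]), X)[0]
    k_star = np.kron(Kf[task], kx_star)  # covariance of the test point with all targets
    return float(k_star @ np.linalg.solve(K, Y.reshape(-1)))

X = np.array([0.0, 1.0, 2.0, 3.0])
Kf = np.array([[1.0, 0.8],
               [0.8, 1.0]])              # strongly correlated tasks

Y_a = np.array([[1.0, 2.0, 0.0, -1.0],
                [0.5, 1.0, 3.0, 2.0]])
Y_b = Y_a.copy()
Y_b[1] += 5.0                            # change only the other task's targets

m_a = predict_task(Kf, X, Y_a, x_star=1.5, task=0)
m_b = predict_task(Kf, X, Y_b, x_star=1.5, task=0)
print(abs(m_a - m_b))                    # numerically zero in the noise-free case
```

Passing a positive `noise` breaks the exact Kronecker structure of the inverse, so in general the prediction for task 0 then does depend on the other task's targets.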
Multitask learning for classification with dirichlet process priors
Journal of Machine Learning Research, 2007
Cited by 140 (12 self)
Multitask learning (MTL) is considered for logistic-regression classifiers, based on a Dirichlet process (DP) formulation. A symmetric MTL (SMTL) formulation is considered in which classifiers for multiple tasks are learned jointly, with a variational Bayesian (VB) solution. We also consider an asymmetric MTL (AMTL) formulation in which the posterior density function from the SMTL model parameters, from previous tasks, is used as a prior for a new task; this approach has the significant advantage of not requiring storage and use of all previous data from prior tasks. The AMTL formulation is solved with a simple Markov Chain Monte Carlo (MCMC) construction. Comparisons are also made to simpler approaches, such as single-task learning, pooling of data across tasks, and simplified approximations to DP. A comprehensive analysis of algorithm performance is addressed through consideration of two data sets that are matched to the MTL problem.
Exploiting Task Relatedness for Multiple Task Learning
2003
Cited by 114 (1 self)
The approach of learning of multiple "related" tasks simultaneously has proven quite successful in practice; however, theoretical justification for this success has remained elusive. The starting point of previous work on multiple task learning has been that the tasks to be learnt jointly are somehow "algorithmically related", in the sense that the results of applying a specific learning algorithm to these tasks are assumed to be similar. We take a logical step backwards and offer a data generating mechanism through which our notion of task-relatedness is defined.