


Learning multiple related tasks using latent independent component analysis (2006)

by J. Zhang, Z. Ghahramani, Y. Yang
Venue: NIPS


Results 1–10 of 53 citing documents

Bayesian Compressive Sensing

by Shihao Ji, Ya Xue, Lawrence Carin, 2007
"... The data of interest are assumed to be represented as N-dimensional real vectors, and these vectors are compressible in some linear basis B, implying that the signal can be reconstructed accurately using only a small number M ≪ N of basis-function coefficients associated with B. Compressive sensing ..."
Abstract - Cited by 330 (24 self) - Add to MetaCart
The data of interest are assumed to be represented as N-dimensional real vectors, and these vectors are compressible in some linear basis B, implying that the signal can be reconstructed accurately using only a small number M ≪ N of basis-function coefficients associated with B. Compressive sensing is a framework whereby one does not measure one of the aforementioned N-dimensional signals directly, but rather a set of related measurements, with the new measurements a linear combination of the original underlying N-dimensional signal. The number of required compressive-sensing measurements is typically much smaller than N, offering the potential to simplify the sensing system. Let f denote the unknown underlying N-dimensional signal, and g a vector of compressive-sensing measurements; then one may approximate f accurately by utilizing knowledge of the (under-determined) linear relationship between f and g, in addition to knowledge of the fact that f is compressible in B. In this paper we employ a Bayesian formalism for estimating the underlying signal f based on compressive-sensing measurements g. The proposed framework has the following properties: (i) in addition to estimating the underlying signal f, “error bars” are also estimated, these giving a measure of confidence in the inverted signal; (ii) using knowledge of the error bars, a principled means is provided for determining when a sufficient …

Citation Context

...i satisfies ûi = Ψθ̂i. Typical approaches to information transfer among tasks include: sharing hidden nodes in neural networks [16]–[18], placing a common prior in hierarchical Bayesian models [19]–[22], sharing parameters of Gaussian processes [23], sharing a common structure on the predictor space [24], and structured regularization in kernel methods [25], among others. In statistics, the problem ...
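The Bayesian recovery with "error bars" that property (i) of the abstract describes can be illustrated with ordinary Gaussian linear-model algebra. A minimal sketch, assuming a fixed Gaussian prior on the basis coefficients (the paper itself uses a hierarchical, RVM-style sparsity prior); sizes and constants are illustrative:

import numpy as np

rng = np.random.default_rng(0)
N, M = 256, 64                                 # signal length, number of measurements

# Signal compressible in a basis B: here B is the identity and f is sparse.
theta = np.zeros(N)
theta[rng.choice(N, 8, replace=False)] = rng.normal(size=8)
B = np.eye(N)
f = B @ theta

Phi = rng.normal(size=(M, N)) / np.sqrt(M)     # compressive measurement matrix
sigma2 = 1e-3
g = Phi @ f + np.sqrt(sigma2) * rng.normal(size=M)   # g: M << N linear measurements

# Gaussian posterior over the coefficients: mean and covariance in closed form.
A = Phi @ B
alpha = 1.0                                    # assumed fixed prior precision
Sigma = np.linalg.inv(alpha * np.eye(N) + A.T @ A / sigma2)
mu = Sigma @ A.T @ g / sigma2                  # posterior mean of theta

f_hat = B @ mu                                 # point estimate of the signal
error_bars = np.sqrt(np.diag(B @ Sigma @ B.T)) # per-component posterior std: the "error bars"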

Convex multi-task feature learning

by Andreas Argyriou, Theodoros Evgeniou, Massimiliano Pontil - Machine Learning, 2007
"... We present a method for learning sparse representations shared across multiple tasks. This method is a generalization of the well-known single-task 1-norm regularization. It is based on a novel non-convex regularizer which controls the number of learned features common across the tasks. We prove th ..."
Abstract - Cited by 258 (25 self) - Add to MetaCart
We present a method for learning sparse representations shared across multiple tasks. This method is a generalization of the well-known single-task 1-norm regularization. It is based on a novel non-convex regularizer which controls the number of learned features common across the tasks. We prove that the method is equivalent to solving a convex optimization problem for which there is an iterative algorithm that converges to an optimal solution. The algorithm has a simple interpretation: it alternately performs a supervised and an unsupervised step, where in the former step it learns task-specific functions and in the latter step it learns common-across-tasks sparse representations for these functions. We also provide an extension of the algorithm which learns sparse nonlinear representations using kernels. We report experiments on simulated and real data sets which demonstrate that the proposed method can both improve the performance relative to learning each task independently and lead to a few learned features common across related tasks. Our algorithm can also be used, as a special case, to simply select – not learn – a few common variables across the tasks.

Citation Context

...ed learning (e.g., using 1-norm regularization) or for unsupervised learning (e.g., using principal component analysis (PCA) or independent component analysis (ICA)), there has been only limited work [3, 9, 31, 48] in the multi-task supervised learning setting. In this paper, we present a novel method for learning sparse representations common across many supervised learning tasks. In particular, we develop a n...
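A minimal sketch of the alternating scheme the abstract describes, specialized to squared loss: the supervised step solves a per-task ridge problem in the metric induced by the shared matrix D, and the unsupervised step updates D from the stacked task parameters W. The closed-form updates and the small eps smoothing are assumptions consistent with the abstract, not the authors' exact implementation:

import numpy as np

def sqrtm_psd(S):
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(S)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

def multitask_feature_learning(Xs, ys, gamma=1.0, eps=1e-6, iters=50):
    """Xs, ys: per-task design matrices and targets; returns W (d x T) and D."""
    d = Xs[0].shape[1]
    D = np.eye(d) / d                          # feasible start: trace(D) = 1
    for _ in range(iters):
        Dinv = np.linalg.inv(D + eps * np.eye(d))
        # Supervised step: closed-form ridge solution for each task.
        W = np.column_stack([np.linalg.solve(X.T @ X + gamma * Dinv, X.T @ y)
                             for X, y in zip(Xs, ys)])
        # Unsupervised step: D = (W W^T)^{1/2} / trace((W W^T)^{1/2}).
        root = sqrtm_psd(W @ W.T)
        D = root / np.trace(root)
    return W, D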

Multi-task feature learning

by Andreas Argyriou, Theodoros Evgeniou, Massimiliano Pontil - Advances in Neural Information Processing Systems 19, 2007
"... We present a method for learning a low-dimensional representation which is shared across a set of multiple related tasks. The method builds upon the wellknown 1-norm regularization problem using a new regularizer which controls the number of learned features common for all the tasks. We show that th ..."
Abstract - Cited by 240 (8 self) - Add to MetaCart
We present a method for learning a low-dimensional representation which is shared across a set of multiple related tasks. The method builds upon the well-known 1-norm regularization problem using a new regularizer which controls the number of learned features common for all the tasks. We show that this problem is equivalent to a convex optimization problem and develop an iterative algorithm for solving it. The algorithm has a simple interpretation: it alternately performs a supervised and an unsupervised step, where in the latter step we learn common-across-tasks representations and in the former step we learn task-specific functions using these representations. We report experiments on a simulated and a real data set which demonstrate that the proposed method dramatically improves the performance relative to learning each task independently. Our algorithm can also be used, as a special case, to simply select – not learn – a few common features across the tasks.

Citation Context

...lgorithm can also be used, as a special case, to simply select – not learn – a few common features across the tasks. 1 Introduction Learning multiple related tasks simultaneously has been empirically [2, 3, 8, 9, 12, 18, 19, 20] as well as theoretically [2, 4, 5] shown to often significantly improve performance relative to learning each task independently. This is the case, for example, when only a few data per task are avai...
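The special case both of these abstracts mention, selecting rather than learning a few common features, amounts to an l2,1 penalty on the rows of the task-parameter matrix W. A sketch using plain proximal gradient descent (an assumed generic solver, not the authors' algorithm):

import numpy as np

def row_soft_threshold(W, tau):
    """Proximal operator of tau * sum_j ||row_j(W)||_2."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    return W * np.clip(1.0 - tau / np.maximum(norms, 1e-12), 0.0, None)

def multitask_group_lasso(Xs, ys, lam=0.1, iters=500):
    """Joint feature selection: rows of W that survive shrinkage are shared."""
    d, T = Xs[0].shape[1], len(Xs)
    step = 1.0 / max(np.linalg.norm(X, 2) ** 2 for X in Xs)  # 1 / Lipschitz constant
    W = np.zeros((d, T))
    for _ in range(iters):
        grad = np.column_stack([X.T @ (X @ W[:, t] - y)
                                for t, (X, y) in enumerate(zip(Xs, ys))])
        W = row_soft_threshold(W - step * grad, step * lam)
    return W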

A spectral regularization framework for multi-task structure learning

by Andreas Argyriou, Massimiliano Pontil, Charles A. Micchelli, Yiming Ying - In NIPS, 2008
"... Learning the common structure shared by a set of supervised tasks is an important practical and theoretical problem. Knowledge of this structure may lead to better generalization performance on the tasks and may also facilitate learning new tasks. We propose a framework for solving this problem, whi ..."
Abstract - Cited by 76 (9 self) - Add to MetaCart
Learning the common structure shared by a set of supervised tasks is an important practical and theoretical problem. Knowledge of this structure may lead to better generalization performance on the tasks and may also facilitate learning new tasks. We propose a framework for solving this problem, which is based on regularization with spectral functions of matrices. This class of regularization problems exhibits appealing computational properties and can be optimized efficiently by an alternating minimization algorithm. In addition, we provide a necessary and sufficient condition for convexity of the regularizer. We analyze concrete examples of the framework, which are equivalent to regularization with Lp matrix norms. Experiments on two real data sets indicate that the algorithm scales well with the number of tasks and improves on state-of-the-art statistical performance.

Citation Context

...ithm scales well with the number of tasks and improves on state-of-the-art statistical performance. 1 Introduction Recently, there has been renewed interest in the problem of multi-task learning, see [2, 4, 5, 14, 16, 19] and references therein. This problem is important in a variety of applications, ranging from conjoint analysis [12], to object detection in computer vision [18], to multiple microarray data set integ...
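The spectral regularizers the abstract refers to are functions of the singular values of the task matrix W; the Lp matrix norms it analyzes are the Schatten norms, with p = 1 giving the trace norm. A sketch of how such a penalty is evaluated (an illustration only, not the paper's alternating minimization algorithm):

import numpy as np

def schatten_p_norm(W, p):
    """||W||_{S_p} = (sum_i sigma_i(W)^p)^(1/p) over singular values sigma_i."""
    sigma = np.linalg.svd(W, compute_uv=False)
    return np.sum(sigma ** p) ** (1.0 / p)

def mtl_objective(W, Xs, ys, gamma, p):
    """Squared loss summed over tasks plus a Schatten-p penalty on W (d x T)."""
    loss = sum(np.sum((X @ W[:, t] - y) ** 2)
               for t, (X, y) in enumerate(zip(Xs, ys)))
    return loss + gamma * schatten_p_norm(W, p) ** p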

Extracting Shared Subspace for Multi-label Classification

by Shuiwang Ji, Lei Tang, Shipeng Yu, Jieping Ye
"... Multi-label problems arise in various domains such as multitopic document categorization and protein function prediction. One natural way to deal with such problems is to construct a binary classifier for each label, resulting in a set of independent binary classification problems. Since the multipl ..."
Abstract - Cited by 54 (2 self) - Add to MetaCart
Multi-label problems arise in various domains such as multitopic document categorization and protein function prediction. One natural way to deal with such problems is to construct a binary classifier for each label, resulting in a set of independent binary classification problems. Since the multiple labels share the same input space, and the semantics conveyed by different labels are usually correlated, it is essential to exploit the correlation information contained in different labels. In this paper, we consider a general framework for extracting shared structures in multi-label classification. In this framework, a common subspace is assumed to be shared among multiple labels. We show that the optimal solution to the proposed formulation can be obtained by solving a generalized eigenvalue problem, though the problem is nonconvex. For high-dimensional problems, direct computation of the solution is expensive, and we develop an efficient algorithm for this case. One appealing feature of the proposed framework is that it includes several well-known algorithms as special cases, thus elucidating their intrinsic relationships. We have conducted extensive experiments on eleven multitopic web page categorization tasks, and results demonstrate the effectiveness of the proposed formulation in comparison with several representative algorithms.

Citation Context

...orms multiple functions. In automated newswire categorization, multiple labels can be associated with a newswire story indicating its subject categories and the regional categories of reported events [34]. One common aspect of these problems is that multiple labels are associated with a single object, and they are hence called multi-label problems. Such problems are more general than the traditional m...
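The reduction to a generalized eigenvalue problem can be sketched generically: given the two symmetric matrices the formulation produces (their construction from the labeled data is in the paper; A and B below are hypothetical stand-ins, with B positive definite), the shared subspace is spanned by the top-k generalized eigenvectors:

import numpy as np
from scipy.linalg import eigh

def shared_subspace(A, B, k):
    """Top-k generalized eigenvectors of A v = lambda B v (B positive definite)."""
    vals, vecs = eigh(A, B)          # eigenvalues in ascending order
    return vecs[:, -k:]              # columns span the shared subspace

# Example with hypothetical symmetric A and well-conditioned B.
rng = np.random.default_rng(0)
M = rng.normal(size=(20, 20))
A = M + M.T
C = rng.normal(size=(20, 20))
B = np.eye(20) + 0.1 * C @ C.T
U = shared_subspace(A, B, k=3)       # 20 x 3 basis for the extracted subspace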

Towards weakly supervised semantic segmentation by means of multiple instance and multitask learning

by Alexander Vezhnevets, Joachim M. Buhmann - In CVPR, 2010
"... We address the task of learning a semantic segmentation from weakly supervised data. Our aim is to devise a sys-tem that predicts an object label for each pixel by making use of only image level labels during training – the informa-tion whether a certain object is present or not in the image. Such c ..."
Abstract - Cited by 32 (0 self) - Add to MetaCart
We address the task of learning a semantic segmentation from weakly supervised data. Our aim is to devise a system that predicts an object label for each pixel by making use of only image level labels during training – the information whether a certain object is present or not in the image. Such coarse tagging of images is faster and easier to obtain as opposed to the tedious task of pixelwise labeling required in state-of-the-art systems. We cast this task naturally as a multiple instance learning (MIL) problem. We use Semantic Texton Forest (STF) as the basic framework and extend it for the MIL setting. We make use of multitask learning (MTL) to regularize our solution. Here, an external task of geometric context estimation is used to improve on the task of semantic segmentation. We report experimental results on the MSRC21 and the very challenging VOC2007 datasets. On the MSRC21 dataset we are able, by using 276 weakly labeled images, to achieve the performance of a supervised STF trained on a pixelwise labeled training set of 56 images, which is a significant reduction in the supervision needed.

Citation Context

...sk dependent, but a regularizer that penalizes the difference between G1 and G2 is introduced [2]. An even more general Bayesian approach can be taken, where only a prior on the classifiers is shared [17]. One can also be interested only in the prediction for one main task, as in our case; then F2 can be ignored later, although it is optimized during training. Here we show that MTL is very much applic...
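The multiple-instance assumption the paper builds on can be written in a few lines: an image (bag) carries a label exactly when at least one of its pixels (instances) does, so image-level scores are pooled from pixel-level scores and compared against the only supervision available, the image tags. A sketch with max pooling and a hinge penalty (both assumed choices; the paper's STF-based model is more involved):

import numpy as np

def image_scores_from_pixels(pixel_scores):
    """Pool (H, W, C) per-pixel class scores into (C,) image-level scores."""
    return pixel_scores.reshape(-1, pixel_scores.shape[-1]).max(axis=0)

def mil_hinge_loss(pixel_scores, image_labels, margin=1.0):
    """image_labels: (C,) in {+1, -1} -- the only supervision available."""
    s = image_scores_from_pixels(pixel_scores)
    return np.maximum(0.0, margin - image_labels * s).sum()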

Learning Incoherent Sparse and Low-Rank Patterns from Multiple Tasks

by Jianhui Chen, Ji Liu, Jieping Ye
"... We consider the problem of learning incoherent sparse and lowrank patterns from multiple tasks. Our approach is based on a linear multi-task learning formulation, in which the sparse and low-rank patterns are induced by a cardinality regularization term and a lowrank constraint, respectively. This f ..."
Abstract - Cited by 31 (7 self) - Add to MetaCart
We consider the problem of learning incoherent sparse and low-rank patterns from multiple tasks. Our approach is based on a linear multi-task learning formulation, in which the sparse and low-rank patterns are induced by a cardinality regularization term and a low-rank constraint, respectively. This formulation is non-convex; we convert it into its convex surrogate, which can be routinely solved via semidefinite programming for small-size problems. We propose to employ the general projected gradient scheme to efficiently solve such a convex surrogate; however, in the optimization formulation, the objective function is non-differentiable and the feasible domain is non-trivial. We present the procedures for computing the projected gradient and ensuring the global convergence of the projected gradient scheme. The computation of the projected gradient involves a constrained optimization problem; we show that the optimal solution to such a problem can be obtained via solving an unconstrained optimization subproblem and a Euclidean projection subproblem. In addition, we present two projected gradient algorithms and discuss their rates of convergence. Experimental results on benchmark data sets demonstrate the effectiveness of the proposed multi-task learning formulation and the efficiency of the proposed projected gradient algorithms.

Citation Context

...s from different perspectives. Hidden units of neural networks are shared among similar tasks [6, 13]; task relatedness is modeled using a common prior distribution in hierarchical Bayesian models [5, 29, 40, 41]; the parameters of Gaussian process covariance are learned from multiple tasks [21]; kernel methods and regularization networks are extended to the multi-task learning setting [16]; a convex formulation ...
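The two computations the abstract highlights for the convex surrogate, writing the task matrix as W = P + Q, are an l1 soft-threshold for the sparse component and a Euclidean projection of the low-rank component onto a trace-norm ball, which reduces to projecting the singular values onto an l1 ball. A sketch of both operators (the sort-based l1-ball projection is a standard routine, assumed here rather than taken from the paper):

import numpy as np

def soft_threshold(P, tau):
    """Proximal operator of the elementwise l1 penalty."""
    return np.sign(P) * np.maximum(np.abs(P) - tau, 0.0)

def project_l1_ball(v, z):
    """Euclidean projection of nonnegative v onto {u >= 0 : sum(u) <= z}."""
    if v.sum() <= z:
        return v
    mu = np.sort(v)[::-1]
    cum = np.cumsum(mu)
    rho = np.nonzero(mu - (cum - z) / np.arange(1, len(v) + 1) > 0)[0][-1]
    theta = (cum[rho] - z) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def project_trace_norm_ball(Q, tau):
    """Euclidean projection of Q onto {X : ||X||_* <= tau} via its singular values."""
    U, s, Vt = np.linalg.svd(Q, full_matrices=False)
    return U @ np.diag(project_l1_ball(s, tau)) @ Vt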

Integrating Low-Rank and Group-Sparse Structures for Robust Multi-Task Learning

by Jianhui Chen, Jiayu Zhou, Jieping Ye - In KDD, 2011
"... Multi-task learning (MTL) aims at improving the generalization performance by utilizing the intrinsic relationships among multiple related tasks. A key assumption in most MTL algorithms is that all tasks are related, which, however, may not be the case in many realworld applications. In this paper, ..."
Abstract - Cited by 30 (2 self) - Add to MetaCart
Multi-task learning (MTL) aims at improving the generalization performance by utilizing the intrinsic relationships among multiple related tasks. A key assumption in most MTL algorithms is that all tasks are related, which, however, may not be the case in many real-world applications. In this paper, we propose a robust multi-task learning (RMTL) algorithm which learns multiple tasks simultaneously as well as identifies the irrelevant (outlier) tasks. Specifically, the proposed RMTL algorithm captures the task relationships using a low-rank structure, and simultaneously identifies the outlier tasks using a group-sparse structure. The proposed RMTL algorithm is formulated as a non-smooth convex (unconstrained) optimization problem. We propose to adopt the accelerated proximal method (APM) for solving such an optimization problem. The key component in APM is the computation of the proximal operator, which can be shown to admit an analytic solution. We also theoretically analyze the effectiveness of the RMTL algorithm. In particular, we derive a key property of the optimal solution to RMTL; moreover, based on this key property, we establish a theoretical bound for characterizing the learning performance of RMTL. Our experimental results on benchmark data sets demonstrate the effectiveness and efficiency of the proposed algorithm.

Citation Context

...the existing MTL algorithms is that all tasks are correlated via a certain structure, which, for example, includes hidden units in neural networks [7], a common prior in a hierarchical Bayesian model [3, 29, 35, 37], parameters in Gaussian process covariance [17], kernels and regularizations [10], and common feature representation [1, 9, 2, 27, 16, 34]. Under such an assumption, the knowledge learned from one ta...
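The analytic proximal operators behind RMTL's decomposition W = L + S can be sketched directly: singular-value soft-thresholding for the low-rank part and column-wise shrinkage for the group-sparse part, whose surviving columns flag the outlier tasks. A minimal sketch (step sizes and the APM wrapper are omitted):

import numpy as np

def prox_trace_norm(L, tau):
    """argmin_X 0.5 ||X - L||_F^2 + tau ||X||_* : soft-threshold singular values."""
    U, s, Vt = np.linalg.svd(L, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def prox_group_sparse_columns(S, tau):
    """argmin_X 0.5 ||X - S||_F^2 + tau * sum_t ||column_t(X)||_2."""
    norms = np.linalg.norm(S, axis=0, keepdims=True)
    return S * np.clip(1.0 - tau / np.maximum(norms, 1e-12), 0.0, None)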

It’s Who You Know: Graph Mining Using Recursive Structural Features

by Keith Henderson, Tina Eliassi-Rad, Brian Gallagher, Hanghang Tong, Lei Li, Leman Akoglu, Christos Faloutsos
"... Given a graph, how can we extract good features for the nodes? For example, given two large graphs from the same domain, how can we use information in one to do classification in the other (i.e., perform across-network classification or transfer learning on graphs)? Also, if one of the graphs is ano ..."
Abstract - Cited by 29 (10 self) - Add to MetaCart
Given a graph, how can we extract good features for the nodes? For example, given two large graphs from the same domain, how can we use information in one to do classification in the other (i.e., perform across-network classification or transfer learning on graphs)? Also, if one of the graphs is anonymized, how can we use information in one to de-anonymize the other? The key step in all such graph mining tasks is to find effective node features. We propose ReFeX (Recursive Feature eXtraction), a novel algorithm that recursively combines local (node-based) features with neighborhood (egonet-based) features, and outputs regional features – capturing “behavioral” information. We demonstrate how these powerful regional features can be used in within-network and across-network classification and de-anonymization tasks – without relying on homophily or the availability of class labels. The contributions of our work are as follows: (a) ReFeX is scalable and (b) it is effective, capturing regional (“behavioral”) information in large graphs. We report experiments on real graphs from various domains with over 1M edges, where ReFeX outperforms its competitors on typical graph mining tasks like network classification and de-anonymization.

Citation Context

...availability of labels. Transfer Learning. Transfer learning (i.e., domain adaptation) has been a very active research area in recent years. Representative work includes multi-label text classification [30, 13], cross-domain sentiment prediction [4, 6, 15], intrusion detection [12], verb argument classification [19], cross-lingual classification [25], and cross-domain relation extraction [29]. In all of thes...
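ReFeX's recursion is easy to sketch with an adjacency matrix: start from local features and repeatedly append neighbor aggregates of the current feature columns. The sketch below uses only node degree as the base feature and sum/mean aggregates, and omits the paper's egonet features and correlation-based pruning:

import numpy as np

def refex_features(adj, rounds=2):
    """adj: (n, n) binary adjacency matrix -> (n, k) regional feature matrix."""
    deg = adj.sum(axis=1, keepdims=True).astype(float)
    feats = deg.copy()                        # base local feature: degree
    for _ in range(rounds):
        sums = adj @ feats                    # neighbor sums of each column
        means = sums / np.maximum(deg, 1.0)   # neighbor means
        feats = np.hstack([feats, sums, means])
    return feats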

Learning multiple tasks with a sparse matrix-normal penalty

by Yi Zhang, Jeff Schneider - In Advances in Neural Information Processing Systems, 2010
"... In this paper, we propose a matrix-variate normal penalty with sparse inverse co-variances to couple multiple tasks. Learning multiple (parametric) models can be viewed as estimating a matrix of parameters, where rows and columns of the ma-trix correspond to tasks and features, respectively. Followi ..."
Abstract - Cited by 28 (3 self) - Add to MetaCart
In this paper, we propose a matrix-variate normal penalty with sparse inverse covariances to couple multiple tasks. Learning multiple (parametric) models can be viewed as estimating a matrix of parameters, where rows and columns of the matrix correspond to tasks and features, respectively. Following the matrix-variate normal density, we design a penalty that decomposes the full covariance of matrix elements into the Kronecker product of row covariance and column covariance, which characterizes both task relatedness and feature representation. Several recently proposed methods are variants of special cases of this formulation. To address the overfitting issue and select meaningful task and feature structures, we include sparse covariance selection into our matrix-normal regularization via ℓ1 penalties on task and feature inverse covariances. We empirically study the proposed method and compare with related models in two real-world problems: detecting landmines in multiple fields and recognizing faces between different subjects. Experimental results show that the proposed framework provides an effective and flexible way to model various structures of multiple tasks.

Citation Context

... multiple tasks has been studied for more than a decade [6, 24, 11]. Research in the following two directions has drawn considerable interest: learning a common feature representation shared by tasks [1, 12, 30, 2, 3, 9, 23], and directly inferring the relatedness of tasks [4, 26, 21, 29]. Both have a natural interpretation if we view learning multiple tasks as estimating a matrix of model parameters, where the rows and ...
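The matrix-normal penalty the abstract describes follows the matrix-variate normal negative log-density: with task (row) precision Omega_r and feature (column) precision Omega_c, the covariance of the parameter matrix decomposes as a Kronecker product. A sketch of the penalty evaluation, with the ℓ1 terms implementing the sparse covariance selection (scaling constants dropped; names are illustrative):

import numpy as np

def matrix_normal_penalty(W, Omega_r, Omega_c, lam_r=0.0, lam_c=0.0):
    """W: (m, p) task-by-feature parameters; Omega_r: (m, m); Omega_c: (p, p)."""
    m, p = W.shape
    quad = np.trace(Omega_r @ W @ Omega_c @ W.T)     # Kronecker-structured quadratic
    _, logdet_r = np.linalg.slogdet(Omega_r)
    _, logdet_c = np.linalg.slogdet(Omega_c)
    penalty = quad - p * logdet_r - m * logdet_c
    # l1 terms: sparse covariance selection on the two precision matrices.
    penalty += lam_r * np.abs(Omega_r).sum() + lam_c * np.abs(Omega_c).sum()
    return penalty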
