## Semi-Supervised Multitask Learning (2007)

Citations: 24 (5 self)

### BibTeX

@MISC{Liu07semi-supervisedmultitask,
    author = {Qiuhua Liu and Xuejun Liao and Hui Li and Jason Stack and Lawrence Carin},
    title = {Semi-Supervised Multitask Learning},
    year = {2007}
}

### Abstract

Context plays an important role when performing classification, and in this paper we examine context from two perspectives. First, the classification of items within a single task is placed within the context of distinct concurrent or previous classification tasks (multiple distinct data collections). This is referred to as multi-task learning (MTL), and is implemented here in a statistical manner, using a simplified form of the Dirichlet process. In addition, when performing many classification tasks one has simultaneous access to all unlabeled data that must be classified, and therefore there is an opportunity to place the classification of any one feature vector within the context of all unlabeled feature vectors; this is referred to as semi-supervised learning. In this paper we integrate MTL and semi-supervised learning into a single framework, thereby exploiting two forms of contextual information. Results are presented on a “toy” example, to demonstrate the concept, and the algorithm is also applied to three real data sets.

### Citations

1311 | Combining labeled and unlabeled data with co-training - Blum, Mitchell - 1998 |

1280 | Information Theory, Inference, and Learning Algorithms - MacKay - 2003 |

1043 | Spectral Graph Theory - Chung - 1997 |

Citation Context: ...(X, W) is characterized by a matrix of one-step transition probabilities $A = [a_{ij}]_{n \times n}$, where $a_{ij}$ is the probability of transiting from $x_i$ to $x_j$ via a single step and is given by $a_{ij} = w_{ij} / \sum_{k=1}^{n} w_{ik}$ [4]. Let $B = [b_{ij}]_{n \times n} = A^t$. Then the (i, j)-th element $b_{ij}$ represents the probability of transiting from $x_i$ to $x_j$ in $t$ steps. Data point $x_j$ is said to be a t-step neighbor of $x_i$ if $b_{ij} > 0$. The t-step neigh...
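The random-walk construction quoted above can be sketched in a few lines. This is a minimal illustration only, not the authors' implementation: the function names and the 4-node chain graph are hypothetical, and the code uses plain Python lists to stay self-contained.

```python
def transition_matrix(W):
    """Row-normalise affinities: a_ij = w_ij / sum_k w_ik."""
    A = []
    for row in W:
        total = sum(row)
        if total <= 0:
            raise ValueError("each point needs at least one positive affinity")
        A.append([w / total for w in row])
    return A


def mat_mul(X, Y):
    """Plain matrix product of two square list-of-lists matrices."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def t_step_matrix(W, t):
    """Return B = A^t, the matrix of t-step transition probabilities."""
    A = transition_matrix(W)
    n = len(A)
    # Start from the identity matrix and multiply by A exactly t times.
    B = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(t):
        B = mat_mul(B, A)
    return B


def t_step_neighbors(W, i, t):
    """Indices j with b_ij > 0, i.e. the t-step neighbors of point i."""
    B = t_step_matrix(W, t)
    return [j for j, b in enumerate(B[i]) if b > 0]


# Hypothetical affinity matrix for a chain graph 0 - 1 - 2 - 3 (unit weights).
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]

one_step = t_step_neighbors(W, 0, 1)
two_step = t_step_neighbors(W, 0, 2)
```

On this chain, point 0 reaches only point 1 in one step, while in two steps it can return to itself or reach point 2, which shows how the neighborhood depends on `t`.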

854 | Text classification from labeled and unlabeled documents using EM - Nigam, McCallum, et al. - 2000 |

777 | A Bayesian analysis of some nonparametric problems - Ferguson - 1973 |

Citation Context: ... information. Since the information transferred from related tasks is also often represented by a prior, the two priors will compete and need be balanced; moreover, this precludes a Dirichlet process [5] or its variants to represent the sharing prior across tasks, because the base distribution of a Dirichlet process cannot be dependent on any particular manifold. We develop a new semi-supervised form...

725 | Transductive inference for text classification using support vector machines - Joachims - 1999 |

Citation Context: ... much recent work on improving the generalization of classifiers based on using information sources beyond the labeled data. These studies fall into two major categories: (i) semi-supervised learning [8, 11, 14, 9] and (ii) multitask learning (MTL) [3, 1, 12]. The former employs the information from the data manifold, in which the manifold information provided by the usually abundant unlabeled data is exploited...

707 | UCI Repository of machine learning databases [http://www.ics.uci.edu/~mlearn/MLRepository.html] - Newman, Hettich, et al. - 1998 |

607 | The meaning and use of the area under a receiver operating characteristic (ROC) curve - Hanley, McNeil - 1982 |

Citation Context: ...ing algorithm in semi-supervised STL. To replicate the experiments in [12], we employ AUC as the performance measure, where AUC stands for area under the receiver operating characteristic (ROC) curve [6]. The basic setup of the semi-supervised MTL algorithm is as follows. The tasks are ordered as they are when the data are provided to the experimenter (we have randomly permuted the tasks and found ...
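As a rough illustration of the AUC measure mentioned in this context, the sketch below computes it through its rank-statistic interpretation: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. The function name and the example scores are hypothetical, and this is not the evaluation code used in the paper.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    labels must be 0 (negative) or 1 (positive). The result is the
    probability that a randomly chosen positive outscores a randomly
    chosen negative, counting ties as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier outputs for four test points.
example_scores = [0.9, 0.8, 0.7, 0.3]
example_labels = [1, 0, 1, 0]
example_auc = auc(example_scores, example_labels)  # 3 of 4 pairs ordered correctly
```

The pairwise loop is quadratic in the number of examples, which is fine for a small illustration; a sort-based rank computation would be the usual choice for large test sets.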

531 | Semi-supervised learning using Gaussian fields and harmonic functions - Zhu, Ghahramani, et al. - 2003 |

Citation Context: ... much recent work on improving the generalization of classifiers based on using information sources beyond the labeled data. These studies fall into two major categories: (i) semi-supervised learning [8, 11, 14, 9] and (ii) multitask learning (MTL) [3, 1, 12]. The former employs the information from the data manifold, in which the manifold information provided by the usually abundant unlabeled data is exploited...

498 | Multitask learning - Caruana - 1997 |

Citation Context: ... of classifiers based on using information sources beyond the labeled data. These studies fall into two major categories: (i) semi-supervised learning [8, 11, 14, 9] and (ii) multitask learning (MTL) [3, 1, 12]. The former employs the information from the data manifold, in which the manifold information provided by the usually abundant unlabeled data is exploited, while the latter leverages information from...

443 | Learning with Kernels: Support Vector - Schölkopf, Smola - 2002 |

333 | A framework for learning predictive structure from multiple tasks and unlabeled data - Ando, Zhang - 2005 |

302 | Learning and relearning in Boltzmann machines - Hinton, Sejnowski - 1986 |

Citation Context: ...ere are significant benefits offered by sharing across the tasks, which partially explains why supervised MTL eventually catches up with semi-supervised MTL. We plot in Figure 3(b) the Hinton diagram [7] of the between-task sharing matrix (an average over the 100 trials) found by the semi-supervised MTL when there are 140 labeled data in each task. $\|\theta_m - \theta_l\|^2$ The (m, l)-th element of similarity matrix...

295 | Ferguson Distributions via Pólya Urn Schemes - Blackwell, MacQueen - 1973 |

Citation Context: ..., which is in agreement with our intuition, since the information from previous tasks increases with $m$. The formulation in (5) is suggestive of the Pólya urn representation of a Dirichlet process (DP) [2]. The difference here is that we have used a normal distribution to replace the Dirac delta in Dirichlet processes. Since $N(\theta_m \mid \theta_l, \eta^2 I)$ approaches the Dirac delta $\delta(\theta_m - \theta_l)$ as $\eta^2 \to 0$, we recover the Diric...
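For reference, the standard Blackwell-MacQueen (Pólya urn) predictive rule for a Dirichlet process with concentration $\alpha$ and base distribution $G_0$ is the first line below. The second line is only an assumed illustration of the substitution described in this context, with a normal density in place of each Dirac delta and $g_0$ denoting the density of $G_0$. The paper's equation (5) is not reproduced in this excerpt, so the weights and the symbols $\alpha$, $G_0$, $g_0$ are carried over from the standard urn rather than taken from the paper.

$$
\theta_m \mid \theta_1, \dots, \theta_{m-1} \;\sim\; \frac{\alpha}{\alpha + m - 1}\, G_0 \;+\; \frac{1}{\alpha + m - 1} \sum_{l=1}^{m-1} \delta_{\theta_l}
$$

$$
p(\theta_m \mid \theta_1, \dots, \theta_{m-1}) \;=\; \frac{\alpha}{\alpha + m - 1}\, g_0(\theta_m) \;+\; \frac{1}{\alpha + m - 1} \sum_{l=1}^{m-1} N(\theta_m \mid \theta_l, \eta^2 I)
$$

In both lines the total weight on previous tasks is $(m-1)/(\alpha + m - 1)$, which grows with $m$, and letting $\eta^2 \to 0$ in the second line collapses each normal component onto $\delta(\theta_m - \theta_l)$, giving back the first.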

229 | The relevance vector machine - Tipping - 2000 |

218 | Partially labeled classification with Markov random walks - Szummer, Jaakkola - 2002 |

Citation Context: ... much recent work on improving the generalization of classifiers based on using information sources beyond the labeled data. These studies fall into two major categories: (i) semi-supervised learning [8, 11, 14, 9] and (ii) multitask learning (MTL) [3, 1, 12]. The former employs the information from the data manifold, in which the manifold information provided by the usually abundant unlabeled data is exploited...

168 | Regularized multi-task learning - Evgeniou, Pontil - 2004 |

Citation Context: ...distribution with mean $\theta_l$ and covariance matrix $\eta^2 I$. As discussed below, the prior in (4) is linked to Dirichlet processes and thus is more general than a parametric prior, as used, for example, in [5]. Each normal distribution represents the prior transferred from a previous task; it is the metaknowledge indicating how the present task should be learned, based on the experience with a previous tas...

160 | Learning multiple tasks with kernel methods - Evgeniou, Micchelli, et al. - 2005 |

151 | A model of inductive bias learning - Baxter - 2000 |

116 | Regularization and semisupervised learning on large graphs - Belkin, Niyogi - 2004 |

116 | Primary, secondary, and meta-analysis of research - Glass - 1976 |

113 | Task clustering and gating for Bayesian multitask learning - Bakker, Heskes - 2003 |

Citation Context: ... of classifiers based on using information sources beyond the labeled data. These studies fall into two major categories: (i) semi-supervised learning [8, 11, 14, 9] and (ii) multitask learning (MTL) [3, 1, 12]. The former employs the information from the data manifold, in which the manifold information provided by the usually abundant unlabeled data is exploited, while the latter leverages information from...

108 | AUC Optimization vs. Error Rate Minimization - Cortes, Mohri |

100 | Multi-task learning for classification with Dirichlet process priors - Xue, Liao, et al. |

Citation Context: ...nd established its effectiveness, we will not repeat the evaluation here and employ PNBC as a representative semi-supervised learning algorithm in semi-supervised STL. To replicate the experiments in [12], we employ AUC as the performance measure, where AUC stands for area under the receiver operating characteristic (ROC) curve [6]. The basic setup of the semi-supervised MTL algorithm is as follows. T...

98 | Learning Gaussian processes from multiple tasks - Yu, Tresp, et al. - 2005 |

91 | Learning internal representations - Baxter - 1995 |

90 | Discovering structure in multiple learning tasks: The TC algorithm - Thrun, Sullivan - 1996 |

50 | Learning to learn with the informative vector machine - Lawrence, Platt - 2004 |

40 | On semi-supervised classification - Krishnapuram, Williams, et al. - 2005 |

36 | A method for combining inference across related nonparametric Bayesian models - Müller, Quintana, et al. - 2004 |

31 | Dirichlet process mixed generalized linear models - Mukhopadhyay, Gelfand - 1997 |

23 | Collaborative ensemble learning: Combining collaborative and content-based information filtering via hierarchical Bayes - Yu, Schwaighofer, et al. - 2003 |

23 | A nonparametric hierarchical Bayesian framework for information filtering - Yu, Tresp, et al. - 2004 |

17 | A Bayesian semiparametric model for random-effects meta-analysis - Burr, Doss - 2005 |

12 | Combining Information From Several Experiments With Nonparametric Priors - Mallick, Walker - 1997 |

10 | Classification and mixture approach to clustering via maximum likelihood - Ganesalingam - 1989 |

10 | Nonparametric modeling of hierarchically exchangeable data - Hoff - 2003 |

8 | Combining information from related regressions - Dominici, Parmigiani, Reckhow, et al. - 1997 |

8 | An introduction to algorithms for nonlinear optimization - Gould, Leyffer - 2002 |

6 | Detection of buried targets via active selection of labeled data: application to sensing subsurface UXO - Zhang, Liao, et al. - 2004 |

1 | Learning classifiers on a partially labeled data manifold - Liu, Liao, et al. - 2007 |

1 | Multi-task learning for underwater object classification - Stack, Crosby, et al. - 1996 |