Learning Multiple Tasks with a Sparse Matrix-Normal Penalty
Cited by 3 (1 self)
In this paper, we propose a matrix-variate normal penalty with sparse inverse covariances to couple multiple tasks. Learning multiple (parametric) models can be viewed as estimating a matrix of parameters, where rows and columns of the matrix correspond to tasks and features, respectively. Following the matrix-variate normal density, we design a penalty that decomposes the full covariance of matrix elements into the Kronecker product of row covariance and column covariance, which characterizes both task relatedness and feature representation. Several recently proposed methods are variants of special cases of this formulation. To address the overfitting issue and select meaningful task and feature structures, we incorporate sparse covariance selection into our matrix-normal regularization via ℓ1 penalties on task and feature inverse covariances. We empirically study the proposed method and compare it with related models on two real-world problems: detecting landmines in multiple fields and recognizing faces between different subjects. Experimental results show that the proposed framework provides an effective and flexible way to model various structures of multiple tasks.
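The Kronecker decomposition mentioned in this abstract can be checked numerically. The following is a minimal sketch, not the authors' exact objective: it assumes a zero-mean parameter matrix, omits the log-determinant terms of the matrix-normal density, and uses hypothetical names (`matrix_normal_penalty`, `lam_task`, `lam_feat`) and ℓ1 weights chosen only for illustration.

```python
import numpy as np

def matrix_normal_penalty(W, omega_inv, sigma_inv, lam_task=0.1, lam_feat=0.1):
    """Quadratic matrix-normal term plus l1 penalties on both inverse covariances.

    W         : (tasks x features) parameter matrix
    omega_inv : (tasks x tasks) inverse row (task) covariance
    sigma_inv : (features x features) inverse column (feature) covariance
    lam_task, lam_feat : hypothetical l1 weights (an assumption of this sketch)
    """
    quadratic = np.trace(omega_inv @ W @ sigma_inv @ W.T)
    sparsity = lam_task * np.abs(omega_inv).sum() + lam_feat * np.abs(sigma_inv).sum()
    return quadratic + sparsity

def kronecker_quadratic(W, omega, sigma):
    """The same quadratic term, written with the full covariance of vec(W)."""
    w = W.flatten(order="F")           # column-stacking vec(W)
    full_cov = np.kron(sigma, omega)   # Kronecker product of column and row covariance
    return w @ np.linalg.solve(full_cov, w)

def random_spd(n, rng):
    """Random symmetric positive definite matrix, used only as test data."""
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))        # 3 tasks, 4 features
omega, sigma = random_spd(3, rng), random_spd(4, rng)

trace_form = np.trace(np.linalg.inv(omega) @ W @ np.linalg.inv(sigma) @ W.T)
kron_form = kronecker_quadratic(W, omega, sigma)
print(np.isclose(trace_form, kron_form))
```

The trace form only needs the two small covariances, whereas the Kronecker form builds a matrix whose side is the number of tasks times the number of features, which is why the factored form is the practical one.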
Submodular Multi-Label Learning
Cited by 1 (0 self)
In this paper we present an algorithm to learn a multi-label classifier which attempts to directly optimise the F-score. The key novelty of our formulation is that we explicitly allow for assortative (submodular) pairwise label interactions, i.e., we can leverage the co-occurrence of pairs of labels in order to improve the quality of prediction. Prediction in this model consists of minimising a particular submodular set function, which can be accomplished exactly and efficiently via graph-cuts. Learning, however, is substantially more involved and requires the solution of an intractable combinatorial optimisation problem. We present an approximate algorithm for this problem and prove that it is sound in the sense that it never predicts incorrect labels. We also present a nontrivial test of a sufficient condition for our algorithm to have found an optimal solution. We present experiments on benchmark multi-label datasets, which attest to the value of the proposed technique. We also make available source code that enables the reproduction of our experiments.
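For readers unfamiliar with the evaluation measure named in this abstract, here is a minimal sketch of the F-score between a predicted and a true label set. It only illustrates the metric, not the learning algorithm; the function name `f_score` and the convention of returning 1.0 when both sets are empty are assumptions of this sketch.

```python
def f_score(predicted, relevant):
    """F1 score between a predicted label set and the true (relevant) label set."""
    predicted, relevant = set(predicted), set(relevant)
    if not predicted and not relevant:
        return 1.0  # assumed convention: two empty sets agree perfectly
    overlap = len(predicted & relevant)
    # Harmonic mean of precision and recall, simplified to a single ratio.
    return 2.0 * overlap / (len(predicted) + len(relevant))

print(f_score({"beach", "sunset"}, {"sunset", "sea"}))
```

Because the score depends on the whole predicted set at once, it does not decompose into independent per-label losses, which is one reason optimising it directly is harder than optimising per-label accuracy.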
Efficient Max-Margin Multi-Label Classification with Applications to Zero-Shot Learning
, 2010
The goal in multi-label classification is to tag a data point with the subset of relevant labels from a pre-specified set. Given a set of L labels, a data point can be tagged with any of the 2^L possible subsets. The main challenge therefore lies in optimising over this exponentially large label space subject to label correlations. Our objective, in this paper, is to design efficient algorithms for multi-label classification when the labels are densely correlated. In particular, we are interested in the zero-shot learning scenario where the label correlations on the training set might be significantly different from those on the test set. We propose a max-margin formulation where we model prior label correlations but do not incorporate pairwise label interaction terms in the prediction function. We show that the problem complexity can be reduced from exponential to linear while modelling dense pairwise prior label correlations. By incorporating relevant correlation priors we can handle mismatches between the training and test set statistics. Our proposed formulation generalises the effective 1-vs-All method and we provide a principled interpretation of the 1-vs-All technique. We develop efficient optimisation algorithms for our proposed formulation. We adapt the Sequential Minimal Optimisation (SMO) algorithm to multi-label classification and show that, with some book-keeping, we can reduce the training time from being super-quadratic to almost linear in the number of labels. Furthermore, by effectively re-utilizing the kernel cache and jointly optimising over all variables, we can be orders of magnitude faster than the competing state-of-the-art algorithms. We ...
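The drop from exponential to linear cost when the prediction function has no pairwise label terms can be illustrated with a toy example. This is a sketch under stated assumptions, not the paper's formulation: it assumes a purely additive score over labelings y in {-1, +1}^L, and the function names and score values are hypothetical.

```python
from itertools import product

import numpy as np

def brute_force_predict(scores):
    """Search all 2^L labelings y in {-1, +1}^L for the highest additive score."""
    best, best_value = None, -np.inf
    for y in product((-1, 1), repeat=len(scores)):
        value = float(np.dot(y, scores))
        if value > best_value:
            best, best_value = y, value
    return best

def per_label_predict(scores):
    """Decide each label independently: L sign checks instead of 2^L candidates."""
    return tuple(1 if s > 0 else -1 for s in scores)

# Hypothetical per-label scores for L = 5 labels.
scores = np.array([0.7, -1.2, 0.1, -0.3, 2.0])
print(brute_force_predict(scores) == per_label_predict(scores))
```

Without interaction terms the maximisation separates across labels, so thresholding each score recovers the same labeling as the exhaustive search.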
Multi-Kernel Multi-Label Learning with Max-Margin Concept Network
- Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence
In this paper, a novel method is developed for enabling Multi-Kernel Multi-Label Learning. Inter-label dependency and similarity diversity are simultaneously leveraged in the proposed method. A concept network is constructed to capture the inter-label correlations for classifier training. A maximal margin approach is used to effectively formulate the feature-label associations and the label-label correlations. Specific kernels are learned not only for each label but also for each pair of inter-related labels. By learning the eigenfunctions of the kernels, the similarity between a new data point and the training samples can be computed in online mode. Our experimental results on real datasets (web pages, images, music, and bioinformatics) have demonstrated the effectiveness of our method.
Supervision Reduction by Encoding Extra Information about Models, Features and Labels
, 2011
Learning with limited supervision presents a major challenge to machine learning systems in practice. Fortunately, various types of extra information exist in real-world problems, characterizing the properties of the model space, the feature space and the label space, respectively. With the goal of supervision reduction, this thesis studies the representation, discovery and incorporation of extra information in learning. Extra information about the model space can be encoded as compression operations and used to regularize models in terms of compressibility. This leads to learning compressible models. Examples of model compressibility include local smoothness, compacted energy in frequency domains, and parameter correlation. When multiple related tasks are learned together, such a compact representation can be automatically inferred as a matrix-variate normal distribution with sparse inverse covariances on the parameter matrix, which simultaneously captures both task relations and feature structures. Extra information about the feature space can usually be conveyed by a certain feature reduction. We propose the projection penalty to encode any feature reduction without the risk of discarding useful information: a reduction of the feature space can be viewed as a restriction of the model search to a certain model subspace, and instead of directly imposing such a restriction, we can search in the full model space but penalize the projection distance to the model subspace. In multi-view learning, the projection penalty framework provides ...
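The idea of penalizing the distance to a model subspace, rather than restricting the search to it, can be sketched for the linear case. This is an illustration under assumptions, not the thesis's definition: it assumes the subspace is the column span of a full-rank matrix `basis`, uses the squared Euclidean distance, and the function name `projection_penalty` is hypothetical.

```python
import numpy as np

def projection_penalty(w, basis):
    """Squared distance from w to the subspace spanned by the columns of `basis`."""
    Q, _ = np.linalg.qr(basis)        # orthonormal basis (assumes full column rank)
    residual = w - Q @ (Q.T @ w)      # component of w outside the subspace
    return float(residual @ residual)

# Hypothetical subspace: models that use only the first two of three features.
basis = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [0.0, 0.0]])
w_inside = np.array([2.0, -1.0, 0.0])
w_outside = np.array([2.0, -1.0, 3.0])

print(projection_penalty(w_inside, basis), projection_penalty(w_outside, basis))
```

A model inside the subspace pays no penalty, while a model that uses the discarded feature is still allowed but pays for how far it strays, which is how the restriction becomes soft.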
Correlated Multi-Label Feature Selection
Multi-label learning studies the problem where each instance is associated with a set of labels. There are two challenges in multi-label learning: (1) the labels are interdependent and correlated, and (2) the data are of high dimensionality. In this paper, we aim to tackle these challenges in one shot. In particular, we propose to learn the label correlation and do feature selection simultaneously. We introduce a matrix-variate Normal prior distribution on the weight vectors of the classifier to model the label correlation. Our goal is to find a subset of features, based on which the label correlation regularized loss of label ranking is minimized. The resulting multi-label feature selection problem is a mixed integer program, which is reformulated as a quadratically constrained linear program (QCLP). It can be solved by a cutting plane algorithm, in each iteration of which a minimax optimization problem is solved by alternating dual coordinate descent and projected sub-gradient descent. Experiments on benchmark data sets illustrate that the proposed methods outperform single-label feature selection methods and many other state-of-the-art multi-label learning methods.
Vector-valued Manifold Regularization
We consider the general problem of learning an unknown functional dependency, f: X → Y, between a structured input space X and a structured output space Y, from labeled and unlabeled examples. We formulate this problem in terms of data-dependent regularization in Vector-valued Reproducing Kernel Hilbert Spaces (Micchelli & Pontil, 2005) which elegantly extend familiar scalar-valued kernel methods to the general setting where Y has a Hilbert space structure. Our methods provide a natural extension of Manifold Regularization (Belkin et al., 2006) algorithms to also exploit output inter-dependencies while enforcing smoothness with respect to input data geometry. We propose a class of matrix-valued kernels which allow efficient implementations of our algorithms via the use of numerical solvers for Sylvester matrix equations. On multilabel image annotation and text classification problems, we find favorable empirical comparisons against several competing alternatives.
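As background for the Sylvester matrix equations mentioned in this abstract, the sketch below solves a small equation of the form A X + X B = C by vectorisation. It is not the paper's system: the matrices are random placeholders, the function name `solve_sylvester_dense` is hypothetical, and the dense Kronecker approach is for illustration only, since it scales poorly compared with dedicated solvers such as `scipy.linalg.solve_sylvester`.

```python
import numpy as np

def solve_sylvester_dense(A, B, C):
    """Solve A X + X B = C via (I kron A + B^T kron I) vec(X) = vec(C)."""
    n, m = A.shape[0], B.shape[0]
    K = np.kron(np.eye(m), A) + np.kron(B.T, np.eye(n))
    x = np.linalg.solve(K, C.flatten(order="F"))   # column-stacking vec(C)
    return x.reshape((n, m), order="F")

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
A = A @ A.T + 3 * np.eye(3)    # positive definite, so a unique solution exists
B = rng.standard_normal((2, 2))
B = B @ B.T + 2 * np.eye(2)
C = rng.standard_normal((3, 2))

X = solve_sylvester_dense(A, B, C)
print(np.allclose(A @ X + X @ B, C))
```

The unknown X appears on both the left and the right of a known matrix, which is what distinguishes a Sylvester equation from an ordinary linear system and why specialised solvers are used in practice.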
Adaptive Large Margin Training for Multilabel Classification
Multilabel classification is a central problem in many areas of data analysis, including text and multimedia categorization, where individual data objects need to be assigned multiple labels. A key challenge in these tasks is to learn a classifier that can properly exploit label correlations without requiring exponential enumeration of label subsets during training or testing. We investigate novel loss functions for multilabel training within a large margin framework, identifying a simple alternative that yields improved generalization while still allowing efficient training. We furthermore show how covariances between the label models can be learned simultaneously with the classification model itself, in a jointly convex formulation, without compromising scalability. The resulting combination yields state-of-the-art accuracy in multilabel webpage classification.
Maximum Margin Multi-Label Structured Prediction
We study multi-label prediction for structured output sets, a problem that occurs, for example, in object detection in images, secondary structure prediction in computational biology, and graph matching with symmetries. Conventional multilabel classification techniques are typically not applicable in this situation, because they require explicit enumeration of the label set, which is infeasible in the case of structured outputs. Relying on techniques originally designed for single-label structured prediction, in particular structured support vector machines, results in reduced prediction accuracy or leads to infeasible optimization problems. In this work we derive a maximum-margin training formulation for multi-label structured prediction that remains computationally tractable while achieving high prediction accuracy. It also shares most beneficial properties with single-label maximum-margin approaches, in particular formulation as a convex optimization problem, efficient working set training, and PAC-Bayesian generalization bounds.

