Results 1–10 of 40
Multi-instance multi-label learning
 Artificial Intelligence
Abstract

Cited by 38 (16 self)
In this paper, we propose the MIML (Multi-Instance Multi-Label learning) framework, where an example is described by multiple instances and associated with multiple class labels. Compared to traditional learning frameworks, the MIML framework is more convenient and natural for representing complicated objects which have multiple semantic meanings. To learn from MIML examples, we propose the MimlBoost and MimlSvm algorithms based on a simple degeneration strategy, and experiments show that solving problems involving complicated objects with multiple semantic meanings in the MIML framework can lead to good performance. Considering that the degeneration process may lose information, we propose the D-MimlSvm algorithm, which tackles MIML problems directly in a regularization framework. Moreover, we show that even when we do not have access to the real objects, and thus cannot capture more information from real objects by using the MIML representation, MIML is still useful. We propose the InsDif and SubCod algorithms. InsDif works by transforming single instances into the MIML representation for learning, while SubCod works by transforming single-label examples into the MIML representation for learning. Experiments show that in some tasks they are able to achieve better performance than learning the single instances or single-label examples directly.
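The degeneration strategy can be illustrated with a minimal sketch. This is illustrative only: here each bag of instances is collapsed to its mean vector and the label set is expanded into binary targets, whereas MimlSvm actually maps bags via constructive clustering; all function and variable names below are hypothetical.

```python
import numpy as np

def degenerate_miml(bags, label_sets, n_labels):
    """Reduce a MIML problem to single-instance, multi-target form.

    A crude degeneration for intuition: each bag of instances becomes
    its mean vector (one row of X), and each label set becomes a binary
    indicator vector (one row of Y), so any standard single-instance
    learner can be applied per label.
    """
    X = np.array([np.mean(bag, axis=0) for bag in bags])
    Y = np.zeros((len(bags), n_labels), dtype=int)
    for i, labels in enumerate(label_sets):
        Y[i, list(labels)] = 1
    return X, Y

# two bags: one with two instances, one with a single instance
bags = [np.array([[0.0, 1.0], [2.0, 3.0]]), np.array([[4.0, 5.0]])]
X, Y = degenerate_miml(bags, [{0, 2}, {1}], n_labels=3)
```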
Learning multiple tasks with a sparse matrix-normal penalty
 In Advances in Neural Information Processing Systems
, 2010
Abstract

Cited by 28 (3 self)
In this paper, we propose a matrix-variate normal penalty with sparse inverse covariances to couple multiple tasks. Learning multiple (parametric) models can be viewed as estimating a matrix of parameters, where rows and columns of the matrix correspond to tasks and features, respectively. Following the matrix-variate normal density, we design a penalty that decomposes the full covariance of matrix elements into the Kronecker product of row covariance and column covariance, which characterizes both task relatedness and feature representation. Several recently proposed methods are variants of special cases of this formulation. To address the overfitting issue and select meaningful task and feature structures, we include sparse covariance selection into our matrix-normal regularization via ℓ1 penalties on the task and feature inverse covariances. We empirically study the proposed method and compare it with related models on two real-world problems: detecting landmines in multiple fields and recognizing faces of different subjects. Experimental results show that the proposed framework provides an effective and flexible way to model various structures of multiple tasks.
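A sketch of the penalty itself, assuming a tasks-by-features parameter matrix W and inverse row/column covariances (the function and variable names are hypothetical, and the exact objective terms are an assumption, not the paper's code):

```python
import numpy as np

def matrix_normal_penalty(W, omega_row, omega_col, lam1=0.0):
    """Matrix-normal style coupling penalty for a task-by-feature matrix W.

    omega_row / omega_col play the role of inverse task / feature
    covariances. The trace term is the quadratic form induced by a
    Kronecker-structured full covariance, and lam1 adds an l1 penalty
    that encourages sparse inverse covariances (sparse covariance
    selection).
    """
    quad = np.trace(omega_row @ W @ omega_col @ W.T)
    sparsity = lam1 * (np.abs(omega_row).sum() + np.abs(omega_col).sum())
    return quad + sparsity
```

With identity inverse covariances the trace term reduces to the squared Frobenius norm of W, i.e. fully decoupled tasks; off-diagonal structure in the omegas is what couples them.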
Submodular Multi-Label Learning
Abstract

Cited by 16 (1 self)
In this paper we present an algorithm to learn a multi-label classifier which directly optimises the F-score. The key novelty of our formulation is that we explicitly allow for assortative (submodular) pairwise label interactions, i.e., we can leverage the co-occurrence of pairs of labels in order to improve the quality of prediction. Prediction in this model consists of minimising a particular submodular set function, which can be accomplished exactly and efficiently via graph cuts. Learning, however, is substantially more involved and requires the solution of an intractable combinatorial optimisation problem. We present an approximate algorithm for this problem and prove that it is sound in the sense that it never predicts incorrect labels. We also present a non-trivial test of a sufficient condition for our algorithm to have found an optimal solution. We present experiments on benchmark multi-label datasets, which attest to the value of the proposed technique. We also make available source code that enables the reproduction of our experiments.
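The prediction step, minimising a submodular set function over label subsets, can be illustrated with a brute-force minimiser. The paper does this exactly and efficiently with graph cuts; the enumeration below is exponential and only for intuition, and the unary/pairwise cost names are assumptions.

```python
from itertools import combinations

def predict_label_set(unary, pair):
    """Minimise f(S) = sum of unary costs of chosen labels minus
    assortative bonuses for co-chosen label pairs.

    With non-negative pair weights, the marginal cost of adding a label
    can only decrease as S grows, so f is submodular; that property is
    what makes graph-cut inference possible. Here we simply enumerate
    all subsets of the n labels.
    """
    n = len(unary)
    def energy(S):
        e = sum(unary[i] for i in S)
        e -= sum(w for (i, j), w in pair.items() if i in S and j in S)
        return e
    subsets = (set(c) for r in range(n + 1) for c in combinations(range(n), r))
    return sorted(min(subsets, key=energy))
```

For example, two labels that are individually costly can still be jointly selected when their assortative bonus outweighs their unary costs.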
Adaptive Large Margin Training for Multi-label Classification
Abstract

Cited by 13 (10 self)
Multi-label classification is a central problem in many areas of data analysis, including text and multimedia categorization, where individual data objects need to be assigned multiple labels. A key challenge in these tasks is to learn a classifier that can properly exploit label correlations without requiring exponential enumeration of label subsets during training or testing. We investigate novel loss functions for multi-label training within a large margin framework, identifying a simple alternative that yields improved generalization while still allowing efficient training. We furthermore show how covariances between the label models can be learned simultaneously with the classification model itself, in a jointly convex formulation, without compromising scalability. The resulting combination yields state-of-the-art accuracy in multi-label webpage classification.
Multi-label learning by exploiting label correlations locally
 In AAAI
, 2012
Abstract

Cited by 13 (2 self)
It is well known that exploiting label correlations is important for multi-label learning. Existing approaches typically exploit label correlations globally, by assuming that the label correlations are shared by all the instances. In real-world tasks, however, different instances may share different label correlations, and few correlations are globally applicable. In this paper, we propose the ML-LOC approach, which allows label correlations to be exploited locally. To encode the local influence of label correlations, we derive a LOC code to enhance the feature representation of each instance. The global discrimination fitting and local correlation sensitivity are incorporated into a unified framework, and an alternating solution is developed for the optimization. Experimental results on a number of image, text and gene data sets validate the effectiveness of our approach.
Large-scale Multi-label Learning with Missing Labels
Abstract

Cited by 13 (3 self)
The multi-label classification problem has generated significant interest in recent years. However, existing approaches do not adequately address two key challenges: (a) scaling up to problems with a large number (say, millions) of labels, and (b) handling data with missing labels. In this paper, we directly address both these problems by studying the multi-label problem in a generic empirical risk minimization (ERM) framework. Our framework, despite being simple, is surprisingly able to encompass several recent label-compression based methods, which can be derived as special cases of our method. To optimize the ERM problem, we develop techniques that exploit the structure of specific loss functions, such as the squared loss function, to obtain efficient algorithms. We further show that our learning framework admits excess risk bounds even in the presence of missing labels. Our bounds are tight and demonstrate better generalization performance for low-rank-promoting trace-norm regularization when compared to (rank-insensitive) Frobenius norm regularization. Finally, we present extensive empirical results on a variety of benchmark datasets and show that our methods perform significantly better than existing label-compression based methods and can scale up to very large datasets, such as a Wikipedia dataset that has more than 200,000 labels.
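The squared-loss instantiation with trace-norm regularisation can be sketched as a proximal-gradient update, where missing labels are simply masked out of the gradient. This is a generic sketch under assumed names, not the paper's algorithm, which exploits more structure.

```python
import numpy as np

def prox_trace_norm(W, tau):
    """Singular-value soft-thresholding: the proximal operator of the
    trace norm, which promotes low-rank parameter matrices."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ (np.maximum(s - tau, 0.0)[:, None] * Vt)

def erm_step(W, X, Y, mask, lr, lam):
    """One proximal-gradient step on masked squared loss + trace norm.

    mask is 1 where a label is observed and 0 where it is missing, so
    the residual (and hence the gradient) flows only through observed
    entries of Y.
    """
    resid = mask * (X @ W - Y)
    grad = X.T @ resid
    return prox_trace_norm(W - lr * grad, lr * lam)
```

Swapping `prox_trace_norm` for plain shrinkage of every entry of W would give the (rank-insensitive) Frobenius-norm variant the abstract compares against.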
Efficient Max-Margin Multi-Label Classification with Applications to Zero-Shot Learning
 In Machine Learning
, 2010
Abstract

Cited by 12 (2 self)
The goal in multi-label classification is to tag a data point with the subset of relevant labels from a pre-specified set. Given a set of L labels, a data point can be tagged with any of the 2^L possible subsets. The main challenge therefore lies in optimising over this exponentially large label space subject to label correlations. Our objective, in this paper, is to design efficient algorithms for multi-label classification when the labels are densely correlated. In particular, we are interested in the zero-shot learning scenario where the label correlations on the training set might be significantly different from those on the test set. We propose a max-margin formulation where we model prior label correlations but do not incorporate pairwise label interaction terms in the prediction function. We show that the problem complexity can be reduced from exponential to linear while modelling dense pairwise prior label correlations. By incorporating relevant correlation priors we can handle mismatches between the training and test set statistics. Our proposed formulation generalises the effective 1-vs-All method, and we provide a principled interpretation of the 1-vs-All technique. We develop efficient optimisation algorithms for our proposed formulation. We adapt the Sequential Minimal Optimisation (SMO) algorithm to multi-label classification and show that, with some bookkeeping, we can reduce the training time from being super-quadratic to almost linear in the number of labels. Furthermore, by effectively re-utilising the kernel cache and jointly optimising over all variables, we can be orders of magnitude faster than the competing state-of-the-art algorithms.
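The 1-vs-All baseline that the formulation generalises simply trains one independent scorer per label. A ridge-regression stand-in (not the paper's max-margin SMO solver; names are hypothetical) makes the reduction concrete:

```python
import numpy as np

def one_vs_all_fit(X, Y, lam=1e-2):
    """Fit one independent linear scorer per label by ridge regression.

    X : (n, d) features, Y : (n, L) binary label indicators.
    Returns W : (d, L), one weight column per label; the labels are
    completely decoupled, which is exactly what correlation priors in
    the paper's formulation relax.
    """
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

def one_vs_all_predict(X, W):
    """Predict each label independently by thresholding its score at 0."""
    return (X @ W > 0).astype(int)
```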
Multi-label hypothesis reuse
 In KDD
Abstract

Cited by 10 (4 self)
Multi-label learning arises in many real-world tasks where an object is naturally associated with multiple concepts. It is well accepted that, in order to achieve good performance, the relationship among labels should be exploited. Most existing approaches require the label relationship as prior knowledge, or exploit it by counting label co-occurrences. In this paper, we propose the MAHR approach, which is able to automatically discover and exploit the label relationship. Our basic idea is that, if two labels are related, the hypothesis generated for one label can be helpful for the other. MAHR implements this idea as a boosting approach with a hypothesis reuse mechanism. In each boosting round, the base learner for a label is generated not only by learning on its own task but also by reusing the hypotheses from other labels, and the amount of reuse across labels provides an estimate of the label relationship. Extensive experimental results validate that MAHR is able to achieve superior performance and discover reasonable label relationships. Moreover, we disclose that the label relationship is usually asymmetric.
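The reuse idea can be pictured as a linear combination of per-label hypotheses, with the reuse weights doubling as a (possibly asymmetric) label-relationship estimate. This is a toy sketch with hypothetical names, not MAHR's boosting procedure:

```python
import numpy as np

def reuse_scores(H, beta):
    """Combine per-label base hypotheses via reuse weights.

    H    : (L, n) outputs of L base hypotheses on n examples
    beta : (L, L) reuse weights; row l says how much label l relies on
           each hypothesis. Large off-diagonal beta[l, m] suggests label
           l is related to label m, and nothing forces beta[l, m] to
           equal beta[m, l], mirroring the asymmetry the paper reports.
    """
    return beta @ H  # (L, n) combined scores

H = np.array([[1.0, -1.0],
              [1.0,  1.0]])      # two labels, two examples
beta = np.array([[0.8, 0.2],     # label 0 reuses label 1 a little
                 [0.0, 1.0]])    # label 1 stands alone
S = reuse_scores(H, beta)
```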
Vector-valued Manifold Regularization
Abstract

Cited by 7 (4 self)
We consider the general problem of learning an unknown functional dependency, f: X → Y, between a structured input space X and a structured output space Y, from labeled and unlabeled examples. We formulate this problem in terms of data-dependent regularization in vector-valued Reproducing Kernel Hilbert Spaces (Micchelli & Pontil, 2005), which elegantly extend familiar scalar-valued kernel methods to the general setting where Y has a Hilbert space structure. Our methods provide a natural extension of Manifold Regularization (Belkin et al., 2006) algorithms to also exploit output interdependencies while enforcing smoothness with respect to input data geometry. We propose a class of matrix-valued kernels which allow efficient implementations of our algorithms via the use of numerical solvers for Sylvester matrix equations. On multi-label image annotation and text classification problems, we find favorable empirical comparisons against several competing alternatives.
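The computational core mentioned above is a Sylvester matrix equation of the form A X + X B = Q. A minimal vectorised solver, illustrating only the linear-algebra step and none of the paper's kernels or regularizers, can be written as:

```python
import numpy as np

def solve_sylvester(A, B, Q):
    """Solve A X + X B = Q for X by vectorisation.

    Uses vec(A X) = (I kron A) vec(X) and vec(X B) = (B^T kron I) vec(X)
    with column-major (Fortran-order) vec. Fine for small systems;
    production code would use a dedicated Sylvester solver such as the
    Bartels-Stewart algorithm.
    """
    n, m = Q.shape
    K = np.kron(np.eye(m), A) + np.kron(B.T, np.eye(n))
    x = np.linalg.solve(K, Q.reshape(-1, order="F"))
    return x.reshape(n, m, order="F")
```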
Multi-label Classification using Bayesian Compressed Sensing
Abstract

Cited by 7 (2 self)
In this paper, we present a Bayesian framework for multi-label classification using compressed sensing. The key idea in compressed sensing for multi-label classification is to first project the label vector to a lower-dimensional space using a random transformation and then learn regression functions over these projections. Our approach considers both of these components in a single probabilistic model, thereby jointly optimizing over the compression as well as the learning tasks. We then derive an efficient variational inference scheme that provides a joint posterior distribution over all the unobserved labels. The two key benefits of the model are that (a) it can naturally handle datasets that have missing labels and (b) it can also measure uncertainty in prediction. The uncertainty estimate provided by the model enables active learning paradigms where an oracle provides information about labels that promise to be maximally informative for the prediction task. Our experiments show a significant boost over prior methods in terms of prediction performance on benchmark datasets, both in the fully labeled and the missing-labels case. Finally, we also highlight various useful active learning scenarios that are enabled by the probabilistic model.
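The compress-then-regress pipeline, without the paper's joint Bayesian treatment, can be sketched as follows. All names are hypothetical, and the decoder is a naive stand-in for proper sparse recovery or the paper's variational posterior.

```python
import numpy as np

rng = np.random.default_rng(0)

def compress(Y, k):
    """Project the n x L binary label matrix down to k dimensions with
    a random Gaussian map Phi; the compressed columns of Z = Y Phi^T
    are what the regression functions are trained to predict."""
    Phi = rng.standard_normal((k, Y.shape[1]))
    return Phi, Y @ Phi.T

def decode(z, Phi, s):
    """Naive decoder: least-squares inversion of the projection,
    then keep the s largest coordinates as the predicted label set."""
    y_hat, *_ = np.linalg.lstsq(Phi, z, rcond=None)
    return sorted(np.argsort(-np.abs(y_hat))[:s].tolist())

# sanity check with no compression (k = L), where decoding is exact
y = np.zeros(6); y[[1, 4]] = 1.0
Phi, Z = compress(y.reshape(1, -1), k=6)
labels = decode(Z[0], Phi, s=2)   # recovers [1, 4] when Phi is invertible
```

In the compressed regime (k < L), the decoder relies on the label vector being sparse, which is the compressed-sensing assumption the abstract invokes.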