Results 11-20 of 94
Multi-Label Output Codes using Canonical Correlation Analysis
"... Traditional errorcorrectingoutput codes (ECOCs) decompose a multiclass classification problem into many binary problems. Although it seems natural to use ECOCs for multilabel problems as well, doing so naively createsissues related to: the validity of the encoding, the efficiency of the decoding ..."
Abstract

Cited by 18 (1 self)
Traditional error-correcting output codes (ECOCs) decompose a multiclass classification problem into many binary problems. Although it seems natural to use ECOCs for multi-label problems as well, doing so naively creates issues related to: the validity of the encoding, the efficiency of the decoding, the predictability of the generated codeword, and the exploitation of the label dependency. Using canonical correlation analysis, we propose an error-correcting code for multi-label classification. Label dependency is characterized as the most predictable directions in the label space, which are extracted as canonical output variates and encoded into the codeword. Predictions for the codeword define a graphical model of labels with both Bernoulli potentials (from classifiers on the labels) and Gaussian potentials (from regression on the canonical output variates). Decoding is performed by mean-field approximation. We establish connections between the proposed code and research areas such as compressed sensing and ensemble learning. Some of these connections contribute to better understanding of the new code, and others lead to practical improvements in code design. In our empirical study, the proposed code leads to substantial improvements compared to various competitors in music emotion classification and outdoor scene recognition.
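The encoding step the abstract describes, extracting the most predictable directions in the label space as canonical output variates, can be sketched in plain NumPy. This is our own illustrative reconstruction on synthetic data, not the authors' code; the regularization constant `eps`, the data dimensions, and the choice of three variates are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, L = 200, 10, 6                      # samples, features, labels
X = rng.standard_normal((n, d))
Y = (X @ rng.standard_normal((d, L))
     + 0.5 * rng.standard_normal((n, L)) > 0).astype(float)

# Center both blocks
Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)

# Regularized covariance blocks (eps keeps the whitening well-posed)
eps = 1e-6
Sxx = Xc.T @ Xc / n + eps * np.eye(d)
Syy = Yc.T @ Yc / n + eps * np.eye(L)
Sxy = Xc.T @ Yc / n

def inv_sqrt(S):
    # Inverse matrix square root via eigendecomposition
    w, V = np.linalg.eigh(S)
    return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

# Singular values of the whitened cross-covariance = canonical correlations
K = inv_sqrt(Sxx) @ Sxy @ inv_sqrt(Syy)
U, s, Vt = np.linalg.svd(K)

# Canonical output variates: most predictable label-space directions,
# which the paper appends to the codeword
B = inv_sqrt(Syy) @ Vt.T[:, :3]           # top-3 directions in label space
Z = Yc @ B                                # canonical output variates
```

The canonical correlations `s` quantify how predictable each extracted direction is from the input; in the paper these directions feed Gaussian potentials in the decoding graphical model.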
Multi-Label Learning with Millions of Labels: Recommending Advertiser Bid Phrases for Web Pages
"... Recommending phrases from web pages for advertisers to bid on against search engine queries is an important research problem with direct commercial impact. Most approaches have found it infeasible to determine the relevance of all possible queries to a given ad landing page and have focussed on maki ..."
Abstract

Cited by 17 (1 self)
Recommending phrases from web pages for advertisers to bid on against search engine queries is an important research problem with direct commercial impact. Most approaches have found it infeasible to determine the relevance of all possible queries to a given ad landing page and have focussed on making recommendations from a small set of phrases extracted (and expanded) from the page using NLP and ranking based techniques. In this paper, we eschew this paradigm, and demonstrate that it is possible to efficiently predict the relevant subset of queries from a large set of monetizable ones by posing the problem as a multi-label learning task with each query being represented by a separate label. We develop Multi-label Random Forests to tackle problems with millions of labels. Our proposed classifier has prediction costs that are logarithmic in the number of labels and can make predictions in a few milliseconds using 10 GB of RAM. We demonstrate that it is possible to generate training data for our classifier automatically from click logs without any human annotation or intervention. We train our classifier on tens of millions of labels, features and training points in less than two days on a thousand node cluster. We develop a sparse semi-supervised multi-label learning formulation to deal with training set biases and noisy labels harvested automatically from the click logs. This formulation is used to infer a belief in the state of each label for each training ad and the random forest classifier is extended to train on these beliefs rather than the given labels. Experiments reveal significant gains over ranking and NLP based techniques on a large test set of 5 million ads using multiple metrics.
Two are better than one: Fundamental parameters of frame coherence
, 2011
"... This paper investigates two parameters that measure the coherence of a frame: worstcase and average coherence. We first use worstcase and average coherence to derive nearoptimal probabilistic guarantees on both sparse signal detection and reconstruction in the presence of noise. Next, we provide ..."
Abstract

Cited by 16 (8 self)
This paper investigates two parameters that measure the coherence of a frame: worst-case and average coherence. We first use worst-case and average coherence to derive near-optimal probabilistic guarantees on both sparse signal detection and reconstruction in the presence of noise. Next, we provide a catalog of nearly tight frames with small worst-case and average coherence. Later, we find a new lower bound on worst-case coherence; we compare it to the Welch bound and use it to interpret recently reported signal reconstruction results. Finally, we give an algorithm that transforms frames in a way that decreases average coherence without changing the spectral norm or worst-case coherence.
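Both parameters are direct computations on the Gram matrix of a unit-norm frame. The sketch below, with dimensions and variable names of our choosing, uses the standard definition of worst-case coherence (largest off-diagonal Gram entry in magnitude) and, to the best of our reading, the paper's normalized row-sum definition of average coherence, and compares against the Welch bound:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 8, 20                          # an m x n frame: n unit-norm columns
F = rng.standard_normal((m, n))
F /= np.linalg.norm(F, axis=0)        # normalize each column

G = F.T @ F                           # Gram matrix; G[i, j] = <f_i, f_j>
off = G - np.eye(n)                   # zero out the unit diagonal

# Worst-case coherence: largest off-diagonal inner product in magnitude
mu = np.abs(off).max()

# Average coherence (per the paper, as we read it):
# max over i of |sum_{j != i} <f_i, f_j>| / (n - 1)
nu = np.abs(off.sum(axis=1)).max() / (n - 1)

# Welch bound: a universal lower bound on worst-case coherence when n > m
welch = np.sqrt((n - m) / (m * (n - 1)))
```

For any unit-norm frame with n > m, `mu` must sit at or above `welch`, and `nu` can never exceed `mu`, which is why treating them as two separate parameters buys extra slack in the guarantees.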
Maximum Margin Output Coding
"... In this paper we study output coding for multilabel prediction. For a multilabel output coding to be discriminative, it is important that codewords for different label vectors are significantly different from each other. In the meantime, unlike in traditional coding theory, codewords in output cod ..."
Abstract

Cited by 15 (1 self)
In this paper we study output coding for multi-label prediction. For a multi-label output coding to be discriminative, it is important that codewords for different label vectors are significantly different from each other. At the same time, unlike in traditional coding theory, codewords in output coding are to be predicted from the input, so it is also critical to have a predictable label encoding. To find output codes that are both discriminative and predictable, we first propose a max-margin formulation that naturally captures these two properties. We then convert it to a metric learning formulation, but with an exponentially large number of constraints as commonly encountered in structured prediction problems. Without a label structure for tractable inference, we use overgenerating (i.e., relaxation) techniques combined with the cutting plane method for optimization. In our empirical study, the proposed output coding scheme outperforms a variety of existing multi-label prediction methods for image, text and music classification.
Multiclass-Multilabel Classification with More Classes than Examples
"... We discuss multiclassmultilabel classification problems in which the set of classes is extremely large. Most existing multiclassmultilabel learning algorithms expect to observe a reasonably large sample from each class, and fail if they receive only a handful of examples per class. We propose and ..."
Abstract

Cited by 15 (0 self)
We discuss multiclass-multilabel classification problems in which the set of classes is extremely large. Most existing multiclass-multilabel learning algorithms expect to observe a reasonably large sample from each class, and fail if they receive only a handful of examples per class. We propose and analyze the following two-stage approach: first use an arbitrary (perhaps heuristic) classification algorithm to construct an initial classifier, then apply a simple but principled method to augment this classifier by removing harmful labels from its output. A careful theoretical analysis allows us to justify our approach under some reasonable conditions (such as label sparsity and power-law distribution of class frequencies), even when the training set does not provide a statistically accurate representation of most classes. Surprisingly, our theoretical analysis continues to hold even when the number of classes exceeds the sample size. We demonstrate the merits of our approach on the ambitious task of categorizing the entire web using the 1.5 million categories defined on Wikipedia.
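As one much-simplified illustration of the two-stage idea, not the authors' actual pruning rule: take any initial multi-label classifier, estimate each label's precision on held-out data, and suppress labels the classifier predicts far more often than it is right. The data below is synthetic and the 0.5 threshold is an arbitrary choice of ours.

```python
import numpy as np

rng = np.random.default_rng(2)
n_val, L = 500, 50

# Stage 1 stand-in: held-out predictions of some initial (heuristic)
# classifier, alongside the true label sets. Here truth is sparse and the
# classifier over-predicts with some noise.
truth = rng.random((n_val, L)) < 0.05
preds = truth | (rng.random((n_val, L)) < 0.03)

# Stage 2: per-label precision on the held-out set
predicted = preds.sum(axis=0)
correct = (preds & truth).sum(axis=0)
precision = np.where(predicted > 0,
                     correct / np.maximum(predicted, 1), 1.0)

# Drop "harmful" labels: those the initial classifier is mostly wrong about
keep = precision >= 0.5                   # illustrative threshold

def prune(pred_row):
    """Remove suppressed labels from one prediction vector."""
    return pred_row & keep
```

Pruning can only remove labels, so under label sparsity it trades a small recall loss for a large precision gain on the poorly-covered classes, which is the spirit of the paper's second stage.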
Feature-aware label space dimension reduction for multi-label classification
, 2012
"... Label space dimension reduction (LSDR) is an efficient and effective paradigm for multilabel classification with many classes. Existing approaches to LSDR, such as compressive sensing and principal label space transformation, exploit only the label part of the dataset, but not the feature part. In ..."
Abstract

Cited by 14 (0 self)
Label space dimension reduction (LSDR) is an efficient and effective paradigm for multi-label classification with many classes. Existing approaches to LSDR, such as compressive sensing and principal label space transformation, exploit only the label part of the dataset, but not the feature part. In this paper, we propose a novel approach to LSDR that considers both the label and the feature parts. The approach, called conditional principal label space transformation, is based on minimizing an upper bound of the popular Hamming loss. The minimization step of the approach can be carried out efficiently by a simple use of singular value decomposition. In addition, the approach can be extended to a kernelized version that allows the use of sophisticated feature combinations to assist LSDR. The experimental results verify that the proposed approach is more effective for LSDR than existing approaches across many real-world datasets.
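A minimal sketch of the feature-aware reduction on synthetic data: the label directions that are kept are weighted by how well a (here, ridge-regularized) linear regression from the features can predict them, via the hat matrix. This is our own reconstruction under those assumptions, not the authors' implementation; the dimensions, λ, and the 0.5 rounding threshold are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, L, M = 300, 20, 40, 5           # M = reduced label-space dimension

X = rng.standard_normal((n, d))
Y = (rng.random((n, L)) < 0.1).astype(float)

ybar = Y.mean(axis=0)
Z = Y - ybar                          # shifted label matrix

# Feature-aware part: ridge-regularized hat matrix, so the retained label
# directions are also the ones linear regression can actually predict
lam = 1.0
H = X @ np.linalg.solve(X.T @ X + lam * np.eye(d), X.T)

# Top-M eigenvectors of Z^T H Z give the M x L projection V (via eigh,
# since the matrix is symmetric PSD)
w, Vfull = np.linalg.eigh(Z.T @ H @ Z)
V = Vfull[:, np.argsort(w)[::-1][:M]].T

T = Z @ V.T                                               # encode: n x M
W = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ T)   # ridge fit

Yhat = ((X @ W) @ V + ybar > 0.5).astype(float)           # decode + round
```

Compared with a plain SVD of the label matrix, conditioning on `H` shifts the kept directions toward predictable ones, which is what "feature-aware" means here.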
Adaptive Large Margin Training for Multilabel Classification
"... Multilabel classification is a central problem in many areas of data analysis, including text and multimedia categorization, where individual data objects need to be assigned multiple labels. A key challenge in these tasks is to learn a classifier that can properly exploit label correlations without ..."
Abstract

Cited by 13 (10 self)
Multi-label classification is a central problem in many areas of data analysis, including text and multimedia categorization, where individual data objects need to be assigned multiple labels. A key challenge in these tasks is to learn a classifier that can properly exploit label correlations without requiring exponential enumeration of label subsets during training or testing. We investigate novel loss functions for multi-label training within a large margin framework, identifying a simple alternative that yields improved generalization while still allowing efficient training. We furthermore show how covariances between the label models can be learned simultaneously with the classification model itself, in a jointly convex formulation, without compromising scalability. The resulting combination yields state-of-the-art accuracy in multi-label webpage classification.
Multi-label learning by exploiting label correlations locally
 In AAAI
, 2012
"... It is well known that exploiting label correlations is important for multilabel learning. Existing approaches typically exploit label correlations globally, by assuming that the label correlations are shared by all the instances. In realworld tasks, however, different instances may share differe ..."
Abstract

Cited by 13 (2 self)
It is well known that exploiting label correlations is important for multi-label learning. Existing approaches typically exploit label correlations globally, by assuming that the label correlations are shared by all the instances. In real-world tasks, however, different instances may share different label correlations, and few correlations are globally applicable. In this paper, we propose the ML-LOC approach which allows label correlations to be exploited locally. To encode the local influence of label correlations, we derive a LOC code to enhance the feature representation of each instance. The global discrimination fitting and local correlation sensitivity are incorporated into a unified framework, and an alternating solution is developed for the optimization. Experimental results on a number of image, text and gene data sets validate the effectiveness of our approach.
Multilabel classification with principal label space transformation
, 2010
"... We propose a novel hypercube view that perceives the label space of multilabel classification problems geometrically. The view allows us to not only unify many existing multilabel classification approaches, but also design a novel algorithm, Principle Label Space Transformation (PLST), which s ..."
Abstract

Cited by 13 (1 self)
We propose a novel hypercube view that perceives the label space of multi-label classification problems geometrically. The view allows us to not only unify many existing multi-label classification approaches, but also design a novel algorithm, Principal Label Space Transformation (PLST), which seeks important correlations between labels before learning. The simple and efficient PLST relies on only singular value decomposition as the key step. Experimental results demonstrate that PLST is faster than the traditional Binary Relevance approach and is superior to the modern Compressive Sensing approach in terms of both performance and efficiency.
Efficient Max-Margin Multi-Label Classification with Applications to Zero-Shot Learning
 MACH LEARN
, 2010
"... The goal in multilabel classification is to tag a data point with the subset of relevant labels from a prespecified set. Given a set of L labels, a data point can be tagged with any of the 2 L possible subsets. The main challenge therefore lies in optimising over this exponentially large label spa ..."
Abstract

Cited by 12 (2 self)
The goal in multi-label classification is to tag a data point with the subset of relevant labels from a pre-specified set. Given a set of L labels, a data point can be tagged with any of the 2^L possible subsets. The main challenge therefore lies in optimising over this exponentially large label space subject to label correlations. Our objective, in this paper, is to design efficient algorithms for multi-label classification when the labels are densely correlated. In particular, we are interested in the zero-shot learning scenario where the label correlations on the training set might be significantly different from those on the test set. We propose a max-margin formulation where we model prior label correlations but do not incorporate pairwise label interaction terms in the prediction function. We show that the problem complexity can be reduced from exponential to linear while modelling dense pairwise prior label correlations. By incorporating relevant correlation priors we can handle mismatches between the training and test set statistics. Our proposed formulation generalises the effective 1-vs-All method and we provide a principled interpretation of the 1-vs-All technique. We develop efficient optimisation algorithms for our proposed formulation. We adapt the Sequential Minimal Optimisation (SMO) algorithm to multi-label classification and show that, with some bookkeeping, we can reduce the training time from being super-quadratic to almost linear in the number of labels. Furthermore, by effectively re-utilizing the kernel cache and jointly optimising over all variables, we can be orders of magnitude faster than the competing state-of-the-art algorithms.