Large-scale Multi-label Learning with Missing Labels
Cited by 16 (4 self)
The multi-label classification problem has generated significant interest in recent years. However, existing approaches do not adequately address two key challenges: (a) scaling up to problems with a large number (say millions) of labels, and (b) handling data with missing labels. In this paper, we directly address both these problems by studying the multi-label problem in a generic empirical risk minimization (ERM) framework. Our framework, despite being simple, is surprisingly able to encompass several recent label-compression based methods, which can be derived as special cases of our method. To optimize the ERM problem, we develop techniques that exploit the structure of specific loss functions, such as the squared loss function, to obtain efficient algorithms. We further show that our learning framework admits excess risk bounds even in the presence of missing labels. Our bounds are tight and demonstrate better generalization performance for low-rank promoting trace-norm regularization when compared to (rank insensitive) Frobenius norm regularization. Finally, we present extensive empirical results on a variety of benchmark datasets and show that our methods perform significantly better than existing label compression based methods and can scale up to very large datasets such as a Wikipedia dataset that has more than 200,000 labels.
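The idea of a squared loss evaluated only on observed labels can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the low-rank factorization Z = U V^T, the function names `predict` and `masked_squared_loss`, the `observed` index list, and the weight `lam` are all introduced here for illustration. The Frobenius penalty on the two factors is used as a common surrogate for trace-norm regularization.

```python
def predict(x, U, V):
    """Score every label for one feature vector x using Z = U V^T.

    U is d x k and V is L x k, where k is the assumed rank (hypothetical names).
    """
    k = len(U[0])
    # Project the features into the k-dimensional latent space: h = U^T x.
    h = [sum(x[a] * U[a][r] for a in range(len(x))) for r in range(k)]
    # Score label j as the inner product of h with row j of V.
    return [sum(h[r] * V[j][r] for r in range(k)) for j in range(len(V))]


def masked_squared_loss(X, Y, observed, U, V, lam=0.0):
    """Squared loss over observed (i, j) entries plus a Frobenius penalty on the factors."""
    loss = 0.0
    for i, j in observed:
        # Missing labels never appear in `observed`, so they contribute nothing.
        loss += (Y[i][j] - predict(X[i], U, V)[j]) ** 2
    frob = sum(u * u for row in U for u in row) + sum(v * v for row in V for v in row)
    return loss + 0.5 * lam * frob
```

Entries of `Y` that are not listed in `observed` are simply skipped, which is what distinguishes a missing label from a label known to be negative.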
A convex formulation for semi-supervised multi-label feature selection
In Proceedings of the 28th AAAI Conference on Artificial Intelligence, 2014
Cited by 4 (2 self)
The explosive growth of multimedia data has raised the challenge of how to efficiently browse, retrieve, and organize these data. Under this circumstance, different approaches have been proposed to facilitate multimedia analysis. Several semi-supervised feature selection algorithms have been proposed to exploit both labeled and unlabeled data. However, they are implemented based on graphs, such that they cannot handle large-scale datasets. How to conduct semi-supervised feature selection on large-scale datasets has become a challenging research problem. Moreover, existing multi-label feature selection algorithms rely on eigen-decomposition with a heavy computational burden, which further prevents current feature selection algorithms from being applied to big data. In this paper, we propose a novel convex semi-supervised multi-label feature selection algorithm, which can be applied to large-scale datasets. We evaluate the performance of the proposed algorithm over five benchmark datasets and compare the results with state-of-the-art supervised and semi-supervised feature selection algorithms as well as a baseline using all features. The experimental results demonstrate that our proposed algorithm consistently achieves superior performance.
FastXML: a fast, accurate and stable tree-classifier for extreme multi-label learning.
In KDD, 2014
Cited by 4 (0 self)
The objective in extreme multi-label classification is to learn a classifier that can automatically tag a data point with the most relevant subset of labels from a large label set. Extreme multi-label classification is an important research problem since not only does it enable the tackling of applications with many labels but it also allows the reformulation of ranking problems with certain advantages over existing formulations. Our objective, in this paper, is to develop an extreme multi-label classifier that is faster to train and more accurate at prediction than the state-of-the-art Multi-label Random Forest (MLRF) algorithm [2] and the Label Partitioning for Sublinear Ranking (LPSR) algorithm
Open-vocabulary Object Retrieval
Cited by 4 (2 self)
In this paper, we address the problem of retrieving objects based on open-vocabulary natural language queries: Given a phrase describing a specific object, e.g., “the corn flakes box”, the task is to find the best match in a set of images containing candidate objects. When naming objects, humans tend to use natural language with rich semantics, including basic-level categories, fine-grained categories, and instance-level concepts such as brand names. Existing approaches to large-scale object recognition fail in this scenario, as they expect queries that map directly to a fixed set of pre-trained visual categories, e.g. ImageNet synset tags. We address this limitation by introducing a novel object retrieval method. Given a candidate object image, we first map it to a set of words that are likely to describe it, using several learned image-to-text projections. We also propose a method for handling open vocabularies, i.e., words not contained in the training data. We then compare the natural language query to the sets of words predicted for each candidate and select the best match. Our method can combine category and instance-level semantics in a common representation. We present extensive experimental results on several datasets using both instance-level and category-level matching and show that our approach can accurately retrieve objects based on extremely varied open-vocabulary queries. The source code of our approach will be publicly available together with pre-trained models at
Logarithmic time online multiclass prediction. arXiv preprint arXiv:1406.1822, 2014
Cited by 3 (0 self)
We study the problem of multiclass classification with an extremely large number of classes, with the goal of obtaining train and test time complexity logarithmic in the number of classes. We develop top-down tree construction approaches for constructing logarithmic depth trees. On the theoretical front, we formulate a new objective function, which is optimized at each node of the tree and creates dynamic partitions of the data which are both pure (in terms of class labels) and balanced. We demonstrate that under favorable conditions, we can construct logarithmic depth trees that have leaves with low label entropy. However, the objective function at the nodes is challenging to optimize computationally. We address the empirical problem with a new online decision tree construction procedure. Experiments demonstrate that this online algorithm quickly achieves small error rates relative to more common O(k) approaches and simultaneously achieves significant improvement in test error compared to other logarithmic training time approaches.
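The two properties named above, purity and balance, can be measured for a candidate split with a short sketch like the one below. It uses plain Shannon entropy and the fraction of points sent right; these are generic measures chosen for illustration and are not the objective function from the paper. The names `label_entropy` and `split_stats` are hypothetical.

```python
import math
from collections import Counter


def label_entropy(labels):
    """Shannon entropy (in bits) of the empirical class distribution."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def split_stats(labels, goes_right):
    """Return (balance, weighted child entropy) for a binary split.

    goes_right[i] is True when point i is routed to the right child.
    """
    left = [y for y, r in zip(labels, goes_right) if not r]
    right = [y for y, r in zip(labels, goes_right) if r]
    n = len(labels)
    balance = len(right) / n  # 0.5 means a perfectly balanced split
    # Weight each non-empty child's entropy by the share of points it receives.
    weighted = sum(len(part) / n * label_entropy(part) for part in (left, right) if part)
    return balance, weighted
```

A split that is both balanced and pure has balance near 0.5 and weighted child entropy near zero; repeating such splits is what would yield a logarithmic-depth tree with low-entropy leaves.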
Active Learning for Sparse Bayesian Multilabel Classification
Cited by 3 (0 self)
We study the problem of active learning for multilabel classification. We focus on the real-world scenario where the average number of positive (relevant) labels per data point is small, leading to positive label sparsity. Carrying out mutual information based near-optimal active learning in this setting is a challenging task since the computational complexity involved is exponential in the total number of labels. We propose a novel inference algorithm for the sparse Bayesian multilabel model of [17]. The benefit of this alternate inference scheme is that it enables a natural approximation of the mutual information objective. We prove that the approximation leads to an identical solution to the exact optimization problem but at a fraction of the optimization cost. This allows us to carry out efficient, non-myopic, and near-optimal active learning for sparse multilabel classification. Extensive experiments reveal the effectiveness of the method.
Provable inductive matrix completion
CoRR
Cited by 3 (2 self)
Consider a movie recommendation system where apart from the ratings information, side information such as user’s age or movie’s genre is also available. Unlike standard matrix completion, in this setting one should be able to predict inductively on new users/movies. In this paper, we study the problem of inductive matrix completion in the exact recovery setting. That is, we assume that the ratings matrix is generated by applying feature vectors to a low-rank matrix and the goal is to recover the underlying matrix. Furthermore, we generalize the problem to that of low-rank matrix estimation using rank-1 measurements. We study this generic problem and provide conditions that the set of measurements should satisfy so that the alternating minimization method (which otherwise is a non-convex method with no convergence guarantees) is able to recover the exact underlying low-rank matrix. In addition to inductive matrix completion, we show that two other low-rank estimation problems can be studied in our framework: a) general low-rank matrix sensing using rank-1 measurements, and b) multi-label regression with missing labels. For both problems, we provide novel and interesting bounds on the number of measurements required by alternating minimization to provably converge to the exact low-rank matrix. In particular, our analysis for the general low-rank matrix sensing problem significantly reduces the required storage and computational cost compared with that required by the RIP-based matrix sensing methods [1]. Finally, we provide empirical validation of our approach and demonstrate that alternating minimization is able to recover the true matrix for the above-mentioned problems using a small number of measurements.
Scaling Graph-based Semi Supervised Learning to Large Number of Labels Using Count-Min Sketch
Cited by 2 (0 self)
Graph-based semi-supervised learning (SSL) algorithms have been successfully used in a large number of applications. These methods classify initially unlabeled nodes by propagating label information over the structure of the graph starting from seed nodes. Graph-based SSL algorithms usually scale linearly with the number of distinct labels (m), and require O(m) space on each node. Unfortunately, there exist many applications of practical significance with very large m over large graphs, demanding better space and time complexity. In this paper, we propose MAD-Sketch, a novel graph-based SSL algorithm which compactly stores the label distribution on each node using Count-min Sketch, a randomized data structure. We present theoretical analysis showing that under mild conditions, MAD-Sketch can reduce space complexity at each node from O(m) to O(log m), and achieve similar savings in time complexity as well. We support our analysis through experiments on multiple real-world datasets. We observe that MAD-Sketch achieves similar performance as existing state-of-the-art graph-based SSL algorithms, while requiring a smaller memory footprint and at the same time achieving up to 10x speedup. We find that MAD-Sketch is able to scale to datasets with one million labels, which is beyond the scope of existing graph-based SSL algorithms.
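A Count-min Sketch, the data structure named above, can be illustrated with a minimal sketch. This is a generic version of the structure, not the MAD-Sketch algorithm itself: the class name `CountMinSketch`, the salted SHA-256 hashing, and the `width` and `depth` parameters are assumptions made for illustration. A node would keep one such table instead of a length-m vector of label scores.

```python
import hashlib


class CountMinSketch:
    """Approximate per-key totals in a fixed depth x width table of counters."""

    def __init__(self, width, depth):
        self.width = width
        self.depth = depth
        self.table = [[0.0] * width for _ in range(depth)]

    def _index(self, key, row):
        # Derive one hash per row by salting the key with the row number.
        digest = hashlib.sha256(f"{row}:{key}".encode("utf-8")).hexdigest()
        return int(digest, 16) % self.width

    def add(self, key, amount=1.0):
        # Non-negative updates only; this keeps every estimate an upper bound.
        for row in range(self.depth):
            self.table[row][self._index(key, row)] += amount

    def estimate(self, key):
        # Collisions can only inflate a counter, so the minimum is the tightest estimate.
        return min(self.table[row][self._index(key, row)] for row in range(self.depth))
```

The memory used is `width * depth` counters regardless of how many distinct labels are added, at the price of estimates that may overshoot the true score when labels collide.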
Predicting useful neighborhoods for lazy local learning
In: Advances in Neural Information Processing Systems (2014), 1916–1924
Cited by 1 (0 self)
Lazy local learning methods train a classifier “on the fly” at test time, using only a subset of the training instances that are most relevant to the novel test example. The goal is to tailor the classifier to the properties of the data surrounding the test example. Existing methods assume that the instances most useful for building the local model are strictly those closest to the test example. However, this fails to account for the fact that the success of the resulting classifier depends on the full distribution of selected training instances. Rather than simply gathering the test example’s nearest neighbors, we propose to predict the subset of training data that is jointly relevant to training its local model. We develop an approach to discover patterns between queries and their “good” neighborhoods using large-scale multi-label classification with compressed sensing. Given a novel test point, we estimate both the composition and size of the training subset likely to yield an accurate local model. We demonstrate the approach on image classification tasks on SUN and aPascal and show its advantages over traditional global and local approaches.
Diverse Yet Efficient Retrieval using Hash Functions
Typical retrieval systems have three requirements: a) accurate retrieval, i.e., the method should have high precision; b) diverse retrieval, i.e., the obtained set of points should be diverse; c) retrieval time should be small. However, most of the existing methods address only one or two of the above-mentioned requirements. In this work, we present a method based on randomized locality sensitive hashing which tries to address all of the above requirements simultaneously. While earlier hashing approaches considered approximate retrieval to be acceptable only for the sake of efficiency, we argue that one can further exploit approximate retrieval to provide impressive tradeoffs between accuracy and diversity. We extend our method to the problem of multi-label prediction, where the goal is to output a diverse and accurate set of labels for a given document in real time. Moreover, we introduce a new notion to simultaneously evaluate a method's performance for both the precision and diversity measures. Finally, we present empirical results on several different retrieval tasks and show that our method retrieves diverse and accurate images/labels while ensuring a 100x speedup over the existing diverse retrieval approaches.
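Randomized locality sensitive hashing, on which the method above is based, can be illustrated with random-hyperplane hashing. This is a generic textbook scheme and not the authors' method; it does not attempt their accuracy and diversity tradeoff. All function names, the `n_bits` parameter, and the fixed `seed` are assumptions made for illustration.

```python
import random
from collections import defaultdict


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplane normals of dimension dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_code(vec, planes):
    # One bit per hyperplane: which side of the plane the vector falls on.
    return tuple(1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0 for plane in planes)


def build_index(points, planes):
    """Group point indices by hash code so that nearby directions share a bucket."""
    buckets = defaultdict(list)
    for idx, vec in enumerate(points):
        buckets[hash_code(vec, planes)].append(idx)
    return buckets


def query(vec, buckets, planes):
    # Candidates are the points sharing the query's bucket; no ranking is done here.
    return list(buckets.get(hash_code(vec, planes), []))
```

A query touches only one bucket rather than the whole collection, which is the source of the speedup; the bucket contents are approximate neighbors, so a real system would rerank or diversify them afterwards.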