### Large Scale Distributed Semi-Supervised Learning Using Streaming Approximation

Traditional graph-based semi-supervised learning (SSL) approaches are not suited for massive data and large label scenarios since they scale linearly with the number of edges |E| and distinct labels m. To deal with the large label size problem, recent works propose sketch-based methods to approximate the label distribution per node, thereby achieving a space reduction from O(m) to O(log m) under certain conditions. In this paper, we present a novel streaming graph-based SSL approximation that effectively captures the sparsity of the label distribution and further reduces the space complexity per node to O(1). We also provide a distributed version of the algorithm that scales well to large data sizes. Experiments on real-world datasets demonstrate that the new method achieves better performance than existing state-of-the-art algorithms with a significant reduction in memory footprint. Finally, we propose a robust graph augmentation strategy using unsupervised deep learning architectures that yields further significant quality gains for SSL in natural language applications.
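A count-min sketch is a hypothetical concrete instance of the O(log m) sketch-based baselines this abstract contrasts with (the paper's own streaming O(1) construction is different). The hash family and all parameters below are illustrative choices, not the paper's:

```python
import numpy as np

class CountMinSketch:
    """Approximate per-node label-mass counter in O(width * depth) space,
    independent of the number of distinct labels m."""

    def __init__(self, width=64, depth=4, seed=0):
        rng = np.random.default_rng(seed)
        self.width, self.depth = width, depth
        self.table = np.zeros((depth, width))
        # coefficients for a simple (a*x + b) mod p universal hash family
        self.prime = 2**31 - 1
        self.a = rng.integers(1, self.prime, size=depth)
        self.b = rng.integers(0, self.prime, size=depth)

    def _buckets(self, label):
        return (self.a * label + self.b) % self.prime % self.width

    def add(self, label, mass=1.0):
        """Accumulate label mass into one bucket per row."""
        self.table[np.arange(self.depth), self._buckets(label)] += mass

    def estimate(self, label):
        """Estimated mass; never an underestimate (min over rows bounds collisions)."""
        return self.table[np.arange(self.depth), self._buckets(label)].min()

cms = CountMinSketch()
cms.add(7, mass=3.0)   # e.g. label 7 carries weight 3 at this node
cms.add(9, mass=1.0)
```

Collisions can only inflate an estimate, never deflate it, which is why the minimum over rows gives a one-sided error guarantee.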

### Large-Scale Bayesian Multi-Label Learning via Topic-Based Label Embeddings

We present a scalable Bayesian multi-label learning model based on learning low-dimensional label embeddings. Our model assumes that each label vector is generated as a weighted combination of a set of topics (each topic being a distribution over labels), where the combination weights (i.e., the embeddings) for each label vector are conditioned on the observed feature vector. This construction, coupled with a Bernoulli-Poisson link function for each label of the binary label vector, leads to a model with a computational cost that scales in the number of positive labels in the label matrix. This makes the model particularly appealing for real-world multi-label learning problems where the label matrix is usually very massive but highly sparse. Using a data-augmentation strategy leads to full local conjugacy in our model, facilitating simple and very efficient Gibbs sampling, as well as an Expectation Maximization algorithm for inference. Also, predicting the label vector at test time does not require inference for the label embeddings and can be done in closed form. We report results on several benchmark data sets, comparing our model with various state-of-the-art methods.
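The generative story above can be sketched numerically: each binary label y_l = 1[n_l > 0], with a latent count n_l ~ Poisson(rate_l) and the rates given by a label-topic matrix times per-example embedding weights. All dimensions and distributions here are made-up stand-ins; in the paper the weights are conditioned on the observed feature vector:

```python
import numpy as np

rng = np.random.default_rng(0)
num_labels, num_topics = 1000, 8

# each topic is a distribution over labels (each column sums to 1)
topics = rng.dirichlet(np.ones(num_labels), size=num_topics).T
weights = rng.gamma(shape=2.0, scale=1.0, size=num_topics)  # the embedding

rate = topics @ weights                 # Poisson rate per label
counts = rng.poisson(rate)              # latent counts
labels = (counts > 0).astype(int)       # Bernoulli-Poisson link: threshold counts

# likelihood computations can be restricted to the positive labels,
# which is why the cost scales with the number of nonzeros
positives = np.flatnonzero(labels)
```

Because the total rate is small relative to the 1000 labels, the sampled label vector is highly sparse, mirroring the massive-but-sparse label matrices the abstract targets.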

### DATA-DRIVEN TAXONOMY FOREST FOR FINE-GRAINED IMAGE CATEGORIZATION

Fine-grained image categorization must handle huge cross-class ambiguities and a large number of classes. Inspired by the success of rigid hierarchical classification, we propose a new flexible hierarchical classification method, called a data-driven taxonomy forest. It constructs a multitude of taxonomies, each of which converts a complex multi-class problem to a more easily tractable path-finding problem. We demonstrate how a stochastic representation of local classification hypotheses incorporated in multiple taxonomies deals skillfully with error propagation and over-fitting. Various strategies for instance space decomposition are investigated from the viewpoint of taxonomy complexity. We comprehensively evaluate our data-driven taxonomy forest using the Oxford Flower 102 and Oxford Pet benchmarks and show its superiority in effectiveness and generality to rigid hierarchical classification in fine-grained image categorization tasks. Index Terms — Fine-grained image categorization, hierarchical classification, error propagation, bagging
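The path-finding view described above can be sketched in a few lines: each internal node of a taxonomy routes an example to one child, so multi-class prediction becomes following a root-to-leaf path. The taxonomy, example, and character-overlap scorer below are invented purely for illustration; the paper learns local classifiers at every node and builds many such taxonomies:

```python
# a toy taxonomy: internal nodes map to their children; leaves are absent
taxonomy = {
    "root": ["animals", "flowers"],
    "animals": ["cat", "dog"],
    "flowers": ["daisy", "rose"],
}

def route(node, example, score):
    """Greedy descent: at each internal node follow the best-scoring child."""
    while node in taxonomy:                 # leaf labels do not appear as keys
        node = max(taxonomy[node], key=lambda child: score(child, example))
    return node

# stand-in local "classifier": character overlap between label name and input
overlap = lambda child, x: len(set(child) & set(x))
predicted = route("root", "rose", overlap)
```

A forest repeats this descent over many taxonomies and aggregates the leaf votes, which is what softens the error propagation of a single rigid hierarchy.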

### Locally Non-linear Embeddings for Extreme Multi-label Learning

, 2015

The objective in extreme multi-label learning is to train a classifier that can automatically tag a novel data point with the most relevant subset of labels from an extremely large label set. Embedding-based approaches make training and prediction tractable by assuming that the training label matrix is low-rank and hence the effective number of labels can be reduced by projecting the high-dimensional label vectors onto a low-dimensional linear subspace. Still, leading embedding approaches have been unable to deliver high prediction accuracies or scale to large problems, as the low-rank assumption is violated in most real-world applications. This paper develops the X1 classifier to address both limitations. The main technical contribution in X1 is a formulation for learning a small ensemble of local distance-preserving embeddings which can accurately predict infrequently occurring (tail) labels. This allows X1 to break free of the traditional low-rank assumption and boost classification accuracy by learning embeddings which preserve pairwise distances between only the nearest label vectors. We conducted extensive experiments on several real-world as well as benchmark data sets and compared our method against state-of-the-art methods for extreme multi-label classification. Experiments reveal that X1 can make significantly more accurate predictions than the state-of-the-art methods, including both embeddings (by as much as 35%) and trees (by as much as 6%). X1 can also scale efficiently to data sets with a million labels, which are beyond the pale of leading embedding methods.
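A small numerical illustration of why local embeddings can beat one global low-rank embedding, the motivation described above: the toy label matrix has two groups of points using disjoint label subsets, so each group is well captured by its own rank-2 embedding while a single global rank-2 projection is not. (X1 itself learns distance-preserving embeddings rather than SVDs; this is only a stand-in for the "local low-rank" intuition.)

```python
import numpy as np

rng = np.random.default_rng(0)

Y = np.zeros((40, 12))
Y[:20, :3] = rng.integers(0, 2, size=(20, 3))    # group 1 uses labels 0-2
Y[20:, 9:] = rng.integers(0, 2, size=(20, 3))    # group 2 uses labels 9-11

def rank2_sq_error(block):
    """Squared Frobenius error of the best rank-2 approximation."""
    s = np.linalg.svd(block, compute_uv=False)
    return float(np.sum(s[2:] ** 2))

global_err = rank2_sq_error(Y)                               # one embedding for all
local_err = rank2_sq_error(Y[:20]) + rank2_sq_error(Y[20:])  # one per group
```

Because the groups live in disjoint label subspaces, the two local rank-2 approximations jointly retain more singular-value mass than any single global rank-2 approximation.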

### Random forests with random projections of the output space for high dimensional multi-label classification
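The output-space random-projection idea this title refers to can be sketched as follows: compress each m-dimensional sparse label vector y to k << m random measurements z = y @ G, train any multi-output regressor on z (a random forest in the paper; omitted here), and decode predictions back to label space. The greedy decoder below is standard orthogonal matching pursuit, one common choice for such a decoding step, not necessarily the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)
m, k, sparsity = 300, 150, 4

G = rng.normal(size=(m, k)) / np.sqrt(k)   # random projection of the output space

y = np.zeros(m)
y[[5, 42, 99, 250]] = 1.0                  # a 4-sparse label vector
z = y @ G                                  # compressed targets (length k)

def omp(z, G, sparsity):
    """Recover a sparse y with z = y @ G by greedily selecting the row of G
    most correlated with the residual, then refitting by least squares."""
    residual, support = z.copy(), []
    coef = np.zeros(0)
    for _ in range(sparsity):
        corr = np.abs(G @ residual)         # correlation of every label's atom
        corr[support] = -np.inf             # never reselect a chosen label
        support.append(int(np.argmax(corr)))
        coef, *_ = np.linalg.lstsq(G[support].T, z, rcond=None)
        residual = z - G[support].T @ coef
    y_hat = np.zeros(G.shape[0])
    y_hat[support] = coef
    return y_hat

y_hat = omp(z, G, sparsity)
```

With k comfortably above the compressed-sensing threshold for this sparsity level, the greedy decoder recovers the original label vector from half as many outputs.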

### Multi-Label Learning with Posterior Regularization

In many multi-label learning problems, especially as the number of labels grows, it is challenging to gather completely annotated data. This work presents a new approach for multi-label learning from incomplete annotations. The main assumption is that, because of label correlation, the true label matrix as well as the soft predictions of classifiers should be approximately low rank. We introduce a posterior regularization technique which enforces soft constraints on the classifiers, regularizing them to prefer sparse and low-rank predictions. Avoiding strict low-rank constraints results in classifiers which better fit the real data. The model can be trained efficiently using EM and stochastic gradient descent. Experiments in both the image and text domains demonstrate the contributions of each modeling assumption and show that the proposed approach achieves state-of-the-art performance on a number of challenging datasets.
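One standard way to encode a *soft* low-rank preference on a prediction matrix, in the spirit of the regularizer described above, is singular-value soft-thresholding, the proximal operator of the nuclear norm. This is only a stand-in consistent with the idea; the paper's posterior regularizer and its EM updates differ:

```python
import numpy as np

def svd_soft_threshold(P, tau):
    """Shrink all singular values of P by tau (clipping at zero).
    This is the proximal operator of tau * ||P||_* (nuclear norm)."""
    U, s, Vt = np.linalg.svd(P, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

rng = np.random.default_rng(0)
clean = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 20))   # rank-2 "true" scores
noisy = clean + 0.1 * rng.normal(size=(30, 20))               # full-rank noisy scores

smoothed = svd_soft_threshold(noisy, tau=2.0)   # soft push toward low rank
```

Unlike a hard rank constraint, the shrinkage leaves every singular direction free to survive if its signal is strong enough, which is the "soft constraint" flexibility the abstract argues for.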

### Open-vocabulary Object Retrieval

In this paper, we address the problem of retrieving objects based on open-vocabulary natural language queries: given a phrase describing a specific object, e.g., “the corn flakes box”, the task is to find the best match in a set of images containing candidate objects. When naming objects, humans tend to use natural language with rich semantics, including basic-level categories, fine-grained categories, and instance-level concepts such as brand names. Existing approaches to large-scale object recognition fail in this scenario, as they expect queries that map directly to a fixed set of pre-trained visual categories, e.g. ImageNet synset tags. We address this limitation by introducing a novel object retrieval method. Given a candidate object image, we first map it to a set of words that are likely to describe it, using several learned image-to-text projections. We also propose a method for handling open vocabularies, i.e., words not contained in the training data. We then compare the natural language query to the sets of words predicted for each candidate and select the best match. Our method can combine category- and instance-level semantics in a common representation. We present extensive experimental results on several datasets using both instance-level and category-level matching and show that our approach can accurately retrieve objects based on extremely varied open-vocabulary queries. The source code of our approach will be publicly available together with pre-trained models at
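The matching step described above can be sketched as a toy: each candidate object comes with a set of predicted descriptive words (produced by learned image-to-text projections in the paper; hard-coded here), and the query is matched to the candidate whose word set covers it best. A real system would use embedding similarity to handle out-of-vocabulary words; the candidates and scorer below are invented for illustration:

```python
# hypothetical word sets predicted for three candidate object images
candidates = {
    "obj1": {"box", "cereal", "corn", "flakes", "breakfast"},
    "obj2": {"mug", "coffee", "cup", "ceramic"},
    "obj3": {"bottle", "water", "plastic"},
}

def retrieve(query, candidates):
    """Return the candidate whose predicted words cover most query terms."""
    terms = set(query.lower().split())
    score = lambda words: len(terms & words) / max(len(terms), 1)
    return max(candidates, key=lambda name: score(candidates[name]))

best = retrieve("the corn flakes box", candidates)
```

Exact set overlap is the simplest possible scorer; it already combines category-level words ("box", "cereal") and instance-level ones ("corn", "flakes") in one representation, which is the property the abstract emphasizes.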