Results 1–10 of 31
Good Practice in Large-Scale Learning for Image Classification
 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (TPAMI)
, 2013
Abstract

Cited by 51 (7 self)
We benchmark several SVM objective functions for large-scale image classification. We consider one-vs-rest, multiclass, ranking, and weighted approximate ranking SVMs. A comparison of online and batch methods for optimizing the objectives shows that online methods perform as well as batch methods in terms of classification accuracy, but with a significant gain in training speed. Using stochastic gradient descent, we can scale the training to millions of images and thousands of classes. Our experimental evaluation shows that ranking-based algorithms do not outperform the one-vs-rest strategy when a large number of training examples are used. Furthermore, the gap in accuracy between the different algorithms shrinks as the dimension of the features increases. We also show that learning through cross-validation the optimal rebalancing of positive and negative examples can result in a significant improvement for the one-vs-rest strategy. Finally, early stopping can be used as an effective regularization strategy when training with online algorithms. Following these “good practices”, we were able to improve the state of the art on a large subset of 10K classes and 9M images of ImageNet from 16.7% top-1 accuracy to 19.1%.
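The one-vs-rest-with-SGD recipe the abstract describes can be sketched as follows: one binary hinge-loss SVM per class, trained with a Pegasos-style stochastic gradient step. This is a minimal illustration, not the paper's implementation; the step-size schedule and the absence of a bias term are assumptions.

```python
import numpy as np

def train_ovr_sgd(X, y, n_classes, lam=1e-4, epochs=10, seed=0):
    """One-vs-rest linear SVMs trained with SGD on the hinge loss.

    For each class c, a binary SVM separates class c (+1) from the
    rest (-1). No bias term: the data is assumed separable through
    the origin for this sketch."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = np.zeros((n_classes, d))
    for c in range(n_classes):
        w = np.zeros(d)
        t = 0
        for _ in range(epochs):
            for i in rng.permutation(n):
                t += 1
                eta = 1.0 / (lam * t)          # Pegasos-style step size
                yi = 1.0 if y[i] == c else -1.0
                w *= (1.0 - eta * lam)         # gradient of L2 regularizer
                if yi * (w @ X[i]) < 1:        # hinge loss is active
                    w += eta * yi * X[i]
        W[c] = w
    return W

def predict(W, X):
    # Highest one-vs-rest score wins.
    return np.argmax(X @ W.T, axis=1)
```

Early stopping, as the abstract notes, amounts to simply capping `epochs`; class rebalancing would reweight the positive-example updates.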
Learning to Rank with (a Lot of) Word Features
, 2009
Abstract

Cited by 25 (2 self)
In this article we present Supervised Semantic Indexing (SSI), which defines a class of nonlinear (quadratic) models that are discriminatively trained to directly map from the word content in a query-document or document-document pair to a ranking score. Like Latent Semantic Indexing (LSI), our models take account of correlations between words (synonymy, polysemy). However, unlike LSI, our models are trained from a supervised signal directly on the ranking task of interest, which we argue is the reason for our superior results. As the query and target texts are modeled separately, our approach is easily generalized to different retrieval tasks, such as cross-language retrieval or online advertisement placement. Dealing with models on all pairs of word features is computationally challenging. We propose several improvements to our basic model for addressing this issue, including low-rank (but diagonal-preserving) representations, correlated feature hashing (CFH), and sparsification. We provide an empirical study of all these methods on retrieval tasks based on Wikipedia documents as well as an Internet advertisement task. We obtain state-of-the-art performance while providing realistically scalable methods.
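The low-rank (but diagonal-preserving) representation mentioned above scores a pair as q^T W d with W = U^T V + I: the identity keeps the exact-word-match signal, while the k x d factors U and V capture cross-word correlations without ever materializing the d x d matrix W. A small sketch with hypothetical numbers:

```python
import numpy as np

def ssi_score(q, d, U, V):
    """SSI-style ranking score f(q, d) = q^T (U^T V + I) d, computed
    without forming the d x d matrix: project both texts through the
    low-rank factors, then add the plain dot product for the identity."""
    return (U @ q) @ (V @ d) + q @ d

# Toy vocabulary of 5 words, rank-2 factors (illustrative values only).
rng = np.random.default_rng(0)
d_vocab, k = 5, 2
U = rng.normal(size=(k, d_vocab))
V = rng.normal(size=(k, d_vocab))
q = np.array([1.0, 0.0, 0.0, 1.0, 0.0])    # sparse query term vector
doc = np.array([0.0, 1.0, 0.0, 1.0, 1.0])  # sparse document term vector
score = ssi_score(q, doc, U, V)
```

Storage and scoring cost are O(kd) rather than O(d^2), which is what makes word-pair models feasible at vocabulary scale.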
Decomposing background topic from keywords by principal component pursuit
 Proceedings of the 19th ACM International Conference on Information and Knowledge Management
, 2010
Abstract

Cited by 23 (3 self)
Low-dimensional topic models have proven very useful for modeling a large corpus of documents that share a relatively small number of topics. Dimensionality-reduction tools such as Principal Component Analysis (PCA) or Latent Semantic Indexing (LSI) have been widely adopted for document modeling, analysis, and retrieval. In this paper, we contend that a more pertinent model for a document corpus is the combination of an (approximately) low-dimensional topic model for the corpus and a sparse model for the keywords of individual documents. For such a joint topic-document model, LSI or PCA is no longer appropriate for analyzing the corpus data. We hence introduce a powerful new tool called Principal Component Pursuit that can effectively decompose the low-dimensional and the sparse components of such corpus data. We give empirical results on data synthesized with a Latent Dirichlet Allocation (LDA) model to validate the new model. We then show that for real document data analysis, the new tool significantly reduces the perplexity and improves retrieval performance compared to classical baselines.
Polynomial Semantic Indexing
Abstract

Cited by 16 (8 self)
We present a class of nonlinear (polynomial) models that are discriminatively trained to directly map from the word content in a query-document or document-document pair to a ranking score. Dealing with polynomial models on word features is computationally challenging. We propose a low-rank (but diagonal-preserving) representation of our polynomial models to induce feasible memory and computation requirements. We provide an empirical study on retrieval tasks based on Wikipedia documents, where we obtain state-of-the-art performance while providing realistically scalable methods.
Online Learning in the Embedded Manifold of Low-Rank Matrices
Abstract

Cited by 14 (0 self)
When learning models that are represented in matrix form, enforcing a low-rank constraint can dramatically improve the memory and run-time complexity while providing a natural regularization of the model. However, naive approaches to minimizing functions over the set of low-rank matrices are either prohibitively time consuming (repeated singular value decomposition of the matrix) or numerically unstable (optimizing a factored representation of the low-rank matrix). We build on recent advances in optimization over manifolds and describe an iterative online learning procedure consisting of a gradient step followed by a second-order retraction back to the manifold. While the ideal retraction is costly to compute, and so is the projection operator that approximates it, we describe another retraction that can be computed efficiently. It has run-time and memory complexity of O((n+m)k) for a rank-k matrix of dimension m×n when using an online procedure with rank-one gradients. We use this algorithm, LORETA, to learn a matrix-form similarity measure over pairs of documents represented as high-dimensional vectors. LORETA improves the mean average precision over a passive-aggressive approach in a factorized model, and also improves over a full model trained on preselected features using the same memory requirements. We further adapt LORETA to learn positive semidefinite low-rank matrices, providing an online algorithm for low-rank metric learning. LORETA also shows consistent improvement over standard weakly supervised methods in a large multi-label image classification task (1,600 classes and 1 million images, using ImageNet).
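The "gradient step, then retract to the rank-k manifold" loop can be sketched with the conceptually simplest retraction — truncated SVD. Note this is exactly the expensive baseline the abstract contrasts against: LORETA's second-order retraction achieves O((n+m)k) per step and avoids the full SVD; the bilinear similarity and squared loss below are illustrative assumptions.

```python
import numpy as np

def svd_retract(W, k):
    """Project W back onto the rank-k manifold by truncated SVD.
    This is the costly baseline retraction; LORETA replaces it with
    an efficient second-order retraction."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]

def online_lowrank_step(W, x, z, target, k, eta=0.1):
    """One online step for a bilinear similarity s(x, z) = x^T W z:
    a gradient step on the squared loss, then a retraction to rank k.
    The gradient of 0.5 * err^2 is the rank-one outer product err * x z^T,
    matching the rank-one-gradient setting in the abstract."""
    err = x @ W @ z - target
    G = err * np.outer(x, z)
    return svd_retract(W - eta * G, k)
```

Keeping W on the manifold (rather than letting it grow full-rank) is what bounds the memory footprint to the factored representation.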
Online Learning in the Manifold of Low-Rank Matrices
Abstract

Cited by 9 (0 self)
When learning models that are represented in matrix form, enforcing a low-rank constraint can dramatically improve the memory and run-time complexity while providing a natural regularization of the model. However, naive approaches for minimizing functions over the set of low-rank matrices are either prohibitively time consuming (repeated singular value decomposition of the matrix) or numerically unstable (optimizing a factored representation of the low-rank matrix). We build on recent advances in optimization over manifolds and describe an iterative online learning procedure consisting of a gradient step followed by a second-order retraction back to the manifold. While the ideal retraction is hard to compute, and so is the projection operator that approximates it, we describe another second-order retraction that can be computed efficiently, with run-time and memory complexity of O((n+m)k) for a rank-k matrix of dimension m×n, given rank-one gradients. We use this algorithm, LORETA, to learn a matrix-form similarity measure over pairs of documents represented as high-dimensional vectors. LORETA improves the mean average precision over a passive-aggressive approach in a factorized model, and also improves over a full model trained over preselected features using the same memory requirements. LORETA also showed consistent improvement over standard methods in a large (1,600-class) multi-label image classification task.
Sentiment Classification Based on Supervised Latent n-gram Analysis
Abstract

Cited by 7 (1 self)
In this paper, we propose an efficient embedding for modeling higher-order (n-gram) phrases that projects the n-grams to a low-dimensional latent semantic space, where a classification function can be defined. We utilize a deep neural network to build a unified discriminative framework that allows for estimating the parameters of the latent space as well as the classification function, with a bias toward the target classification task at hand. We apply the framework to a large-scale sentiment classification task. We present a comparative evaluation of the proposed method on two (large) benchmark data sets for online product reviews. The proposed method achieves superior performance in comparison to the state of the art.
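The "project n-grams into a low-dimensional space, then average" representation can be illustrated with a hash-based embedding table. Hashing the n-gram vocabulary into fixed buckets is an illustrative shortcut: the paper learns the latent space with a deep network, jointly with the classifier, and all names and sizes here are assumptions.

```python
import numpy as np
import zlib

def ngram_embed(tokens, E, n=2):
    """Average the embeddings of a text's n-grams. Each n-gram is
    hashed (deterministically, via crc32) into one of E.shape[0]
    rows of the embedding table E."""
    buckets, k = E.shape
    ids = [zlib.crc32(" ".join(tokens[i:i + n]).encode()) % buckets
           for i in range(len(tokens) - n + 1)]
    if not ids:
        return np.zeros(k)
    return E[ids].mean(axis=0)

# Hypothetical usage: random table, two short product reviews.
rng = np.random.default_rng(0)
E = rng.normal(size=(1000, 8))   # 1000 hash buckets, 8-dim latent space
v1 = ngram_embed("this product is great".split(), E)
v2 = ngram_embed("this product is awful".split(), E)
```

A linear classifier on such vectors is then the simplest stand-in for the classification function defined over the latent space.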
Exploiting the Forgiving Nature of Applications for Scalable Parallel Execution
 IPDPS
, 2010
Abstract

Cited by 5 (0 self)
It is widely believed that most Recognition and Mining (RM) workloads can easily take advantage of parallel computing platforms because these workloads are data-parallel. Contrary to this popular belief, we present RM workloads for which conventional parallel implementations scale poorly on multi-core platforms. We identify off-chip memory transfers and overheads in the parallel runtime library as the primary bottlenecks that limit speedups to well below the ideal linear speedup expected for data-parallel workloads. To achieve improved parallel scalability, we identify and exploit several interesting properties of RM workloads: sparsity of model updates, low spatial locality among model updates, the presence of insignificant computations, and the inherently self-healing nature of these algorithms in the presence of errors. We leverage these domain-specific characteristics to
A deep architecture for matching short texts
 In: Advances in Neural Information Processing Systems
, 2013
Abstract

Cited by 4 (0 self)
Many machine learning problems can be interpreted as learning to match two types of objects (e.g., images and captions, users and products, queries and documents). The matching level of two objects is usually measured as the inner product in a certain feature space, while the modeling effort focuses on the mapping of objects from the original space to the feature space. This schema, although proven successful on a range of matching tasks, is insufficient for capturing the rich structure in the matching process of more complicated objects. In this paper, we propose a new deep architecture to more effectively model the complicated matching relations between two objects from heterogeneous domains. More specifically, we apply this model to matching tasks in natural language, e.g., finding sensible responses for a tweet, or relevant answers to a given question. This new architecture naturally combines the localness and hierarchy intrinsic to natural language problems, and therefore greatly improves upon state-of-the-art models.
From sBoW to dCoT: Marginalized Encoders for Text Representation
 21st ACM Conference on Information and Knowledge Management (CIKM)
, 2012
Abstract

Cited by 3 (1 self)
In text mining, information retrieval, and machine learning, text documents are commonly represented through variants of sparse Bag of Words (sBoW) vectors (e.g., TF-IDF [1]). Although simple and intuitive, sBoW-style representations suffer from their inherent over-sparsity and fail to capture word-level synonymy and polysemy. Especially when labeled data is limited (e.g., in document classification) or the text documents are short (e.g., emails or abstracts), many features are rarely observed within the training corpus. This leads to overfitting and reduced generalization accuracy. In this paper we propose Dense Cohort of Terms (dCoT), an unsupervised algorithm to learn improved sBoW document features. dCoT explicitly models absent words by removing and reconstructing random subsets of words in the unlabeled corpus. With this approach, dCoT learns to reconstruct frequent words from co-occurring infrequent words and maps the high-dimensional sparse sBoW vectors into a low-dimensional dense representation. We show that the feature removal can be marginalized out and that the reconstruction can be solved for in closed form. We demonstrate empirically, on several benchmark datasets, that dCoT features significantly improve the classification accuracy across several document classification tasks. A full version of this paper is available at
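The core reconstruction idea — recover a document's full word vector from a randomly masked copy — can be illustrated with explicit corruptions and a closed-form ridge regression. This is a simplification in two ways: dCoT marginalizes the corruption out analytically instead of sampling copies, and it maps into a low-dimensional dense space, whereas this sketch keeps the full dimension.

```python
import numpy as np

def denoising_map(X, p=0.3, lam=1e-2, n_copies=10, seed=0):
    """Learn a linear map W that reconstructs sBoW vectors from masked
    copies (each word kept with probability 1 - p), as the closed-form
    ridge solution of  min_W ||Xrep - W Xcor||_F^2 + lam ||W||_F^2.
    X is features x documents."""
    rng = np.random.default_rng(seed)
    Xrep = np.tile(X, n_copies)                  # repeated clean targets
    Xcor = Xrep * (rng.random(Xrep.shape) > p)   # randomly masked inputs
    d = X.shape[0]
    return Xrep @ Xcor.T @ np.linalg.inv(Xcor @ Xcor.T + lam * np.eye(d))
```

Because co-occurring words are correlated, W learns to fill in dropped words from the words that survive the masking, which is exactly the densification effect the abstract describes.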