Results 11 - 20
of
53
Learning Query-dependent Prefilters for Scalable Image Retrieval
"... We describe an algorithm for similar-image search which is designed to be efficient for extremely large collections of images. For each query, a small response set is selected by a fast prefilter, after which a more accurate ranker may be applied to each image in the response set. We consider a clas ..."
Abstract
-
Cited by 5 (1 self)
- Add to MetaCart
We describe an algorithm for similar-image search which is designed to be efficient for extremely large collections of images. For each query, a small response set is selected by a fast prefilter, after which a more accurate ranker may be applied to each image in the response set. We consider a class of prefilters comprising disjunctions of conjunctions (“ORs of ANDs”) of Boolean features. AND filters can be implemented efficiently using skipped inverted files, a key component of web-scale text search engines. These structures permit search in time proportional to the response set size. The prefilters are learned from training examples, and refined at query time to produce an approximately bounded response set. We cast prefiltering as an optimization problem: for each test query, select the OR-of-AND filter which maximizes training-set recall for an adjustable bound on response set size. This may be efficiently implemented by selecting from a large pool of candidate conjunctions of Boolean features using a linear program relaxation. Tests on object class recognition show that this relatively simple filter is nevertheless powerful enough to capture some semantic information. 1.
Empowering Visual Categorization With the GPU
, 2011
"... Visual categorization is important to manage large collections of digital images and video, where textual metadata is often incomplete or simply unavailable. The bag-of-words model has become the most powerful method for visual categorization of images and video. Despite its high accuracy, a severe ..."
Abstract
-
Cited by 4 (3 self)
- Add to MetaCart
Visual categorization is important to manage large collections of digital images and video, where textual metadata is often incomplete or simply unavailable. The bag-of-words model has become the most powerful method for visual categorization of images and video. Despite its high accuracy, a severe drawback of this model is its high computational cost. As the trend to increase computational power in newer CPU and GPU architectures is to increase their level of parallelism, exploiting this parallelism becomes an important direction to handle the computational cost of the bag-of-words approach. When optimizing a system based on the bag-of-words approach, the goal is to minimize the time it takes to process batches of images. In this paper, we analyze the bag-of-words model for visual categorization in terms of computational cost and identify two major bottlenecks: the quantization step and the classification step. We address these two bottlenecks by proposing two efficient algorithms for quantization and classification by exploiting the GPU hardware and the CUDA parallel programming model. The algorithms are designed to 1) keep categorization accuracy intact, 2) decompose the problem, and 3) give the same numerical results. In the experiments on large scale datasets, it is shown that, by using a parallel implementation on the Geforce GTX260 GPU, classifying unseen images is 4.8 times faster than a quad-core CPU version on the Core i7 920, while giving the exact same numerical results. In addition, we show how the algorithms can be generalized to other applications, such as text retrieval and video retrieval. Moreover, when the obtained speedup is used to process extra video frames in a video retrieval benchmark, the accuracy of visual categorization is improved by 29%.
Co-training with Noisy Perceptual Observations
"... Many perception problems involve datasets that are naturally comprised of multiple streams or modalities for which supervised training data is only sparsely available. In cases where there is a degree of conditional independence between such views, a class of semisupervised learning techniques that ..."
Abstract
-
Cited by 3 (1 self)
- Add to MetaCart
Many perception problems involve datasets that are naturally comprised of multiple streams or modalities for which supervised training data is only sparsely available. In cases where there is a degree of conditional independence between such views, a class of semisupervised learning techniques that are based on maximizing view agreement over unlabeled data has been proven successful in a wide range of machine learning domains. However, these ‘co-training ’ or ‘multi-view’ learning methods have had relatively limited application in vision, due in part to the assumption of constant perchannel noise models. In this paper we propose a probabilistic heteroscedastic approach to co-training that simultaneously discovers the amount of noise on a persample basis, while solving the classification task. This results in high performance in the presence of occlusion or other complex observation noise processes. We demonstrate our approach in two domains, multi-view object recognition from low-fidelity sensor networks and audio-visual classification. 1.
Critical Nets and Beta-Stable Features for Image Matching
"... Abstract. We propose new ideas and efficient algorithms towards bridging the gap between bag-of-features and constellation descriptors for image matching. Specifically, we show how to compute connections between local image features in the form of a critical net whose construction is repeatable acro ..."
Abstract
-
Cited by 3 (3 self)
- Add to MetaCart
Abstract. We propose new ideas and efficient algorithms towards bridging the gap between bag-of-features and constellation descriptors for image matching. Specifically, we show how to compute connections between local image features in the form of a critical net whose construction is repeatable across changes of viewing conditions or scene configuration. Arcs of the net provide a more reliable frame of reference than individual features do for the purpose of invariance. In addition, regions associated with either small stars or loops in the critical net can be used as parts for recognition or retrieval, and subgraphs of the critical net that are matched across images exhibit common structures shared by different images. We also introduce the notion of beta-stable features, a variation on the notion of feature lifetime from the literature of scale space. Our experiments show that arc-based SIFT-like descriptors of beta-stable features are more repeatable and more accurate than competing descriptors. We also provide anecdotal evidence of the usefulness of image parts and of the structures that are found to be common across images. Key words: bag-of-features, constellation, image matching 1
Spatial Pyramid Matching
"... This chapter deals with the problem of whole-image categorization. We may want to classify a photograph based on a high-level semantic attribute (e.g., indoor or outdoor), scene type (forest, street, office, etc.), or object category (car, face, etc.). Our philosophy is that such global image tasks ..."
Abstract
-
Cited by 3 (0 self)
- Add to MetaCart
This chapter deals with the problem of whole-image categorization. We may want to classify a photograph based on a high-level semantic attribute (e.g., indoor or outdoor), scene type (forest, street, office, etc.), or object category (car, face, etc.). Our philosophy is that such global image tasks can be approached in a holistic fashion: It should be possible to develop image representations that use low-level features to directly infer high-level semantic information about the scene without going through the intermediate step of segmenting the image into more “basic” semantic entities. For example, we should be able to recognize that an image contains a beach scene without first segmenting and identifying its separate components, such as sand, water, sky, or bathers. This philosophy is inspired by psychophysical and psychological evidence that people can recognize scenes by considering them in a “holistic ” manner, while overlooking most of the details of the constituent objects (Oliva and Torralba, 2001). It has been shown that human subjects can perform high-level categorization tasks extremely rapidly
Metric and Kernel Learning Using a Linear Transformation
"... Metric and kernel learning arise in several machine learning applications. However, most existing metric learning algorithms are limited to learning metrics over low-dimensional data, while existing kernel learning algorithms are often limited to the transductive setting and do not generalize to new ..."
Abstract
-
Cited by 3 (2 self)
- Add to MetaCart
Metric and kernel learning arise in several machine learning applications. However, most existing metric learning algorithms are limited to learning metrics over low-dimensional data, while existing kernel learning algorithms are often limited to the transductive setting and do not generalize to new data points. In this paper, we study the connections between metric learning and kernel learning that arise when studying metric learning as a linear transformation learning problem. In particular, we propose a general optimization framework for learning metrics via linear transformations, and analyze in detail a special case of our framework—that of minimizing the LogDet divergence subject to linear constraints. We then propose a general regularized framework for learning a kernel matrix, and show it to be equivalent to our metric learning framework. Our theoretical connections between metric and kernel learning have two main consequences: 1) the learned kernel matrix parameterizes a linear transformation kernel function and can be applied inductively to new data points, 2) our result yields a constructive method for kernelizing most existing Mahalanobis metric learning formulations. We demonstrate our learning approach by applying it to large-scale real world problems in computer vision, text mining and semi-supervised kernel dimensionality reduction. Keywords: divergence metric learning, kernel learning, linear transformation, matrix divergences, logdet 1.
Context-dependent kernel design for object matching and recognition
- Research Report N 2007D018, ENST Paris, ParisTech
, 2007
"... The success of kernel methods including support vector networks (SVMs) strongly depends on the design of appropriate kernels. While initially kernels were designed in order to handle fixed-length data, their extension to unordered, variable-length data became more than necessary for real pattern rec ..."
Abstract
-
Cited by 2 (2 self)
- Add to MetaCart
The success of kernel methods including support vector networks (SVMs) strongly depends on the design of appropriate kernels. While initially kernels were designed in order to handle fixed-length data, their extension to unordered, variable-length data became more than necessary for real pattern recognition problems such as object recognition and bioinformatics. We focus in this paper on object recognition using a new type of kernel referred to as “context-dependent”. Objects, seen as constellations of local features (interest points, regions, etc.), are matched by minimizing an energy function mixing (1) a fidelity term which measures the quality of feature matching, (2) a neighborhood criteria which captures the object geometry and (3) a regularization term. We will show that the fixed-point of this energy is a “contextdependent” kernel (“CDK”) which also satisfies the Mercer condition. Experiments conducted on object recognition show that when plugging our kernel in SVMs, we clearly outperform SVMs with “context-free ” kernels. 1.
Acquiring vocabulary through human robot interaction: a learning architecture for grounding words with multiple meanings. AAAI-FSS-10 on Dialogue with Robots
, 2010
"... This paper presents a robust methodology for grounding vocabulary in robots. A social language grounding experiment is designed, where, a human instructor teaches a robotic agent the names of the objects present in a visually shared environment. Any system for grounding vocabulary has to incorporate ..."
Abstract
-
Cited by 2 (2 self)
- Add to MetaCart
This paper presents a robust methodology for grounding vocabulary in robots. A social language grounding experiment is designed, where, a human instructor teaches a robotic agent the names of the objects present in a visually shared environment. Any system for grounding vocabulary has to incorporate the properties of gradual evolution and lifelong learning. The learning model of the robot is adopted from an ongoing work on developing systems that conform to these properties. Significant modifications have been introduced to the adopted model, especially to handle words with multiple meanings. A novel classification strategy has been developed for improving the performance of each classifier for each learned category. A set of six new nearest-neighbor based classifiers have also been integrated into the agent architecture. A series of experiments were conducted to test the performance of the new model on vocabulary acquisition. The robot was shown to be robust at acquiring vocabulary and has the potential to learn a far greater number of words (with either single or multiple meanings).
Inductive regularized learning of kernel functions
- In Proccedings of Advances in Neural Information Processing Systems (NIPS
, 2010
"... In this paper we consider the problem of semi-supervised kernel function learning. We first propose a general regularized framework for learning a kernel matrix, and then demonstrate an equivalence between our proposed kernel matrix learning framework and a general linear transformation learning pro ..."
Abstract
-
Cited by 2 (1 self)
- Add to MetaCart
In this paper we consider the problem of semi-supervised kernel function learning. We first propose a general regularized framework for learning a kernel matrix, and then demonstrate an equivalence between our proposed kernel matrix learning framework and a general linear transformation learning problem. Our result shows that the learned kernel matrices parameterize a linear transformation kernel function and can be applied inductively to new data points. Furthermore, our result gives a constructive method for kernelizing most existing Mahalanobis metric learning formulations. To make our results practical for large-scale data, we modify our framework to limit the number of parameters in the optimization process. We also consider the problem of kernelized inductive dimensionality reduction in the semi-supervised setting. To this end, we introduce a novel method for this problem by considering a special case of our general kernel learning framework where we select the trace norm function as the regularizer. We empirically demonstrate that our framework learns useful kernel functions, improving the k-NN classification accuracy significantly in a variety of domains. Furthermore, our kernelized dimensionality reduction technique significantly reduces the dimensionality of the feature space while achieving competitive classification accuracies. 1
Random Fourier Approximations for Skewed Multiplicative Histogram Kernels
"... Abstract. Approximations based on random Fourier features have recently emerged as an efficient and elegant methodology for designing large-scale kernel machines [4]. By expressing the kernel as a Fourier expansion, features are generated based on a finite set of random basis projections with inner ..."
Abstract
-
Cited by 2 (2 self)
- Add to MetaCart
Abstract. Approximations based on random Fourier features have recently emerged as an efficient and elegant methodology for designing large-scale kernel machines [4]. By expressing the kernel as a Fourier expansion, features are generated based on a finite set of random basis projections with inner products that are Monte Carlo approximations to the original kernel. However, the original Fourier features are only applicable to translation-invariant kernels and are not suitable for histograms that are always non-negative. This paper extends the concept of translation-invariance and the random Fourier feature methodology to arbitrary, locally compact Abelian groups. Based on empirical observations drawn from the exponentiated χ 2 kernel, the state-of-the-art for histogram descriptors, we propose a new group called the skewedmultiplicative group and design translation-invariant kernels on it. Experiments show that the proposed kernels outperform other kernels that can be similarly approximated. In a semantic segmentation experiment on the PASCAL VOC 2009 dataset, the approximation allows us to train large-scale learning machines more than two orders of magnitude faster than previous nonlinear SVMs. 1

