Results 1 - 10
of
52
Distance metric learning for large margin nearest neighbor classification
- In NIPS
, 2006
"... We show how to learn a Mahanalobis distance metric for k-nearest neighbor (kNN) classification by semidefinite programming. The metric is trained with the goal that the k-nearest neighbors always belong to the same class while examples from different classes are separated by a large margin. On seven ..."
Abstract
-
Cited by 177 (7 self)
- Add to MetaCart
We show how to learn a Mahanalobis distance metric for k-nearest neighbor (kNN) classification by semidefinite programming. The metric is trained with the goal that the k-nearest neighbors always belong to the same class while examples from different classes are separated by a large margin. On seven data sets of varying size and difficulty, we find that metrics trained in this way lead to significant improvements in kNN classification—for example, achieving a test error rate of 1.3 % on the MNIST handwritten digits. As in support vector machines (SVMs), the learning problem reduces to a convex optimization based on the hinge loss. Unlike learning in SVMs, however, our framework requires no modification or extension for problems in multiway (as opposed to binary) classification. 1
Image retrieval: ideas, influences, and trends of the new age
- ACM COMPUTING SURVEYS
, 2008
"... We have witnessed great interest and a wealth of promise in content-based image retrieval as an emerging technology. While the last decade laid foundation to such promise, it also paved the way for a large number of new techniques and systems, got many new people involved, and triggered stronger ass ..."
Abstract
-
Cited by 157 (3 self)
- Add to MetaCart
We have witnessed great interest and a wealth of promise in content-based image retrieval as an emerging technology. While the last decade laid foundation to such promise, it also paved the way for a large number of new techniques and systems, got many new people involved, and triggered stronger association of weakly related fields. In this article, we survey almost 300 key theoretical and empirical contributions in the current decade related to image retrieval and automatic image annotation, and in the process discuss the spawning of related subfields. We also discuss significant challenges involved in the adaptation of existing image retrieval techniques to build systems that can be useful in the real world. In retrospect of what has been achieved so far, we also conjecture what the future may hold for image retrieval research.
Fast Image Search for Learned Metrics
"... We introduce a method that enables scalable image search for learned metrics. Given pairwise similarity and dissimilarity constraints between some images, we learn a Mahalanobis distance function that captures the images’ underlying relationships well. To allow sub-linear time similarity search unde ..."
Abstract
-
Cited by 39 (7 self)
- Add to MetaCart
We introduce a method that enables scalable image search for learned metrics. Given pairwise similarity and dissimilarity constraints between some images, we learn a Mahalanobis distance function that captures the images’ underlying relationships well. To allow sub-linear time similarity search under the learned metric, we show how to encode the learned metric parameterization into randomized locality-sensitive hash functions. We further formulate an indirect solution that enables metric learning and hashing for vector spaces whose high dimensionality make it infeasible to learn an explicit weighting over the feature dimensions. We demonstrate the approach applied to a variety of image datasets. Our learned metrics improve accuracy relative to commonly-used metric baselines, while our hashing construction enables efficient indexing with learned distances and very large databases.
Learning visual similarity measures for comparing never seen objects
- Proc. IEEE CVPR
, 2007
"... In this paper we propose and evaluate an algorithm that learns a similarity measure for comparing never seen objects. The measure is learned from pairs of training images labeled “same ” or “different”. This is far less informative than the commonly used individual image labels (e.g. “car model X”), ..."
Abstract
-
Cited by 32 (0 self)
- Add to MetaCart
In this paper we propose and evaluate an algorithm that learns a similarity measure for comparing never seen objects. The measure is learned from pairs of training images labeled “same ” or “different”. This is far less informative than the commonly used individual image labels (e.g. “car model X”), but it is cheaper to obtain. The proposed algorithm learns the characteristic differences between local descriptors sampled from pairs of “same ” and “different” images. These differences are vector quantized by an ensemble of extremely randomized binary trees, and the similarity measure is computed from the quantized differences. The extremely randomized trees are fast to learn, robust due to the redundant information they carry and they have been proved to be very good clusterers. Furthermore, the trees efficiently combine different feature types (SIFT and geometry). We evaluate our innovative similarity measure on four very different datasets and consistantly outperform the state-of-the-art competitive approaches. 1.
Learning distance metrics with contextual constraints for image retrieval
- Proc. Computer Vision and Pattern Recognition
, 2006
"... Relevant Component Analysis (RCA) has been proposed for learning distance metrics with contextual constraints for image retrieval. However, RCA has two important disadvantages. One is the lack of exploiting negative constraints which can also be informative, and the other is its incapability of capt ..."
Abstract
-
Cited by 14 (5 self)
- Add to MetaCart
Relevant Component Analysis (RCA) has been proposed for learning distance metrics with contextual constraints for image retrieval. However, RCA has two important disadvantages. One is the lack of exploiting negative constraints which can also be informative, and the other is its incapability of capturing complex nonlinear relationships between data instances with the contextual information. In this paper, we propose two algorithms to overcome these two disadvantages, i.e., Discriminative Component Analysis (DCA) and Kernel DCA. Compared with other complicated methods for distance metric learning, our algorithms are rather simple to understand and very easy to solve. We evaluate the performance of our algorithms on image retrieval in which experimental results show that our algorithms are effective and promising in learning good quality distance metrics for image retrieval. 1
Measuring constraint-set utility for partitional clustering algorithms
- In: Proceedings of the Tenth European Conference on Principles and Practice of Knowledge Discovery in Databases
, 2006
"... Abstract. Clustering with constraints is an active area of machine learning and data mining research. Previous empirical work has convincingly shown that adding constraints to clustering improves the performance of a variety of algorithms. However, in most of these experiments, results are averaged ..."
Abstract
-
Cited by 13 (2 self)
- Add to MetaCart
Abstract. Clustering with constraints is an active area of machine learning and data mining research. Previous empirical work has convincingly shown that adding constraints to clustering improves the performance of a variety of algorithms. However, in most of these experiments, results are averaged over different randomly chosen constraint sets from a given set of labels, thereby masking interesting properties of individual sets. We demonstrate that constraint sets vary significantly in how useful they are for constrained clustering; some constraint sets can actually decrease algorithm performance. We create two quantitative measures, informativeness and coherence, that can be used to identify useful constraint sets. We show that these measures can also help explain differences in performance for four particular constrained clustering algorithms. 1
Semi-supervised dimensionality reduction
- In: Proceedings of the 7th SIAM International Conference on Data Mining
, 2007
"... Dimensionality reduction is among the keys in mining highdimensional data. This paper studies semi-supervised dimensionality reduction. In this setting, besides abundant unlabeled examples, domain knowledge in the form of pairwise constraints are available, which specifies whether a pair of instance ..."
Abstract
-
Cited by 10 (2 self)
- Add to MetaCart
Dimensionality reduction is among the keys in mining highdimensional data. This paper studies semi-supervised dimensionality reduction. In this setting, besides abundant unlabeled examples, domain knowledge in the form of pairwise constraints are available, which specifies whether a pair of instances belong to the same class (must-link constraints) or different classes (cannot-link constraints). We propose the SSDR algorithm, which can preserve the intrinsic structure of the unlabeled data as well as both the must-link and cannot-link constraints defined on the labeled examples in the projected low-dimensional space. The SSDR algorithm is efficient and has a closed form solution. Experiments on a broad range of data sets show that SSDR is superior to many established dimensionality reduction methods. 1
Discriminant Analysis in Correlation Similarity Measure Space
"... Correlation is one of the most widely used similarity measures in machine learning like Euclidean and Mahalanobis distances. However, compared with proposed numerous discriminant learning algorithms in distance metric space, only a very little work has been conducted on this topic using correlation ..."
Abstract
-
Cited by 9 (0 self)
- Add to MetaCart
Correlation is one of the most widely used similarity measures in machine learning like Euclidean and Mahalanobis distances. However, compared with proposed numerous discriminant learning algorithms in distance metric space, only a very little work has been conducted on this topic using correlation similarity measure. In this paper, we propose a novel discriminant learning algorithm in correlation measure space, Correlation Discriminant Analysis (CDA). In this framework, based on the definitions of within-class correlation and between-class correlation, the optimum transformation can be sought for to maximize the difference between them, which is in accordance with good classification performance empirically. Under different cases of the transformation, different implementations of the algorithm are given. Extensive empirical evaluations of CDA demonstrate its advantage over alternative methods. 1.
S.: Local distance functions: A taxonomy, new algorithms, and an evaluation
- In: Proc. ICCV (2009
"... We present a taxonomy for local distance functions where most existing algorithms can be regarded as approximations of the geodesic distance defined by a metric tensor. We categorize existing algorithms by how, where and when they estimate the metric tensor. We also extend the taxonomy along each ax ..."
Abstract
-
Cited by 8 (0 self)
- Add to MetaCart
We present a taxonomy for local distance functions where most existing algorithms can be regarded as approximations of the geodesic distance defined by a metric tensor. We categorize existing algorithms by how, where and when they estimate the metric tensor. We also extend the taxonomy along each axis. How: We introduce hybrid algorithms that use a combination of dimensionality reduction and metric learning to ameliorate over-fitting. Where: We present an exact polynomial time algorithm to integrate the metric tensor along the lines between the test and training points under the assumption that the metric tensor is piecewise constant. When: We propose an interpolation algorithm where the metric tensor is sampled at a number of references points during the offline phase, which are then interpolated during online classification. We also present a comprehensive evaluation of all the algorithms on tasks in face recognition, object recognition, and digit recognition. 1.
LEARNING A METRIC FOR MUSIC SIMILARITY
"... This paper describe five different principled ways to embed songs into a Euclidean metric space. In particular, we learn embeddings so that the pairwise Euclidean distance between two songs reflects semantic dissimilarity. This allows distance-based analysis, such as for example straightforward near ..."
Abstract
-
Cited by 8 (0 self)
- Add to MetaCart
This paper describe five different principled ways to embed songs into a Euclidean metric space. In particular, we learn embeddings so that the pairwise Euclidean distance between two songs reflects semantic dissimilarity. This allows distance-based analysis, such as for example straightforward nearest-neighbor classification, to detect and potentially suggest similar songs within a collection. Each of the six approaches (baseline, whitening, LDA, NCA, LMNN and RCA) rotate and scale the raw feature space with a linear transform. We tune the parameters of these models using a song-classification task with content-based features. 1

