Results 1  10
of
112
Distance metric learning for large margin nearest neighbor classification
 In NIPS
, 2006
"... We show how to learn a Mahanalobis distance metric for knearest neighbor (kNN) classification by semidefinite programming. The metric is trained with the goal that the knearest neighbors always belong to the same class while examples from different classes are separated by a large margin. On seven ..."
Abstract

Cited by 685 (15 self)
 Add to MetaCart
(Show Context)
We show how to learn a Mahanalobis distance metric for knearest neighbor (kNN) classification by semidefinite programming. The metric is trained with the goal that the knearest neighbors always belong to the same class while examples from different classes are separated by a large margin. On seven data sets of varying size and difficulty, we find that metrics trained in this way lead to significant improvements in kNN classification—for example, achieving a test error rate of 1.3 % on the MNIST handwritten digits. As in support vector machines (SVMs), the learning problem reduces to a convex optimization based on the hinge loss. Unlike learning in SVMs, however, our framework requires no modification or extension for problems in multiway (as opposed to binary) classification. 1
Fast approximate nearest neighbors with automatic algorithm configuration
 In VISAPP International Conference on Computer Vision Theory and Applications
, 2009
"... nearestneighbors search, randomized kdtrees, hierarchical kmeans tree, clustering. For many computer vision problems, the most time consuming component consists of nearest neighbor matching in highdimensional spaces. There are no known exact algorithms for solving these highdimensional problems ..."
Abstract

Cited by 448 (2 self)
 Add to MetaCart
(Show Context)
nearestneighbors search, randomized kdtrees, hierarchical kmeans tree, clustering. For many computer vision problems, the most time consuming component consists of nearest neighbor matching in highdimensional spaces. There are no known exact algorithms for solving these highdimensional problems that are faster than linear search. Approximate algorithms are known to provide large speedups with only minor loss in accuracy, but many such algorithms have been published with only minimal guidance on selecting an algorithm and its parameters for any given problem. In this paper, we describe a system that answers the question, “What is the fastest approximate nearestneighbor algorithm for my data? ” Our system will take any given dataset and desired degree of precision and use these to automatically determine the best algorithm and parameter values. We also describe a new algorithm that applies priority search on hierarchical kmeans trees, which we have found to provide the best known performance on many datasets. After testing a range of alternatives, we have found that multiple randomized kd trees provide the best performance for other datasets. We are releasing public domain code that implements these approaches. This library provides about one order of magnitude improvement in query time over the best previously available software and provides fully automated parameter selection. 1
Pose Tracking from Natural Features on Mobile Phones
"... In this paper we present two techniques for natural feature tracking in realtime on mobile phones. We achieve interactive frame rates of up to 20Hz for natural feature tracking from textured planar targets on currentgeneration phones. We use an approach based on heavily modified stateoftheart f ..."
Abstract

Cited by 116 (17 self)
 Add to MetaCart
(Show Context)
In this paper we present two techniques for natural feature tracking in realtime on mobile phones. We achieve interactive frame rates of up to 20Hz for natural feature tracking from textured planar targets on currentgeneration phones. We use an approach based on heavily modified stateoftheart feature descriptors, namely SIFT and Ferns. While SIFT is known to be a strong, but computationally expensive feature descriptor, Ferns classification is fast, but requires large amounts of memory. This renders both original designs unsuitable for mobile phones. We give detailed descriptions on how we modified both approaches to make them suitable for mobile phones. We present evaluations on robustness and performance on various devices and finally discuss their appropriateness for Augmented Reality applications.
Fast HighDimensional Approximation with Sparse Occupancy Trees
, 2010
"... The consecutive numbering of the publications is determined by their chronological order. The aim of this preprint series is to make new research rapidly available for scientific discussion. Therefore, the responsibility for the contents is solely due to the authors. The publications will be distrib ..."
Abstract

Cited by 94 (9 self)
 Add to MetaCart
The consecutive numbering of the publications is determined by their chronological order. The aim of this preprint series is to make new research rapidly available for scientific discussion. Therefore, the responsibility for the contents is solely due to the authors. The publications will be distributed by the authors.
Fast solvers and efficient implementations for distance metric learning
 In ICML
, 2008
"... In this paper we study how to improve nearest neighbor classification by learning a Mahalanobis distance metric. We build on a recently proposed framework for distance metric learning known as large margin nearest neighbor (LMNN) classification. Our paper makes three contributions. First, we describ ..."
Abstract

Cited by 85 (7 self)
 Add to MetaCart
(Show Context)
In this paper we study how to improve nearest neighbor classification by learning a Mahalanobis distance metric. We build on a recently proposed framework for distance metric learning known as large margin nearest neighbor (LMNN) classification. Our paper makes three contributions. First, we describe a highly efficient solver for the particular instance of semidefinite programming that arises in LMNN classification; our solver can handle problems with billions of large margin constraints in a few hours. Second, we show how to reduce both training and testing times using metric ball trees; the speedups from ball trees are further magnified by learning low dimensional representations of the input space. Third, we show how to learn different Mahalanobis distance metrics in different parts of the input space. For large data sets, the use of locally adaptive distance metrics leads to even lower error rates. 1.
1 Parallel Spectral Clustering in Distributed Systems
"... Spectral clustering algorithms have been shown to be more effective in finding clusters than some traditional algorithms such as kmeans. However, spectral clustering suffers from a scalability problem in both memory use and computational time when the size of a data set is large. To perform cluster ..."
Abstract

Cited by 60 (1 self)
 Add to MetaCart
(Show Context)
Spectral clustering algorithms have been shown to be more effective in finding clusters than some traditional algorithms such as kmeans. However, spectral clustering suffers from a scalability problem in both memory use and computational time when the size of a data set is large. To perform clustering on large data sets, we investigate two representative ways of approximating the dense similarity matrix. We compare one approach by sparsifying the matrix with another by the Nyström method. We then pick the strategy of sparsifying the matrix via retaining nearest neighbors and investigate its parallelization. We parallelize both memory use and computation on distributed computers. Through
The concentration of fractional distances
 IEEE Trans. on Knowledge and Data Engineering
, 2007
"... Abstract—Nearest neighbor search and many other numerical data analysis tools most often rely on the use of the euclidean distance. When data are high dimensional, however, the euclidean distances seem to concentrate; all distances between pairs of data elements seem to be very similar. Therefore, t ..."
Abstract

Cited by 51 (1 self)
 Add to MetaCart
(Show Context)
Abstract—Nearest neighbor search and many other numerical data analysis tools most often rely on the use of the euclidean distance. When data are high dimensional, however, the euclidean distances seem to concentrate; all distances between pairs of data elements seem to be very similar. Therefore, the relevance of the euclidean distance has been questioned in the past, and fractional norms (Minkowskilike norms with an exponent less than one) were introduced to fight the concentration phenomenon. This paper justifies the use of alternative distances to fight concentration by showing that the concentration is indeed an intrinsic property of the distances and not an artifact from a finite sample. Furthermore, an estimation of the concentration as a function of the exponent of the distance and of the distribution of the data is given. It leads to the conclusion that, contrary to what is generally admitted, fractional norms are not always less concentrated than the euclidean norm; a counterexample is given to prove this claim. Theoretical arguments are presented, which show that the concentration phenomenon can appear for real data that do not match the hypotheses of the theorems, in particular, the assumption of independent and identically distributed variables. Finally, some insights about how to choose an optimal metric are given. Index Terms—Nearest neighbor search, highdimensional data, distance concentration, fractional distances. 1
LargeScale Manifold Learning
"... This paper examines the problem of extracting lowdimensional manifold structure given millions of highdimensional face images. Specifically, we address the computational challenges of nonlinear dimensionality reduction via Isomap and Laplacian Eigenmaps, using a graph containing about 18 million nod ..."
Abstract

Cited by 50 (7 self)
 Add to MetaCart
(Show Context)
This paper examines the problem of extracting lowdimensional manifold structure given millions of highdimensional face images. Specifically, we address the computational challenges of nonlinear dimensionality reduction via Isomap and Laplacian Eigenmaps, using a graph containing about 18 million nodes and 65 million edges. Since most manifold learning techniques rely on spectral decomposition, we first analyze two approximate spectral decomposition techniques for large dense matrices (Nyström and Columnsampling), providing the first direct theoretical and empirical comparison between these techniques. We next show extensive experiments on learning lowdimensional embeddings for two large face datasets: CMUPIE (35 thousand faces) and a web dataset (18 million faces). Our comparisons show that the Nyström approximation is superior to the Columnsampling method. Furthermore, approximate Isomap tends to perform better than Laplacian Eigenmaps on both clustering and classification with the labeled CMUPIE dataset. 1.
Unsupervised methods for determining object and relation synonyms on the web
 Journal of Artificial Intelligence Research
, 2009
"... The task of identifying synonymous relations and objects, or synonym resolution, is critical for highquality information extraction. This paper investigates synonym resolution in the context of unsupervised information extraction, where neither handtagged training examples nor domain knowledge is ..."
Abstract

Cited by 41 (3 self)
 Add to MetaCart
(Show Context)
The task of identifying synonymous relations and objects, or synonym resolution, is critical for highquality information extraction. This paper investigates synonym resolution in the context of unsupervised information extraction, where neither handtagged training examples nor domain knowledge is available. The paper presents a scalable, fullyimplemented system that runs in O(KN log N) time in the number of extractions, N, and the maximum number of synonyms per word, K. The system, called Resolver, introduces a probabilistic relational model for predicting whether two strings are coreferential based on the similarity of the assertions containing them. On a set of two million assertions extracted from the Web, Resolver resolves objects with 78 % precision and 68 % recall, and resolves relations with 90 % precision and 35 % recall. Several variations of Resolver’s probabilistic model are explored, and experiments demonstrate that under appropriate conditions these variations can improve F1 by 5%. An extension to the basic Resolver system allows it to handle polysemous names with 97 % precision and 95 % recall on a data set from the TREC corpus.