Results 11-20 of 757
Distance Browsing in Spatial Databases
, 1999
Cited by 390 (20 self)
Two different techniques of browsing through a collection of spatial objects stored in an R-tree spatial data structure on the basis of their distances from an arbitrary spatial query object are compared. The conventional approach is one that makes use of a k-nearest neighbor algorithm where k is known prior to the invocation of the algorithm. Thus if m > k neighbors are needed, the k-nearest neighbor algorithm needs to be reinvoked for m neighbors, thereby possibly performing some redundant computations. The second approach is incremental in the sense that having obtained the k nearest neighbors, the (k+1)st neighbor can be obtained without having to calculate the k+1 nearest neighbors from scratch. The incremental approach finds use when processing complex queries where one of the conditions involves spatial proximity (e.g., the nearest city to Chicago with population greater than a million), in which case a query engine can make use of a pipelined strategy. A general incremental nearest neighbor algorithm is presented that is applicable to a large class of hierarchical spatial data structures. This algorithm is adapted to the R-tree and its performance is compared to an existing k-nearest neighbor algorithm for R-trees [45]. Experiments show that the incremental nearest neighbor algorithm significantly outperforms the k-nearest neighbor algorithm for distance browsing queries in a spatial database that uses the R-tree as a spatial index. Moreover, the incremental nearest neighbor algorithm also usually outperforms the k-nearest neighbor algorithm when applied to the k-nearest neighbor problem for the R-tree, although the improvement is not nearly as large as for distance browsing queries. In fact, we prove informally that, at any step in its execution, the incremental...
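The incremental idea in the abstract above can be illustrated with a best-first traversal driven by a priority queue. This is only a minimal sketch, not the paper's implementation: the `build` function is a hypothetical stand-in that produces a simple binary hierarchy of bounding boxes rather than a real R-tree, and the city data is invented for illustration. Neighbors are produced lazily, so a pipelined query can stop as soon as its extra condition is met.

```python
import heapq
import itertools
import math

def rect_dist(q, lo, hi):
    """Minimum Euclidean distance from point q to the box [lo, hi]."""
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2 for x, l, h in zip(q, lo, hi)))

def build(points, leaf_size=2, depth=0):
    """Hypothetical stand-in for an R-tree: a binary hierarchy of bounding boxes."""
    dims = len(points[0])
    lo = tuple(min(p[d] for p in points) for d in range(dims))
    hi = tuple(max(p[d] for p in points) for d in range(dims))
    if len(points) <= leaf_size:
        return {"lo": lo, "hi": hi, "points": list(points)}
    pts = sorted(points, key=lambda p: p[depth % dims])
    mid = len(pts) // 2
    return {"lo": lo, "hi": hi,
            "children": [build(pts[:mid], leaf_size, depth + 1),
                         build(pts[mid:], leaf_size, depth + 1)]}

def incremental_nn(root, q):
    """Yield (distance, point) pairs in nondecreasing distance from q."""
    tie = itertools.count()  # tie-breaker so the heap never compares nodes
    heap = [(rect_dist(q, root["lo"], root["hi"]), next(tie), root)]
    while heap:
        dist, _, item = heapq.heappop(heap)
        if isinstance(item, tuple):      # a data point: it is the next neighbor
            yield dist, item
        elif "points" in item:           # leaf node: enqueue its points
            for p in item["points"]:
                heapq.heappush(heap, (math.dist(q, p), next(tie), p))
        else:                            # internal node: enqueue child boxes
            for c in item["children"]:
                heapq.heappush(heap, (rect_dist(q, c["lo"], c["hi"]), next(tie), c))

# Invented data: coordinates mapped to population.
cities = {(0.0, 0.0): 50_000, (1.0, 1.0): 200_000,
          (3.0, 4.0): 2_000_000, (6.0, 8.0): 5_000_000}
tree = build(list(cities))
# Pipelined query: consume neighbors only until the population filter passes.
nearest_big = next(p for _, p in incremental_nn(tree, (0.5, 0.5))
                   if cities[p] > 1_000_000)
```

Because a box's minimum distance never exceeds the distance to any point inside it, a point popped from the queue is guaranteed to be no farther than anything still enqueued, which is why the (k+1)st neighbor costs no restart.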
Think Globally, Fit Locally: Unsupervised Learning of Low Dimensional Manifolds
 Journal of Machine Learning Research
, 2003
Cited by 383 (11 self)
The problem of dimensionality reduction arises in many fields of information processing, including machine learning, data compression, scientific visualization, pattern recognition, and neural computation.
Data Structures and Algorithms for Nearest Neighbor Search in General Metric Spaces
, 1993
Cited by 356 (5 self)
We consider the computational problem of finding nearest neighbors in general metric spaces. Of particular interest are spaces that may not be conveniently embedded or approximated in Euclidian space, or where the dimensionality of a Euclidian representation is very high. Also relevant are high-dimensional Euclidian settings in which the distribution of data is in some sense of lower dimension and embedded in the space. The vp-tree (vantage point tree) is introduced in several forms, together with associated algorithms, as an improved method for these difficult search problems. Tree construction executes in O(n log(n)) time, and search is, under certain circumstances and in the limit, O(log(n)) expected time. The theoretical basis for this approach is developed and the results of several experiments are reported. In Euclidian cases, kd-tree performance is compared.
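A vantage point tree needs only a distance function, which is what makes it usable in general metric spaces. The following is a minimal sketch of one simple form, assuming a randomly chosen vantage point and a median split; the function names, the tuple node layout, and the Hamming-distance example are illustrative choices and are not taken from the paper.

```python
import math
import random

def build_vp(items, dist, rng=None):
    """Build a node (vantage, mu, inside, outside), or None for no items."""
    rng = rng or random.Random(0)
    items = list(items)
    if not items:
        return None
    vp = items.pop(rng.randrange(len(items)))
    if not items:
        return (vp, 0.0, None, None)
    ds = sorted(dist(vp, x) for x in items)
    mu = ds[len(ds) // 2]  # median distance to the vantage point
    inside = [x for x in items if dist(vp, x) < mu]
    outside = [x for x in items if dist(vp, x) >= mu]
    return (vp, mu, build_vp(inside, dist, rng), build_vp(outside, dist, rng))

def nearest(node, q, dist, best=(math.inf, None)):
    """Return (distance, item) for the nearest neighbor of q."""
    if node is None:
        return best
    vp, mu, inside, outside = node
    d = dist(q, vp)
    if d < best[0]:
        best = (d, vp)
    # Search the side that contains q first.
    first, second = (inside, outside) if d < mu else (outside, inside)
    best = nearest(first, q, dist, best)
    # By the triangle inequality, the other side can only hold a closer
    # item if the query lies within best[0] of the splitting radius mu.
    if abs(d - mu) <= best[0]:
        best = nearest(second, q, dist, best)
    return best

def hamming(a, b):
    """Hamming distance between equal-length strings (a non-Euclidean metric)."""
    return sum(x != y for x, y in zip(a, b))

words = ["karol", "kathy", "kerst", "carol", "barry", "harry"]
tree = build_vp(words, hamming)
result = nearest(tree, "karyl", hamming)  # (1, "karol")
```

The pruning test relies only on the triangle inequality, so any function satisfying the metric axioms can be passed as `dist`.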
Similarity Indexing with the SS-tree
 In Proceedings of the 12th International Conference on Data Engineering
, 1996
Neighbourhood components analysis
 Advances in Neural Information Processing Systems 17
, 2004
Cited by 338 (9 self)
In this paper we propose a novel method for learning a Mahalanobis distance measure to be used in the KNN classification algorithm. The algorithm directly maximizes a stochastic variant of the leave-one-out KNN score on the training set. It can also learn a low-dimensional linear embedding of labeled data that can be used for data visualization and fast classification. Unlike other methods, our classification model is non-parametric, making no assumptions about the shape of the class distributions or the boundaries between them. The performance of the method is demonstrated on several data sets, both for metric learning and linear dimensionality reduction.
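The "stochastic variant of the leave-one-out KNN score" can be made concrete with a small calculation. In the sketch below, each point picks another point as its neighbor with probability given by a softmax over negative squared distances in the transformed space, and the score is the expected number of points whose chosen neighbor shares their label. The function name `nca_score` and the toy data are assumptions for illustration; only the score is evaluated here, with no gradient-based optimization of the transform.

```python
import math

def nca_score(A, X, y):
    """Expected number of correctly classified points under transform A."""
    # Z_i = A x_i, with A given as a list of rows.
    Z = [[sum(a * v for a, v in zip(row, x)) for row in A] for x in X]
    total = 0.0
    for i, zi in enumerate(Z):
        # Unnormalized probability that point i selects point j (never itself).
        w = [0.0 if j == i else
             math.exp(-sum((a - b) ** 2 for a, b in zip(zi, zj)))
             for j, zj in enumerate(Z)]
        norm = sum(w)  # assumed nonzero; very distant points could underflow
        total += sum(wj for wj, yj in zip(w, y) if yj == y[i]) / norm
    return total

# Toy data: the first coordinate carries the class, the second is noise.
X = [(0.0, 0.0), (0.0, 2.0), (1.0, 0.0), (1.0, 2.0)]
y = [0, 0, 1, 1]
identity = [[1.0, 0.0], [0.0, 1.0]]
projection = [[1.0, 0.0]]  # a one-dimensional linear embedding

score_identity = nca_score(identity, X, y)
score_projection = nca_score(projection, X, y)
```

On this toy data the one-dimensional projection scores higher than the identity, since it discards the noisy coordinate; this illustrates how a low-dimensional embedding can fall out of the same objective.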
Efficient Clustering of High-Dimensional Data Sets with Application to Reference Matching
, 2000
Cited by 329 (15 self)
Many important problems involve clustering large datasets. Although naive implementations of clustering are computationally expensive, there are established efficient techniques for clustering when the dataset has either (1) a limited number of clusters, (2) a low feature dimensionality, or (3) a small number of data points. However, there has been much less work on methods of efficiently clustering datasets that are large in all three ways at once, for example, having millions of data points that exist in many thousands of dimensions representing many thousands of clusters. We present a new technique for clustering these large, high-dimensional datasets. The key idea involves using a cheap, approximate distance measure to efficiently divide the data into overlapping subsets we call canopies. Then clustering is performed by measuring exact distances only between points that occur in a common canopy. Using canopies, large clustering problems that were formerly impossible become practical. Under r...
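A rough sketch of the canopy idea follows. It assumes a two-threshold formulation: points within a loose threshold of a chosen center join that center's canopy, and points within a tight threshold may no longer start a canopy of their own. The parameter names `t_loose` and `t_tight`, the helper `candidate_pairs`, and the one-dimensional data are assumptions made for illustration; the abstract itself does not give these details.

```python
from itertools import combinations

def canopies(points, cheap_dist, t_loose, t_tight):
    """Return overlapping canopies as lists of point indices (t_loose >= t_tight)."""
    remaining = list(range(len(points)))
    result = []
    while remaining:
        center = remaining[0]
        # Every point within the loose threshold joins this canopy.
        members = [i for i in range(len(points))
                   if cheap_dist(points[center], points[i]) <= t_loose]
        result.append(members)
        # Points within the tight threshold can no longer start a canopy.
        remaining = [i for i in remaining
                     if cheap_dist(points[center], points[i]) > t_tight]
    return result

def candidate_pairs(canopy_list):
    """Pairs of indices sharing a canopy; only these need an exact distance."""
    pairs = set()
    for members in canopy_list:
        pairs.update(combinations(members, 2))
    return pairs

points = [0.0, 1.0, 2.0, 3.0, 10.0, 11.0]
found = canopies(points, lambda a, b: abs(a - b), t_loose=2.0, t_tight=1.0)
pairs = candidate_pairs(found)
```

With these six points there are 15 possible pairs, but only 7 share a canopy, so an exact (and presumably expensive) distance would be computed for fewer than half of them.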
Shape Indexing Using Approximate Nearest-Neighbour Search in High-Dimensional Spaces
, 1997
Cited by 306 (12 self)
Shape indexing is a way of making rapid associations between features detected in an image and object models that could have produced them. When model databases are large, the use of high-dimensional features is critical, due to the improved level of discrimination they can provide. Unfortunately, finding the nearest neighbour to a query point rapidly becomes inefficient as the dimensionality of the feature space increases. Past indexing methods have used hash tables for hypothesis recovery, but only in low-dimensional situations. In this paper, we show that a new variant of the k-d tree search algorithm makes indexing in higher-dimensional spaces practical. This Best Bin First, or BBF, search is an approximate algorithm which finds the nearest neighbour for a large fraction of the queries, and a very close neighbour in the remaining cases. The technique has been integrated into a fully developed recognition system, which is able to detect complex objects in real, cluttered scenes in just a few seconds.
CHAMELEON: A Hierarchical Clustering Algorithm Using Dynamic Modeling
, 1999
Cited by 272 (23 self)
Clustering in data mining is a discovery process that groups a set of data such that the intra-cluster similarity is maximized and the inter-cluster similarity is minimized. Existing clustering algorithms, such as K-means, PAM, CLARANS, DBSCAN, CURE, and ROCK are designed to find clusters that fit some static models. These algorithms can break down if the choice of parameters in the static model is incorrect with respect to the data set being clustered, or if the model is not adequate to capture the characteristics of clusters. Furthermore, most of these algorithms break down when the data consists of clusters that are of diverse shapes, densities, and sizes. In this paper, we present a novel hierarchical clustering algorithm called CHAMELEON that measures the similarity of two clusters based on a dynamic model. In the clustering process, two clusters are merged only if the interconnectivity and closeness (proximity) between two clusters are high relative to the internal intercon...
ANN: A library for approximate nearest neighbor searching, version 1.1.2. http://www.cs.umd.edu/~mount/ANN
, 2010
The parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces
 MACHINE LEARNING
, 1995
Cited by 257 (8 self)
Parti-game is a new algorithm for learning feasible trajectories to goal regions in high dimensional continuous state-spaces. In high dimensions it is essential that learning does not plan uniformly over a state-space. Parti-game maintains a decision-tree partitioning of state-space and applies techniques from game-theory and computational geometry to efficiently and adaptively concentrate high resolution only on critical areas. The current version of the algorithm is designed to find feasible paths or trajectories to goal regions in high dimensional spaces. Future versions will be designed to find a solution that optimizes a real-valued criterion. Many simulated problems have been tested, ranging from two-dimensional to nine-dimensional state-spaces, including mazes, path planning, non-linear dynamics, and planar snake robots in restricted spaces. In all cases, a good solution is found in less than ten trials and a few minutes.