K. Beyer, J. Goldstein, R. Ramakrishnan, and U. Shaft, "nearest neighbour", in ICDT conference, pages 217--235, 1999.
....of our benchmark. The retrieval costs obviously depend on the index structure with which the benchmark solved the nearest neighbor search problem. Recent work has shown that search costs in high-dimensional spaces are exponentially dependent on the dimensionality of the features [BBKK97, WSB98, BGRS99]. As such, it becomes obvious that above some dimensionality threshold all data items must be considered to answer the query [WSB98]. Rather surprisingly, this is not only a theoretical phenomenon: in feature spaces for images with as few as 10 dimensions, a brute force sequential scan often performs ....
....plots. Our preliminary investigation has shown that the effectiveness of high-dimensional features can be better than that of low-dimensional features, but this is not always the case. Recently, it was questioned whether such high-dimensional feature combinations are useful at all [BGRS99]. Our experiments, however, show that very high-dimensional features (more than 500 dimensions) are not harmful, contrary to the conclusion in [BGRS99]. For instance, the best feature combination with 9 dimensions achieved an effectiveness of 0.26. A combination with 559 components obtained an effectiveness of ....
K. S. Beyer, J. Goldstein, R. Ramakrishnan, and U. Shaft. When is "Nearest Neighbour" Meaningful? In Proceedings of the International Conference on Database Theory (ICDT), volume 1540 of Lecture Notes in Computer Science, pages 217--235, Jerusalem, Israel, Jan. 1999. Springer.
....performance, in particular in high-dimensional vector spaces. In such spaces, the complexity of NN search is linear in the number N of objects to be searched. A number of experiments, whose original intention was to achieve a better performance with trees as index structures, show this [6, 5, 17]. [25, 7] come to the same conclusion using a more formal line of argumentation. We have therefore developed a data structure for NN search, the so-called VA File [23, 25], which is not based on trees. Its efficiency is due to its use of approximations. A so-called approximation file contains an ....
....this stop criterion, the algorithm visits exactly those points whose lower bound is smaller than the NN distance. This allows us to estimate the number of candidates. 2.4 Meaningfulness of NN Search Recently, others have asked whether NN search is a meaningful implementation of similarity search [7]. With uniformly distributed data, as d approaches infinity, there are always objects whose distance is arbitrarily close to the NN distance. Beyer et al. conclude that the answer to the above question is negative in many situations. However, in real data sets such as the ones generated ....
K. Beyer, J. Goldstein, R. Ramakrishnan, and U. Shaft. When is "nearest neighbour" meaningful? In Catriel Beeri and Peter Buneman, editors, Proc. 7th Int. Conf. Database Theory, ICDT, number 1540 in Lecture Notes in Computer Science, LNCS, pages 217--235. Springer-Verlag, 10--12 January 1999.
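The effect described in the excerpt above (with uniformly distributed data, other objects lie almost as close to the query as the nearest neighbor once d is large) can be illustrated with a small experiment. The following is only a sketch under stated assumptions: the function name `relative_contrast`, the sample sizes, and the measure (d_max - d_min) / d_min are illustrative choices and are not taken from the cited papers.

```python
import math
import random


def relative_contrast(num_points, dim, seed=0):
    """Return (d_max - d_min) / d_min for one random query.

    Data and query are drawn uniformly from the unit hypercube
    (an assumption made for this illustration).
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    distances = []
    for _ in range(num_points):
        point = [rng.random() for _ in range(dim)]
        distances.append(math.dist(query, point))
    d_min, d_max = min(distances), max(distances)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    # The contrast between the farthest and the nearest point
    # typically shrinks as the dimensionality grows.
    for dim in (2, 10, 100, 1000):
        print(dim, round(relative_contrast(1000, dim), 3))
```

A small value means that the farthest point is only slightly farther away than the nearest one, which is the situation in which the excerpt questions the meaningfulness of NN search. The excerpt also notes that real data sets may behave differently from uniform data.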
....performance, in particular in high-dimensional vector spaces. In such spaces, the complexity of NN search is linear in the number N of objects to be searched. A number of experiments, whose original intention was to achieve a better performance with trees as index structures, show this [5, 4, 14]. [15, 6] come to the same conclusion using a more formal line of argumentation. We have therefore developed a data structure for NN search, the so-called VA File [15], which is not based on trees. Its efficiency is due to its use of approximations. A so-called approximation file contains an ....
K. Beyer, J. Goldstein, R. Ramakrishnan, and U. Shaft. When is "nearest neighbour" meaningful? In Catriel Beeri and Peter Buneman, editors, Proc. 7th Int. Conf. Database Theory, ICDT, number 1540 in Lecture Notes in Computer Science, LNCS, pages 217--235. Springer-Verlag, 10--12 January 1999.
....special index structures for these features. In this paper, we consider feature types that are represented by points in a high-dimensional data space. Then, content-based retrieval corresponds to the problem of finding the nearest neighbor (NN) to the query point in the data space. Beyer et al. [5] have recently questioned this approach. A first result of this current article is to show that their finding does not hold in its full generality when applied to real data sets. On the other hand, many experiments have shown that the complexity of NN search is linear for high-dimensional feature ....
....current article is to show that their finding does not hold in its full generality when applied to real data sets. On the other hand, many experiments have shown that the complexity of NN search is linear for high-dimensional feature data, with or without tree-based indexing structures [4, 3, 15, 26, 5]. [26, 5] have formally shown this for uniformly distributed data. Moreover, one can conclude from those experiments that a sequential scan is superior to tree-based structures if the dimensionality of the feature vectors is larger than around 10. In order to accelerate the scan through the ....
K. Beyer, J. Goldstein, R. Ramakrishnan, and U. Shaft. When is "nearest neighbour" meaningful? In Catriel Beeri and Peter Buneman, editors, Proc. 7th Int. Conf. Database Theory, ICDT, number 1540 in Lecture Notes in Computer Science, LNCS, pages 217--235. Springer-Verlag, 10--12 January 1999.
....A common implementation of similarity search is nearest neighbor search (NN search). As shown in Table 1, the dimensionality of features can be rather high. A number of theoretical and experimental studies have shown that indexing such high-dimensional spaces often results in linear search complexity [6, 5, 22, 47, 8]. Consequently, it is much more efficient to utilize linear structures instead of tree-based structures. Our approach to this is called the vector approximation file (VA File), a linear approach using approximations of the vectors. All the index services in our system are based on the VA File. Due ....
K. Beyer, J. Goldstein, R. Ramakrishnan, and U. Shaft. When is "nearest neighbour" meaningful? In Proceedings of the International Conference on Database Theory (ICDT), 10--12 Jan. 1999.
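The excerpt above describes the VA File only as "a linear approach using approximations of the vectors". The following is a minimal sketch of that general idea, not the authors' implementation. It assumes a uniform grid over the unit hypercube with a fixed number of bits per dimension, and a two-phase search that stops once the next lower bound is no smaller than the best distance found so far. All names (`approximate`, `lower_bound`, `nn_search`) are hypothetical.

```python
import heapq
import math


def approximate(vector, bits):
    """Map each coordinate in [0, 1] to the index of its grid cell."""
    cells = 1 << bits
    return tuple(min(int(x * cells), cells - 1) for x in vector)


def lower_bound(query, approx, bits):
    """Smallest possible distance from the query to any point in the cell."""
    width = 1.0 / (1 << bits)
    total = 0.0
    for q, cell in zip(query, approx):
        low, high = cell * width, (cell + 1) * width
        if q < low:
            total += (low - q) ** 2
        elif q > high:
            total += (q - high) ** 2
    return math.sqrt(total)


def nn_search(query, vectors, approximations, bits):
    """Return (index, distance, vectors_visited) for the nearest neighbor."""
    # Phase 1: scan the compact approximations and compute lower bounds.
    bounds = [(lower_bound(query, a, bits), i)
              for i, a in enumerate(approximations)]
    heapq.heapify(bounds)
    # Phase 2: visit full vectors in order of increasing lower bound.
    best_index, best_dist, visited = -1, math.inf, 0
    while bounds:
        bound, i = heapq.heappop(bounds)
        if bound >= best_dist:
            break  # no remaining vector can be closer than the current best
        visited += 1
        dist = math.dist(query, vectors[i])
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index, best_dist, visited
```

In this sketch, `approximations` would be built once as `[approximate(v, bits) for v in vectors]`. The third return value counts how many full vectors had to be read after the scan of the approximations, which gives a rough feel for how well the approximations filter candidates at a given dimensionality and bit budget.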
....small (4-9) to large (several hundreds). Frequently, it is not sufficient to query one feature in isolation. Rather, a query typically combines several feature types to better reflect the notion of similarity. Given this, the number of dimensions of combined features is well above 10. Beyer et al. [2] have recently questioned NN search in high-dimensional vector spaces. The result of their theoretical analysis over a uniformly distributed data set is that NN search as an implementation of similarity search is questionable if the number of dimensions is large, e.g. above 16. On the other hand, ....
....access methods generally work well for low-dimensional spaces, their performance is known to degrade as the number of dimensions increases, a phenomenon which has been termed the dimensional curse. This phenomenon has been observed in a number of experiments with synthetic as well as real data [6, 8, 2]. Recently, it has been formally shown that the complexity of similarity search in trees tends towards O(N) as dimensionality increases (N denotes the number of data items) [8]. As experiments and formal analysis confirm, a sequential scan outperforms tree-based approaches if the dimensionality is ....
K. Beyer, J. Goldstein, R. Ramakrishnan, and U. Shaft. When is "nearest neighbour" meaningful? In C. Beeri and P. Buneman, editors, Proc. 7th Int. Conf. Database Theory, ICDT, number 1540 in Lecture Notes in Computer Science, LNCS, pages 217--235. Springer-Verlag, 10--12 Jan. 1999.
No context found.
K. Beyer, J. Goldstein, R. Ramakrishnan, and U. Shaft, "nearest neighbour ", in ICDT conference, pages 217--235, 1999.