Results 1 - 3 of 3
The concentration of fractional distances
 IEEE Trans. on Knowledge and Data Engineering, 2007
Cited by 52 (2 self)
Abstract—Nearest neighbor search and many other numerical data analysis tools most often rely on the use of the Euclidean distance. When data are high dimensional, however, the Euclidean distances seem to concentrate; all distances between pairs of data elements seem to be very similar. Therefore, the relevance of the Euclidean distance has been questioned in the past, and fractional norms (Minkowski-like norms with an exponent less than one) were introduced to fight the concentration phenomenon. This paper justifies the use of alternative distances to fight concentration by showing that the concentration is indeed an intrinsic property of the distances and not an artifact from a finite sample. Furthermore, an estimation of the concentration as a function of the exponent of the distance and of the distribution of the data is given. It leads to the conclusion that, contrary to what is generally admitted, fractional norms are not always less concentrated than the Euclidean norm; a counterexample is given to prove this claim. Theoretical arguments are presented, which show that the concentration phenomenon can appear for real data that do not match the hypotheses of the theorems, in particular, the assumption of independent and identically distributed variables. Finally, some insights about how to choose an optimal metric are given.
Index Terms—Nearest neighbor search, high-dimensional data, distance concentration, fractional distances.
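The concentration effect described in this abstract can be observed empirically. The following is a minimal sketch, not the paper's estimator: it assumes i.i.d. uniform data on [0, 1] and measures concentration as the ratio of the standard deviation to the mean of the norm. The function names `minkowski_norm` and `relative_variance` are hypothetical and chosen only for this illustration.

```python
import math
import random


def minkowski_norm(x, p):
    """Minkowski-type norm (sum |x_i|^p)^(1/p).

    For p < 1 this is a fractional "norm" (it is not a true norm).
    """
    return sum(abs(v) ** p for v in x) ** (1.0 / p)


def relative_variance(dim, p, n_samples=2000, seed=0):
    """Empirical std/mean of the norm of i.i.d. uniform[0, 1] vectors.

    Assumption for this sketch: smaller values mean the norms are
    more concentrated around their mean.
    """
    rng = random.Random(seed)
    norms = [
        minkowski_norm([rng.random() for _ in range(dim)], p)
        for _ in range(n_samples)
    ]
    mean = sum(norms) / n_samples
    var = sum((n - mean) ** 2 for n in norms) / n_samples
    return math.sqrt(var) / mean


if __name__ == "__main__":
    # Compare a fractional exponent with the L1 and Euclidean cases
    # as the dimension grows.
    for p in (0.5, 1.0, 2.0):
        values = [round(relative_variance(d, p), 4) for d in (2, 20, 200)]
        print(f"p={p}: {values}")
```

Running the script prints the ratio for each exponent at increasing dimension. Which exponent concentrates least depends on the data distribution, which is the point the abstract makes; the uniform distribution used here is only one case.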
Counting distance permutations, in Chávez and Navarro
Cited by 3 (1 self)
Distance permutation indexes support fast proximity searching in high-dimensional metric spaces. Given some fixed reference sites, for each point in a database the index stores a permutation naming the closest site, the second-closest, and so on. We examine how many distinct permutations can occur as a function of the number of sites and the size of the space. We give theoretical results for tree metrics and vector spaces with L1, L2, and L∞ metrics, improving on the previous best known storage space in the vector case. We also give experimental results and commentary on the number of distance permutations that actually occur in a variety of vector, string, and document databases.
Key words: metric space, nearest neighbour, Voronoi diagram, distance permutation
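The notion of a distance permutation can be illustrated with a short sketch. It assumes points and sites are tuples of floats and breaks ties by site index; the tie-breaking rule and all function names are assumptions made for this example, not taken from the paper.

```python
def l1(a, b):
    """L1 (Manhattan) distance."""
    return sum(abs(x - y) for x, y in zip(a, b))


def l2(a, b):
    """L2 (Euclidean) distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def linf(a, b):
    """L-infinity (Chebyshev) distance."""
    return max(abs(x - y) for x, y in zip(a, b))


def distance_permutation(point, sites, dist):
    """Site indices ordered from closest to farthest from `point`.

    Ties are broken by site index (an assumption of this sketch).
    """
    return tuple(
        sorted(range(len(sites)), key=lambda i: (dist(point, sites[i]), i))
    )


def count_distinct_permutations(points, sites, dist):
    """Number of distinct distance permutations occurring in `points`."""
    return len({distance_permutation(p, sites, dist) for p in points})


if __name__ == "__main__":
    sites = [(0.0,), (10.0,)]
    points = [(1.0,), (2.0,), (9.0,)]
    print(count_distinct_permutations(points, sites, l2))
```

With k sites there are at most k! possible permutations, but the number that actually occurs in a database can be far smaller, which is what the storage bounds in the abstract exploit.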
The Dual Negative Selection Algorithm Based on Pattern Recognition Receptor Theory and Its Application in Two-class Data Classification
Abstract — The Negative Selection Algorithm (NSA) is an important method for generating artificial immune data classifiers in Artificial Immune System (AIS) research. However, as the data dimension increases, current NSA-based data classification algorithms suffer from an excessive number of generated classifiers and low classifier generation efficiency. In this paper, the Dual Negative Selection Algorithm based on Pattern Recognition Receptor theory (PRR2NSA) is proposed, which simulates the process by which Antigen Presenting Cells (APC) recognize Pathogen-Associated Molecular Patterns (PAMP) to trigger the immune response. The PRR2NSA algorithm first generates the APC classifiers by clustering the training set, and then generates the T-cell classifiers within the coverage of the APC classifier set using the dual negative selection algorithm (2NSA). The 2NSA avoids the unnecessary and time-consuming self-tolerance process for candidate classifiers that lie within the coverage of existing mature classifiers, which greatly reduces the classifier set size and significantly improves classifier generation efficiency. PRR2NSA introduces co-stimulation of the T-cell classifiers by the APC classifiers, which reduces false classifications and speeds up data classification. Theoretical analysis and simulations show that the PRR2NSA algorithm effectively improves classification efficiency and reduces the time cost of the algorithm.
Index Terms — artificial immune system, real-valued negative selection algorithm, variable-sized classifier, dual negative selection algorithm, PRR2NSA
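For orientation, the sketch below shows a plain real-valued negative selection step, not the PRR2NSA or 2NSA procedure from the abstract. It assumes data in the unit hypercube, fixed-radius detectors, and Euclidean matching; all names and parameters are hypothetical.

```python
import math
import random


def generate_detectors(self_samples, n_detectors, self_radius, dim,
                       seed=0, max_tries=100000):
    """Generic negative selection: keep random candidates that match no self sample.

    A candidate "matches" a self sample when it lies within `self_radius`
    of it (Euclidean distance). Matching candidates are discarded.
    """
    rng = random.Random(seed)
    detectors = []
    tries = 0
    while len(detectors) < n_detectors and tries < max_tries:
        tries += 1
        candidate = [rng.random() for _ in range(dim)]
        if all(math.dist(candidate, s) > self_radius for s in self_samples):
            detectors.append(candidate)
    return detectors


def is_nonself(x, detectors, detector_radius):
    """Classify `x` as non-self if any detector covers it."""
    return any(math.dist(x, d) <= detector_radius for d in detectors)


if __name__ == "__main__":
    self_samples = [[0.1, 0.1], [0.15, 0.2], [0.2, 0.1]]
    detectors = generate_detectors(self_samples, 50, 0.2, dim=2)
    print(is_nonself([0.1, 0.1], detectors, 0.2))
    print(is_nonself([0.9, 0.9], detectors, 0.2))
```

In this basic form every candidate is checked against all self samples. According to the abstract, 2NSA skips this self-tolerance check for candidates that already fall within the coverage of existing mature classifiers; that step is not implemented here.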