### Table 2: A sample data set illustrates clusters embedded in subspaces of a high dimensional space.

2003

"... In PAGE 2: ... Hence, a good subspace clustering algorithm should be able to find clusters and the maximum associated set of dimensions. Consider, for example, a data set with 5 data points in 6 dimensions (given in Table 2). In this data set, it is obvious that C = {x1, x2, x3} is a cluster and the maximum set of dimensions should be P = {1, 2, 3, 4}.... In PAGE 3: ...here sj is a vector defined as sj = (Aj1, Aj2, ..., Ajnj)^T. Since there are possibly multiple states (or values) for a variable, a symbol table of a data set is usually not unique. For example, for the data set in Table 2, Table 3 is one of its symbol tables. Table 3: One of the symbol tables of the data set in Table 2.... In PAGE 3: ... For a given symbol table of the data set, the frequency table of each cluster is unique according to that symbol table. For example, for the data set in Table 2, let (C, P) be a subspace cluster, where C = {x1, x2, x3} and P = {1, 2, 3, 4}; if we use the symbol table presented in Table 3, then the corresponding frequency table for the subspace cluster (C, P) is given in Table 4. From the definition of frequency fjr in Equation (6), we have the following equalities: ∑_{r=1}^{n_j} fjr(C) = |C|, j = 1, 2, .... ..."
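The frequency counts described in this excerpt are straightforward to sketch in Python. The data values below are hypothetical stand-ins (the excerpt does not reproduce Table 2's actual entries); only the structure follows the text: a cluster C of categorical points and a dimension set P, with f_jr counting occurrences of state r in dimension j.

```python
from collections import Counter

def frequency_table(points, dims):
    # f_jr(C): how often each state r occurs in dimension j
    # over the cluster's points.
    return {j: Counter(p[j] for p in points) for j in dims}

# Hypothetical stand-ins for x1, x2, x3 restricted to P = {1, 2, 3, 4}
# (zero-indexed here as dims 0..3).
cluster = [("A", "B", "C", "D"),
           ("A", "B", "C", "D"),
           ("A", "B", "C", "A")]
dims = [0, 1, 2, 3]

table = frequency_table(cluster, dims)
# Sanity check of the stated identity: sum_r f_jr(C) = |C| for every j.
assert all(sum(c.values()) == len(cluster) for c in table.values())
```

Each dimension's counts necessarily sum to |C|, which is exactly the equality ∑_{r} f_jr(C) = |C| quoted from Equation (6).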

Cited by 4

### Table 3: Occlusion results when considering sunglasses. Note that in this case, NMF alone is better only when a high-dimensional feature space is used and no lighting conditions are considered. When lighting conditions are considered, the Bayesian approach obtains the best recognition rates.

### Table 2: Performance of the classifiers with degree predicted by the VC-bound. Each row describes one two-class classifier separating one digit (stated in the first column) from the rest. The remaining columns contain: degree: the degree of the best polynomial as predicted by the described procedure; parameters: the dimensionality of the high-dimensional space, which is also the maximum possible VC-dimension for linear classifiers in that space; h_estim: the VC-dimension estimate for the actual classifiers, which is much smaller than the number of free parameters of linear classifiers in that space; 1–7: the numbers of errors on the test set for polynomial classifiers of degrees 1 through 7. The table shows that the described procedure chooses polynomial degrees which are optimal or close to optimal.

1995

"... In PAGE 5: ... We can then compare this prediction with the actual polynomial degree which gives the best performance on the test set. The results are shown in Table 2; cf.... ..."

Cited by 158

### TABLE 2.14: Performance of the classifiers with degree predicted by the VC-bound. Each row describes one two-class classifier separating one digit (stated in the first column) from the rest. The remaining columns contain: deg: the degree of the best polynomial as predicted by the described procedure; param.: the dimensionality of the high-dimensional space, which is also the VC-dimension for the set of all separating hyperplanes in that space; h_est: the VC-dimension estimate for the actual classifiers, which is much smaller than the number of free parameters of linear classifiers in that space; 1–7: the numbers of errors on the test set for polynomial classifiers of degrees 1 through 7. The table shows that the described procedure chooses polynomial degrees which are optimal or close to optimal.

1997

### Table 3: F1 and number of Support Vectors for the top two Medline queries

5 Conclusions

The paper has presented a novel kernel for text analysis, and tested it on a categorization task; the kernel relies on evaluating an inner product in a very high-dimensional feature space. For a given sequence length k (k = 5 was used in the experiments reported), the features are indexed by all strings of length k. Direct computation of
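The inner product described above, in a feature space indexed by all length-k strings, can be illustrated with the simpler contiguous k-spectrum variant (the paper's own gap-weighted kernel requires a dynamic program, but the feature space is analogous). This sketch, including the function name, is illustrative rather than the paper's implementation:

```python
from collections import Counter

def spectrum_kernel(s, t, k=5):
    """Inner product in the space indexed by all length-k strings:
    phi_u(s) = number of occurrences of substring u in s."""
    cs = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    ct = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    # Only substrings present in both strings contribute to the product.
    return sum(cs[u] * ct[u] for u in cs if u in ct)

# Shared 5-grams between two short strings: scien, cienc, ience
print(spectrum_kernel("science", "conscience", k=5))  # → 3
```

Direct enumeration of the feature space is exponential in k; the counting above only ever touches substrings that actually occur, which is what makes the computation tractable.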

2002

Cited by 199

### Table 1. The Isomap algorithm takes as input the distances dX(i,j) between all pairs i,j from N data points in the high-dimensional input space X, measured either in the standard Euclidean metric (as in Fig. 1A) or in some domain-specific metric (as in Fig. 1B). The algorithm outputs coordinate vectors yi in a d-dimensional Euclidean space Y that (according to Eq. 1) best represent the intrinsic geometry of the data. The only free parameter (e or K) appears in Step 1.

"... In PAGE 2: ... These approximations are computed efficiently by finding shortest paths in a graph with edges connecting neighboring data points. The complete isometric feature mapping, or Isomap, algorithm has three steps, which are detailed in Table 1. The first step determines which points are neighbors on the manifold M, based on the distances dX(i,j) between pairs of points i,j in the input space X.... ..."

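The three steps in the caption (neighborhood graph, shortest-path geodesics, low-dimensional embedding) can be sketched in NumPy. This is a minimal illustration assuming the K-nearest-neighbor variant and classical MDS for step 3; it is not the authors' implementation, and the function name is my own:

```python
import numpy as np

def isomap(X, n_neighbors=5, d=2):
    """Minimal Isomap sketch following the three steps of Table 1."""
    N = len(X)
    # Step 1: K-NN neighborhood graph with Euclidean edge weights.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    G = np.full((N, N), np.inf)
    np.fill_diagonal(G, 0.0)
    for i in range(N):
        for j in np.argsort(D[i])[1:n_neighbors + 1]:
            G[i, j] = G[j, i] = D[i, j]
    # Step 2: geodesic distances via Floyd-Warshall shortest paths.
    for k in range(N):
        G = np.minimum(G, G[:, k:k + 1] + G[k:k + 1, :])
    # Step 3: classical MDS on the geodesic distance matrix.
    J = np.eye(N) - np.ones((N, N)) / N
    B = -0.5 * J @ (G ** 2) @ J          # double-centered Gram matrix
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:d]        # top-d eigenvalues
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))
```

For data sampled densely along a curve, the shortest-path distances approximate geodesic distances on the manifold, and the MDS step recovers coordinates that preserve them.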

### Tables III and IV. The error rate of the proposed IHDR algorithm was compared with some major tree classifiers. CART of [5] and C5.0 of [33] are among the best-known classification trees. However, like most other decision trees, they are univariate trees in that each internal node uses only one input component to partition the samples. This means that the partition of samples is done using hyperplanes that are orthogonal to one axis. We do not expect this type of tree to work well in a high-dimensional or highly correlated space. Thus, we also tested a more recent multivariate tree, OC1 of [10]. We realize that these trees were not designed for high-dimensional spaces like those from the images. Therefore, to fully explore their potential, we also tested the corresponding versions obtained by performing PCA before using CART, C5.0, and OC1, called CART with the PCA, C5.0 with the PCA, and OC1 with the PCA, respectively, as shown in Tables III and IV. Further, we compared the batch version of our HDR algorithm. Originally we expected the batch method to outperform the incremental one. However, the error rate of the IHDR tree turned out to be lower than that of the HDR tree for this set of data. A major reason for this is that the same training samples may be distributed across different leaf nodes in the IHDR tree because we ran several iterations during training. For the batch version, each training sample can only be allocated to a single leaf node.
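The distinction drawn here, axis-parallel (univariate) splits versus a PCA preprocessing step, can be made concrete. A hedged NumPy sketch with hypothetical function names (this is neither CART, C5.0, OC1, nor IHDR itself, just the two ingredients the passage describes):

```python
import numpy as np

def pca_project(X, d):
    """Project data onto its top-d principal components, i.e. the
    kind of preprocessing applied before the univariate trees."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes of the centered data.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:d].T

def best_univariate_split(X, y):
    """One axis-parallel split (a depth-1 univariate tree node):
    pick the single feature and threshold minimizing errors."""
    best = (None, None, len(y) + 1)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            pred = X[:, j] <= t
            # min over the two label orientations of the split
            err = min(np.sum(pred != y), np.sum(pred == y))
            if err < best[2]:
                best = (j, int(err) and t or t, err)
            if err < best[2] or best[0] is None:
                best = (j, t, err)
    return best
```

Because each split is a hyperplane orthogonal to one axis, a class boundary that is oblique in the original space may need many such splits; projecting onto principal components first can align the informative directions with the axes, which is the motivation for the "with the PCA" variants.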

2006

### Table 2. Various methods and algorithms mentioned in Section 3.1 and their ability to effectively address the issues discussed in that section (Incremental Updates, Performance in Text Classification Tasks, High Dimensionality, Low Computational Cost, Concept Drift, Dynamic Feature Space).

"... In PAGE 7: ... complexity for training the filtering models, updating them and providing recommendations. In Table 2, we summarize the basic characteristics of the aforementioned systems in terms of the issues discussed in this section. Table 2.... ..."

### Table 3: LOOCV error rates in the original space

by Oleg Okun

"... In PAGE 5: ... First, we performed two-class discrimination in the original, high-dimensional space of 822 genes. Error rates for three NN classifiers are shown in Table 3 when using leave-one-out cross-validation (LOOCV). These results will serve for a comparison with those obtained in the low-dimensional, gene-selection-induced space.... ..."
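The evaluation protocol in this excerpt, leave-one-out cross-validation of nearest-neighbor classifiers, is easy to sketch. A minimal 1-NN version in NumPy, with tiny hypothetical data standing in for the 822-gene set:

```python
import numpy as np

def loocv_error_1nn(X, y):
    """LOOCV error of a 1-NN classifier: each sample is classified
    by its nearest neighbor among all the remaining samples."""
    errors = 0
    for i in range(len(X)):
        dist = np.linalg.norm(X - X[i], axis=1)
        dist[i] = np.inf            # exclude the held-out sample itself
        errors += y[np.argmin(dist)] != y[i]
    return errors / len(X)

# Two well-separated hypothetical classes: every held-out point's
# nearest neighbor shares its label, so the LOOCV error is zero.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
y = np.array([0, 0, 1, 1])
print(loocv_error_1nn(X, y))  # → 0.0
```

With n samples this runs n classifications on n-1 points each, which is why LOOCV is practical for the small sample sizes typical of gene-expression studies.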