### Citations

3771 | Introduction to Statistical Pattern Recognition
- Fukunaga
- 1990
Citation Context ... properties of the data to decide which features should be kept. Finding the optimal feature subset is intractable [20]. Therefore, greedy search strategies such as sequential forward selection (SFS) [21] and sequential backward elimination (SBE) [20] are typically used. However, these greedy algorithms perform poorly when the feature evaluation criterion is non-monotonic. Meanwhile, various feature e...

2451 | A global geometric framework for nonlinear dimensionality reduction
- Tenenbaum, Silva, et al.
Citation Context ...s can be obtained based on the graph spectral theory [3, 4]. The graph spectral techniques are also adopted for dimensionality reduction in multidimensional space. Representative works such as Isomap [5], Locally Linear Embedding [6], and Laplacian Embedding [7], can all be interpreted in a general graph embedding framework with different choices of the graph structures. Tetsuo et al. [8] study the t...

2408 | Nonlinear dimensionality reduction by locally linear embedding
- Roweis, Saul
Citation Context ... graph spectral theory [3, 4]. The graph spectral techniques are also adopted for dimensionality reduction in multidimensional space. Representative works such as Isomap [5], Locally Linear Embedding [6], and Laplacian Embedding [7], can all be interpreted in a general graph embedding framework with different choices of the graph structures. Tetsuo et al. [8] study the trade-off between time and spac...

1563 | Wrappers for feature subset selection
- Kohavi, John
Citation Context ...ing algorithms. So we mainly focus on the filter methods which utilize some intrinsic properties of the data to decide which features should be kept. Finding the optimal feature subset is intractable [20]. Therefore, greedy search strategies such as sequential forward selection (SFS) [21] and sequential backward elimination (SBE) [20] are typically used. However, these greedy algorithms perform poorly...

643 | gSpan: Graph-based substructure pattern mining
- Yan, Han
- 2002
Citation Context ...dataset with 1k graphs as the default dataset and use the other five datasets for scalability testing. The numbers of nodes in graphs range from 10 to 20. The frequent feature set F is mined by gSpan [38] with a minimum support of 5%, which is a typical setting in frequent subgraph mining to generate a moderate number of frequent subgraphs. We generate a synthetic dataset with 1,000 database graphs and 1...

478 | A New Measure of Rank Correlation
- Kendall
- 1938
Citation Context ...approximate top-k results returned according to the graph distance in the feature space. To assess the effectiveness of the mapped feature space, we adopt the following widely-used measures [40] [41] [42] to evaluate the quality of the approximate top-k results. (1) Precision: the fraction of answers in the approximate top-k results that belong to the exact top-k results, which is defined as p(k) = |A...
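The two quality measures referenced in this context, precision of the approximate top-k answers and Kendall's rank correlation [42], can be sketched as follows. Function names and the toy graph IDs are hypothetical, and this simple tau variant assumes both rankings contain the same items; Fagin et al. [40] generalize the comparison to top-k lists whose items differ.

```python
def precision_at_k(approx, exact):
    """p(k) = |A ∩ E| / k: fraction of approximate top-k answers
    that also appear in the exact top-k results."""
    return len(set(approx) & set(exact)) / len(exact)

def kendall_tau(rank_a, rank_b):
    """Kendall's tau over two rankings of the same items:
    (concordant pairs - discordant pairs) / (n(n-1)/2)."""
    pos_b = {item: i for i, item in enumerate(rank_b)}
    n = len(rank_a)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            # rank_a puts item i before item j; +1 if rank_b agrees
            score += 1 if pos_b[rank_a[i]] < pos_b[rank_a[j]] else -1
    return score / (n * (n - 1) / 2)

exact_top5 = ["g1", "g2", "g3", "g4", "g5"]
approx_top5 = ["g1", "g3", "g2", "g6", "g5"]
print(precision_at_k(approx_top5, exact_top5))  # 4 of 5 answers are exact: 0.8
```

Identical rankings give tau = 1.0 and fully reversed rankings give -1.0, matching the usual normalization.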

272 | Comparing top k lists
- Fagin, Sivakumar
- 2003
Citation Context ...st of the approximate top-k results returned according to the graph distance in the feature space. To assess the effectiveness of the mapped feature space, we adopt the following widely-used measures [40] [41] [42] to evaluate the quality of the approximate top-k results. (1) Precision: the fraction of answers in the approximate top-k results that belong to the exact top-k results, which is defined as...

200 | Graph indexing: A frequent structure-based approach
- Yan, Yu, et al.
- 2004
Citation Context ... (Figure 1: Dissimilarity/Distance Distribution; panel (b): distribution between q and DG) ...supergraph containment query. To accelerate the subgraph containment query process, gIndex [31] and FG-Index [32] are proposed for filtering. gIndex generates all size-bounded frequent subgraphs as well as a subset of size-bounded infrequent subgraphs, while FG-Index indexes all frequent subgrap...

182 | A (sub)graph isomorphism algorithm for matching large graphs
- Cordella, Foggia, et al.
Citation Context ...udes two parts: feature matching time and multidimensional search time. The former is to match each mapped feature with the query graph to generate a binary vector, which is done by the VF2 algorithm [43]. The latter is to retrieve the top-k result in the mapped feature space. The query time of Original is 3–5 times longer than that of DSPM because the number of mapped features, |F|, in Original is l...

174 | A graph distance metric based on the maximal common subgraph
- Bunke, Shearer
- 1998
Citation Context ...portant to note that by structure-preserving, it requires keeping the most informative fingerprint of the graphs in the entire DG. In this paper, for δ(gi, gj), we focus on two graph dissimilarities [1] [2] based on Maximum Common Subgraph (MCS). A graph g is a MCS of two graphs gi and gj, denoted as mcs(gi, gj), if g is a common subgraph of gi and gj and there is no other common subgraph g′ larger...
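For reference, the MCS-based dissimilarity of [1] (Bunke–Shearer) is d(g1, g2) = 1 − |mcs(g1, g2)| / max(|g1|, |g2|). Below is a minimal sketch with hypothetical function names; graph "size" means vertex count, and the MCS size is assumed to be computed elsewhere (MCS computation itself is NP-hard), so only the distance formula is shown.

```python
def bunke_shearer_distance(size_g1, size_g2, mcs_size):
    """d(g1, g2) = 1 - |mcs(g1, g2)| / max(|g1|, |g2|).
    Returns 0.0 for isomorphic graphs and approaches 1.0 as the
    common substructure shrinks relative to the larger graph."""
    return 1.0 - mcs_size / max(size_g1, size_g2)

print(bunke_shearer_distance(10, 8, 6))  # 1 - 6/10 = 0.4
```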

162 | Unsupervised feature selection using feature similarity
- Mitra, Murthy
Citation Context ... quality of a subset. For example, Talavera [22] selects features based on feature dependence; Dash et al. [23] choose features based on the entropy of distances between data points; and Mitra et al. [24] select features based on a new feature similarity measure called maximum information compression index (MICI). In recent years, spectral methods are also explored for feature selection, such as [25] ...

110 | Discriminative frequent pattern analysis for effective classification
- Cheng, Yan, et al.
- 2007
Citation Context ...mly selects p frequent subgraphs as dimensions. The correlation score between two features is an important measure to evaluate how similar the features are, which is defined using the Jaccard Coefficient [35]. As shown in Fig. 2, DSPM has a much smaller feature correlation score than Sample, while the query precision of DSPM is twice as high as that of Sample (see Exp-1 in Section 6 for details). Such a re...

106 | Top-k query evaluation with probabilistic guarantees
- Theobald, Weikum, et al.
- 2004
Citation Context ... the approximate top-k results returned according to the graph distance in the feature space. To assess the effectiveness of the mapped feature space, we adopt the following widely-used measures [40] [41] [42] to evaluate the quality of the approximate top-k results. (1) Precision: the fraction of answers in the approximate top-k results that belong to the exact top-k results, which is defined as p(k)...

103 | Laplacian score for feature selection
- He, Cai, et al.
- 2005
Citation Context ... [24] select features based on a new feature similarity measure called maximum information compression index (MICI). In recent years, spectral methods are also explored for feature selection, such as [25] which selects features based on the Laplacian score, and a unified framework SPEC [26] which considers the Laplacian score as a special case. However, the features selected by these two methods are highl...

101 | Biological network comparison using graphlet degree distribution
- Przulj
- 2007
Citation Context ...ass, but the processing time grows exponentially while the recursion depth of the subtree patterns becomes deeper. The third class of graph kernels is based on restricted subgraphs, such as Graphlets [15] and h-hop neighbors [16]. However, Graphlets are only feasible on unlabeled graphs, and both Graphlets and h-hop neighbors have very limited power to capture the topological structure of graphs ...

89 | Substructure similarity search in graph databases
- Yan, Yu, et al.
- 2005
Citation Context ...ent subgraphs, while FG-Index indexes all frequent subgraphs with a size-increasing support function and additionally includes all infrequent edges. Based on the indexing features used in gIndex, Grafil [33] is developed to support efficient subgraph similarity containment query. To solve the supergraph containment query problem, Chen et al. propose a contrast subgraph-based indexing framework, cIndex [3...

80 | Spectral Feature Selection for Supervised and Unsupervised Learning
- Zhao, Liu
- 2007
Citation Context ...ion compression index (MICI). In recent years, spectral methods are also explored for feature selection, such as [25] which selects features based on the Laplacian score, and a unified framework SPEC [26] which considers the Laplacian score as a special case. However, the features selected by these two methods are highly redundant as the correlation among features is neglected. Therefore, a two-step appro...

77 | FG-Index: towards verification-free query processing on graph databases
- Cheng, Ke, et al.
- 2007
Citation Context ... (Figure 1: Dissimilarity/Distance Distribution; panel (b): distribution between q and DG) ...supergraph containment query. To accelerate the subgraph containment query process, gIndex [31] and FG-Index [32] are proposed for filtering. gIndex generates all size-bounded frequent subgraphs as well as a subset of size-bounded infrequent subgraphs, while FG-Index indexes all frequent subgraphs with size-incre...

73 | Managing and Mining Graph Data
- Aggarwal, Wang
- 2010
Citation Context ...rature, with the aim of finding the set of the frequent subgraphs, maximal frequent subgraphs, closed frequent subgraphs, and representative frequent subgraphs. A comprehensive survey can be found in [30]. However, the set of frequent subgraphs cannot be used directly as dimensions for DS-preserved mapping due to the anti-monotone property of frequent subgraphs. Frequent subgraph based indexing. In th...

60 | Shortest-path kernels on graphs
- Borgwardt, Kriegel
- 2005
Citation Context ...n defined on various graph patterns which generally fall into three classes. The first class, based on random walks/paths, computes the number of matching pairs of random walks/paths in two graphs [12, 13]. The second class, subtree graph kernel [14], has higher representation power of graph structure than the first class, but the processing time grows exponentially while the recursion depth of the sub...

60 | Feature Selection for Clustering - A Filter Solution
- Dash, Choi, et al.
- 2002
Citation Context ...is non-monotonic. Meanwhile, various feature evaluation criteria are proposed to evaluate the quality of a subset. For example, Talavera [22] selects features based on feature dependence; Dash et al. [23] choose features based on the entropy of distances between data points; and Mitra et al. [24] select features based on a new feature similarity measure called maximum information compression index (MI...

50 | Expressivity versus efficiency of graph kernels
- Ramon, Gärtner
- 2003
Citation Context ...ly fall into three classes. The first class, based on random walks/paths, computes the number of matching pairs of random walks/paths in two graphs [12, 13]. The second class, subtree graph kernel [14], has higher representation power of graph structure than the first class, but the processing time grows exponentially while the recursion depth of the subtree patterns becomes deeper. The third class...

44 | Feature selection as a preprocessing step for hierarchical clustering
- Talavera
- 1999
Citation Context ...lgorithms perform poorly when the feature evaluation criterion is non-monotonic. Meanwhile, various feature evaluation criteria are proposed to evaluate the quality of a subset. For example, Talavera [22] selects features based on feature dependence; Dash et al. [23] choose features based on the entropy of distances between data points; and Mitra et al. [24] select features based on a new feature simi...

42 | Fast computation of graph kernels
- Vishwanathan, Borgwardt, et al.
- 2006
Citation Context ...n defined on various graph patterns which generally fall into three classes. The first class, based on random walks/paths, computes the number of matching pairs of random walks/paths in two graphs [12, 13]. The second class, subtree graph kernel [14], has higher representation power of graph structure than the first class, but the processing time grows exponentially while the recursion depth of the sub...

37 | Pattern vectors from algebraic graph theory
- Wilson, Hancock, et al.
- 2005
Citation Context ...rtex in a single graph as a vector that best characterizes the similarities/weights between vertex pairs. The vector representation for the vertices can be obtained based on the graph spectral theory [3, 4]. The graph spectral techniques are also adopted for dimensionality reduction in multidimensional space. Representative works such as Isomap [5], Locally Linear Embedding [6], and Laplacian Embedding ...

36 | Unsupervised feature selection for multi-cluster data
- Cai, Zhang, et al.
Citation Context ...siders the Laplacian score as a special case. However, the features selected by these two methods are highly redundant as the correlation among features is neglected. Therefore, a two-step approach, MCFS [27], is proposed to find the subset of features instead of evaluating each feature independently. Yang et al. [28] propose a framework, UDFS, by integrating discriminative information and ℓ2,1 minimizat...

31 | Spectral embedding of graphs.
- Luo, Wilson, et al.
- 2003
Citation Context ...rtex in a single graph as a vector that best characterizes the similarities/weights between vertex pairs. The vector representation for the vertices can be obtained based on the graph spectral theory [3, 4]. The graph spectral techniques are also adopted for dimensionality reduction in multidimensional space. Representative works such as Isomap [5], Locally Linear Embedding [6], and Laplacian Embedding ...

30 | Multidimensional scaling with restrictions on the configuration
- De Leeuw, Heiser
- 1980
Citation Context ... ≥ h(x). Here, x becomes the supporting point of the next majorizing function. We iterate this process until convergence occurs due to a lower bound of the function or due to constraints. As stated in [37], the configuration of the supporting point and the parameter can be updated for Eq. (5) in the following form: x_ir = (1/n) Σ_{k=1}^{n} b_ik z_kr (6) and c_ur = Σ_{1≤i,j≤n} (x_ir − x_jr)(y_ir − y_jr) / Σ_{1≤i,j≤n} (y_ir − y_jr)² (7) ...
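The two updates quoted in this context, Eqs. (6) and (7), can be sketched in plain Python as below. Names mirror the quoted formulas; B is an n×n majorization matrix, Z the current supporting configuration, and X, Y are n×r configurations. The surrounding majorization iteration (computing B from the current configuration and checking convergence) is assumed to happen elsewhere, so this is only a sketch of the per-iteration updates.

```python
def update_configuration(B, Z):
    """Eq. (6): x_ir = (1/n) * sum_k b_ik * z_kr."""
    n, r = len(Z), len(Z[0])
    return [[sum(B[i][k] * Z[k][j] for k in range(n)) / n for j in range(r)]
            for i in range(n)]

def update_parameter(X, Y):
    """Eq. (7): per dimension r,
    c_r = sum_{i,j}(x_ir - x_jr)(y_ir - y_jr) / sum_{i,j}(y_ir - y_jr)^2."""
    n, r = len(X), len(X[0])
    out = []
    for d in range(r):
        num = sum((X[i][d] - X[j][d]) * (Y[i][d] - Y[j][d])
                  for i in range(n) for j in range(n))
        den = sum((Y[i][d] - Y[j][d]) ** 2 for i in range(n) for j in range(n))
        out.append(num / den)
    return out
```

As a sanity check, feeding the same configuration into both arguments of `update_parameter` yields a scale factor of 1.0 in every dimension, since numerator and denominator coincide.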

24 | Towards graph containment search and indexing
- Chen, Yan, et al.
- 2007
Citation Context ...3] is developed to support efficient subgraph similarity containment query. To solve the supergraph containment query problem, Chen et al. propose a contrast subgraph-based indexing framework, cIndex [34], to sort out significant and distinctive contrast subgraphs using a redundancy-aware feature selection process. However, the selected frequent subgraphs in the above approaches are only used for effic...

22 | Applications of convex analysis to multidimensional scaling (Recent Developments in Statistics)
- De Leeuw
- 1977
Citation Context ...n techniques to reduce the computational cost. We further derive an approximate algorithm to handle a large graph database. Our algorithm, denoted as DSPM, is inspired by the majorization strategy in [36]. Unlike many traditional minimization methods, the majorization strategy iteratively generates a converging sequence of function values without a stepsize procedure that may be computationally expens...

20 | Graph embedding in vector spaces by means of prototype selection
- Riesen, Neuhaus, et al.
- 2007
Citation Context ...able to the problem studied in this paper. The approaches in the second category aim at representing each graph in a dataset as a feature vector based on graph operations or statistics. Riesen et al. [9] propose a general approach of mapping graphs to multidimensional vectors. They heuristically select k graphs in the graph set as prototypes, and then map each graph to a k-dimensional vector in which...

15 | Efficiently Handling Feature Redundancy in High-Dimensional Data
- Yu, Liu
- 2003
Citation Context ...feature selection approaches since this paper focuses on a graph database with no class labels. Existing unsupervised feature selection methods fall into two categories: wrapper model and filter model [19]. In wrapper approaches, feature selection is wrapped in a specific learning algorithm, which usually results in high computational complexity and less generality so that the selected features are ina...

9 | G-hash: towards fast kernel-based similarity search in large graph databases
- Wang, Smalter, et al.
- 2009
Citation Context ...ime grows exponentially while the recursion depth of the subtree patterns becomes deeper. The third class of graph kernels is based on restricted subgraphs, such as Graphlets [15] and h-hop neighbors [16]. However, Graphlets are only feasible on unlabeled graphs, and both Graphlets and h-hop neighbors have very limited power to capture the topological structure of graphs as types and sizes of the...

7 | Unsupervised feature selection using nonnegative spectral analysis
- Li, Yang, et al.
- 2012
Citation Context ...ead of evaluating each feature independently. Yang et al. [28] propose a framework, UDFS, by integrating discriminative information and ℓ2,1 minimization into one step. A more general framework, NDFS [29], is developed to learn the cluster label and feature selection simultaneously where the cluster indicator is constrained to be nonnegative. However, they only select the most informative features and do n...

6 | Effective feature construction by maximum common subgraph sampling
- Schietgat, Costa, et al.
- 2011
Citation Context ...ture the topological structure of graphs as types and sizes of the kernel substructures are very limited. There is also some research on selecting useful features for graph kernels in the literature [17, 18]. However, these approaches aim at achieving higher classification accuracy but not for DS-preserved mapping. Thus they are not applicable to the problem studied in this paper. Feature selection. We ...

6 | ℓ2,1-norm regularized discriminative feature selection for unsupervised learning
- Yang, Shen, et al.
- 2011
Citation Context ...nt as the correlation among features is neglected. Therefore, a two-step approach, MCFS [27], is proposed to find the subset of features instead of evaluating each feature independently. Yang et al. [28] propose a framework, UDFS, by integrating discriminative information and ℓ2,1 minimization into one step. A more general framework, NDFS [29], is developed to learn the cluster label and feature select...

5 | Finding top-k similar graphs in graph databases
- Zhu, Qin, et al.
- 2012
Citation Context ...m 12 to 20 with an average density of 0.2, and the other five datasets are generated by varying the density from 0.1 to 0.3 with an average edge number of 20. Measures: Following the previous work in [2], we use Eq. (2) as δ to compute the graph dissimilarity in the experiment. The results of using Eq. (1) as δ are similar to those of using Eq. (2), and thus are omitted due to the space limit. For ...

5 | Non-negative Laplacian embedding
- Luo, Ding, et al.
- 2009
Citation Context .... The graph spectral techniques are also adopted for dimensionality reduction in multidimensional space. Representative works such as Isomap [5], Locally Linear Embedding [6], and Laplacian Embedding [7], can all be interpreted in a general graph embedding framework with different choices of the graph structures. Tetsuo et al. [8] study the trade-off between time and space of graph embedding. Note th...

5 | A linear-space algorithm for distance preserving graph embedding
- Asano, Bose, et al.
- 2009
Citation Context ...h as Isomap [5], Locally Linear Embedding [6], and Laplacian Embedding [7], can all be interpreted in a general graph embedding framework with different choices of the graph structures. Tetsuo et al. [8] study the trade-off between time and space of graph embedding. Note that approaches in this category aim to transform the vertices of a single graph but not a collection of graphs to vectors, thus in...

5 | Graph embedding in vector spaces by node attribute statistics
- Gibert, Valveny, et al.
- 2012
Citation Context ...of the costly graph edit distance computation to obtain the k-dimensional vector for a query graph, which does not essentially reduce the computational complexity in query processing. Gibert et al. [11] propose another embedding methodology to map graphs to vectors based on statistics of the node/edge attributes. However, such a method only preserves very little structure information of graphs and can...

4 | Improving vector space embedding of graphs through feature selection algorithms
- Bunke, Riesen
- 2011
Citation Context ...dimensional vector in which the elements represent the graph edit distances between this graph and the prototypes. To improve the quality of the prototypes, they subsequently propose another approach [10] to first use all the graphs in the graph set as prototypes, and then apply feature selection algorithms to eliminate redundant prototypes and reduce the dimensionality. An obvious disadvantage of the...

1 | Feature selection for graph kernels
- Tan, Polat, et al.
- 2010
Citation Context ...ture the topological structure of graphs as types and sizes of the kernel substructures are very limited. There is also some research on selecting useful features for graph kernels in the literature [17, 18]. However, these approaches aim at achieving higher classification accuracy but not for DS-preserved mapping. Thus they are not applicable to the problem studied in this paper. Feature selection. We ...

1 | Graphgen: A graph synthetic generator
- Cheng, Ke, et al.
- 2006
Citation Context ... is a typical setting in frequent subgraph mining to generate a moderate number of frequent subgraphs. We generate a synthetic dataset with 1,000 database graphs and 1,000 query graphs using Graphgen [39]. The default parameters are set in a similar way as in [32], i.e., the average number of edges in each graph is 20, the number of distinct labels is 20, and the average graph density is 0.2. We furth...