Results 11-20 of 120
Large Scale Mining of Molecular Fragments with Wildcards. Intelligent Data Analysis 8:495–504
, 2004
Cited by 22 (8 self)
Abstract. The main task of drug discovery is to find novel bioactive molecules, i.e., chemical compounds that, for example, protect human cells against a virus. One way to support solving this task is to analyze a database of known and tested molecules with the aim of building a classifier that predicts whether a novel molecule will be active or inactive, so that future chemical tests can be focused on the most promising candidates. In [1] an algorithm for constructing such a classifier was proposed that uses molecular fragments to discriminate between active and inactive molecules. In this paper we present two extensions of this approach: a special treatment of rings and a method that finds fragments with wildcards based on chemical expert knowledge.
GREW—A Scalable Frequent Subgraph Discovery Algorithm
 in Fourth IEEE International Conference on Data Mining (ICDM 2004). 2004
, 2003
Cited by 21 (0 self)
Existing algorithms that mine graph datasets to discover patterns corresponding to frequently occurring subgraphs can operate efficiently on graphs that are sparse, contain a large number of relatively small connected components, have vertices with low and bounded degrees, and contain well-labeled vertices and edges. However, there are a number of applications that lead to graphs that do not share these characteristics, for which these algorithms become highly unscalable. In this paper we propose a heuristic algorithm called GREW to overcome the limitations of existing complete or heuristic frequent subgraph discovery algorithms. GREW is designed to operate on a large graph and to find patterns corresponding to connected subgraphs that have a large number of vertex-disjoint embeddings. Our experimental evaluation shows that GREW is efficient, can scale to very large graphs, and finds nontrivial patterns that cover large portions of the input graph and the lattice of frequent patterns.
Subdue: compression-based frequent pattern discovery in graph data
 Proceedings of the 1st international workshop on open
, 2005
Cited by 20 (0 self)
A majority of the existing algorithms which mine graph datasets target complete, frequent subgraph discovery. We describe the graph-based data mining system Subdue, which focuses on the discovery of subgraphs which are not only frequent but also compress the graph dataset, using a heuristic algorithm. The rationale behind the use of a compression-based methodology for frequent pattern discovery is to produce a smaller number of highly interesting patterns rather than to generate a large number of patterns from which interesting patterns need to be identified. We perform an experimental comparison of Subdue with the graph mining systems gSpan and FSG on the Chemical Toxicity and the Chemical Compounds datasets that are provided with gSpan. We present results on the performance of the Subdue system on the Mutagenesis and the KDD 2003 Citation Graph datasets. An analysis of the results indicates that Subdue can efficiently discover best-compressing frequent patterns which are fewer in number but can be of higher interest.
Frequency Concepts and Pattern Detection for the Analysis of Motifs in Networks
 Transactions on Computational Systems Biology
, 2005
Cited by 19 (5 self)
Abstract. Network motifs, patterns of local interconnections with potential functional properties, are important for the analysis of biological networks. To analyse motifs in networks, the first step is to find patterns of interest. This paper presents 1) three different concepts for the determination of pattern frequency and 2) an algorithm to compute these frequencies. The different concepts of pattern frequency depend on the reuse of network elements. The presented algorithm finds all or highly frequent patterns under consideration of these concepts. The utility of this method is demonstrated by applying it to biological data.
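To make the dependence on element reuse concrete, the following is a minimal sketch, not the paper's algorithm. It assumes three counting rules as one plausible reading of the abstract: matches may share anything, matches may share nodes but not edges, and matches may share neither. The pattern (a triangle), the helper names, and the toy graph are all hypothetical, and the greedy selection only gives a lower bound on the largest disjoint set of matches.

```python
from itertools import combinations

def triangles(edges):
    """List every triangle (3 mutually adjacent nodes) in an undirected graph."""
    edge_set = {frozenset(e) for e in edges}
    nodes = sorted({v for e in edges for v in e})
    return [t for t in combinations(nodes, 3)
            if all(frozenset(p) in edge_set for p in combinations(t, 2))]

def greedy_disjoint(matches, elements_of):
    """Greedily keep matches whose elements were not used by an earlier match.

    This is a heuristic: it yields a lower bound on the maximum number of
    pairwise disjoint matches, not necessarily the maximum itself.
    """
    used, kept = set(), []
    for m in matches:
        elems = elements_of(m)
        if used.isdisjoint(elems):
            used |= elems
            kept.append(m)
    return kept

def match_nodes(m):
    return set(m)

def match_edges(m):
    return {frozenset(p) for p in combinations(m, 2)}

# Toy graph (hypothetical): three triangles that overlap in different ways.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4), (1, 3)]
all_matches = triangles(edges)
counts = {
    "any_reuse": len(all_matches),
    "edge_disjoint": len(greedy_disjoint(all_matches, match_edges)),
    "node_disjoint": len(greedy_disjoint(all_matches, match_nodes)),
}
print(counts)  # {'any_reuse': 3, 'edge_disjoint': 2, 'node_disjoint': 1}
```

The same pattern therefore has frequency 3, 2, or 1 in this graph depending on which elements matches are allowed to share.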
Subgraph Frequencies: Mapping the Empirical and Extremal Geography of Large Graph Collections
Cited by 18 (1 self)
A growing set of online applications are generating data that can be viewed as very large collections of small, dense social graphs; these range from sets of social groups, events, or collaboration projects to the vast collection of graph neighborhoods in large social networks. A natural question is how to usefully define a domain-independent ‘coordinate system’ for such a collection of graphs, so that the set of possible structures can be compactly represented and understood within a common space. In this work, we draw on the theory of graph homomorphisms to formulate and analyze such a representation, based on computing the frequencies of small induced subgraphs within each graph. We find that the space of subgraph frequencies is governed both by its combinatorial properties (based on extremal results that constrain all graphs) and by its empirical properties (manifested in the way that real social graphs appear to lie near a simple one-dimensional curve through this space). We develop flexible frameworks for studying each of these aspects. For capturing empirical properties, we characterize a simple stochastic generative model, a single-parameter extension of Erdős–Rényi random graphs, whose stationary distribution over subgraphs closely tracks the one-dimensional concentration of the real social graph families. For the extremal properties, we develop a tractable linear program for bounding the feasible space of subgraph frequencies by harnessing a toolkit of known extremal graph theory. Together, these two complementary frameworks shed light on a fundamental question pertaining to social graphs: what properties of social graphs are ‘social’ properties and what properties are ‘graph’ properties? We conclude with a brief demonstration of how the coordinate system we examine can also be used to perform classification tasks, distinguishing between structures arising from different types of social graphs.
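As an illustration of the representation described in this abstract, the sketch below computes the frequencies of induced 3-node subgraphs for one small undirected graph. It is a brute-force example under stated assumptions, not the authors' code: the function name `triad_frequencies` and the toy graph are hypothetical. It relies on the fact that for undirected simple graphs on three nodes, the number of edges (0 to 3) determines the isomorphism class.

```python
from itertools import combinations

def triad_frequencies(nodes, edges):
    """Fraction of 3-node subsets that induce 0, 1, 2, or 3 edges.

    Index k of the result is the frequency of the induced 3-node
    subgraph with exactly k edges.
    """
    edge_set = {frozenset(e) for e in edges}
    counts = [0, 0, 0, 0]
    for triple in combinations(nodes, 3):
        k = sum(frozenset(pair) in edge_set
                for pair in combinations(triple, 2))
        counts[k] += 1
    total = sum(counts)
    if total == 0:
        return [0.0, 0.0, 0.0, 0.0]
    return [c / total for c in counts]

# Toy graph (hypothetical): a 4-cycle with one chord.
nodes = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
freqs = triad_frequencies(nodes, edges)
print(freqs)  # [0.0, 0.0, 0.5, 0.5]
```

Each graph in a collection maps to one such frequency vector, which is the kind of coordinate the abstract refers to; larger subgraph sizes need a proper isomorphism check rather than an edge count.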
Efficient frequent query discovery in FARMER
 In Proc. of the 7th PKDD, volume 2838 of LNCS
, 2003
Cited by 17 (0 self)
Abstract. The upgrade of frequent item set mining to a setup with multiple relations (frequent query mining) poses many efficiency problems. Taking Object Identity as a starting point, we present several optimization techniques for frequent query mining algorithms. The resulting algorithm performs better than a previous ILP algorithm and competes with more specialized graph mining algorithms in performance.
Randomization Techniques for Graphs
Cited by 17 (1 self)
Mining graph data is an active research area. Several data mining methods and algorithms have been proposed to identify structures from graphs; still, the evaluation of those results is lacking. Within the framework of statistical hypothesis testing, we focus in this paper on randomization techniques for unweighted undirected graphs. Randomization is an important approach to assess the statistical significance of data mining results. Given an input graph, our randomization method will sample data from the class of graphs that share certain structural properties with the input graph. Here we describe three alternative algorithms based on local edge swapping and Metropolis sampling. We test our framework with various graph data sets and mining algorithms for two applications, namely graph clustering and frequent subgraph mining.
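The local edge swapping mentioned in this abstract can be sketched as follows. This is a generic degree-preserving swap on an unweighted undirected simple graph, offered as an assumption about the basic building block; it does not reproduce the paper's three algorithms or its Metropolis sampling step. The function name, parameters, and toy graph are hypothetical.

```python
import random

def edge_swap_randomize(edges, n_swaps, seed=0):
    """Attempt n_swaps degree-preserving swaps on an undirected simple graph.

    A swap picks edges (a, b) and (c, d) and proposes (a, d) and (c, b).
    Proposals that would create a self-loop or a duplicate edge are skipped,
    so every node keeps its original degree and the graph stays simple.
    """
    rng = random.Random(seed)
    edge_list = [tuple(sorted(e)) for e in edges]
    edge_set = set(edge_list)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c  # consider both orientations of the second edge
        new1 = tuple(sorted((a, d)))
        new2 = tuple(sorted((c, b)))
        if a == d or c == b or new1 in edge_set or new2 in edge_set:
            continue  # reject: self-loop or already existing edge
        edge_set -= {edge_list[i], edge_list[j]}
        edge_set |= {new1, new2}
        edge_list[i], edge_list[j] = new1, new2
    return edge_list

# Toy graph (hypothetical): a 6-node ring with two chords.
original = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 5), (0, 3), (1, 4)]
randomized = edge_swap_randomize(original, n_swaps=100, seed=42)
```

Running a mining algorithm on many such randomized graphs and comparing the results with those on the input graph gives an empirical reference distribution for significance testing.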
Towards Motif Detection in Networks: Frequency Concepts and Flexible Search
 in Proceedings of the International Workshop on Network Tools and Applications in Biology (NETTAB04
, 2004
Cited by 14 (0 self)
Network motifs, patterns of local interconnections with potential functional properties, are important for the analysis of biological networks. To analyse motifs in networks, the first step is finding patterns of interest. This paper presents 1) three different concepts for the determination of pattern frequency and 2) a flexible algorithm to compute these frequencies. The different concepts of pattern frequency depend on the reuse of network elements. The presented algorithm finds patterns with highest frequency and can be used to determine pattern frequency in directed graphs under consideration of these concepts. The utility of this method is demonstrated by applying it to real-world data.
Comparison of graph-based and logic-based multi-relational data mining
 ACM SIGKDD Explorations Newsletter
Cited by 13 (2 self)
The goal of this paper is to generate insights about the differences between graph-based and logic-based approaches to multi-relational data mining by performing a case study of the graph-based system Subdue and the inductive logic programming system CProgol. We identify three key factors for comparing graph-based and logic-based multi-relational data mining: the ability to discover structurally large concepts, the ability to discover semantically complicated concepts, and the ability to effectively utilize background knowledge. We perform an experimental comparison of Subdue and CProgol on the Mutagenesis domain and various artificially generated Bongard problems. Experimental results indicate that Subdue can significantly outperform CProgol while discovering structurally large multi-relational concepts. It is also observed that CProgol is better at learning semantically complicated concepts and that it tends to use background knowledge more effectively than Subdue.
Insider threat detection using graph-based approaches
 PROC. CYBERSECURITY APPLICATIONS AND TECHNOLOGY CONFERENCE FOR HOMELAND SECURITY, WASHINGTON DC
, 2009