Results 1 - 10 of 327
A quickstart in frequent structure mining can make a difference
In Proc. of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2004), 2004
Cited by 156 (5 self)
Given a database, structure mining algorithms search for substructures that satisfy constraints such as minimum frequency, minimum confidence, minimum interest and maximum frequency. Examples of substructures include graphs, trees and paths. For these substructures many mining algorithms have been proposed. In order to make graph mining more efficient, we investigate the use of the “quickstart principle”, which is based on the fact that these classes of structures are contained in each other, thus allowing for the development of structure mining algorithms that split the search into steps of increasing complexity. We introduce the GrAph/Sequence/Tree extractiON (Gaston) algorithm that implements this idea by searching first for frequent paths, then frequent free trees and finally cyclic graphs. We investigate two alternatives for computing the frequency of structures and present experimental results to relate these alternatives.
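The containment of structure classes that the abstract relies on (every path is a free tree, every free tree is a graph) can be illustrated with a small classifier. This is only a sketch of the class hierarchy, not the Gaston algorithm itself; the function name `classify_structure` and the edge-list input format are assumptions made for illustration, and the input is assumed to be a connected, simple, undirected graph with at least one edge.

```python
from collections import defaultdict


def classify_structure(edges):
    """Classify a connected, simple, undirected graph given as (u, v) pairs.

    Returns 'path', 'tree' or 'cyclic', ordered by increasing complexity.
    Hypothetical helper: assumes connectivity and no self-loops.
    """
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    num_vertices = len(adjacency)
    # Deduplicate edges so (u, v) and (v, u) count once.
    num_edges = len({frozenset(edge) for edge in edges})

    # A connected graph on n vertices is a tree exactly when it has n - 1 edges.
    if num_edges != num_vertices - 1:
        return "cyclic"
    # A tree in which no vertex has more than two neighbours is a path.
    if all(len(neighbours) <= 2 for neighbours in adjacency.values()):
        return "path"
    return "tree"
```

A miner following the staged idea would only need the cheaper path-specific machinery while `classify_structure` still reports `'path'`, and would switch to tree and then graph handling as refinements add branches or close cycles.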
Polymorphic Worm Detection Using Structural Information of Executables
In RAID, 2005
Cited by 148 (13 self)
Network worms are malicious programs that spread automatically across networks by exploiting vulnerabilities that affect a large number of hosts. Because of the speed at which worms spread to large computer populations, countermeasures based on human reaction time are not feasible. Therefore, recent research has focused on devising new techniques to detect and contain network worms without the need for human supervision. In particular, a number of approaches have been proposed to automatically derive signatures to detect network worms by analyzing a number of worm-related network streams. Most of these techniques, however, assume that the worm code does not change during the infection process. Unfortunately, worms can be polymorphic. That is, they can mutate as they spread across the network. To detect these types of worms, it is necessary to devise new techniques that are able to identify similarities between different mutations of a worm. This paper presents a novel technique based on the structural analysis of binary code that allows one to identify structural similarities between different worm mutations. The approach is based on the analysis of a worm’s control flow graph and introduces an original graph coloring technique that supports a more precise characterization of the worm’s structure. The technique has been used as a basis to implement a worm detection system that is resilient to many of the mechanisms used to evade approaches based on instruction sequences only.
Finding frequent patterns in a large sparse graph
SIAM Data Mining Conference, 2004
Cited by 131 (5 self)
This paper presents two algorithms based on the horizontal and vertical pattern discovery paradigms that find the connected subgraphs that have a sufficient number of edge-disjoint embeddings in a single large undirected labeled sparse graph. These algorithms use three different methods to determine the number of edge-disjoint embeddings of a subgraph, based on approximate and exact maximum independent set computations, and use that number to prune infrequent subgraphs. Experimental evaluation on real datasets from various domains shows that both algorithms achieve good performance, scale well to sparse input graphs with more than 100,000 vertices, and significantly outperform a previously developed algorithm.
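The link between edge-disjoint embeddings and independent sets can be sketched as follows: treat each embedding as a vertex of an "overlap graph", connect two embeddings when they share an edge, and take an independent set in that graph. The code below is a minimal illustration under assumptions, not the paper's implementation: embeddings are assumed to be given as collections of `(u, v)` edges, the approximate method is a plain minimum-degree greedy heuristic, and the exact method is brute force suitable only for tiny inputs. All function names are hypothetical.

```python
from itertools import combinations


def overlap_graph(embeddings):
    """Map each embedding index to the indices of embeddings sharing an edge."""
    normalized = [frozenset(frozenset(e) for e in emb) for emb in embeddings]
    conflicts = {i: set() for i in range(len(normalized))}
    for i, j in combinations(range(len(normalized)), 2):
        if normalized[i] & normalized[j]:
            conflicts[i].add(j)
            conflicts[j].add(i)
    return conflicts


def greedy_independent_set(conflicts):
    """Approximate: repeatedly pick the embedding with the fewest conflicts."""
    remaining = {v: set(nbrs) for v, nbrs in conflicts.items()}
    chosen = []
    while remaining:
        v = min(remaining, key=lambda x: (len(remaining[x]), x))
        chosen.append(v)
        removed = remaining[v] | {v}
        for u in removed:
            remaining.pop(u, None)
        for nbrs in remaining.values():
            nbrs -= removed
    return chosen


def exact_independent_set_size(conflicts):
    """Exact maximum independent set size by brute force (tiny inputs only)."""
    vertices = list(conflicts)
    for size in range(len(vertices), 0, -1):
        for subset in combinations(vertices, size):
            if all(b not in conflicts[a] for a, b in combinations(subset, 2)):
                return size
    return 0


# Three embeddings of a 2-edge path inside the path graph a-b-c-d-e.
embeddings = [
    [("a", "b"), ("b", "c")],
    [("b", "c"), ("c", "d")],
    [("c", "d"), ("d", "e")],
]
conflicts = overlap_graph(embeddings)
support_lower_bound = len(greedy_independent_set(conflicts))
support_exact = exact_independent_set_size(conflicts)
```

The greedy result is always a valid set of edge-disjoint embeddings, so it gives a lower bound on the exact count; a pattern whose exact count falls below the frequency threshold can be pruned.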
An efficient algorithm for discovering frequent subgraphs
IEEE Transactions on Knowledge and Data Engineering, 2002
Cited by 120 (9 self)
Over the years, frequent itemset discovery algorithms have been used to find interesting patterns in various application areas. However, as data mining techniques are increasingly applied to nontraditional domains, existing frequent pattern discovery approaches cannot be used. This is because the transaction framework that is assumed by these algorithms cannot be used to effectively model the datasets in these domains. An alternate way of modeling the objects in these datasets is to represent them using graphs. Within that model, one way of formulating the frequent pattern discovery problem is as that of discovering subgraphs that occur frequently over the entire set of graphs. In this paper we present a computationally efficient algorithm, called FSG, for finding all frequent subgraphs in large graph datasets. We experimentally evaluate the performance of FSG using a variety of real and synthetic datasets. Our results show that despite the underlying complexity associated with frequent subgraph discovery, FSG is effective in finding all frequently occurring subgraphs in datasets containing over 200,000 graph transactions and scales linearly with respect to the size of the dataset. Index Terms: data mining, scientific datasets, frequent pattern discovery, chemical compound datasets.
An improved algorithm for matching large graphs
In: 3rd IAPR-TC15 Workshop on Graph-based Representations in Pattern Recognition, Cuen, 2001
Cited by 96 (4 self)
In this paper an improved version of a graph matching algorithm is presented, which is able to efficiently solve the graph isomorphism and graph-subgraph isomorphism problems on Attributed Relational Graphs. This version is particularly suited to work with very large graphs, since its memory requirements are considerably smaller than those of other algorithms of the same kind. After a detailed description of the algorithm, an experimental comparison is made against both the previous version (developed by the same authors) and Ullmann’s algorithm.
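For readers unfamiliar with the graph-subgraph isomorphism problem, the sketch below shows a plain backtracking search for an induced subgraph match. It is a baseline illustration only and is not the improved algorithm the abstract describes; it ignores node and edge attributes, and the function name and the adjacency-dictionary input format are assumptions.

```python
def find_subgraph_isomorphism(pattern, target):
    """Search for an induced copy of `pattern` inside `target`.

    Both graphs are undirected and given as dict: vertex -> set of neighbours.
    Returns a dict mapping pattern vertices to target vertices, or None.
    Hypothetical baseline: exponential in the worst case.
    """
    # Match high-degree pattern vertices first to fail early.
    order = sorted(pattern, key=lambda v: -len(pattern[v]))

    def extend(mapping, used):
        if len(mapping) == len(order):
            return dict(mapping)
        p = order[len(mapping)]
        for t in target:
            # Skip used vertices and vertices with too few neighbours.
            if t in used or len(target[t]) < len(pattern[p]):
                continue
            # Adjacency to every already-mapped vertex must agree exactly.
            consistent = all(
                (q in pattern[p]) == (mapping[q] in target[t]) for q in mapping
            )
            if consistent:
                mapping[p] = t
                used.add(t)
                result = extend(mapping, used)
                if result is not None:
                    return result
                del mapping[p]
                used.discard(t)
        return None

    return extend({}, set())
```

The search keeps only the current partial mapping and the set of used target vertices, so its memory use grows linearly with the pattern size; the running time, however, has no such guarantee.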
The Graph Isomorphism Problem
1996
Cited by 87 (0 self)
The graph isomorphism problem can be easily stated: check to see if two graphs that look different are actually the same. The problem occupies a rare position in the world of complexity theory: it is clearly in NP, but it is not known to be in P and it is not known to be NP-complete. Many subdisciplines of mathematics, such as topology theory and group theory, can be brought to bear on the problem, and yet only for special classes of graphs have polynomial-time algorithms been discovered. Incongruently, this problem seems very easy in practice. It is almost always trivial to check two random graphs for isomorphism, and fast hardware implementations exist for application domains such as image processing. This paper is mostly a survey of related work in the graph isomorphism field. We examine the problem from many angles, mirroring the multifaceted nature of the literature. We survey complexity results for the graph isomorphism problem, and discuss some of the classes of graphs which hav...
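One reason the problem is often easy in practice is that cheap invariants separate most pairs of non-isomorphic graphs before any search is needed. The sketch below, an illustration under assumptions rather than anything taken from the survey, compares sorted degree sequences first and falls back to brute-force enumeration of vertex bijections only when the invariant cannot decide. The function names and adjacency-dictionary format are hypothetical, and the fallback is practical only for very small graphs.

```python
from itertools import permutations


def degree_sequence(adj):
    """Sorted list of vertex degrees; equal for any two isomorphic graphs."""
    return sorted(len(neighbours) for neighbours in adj.values())


def edge_set(adj):
    """Undirected edges as a set of 2-element frozensets (no self-loops)."""
    return {frozenset((u, v)) for u, nbrs in adj.items() for v in nbrs}


def are_isomorphic(adj_a, adj_b):
    """Decide isomorphism of two small simple undirected graphs."""
    # Cheap necessary condition: rejects most non-isomorphic pairs at once.
    if degree_sequence(adj_a) != degree_sequence(adj_b):
        return False
    vertices_a = list(adj_a)
    edges_b = edge_set(adj_b)
    # Exhaustive fallback: try every bijection between the vertex sets.
    for perm in permutations(adj_b):
        mapping = dict(zip(vertices_a, perm))
        mapped = {
            frozenset((mapping[u], mapping[v]))
            for u, nbrs in adj_a.items()
            for v in nbrs
        }
        if mapped == edges_b:
            return True
    return False
```

Matching degree sequences do not imply isomorphism: a 6-cycle and two disjoint triangles share the sequence `[2, 2, 2, 2, 2, 2]`, which is why the exhaustive step is still required for a definite answer.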
Application-specific instruction generation for configurable processor architectures
In Proc. ACM International Symposium on Field-Programmable Gate Arrays, 2004
Cited by 68 (7 self)
Designing an application-specific embedded system in nanometer technologies has become more difficult than ever due to the rapid increase in design complexity and manufacturing cost. Efficiency and flexibility must be carefully balanced to meet different application requirements. The recently emerged configurable and extensible processor architectures offer a favorable tradeoff between efficiency and flexibility, and a promising way to minimize certain important metrics (e.g., execution time and code size) of the embedded processors. This paper addresses the problem of generating the application-specific instructions to improve the execution speed for configurable processors. A set of algorithms, including pattern generation, pattern selection, and application mapping, is proposed to efficiently utilize the instruction set extensibility of the target configurable processor. Applications of our approach to several real-life benchmarks on the Altera Nios processor show encouraging performance speedup (2.75X on average and up to 3.73X in some cases).
Symmetry Breaking for Boolean Satisfiability: . . .
Cited by 55 (12 self)
Boolean Satisfiability solvers improved dramatically over the last seven years [14, 13] and are commonly used in applications such as bounded model checking, planning, and FPGA routing. However, a number of practical SAT instances remain difficult to solve. Recent work pointed out that symmetries in the search space are often to blame [1]. The framework of symmetry-breaking predicates (SBPs) [5], together with further improvements [1], was then used to achieve empirical speedups. For symmetry-breaking to be successful in practice, its overhead must be less than the complexity reduction it brings. In this work we show how logic minimization helps to improve this tradeoff and achieve much better empirical results. We also contribute detailed new studies of SBPs and their efficiency as well as new general constructions of SBPs.
Network motif discovery using subgraph enumeration and symmetry breaking
In Proceedings of the 11th International Conference on Research in Computational Molecular Biology (RECOMB), 2007
Cited by 55 (1 self)
The study of biological networks and network motifs can yield significant new insights into systems biology. Previous methods of discovering network motifs (network-centric subgraph enumeration and sampling) have been limited to motifs of 6 to 8 nodes, revealing only the smallest network components. New methods are necessary to identify larger network substructures and functional motifs. Here we present a novel algorithm for discovering large network motifs that achieves these goals, based on a novel symmetry-breaking technique, which eliminates repeated isomorphism testing, leading to an exponential speedup over previous methods. This technique is made possible by reversing the traditional network-based search at the heart of the algorithm to a motif-based search, which also eliminates the need to store all motifs of a given size and enables parallelization and scaling. Additionally, our method enables us to study the clustering properties of discovered motifs, revealing even larger network elements. We apply this algorithm to the protein-protein interaction network and transcription regulatory network of S. cerevisiae, and discover several large network motifs, which were previously inaccessible to existing methods, including a 29-node cluster of 15-node motifs corresponding to the key transcription machinery of S. cerevisiae.
A performance comparison of five algorithms for graph isomorphism
In Proceedings of the 3rd IAPR TC15 Workshop on Graph-based Representations in Pattern Recognition, 2001
Cited by 54 (2 self)
Despite the significant number of isomorphism algorithms presented in the literature, no effort has so far been made to characterize their performance. Consequently, it is not clear how the behavior of those algorithms varies with the type and the size of the graphs to be matched in real applications. In this paper we present a benchmarking activity for characterizing the performance of a set of algorithms for exact graph isomorphism. For this purpose we use a large database containing 10,000 pairs of isomorphic graphs with different topologies (regular graphs, randomly connected graphs, bounded valence graphs), enriched with suitably modified versions of them for simulating distortions occurring in real cases. The size of the considered graphs ranges from a few nodes to about 1000 nodes.