Substructure Discovery Using Minimum Description Length and Background Knowledge
Journal of Artificial Intelligence Research, 1994
Cited by 127 (34 self)
The ability to identify interesting and repetitive substructures is an essential component of discovering knowledge in structural data. We describe a new version of our Subdue substructure discovery system based on the minimum description length principle. The Subdue system discovers substructures that compress the original data and represent structural concepts in the data. By replacing previously discovered substructures in the data, multiple passes of Subdue produce a hierarchical description of the structural regularities in the data. Subdue uses a computationally bounded inexact graph match that identifies similar, but not identical, instances of a substructure and finds an approximate measure of closeness between two substructures under computational constraints. In addition to the minimum description length principle, other background knowledge can be used by Subdue to guide the search towards more appropriate substructures. Experiments in a variety of domains demonstrate Subdu...
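Subdue's MDL criterion rewards a substructure S that minimizes DL(S) + DL(G|S): the bits needed to describe S plus the bits needed to describe the graph G after each instance of S is replaced by a single vertex. This can be expressed as maximizing the ratio DL(G) / (DL(S) + DL(G|S)). The sketch below computes that ratio with a deliberately crude, hypothetical bit count (one label per vertex; two endpoints plus a label per edge), assuming disjoint instances. It is not Subdue's actual graph encoding, and the function names are illustrative only.

```python
import math


def description_length(num_vertices: int, num_edges: int, num_labels: int) -> float:
    """Hypothetical, simplified bit count for a labeled graph (not Subdue's encoding).

    Each vertex stores one label; each edge stores its two endpoints plus a label.
    """
    if num_vertices == 0:
        return 0.0
    label_bits = math.log2(max(num_labels, 2))
    vertex_bits = math.log2(max(num_vertices, 2))
    return num_vertices * label_bits + num_edges * (2 * vertex_bits + label_bits)


def compression_value(g_vertices: int, g_edges: int,
                      s_vertices: int, s_edges: int,
                      instances: int, num_labels: int) -> float:
    """MDL-style ratio DL(G) / (DL(S) + DL(G|S)); larger means better compression.

    Assumes the instances of S are disjoint. Each instance collapses to one new
    vertex with a fresh label; edges from an instance to the rest of the graph
    are kept (redirected to the new vertex), only the internal edges disappear.
    """
    dl_graph = description_length(g_vertices, g_edges, num_labels)
    dl_sub = description_length(s_vertices, s_edges, num_labels)
    compressed_vertices = g_vertices - instances * (s_vertices - 1)
    compressed_edges = g_edges - instances * s_edges
    dl_compressed = description_length(compressed_vertices, compressed_edges,
                                       num_labels + 1)  # +1 for the label of S
    return dl_graph / (dl_sub + dl_compressed)
```

For example, a triangle substructure with ten disjoint instances in a 30-vertex, 40-edge graph scores far higher than the same triangle with a single instance, which is the behavior the MDL principle is meant to capture.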
Algorithmics and Applications of Tree and Graph Searching
In Symposium on Principles of Database Systems, 2002
Cited by 89 (8 self)
Modern search engines answer keyword-based queries extremely efficiently. The impressive speed is due to clever inverted index structures, caching, a domain-independent knowledge of strings, and thousands of machines. Several research efforts have attempted to generalize keyword search to keytree and keygraph searching, because trees and graphs have many applications in next-generation database systems. This paper surveys both algorithms and applications, giving some emphasis to our own work.
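As a point of reference for what keytree and keygraph search generalize, a minimal keyword baseline might look like the sketch below: an inverted index maps each term to the set of documents containing it, and a conjunctive query intersects the posting sets, smallest first. This is a generic textbook illustration assuming whitespace tokenization; it omits the caching and distribution across machines that the abstract mentions, and the function names are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each lower-cased whitespace token to the ids of documents containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def conjunctive_query(index: dict[str, set[int]], query: str) -> set[int]:
    """Return ids of documents containing every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Intersect the shortest posting sets first to keep intermediate results small.
    postings = sorted((index.get(term, set()) for term in terms), key=len)
    result = set(postings[0])
    for posting in postings[1:]:
        result &= posting
    return result
```

For instance, with documents `{1: "graph search engine", 2: "tree search", 3: "graph database"}`, the query `"graph search"` matches only document 1.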
Fingerprint classification by directional image partitioning
IEEE Transactions on Pattern Analysis and Machine Intelligence, 1999
Cited by 46 (0 self)
In this work, we introduce a new approach to automatic fingerprint classification. The directional image is partitioned into “homogeneous” connected regions according to the fingerprint topology, giving a synthetic representation that can be exploited as a basis for classification. A set of dynamic masks, together with an optimization criterion, is used to guide the partitioning. The adaptation of the masks produces a numerical vector that represents each fingerprint as a multidimensional point, which can be conceived of as a continuous classification. Different search strategies are discussed for efficiently retrieving fingerprints with both continuous and exclusive classification. Experimental results are given for the most commonly used fingerprint databases, and the new method is compared with other approaches known in the literature: for fingerprint retrieval based on continuous classification, our method gives the best performance and exhibits very high robustness.
Index Terms: fingerprint classification, directional image, partitioning algorithms, continuous classification, biometric systems.
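The abstract represents each fingerprint as a numerical vector (a multidimensional point) and discusses search strategies for retrieval under continuous classification. The paper's specific strategies are not given here, so the following is only a generic sketch, assuming plain Euclidean distance and a hypothetical `fraction` parameter (the share of the database to return): rank the stored vectors by distance to the query vector and return the nearest ones.

```python
import math


def retrieve_continuous(database: dict[str, tuple[float, ...]],
                        query: tuple[float, ...],
                        fraction: float) -> list[str]:
    """Return ids of the closest `fraction` of the database (at least one id).

    Hypothetical retrieval rule: fingerprints are ranked by Euclidean distance
    between their feature vectors and the query vector.
    """
    ranked = sorted(database, key=lambda fid: math.dist(database[fid], query))
    k = max(1, math.ceil(fraction * len(database)))
    return ranked[:k]
```

A larger `fraction` trades retrieval speed for a lower chance of missing the matching fingerprint; an exclusive classification would instead restrict the search to a single class.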
Graph-based hierarchical conceptual clustering
International Journal on Artificial Intelligence Tools, 2001
Cited by 24 (4 self)
Hierarchical conceptual clustering has been proven to be a useful data mining technique. Graph-based representation of structural information has been shown to be successful in knowledge discovery. The Subdue substructure discovery system provides the advantages of both approaches. In this paper we present Subdue and focus on its clustering capabilities. We use two examples to illustrate the validity of the approach both in structured and unstructured domains, as well as compare Subdue to an earlier clustering algorithm.
Scalable Discovery Of Informative Structural Concepts Using Domain Knowledge
IEEE Expert, 1996
Cited by 22 (18 self)
Discovering repetitive and functional substructures in large structural databases improves the ability to interpret and compress the data. However, scientists working with a database in their area of expertise often search for predetermined types of structures, or for structures exhibiting characteristics specific to the domain. This paper presents a method for guiding the discovery process with domain-specific knowledge and uses the Subdue discovery system to evaluate the benefits of this guidance. Results show that domain-specific knowledge improves the search for substructures that are useful to the domain and leads to greater compression of the data. Empirical and theoretical results also indicate that the algorithm scales to increasingly large structural databases.
Keywords: data mining, minimum description length principle, data compression, inexact graph match, domain knowledge, scalability
Supported by NASA gran...
Structure-Based Similarity Search with Graph Histograms
In Proceedings of the 10th International Workshop on Database & Expert Systems Applications, 1999
Cited by 17 (0 self)
In many modern applications, objects such as road networks, CAD/CAM components, electrical or electronic circuits, and molecules can be represented as graphs. In this paper, we propose an efficient and effective graph manipulation technique that can be used in graph-based similarity search. Given a query graph $G_q(V, E)$, we would like to quickly determine which graphs in the database are similar to $G_q(V, E)$ with respect to a similarity measure. First, we study the similarity measure between two graphs. Then, we discuss techniques for representing graphs as multidimensional vectors. It is shown that the vector representation introduces no false dismissals. Finally, we illustrate some representative queries that can be handled by our approach and present experimental results based on the proposed graph similarity algorithm. The results show considerable savings in computational effort and I/O operations compared with conventional searching techniques.
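"No false dismissals" holds whenever the distance between vectors never exceeds the true graph distance, so filtering on vectors cannot prune a truly similar graph. The sketch below illustrates this filter step with a deliberately simpler vector than the paper's graph histograms: the pair (vertex count, edge count). It assumes a unit-cost edit distance in which every operation inserts or deletes one vertex or one edge, or relabels one element (a vertex is deleted only after its incident edges). Each operation then changes the L1 distance between these pairs by at most 1, so that distance is a lower bound on the edit distance. `Graph`, `make_graph`, and `candidate_filter` are hypothetical names, and the surviving candidates would still need an exact comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Graph:
    vertices: frozenset  # vertex ids
    edges: frozenset     # each edge is a frozenset of two vertex ids (undirected)


def make_graph(num_vertices: int, edge_list: list[tuple[int, int]]) -> Graph:
    """Build an undirected graph on vertices 0..num_vertices-1."""
    return Graph(frozenset(range(num_vertices)),
                 frozenset(frozenset(e) for e in edge_list))


def feature_vector(g: Graph) -> tuple[int, int]:
    """Simplified stand-in for a graph histogram: (vertex count, edge count)."""
    return (len(g.vertices), len(g.edges))


def lower_bound(a: Graph, b: Graph) -> int:
    """L1 distance of feature vectors; never exceeds the unit-cost edit distance."""
    fa, fb = feature_vector(a), feature_vector(b)
    return abs(fa[0] - fb[0]) + abs(fa[1] - fb[1])


def candidate_filter(database: dict[str, Graph], query: Graph, threshold: int) -> list[str]:
    """Keep every graph whose lower bound is within the threshold (no false dismissals)."""
    return [name for name, g in database.items() if lower_bound(g, query) <= threshold]
```

With a triangle as the query and a threshold of 1, a three-vertex path survives (one edge deletion away), while the complete bipartite graph K(3,3) is pruned without computing any edit distance.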
Applying the Subdue Substructure Discovery System to the Chemical Toxicity Domain
In Proc. of the 12th International Florida AI Research Society Conference, 1999
Cited by 11 (2 self)
The ever-increasing number of chemical compounds added every year has not been accompanied by a similar growth in our ability to analyze and classify these compounds. Preventing the cancers caused by many of these chemicals is a problem of great scientific and humanitarian value. The use of AI discovery tools for predicting chemical toxicity is being investigated. The basic idea behind the work is to obtain structure-activity relationships (SARs) [Srinivasan et al.], which relate molecular structures to cancerous activity. The data are obtained from the U.S. National Toxicology Program conducted by the National Institute of Environmental Health Sciences (NIEHS). This research outlines a general approach for automatically discovering repetitive substructures in the datasets. Relevant SARs are identified using the Subdue substructure discovery system, which discovers commonly occurring substructures in a given set of compounds. The best substructure given by Subdue is used as a...
An Empirical Study of Domain Knowledge and Its Benefits to Substructure Discovery
IEEE Transactions on Knowledge and Data Engineering, 1997
Cited by 10 (5 self)
Discovering repetitive, interesting, and functional substructures in a structural database improves the ability to interpret and compress the data. However, scientists working with a database in their area of expertise often search for predetermined types of structures or for structures exhibiting characteristics specific to the domain. This paper presents a method for guiding the discovery process with domain-specific knowledge and uses the SUBDUE discovery system to evaluate the benefits of this guidance. Domain knowledge is incorporated into SUBDUE following a single general methodology. Results show that domain-specific knowledge improves the search for substructures that are useful to the domain and leads to greater compression of the data. To illustrate these benefits, examples and experiments from the computer programming, computer-aided design circuit, and artificially generated domains...
Fast Error-correcting Graph Isomorphism Based on Model Precompilation
Technical Report IAM-96-012, Institut für Informatik, Universität, 1996
Cited by 9 (0 self)
In this paper we present a fast algorithm for computing error-correcting graph isomorphisms. The new algorithm extends a method, previously developed by the authors, for detecting exact subgraph isomorphisms from an input graph to a set of a priori known model graphs. Like the original algorithm, the new method is based on the idea of creating a decision tree from the model graphs. This decision tree is compiled off-line in a preprocessing step. At run time, it is used to find all error-correcting graph isomorphisms from an input graph to any of the model graphs up to a certain degree of distortion. The main advantage of the new algorithm is that error-correcting graph isomorphism detection is guaranteed to require time that is only polynomial in the size of the input graph. Furthermore, the time complexity is completely independent of the number of model graphs and the number of edges in each model graph. However, the size of th...
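The precompilation idea can be illustrated in its simplest, exact form: do the expensive work on the model graphs off-line so that run-time matching does not depend on how many models there are. The sketch below handles exact isomorphism of small unlabeled, undirected graphs and uses a Python dict as a stand-in for the decision tree (a simplification, not the authors' data structure). Every row/column permutation of each model's adjacency matrix is stored at compile time, and a single lookup of the input's adjacency matrix returns every matching (model, permutation) pair. The error-correcting extension is not shown, and names such as `compile_models` are hypothetical.

```python
from itertools import permutations

Matrix = tuple[tuple[int, ...], ...]


def adjacency(n: int, edges: list[tuple[int, int]]) -> Matrix:
    """Symmetric 0/1 adjacency matrix of an undirected graph on vertices 0..n-1."""
    m = [[0] * n for _ in range(n)]
    for u, v in edges:
        m[u][v] = m[v][u] = 1
    return tuple(tuple(row) for row in m)


def permuted(matrix: Matrix, perm: tuple[int, ...]) -> Matrix:
    """Matrix whose entry (i, j) is matrix[perm[i]][perm[j]]."""
    n = len(matrix)
    return tuple(tuple(matrix[perm[i]][perm[j]] for j in range(n)) for i in range(n))


def compile_models(models: dict[str, Matrix]) -> dict[Matrix, list[tuple[str, tuple[int, ...]]]]:
    """Off-line step: index every permuted adjacency matrix of every model graph."""
    table: dict[Matrix, list[tuple[str, tuple[int, ...]]]] = {}
    for name, matrix in models.items():
        for perm in permutations(range(len(matrix))):
            table.setdefault(permuted(matrix, perm), []).append((name, perm))
    return table


def match(table: dict[Matrix, list[tuple[str, tuple[int, ...]]]],
          matrix: Matrix) -> list[tuple[str, tuple[int, ...]]]:
    """Run-time step: one lookup; input vertex i maps to model vertex perm[i]."""
    return table.get(matrix, [])
```

Each model with n vertices contributes up to n! permuted matrices, so the precompiled table grows factorially with model size, while the run-time lookup only has to hash the n×n input matrix regardless of the number of models.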

