Output Space Sampling for Graph Patterns
2009
Cited by 22 (5 self)
Recent interest in graph pattern mining has shifted from finding all frequent subgraphs to obtaining a small subset of frequent subgraphs that are representative, discriminative or significant. The main motivation is to cope with the scalability problem that graph mining algorithms suffer from when mining databases of large graphs. Another motivation is to obtain a succinct output set that is informative and useful. In the same spirit, researchers have also proposed sampling-based algorithms that sample the output space of the frequent patterns to obtain representative subgraphs. In this work, we propose a generic sampling framework based on the Metropolis-Hastings algorithm to sample the output space of frequent subgraphs. Our experiments on various sampling strategies show the versatility, utility and efficiency of the proposed sampling approach.
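The idea of sampling the output space with Metropolis-Hastings can be illustrated with a minimal sketch. This is not the paper's algorithm: it swaps frequent subgraphs for frequent itemsets to stay short, targets a uniform distribution, and uses a toy database. The names TRANSACTIONS, MIN_SUPPORT, neighbors and mh_uniform_sample are hypothetical and chosen only for illustration.

```python
import random
from itertools import chain

# Toy transaction database (hypothetical data, for illustration only).
TRANSACTIONS = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c", "d"},
    {"a", "d"},
]
ITEMS = sorted(set(chain.from_iterable(TRANSACTIONS)))
MIN_SUPPORT = 2  # assumed frequency threshold


def support(pattern):
    """Number of transactions that contain every item of the pattern."""
    return sum(1 for t in TRANSACTIONS if pattern <= t)


def neighbors(pattern):
    """Frequent patterns that differ from `pattern` by exactly one item."""
    result = []
    for item in ITEMS:
        candidate = pattern - {item} if item in pattern else pattern | {item}
        if support(candidate) >= MIN_SUPPORT:
            result.append(frozenset(candidate))
    return result


def mh_uniform_sample(steps, rng):
    """Random walk over frequent patterns with a uniform target distribution.

    The proposal picks a neighbor uniformly at random, so the
    Metropolis-Hastings acceptance probability reduces to
    min(1, deg(current) / deg(proposal)).
    """
    current = frozenset()  # the empty pattern is always frequent
    for _ in range(steps):
        nbrs = neighbors(current)
        proposal = rng.choice(nbrs)
        accept = min(1.0, len(nbrs) / len(neighbors(proposal)))
        if rng.random() < accept:
            current = proposal
    return current


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        print(sorted(mh_uniform_sample(100, rng)))
```

Because the walk only ever moves between frequent patterns, it never enumerates the full output space. A non-uniform target (for example, one favoring discriminative patterns) would change only the acceptance ratio.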
Formal Concept Sampling for Counting and Threshold-Free Local Pattern Mining
Cited by 7 (1 self)
We describe a Metropolis-Hastings algorithm for sampling formal concepts, i.e., closed (item) sets, according to any desired strictly positive distribution. Important applications are (a) estimating the number of all formal concepts and (b) discovering any number of interesting, non-redundant, and representative local patterns. Setting (a) can be used for estimating the runtime of algorithms examining all formal concepts. An application of setting (b) is the construction of data mining systems that do not require any user-specified threshold such as minimum frequency or confidence.
Ring: An integrated method for frequent representative subgraph mining
In ICDM, 2009
Cited by 6 (0 self)
We propose a novel representative-based subgraph mining model. A series of standards and methods are proposed to select invariants. Patterns are mapped into invariant vectors in a multidimensional space. To find qualified patterns, only a subset of frequent patterns is generated as representatives, such that every frequent pattern is close to one of the representative patterns while representative patterns are distant from each other. We devise the RING algorithm, which integrates representative selection into the pattern mining process and uses R-trees to assist the mining. Finally, an empirical study on a large number of real and synthetic datasets shows the benefits of the representative model and the efficiency of the RING algorithm.
Sampling Minimal Frequent Boolean (DNF) Patterns
Cited by 1 (1 self)
We tackle the challenging problem of mining the simplest Boolean patterns from categorical datasets. Instead of complete enumeration, which is typically infeasible for this class of patterns, we develop effective sampling methods to extract a representative subset of the minimal Boolean patterns (in disjunctive normal form, DNF). We make both theoretical and practical contributions, which allow us to prune the search space based on provable properties. Our approach can provide a near-uniform sample of the minimal DNF patterns. We also show that the mined minimal DNF patterns are very effective when used as features for classification.
Randomly Sampling Maximal Itemsets
Pattern mining techniques generally enumerate many uninteresting and redundant patterns. To obtain less redundant collections, techniques exist that give condensed representations of these collections. However, the proposed techniques often rely on complete enumeration of the pattern space, which can be prohibitive in terms of time and memory. Sampling can be used to filter the output space of patterns without explicit enumeration. We propose a framework for random sampling of maximal itemsets from transactional databases. The presented framework can use any monotonically decreasing measure as the interestingness criterion for this purpose. Moreover, we use an approximation measure to guide the search for maximal sets to different parts of the output space. We show in our experiments that the method can rapidly generate small collections of patterns with good quality. The sampling framework has been implemented in the interactive visual data mining tool MIME, enabling users to quickly sample a collection of patterns and analyze the results.
Journeys to Data Mining: Experiences from 15 Renowned Researchers. Mohamed Medhat Gaber (Editor)
Sampling Frequent and Minimal Boolean Patterns: Theory and Application
Mining Maximal Frequent Patterns With Similarity Matrices of Data Records
In this paper, we propose a similarity-matrix-based method for mining maximal frequent patterns from large databases. The approach differs substantially from previous Apriori-like methods. In particular, the method can be applied directly to the original data in the database without any format transformation. Analysis and experimental results show that the method is useful for frequent pattern mining tasks on large data sets.