Results 1-10 of 38
ORIGAMI: Mining Representative Orthogonal Graph Patterns
Cited by 22 (1 self)
In this paper, we introduce the concept of α-orthogonal patterns to mine a representative set of graph patterns. Intuitively, two graph patterns are α-orthogonal if their similarity is bounded above by α. Each α-orthogonal pattern is also a representative for those patterns that are at least β similar to it. Given user-defined α, β ∈ [0, 1], the goal is to mine an α-orthogonal, β-representative set that minimizes the set of unrepresented patterns. We present ORIGAMI, an effective algorithm for mining the set of representative orthogonal patterns. ORIGAMI first uses a randomized algorithm to randomly traverse the pattern space, seeking previously unexplored regions, to return a set of maximal patterns. ORIGAMI then extracts an α-orthogonal, β-representative set from the mined maximal patterns. We show the effectiveness of our algorithm on a number of real and synthetic datasets. In particular, we show that our method is able to extract high-quality patterns even in cases where existing enumerative graph mining methods fail to do so.
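The two definitions in this abstract can be made concrete with a small sketch. It is an illustration only, not the ORIGAMI algorithm: patterns are simplified to sets of labeled edges, the similarity function is assumed to be Jaccard overlap (the abstract does not name one), and the function names `is_alpha_orthogonal`, `unrepresented`, and `greedy_orthogonal` are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Assumed similarity: Jaccard overlap of two edge sets, in [0, 1]."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_alpha_orthogonal(patterns, alpha, sim=jaccard):
    """True if every pair of patterns has similarity at most alpha."""
    return all(sim(p, q) <= alpha for p, q in combinations(patterns, 2))


def unrepresented(all_patterns, reps, beta, sim=jaccard):
    """Patterns outside reps that are less than beta similar to every representative."""
    return [p for p in all_patterns
            if p not in reps and all(sim(p, r) < beta for r in reps)]


def greedy_orthogonal(all_patterns, alpha, sim=jaccard):
    """Greedy illustration: keep a pattern if it stays alpha-orthogonal to those already kept."""
    reps = []
    for p in all_patterns:
        if all(sim(p, r) <= alpha for r in reps):
            reps.append(p)
    return reps


# Toy patterns as sets of edges (hypothetical data).
p1 = frozenset({("a", "b"), ("b", "c")})
p2 = frozenset({("a", "b"), ("b", "c"), ("c", "d")})
p3 = frozenset({("x", "y")})
reps = greedy_orthogonal([p1, p2, p3], alpha=0.5)
```

Here `p2` is rejected as a representative because its similarity to `p1` is 2/3, above α = 0.5, but it still counts as represented for any β up to 2/3. The greedy pass depends on input order and makes no attempt to minimize the unrepresented set, which is the actual optimization goal stated in the abstract.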
From Local Patterns to Global Models: The LeGo Approach to Data Mining
Cited by 19 (5 self)
In this paper we present LeGo, a generic framework that utilizes existing local pattern mining techniques for global modeling in a variety of diverse data mining tasks. In the spirit of well-known KDD process models, our work identifies different phases within the data mining step, each of which is formulated in terms of different formal constraints. It starts with a phase of mining patterns that are individually promising. Later phases establish the context given by the global data mining task by selecting groups of diverse and highly informative patterns, which are finally combined into one or more global models that address the overall data mining task(s). The paper discusses the connection to various learning techniques, and illustrates that our framework is broad enough to cover and leverage frequent pattern mining, subgroup discovery, pattern teams, multi-view learning, and several other popular algorithms. The Safarii learning toolbox serves as a proof-of-concept of its high potential for practical data mining applications. Finally, we point out several challenging open research questions that naturally emerge in a constraint-based local-to-global pattern mining, selection, and combination framework.
ORIGAMI: A Novel and Effective Approach for Mining Representative Orthogonal Graph Patterns
2008
Cited by 14 (6 self)
In this paper, we introduce the concept of α-orthogonal patterns to mine a representative set of graph patterns. Intuitively, two graph patterns are α-orthogonal if their similarity is bounded above by α. Each α-orthogonal pattern is also a representative for those patterns that are at least β similar to it. Given user-defined α, β ∈ [0, 1], the goal is to mine an α-orthogonal, β-representative set that minimizes the set of unrepresented patterns. We present ORIGAMI, an effective algorithm for mining the set of representative orthogonal patterns. ORIGAMI first uses a randomized algorithm to randomly traverse the pattern space, seeking previously unexplored regions, to return a set of maximal patterns. ORIGAMI then extracts an α-orthogonal, β-representative set from the mined maximal patterns. We show the effectiveness of our algorithm on a number of real and synthetic datasets. In particular, we show that our method is able to extract high-quality patterns even in cases where existing enumerative graph mining methods fail to do so.
Constructing Iceberg Lattices from Frequent Closures Using Generators
DS 2008. LNCS (LNAI), 2008
Cited by 10 (5 self)
Frequent closures (FCIs) and generators (FGs) as well as the precedence relation on FCIs are key components in the definition of a variety of association rule bases. Although their joint computation has been studied in concept analysis, no scalable algorithm exists for the task at present. We propose here to reverse a method from the latter field using a fundamental property of hypergraph theory. The goal is to extract the precedence relation from a more common mining output, i.e., closures and generators. The resulting order computation algorithm proves to be highly efficient, benefitting from peculiarities of generator families in typical mining datasets. Due to its genericity, the new algorithm fits an arbitrary FCI/FG-miner.
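To make the three objects named in this abstract concrete, the following brute-force sketch computes frequent closed itemsets, their generators (minimal itemsets with the same closure), and the precedence (cover) relation on a toy transaction database. It is not the paper's algorithm, which avoids exactly this kind of exhaustive enumeration; the function names and the toy data are assumptions made for illustration.

```python
from itertools import combinations


def closures_and_generators(transactions, min_support):
    """Map each frequent closed itemset to the list of its generators (brute force)."""
    items = sorted(set().union(*transactions))
    result = {}
    # Enumerate itemsets by increasing size so that minimal ones are seen first.
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            x = frozenset(combo)
            tids = [i for i, t in enumerate(transactions) if x <= t]
            if len(tids) < min_support:
                continue
            # Closure of x: items shared by every transaction that contains x.
            closed = frozenset.intersection(*(transactions[i] for i in tids))
            gens = result.setdefault(closed, [])
            # x is a generator only if no smaller generator of the same closure is inside it.
            if not any(g < x for g in gens):
                gens.append(x)
    return result


def precedence(closed_sets):
    """Pairs (a, b) of closed itemsets where b immediately follows a under inclusion."""
    closed = list(closed_sets)
    return {(a, b) for a in closed for b in closed
            if a < b and not any(a < c < b for c in closed)}


# Toy database (hypothetical): three transactions over items a, b, c.
db = [frozenset("abc"), frozenset("ab"), frozenset("ac")]
fci = closures_and_generators(db, min_support=1)
order = precedence(fci)
```

On this database the closed itemset {a, b} has the single generator {b}, and {a} is preceded by no other closed set because it occurs in every transaction. Both functions scale exponentially with the number of items and are only meant to pin down the definitions.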
Expressive power of an algebra for data mining
ACM Trans. Database Syst.
Cited by 9 (2 self)
The relational data model has simple and clear foundations on which significant theoretical and systems research has flourished. By contrast, most research on data mining has focused on algorithmic issues. A major open question is “what’s an appropriate foundation for data mining, which can accommodate disparate mining tasks?” We address this problem by presenting a database model and an algebra for data mining. The database model is based on the 3W model introduced by Johnson et al. [2000]. This model relied on black box mining operators. A main contribution of this paper is to open up these black boxes, by using generic operators in a data mining algebra. Two key operators in this algebra are regionize, which creates regions (or models) from data tuples, and a restricted form of looping called mining loop. Then, the resulting data mining algebra MA is studied and properties concerning expressive power and complexity are established. We present results in three directions: (1) expressiveness of the mining algebra; (2) relations with alternative frameworks; and (3) interactions between regionize and mining loop.
Musk: Uniform Sampling of k Maximal Patterns
Cited by 9 (4 self)
Recent research in frequent pattern mining (FPM) has shifted from obtaining the complete set of frequent patterns to generating only a representative (summary) subset of frequent patterns. Most of the existing approaches to this problem adopt a two-step solution; in the first step, they obtain all the frequent patterns, and in the second step, some form of clustering is used to obtain the summary pattern set. However, the two-step method is inefficient and sometimes infeasible since the first step itself may fail to finish in a reasonable amount of time. In this paper, we propose an alternative approach to mining frequent pattern representatives based on a uniform sampling of the output space. Our new algorithm, Musk, obtains representative patterns by sampling uniformly from the pool of all frequent maximal patterns; uniformity is achieved by a variant of the Markov Chain Monte Carlo (MCMC) algorithm. Musk simulates a random walk on the frequent pattern partial order graph with a prescribed transition probability matrix, whose values are computed locally during the simulation. In the stationary distribution of the random walk, all maximal frequent pattern nodes in the partial order graph are sampled uniformly. Experiments on various kinds of graph and itemset databases validate the effectiveness of our approach.
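The idea of a random walk whose transition probabilities are computed locally and whose stationary distribution is uniform can be illustrated with a generic Metropolis-Hastings walk on a small undirected graph. This is a sketch of the general technique only, not Musk's transition matrix: it makes every node equally likely, whereas Musk targets uniformity over maximal patterns. The graph, the function name `mh_uniform_walk`, and the step count are assumptions.

```python
import random
from collections import Counter


def mh_uniform_walk(neighbors, start, steps, rng):
    """Metropolis-Hastings walk on an undirected graph with a uniform target distribution.

    A neighbor is proposed uniformly at random and accepted with probability
    min(1, deg(current) / deg(proposal)), which needs only local degree information.
    """
    current = start
    visits = Counter()
    for _ in range(steps):
        proposal = rng.choice(neighbors[current])
        accept = min(1.0, len(neighbors[current]) / len(neighbors[proposal]))
        if rng.random() < accept:
            current = proposal
        visits[current] += 1
    return visits


# Toy path graph a - b - c (hypothetical); node b has a higher degree than a and c.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
steps = 20000
visits = mh_uniform_walk(graph, "a", steps, random.Random(0))
frequencies = {node: visits[node] / steps for node in graph}
```

A plain random walk on this graph would visit `b` about half of the time because of its higher degree; the acceptance step corrects for that, so the three visit frequencies should each settle near 1/3.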
Actionability and Formal Concepts: A Data Mining Perspective
Cited by 7 (3 self)
Over the last few years, we have studied different set pattern mining techniques for binary data. This includes the computation of formal concepts to support various knowledge discovery processes. For instance, when considering post-genomics, we can exploit Boolean data sets that encode a relation between some genes and the proteins that may regulate them. In such a context, it appears interesting to exploit the analogy between a putative transcriptional module (i.e., a typically important hypothesis for gene regulation understanding) and a formal concept that holds within such data. In this paper, we assume that knowledge nuggets can be captured by collections of formal concepts and we discuss the challenging issue of mining/selecting actionable patterns from these collections, i.e., looking for relevant patterns that really support knowledge discovery. Therefore, a major issue concerns the computation of complete collections of formal concepts that satisfy user-defined constraints. This is useful not only to avoid the computation of too small patterns that might be due to noise (e.g., using size constraints on both their intents and extents) but also to introduce some fault-tolerance. We discuss the pros and cons of some recent proposals in that direction.
Mining conjunctive sequential patterns
Data Mining and Knowledge Discovery
Cited by 7 (2 self)
In this paper we aim at extending the non-derivable condensed representation in frequent itemset mining to sequential pattern mining. We start by showing a negative example: in the context of frequent sequences, the notion of non-derivability is meaningless. Therefore, we extend our focus to the mining of conjunctions of sequences. Besides being of practical importance, this class of patterns has some nice theoretical properties. Based on a new, unexploited theoretical definition of equivalence classes for sequential patterns, we are able to extend the notion of a non-derivable itemset to the sequence domain. We present a new depth-first approach to mine non-derivable conjunctive sequential patterns and show its use in mining association rules for sequences. This approach is based on a well-known combinatorial theorem: the Möbius inversion. A performance study using both synthetic and real datasets illustrates the efficiency of our mining algorithm. These newly introduced patterns have a high potential for real-life applications, especially for network monitoring and biomedical fields, with the ability to get sequential association rules with all the classical statistical metrics such as confidence, conviction, lift, etc.
Reliable representations for association rules
Data & Knowledge Engineering, 2011
Cited by 4 (2 self)
This is the author’s version of a work that was submitted/accepted for publication in the following source:
Discovering Knowledge from Local Patterns with Global Constraints
Cited by 4 (1 self)
It is well known that local patterns are at the core of a lot of knowledge which may be discovered from data. Nevertheless, the use of local patterns is limited by their huge number and computational costs. Several approaches (e.g., condensed representations, pattern set discovery) aim at grouping or synthesizing local patterns to provide a global view of the data. A global pattern is a pattern which is a set or a synthesis of local patterns coming from the data. In this paper, we propose the idea of global constraints to write queries addressing global patterns. A key point is the ability to bias the design of global patterns according to the expectation of the user. For instance, a global pattern can be oriented towards the search for exceptions or a clustering. This requires writing queries that take such biases into account. Open issues are to design a generic framework to express powerful global constraints and solvers to mine them. We think that global constraints are a promising way to discover relevant global patterns.