Depth-first non-derivable itemset mining
In SIAM Int. Conf. on Data Mining (SDM’05), 2005
Cited by 58 (7 self)
Mining frequent itemsets is one of the main problems in data mining. Much effort has gone into developing efficient and scalable algorithms for this problem. When the support threshold is set too low, however, or the data is highly correlated, the number of frequent itemsets can become too large, independently of the algorithm used. Therefore, it is often more interesting to mine a reduced collection of interesting itemsets, i.e., a condensed representation. Recently, in this context, the non-derivable itemsets were proposed as an important class of itemsets. An itemset is called derivable when its support is completely determined by the supports of its subsets. As such, derivable itemsets represent redundant information and can be pruned from the collection of frequent itemsets. It was shown both theoretically and experimentally that the collection of non-derivable frequent itemsets is in general much smaller than the complete set of frequent itemsets. A breadth-first, Apriori-based algorithm called NDI was proposed to find all non-derivable itemsets. In this paper we present a depth-first algorithm, dfNDI, that is based on Eclat for mining the non-derivable itemsets. dfNDI is evaluated on real-life datasets, and experiments show that dfNDI outperforms NDI by an order of magnitude.
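The derivability test can be illustrated by brute-forcing the inclusion-exclusion deduction rules that underlie NDI. For every proper subset X of I, a signed sum of the supports of the sets J with X ⊆ J ⊂ I bounds the support of I. The sum is an upper bound when |I - X| is odd and a lower bound when it is even, and I is derivable when the tightest bounds coincide. For a 2-itemset the rules reduce to max(0, s(a) + s(b) - n) <= s(ab) <= min(s(a), s(b)). This is a minimal Python sketch that assumes transactions are frozensets. The function names are hypothetical, the code rescans the data for every subset, and it is neither NDI nor dfNDI. The point of those algorithms (dfNDI working depth-first, in the style of Eclat) is to organize this work efficiently.

```python
from itertools import combinations
from math import inf


def subsets(items):
    """Yield every subset of `items`, including the empty set, as a frozenset."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def deduction_bounds(itemset, transactions):
    """Bounds on support(itemset) computed only from supports of its proper subsets.

    For every X subset of I with X != I, the rule value is
    sum over X <= J < I of (-1)^(|I - J| + 1) * support(J):
    an upper bound when |I - X| is odd, a lower bound when it is even.
    """
    itemset = frozenset(itemset)
    lower, upper = 0, inf
    for x in subsets(itemset):
        if x == itemset:
            continue  # X = I only gives the trivial bound support(I) >= 0
        total = 0
        for extra in subsets(itemset - x):
            j = x | extra
            if j == itemset:
                continue  # J ranges over proper subsets of I only
            sign = 1 if len(itemset - j) % 2 == 1 else -1
            total += sign * support(j, transactions)
        if len(itemset - x) % 2 == 1:
            upper = min(upper, total)
        else:
            lower = max(lower, total)
    return lower, upper


def is_derivable(itemset, transactions):
    """Derivable: the subsets pin the support down exactly (lower == upper)."""
    lower, upper = deduction_bounds(itemset, transactions)
    return lower == upper


db = [frozenset("ab"), frozenset("ab"), frozenset("b"), frozenset("bc")]
print(deduction_bounds("ab", db))  # (2, 2)
print(deduction_bounds("ac", db))  # (0, 1)
```

In this toy data every transaction contains b. That forces the support of {a, b} to equal the support of {a}, so {a, b} is derivable. {a, c} is not derivable.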
A survey on condensed representations for frequent sets
In: Constraint-Based Mining and Inductive Databases, Springer-Verlag, LNAI, 2005
Cited by 38 (4 self)
Abstract. Solving inductive queries which have to return complete collections of patterns satisfying a given predicate has been studied extensively over the last few years. The specific problem of frequent set mining from potentially huge Boolean matrices has given rise to tens of efficient solvers. Frequent sets are indeed useful for many data mining tasks, including the popular association rule mining task but also feature construction, association-based classification, clustering, etc. The research in this area has been boosted by the fascinating concept of condensed representations w.r.t. frequency queries. Such representations can be used to support the discovery of every frequent set and its support without looking back at the data. Interestingly, condensed representations can be several orders of magnitude smaller than the corresponding frequent set collections. Most of the proposals concern exact representations, while it is also possible to consider approximate ones, i.e., to trade computational complexity for a bounded approximation on the computed support values. This paper surveys the core concepts used in recent work on condensed representations of frequent sets.
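Closed itemsets are one classic exact condensed representation. The support of any frequent set equals the largest support among its closed supersets, so the data need not be consulted again. Below is a minimal, brute-force Python sketch of that property. It assumes frozenset transactions and an absolute support threshold. The function names are hypothetical, and the enumeration is exponential, so it only suits toy inputs.

```python
from itertools import combinations


def support(itemset, transactions):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_itemsets(transactions, minsup):
    """Brute force: every non-empty itemset whose support is at least `minsup`."""
    items = sorted(set().union(*transactions))
    result = {}
    for r in range(1, len(items) + 1):
        for combo in combinations(items, r):
            itemset = frozenset(combo)
            count = support(itemset, transactions)
            if count >= minsup:
                result[itemset] = count
    return result


def closed_itemsets(frequent):
    """Keep only itemsets that have no proper superset with equal support."""
    return {s: c for s, c in frequent.items()
            if not any(s < t and c == ct for t, ct in frequent.items())}


def support_from_closed(itemset, closed):
    """Support of a frequent itemset = largest support among its closed supersets.

    Returns None when no closed superset exists, i.e. the itemset is not frequent.
    """
    itemset = frozenset(itemset)
    return max((c for s, c in closed.items() if itemset <= s), default=None)


db = [frozenset("abc"), frozenset("abc"), frozenset("ab"), frozenset("c")]
freq = frequent_itemsets(db, minsup=2)
closed = closed_itemsets(freq)
print(len(freq), len(closed))              # 7 3
print(support_from_closed("ac", closed))   # 2
```

Here 3 closed sets suffice to answer support queries for all 7 frequent sets.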
On Inverse Frequent Set Mining
, 2003
Cited by 26 (2 self)
Frequent set mining is a well-known technique to summarize binary data. However, it is an open problem how difficult it is to invert frequent set mining, i.e., how difficult it is to find a binary data set that is compatible with the results of frequent set mining, the frequent sets. This inverse data mining problem is related to the questions of how well privacy is preserved in the frequent sets and how well the frequent sets characterize the original data set. In this paper we analyze the computational complexity of the problem of finding a binary data set compatible with a given collection of frequent sets and show that in many cases the problem is computationally very difficult.
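To make the inverse problem concrete, the sketch below searches exhaustively for a data set with a fixed number of rows whose support counts match a given table. It assumes that "compatible" means matching exact support counts for the listed itemsets. That is a simplification of the paper's setting, and the function names are hypothetical. For m items and n rows there are C(2^m + n - 1, n) candidate data sets, a number that grows explosively. This is consistent with, but does not prove, the hardness results the abstract reports.

```python
from itertools import combinations, combinations_with_replacement


def all_rows(items):
    """Every possible transaction over `items` (2^|items| rows)."""
    items = sorted(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def support(itemset, transactions):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def find_compatible_dataset(items, n, target):
    """Search every multiset of n rows for one matching all target supports.

    `target` maps itemsets to exact support counts. Returns a list of
    transactions, or None when no data set with n rows is compatible.
    """
    rows = all_rows(items)
    for dataset in combinations_with_replacement(rows, n):
        if all(support(frozenset(s), dataset) == c for s, c in target.items()):
            return list(dataset)
    return None


target = {"a": 2, "b": 2, "ab": 1}
print(find_compatible_dataset("ab", 3, target))
# An inconsistent table: {a, b} cannot be more frequent than {a}.
print(find_compatible_dataset("ab", 3, {"a": 1, "b": 1, "ab": 2}))  # None
```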
Closed Sets for Labeled Data
Cited by 24 (0 self)
Abstract. Closed sets are being successfully applied in the context of compact data representation for association rule learning. However, their use is mainly descriptive. This paper shows that, when considering labeled data, closed sets can be adapted for prediction and discrimination purposes by conveniently contrasting covering properties on positive and negative examples. We formally justify that these sets characterize the space of relevant combinations of features for discriminating the target class. In practice, identifying relevant and irrelevant combinations of features through closed sets is useful in many applications. Here we apply it to compacting emerging patterns and essential rules, and to learning descriptions for subgroup discovery.
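One way to read "contrasting covering properties on positive and negative examples" is to close a feature set using only the positive examples, then compare the positives and negatives it covers. The closure keeps the same positive coverage. Because it is a superset, it covers at most the same negatives, so it is at least as discriminative. Below is a minimal Python sketch under that reading, assuming examples are frozensets of features. The names are hypothetical, and the paper's formal relevance definitions are not reproduced.

```python
def closure_on_positives(itemset, positives):
    """Intersection of all positive examples that contain `itemset`.

    Returns None if no positive example contains the itemset.
    """
    covering = [t for t in positives if itemset <= t]
    if not covering:
        return None
    return frozenset.intersection(*covering)


def coverage(itemset, examples):
    """Indices of the examples that contain `itemset`."""
    return {i for i, t in enumerate(examples) if itemset <= t}


def dominates(y, x, positives, negatives):
    """True if y covers every positive that x covers and no negative that x misses."""
    return (coverage(x, positives) <= coverage(y, positives)
            and coverage(y, negatives) <= coverage(x, negatives))


positives = [frozenset("abc"), frozenset("abd"), frozenset("ace")]
negatives = [frozenset("ab"), frozenset("cd")]
x = frozenset("c")
y = closure_on_positives(x, positives)
print(sorted(y))                               # ['a', 'c']
print(dominates(y, x, positives, negatives))   # True
print(dominates(x, y, positives, negatives))   # False
```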
Time series knowledge mining
, 2006
Cited by 20 (2 self)
An important goal of knowledge discovery is the search for patterns in data that can help explain the underlying process that generated the data. The patterns are required to be new, useful, and understandable to humans. In this work we present a new method for the understandable description of local temporal relationships in multivariate data, called Time Series Knowledge Mining (TSKM). We define the Time Series Knowledge Representation (TSKR) as a new language for expressing temporal knowledge. The patterns have a hierarchical structure, in which each level corresponds to a single temporal concept. On the lowest level, intervals are used to represent duration. Overlapping parts of intervals represent coincidence on the next level. Several such blocks of intervals are connected with a partial order relation on the highest level. Each pattern element consists of a semiotic triple to connect syntactic and semantic information with pragmatics. The patterns are very compact, but offer details for each element on demand. In comparison with related approaches, the TSKR is shown to have advantages in robustness, expressivity, and comprehensibility. Efficient algorithms for the discovery of the patterns are proposed. The search for coincidence as well as partial order can be formulated as variants of the well-known frequent itemset problem. One of the best-known algorithms for this problem is therefore adapted for our purposes. Human interaction is used during the mining to analyze and validate partial results as early as possible and guide further processing steps. The efficacy of the methods is demonstrated using several data sets. In an application to sports medicine the results were recognized as valid and useful by an expert of the field.
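The coincidence level can be sketched as a sweep over interval boundaries. Between two consecutive boundaries the set of active labels is constant, and adjacent spans with the same labels are merged. This minimal Python sketch assumes half-open (label, start, end) intervals. The labels in the example are hypothetical, and the sketch omits whatever duration thresholds, support counting, or mining the TSKM method itself applies.

```python
def coincidences(intervals):
    """Split time into maximal segments during which the same labels are active.

    `intervals` is a list of (label, start, end) tuples with half-open [start, end).
    Returns (start, end, labels) tuples for segments where at least one label holds.
    """
    points = sorted({p for _, s, e in intervals for p in (s, e)})
    segments = []
    for left, right in zip(points, points[1:]):
        # No boundary lies strictly inside [left, right), so each interval
        # either covers this whole elementary segment or misses it entirely.
        active = frozenset(lab for lab, s, e in intervals if s <= left and right <= e)
        if not active:
            continue
        if segments and segments[-1][1] == left and segments[-1][2] == active:
            segments[-1] = (segments[-1][0], right, active)  # extend the previous run
        else:
            segments.append((left, right, active))
    return segments


heart_rate = [("hr_high", 0, 10)]
speed = [("speed_fast", 5, 15)]
for seg in coincidences(heart_rate + speed):
    print(seg[0], seg[1], sorted(seg[2]))
# 0 5 ['hr_high']
# 5 10 ['hr_high', 'speed_fast']
# 10 15 ['speed_fast']
```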
An efficient framework for mining flexible constraints
In PAKDD, 2005
Cited by 17 (5 self)
Abstract. Constraint-based mining is an active field of research and a key to interactive and successful KDD processes. Nevertheless, usual solvers are limited to particular kinds of constraints because they rely on pruning properties that are mutually incompatible. In this paper, we provide a general framework dedicated to a large set of constraints described by SQL-like and syntactic primitives. This set of constraints covers the usual classes and introduces new tough and flexible constraints. We define a pruning operator which prunes the search space by automatically taking into account the characteristics of the constraint at hand. Finally, we propose an algorithm which efficiently makes use of this framework. Experimental results show that both usual and new complex constraints can be mined in large datasets.
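Pruning "by the characteristics of the constraint" can be illustrated with a generic interval-based branch-and-bound. Each search node represents all sets lying between a lower set and an upper set. A primitive that can only decrease as the set grows (frequency) is evaluated on the lower set. A primitive that can only increase (a sum of non-negative values) is evaluated on the upper set. The node is pruned when even these optimistic values fail. This minimal Python sketch handles one hard-coded conjunction and assumes non-negative item prices. The price attribute and all names are hypothetical. The sketch does not implement the paper's pruning operator or its SQL-like constraint language.

```python
def freq(itemset, transactions):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def total_price(itemset, price):
    """Sum of a non-negative numeric attribute over the items."""
    return sum(price[i] for i in itemset)


def mine(transactions, price, minfreq, minprice):
    """Find sets S with freq(S) >= minfreq and total_price(S) >= minprice.

    Each node stands for the interval [lower, upper] of all sets S with
    lower <= S <= upper. freq can only shrink as S grows, so freq(lower) is
    optimistic; total_price can only grow, so total_price(upper) is optimistic.
    """
    items = sorted(set().union(*transactions))
    results = []

    def explore(lower, candidates):
        upper = lower | frozenset(candidates)
        if freq(lower, transactions) < minfreq or total_price(upper, price) < minprice:
            return  # no set in [lower, upper] can satisfy the constraint
        if lower and total_price(lower, price) >= minprice:
            results.append(lower)  # freq(lower) was already checked above
        for k, item in enumerate(candidates):
            explore(lower | {item}, candidates[k + 1:])

    explore(frozenset(), items)
    return results


db = [frozenset("abc"), frozenset("ab"), frozenset("ac"), frozenset("b")]
price = {"a": 1, "b": 5, "c": 2}
print([sorted(s) for s in mine(db, price, minfreq=2, minprice=6)])  # [['a', 'b']]
```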
Frequent Set Mining
Cited by 13 (1 self)
Frequent sets lie at the basis of many data mining algorithms. As a result, hundreds of algorithms have been proposed to solve the frequent set mining problem. In this chapter, we attempt to survey the most successful algorithms and techniques that try to solve this problem efficiently.
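As a reference point for the algorithms such a survey covers, the sketch below implements the classic level-wise Apriori scheme. It counts all candidates of one size in a single pass and keeps the frequent ones. It then builds next-size candidates whose every subset of the previous size is frequent. This is a minimal Python sketch assuming an absolute support threshold. The chapter's own presentation may differ.

```python
from itertools import combinations


def apriori(transactions, minsup):
    """Level-wise frequent itemset mining in the style of Apriori.

    Returns a dict mapping each frequent itemset (frozenset) to its support count.
    """
    transactions = [frozenset(t) for t in transactions]
    level = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    k = 1
    while level:
        # One pass over the data counts every candidate of the current size.
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        current = {c: n for c, n in counts.items() if n >= minsup}
        frequent.update(current)
        k += 1
        # Join: unions of two frequent (k-1)-sets that have exactly k items.
        keys = list(current)
        candidates = {x | y for x in keys for y in keys if len(x | y) == k}
        # Prune: drop candidates with an infrequent (k-1)-subset (anti-monotonicity).
        level = {c for c in candidates
                 if all(frozenset(s) in current for s in combinations(c, k - 1))}
    return frequent


result = apriori(["abc", "ab", "ac", "b"], minsup=2)
print(sorted("".join(sorted(s)) + ":" + str(n) for s, n in result.items()))
# ['a:3', 'ab:2', 'ac:2', 'b:3', 'c:2']
```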
Efficient mining of understandable patterns from multivariate interval time series
Data Mining and Knowledge Discovery, 2007
Cited by 13 (0 self)
Abstract. We present a new method for the understandable description of local temporal relationships in multivariate data, called Time Series Knowledge Mining (TSKM). We define the Time Series Knowledge Representation (TSKR) as a new language for expressing temporal knowledge in time interval data. The patterns have a hierarchical structure, with levels corresponding to the temporal concepts duration, coincidence, and partial order. The patterns are very compact, but offer details for each element on demand. In comparison with related approaches, the TSKR is shown to have advantages in robustness, expressivity, and comprehensibility. The search for coincidence and partial order in interval data can be formulated as instances of the well-known frequent itemset problem. Efficient algorithms for the discovery of the patterns are adapted accordingly. A novel form of search space pruning effectively reduces the size of the mining result to ease interpretation and speed up the algorithms. Human interaction is used during the mining to analyze and validate partial results as early as possible and guide further processing steps. The efficacy of the methods is demonstrated using two real-life data sets. In an application to sports medicine the results were recognized as valid and useful by an expert of the field. Keywords: knowledge discovery, time series, interval patterns, Allen’s relations
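Allen's relations appear among the keywords. As background, the sketch below classifies an ordered pair of intervals into one of Allen's 13 relations. It is a minimal Python sketch assuming (start, end) pairs with start < end and exact endpoint comparisons. It is not part of TSKR as described in the abstract. It also shows how a tiny shift of one endpoint can turn, for example, "meets" into "overlaps", which is one reason exact endpoint comparisons can be fragile on noisy data.

```python
def allen_relation(a, b):
    """Name the Allen relation that holds from interval a to interval b.

    Intervals are (start, end) pairs with start < end.
    """
    (s1, e1), (s2, e2) = a, b
    if e1 < s2:
        return "before"
    if e1 == s2:
        return "meets"
    if s1 == s2 and e1 == e2:
        return "equals"
    if s1 == s2:
        return "starts" if e1 < e2 else "started-by"
    if e1 == e2:
        return "finishes" if s1 > s2 else "finished-by"
    if s1 < s2 < e1 < e2:
        return "overlaps"
    if s2 < s1 < e2 < e1:
        return "overlapped-by"
    if s2 < s1 and e1 < e2:
        return "during"
    if s1 < s2 and e2 < e1:
        return "contains"
    if s1 == e2:
        return "met-by"
    return "after"  # remaining case: b ends before a starts


print(allen_relation((0, 3), (3, 5)))  # meets
print(allen_relation((0, 4), (3, 5)))  # overlaps
```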
BLOSOM: A Framework for Mining Arbitrary Boolean Expressions over Attribute Sets
In: Proceedings of the 12th International Conference on Knowledge Discovery and Data Mining (KDD 2006), 2006
Cited by 12 (5 self)
We introduce a novel framework (BLOSOM) for mining (frequent) boolean expressions over binary-valued datasets. We organize the space of boolean expressions into four categories: pure conjunctions, pure disjunctions, conjunction of disjunctions, and disjunction of conjunctions. For each category, we propose a closure operator that naturally leads to the concept of a closed boolean expression. The closed expressions and their minimal generators give the most specific and most general boolean expressions that are satisfied by their corresponding object set. Further, the closed/minimal generator expressions form a lossless representation of all possible boolean expressions. BLOSOM efficiently ...
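For the two simplest categories, one natural reading of the closure operators works on the vertical (item to object-id) layout. The closure of a conjunction adds every item that is true in all objects satisfying it. The closure of a disjunction adds every item whose objects all fall inside the disjunction's object set. Below is a minimal Python sketch of that reading, assuming binary data given as frozensets of items. The names are hypothetical. The mixed CNF/DNF categories, minimal generators, and the BLOSOM search itself are not shown.

```python
def tidsets(transactions):
    """Vertical layout: map each item to the ids of the objects containing it."""
    index = {}
    for tid, t in enumerate(transactions):
        for item in t:
            index.setdefault(item, set()).add(tid)
    return index


def and_closure(items, index, n_objects):
    """Closure of a pure conjunction: all items true in every object satisfying it."""
    objs = set(range(n_objects))
    for i in items:
        objs &= index[i]
    closed = frozenset(i for i, tids in index.items() if objs <= tids)
    return closed, frozenset(objs)


def or_closure(items, index):
    """Closure of a pure disjunction: all items whose objects lie inside its object set."""
    objs = set().union(*(index[i] for i in items))
    closed = frozenset(i for i, tids in index.items() if tids <= objs)
    return closed, frozenset(objs)


data = [frozenset("ab"), frozenset("abc"), frozenset("c"), frozenset("d")]
idx = tidsets(data)
print(sorted(and_closure({"a"}, idx, len(data))[0]))  # ['a', 'b'], i.e. a AND b
print(sorted(or_closure({"a", "c"}, idx)[0]))         # ['a', 'b', 'c'], i.e. a OR b OR c
```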
An Automata Approach to Pattern Collections
Knowledge Discovery in Inductive Databases, 3rd International Workshop, KDID 2004, 2004
Cited by 8 (4 self)
Condensed representations of pattern collections have been recognized to be important building blocks of inductive databases, a promising theoretical framework for data mining, and they have recently been studied actively. However, there has not been much research on how condensed representations should actually be represented. In this paper we study how...
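The excerpt is cut off before the paper's own construction, so the following is only a baseline. A prefix tree (trie) over sorted itemsets is the simplest deterministic automaton that stores a pattern collection with its supports while sharing common prefixes. This is a minimal Python sketch with hypothetical names. More compact automata, for example ones that also share suffixes, are beyond it.

```python
class TrieNode:
    """One state of the automaton; `support` is set only where a stored pattern ends."""

    __slots__ = ("children", "support")

    def __init__(self):
        self.children = {}
        self.support = None


class PatternTrie:
    """Prefix tree over sorted item sequences, i.e. a tree-shaped deterministic automaton."""

    def __init__(self):
        self.root = TrieNode()
        self.states = 1

    def add(self, itemset, support):
        """Insert an itemset, creating states only for prefixes not yet stored."""
        node = self.root
        for item in sorted(itemset):
            if item not in node.children:
                node.children[item] = TrieNode()
                self.states += 1
            node = node.children[item]
        node.support = support

    def lookup(self, itemset):
        """Return the stored support, or None if the itemset is not in the collection."""
        node = self.root
        for item in sorted(itemset):
            node = node.children.get(item)
            if node is None:
                return None
        return node.support


trie = PatternTrie()
for pattern, supp in [("ab", 2), ("abc", 1), ("ac", 2), ("b", 3)]:
    trie.add(pattern, supp)
print(trie.states)        # 6: shared prefixes are stored once
print(trie.lookup("ba"))  # 2
print(trie.lookup("a"))   # None: "a" is only a prefix here
```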