Association Mining, 2006
Cited by 61 (1 self)
The task of finding correlations between items in a dataset, association mining, has received considerable attention over the last decade. This article presents a survey of association mining fundamentals, detailing the evolution of association mining algorithms from the seminal to the state-of-the-art. This survey focuses on the fundamental principles of association mining, that is, itemset identification, rule generation, and their generic optimizations.
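The two principles named in this abstract, itemset identification and rule generation, can be illustrated with a minimal sketch. It assumes the standard textbook definitions of support and confidence; the function names and the toy baskets are hypothetical and are not taken from the survey.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def rules_from_itemset(itemset, transactions, min_conf):
    """Yield (antecedent, consequent, confidence) for rules meeting min_conf."""
    itemset = frozenset(itemset)
    whole = support(itemset, transactions)
    for size in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), size):
            a = frozenset(antecedent)
            a_support = support(a, transactions)
            if a_support == 0:
                continue  # antecedent never occurs, so no rule can be formed
            confidence = whole / a_support
            if confidence >= min_conf:
                yield a, itemset - a, confidence

# Hypothetical toy data: four shopping baskets.
transactions = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
)]

for a, c, conf in rules_from_itemset({"bread", "milk"}, transactions, 0.6):
    print(sorted(a), "->", sorted(c), round(conf, 2))
```

Real miners avoid rescanning the transactions for every support query; the rescans here only keep the sketch short.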
Frequent Itemsets for Genomic Profiling
In Proc. 1st International Symposium on Computational Life Sciences (CompLife 2005), LNCS 3695
Cited by 6 (3 self)
Frequent itemset mining is a promising approach to the study of genomic profiling data. Here a dataset consists of real numbers describing the relative level at which a clone occurs in human DNA for given patient samples. One can then mine, for example, for sets of samples that share some common behavior on the clones, i.e., gains or losses. Frequent itemsets show promising biological expressiveness, can be computed efficiently, and are very flexible. Their visualization provides the biologist with useful information for the discovery of patterns. It also turns out that the use of (larger) frequent itemsets tends to filter out noise.
A probability analysis for candidate-based frequent itemset algorithms
In Proceedings of the 2006 ACM Symposium on Applied Computing, DM track, volume 1 of 2, 2006
Cited by 6 (1 self)
This paper explores the generation of candidates, which is an important step in frequent itemset mining algorithms, from a theoretical point of view. Important notions in our probabilistic analysis are success (a candidate that is frequent), and failure (a candidate that is infrequent). For a selection of candidate-based frequent itemset mining algorithms, the probabilities of these events are studied for the shopping model where all the shoppers are independent and each combination of items has its own probability, so any correlation between items is possible. The Apriori Algorithm is considered in detail; for AIS, Eclat, FP-growth and the Fast Completion Apriori Algorithm, the main principles are sketched. The results of the analysis are used to compare the behaviour of the algorithms for a variety of data distributions.
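To make the notions of success and failure concrete, the following is a minimal sketch of level-wise Apriori candidate generation that counts how many candidates turn out frequent or infrequent. It is a generic textbook formulation, not the paper's probabilistic analysis; the function name, the absolute threshold `min_count`, and the toy baskets are assumptions.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Level-wise mining; returns (frequent itemsets, successes, failures)."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    candidates = [frozenset([i]) for i in items]
    frequent, successes, failures = {}, 0, 0
    k = 1
    while candidates:
        level = {}
        for c in candidates:
            count = sum(c <= t for t in transactions)
            if count >= min_count:
                level[c] = count      # success: the candidate is frequent
                successes += 1
            else:
                failures += 1         # failure: the candidate is infrequent
        frequent.update(level)
        k += 1
        # Join step: unite frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all (k-1)-subsets are frequent.
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent, successes, failures

# Hypothetical toy data: five baskets over the items a, b, c.
baskets = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
frequent, successes, failures = apriori(baskets, min_count=3)
print(successes, failures)
```

On this data every single item and every pair is frequent, while the one triple that survives pruning is not, so the only failure occurs at the third level.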
A Hierarchical Dynamic Load Balancing Strategy for Distributed Data Mining
Extracting useful knowledge from data sets measuring in gigabytes and even terabytes is a challenging research area for the data mining community. Sequential approaches suffer from a performance problem because they have to mine voluminous databases. Parallelism is introduced as an important solution that could improve the response time and the scalability of these approaches. However, the parallelization process is not trivial and still faces many challenges, including the workload balancing problem. In this paper, we propose a hierarchical dynamic load balancing strategy for parallel association rule mining algorithms in the context of a Grid computing environment. The French research grid “Grid’5000” is used as our experimental testbed. Through a detailed experimental study, we show that our strategy improves the performance and helps the parallel algorithm to scale very well with the number of computational nodes available.
Mining Frequent Itemsets: A Perspective from Operations Research
Many papers on frequent itemsets have been published. In addition, some contests in this field were held. In the majority of the papers the focus is on speed. Ad hoc algorithms and data structures were introduced. In this paper we put most of the algorithms in one framework, using classical Operations Research paradigms such as backtracking, depth-first and breadth-first search, and branch-and-bound. Moreover, we present experimental results where the different algorithms are implemented under similar designs.
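As an illustration of the depth-first, backtracking view of frequent itemset mining, here is a minimal sketch that extends a prefix item by item and cuts off a branch as soon as its support drops below the threshold. It uses transaction-id sets in the style of Eclat; it is not the paper's framework, and the function name and toy baskets are hypothetical.

```python
def dfs_frequent(transactions, min_count):
    """Depth-first search over itemsets with support-based pruning."""
    # Map each item to the set of transaction ids that contain it.
    tidsets = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tidsets.setdefault(item, set()).add(tid)
    items = sorted(i for i, tids in tidsets.items() if len(tids) >= min_count)
    result = {}

    def extend(prefix, prefix_tids, start):
        for idx in range(start, len(items)):
            item = items[idx]
            tids = prefix_tids & tidsets[item]
            if len(tids) < min_count:
                continue  # bound: no superset of this itemset can be frequent
            itemset = prefix + (item,)
            result[itemset] = len(tids)
            extend(itemset, tids, idx + 1)  # go deeper, then backtrack

    extend((), set(range(len(transactions))), 0)
    return result

# Hypothetical toy data: five baskets over the items a, b, c.
baskets = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
found = dfs_frequent(baskets, min_count=3)
print(found)
```

A breadth-first variant would explore all itemsets of one size before moving to the next, which is the order a level-wise algorithm such as Apriori follows.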
Best-k Queries on Database Systems
The study of how to select k items based on fuzzy matching and ranking of database tuples (i.e. top-k queries) has attracted much attention recently. However, taking the top-k tuples based on their scores computed independently is inadequate for modeling some complex queries finding the best k tuples based on some selection criteria involving a global measure on multiple selected tuples (e.g., tuple redundancy or compatibility). In this paper, we introduce and study such best-k queries, and further model a database selection problem generally as a decision problem, in which a database system would respond to a query by selecting a subset of tuples that optimize a certain utility function defined globally. Accordingly, we present a general formal framework for database selection, which covers the boolean query search, the top-k query search, and the best-k query search all as special cases. We prove that finding answers to a general best-k query is an NP-hard problem and propose an efficient approximation algorithm.
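The difference between scoring tuples independently and optimizing a global utility can be shown with a small sketch. Everything here is an assumption made for illustration: the utility (sum of scores minus a redundancy penalty for same-topic pairs), the greedy selection, and the toy tuples. The abstract does not describe the paper's approximation algorithm, and this greedy heuristic is not claimed to be it.

```python
def top_k(tuples, score, k):
    """Classic top-k: rank each tuple by its own score."""
    return sorted(tuples, key=score, reverse=True)[:k]

def greedy_best_k(tuples, utility, k):
    """Greedy heuristic: repeatedly add the tuple that maximizes set utility."""
    chosen, remaining = [], list(tuples)
    while remaining and len(chosen) < k:
        best = max(remaining, key=lambda t: utility(chosen + [t]))
        chosen.append(best)
        remaining.remove(best)
    return chosen

def utility(selection, penalty=0.5):
    """Hypothetical global measure: total score minus redundancy penalty."""
    total = sum(score for _, _, score in selection)
    redundant_pairs = sum(
        1
        for i in range(len(selection))
        for j in range(i + 1, len(selection))
        if selection[i][1] == selection[j][1]
    )
    return total - penalty * redundant_pairs

# Hypothetical tuples: (name, topic, relevance score).
docs = [("d1", "sports", 0.9), ("d2", "sports", 0.8), ("d3", "politics", 0.7)]

by_score = top_k(docs, score=lambda t: t[2], k=2)
by_utility = greedy_best_k(docs, utility, k=2)
print([d[0] for d in by_score], [d[0] for d in by_utility])
```

With these numbers the top-k answer keeps two tuples on the same topic, while the utility-driven selection trades the second one for a less redundant tuple.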
Design and Analysis of a Dynamic Load Balancing Strategy for Large-Scale Distributed Association Rule Mining
Association rule mining is one of the most important data mining techniques. Algorithms of this technique search a large space, considering numerous different alternatives and scanning the data repeatedly. Parallelism seems to be the natural solution in order to be able to work with industrial-sized databases. Large-scale computing systems, such as Grid computing environments, have recently been regarded as promising platforms for data- and computation-intensive applications like data mining. However, to improve the performance and achieve scalability by using these heterogeneous platforms, new data partitioning approaches and workload balancing features are needed. The focus of this paper is to propose a dynamic load balancing strategy for parallel association rule mining algorithms in the context of a Grid computing environment. This strategy is built upon a distributed model which necessitates small overheads in the communication costs for load updates and for both data and work transfers. It also supports the heterogeneity of the system and is fault tolerant.