Results 21–30 of 1,752
Top 10 algorithms in data mining
, 2007
"... Abstract This paper presents the top 10 data mining algorithms identified by the IEEE International Conference on Data Mining (ICDM) in December 2006: C4.5, kMeans, SVM, Apriori, EM, PageRank, AdaBoost, kNN, Naive Bayes, and CART. These top 10 algorithms are among the most influential data mining a ..."
Abstract

Cited by 126 (2 self)
Abstract — This paper presents the top 10 data mining algorithms identified by the IEEE International Conference on Data Mining (ICDM) in December 2006: C4.5, k-Means, SVM, Apriori, EM, PageRank, AdaBoost, k-NN, Naive Bayes, and CART. These top 10 algorithms are among the most influential data mining algorithms in the research community. For each algorithm, we provide a description, discuss its impact, and review current and further research on it. These 10 algorithms cover classification,
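Several of the algorithms listed above belong to the frequent-itemset family that dominates this listing. As a concrete illustration, here is a minimal, self-contained Apriori sketch — illustrative only; the function and variable names are my own, not from the paper:

```python
from itertools import combinations

def apriori(transactions, minsup):
    """Minimal Apriori sketch: level-wise frequent-itemset mining
    with downward-closure pruning (illustrative, not optimized)."""
    items = sorted({i for t in transactions for i in t})
    freq = {}
    current = []
    # Level 1: count single items.
    for i in items:
        s = sum(1 for t in transactions if i in t)
        if s >= minsup:
            freq[(i,)] = s
            current.append((i,))
    k = 2
    while current:
        # Candidate generation: join frequent (k-1)-itemsets.
        cands = {tuple(sorted(set(a) | set(b)))
                 for a in current for b in current
                 if len(set(a) | set(b)) == k}
        nxt = []
        for c in sorted(cands):
            # Prune: every (k-1)-subset must itself be frequent.
            if all(sub in freq for sub in combinations(c, k - 1)):
                s = sum(1 for t in transactions if set(c) <= set(t))
                if s >= minsup:
                    freq[c] = s
                    nxt.append(c)
        current = nxt
        k += 1
    return freq
```

With `minsup = 3` on five small transactions, the sketch returns all frequent pairs but correctly rejects the triple that occurs only twice.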
An efficient algorithm for discovering frequent subgraphs
 IEEE Transactions on Knowledge and Data Engineering
, 2002
"... Abstract — Over the years, frequent itemset discovery algorithms have been used to find interesting patterns in various application areas. However, as data mining techniques are being increasingly applied to nontraditional domains, existing frequent pattern discovery approach cannot be used. This i ..."
Abstract

Cited by 120 (7 self)
Abstract — Over the years, frequent itemset discovery algorithms have been used to find interesting patterns in various application areas. However, as data mining techniques are increasingly applied to nontraditional domains, existing frequent pattern discovery approaches cannot be used. This is because the transaction framework that is assumed by these algorithms cannot effectively model the datasets in these domains. An alternate way of modeling the objects in these datasets is to represent them using graphs. Within that model, one way of formulating the frequent pattern discovery problem is as the problem of discovering subgraphs that occur frequently over the entire set of graphs. In this paper we present a computationally efficient algorithm, called FSG, for finding all frequent subgraphs in large graph datasets. We experimentally evaluate the performance of FSG using a variety of real and synthetic datasets. Our results show that despite the underlying complexity associated with frequent subgraph discovery, FSG is effective in finding all frequently occurring subgraphs in datasets containing over 200,000 graph transactions and scales linearly with respect to the size of the dataset. Index Terms — Data mining, scientific datasets, frequent pattern discovery, chemical compound datasets.
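As a toy illustration of the graph-transaction setting — not FSG itself, which relies on canonical labeling and subgraph isomorphism — one can model each graph as a set of labeled edges and count how many graphs in the dataset contain a candidate edge set. The simplification below assumes edges are uniquely labeled, so containment reduces to edge-set inclusion:

```python
def support(candidate_edges, graphs):
    """Count graphs whose edge set contains the candidate.
    Simplified model: each graph is a set of labeled-edge tuples,
    so subgraph containment is plain set inclusion. Real FSG must
    instead solve subgraph isomorphism via canonical labeling."""
    c = set(candidate_edges)
    return sum(1 for g in graphs if c <= set(g))
```

For example, with three small "molecules" as edge sets, the candidate `{('C','C')}` is supported by the two graphs containing a carbon-carbon edge.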
Mining Frequent Itemsets with Convertible Constraints
 Proc. of 2001 Int. Conf. on Data Engineering
, 2001
"... Recent work has highlighted the importance of the constraintbased mining paradigm in the context of frequent itemsets, associations, correlations, sequential patterns, and many other interesting patterns in large databases. In this paper, we study constraints which cannot be handled with existing t ..."
Abstract

Cited by 118 (18 self)
Recent work has highlighted the importance of the constraint-based mining paradigm in the context of frequent itemsets, associations, correlations, sequential patterns, and many other interesting patterns in large databases. In this paper, we study constraints which cannot be handled with existing theory and techniques. For example, constraints such as avg(S) θ v and sum(S) θ v (where S can contain items of arbitrary values) are customarily regarded as "tough" constraints in that they cannot be pushed inside an algorithm such as Apriori. We develop a notion of convertible constraints and systematically analyze, classify, and characterize this class. We also develop techniques which enable them to be readily pushed deep inside the recently developed FP-growth algorithm for frequent itemset mining. Results from our detailed experiments show the effectiveness of the techniques developed.
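A minimal sketch of the convertibility idea, assuming the "tough" constraint avg(S) >= v (the function below is illustrative, not from the paper): the constraint is neither monotone nor anti-monotone over arbitrary itemsets, but if items are ordered by value in descending order it becomes anti-monotone over prefixes — once a prefix violates avg >= v, every longer prefix does too and can be pruned:

```python
def convertible_prefixes(values, v):
    """Enumerate prefixes (in descending-value order) satisfying
    avg(prefix) >= v. Because values are sorted descending, the
    running average only decreases, so the first violation lets
    us prune all remaining (longer) prefixes."""
    order = sorted(values, reverse=True)
    prefixes = []
    total = 0
    for k, x in enumerate(order, 1):
        total += x
        if total / k < v:
            break  # prune: all longer prefixes also violate avg >= v
        prefixes.append(order[:k])
    return prefixes
```

With values [9, 5, 4, 2] and v = 6, the averages of successive prefixes are 9, 7, 6, 5, so exactly the first three prefixes survive.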
Mining Frequent Patterns with Counting Inference
 Sigkdd Explorations
, 2000
"... ACB(D,?E= A&F"=@F"<G?8&:H?E>CI J"FCA; 8:HKMLONQPR1NQSEDT:H; U:V; W 8GA&F XHYHU?</>Z71FC["?I\F"= 8; K]; ^>C8&; F"7VF*_8&:1?`D?I I W ab71FDc7d>*I J"F*A&; 8&:1K e = A&; F*A&;gfih:1; <F"= 8; K]; ^> ..."
Abstract

Cited by 113 (10 self)
We propose the algorithm PASCAL, which introduces a novel optimization of the well-known algorithm Apriori. This optimization is based on a new strategy called pattern counting inference that relies on the concept of key patterns. We show that the support of frequent non-key patterns can be inferred from frequent key patterns without accessing the database. Experiments comparing PASCAL to the three algorithms Apriori, Close, and Max-Miner show that PASCAL is among the most efficient algorithms for mining frequent patterns.
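The counting-inference rule behind PASCAL can be sketched as follows (names are illustrative; the rule applies only when the pattern is already known to be non-key, i.e. some proper subset has the same support):

```python
from itertools import combinations

def inferred_support(pattern, supp):
    """Counting-inference sketch: for a NON-KEY pattern, its support
    equals the minimum support of its (k-1)-subsets, so it can be
    derived without a database scan. `supp` maps already-counted
    subsets (sorted tuples) to their supports."""
    subs = (tuple(sorted(s)) for s in combinations(pattern, len(pattern) - 1))
    return min(supp[s] for s in subs)
```

For instance, if the pairs of {a, b, c} have supports 3, 3, and 4 and {a, b, c} is non-key, its support is inferred to be 3 without touching the data.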
Discriminative frequent pattern analysis for effective classification
 In ICDE
, 2007
"... The application of frequent patterns in classification appeared in sporadic studies and achieved initial success in the classification of relational data, text documents and graphs. In this paper, we conduct a systematic exploration of frequent patternbased classification, and provide solid reasons ..."
Abstract

Cited by 112 (20 self)
The application of frequent patterns in classification appeared in sporadic studies and achieved initial success in the classification of relational data, text documents and graphs. In this paper, we conduct a systematic exploration of frequent pattern-based classification, and provide solid reasons supporting this methodology. It is well known that feature combinations (patterns) can capture more underlying semantics than single features. However, inclusion of infrequent patterns may not significantly improve the accuracy due to their limited predictive power. By building a connection between pattern frequency and discriminative measures such as information gain and Fisher score, we develop a strategy to set minimum support in frequent pattern mining for generating useful patterns. Based on this strategy, coupled with a proposed feature selection algorithm, discriminative frequent patterns can be generated for building high quality classifiers. We demonstrate that the frequent pattern-based classification framework can achieve good scalability and high accuracy in classifying large datasets. Empirical studies indicate that significant improvement in classification accuracy is achieved (up to 12% in UCI datasets) using the so-selected discriminative frequent patterns.
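The discriminative measure mentioned above — information gain of a single binary pattern feature over class labels — can be computed as in this sketch (function names are my own, not from the paper):

```python
from math import log2

def entropy(pos, neg):
    """Binary entropy of a pos/neg count pair, in bits."""
    tot = pos + neg
    h = 0.0
    for c in (pos, neg):
        if c:
            p = c / tot
            h -= p * log2(p)
    return h

def info_gain(labels, has_pattern):
    """Information gain of one binary pattern feature (does each
    example contain the pattern?) over 0/1 class labels."""
    pos = sum(labels)
    gain = entropy(pos, len(labels) - pos)
    for flag in (True, False):
        grp = [l for l, f in zip(labels, has_pattern) if f == flag]
        if grp:
            gain -= len(grp) / len(labels) * entropy(sum(grp), len(grp) - sum(grp))
    return gain
```

A pattern that perfectly separates the classes has gain 1 bit; a pattern independent of the labels has gain 0.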
Depth First Generation of Long Patterns
, 2000
"... In this paper we present an algorithm for mining long patterns in databases. The algorithm finds large itemsets by using depth first search on a lexicographic tree of itemsets. The focus of this paper is to develop CPUefficient algorithms for finding frequent itemsets in the cases when the database ..."
Abstract

Cited by 96 (2 self)
In this paper we present an algorithm for mining long patterns in databases. The algorithm finds large itemsets by using depth-first search on a lexicographic tree of itemsets. The focus of this paper is to develop CPU-efficient algorithms for finding frequent itemsets in the cases when the database contains patterns which are very wide. We refer to this algorithm as DepthProject, and it achieves more than one order of magnitude speedup over the recently proposed Max-Miner algorithm for finding long patterns. These techniques may be quite useful for applications in areas such as computational biology in which the number of records is relatively small, but the itemsets are very long. This necessitates the discovery of patterns using algorithms which are especially tailored to the nature of such domains.
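The lexicographic-tree traversal can be sketched as plain depth-first enumeration, without the paper's projection and bitmap optimizations (names are illustrative):

```python
def dfs_frequent(transactions, minsup):
    """Depth-first enumeration of frequent itemsets over the
    lexicographic tree: each node is an itemset, extended only
    with lexicographically later items, so every itemset is
    visited exactly once. Plain counting, no projections."""
    items = sorted({i for t in transactions for i in t})
    out = {}

    def expand(prefix, tail):
        for j, item in enumerate(tail):
            cand = prefix + (item,)
            s = sum(1 for t in transactions if set(cand) <= set(t))
            if s >= minsup:
                out[cand] = s
                # Recurse only into later items: anti-monotonicity
                # lets us stop extending infrequent candidates.
                expand(cand, tail[j + 1:])

    expand((), items)
    return out
```

On the same toy data as any breadth-first miner, this returns an identical set of frequent itemsets; only the visit order differs.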
H-Mine: Hyper-Structure Mining of Frequent Patterns in Large Databases
, 2001
"... Methods for efficient mining of frequent patterns have been studied extensively by many researchers. However, the previously proposed methods still encounter some performance bottlenecks when mining databases with different data characteristics, such as dense vs. sparse, long vs. short patterns, mem ..."
Abstract

Cited by 89 (7 self)
Methods for efficient mining of frequent patterns have been studied extensively by many researchers. However, the previously proposed methods still encounter some performance bottlenecks when mining databases with different data characteristics, such as dense vs. sparse, long vs. short patterns, memory-based vs. disk-based, etc.
Efficient Algorithms for Mining Closed Itemsets and Their Lattice Structure
 IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
, 2005
"... The set of frequent closed itemsets uniquely determines the exact frequency of all itemsets, yet it can be orders of magnitude smaller than the set of all frequent itemsets. In this paper, we present CHARM, an efficient algorithm for mining all frequent closed itemsets. It enumerates closed sets u ..."
Abstract

Cited by 85 (7 self)
The set of frequent closed itemsets uniquely determines the exact frequency of all itemsets, yet it can be orders of magnitude smaller than the set of all frequent itemsets. In this paper, we present CHARM, an efficient algorithm for mining all frequent closed itemsets. It enumerates closed sets using a dual itemset-tidset search tree, using an efficient hybrid search that skips many levels. It also uses a technique called diffsets to reduce the memory footprint of intermediate computations. Finally, it uses a fast hash-based approach to remove any "non-closed" sets found during computation. We also present CHARM-L, an algorithm that outputs the closed itemset lattice, which is very useful for rule generation and visualization. An extensive experimental evaluation on a number of real and synthetic databases shows that CHARM is a state-of-the-art algorithm that outperforms previous methods. Further, CHARM-L explicitly generates the frequent closed itemset lattice.
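The defining property CHARM exploits can be illustrated with a naive closure computation — a sketch only, since CHARM itself operates on tidsets with diffsets rather than rescanning transactions:

```python
def closure(itemset, transactions):
    """Closure of an itemset: the items common to every transaction
    containing it. An itemset is closed iff it equals its closure.
    Naive version: intersects covering transactions directly."""
    covers = [set(t) for t in transactions if set(itemset) <= set(t)]
    common = set.intersection(*covers) if covers else set(itemset)
    return frozenset(common)
```

For example, if 'a' never occurs without 'b', then {a} is not closed (its closure is {a, b}) and only {a, b} needs to be reported; the support of {a} is recoverable from it.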
Alternative interest measures for mining associations in databases
 IEEE Transactions on Knowledge and Data Engineering
"... Abstract—Data mining is defined as the process of discovering significant and potentially useful patterns in large volumes of data. Discovering associations between items in a large database is one such data mining activity. In finding associations, support is used as an indicator as to whether an a ..."
Abstract

Cited by 83 (0 self)
Abstract — Data mining is defined as the process of discovering significant and potentially useful patterns in large volumes of data. Discovering associations between items in a large database is one such data mining activity. In finding associations, support is used as an indicator of whether an association is interesting. In this paper, we discuss three alternative interest measures for associations: any-confidence, all-confidence, and bond. We prove that the important downward closure property applies to both all-confidence and bond. We show that downward closure does not hold for any-confidence. We also prove that, if associations have a minimum all-confidence or minimum bond, then those associations will have a given lower bound on their minimum support and the rules produced from those associations will have a given lower bound on their minimum confidence as well. However, associations that have that minimum support (and likewise their rules that have minimum confidence) may not satisfy the minimum all-confidence or minimum bond constraint. We describe the algorithms that efficiently find all associations with a minimum all-confidence or minimum bond and present some experimental results. Index Terms—Data mining, associations, interest measures, databases, performance.
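Assuming the standard formulations of these measures — all-confidence as supp(X) divided by the largest single-item support within X, and bond as the ratio of transactions containing all of X to transactions containing any item of X — a toy implementation looks like this (naive counting; the paper's exact notation may differ):

```python
def all_confidence(itemset, transactions):
    """all-confidence(X) = supp(X) / max single-item support in X."""
    def supp(s):
        return sum(1 for t in transactions if set(s) <= set(t))
    return supp(itemset) / max(supp({i}) for i in itemset)

def bond(itemset, transactions):
    """bond(X) = |transactions containing all of X|
               / |transactions containing any item of X|."""
    inter = sum(1 for t in transactions if set(itemset) <= set(t))
    union = sum(1 for t in transactions if set(itemset) & set(t))
    return inter / union
```

On four transactions where {a, b} co-occur twice and each item appears three times, all-confidence is 2/3 and bond is 2/4, illustrating that both are bounded above by the classical confidence-style ratios.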
Automatic pool allocation: improving performance by controlling data structure layout in the heap
 In Proceedings of PLDI
, 2005
"... This paper describes Automatic Pool Allocation, a transformation framework that segregates distinct instances of heapbased data structures into seperate memory pools and allows heuristics to be used to partially control the internal layout of those data structures. The primary goal of this work is ..."
Abstract

Cited by 82 (9 self)
This paper describes Automatic Pool Allocation, a transformation framework that segregates distinct instances of heap-based data structures into separate memory pools and allows heuristics to be used to partially control the internal layout of those data structures. The primary goal of this work is performance improvement, not automatic memory management, and the paper makes several new contributions. The key contribution is a new compiler algorithm for partitioning heap objects in imperative programs based on a context-sensitive pointer analysis, including a novel strategy for correct handling of indirect (and potentially unsafe) function calls. The transformation does not require type-safe programs and works for the full generality of C and C++. Second, the paper describes several optimizations that exploit data structure partitioning to further improve program performance. Third, the paper evaluates how memory hierarchy behavior and overall program performance are impacted by the new transformations. Using a number of benchmarks and a few applications, we find that compilation times are extremely low, and overall running times for heap-intensive programs speed up by 10-25% in many cases, about 2x in two cases, and more than 10x in two small benchmarks. Overall, we believe this work provides a new framework for optimizing pointer-intensive programs by segregating and controlling the layout of heap-based data structures.