Interestingness measures for data mining: a survey
 ACM Computing Surveys
Cited by 158 (2 self)
Interestingness measures play an important role in data mining, regardless of the kind of patterns being mined. These measures are intended for selecting and ranking patterns according to their potential interest to the user. Good measures also allow the time and space costs of the mining process to be reduced. This survey reviews the interestingness measures for rules and summaries, classifies them from several perspectives, compares their properties, identifies their roles in the data mining process, gives strategies for selecting appropriate measures for applications, and identifies opportunities for future research in this area.
Knowledge discovery and interestingness measures: A survey
, 1999
Cited by 61 (1 self)
Knowledge discovery in databases, also known as data mining, is the efficient discovery of previously unknown, valid, novel, potentially useful, and understandable patterns in large databases. It encompasses many different techniques and algorithms which differ in the kinds of data that can be analyzed and the form of knowledge representation used to convey the discovered knowledge. An important problem in the area of data mining is the development of effective measures of interestingness for ranking the discovered knowledge. In this report, we provide a general overview of the more successful and widely known data mining techniques and algorithms, and survey seventeen interestingness measures from the literature that have been successfully employed in data mining applications.
Data Mining in Large Databases Using Domain Generalization Graphs
 Journal of Intelligent Information Systems
, 1999
Cited by 18 (4 self)
Attribute-oriented generalization summarizes the information in a relational database by repeatedly replacing specific attribute values with more general concepts according to user-defined concept hierarchies. We introduce domain generalization graphs for controlling the generalization of a set of attributes and show how they are constructed. We then present serial and parallel versions of the Multi-Attribute Generalization algorithm for traversing the generalization state space described by joining the domain generalization graphs for multiple attributes. Based upon a generate-and-test approach, the algorithm generates all possible summaries consistent with the domain generalization graphs. Our experimental results show that significant speedups are possible by partitioning path combinations from the DGGs across multiple processors. We also rank the interestingness of the resulting summaries using measures based upon variance and relative entropy. Our experimental results also show that these measures provide an effective basis for analyzing summary data generated from relational databases. Variance appears more useful because it tends to rank the less complex summaries (i.e., those with few attributes and/or tuples) as more interesting.
Heuristics for Ranking the Interestingness of Discovered Knowledge
 Proceedings of the Third Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD'99)
, 1999
Cited by 15 (6 self)
We describe heuristics, based upon information theory and statistics, for ranking the interestingness of summaries generated from databases. The tuples in a summary are unique, and therefore, can be considered to be a population described by some probability distribution.
Combined pattern mining: from learned rules to actionable knowledge. AI08
, 2008
Cited by 13 (6 self)
Association mining often produces large collections of association rules that are difficult to understand and put into action. In this paper, we have designed a novel notion of combined patterns to extract useful and actionable knowledge from a large amount of learned rules. We also present definitions of combined patterns, design novel metrics to measure their interestingness and analyze the redundancy in combined patterns. Experimental results on real-life social security data demonstrate the effectiveness and potential of the proposed approach in extracting actionable knowledge from complex data.
Measuring the interestingness of discovered knowledge: A principled approach
 Intell. Data Anal
Ranking the Interestingness of Summaries from Data Mining Systems
 In Proceedings of the 12th Annual Florida Artificial Intelligence Research Symposium (FLAIRS'99)
, 1999
Cited by 7 (3 self)
We study data mining where the task is description by summarization, the representation language is generalized relations, the evaluation criteria are based on heuristic measures of interestingness, and the method for searching is the Multi-Attribute Generalization algorithm for domain generalization graphs. We present and empirically compare four heuristics for ranking the interestingness of generalized relations (or summaries). The measures are based on common measures of the diversity of a population, statistical variance, the Simpson index, and the Shannon index. All four measures rank less complex summaries (i.e., those with few tuples and/or non-ANY attributes) as most interesting. Highly ranked summaries provide a reasonable starting point for further analysis of discovered knowledge.
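The diversity measures named in this abstract can be illustrated with a minimal Python sketch. It assumes a summary is reduced to a list of tuple counts and uses the common textbook forms of each index (variance of the proportions around the uniform value 1/m, Simpson as the sum of squared proportions, Shannon as entropy in bits). The function names are hypothetical, and the paper's exact definitions and ranking direction may differ.

```python
import math

def proportions(counts):
    """Convert raw tuple counts into proportions that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]

def variance_from_uniform(counts):
    """Sample variance of the proportions around the uniform value 1/m."""
    p = proportions(counts)
    m = len(p)
    if m < 2:
        return 0.0  # a single tuple has no spread to measure
    q = 1.0 / m
    return sum((pi - q) ** 2 for pi in p) / (m - 1)

def simpson_index(counts):
    """Sum of squared proportions; larger values mean more concentration."""
    return sum(pi ** 2 for pi in proportions(counts))

def shannon_index(counts):
    """Entropy of the proportions in bits; larger values mean more evenness."""
    return -sum(pi * math.log2(pi) for pi in proportions(counts) if pi > 0)

# Score two hypothetical summaries, each given as a list of tuple counts.
summaries = {"coarse": [30, 10], "fine": [10, 10, 10, 10]}
scores = {
    name: (variance_from_uniform(c), simpson_index(c), shannon_index(c))
    for name, c in summaries.items()
}
```

How these scores are turned into a final ranking is a design choice the abstract does not spell out, so the sketch stops at computing them.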
Geng L.: A Unified framework for Utility based Measures for Mining Itemsets
 Second International Workshop on Utility-Based Data Mining
, 2006
Cited by 7 (0 self)
A pattern is of utility to a person if its use by that person contributes to reaching a goal. Utility-based measures use the utilities of the patterns to reflect the user’s goals. In this paper, we first review utility-based measures for itemset mining. Then, we present a unified framework for incorporating several utility-based measures into the data mining process by defining a unified utility function. Next, within this framework, we summarize the mathematical properties of utility-based measures that will allow the time and space costs of the itemset mining algorithm to be reduced.
Direct Discovery of High Utility Itemsets without Candidate Generation
Cited by 5 (0 self)
Utility mining emerged recently to address the limitation of frequent itemset mining by introducing interestingness measures that reflect both the statistical significance and the user’s expectation. Among utility mining problems, utility mining with the itemset share framework is a hard one as no anti-monotone property holds with the interestingness measure. The state-of-the-art works on this problem all employ a two-phase, candidate generation approach, which suffers from the scalability issue due to the huge number of candidates. This paper proposes a high utility itemset growth approach that works in a single phase without generating candidates. Our basic approach is to enumerate itemsets by prefix extensions, to prune search space by utility upper bounding, and to maintain original utility information in the mining process by a novel data structure. Such a data structure enables us to compute a tight bound for powerful pruning and to directly identify high utility itemsets in an efficient and scalable way. We further enhance the efficiency significantly by introducing recursive irrelevant item filtering with sparse data, and a look-ahead strategy with dense data. Extensive experiments on sparse and dense, synthetic and real data suggest that our algorithm outperforms the state-of-the-art algorithms by more than an order of magnitude. Keywords: utility mining; high utility itemsets; frequent itemsets; pattern mining
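The general idea of enumerating itemsets by prefix extension and pruning with a utility upper bound can be sketched in Python. This is not the paper's algorithm or data structure: the sketch substitutes the simpler transaction-weighted utility bound (the total utility of all transactions that contain the current itemset), which is looser than the tight bound the abstract describes. The function name, the dict-based transaction format, and the assumption of non-negative utilities are all illustrative.

```python
def mine_high_utility(transactions, unit_utility, min_util):
    """Return {itemset: utility} for itemsets with utility >= min_util.

    transactions: list of dicts mapping item -> purchased quantity.
    unit_utility: dict mapping item -> non-negative utility per unit.
    """
    items = sorted({item for t in transactions for item in t})

    def transaction_utility(t):
        # Total utility of every item in the transaction.
        return sum(qty * unit_utility[item] for item, qty in t.items())

    results = {}

    def extend(prefix, start, supporting):
        for k in range(start, len(items)):
            item = items[k]
            sup = [t for t in supporting if item in t]
            if not sup:
                continue
            # Upper bound: no superset of this itemset can have more utility
            # than the whole transactions that contain it.
            bound = sum(transaction_utility(t) for t in sup)
            if bound < min_util:
                continue  # prune this branch and all of its extensions
            itemset = prefix + (item,)
            utility = sum(t[i] * unit_utility[i] for t in sup for i in itemset)
            if utility >= min_util:
                results[itemset] = utility
            extend(itemset, k + 1, sup)

    extend((), 0, transactions)
    return results

# Hypothetical toy data.
transactions = [{"a": 1, "b": 2}, {"a": 2, "c": 1}, {"b": 1, "c": 3}]
unit_utility = {"a": 5, "b": 2, "c": 1}
found = mine_high_utility(transactions, unit_utility, min_util=9)
```

Because utility is not anti-monotone, the pruning test uses the bound rather than the itemset's own utility: an itemset below the threshold is still extended as long as its bound allows a superset to qualify.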
Predicting Itemset Sales Profiles with Share Measures and Repeat-Buying Theory
 in Proc. 4th Intl. Conf. on Intelligent Data Engineering and Automated Learning, Hong Kong
, 2003
Cited by 4 (0 self)
Given a random sample of sales transaction records (i.e., scanner panels) for a particular period (such as a week, month, quarter, etc.), we analyze the scanner panels to determine approximations for the penetration and purchase frequency distribution of frequently purchased items and itemsets. If the purchase frequency distribution for an item or itemset in the current period can be modeled by the negative binomial distribution, then the parameters of the model are used to predict sales profiles for the next period. We present representative experimental results based upon synthetic data.
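As a rough illustration of modeling a purchase frequency distribution with the negative binomial distribution, the Python sketch below fits the two parameters by the method of moments and derives penetration as the probability of at least one purchase. The fitting method is an assumption chosen for brevity (the abstract does not say how the paper estimates its parameters), and the function names and sample data are hypothetical.

```python
import math
from statistics import mean, pvariance

def fit_nbd(purchase_counts):
    """Fit negative binomial parameters (r, p) by the method of moments.

    Uses mean = r(1 - p)/p and variance = r(1 - p)/p^2, which requires the
    sample variance to exceed the sample mean (overdispersion).
    """
    m = float(mean(purchase_counts))
    v = float(pvariance(purchase_counts))
    if v <= m:
        raise ValueError("method of moments needs variance > mean")
    p = m / v
    r = m * m / (v - m)
    return r, p

def nbd_pmf(k, r, p):
    """Probability that a buyer makes exactly k purchases in the period."""
    log_pmf = (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
               + r * math.log(p) + k * math.log(1.0 - p))
    return math.exp(log_pmf)

def penetration(r, p):
    """Share of buyers expected to purchase at least once: 1 - P(0)."""
    return 1.0 - p ** r

# Hypothetical per-household purchase counts for one item in one period.
sample = [0, 0, 0, 0, 1, 1, 2, 3, 5, 8]
r, p = fit_nbd(sample)
predicted_share_of_buyers = penetration(r, p)
```

Under this model the fitted parameters from the current period would be reused to predict the next period's profile, which is the step the abstract describes.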