A Foundational Approach to Mining Itemset Utilities from Databases
- Proceedings of the Third SIAM International Conference on Data Mining
, 2004
Cited by 14 (0 self)
Most approaches to mining association rules implicitly consider the utilities of the itemsets to be equal. We assume that the utilities of itemsets may differ, and identify the high utility itemsets based on information in the transaction database and external information about utilities. Our theoretical analysis of the resulting problem lays the foundation for future utility mining algorithms.
Spatial Subgroup Mining Integrated in an Object-Relational Spatial Database
- Principles of Data Mining and Knowledge Discovery (PKDD), T. Elomaa, H. Mannila and H. Toivonen, eds, 6th European Conference, LNAI 2431
, 2002
Cited by 13 (1 self)
SubgroupMiner is an advanced subgroup mining system supporting multirelational hypotheses, efficient data base integration, discovery of causal subgroup structures, and visualization based interaction options. When searching for dependencies between subgroups and a target group, spatial subgroups with multirelational descriptions are explored. Search strategies of data mining algorithms are efficiently integrated with queries in an object-relational query language and executed in a database to enable scalability for spatial data.
Generalization-Based Data Mining in Object-Oriented Databases Using an Object Cube Model
- Data and Knowledge Engineering
, 1998
Cited by 13 (1 self)
Data mining is the discovery of knowledge and useful information from the large amounts of data stored in databases. With the increasing popularity of object-oriented database systems in advanced database applications, it is important to study the data mining methods for object-oriented databases because mining knowledge from such databases may improve understanding, organization, and utilization of the data stored there.
Maximally informative k-itemsets and their efficient discovery
- in Proceedings of KDD 2006
, 2006
Cited by 12 (3 self)
In this paper we present a new approach to mining binary data. We treat each binary feature (item) as a means of distinguishing two sets of examples. Our interest is in selecting from the total set of items an itemset of specified size, such that the database is partitioned with as uniform a distribution over the parts as possible. To achieve this goal, we propose the use of joint entropy as a quality measure for itemsets, and refer to optimal itemsets of cardinality k as maximally informative k-itemsets. We claim that this approach maximises distinctive power, as well as minimises redundancy within the feature set. A number of algorithms is presented for computing optimal itemsets efficiently.
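The joint-entropy quality measure described in this abstract can be illustrated with a minimal sketch. It assumes the database is a list of equal-length binary tuples, and it uses plain exhaustive search rather than any of the paper's own algorithms, which the abstract does not describe; the names `joint_entropy` and `best_k_itemset` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def joint_entropy(rows, itemset):
    """Joint entropy (in bits) of the items at the given column indices."""
    # Each distinct combination of values defines one part of the partition.
    counts = Counter(tuple(row[i] for i in itemset) for row in rows)
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def best_k_itemset(rows, k):
    """Brute-force search for a k-itemset with maximal joint entropy.

    Illustration only: this enumerates every k-subset of the items.
    """
    n_items = len(rows[0])
    return max(combinations(range(n_items), k),
               key=lambda itemset: joint_entropy(rows, itemset))


# Toy data (made up): item 2 duplicates item 0, item 1 is independent of both.
rows = [
    (0, 0, 0),
    (0, 1, 0),
    (1, 0, 1),
    (1, 1, 1),
]
```

On this toy data, pairing the two redundant items (0 and 2) splits the rows into only two parts, while pairing item 0 with item 1 splits them into four equal parts, so the latter has the higher joint entropy.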
The Role of Feature Construction in Inductive Rule Learning
Cited by 10 (0 self)
This paper proposes a unifying framework for inductive rule learning algorithms. We suggest that the problem of constructing an appropriate inductive hypothesis (set of rules) can be broken down in the following subtasks: rule construction, body construction, and feature construction. Each of these subtasks may have its own declarative bias, search strategies, and heuristics. In particular, we argue that feature construction is a crucial notion in explaining the relations between attribute-value rule learning and inductive logic programming (ILP). We demonstrate this by a general method for transforming ILP problems to attribute-value form, which overcomes some of the traditional limitations of propositionalisation approaches.
Information-theoretic measures for knowledge discovery and data mining
- in Entropy Measures, Maximum Entropy and Emerging Applications, Karmeshu (Ed.)
, 2003
Cited by 10 (5 self)
A database may be considered as a statistical population, and an attribute as a statistical variable taking values from its domain. One can carry out statistical and information-theoretic analysis on a database. Based on the attribute values, a database can be partitioned into smaller populations. An attribute is deemed important if it partitions the database such that previously unknown regularities and patterns are observable. Many information-theoretic measures have been proposed and applied to quantify the importance of attributes and relationships between attributes in various fields. In the context of knowledge discovery and data mining (KDD), we present a critical review and analysis of information-theoretic measures of attribute importance and attribute association, with emphasis on their interpretations and connections.
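As a small illustration of the kind of measure this abstract surveys, the sketch below computes Shannon entropy of one attribute and mutual information between two attributes, treating each column as a statistical variable. The abstract does not list which measures the paper reviews, so the choice of these two is an assumption, and the function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X; Y) = H(X) + H(Y) - H(X, Y), estimated from paired columns."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


# Toy columns (made up): `colour` determines `label`; `noise` is unrelated.
colour = ["a", "a", "b", "b"]
label = [0, 0, 1, 1]
noise = [0, 1, 0, 1]
```

Here `mutual_information(colour, label)` is one bit because `colour` fully determines `label`, while `mutual_information(colour, noise)` is zero because the two columns are independent in this sample.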
Supervised descriptive rule discovery: A unifying survey of contrast set, emerging pattern and subgroup mining
- Journal of Machine Learning Research
Cited by 10 (0 self)
This paper gives a survey of contrast set mining (CSM), emerging pattern mining (EPM), and subgroup discovery (SD) in a unifying framework named supervised descriptive rule discovery. While all these research areas aim at discovering patterns in the form of rules induced from labeled data, they use different terminology and task definitions, claim to have different goals, claim to use different rule learning heuristics, and use different means for selecting subsets of induced patterns. This paper contributes a novel understanding of these subareas of data mining by presenting a unified terminology, by explaining the apparent differences between the learning tasks as variants of a unique supervised descriptive rule discovery task and by exploring the apparent differences between the approaches. It also shows that various rule learning heuristics used in CSM, EPM and SD algorithms all aim at optimizing a trade off between rule coverage and precision. The commonalities (and differences) between the approaches are showcased on a selection of best known variants of CSM, EPM and SD algorithms. The paper also provides a critical survey of existing supervised descriptive rule discovery visualization methods.
Sampling-Based Sequential Subgroup Mining
, 2005
Cited by 10 (5 self)
Subgroup discovery is a learning task that aims at finding interesting rules from classified examples. The search is guided by a utility function, trading off the coverage of rules against their statistical unusualness. One shortcoming of existing approaches is that they do not incorporate prior knowledge. To this end a novel generic sampling strategy is proposed. It allows to turn pattern mining into an iterative process. In each iteration the focus of subgroup discovery lies on those patterns that are unexpected with respect to prior knowledge and previously discovered patterns. The result of this technique is a small diverse set of understandable rules that characterise a specified property of interest. As another contribution this article derives a simple connection between subgroup discovery and classifier induction. For a popular utility function this connection allows to apply any standard rule induction algorithm to the task of subgroup discovery after a step of stratified resampling. The proposed techniques are empirically compared to state of the art subgroup discovery algorithms.
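A utility function that trades coverage against unusualness can be sketched as follows. The abstract does not name the function it uses; weighted relative accuracy (WRAcc), a measure commonly used in subgroup discovery, is assumed here purely for illustration, and the function name and example counts are made up.

```python
def wracc(n_total, n_pos, n_covered, n_covered_pos):
    """Weighted relative accuracy of a rule: coverage * unusualness.

    n_total       -- number of examples in the data set
    n_pos         -- number of examples with the target property
    n_covered     -- number of examples the rule covers
    n_covered_pos -- covered examples that have the target property
    """
    if n_covered == 0:
        return 0.0  # a rule covering nothing has no utility
    coverage = n_covered / n_total
    # Unusualness: how far the target share inside the subgroup
    # deviates from the target share in the whole data set.
    unusualness = n_covered_pos / n_covered - n_pos / n_total
    return coverage * unusualness


# Example counts (made up): 100 examples, 40 positive;
# the rule covers 20 examples, 16 of them positive.
score = wracc(100, 40, 20, 16)
```

With these counts the coverage is 0.2 and the unusualness is 0.8 - 0.4 = 0.4, giving a score of about 0.08.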
A Sequential Sampling Algorithm for a General Class of Utility Criteria
- In Proceedings of the International Conference on Knowledge Discovery and Data Mining
, 2000
Cited by 8 (1 self)
Many discovery problems, e.g., subgroup or association rule discovery, can naturally be cast as n-best hypothesis problems where the goal is to find the n hypotheses from a given hypothesis space that score best according to a given utility function. We present a sampling algorithm that solves this problem by issuing a small number of database queries while guaranteeing precise bounds on confidence and quality of solutions. Known sampling algorithms assume that the utility be the average (over the examples) of some function, which is not the case for many frequently used utility functions. We show that our algorithm works for all utilities that can be estimated with bounded error. We provide such error bounds and resulting worst-case sample bounds for some of the most frequently used utilities, and prove that there is no sampling algorithm for another popular class of utility functions. The algorithm is sequential in the sense that it starts to return (or discard) hypotheses that already...
On Information-Theoretic Measures of Attribute Importance
- Proceedings of the Third Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD'99)
, 1999
Cited by 6 (2 self)
An attribute is deemed important in data mining if it partitions the database such that previously unknown regularities are observable. Many information-theoretic measures have been applied to quantify the importance of an attribute. In this paper, we summarize and critically analyze these measures.

