Results 1 - 10 of 35
Beyond Market Baskets: Generalizing Association Rules To Dependence Rules
, 1998
Cited by 414 (5 self)
One of the more well-studied problems in data mining is the search for association rules in market basket data. Association rules are intended to identify patterns of the type: “A customer purchasing item A often also purchases item B.” Motivated partly by the goal of generalizing beyond market basket data and partly by the goal of ironing out some problems in the definition of association rules, we develop the notion of dependence rules that identify statistical dependence in both the presence and absence of items in itemsets. We propose measuring significance of dependence via the chi-squared test for independence from classical statistics. This leads to a measure that is upward-closed in the itemset lattice, enabling us to reduce the mining problem to the search for a border between dependent and independent itemsets in the lattice. We develop pruning strategies based on the closure property and thereby devise an efficient algorithm for discovering dependence rules. We demonstrate our algorithm’s effectiveness by testing it on census data, text data (wherein we seek term dependence), and synthetic data.
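To make the chi-squared idea in this abstract concrete, here is a minimal sketch that scores an itemset by comparing observed presence/absence counts against the counts expected under independence. The function name `chi_squared` and the basket representation (a list of Python sets) are assumptions for illustration, not the paper's algorithm, which additionally prunes the itemset lattice.

```python
from itertools import product

def chi_squared(baskets, items):
    """Chi-squared statistic over all presence/absence patterns of `items`.

    Hypothetical helper: `baskets` is a list of sets, `items` a list of item names.
    """
    n = len(baskets)
    # Marginal probability that each item is present in a basket.
    p = {i: sum(i in b for b in baskets) / n for i in items}
    # Observed count for every presence/absence pattern that occurs.
    observed = {}
    for b in baskets:
        cell = tuple(i in b for i in items)
        observed[cell] = observed.get(cell, 0) + 1
    stat = 0.0
    for cell in product([True, False], repeat=len(items)):
        # Expected count if the items were independent.
        expected = n
        for i, present in zip(items, cell):
            expected *= p[i] if present else 1 - p[i]
        if expected > 0:
            stat += (observed.get(cell, 0) - expected) ** 2 / expected
    return stat

dependent = [{"A", "B"}] * 5 + [set()] * 5
independent = [{"A", "B"}, {"A"}, {"B"}, set()]
print(chi_squared(dependent, ["A", "B"]))
print(chi_squared(independent, ["A", "B"]))
```

For a pair of items the statistic is usually compared with 3.84, the standard 95% cutoff for one degree of freedom; the cutoff to use for larger itemsets is a choice this sketch leaves open.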
Sampling Large Databases for Association Rules
, 1996
Cited by 330 (4 self)
Discovery of association rules is an important database mining problem. Current algorithms for finding association rules require several passes over the analyzed database, and obviously the role of I/O overhead is very significant for very large databases. We present new algorithms that reduce the database activity considerably. The idea is to pick a random sample, to find using this sample all association rules that probably hold in the whole database, and then to verify the results with the rest of the database. The algorithms thus produce exact association rules, not approximations based on a sample. The approach is, however, probabilistic, and in those rare cases where our sampling method does not produce all association rules, the missing rules can be found in a second pass. Our experiments show that the proposed algorithms can find association rules very efficiently in only one database pass.
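The sample-then-verify idea described in this abstract can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the names `frequent_itemsets` and `sample_then_verify`, the `slack` parameter (how far the threshold is lowered on the sample), and the brute-force counting are all hypothetical, and the second-pass recovery of missed itemsets is omitted.

```python
import random
from itertools import combinations

def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force miner: returns {itemset tuple: support} up to max_size items."""
    n = len(transactions)
    counts = {}
    for t in transactions:
        for k in range(1, max_size + 1):
            for combo in combinations(sorted(t), k):
                counts[combo] = counts.get(combo, 0) + 1
    return {s: c / n for s, c in counts.items() if c / n >= min_support}

def sample_then_verify(transactions, min_support, sample_size, slack, seed=0):
    """Mine a random sample with a lowered threshold, then verify on all data."""
    rng = random.Random(seed)
    sample = rng.sample(transactions, sample_size)
    # Lowering the threshold makes it less likely that a truly frequent
    # itemset is missed because it is under-represented in the sample.
    candidates = frequent_itemsets(sample, min_support - slack)
    n = len(transactions)
    verified = {}
    for itemset in candidates:
        # One pass over the full database yields the exact support.
        support = sum(set(itemset) <= t for t in transactions) / n
        if support >= min_support:
            verified[itemset] = support
    return verified

db = [{"a", "b"}] * 6 + [{"a"}] * 2 + [{"c"}] * 2
print(sample_then_verify(db, min_support=0.5, sample_size=5, slack=0.1))
```

Every itemset returned is exactly frequent in the full database, because the final check uses the whole data; what the sample can cost is completeness, which is the case the abstract says a second pass handles.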
Data Mining: An Overview from Database Perspective
- IEEE Transactions on Knowledge and Data Engineering
, 1996
Cited by 314 (23 self)
Mining information and knowledge from large databases has been recognized by many researchers as a key research topic in database systems and machine learning, and by many industrial companies as an important area with an opportunity for major revenues. Researchers in many different fields have shown great interest in data mining. Several emerging applications in information providing services, such as data warehousing and on-line services over the Internet, also call for various data mining techniques to better understand user behavior, to improve the service provided, and to increase the business opportunities. In response to such a demand, this article provides a survey, from a database researcher's point of view, of recently developed data mining techniques. A classification of the available data mining techniques is provided and a comparative study of such techniques is presented.
Mining Quantitative Association Rules in Large Relational Tables
, 1996
Cited by 304 (2 self)
We introduce the problem of mining association rules in large relational tables containing both quantitative and categorical attributes. An example of such an association might be "10% of married people between age 50 and 60 have at least 2 cars". We deal with quantitative attributes by fine-partitioning the values of the attribute and then combining adjacent partitions as necessary. We introduce measures of partial completeness which quantify the information lost due to partitioning. A direct application of this technique can generate too many similar rules. We tackle this problem by using a "greater-than-expected-value" interest measure to identify the interesting rules in the output. We give an algorithm for mining such quantitative association rules. Finally, we describe the results of using this approach on a real-life dataset. 1 Introduction Data mining, also known as knowledge discovery in databases, has been recognized as a new area for database research. The problem of discove...
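The partition-then-combine step for quantitative attributes can be illustrated with a small sketch. The abstract does not say how partitions are chosen or when they are combined, so the equal-width bins and the greedy "merge until an interval holds at least `min_count` values" rule below are assumptions, as is the name `partition_and_merge`.

```python
def partition_and_merge(values, num_bins, min_count):
    """Split values into equal-width bins, then merge adjacent sparse bins.

    Returns a list of (start, end, count) intervals. Hypothetical helper.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [(lo, hi, len(values))]
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        # The maximum value falls into the last bin.
        idx = min(int((v - lo) / width), num_bins - 1)
        counts[idx] += 1
    merged = []  # each entry is [start, end, count]
    for i, c in enumerate(counts):
        start, end = lo + i * width, lo + (i + 1) * width
        if merged and merged[-1][2] < min_count:
            # Previous interval is still too sparse: extend it to the right.
            merged[-1][1] = end
            merged[-1][2] += c
        else:
            merged.append([start, end, c])
    # Fold a too-sparse trailing interval into its left neighbour.
    if len(merged) > 1 and merged[-1][2] < min_count:
        tail = merged.pop()
        merged[-1][1] = tail[1]
        merged[-1][2] += tail[2]
    return [tuple(m) for m in merged]

ages = [20, 21, 22, 30, 31, 50, 58, 60]
print(partition_and_merge(ages, num_bins=4, min_count=3))
```

Each resulting interval can then be treated like a categorical item (for example, "age in [30, 60]") when counting support.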
Parallel Mining of Association Rules
- IEEE Transactions on Knowledge and Data Engineering
, 1996
Cited by 203 (3 self)
We consider the problem of mining association rules on a shared-nothing multiprocessor. We present three algorithms that explore a spectrum of trade-offs between computation, communication, memory usage, synchronization, and the use of problem-specific information. The best algorithm exhibits near perfect scaleup behavior, yet requires only minimal overhead compared to the current best serial algorithm.
Finding Interesting Rules from Large Sets of Discovered Association Rules
, 1994
Cited by 185 (9 self)
Association rules, introduced by Agrawal, Imielinski, and Swami, are rules of the form "for 90% of the rows of the relation, if the row has value 1 in the columns in set W, then it has 1 also in column B". Efficient methods exist for discovering association rules from large collections of data. The number of discovered rules can, however, be so large that browsing the rule set and finding interesting rules from it can be quite difficult for the user. We show how a simple formalism of rule templates makes it possible to easily describe the structure of interesting rules. We also give examples of visualization of rules, and show how a visualization tool interfaces with rule templates. 1 Introduction Data mining (knowledge discovery in databases) is a field of increasing interest combining databases, artificial intelligence, and machine learning. The purpose of data mining is to facilitate understanding large amounts of data by discovering interesting regularities or exceptions (see e...
Pruning and Grouping Discovered Association Rules
, 1995
Cited by 70 (4 self)
Association rules are statements of the form "for 90% of the rows of the relation, if the row has value 1 in the columns in set X, then it has 1 also in the columns in set Y". Efficient methods exist for discovering association rules from large collections of data. The number of discovered rules can, however, be so large that the rules cannot be presented to the user. We show how the set of rules can be pruned by forming rule covers. A rule cover is a subset of the original set of rules such that for each row in the relation there is an applicable rule in the cover if and only if there is an applicable rule in the original set. We also discuss grouping of association rules by clustering, and present some experimental results of both pruning and grouping. Keywords: data mining, association rules, covers, clustering. 1 Introduction Association rules are an interesting class of database regularities, introduced by Agrawal, Imielinski, and Swami [AIS93]. An association rule is an expres...
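The rule-cover definition in this abstract can be tried out with a short greedy sketch. Two things here are assumptions rather than the paper's method: a rule is taken to be applicable to a row when the row contains every item of its antecedent, and the cover is built with a standard greedy set-cover heuristic. The name `greedy_rule_cover` is hypothetical.

```python
def greedy_rule_cover(rows, rules):
    """Pick a subset of rules that applies to every row the full set applies to.

    rows: list of sets of items.
    rules: list of (antecedent, consequent) pairs of frozensets.
    """
    # Row indices each rule applies to (assumed: antecedent contained in row).
    matched = {
        rule: {i for i, row in enumerate(rows) if rule[0] <= row}
        for rule in rules
    }
    uncovered = set().union(*matched.values())
    cover = []
    while uncovered:
        # Take the rule that applies to the most still-uncovered rows.
        best = max(rules, key=lambda r: len(matched[r] & uncovered))
        cover.append(best)
        uncovered -= matched[best]
    return cover

rows = [{"a", "b", "c"}, {"a", "c"}, {"b", "c"}, {"d"}]
rules = [
    (frozenset({"a"}), frozenset({"c"})),
    (frozenset({"b"}), frozenset({"c"})),
    (frozenset({"a", "b"}), frozenset({"c"})),
]
for antecedent, consequent in greedy_rule_cover(rows, rules):
    print(sorted(antecedent), "=>", sorted(consequent))
```

In this toy data the third rule is dropped because every row it applies to is already handled by one of the first two, and the row `{"d"}` stays uncovered in both the original set and the cover, as the "if and only if" in the definition requires.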
Using a Hash-Based Method with Transaction Trimming and Database Scan Reduction for Mining Association Rules
- IEEE Transactions on Knowledge and Data Engineering
, 1997
Cited by 58 (8 self)
In this paper, we examine the issue of mining association rules among items in a large database of sales transactions. Mining association rules means, given a database of sales transactions, discovering all associations among items such that the presence of some items in a transaction implies the presence of other items in the same transaction. The mining of association rules can be mapped into the problem of discovering large itemsets, where a large itemset is a group of items that appears in a sufficient number of transactions. The problem of discovering large itemsets can be solved by first constructing a candidate set of itemsets and then identifying, within this candidate set, those itemsets that meet the large itemset requirement. Generally this is done iteratively for each large k-itemset in increasing order of k, where a large k-itemset is a large itemset with k items. To determine large itemsets from a huge number of candidate large itemsets in early iterations is usual...
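The candidate-then-count iteration described in this abstract can be sketched as a plain level-wise search. This is the generic scheme the abstract outlines, not the paper's hash-based method: the hashing, transaction trimming, and scan reduction named in the title are not shown, and the function name `large_itemsets` is an assumption.

```python
from itertools import combinations

def large_itemsets(transactions, min_count):
    """Level-wise search: returns {frozenset itemset: count} for large itemsets."""
    items = sorted({i for t in transactions for i in t})
    candidates = [frozenset([i]) for i in items]
    result = {}
    k = 1
    while candidates:
        # Count each candidate k-itemset with one scan of the transactions.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        large = {c: n for c, n in counts.items() if n >= min_count}
        result.update(large)
        k += 1
        # Join large (k-1)-itemsets into k-item candidates ...
        joined = {a | b for a in large for b in large if len(a | b) == k}
        # ... and keep only those whose (k-1)-subsets are all large.
        candidates = [
            c for c in joined
            if all(frozenset(s) in large for s in combinations(c, k - 1))
        ]
    return result

transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}]
for itemset, count in large_itemsets(transactions, min_count=2).items():
    print(sorted(itemset), count)
```

The subset check is what keeps later candidate sets small; the early iterations, where that check has little to prune, are where the abstract locates the cost.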
Knowledge Discovery from Telecommunication Network Alarm Databases
, 1996
Cited by 46 (8 self)
A telecommunication network produces large amounts of alarm data daily. The data contains valuable hidden knowledge about the behavior of the network. This knowledge can be used in filtering redundant alarms, locating problems in the network, and possibly in predicting severe faults. We describe the TASA (Telecommunication Network Alarm Sequence Analyzer) system for discovering and browsing knowledge from large alarm databases. The system is built on the basis of viewing knowledge discovery as an interactive and iterative process, containing data collection, pattern discovery, rule postprocessing, etc. The system uses a novel framework for locating frequently occurring episodes from sequential data. The TASA system offers a variety of selection and ordering criteria for episodes, and supports iterative retrieval from the discovered knowledge. This means that a large part of the iterative nature of the KDD process can be replaced by iteration in the rule postprocessing stage. The user i...
Treatment of Missing Values for Association Rules
, 1998
Cited by 14 (2 self)
Agrawal et al. [2] have proposed a fast algorithm to explore very large databases with association rules [1]. In many real-world applications missing values are often inevitable, and we will show that in this case association rules give bad results. We propose an approach to increase their resistance against missing values. The main idea is to cut a database into several valid databases (vdb) for a rule; a vdb must not have any missing values. We redefine support and confidence of rules for vdb. These definitions are fully compatible with [2], which is the core of all known algorithms. Simulations show that this approach outperforms the classic approach by factors of five to ten.
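One way to picture the valid-database idea is the sketch below. The abstract does not give the redefined formulas, so this is only an assumed reading: rows are dictionaries with `None` marking a missing value, the valid database for a rule is taken to be the rows with no missing value on any attribute the rule mentions, and support and confidence are computed inside that subset. The name `vdb_support_confidence` is hypothetical.

```python
def vdb_support_confidence(rows, antecedent, consequent):
    """Support and confidence of a rule, measured only on its valid rows.

    rows: list of dicts (attribute -> value, None if missing).
    antecedent, consequent: dicts of required attribute values.
    """
    attrs = set(antecedent) | set(consequent)
    # Assumed definition: a row is valid if none of the rule's attributes is missing.
    valid = [r for r in rows if all(r.get(a) is not None for a in attrs)]
    if not valid:
        return 0.0, 0.0

    def holds(row, condition):
        return all(row[a] == v for a, v in condition.items())

    n_antecedent = sum(holds(r, antecedent) for r in valid)
    n_both = sum(holds(r, antecedent) and holds(r, consequent) for r in valid)
    support = n_both / len(valid)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence

rows = [
    {"married": True, "cars": 2},
    {"married": True, "cars": None},   # missing value: excluded from the vdb
    {"married": True, "cars": 1},
    {"married": False, "cars": 2},
    {"married": None, "cars": 2},      # missing value: excluded from the vdb
]
print(vdb_support_confidence(rows, {"married": True}, {"cars": 2}))
```

Counting the two incomplete rows as non-matches instead would lower the measured support without any evidence against the rule, which is the kind of distortion the abstract attributes to missing values.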

