Discovery of multiple-level association rules from large databases (1995)

by J. Han, Y. Fu
Venue: Proc. of the 21st VLDB Conference

Results 1 - 10 of 463

Mining Sequential Patterns: Generalizations and Performance Improvements

by Ramakrishnan Srikant, Rakesh Agrawal - Research Report RJ 9994, IBM Almaden Research, 1995
Abstract - Cited by 759 (5 self)
The problem of mining sequential patterns was recently introduced in [3]. We are given a database of sequences, where each sequence is a list of transactions ordered by transaction-time, and each transaction is a set of items. The problem is to discover all sequential patterns with a user-specified minimum support, where the support of a pattern is the number of data-sequences that contain the pattern. An example of a sequential pattern is "5% of customers bought 'Foundation' and 'Ringworld' in one transaction, followed by 'Second Foundation' in a later transaction". We generalize the problem as follows. First, we add time constraints that specify a minimum and/or maximum time period between adjacent elements in a pattern. Second, we relax the restriction that the items in an element of a sequential pattern must come from the same transaction, instead allowing the items to be present in a set of transactions whose transaction-times are within a user-specified time window. Third, given a user-defined taxonomy (is-a hierarchy) on items, we allow sequential patterns to include items across all levels of the taxonomy. We present GSP, a new algorithm that discovers these generalized sequential patterns. Empirical evaluation using synthetic and real-life data indicates that GSP is much faster than the AprioriAll algorithm presented in [3]. GSP scales linearly with the number of data-sequences, and has very good scale-up properties with respect to the average data-sequence size.
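A minimal Python sketch of the basic support definition in this abstract, assuming no time constraints, no sliding time window and no taxonomy. It therefore covers only the original problem from [3], not GSP's generalizations. The function names `contains` and `support` and the toy data are hypothetical. A data-sequence contains a pattern when each element of the pattern is a subset of a distinct transaction, in transaction-time order.

```python
from typing import FrozenSet, List, Sequence

Element = FrozenSet[str]  # one transaction, or one element of a pattern


def contains(data_sequence: Sequence[Element], pattern: Sequence[Element]) -> bool:
    """Check whether each pattern element is a subset of a transaction that
    comes after the one matched by the previous element. Greedy leftmost
    matching is sufficient because no time constraints or windows apply."""
    pos = 0
    for element in pattern:
        # Skip transactions that do not contain the whole element.
        while pos < len(data_sequence) and not element <= data_sequence[pos]:
            pos += 1
        if pos == len(data_sequence):
            return False
        pos += 1  # the next element must come from a strictly later transaction
    return True


def support(database: List[Sequence[Element]], pattern: Sequence[Element]) -> int:
    """Support = number of data-sequences that contain the pattern."""
    return sum(1 for seq in database if contains(seq, pattern))


fs = frozenset
database = [
    [fs({"Foundation", "Ringworld"}), fs({"Second Foundation"})],
    [fs({"Foundation"}), fs({"Ringworld"}), fs({"Second Foundation"})],
    [fs({"Ringworld", "Foundation", "Dune"}), fs({"Neuromancer"}), fs({"Second Foundation"})],
]
pattern = [fs({"Foundation", "Ringworld"}), fs({"Second Foundation"})]
print(support(database, pattern))  # 2
```

The second customer does not count because 'Foundation' and 'Ringworld' were bought in different transactions. This is the restriction that GSP's time-window generalization relaxes.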

Data Mining: An Overview from Database Perspective

by Ming-Syan Chen, Jiawei Han, Philip S. Yu - IEEE Transactions on Knowledge and Data Engineering, 1996
Abstract - Cited by 532 (26 self)
Mining information and knowledge from large databases has been recognized by many researchers as a key research topic in database systems and machine learning, and by many industrial companies as an important area with an opportunity for major revenues. Researchers in many different fields have shown great interest in data mining. Several emerging applications in information providing services, such as data warehousing and on-line services over the Internet, also call for various data mining techniques to better understand user behavior, to improve the service provided, and to increase the business opportunities. In response to such a demand, this article provides a survey, from a database researcher's point of view, of the data mining techniques developed recently. A classification of the available data mining techniques is provided and a comparative study of such techniques is presented.

Citation Context

...e which can be discovered in a database are categorized as follows. Mining association rules in transactional or relational databases has recently attracted a lot of attention in database communities [4, 7, 39, 57, 66, 73, 78]. The task is to derive a set of strong association rules in the form of "$A_1 \wedge \cdots \wedge A_m \Rightarrow B_1 \wedge \cdots \wedge B_n$," where $A_i$ (for $i \in \{1, \ldots, m\}$) and $B_j$ (for $j \in \{1, \ldots, n\}$)...
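To make "strong" concrete, here is a small Python sketch that assumes the usual definitions. A rule's support is the fraction of transactions containing every item on both sides. Its confidence is that count divided by the number of transactions containing the antecedent. A rule is strong when both values meet user-given thresholds. Items here are plain strings standing in for attribute-values, and the data and thresholds are made up.

```python
from typing import FrozenSet, List, Tuple


def rule_stats(transactions: List[FrozenSet[str]],
               antecedent: FrozenSet[str],
               consequent: FrozenSet[str]) -> Tuple[float, float]:
    """Return (support, confidence) of antecedent => consequent."""
    both = antecedent | consequent
    count_a = sum(1 for t in transactions if antecedent <= t)
    count_ab = sum(1 for t in transactions if both <= t)
    support = count_ab / len(transactions) if transactions else 0.0
    confidence = count_ab / count_a if count_a else 0.0
    return support, confidence


def is_strong(transactions: List[FrozenSet[str]], antecedent: FrozenSet[str],
              consequent: FrozenSet[str], min_support: float,
              min_confidence: float) -> bool:
    """A rule is strong if it meets both user-specified thresholds."""
    s, c = rule_stats(transactions, antecedent, consequent)
    return s >= min_support and c >= min_confidence


transactions = [frozenset(t) for t in (
    {"milk", "bread"}, {"milk", "bread", "butter"}, {"bread"}, {"milk", "butter"})]
print(rule_stats(transactions, frozenset({"milk"}), frozenset({"bread"})))
# (0.5, 0.6666666666666666)
```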

Sampling Large Databases for Association Rules

by Hannu Toivonen, 1996
Abstract - Cited by 470 (3 self)
Discovery of association rules is an important database mining problem. Current algorithms for finding association rules require several passes over the analyzed database, and obviously the role of I/O overhead is very significant for very large databases. We present new algorithms that reduce the database activity considerably. The idea is to pick a random sample, to find, using this sample, all association rules that probably hold in the whole database, and then to verify the results with the rest of the database. The algorithms thus produce exact association rules, not approximations based on a sample. The approach is, however, probabilistic, and in those rare cases where our sampling method does not produce all association rules, the missing rules can be found in a second pass. Our experiments show that the proposed algorithms can find association rules very efficiently in only one database pass.
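The sample-then-verify idea can be sketched in Python as follows. This is a simplified reading, not the paper's exact algorithm: the lowered threshold `lowered_frac` and the `sample_size` are plain parameters here, itemsets are mined from the sample by a simple levelwise search, and the full database is then checked for those itemsets plus their negative border (the minimal itemsets that were not frequent in the sample). If any negative-border itemset turns out to be frequent, some itemsets may have been missed and a second pass would be needed. The function only reports that case. All names are hypothetical.

```python
import random
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple

Itemset = FrozenSet[str]


def count(transactions: List[Itemset], itemset: Itemset) -> int:
    return sum(1 for t in transactions if itemset <= t)


def levelwise_frequent(transactions: List[Itemset], min_count: float) -> Set[Itemset]:
    """Plain Apriori-style levelwise search for itemsets with count >= min_count."""
    items = sorted({i for t in transactions for i in t})
    frequent: Set[Itemset] = set()
    level = [frozenset([i]) for i in items]
    while level:
        current = [c for c in level if count(transactions, c) >= min_count]
        frequent.update(current)
        known = set(current)
        candidates = set()
        for a, b in combinations(current, 2):
            u = a | b
            # keep (k+1)-sets whose k-subsets are all frequent
            if len(u) == len(a) + 1 and all(u - {x} in known for x in u):
                candidates.add(u)
        level = list(candidates)
    return frequent


def negative_border(frequent: Set[Itemset], items: List[str]) -> Set[Itemset]:
    """Minimal itemsets not in `frequent` whose proper subsets all are."""
    border = {frozenset([i]) for i in items if frozenset([i]) not in frequent}
    for s in frequent:
        for i in items:
            cand = s | {i}
            if (i not in s and cand not in frequent
                    and all(cand - {x} in frequent for x in cand)):
                border.add(cand)
    return border


def sample_and_verify(db: List[Itemset], min_frac: float, sample_size: int,
                      lowered_frac: float, seed: int = 0) -> Tuple[Set[Itemset], bool]:
    rng = random.Random(seed)
    sample = rng.sample(db, sample_size)
    items = sorted({i for t in db for i in t})
    in_sample = levelwise_frequent(sample, lowered_frac * sample_size)
    border = negative_border(in_sample, items)
    min_count = min_frac * len(db)
    # Verify on the full database (conceptually one pass; counted per itemset here).
    counts = {c: count(db, c) for c in in_sample | border}
    result = {c for c, n in counts.items() if n >= min_count}
    possible_miss = any(c in result for c in border)
    return result, possible_miss


db = [frozenset(t) for t in ("abc", "ab", "ac", "bc", "abc", "d")]
result, possible_miss = sample_and_verify(db, min_frac=0.5, sample_size=4, lowered_frac=0.4)
```

Lowering the threshold on the sample makes it less likely that an itemset frequent in the whole database is missed. Counting the negative border is what lets the method detect such a miss.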

Mining Quantitative Association Rules in Large Relational Tables

by Ramakrishnan Srikant, Rakesh Agrawal, 1996
Abstract - Cited by 444 (3 self)
We introduce the problem of mining association rules in large relational tables containing both quantitative and categorical attributes. An example of such an association might be "10% of married people between age 50 and 60 have at least 2 cars". We deal with quantitative attributes by fine-partitioning the values of the attribute and then combining adjacent partitions as necessary. We introduce measures of partial completeness which quantify the information lost due to partitioning. A direct application of this technique can generate too many similar rules. We tackle this problem by using a "greater-than-expected-value" interest measure to identify the interesting rules in the output. We give an algorithm for mining such quantitative association rules. Finally, we describe the results of using this approach on a real-life dataset. 1 Introduction: Data mining, also known as knowledge discovery in databases, has been recognized as a new area for database research. The problem of discove...
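A rough Python sketch of the partition-and-combine step for one quantitative attribute. The equal-width base intervals, the `min_support`/`max_support` bounds on a combined range, the toy ages and all names are assumptions for illustration. The paper's own partitioning choices, its partial-completeness measure and its interest measure are not reproduced.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [lo, hi)


def base_intervals(lo: int, hi: int, width: int) -> List[Interval]:
    """Fine-partition [lo, hi) into equal-width base intervals."""
    return [(s, min(s + width, hi)) for s in range(lo, hi, width)]


def combined_ranges(values: List[float], intervals: List[Interval],
                    min_support: float, max_support: float) -> List[Interval]:
    """Combine runs of adjacent base intervals into ranges whose support
    (fraction of values inside the range) lies in [min_support, max_support]."""
    n = len(values)
    ranges = []
    for i in range(len(intervals)):
        for j in range(i, len(intervals)):
            lo, hi = intervals[i][0], intervals[j][1]
            sup = sum(1 for v in values if lo <= v < hi) / n
            if sup > max_support:
                break  # widening the range further cannot lower its support
            if sup >= min_support:
                ranges.append((lo, hi))
    return ranges


ages = [23, 25, 31, 35, 38, 52, 55, 58, 61, 64]
print(combined_ranges(ages, base_intervals(20, 70, 10), 0.3, 0.5))
# [(20, 40), (20, 50), (30, 40), (30, 50), (40, 60), (40, 70), (50, 60), (50, 70)]
```

Each surviving range, such as (50, 60) for "Age: 50..59", can then serve as a boolean item in rules like the example quoted above. The upper bound keeps a single range from swallowing most of the records.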

Citation Context

...cal attribute values unless a taxonomy (is-a hierarchy) is present on the attribute. In this case, the taxonomy can be used to implicitly combine values of a categorical attribute (see [SA95], [HF95]). Using a taxonomy in this manner is somewhat similar to considering ranges over quantitative attributes. (Truncated table: RecID | Age: 20..29 | Age: 30..39 | Married: Yes | Married: No | NumCars: 0 | NumCars: 1 | NumCars: 2; first record 100 ...)

New Algorithms for Fast Discovery of Association Rules

by Mohammed Javeed Zaki, Srinivasan Parthasarathy, Mitsunori Ogihara, Wei Li - In 3rd Intl. Conf. on Knowledge Discovery and Data Mining, 1997
Abstract - Cited by 397 (26 self)
Association rule discovery has emerged as an important problem in knowledge discovery and data mining. The association mining task consists of identifying the frequent itemsets, and then forming conditional implication rules among them. In this paper we present efficient algorithms for the discovery of frequent itemsets, which forms the compute-intensive phase of the task. The algorithms utilize the structural properties of frequent itemsets to facilitate fast discovery. The related database items are grouped together into clusters representing the potential maximal frequent itemsets in the database. Each cluster induces a sub-lattice of the itemset lattice. Efficient lattice traversal techniques are presented, which quickly identify all the true maximal frequent itemsets, and all their subsets if desired. We also present the effect of using different database layout schemes combined with the proposed clustering and traversal techniques. The proposed algorithms scan a (pre-processed) d...
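The vertical database layout mentioned in this abstract can be illustrated with a short Python sketch of tid-list intersection combined with depth-first traversal of the itemset lattice, in the style commonly called Eclat. It is a simplification under our own assumptions: the clustering into potential maximal itemsets and the paper's specific traversal strategies are omitted, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple


def vertical_layout(transactions: List[Set[str]]) -> Dict[str, Set[int]]:
    """Map each item to the set of transaction ids (tid-list) containing it."""
    tidlists: Dict[str, Set[int]] = defaultdict(set)
    for tid, t in enumerate(transactions):
        for item in t:
            tidlists[item].add(tid)
    return tidlists


def eclat_extend(prefix: FrozenSet[str], members: List[Tuple[str, Set[int]]],
                 min_count: int, out: Dict[FrozenSet[str], int]) -> None:
    """Depth-first search over the sub-lattice rooted at `prefix`. The support
    of a larger itemset is the size of the intersection of its tid-lists."""
    for i, (item, tids) in enumerate(members):
        itemset = prefix | {item}
        out[itemset] = len(tids)
        children = []
        for other, other_tids in members[i + 1:]:
            common = tids & other_tids
            if len(common) >= min_count:
                children.append((other, common))
        if children:
            eclat_extend(itemset, children, min_count, out)


def frequent_itemsets(transactions: List[Set[str]], min_count: int) -> Dict[FrozenSet[str], int]:
    tidlists = vertical_layout(transactions)
    start = sorted((i, t) for i, t in tidlists.items() if len(t) >= min_count)
    out: Dict[FrozenSet[str], int] = {}
    eclat_extend(frozenset(), start, min_count, out)
    return out


db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}, {"d"}]
print(frequent_itemsets(db, 3))
```

With a vertical layout, support counting becomes set intersection, so the database itself does not have to be rescanned for each new level.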

Citation Context

...transaction or it isn't. Other extensions of association rules include mining over data where the quantity of items is also considered [28], or mining for rules in the presence of a taxonomy on items [6, 15]. There has also been work in finding frequent sequences of itemsets over temporal data [6, 22, 29, 20]. 1.1 Contribution: The main limitation of almost all proposed algorithms [2, 5, 21, 23, 3] is that ...

Discovery of frequent episodes in event sequences

by Heikki Mannila, Hannu Toivonen - Data Min. Knowl. Discov., 1997
Abstract - Cited by 362 (13 self)
Abstract. Sequences of events describing the behavior and actions of users or systems can be collected in several domains. An episode is a collection of events that occur relatively close to each other in a given partial order. We consider the problem of discovering frequently occurring episodes in a sequence. Once such episodes are known, one can produce rules for describing or predicting the behavior of the sequence. We give efficient algorithms for the discovery of all frequent episodes from a given class of episodes, and present detailed experimental results. The methods are in use in telecommunication alarm management. Keywords: event sequences, frequent episodes, sequence analysis
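As an illustration of window-based frequency for a serial episode (event types that must occur in a fixed order), here is a Python sketch. It follows our reading of the paper's window-based definition, stated here as assumptions. Windows of width `win` slide one time unit at a time over every start point whose window overlaps the sequence interval [ts, te). The frequency is the fraction of those windows in which the episode occurs. The episode's events must have strictly increasing times. Parallel episodes and the paper's incremental recognition of candidates are not shown, and the names and data are hypothetical.

```python
from typing import Sequence, Tuple

Event = Tuple[str, int]  # (event type, time); the event list is sorted by time


def occurs_serial(events: Sequence[Event], episode: Sequence[str],
                  start: int, end: int) -> bool:
    """True if the episode's event types appear in order, with strictly
    increasing times, among events with start <= time < end."""
    k, last_time = 0, None
    for etype, t in events:
        if not (start <= t < end) or k == len(episode):
            continue
        if etype == episode[k] and (last_time is None or t > last_time):
            k += 1
            last_time = t
    return k == len(episode)


def window_frequency(events: Sequence[Event], episode: Sequence[str],
                     ts: int, te: int, win: int) -> float:
    """Fraction of width-`win` windows [s, s + win) that contain the episode,
    over all starts s with ts - win < s < te."""
    starts = range(ts - win + 1, te)
    hits = sum(1 for s in starts if occurs_serial(events, episode, s, s + win))
    return hits / len(starts)


events = [("A", 1), ("B", 2), ("C", 4), ("A", 6), ("B", 9)]
print(window_frequency(events, ["A", "B"], ts=1, te=10, win=3))  # 2 of 11 windows, about 0.18
```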

Citation Context

... are similar to serial episodes with some extra restrictions and an event taxonomy. Our methods can be extended with a taxonomy by a direct application of the similar extensions to association rules (Han and Fu, 1995; Holsheimer et al., 1995; Srikant and Agrawal, 1995). Also, our methods can be applied on analyzing several sequences; there is actually a variety of choices for the definition of frequency (or suppor...

Parallel Mining of Association Rules

by Rakesh Agrawal, John C. Shafer - IEEE Transactions on Knowledge and Data Engineering, 1996
Abstract - Cited by 325 (3 self)
We consider the problem of mining association rules on a shared-nothing multiprocessor. We present three algorithms that explore a spectrum of trade-offs between computation, communication, memory usage, synchronization, and the use of problem-specific information. The best algorithm exhibits near-perfect scaleup behavior, yet requires only minimal overhead compared to the current best serial algorithm.
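One point on that spectrum is to replicate the candidate set on every node and partition only the data. Each node then counts the candidates over its local partition, and the per-node counts are summed. To our understanding this corresponds to the paper's Count Distribution algorithm. The Python sketch below only simulates the nodes with a loop (no real communication or synchronization). Its names and data are hypothetical.

```python
from collections import Counter
from typing import FrozenSet, Iterable, List

Itemset = FrozenSet[str]


def local_counts(partition: Iterable[Itemset], candidates: List[Itemset]) -> Counter:
    """Counts computed by one node over its own share of the transactions."""
    counts: Counter = Counter()
    for t in partition:
        for c in candidates:
            if c <= t:
                counts[c] += 1
    return counts


def count_distribution_pass(partitions: List[List[Itemset]],
                            candidates: List[Itemset]) -> Counter:
    """One counting pass: every node counts all candidates locally, then the
    local counts are summed (standing in for the exchange between processors)."""
    total: Counter = Counter()
    for part in partitions:
        total.update(local_counts(part, candidates))  # Counter.update adds counts
    return total


partitions = [[frozenset("ab"), frozenset("a")], [frozenset("abc")], [frozenset("bc")]]
candidates = [frozenset("ab"), frozenset("bc"), frozenset("ac")]
total = count_distribution_pass(partitions, candidates)
print(total[frozenset("ab")], total[frozenset("bc")], total[frozenset("ac")])  # 2 2 1
```

In this scheme only counts cross node boundaries, so the data exchanged per pass depends on the number of candidates rather than on the size of the database. The cost is that every node must hold the full candidate set in memory.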

Citation Context

...decision-support applications. Discovering association rules is an important data mining problem [1]. Recently, there has been considerable research in designing fast algorithms for this task [1] [3] [5] [6] [8] [12] [9] [11]. However, with the exception of [10], the work so far has been concentrated on designing serial algorithms. Since the databases to be mined are often very large (measured in gig...

Exploratory Mining and Pruning Optimizations of Constrained Associations Rules

by Raymond T. Ng, Laks V.S. Lakshmanan, Alex Pang, Jiawei Han, 1998
Abstract - Cited by 313 (44 self)
From the standpoint of supporting human-centered discovery of knowledge, the present-day model of mining association rules suffers from the following serious shortcomings: (i) lack of user exploration and control, (ii) lack of focus, and (iii) rigid notion of relationships. In effect, this model functions as a black-box, admitting little user interaction in between. We propose, in this paper, an architecture that opens up the black-box, and supports constraint-based, human-centered exploratory mining of associations. The foundation of this architecture is a rich set of constraint constructs, including domain, class, and SQL-style aggregate constraints, which enable users to clearly specify what associations are to be mined. We propose constrained association queries as a means of specifying the constraints to be satisfied by the antecedent and consequent of a mined association.
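One way to use such constraints during mining, rather than filtering results afterwards, is to push an anti-monotone constraint into levelwise candidate generation. An anti-monotone constraint is one that, once violated by an itemset, is violated by all of its supersets. The Python sketch below assumes a made-up price table and a `sum(price) <= 35` budget constraint, which is anti-monotone when prices are non-negative. The paper's richer constraint language and its other optimizations (for example, for succinct constraints) are not shown, and the names are hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Set

Itemset = FrozenSet[str]


def constrained_apriori(transactions: List[Set[str]], min_count: int,
                        satisfies: Callable[[Itemset], bool]) -> Dict[Itemset, int]:
    """Levelwise mining that drops candidates violating an anti-monotone
    constraint before their support is counted."""
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items if satisfies(frozenset([i]))]
    result: Dict[Itemset, int] = {}
    while level:
        kept = []
        for c in level:
            n = sum(1 for t in transactions if c <= t)
            if n >= min_count:
                result[c] = n
                kept.append(c)
        kept_set = set(kept)
        next_level = set()
        for a, b in combinations(kept, 2):
            u = a | b
            if (len(u) == len(a) + 1
                    and all(u - {x} in kept_set for x in u)
                    and satisfies(u)):  # violating sets are pruned, never counted
                next_level.add(u)
        level = list(next_level)
    return result


prices = {"a": 10, "b": 20, "c": 40}  # hypothetical item prices


def budget_ok(s: Itemset) -> bool:
    return sum(prices[i] for i in s) <= 35


db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
print(sorted("".join(sorted(s)) for s in constrained_apriori(db, 3, budget_ok)))
# ['a', 'ab', 'b']
```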

Citation Context

...ed on the levelwise Apriori framework [3, 13], partitioning [19, 18], and sampling [24]; (ii) incremental updating and parallel algorithms [6, 2, 8]; (iii) mining of generalized and multi-level rules [21, 9]; (iv) mining of quantitative rules [22, 16]; (v) mining of multidimensional rules [7, 14, 12]; (vi) mining rules with item constraints [23]; and (vii) association-rule based query languages [15, 4]. ...

Mining Association Rules with Item Constraints

by Ramakrishnan Srikant, Quoc Vu, Rakesh Agrawal
Abstract - Cited by 289 (0 self)
The problem of discovering association rules has received considerable research attention and several fast algorithms for mining association rules have been developed. In practice, users are often interested in a subset of association rules. For example, they may only want rules that contain a specific item or rules that contain children of a specific item in a hierarchy. While such constraints can be applied as a postprocessing step, integrating them into the mining algorithm can dramatically reduce the execution time. We consider the problem of integrating constraints that are boolean expressions over the presence or absence of items into the association discovery algorithm. We present three integrated algorithms for mining association rules with item constraints and discuss their tradeoffs. 1. Introduction: The problem of discovering association rules was introduced in (Agrawal, Imielinski, & Swami 1993). Given a set of transactions, where each transaction is a set of literals (call...
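To make "boolean expressions over the presence or absence of items" concrete, the Python sketch below assumes the expression is given in disjunctive normal form, where each disjunct is a set of required items plus a set of forbidden items. It applies the expression as the simple postprocessing filter that the abstract contrasts with the integrated algorithms. The item names, the representation and the function names are illustrative assumptions.

```python
from typing import FrozenSet, Iterable, List, Tuple

Itemset = FrozenSet[str]
Conjunct = Tuple[Itemset, Itemset]  # (items that must be present, items that must be absent)


def satisfies(itemset: Itemset, dnf: List[Conjunct]) -> bool:
    """True if at least one conjunct holds for the itemset."""
    return any(required <= itemset and not (forbidden & itemset)
               for required, forbidden in dnf)


def postprocess(itemsets: Iterable[Itemset], dnf: List[Conjunct]) -> List[Itemset]:
    """Keep only the (already mined) itemsets that satisfy the constraint."""
    return [s for s in itemsets if satisfies(s, dnf)]


# (Jackets AND NOT Shoes) OR Hiking Boots
dnf = [(frozenset({"Jackets"}), frozenset({"Shoes"})),
       (frozenset({"Hiking Boots"}), frozenset())]
mined = [frozenset({"Jackets"}), frozenset({"Jackets", "Shoes"}),
         frozenset({"Shoes", "Hiking Boots"}), frozenset({"Shoes"})]
print(len(postprocess(mined, dnf)))  # 2
```

An integrated algorithm would instead use the constraint while generating candidates. For example, every qualifying itemset must include all required items of at least one disjunct. The sketch does not attempt that.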

Citation Context

...that people bought Jackets with Hiking Boots and Ski Pants with Hiking Boots. This generalization of association rules and algorithms for finding such rules are described in (Srikant & Agrawal 1995) (Han & Fu 1995). In practice, users are often interested only in a subset of associations, for instance, those containing at least one item from a user-defined subset of items. When taxonomies are present, this set...
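The generalization from Jackets and Ski Pants to a common ancestor can be illustrated with the simplest approach: add each item's ancestors in the is-a hierarchy to every transaction before counting support. The Python sketch below does that with a made-up taxonomy (Outerwear and Clothes above Jackets and Ski Pants, Footwear above Hiking Boots). It is only a baseline for illustration. It is not the level-by-level method of Han & Fu, nor the optimized algorithms of Srikant & Agrawal, and the names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set

Taxonomy = Dict[str, str]  # child -> parent in the is-a hierarchy


def ancestors(item: str, parent: Taxonomy) -> Set[str]:
    """All strict ancestors of `item` (assumes the taxonomy has no cycles)."""
    out = set()
    while item in parent:
        item = parent[item]
        out.add(item)
    return out


def extend(transaction: Set[str], parent: Taxonomy) -> FrozenSet[str]:
    """The transaction plus the ancestors of all its items."""
    ext = set(transaction)
    for item in transaction:
        ext |= ancestors(item, parent)
    return frozenset(ext)


def support_count(transactions: List[Set[str]], itemset: Set[str], parent: Taxonomy) -> int:
    return sum(1 for t in transactions if itemset <= extend(t, parent))


parent = {"Jackets": "Outerwear", "Ski Pants": "Outerwear", "Outerwear": "Clothes",
          "Shirts": "Clothes", "Shoes": "Footwear", "Hiking Boots": "Footwear"}
transactions = [{"Jackets", "Hiking Boots"}, {"Ski Pants", "Hiking Boots"},
                {"Shirts"}, {"Shoes", "Jackets"}]
print(support_count(transactions, {"Jackets", "Hiking Boots"}, parent))    # 1
print(support_count(transactions, {"Outerwear", "Hiking Boots"}, parent))  # 2
```

In this toy data neither leaf-level itemset reaches a count of 2, but the generalized itemset does. Mining across taxonomy levels can therefore surface associations that no single leaf-level rule shows.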

Levelwise Search and Borders of Theories in Knowledge Discovery

by Heikki Mannila, Hannu Toivonen, 1997
Abstract - Cited by 263 (15 self)
One of the basic problems in knowledge discovery in databases (KDD) is the following: given a data set r, a class L of sentences for defining subgroups of r, and a selection predicate, find all sentences of L deemed interesting by the selection predicate. We analyze the simple levelwise algorithm for finding all such descriptions. We give bounds for the number of database accesses that the algorithm makes. For this, we introduce the concept of the border of a theory, a notion that turns out to be surprisingly powerful in analyzing the algorithm. We also consider the verification problem of a KDD process: given r and a set of sentences $S \subseteq L$, determine whether S is exactly the set of interesting statements about r. We show strong connections between the verification problem and the hypergraph transversal problem. The verification problem arises in a natural way when using sampling to speed up the pattern discovery step in KDD.
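For itemsets, the levelwise algorithm and the two borders can be sketched in Python as below. The selection predicate `q` is assumed to be monotone toward generalizations: if an itemset is interesting, so are all of its subsets, which holds for "frequency above a threshold". Because a candidate is generated only when all of its immediate subsets are interesting, the candidates that fail `q` are exactly the minimal uninteresting itemsets, that is, the negative border. The evaluated itemsets are then the theory plus its negative border, which, as we understand it, is the quantity the paper's analysis of database accesses is built around. The names and data are hypothetical.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, Set, Tuple

Sentence = FrozenSet[str]


def levelwise(items: Iterable[str],
              q: Callable[[Sentence], bool]) -> Tuple[Set[Sentence], Set[Sentence]]:
    """Return (theory, evaluated). The empty set is taken as trivially interesting."""
    theory: Set[Sentence] = set()
    evaluated: Set[Sentence] = set()
    level = {frozenset([i]) for i in items}
    while level:
        interesting = set()
        for x in level:
            evaluated.add(x)
            if q(x):
                interesting.add(x)
        theory |= interesting
        level = set()
        for a, b in combinations(interesting, 2):
            u = a | b
            if len(u) == len(a) + 1 and all(u - {y} in theory for y in u):
                level.add(u)
    return theory, evaluated


def borders(theory: Set[Sentence],
            evaluated: Set[Sentence]) -> Tuple[Set[Sentence], Set[Sentence]]:
    positive = {x for x in theory if not any(x < y for y in theory)}  # maximal interesting
    negative = evaluated - theory  # failed candidates = minimal uninteresting
    return positive, negative


db = [frozenset(t) for t in ("abc", "ab", "ac", "bc", "abc", "d")]


def frequent(x: Sentence) -> bool:
    return sum(1 for t in db if x <= t) >= 3


theory, evaluated = levelwise("abcd", frequent)
positive, negative = borders(theory, evaluated)
print(sorted("".join(sorted(x)) for x in negative))  # ['abc', 'd']
```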