M. A. W. Houtsma and A. N. Swami. Set-oriented mining for association rules in relational databases. International Conference on Data Engineering, pages 25--33, 1995.
....Data Skewness, Workload Balance, Parallel Mining, Partitioning. This research is supported in part by RGC (the Hong Kong Research Grants Council) grant, project number HKU 7023/98E. 1 Introduction. Mining association rules in large databases is an important problem in data mining research [1, 2, 4, 6, 11, 12, 15, 17, 19, 20, 24]. It can be reduced to finding large itemsets with respect to a given support threshold [1, 2]. The problem demands a lot of CPU resources and disk I/O to solve. It needs to scan all the transactions in a database, which introduces a lot of I/O, and at the same time search through a large set of ....
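The reduction to large itemsets mentioned above can be made concrete with a small sketch. It assumes transactions are Python frozensets of item names and that minsup is a fraction: the support of an itemset is the fraction of transactions that contain it, and an itemset is large when its support reaches minsup. The names `support` and `large_itemsets_bruteforce` and the toy basket data are illustrative only, not taken from any cited paper. The exhaustive enumeration also shows why the problem is expensive: it examines every subset of the items and rescans all transactions for each one.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    target = frozenset(itemset)
    hits = sum(1 for t in transactions if target <= t)
    return hits / len(transactions)


def large_itemsets_bruteforce(transactions, minsup):
    """Enumerate every itemset over the observed items and keep those whose
    support is at least `minsup`. Exponential in the number of items, so this
    is for illustration only."""
    items = sorted(set().union(*transactions))
    large = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = support(combo, transactions)  # one full scan per itemset
            if s >= minsup:
                large[frozenset(combo)] = s
    return large


# Hypothetical toy basket data.
transactions = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
)]
large = large_itemsets_bruteforce(transactions, minsup=0.5)
```

With minsup = 0.5, the three single items plus {bread, milk} and {bread, butter} are large, while {butter, milk} (support 0.25) is not.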
M. A. W. Houtsma and A. N. Swami. Set-oriented mining for association rules in relational databases. In Proc. of the 11th Int. Conf. on Data Eng., Taipei, 1995.
....as Oracle DBMS or the Sun Solaris operating system. More focus is required for techniques that can be applied to generic system performance analysis. To address these problems, we propose the Analyzer for Data Mining Results (ADMIRe) technique, which uses results from data mining [2, 5, 7, 6, 8, 12, 16, 19] operations on system performance data. We argue that data mining techniques are very useful for system performance analysis due to the completeness of the results. This systematic approach allows us to thoroughly examine the relationships among a large number of parameters of ....
Maurice Houtsma, Arun Swami, "Set-Oriented Mining for Association Rules in Relational Databases", 11th International Conference on Data Engineering.
....than the related association rules. We use the terms rules and large item sets interchangeably throughout the paper. The reader is referred to [4] for the method of generating association rules from large item sets. A number of algorithms have been developed for discovering large item sets [5, 6, 7, 8]. The Apriori algorithm [5] is the state of the art in this area; it has smaller computational complexity than the other algorithms. (Footnote: A transaction is a set of items.) The high-level structure of Apriori is given in Figure 1: 1. L_1 = {large 1-itemsets}; 2. for (k = 2; L_{k-1} ≠ ∅; k ....
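The level-wise loop outlined in Figure 1 of the quoted paper can be sketched in Python as below. This is a simplified illustration, assuming transactions are sets of comparable items, itemsets are sorted tuples, and the threshold is an absolute count; the function names are hypothetical. Candidates for level k are formed by joining large (k-1)-itemsets that share their first k-2 items and pruning any candidate with a (k-1)-subset that is not large; each level then costs one pass over the transactions.

```python
from collections import Counter
from itertools import combinations


def apriori_gen(prev_large, k):
    """Candidate k-itemsets from the large (k-1)-itemsets (sorted tuples).

    Join: merge two (k-1)-itemsets that share their first k-2 items.
    Prune: drop a candidate if any of its (k-1)-subsets is not large.
    """
    prev = sorted(prev_large)
    prev_set = set(prev)
    candidates = set()
    for i, a in enumerate(prev):
        for b in prev[i + 1:]:
            if a[:k - 2] == b[:k - 2]:
                cand = a + (b[-1],)  # sorted, because a[-1] < b[-1]
                if all(sub in prev_set for sub in combinations(cand, k - 1)):
                    candidates.add(cand)
    return candidates


def apriori(transactions, min_count):
    """Return {sorted itemset tuple: count} for every large itemset."""
    counts = Counter(item for t in transactions for item in t)
    large = {(item,) for item, n in counts.items() if n >= min_count}
    result = {c: counts[c[0]] for c in large}
    k = 2
    while large:  # for (k = 2; L_{k-1} is non-empty; k++)
        candidates = apriori_gen(large, k)
        cand_counts = Counter()
        for t in transactions:  # one pass over the database per level
            for cand in candidates:
                if t.issuperset(cand):
                    cand_counts[cand] += 1
        large = {c for c in candidates if cand_counts[c] >= min_count}
        result.update((c, cand_counts[c]) for c in large)
        k += 1
    return result


transactions = [{"bread", "milk"}, {"bread", "butter"},
                {"bread", "milk", "butter"}, {"milk"}]
result = apriori(transactions, min_count=2)
```

The prune step is what keeps the candidate set small: for example, (bread, butter, milk) is never counted here because its subset (butter, milk) is not large.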
M. A. W. Houtsma and A. N. Swami. Set-oriented mining for association rules in relational databases. In Proceedings of the 11th International Conference on Data Engineering, pages 25-33, Taipei, Taiwan, 1995.
....retained. Faster algorithms for mining association rules were proposed in [3], while a hash-based algorithm was established in [17]. Generalized association rules were presented in [21]. Methods for mining quantitative association rules were established in [22]. Other related work may be found in [9, 11, 19]. An up-to-date survey of some of the work done in data mining may be found in [6]. In this paper we consider the problem of online mining of association rules. The idea in online mining is that an end user ought to be able to query the database for association rules at differing values of ....
Houtsma M., and Swami A. Set-oriented Mining for Association Rules in Relational Databases. Proceedings of the 11th International Conference on Data Engineering. March 1995, pages 25-33.
....for the OS kernel. The databases are stored on a non-local 2GB disk and there is exactly one network port for the machine. As a result, disk accesses are inherently sequential. We used different synthetic databases that have been used as benchmark databases for many association rules algorithms [1, 2, 10, 23, 27, 28, 33, 36, 47]. The dataset generation procedure is described in [2] and the code is publicly available from IBM (http://www.almaden.ibm.com/cs/quest/syndata.html). These datasets mimic the transactions in a retailing environment, where people tend to buy sets of items together, the so-called potential ....
....It eliminates false sharing and synchronization completely, but it retains good locality. It thus performs the best for the large data sets. 7. Related Work 7.1. Association Mining 7.1.1. Sequential Algorithms Several algorithms for mining associations have been proposed in the literature [1, 2, 10, 23, 27, 28, 30, 32, 33, 36, 40]. The Apriori algorithm [2] is the best-known previous algorithm, and it uses an efficient candidate generation procedure, such that only the frequent itemsets at a level are used to construct candidates at the next level. However, it requires multiple database scans. The DHP algorithm [33] tries to ....
[Article contains additional citation context not shown here]
M. Houtsma and A. Swami. Set-oriented mining of association rules in relational databases. In 11th Intl. Conf. Data Engineering, 1995.
....from past transactions to improve the quality of decisions taken for the future. Serving these purposes, data mining has emerged as a new research area in databases. In the area of data mining, the problem of discovering association rules has recently received a great deal of attention [1, 2, 4, 6, 10]. In this problem, we are given a set of items and a large collection of transactions which are subsets of these items. The task is to find relationships between the presence of various items within those transactions. Mining association rules is one of the fundamental problems in data mining and ....
M. Houtsma and A. Swami, Set-Oriented Mining for Association Rules in Relational Databases, Proceedings of the 11th International Conference on Data Engineering, 25-33, 1995.
....tree is incremented. AprioriTID [2] is an extension of the basic Apriori approach. Instead of relying on the raw database, AprioriTID internally represents each transaction by the current candidates it contains. With AprioriHybrid both approaches are combined, cf. [2]. To some extent SETM [13] is also an Apriori(TID)-like algorithm, which is intended to be implemented directly in SQL. DIC is a further variation of the Apriori algorithm [7]. DIC softens the strict separation between counting and generating candidates. Whenever a candidate reaches minsupp, that is, even when this candidate has ....
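AprioriTID's re-encoding of transactions can be sketched as follows. Assuming transactions are sets of comparable items and itemsets are sorted tuples, each transaction is replaced by the set of candidates it contains. A k-candidate is in a transaction exactly when both (k-1)-itemsets that were joined to form it are in that transaction's entry, and entries that contain no candidates are dropped, so later passes read a shrinking encoded database instead of the raw one. The function names are hypothetical and this is a simplified sketch, not a faithful reproduction of [2].

```python
from collections import Counter
from itertools import combinations


def gen_candidates(large_prev, k):
    """Apriori join + prune over a set of sorted (k-1)-tuples."""
    prev = sorted(large_prev)
    cands = set()
    for i, a in enumerate(prev):
        for b in prev[i + 1:]:
            if a[:-1] == b[:-1]:
                c = a + (b[-1],)
                if all(s in large_prev for s in combinations(c, k - 1)):
                    cands.add(c)
    return cands


def apriori_tid(transactions, min_count):
    # C-bar_1: each transaction re-encoded as the set of 1-itemsets it contains.
    cbar = [{(i,) for i in t} for t in transactions]
    counts = Counter(c for entry in cbar for c in entry)
    large = {c for c, n in counts.items() if n >= min_count}
    result = {c: counts[c] for c in large}
    k = 2
    while large:
        cands = gen_candidates(large, k)
        counts = Counter()
        next_cbar = []
        for entry in cbar:  # scan C-bar_{k-1}, not the raw database
            # c is present iff both generating (k-1)-itemsets are present.
            present = {c for c in cands
                       if c[:-1] in entry and c[:-2] + c[-1:] in entry}
            counts.update(present)
            if present:  # entries with no candidates are dropped
                next_cbar.append(present)
        cbar = next_cbar
        large = {c for c in cands if counts[c] >= min_count}
        result.update((c, counts[c]) for c in large)
        k += 1
    return result


transactions = [{"bread", "milk"}, {"bread", "butter"},
                {"bread", "milk", "butter"}, {"milk"}]
result = apriori_tid(transactions, min_count=2)
```

Here the fourth transaction, which contains no 2-candidate, disappears from the encoded database after the second level.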
M. Houtsma and A. Swami. Set-oriented mining for association rules in relational databases. Technical Report RJ 9567, IBM Almaden Research Center, San Jose, California, October 1993.
....rules from large databases. Index terms. Data mining, knowledge discovery in databases, association rules, multiple level association rules, algorithms, performance. 1 Introduction. Mining of association rules from large data sets has been a focused topic in recent data mining research [1, 3, 4, 2, 9, 8, 10, 11, 14, 22, 15, 16, 18, 17, 19, 20, 21, 24]. Many applications of association mining require that mining be performed at multiple levels of abstraction. For example, besides finding that 80% of customers who purchase milk may also purchase bread, it is interesting to allow users to drill down and show that 75% of people buy wheat bread if ....
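Mining at multiple levels of abstraction can be illustrated by generalizing each transaction through a concept hierarchy before counting. The sketch below assumes a hypothetical two-level taxonomy (for example, wheat bread is a kind of bread) and computes rule confidence at both levels. It is only a toy illustration of the idea, not the level-by-level algorithms of the paper quoted above; the taxonomy, data and function names are invented for the example.

```python
# Hypothetical two-level taxonomy: leaf item -> its higher-level category.
TAXONOMY = {
    "2% milk": "milk",
    "skim milk": "milk",
    "wheat bread": "bread",
    "white bread": "bread",
}


def generalize(transaction, taxonomy):
    """Replace every leaf item by its category (the level-1 view)."""
    return frozenset(taxonomy.get(item, item) for item in transaction)


def confidence(antecedent, consequent, transactions):
    """Fraction of transactions containing `antecedent` that also contain `consequent`."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    n_a = sum(1 for t in transactions if a <= t)
    n_both = sum(1 for t in transactions if both <= t)
    return n_both / n_a if n_a else 0.0


raw = [
    {"2% milk", "wheat bread"},
    {"skim milk", "white bread"},
    {"2% milk", "wheat bread"},
    {"skim milk"},
]
level1 = [generalize(t, TAXONOMY) for t in raw]  # categories only
level2 = [frozenset(t) for t in raw]             # original leaf items
```

In this toy data, milk => bread holds with confidence 0.75 at the category level, while drilling down shows that 2% milk => wheat bread holds with confidence 1.0 and skim milk => white bread only with 0.5.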
M. Houtsma and A. Swami. Set-oriented mining for association rules in relational databases. In Proc. 1995 Int. Conf. Data Engineering, pages 25--34, Taipei, Taiwan, March 1995.
....cannot possibly be frequent. These decisions can be fairly costly; moreover, they have to be made repeatedly for many subsets for each transaction. If an unlikely candidate set is rejected, this decision has to be made for every transaction the set appears in. 2.1.2 SETM (SET-ORIENTED MINING) [5]. This algorithm was designed to use only standard database operations to find frequent sets. For this reason, it uses its own data representation to store every itemset supported by a transaction along with the transaction's ID (TID). SETM repeatedly modifies the entire database to perform ....
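SETM's idea of using only standard database operations can be illustrated with Python's built-in sqlite3 module. The sketch below loosely follows the SETM scheme as commonly described: a SALES(tid, item) relation, an intermediate relation R_k holding (tid, itemset) rows, support counting with GROUP BY, and extension of each row by joining on TID. Table and column names such as sales, r1, l1 and i1 are hypothetical, and the sketch simplifies the original by filtering R_k through a join with L_k rather than a separate sort-and-filter step. Itemsets stay in sorted order because a row is only extended with an item larger than its last one.

```python
import sqlite3


def setm(rows, min_count):
    """Set-oriented mining in the spirit of SETM, using only SQL joins and
    GROUP BY. `rows` is an iterable of (tid, item) pairs."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales (tid INTEGER, item TEXT)")
    con.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    # R_1 is a copy of SALES: one (tid, item) row per occurrence.
    con.execute("CREATE TABLE r1 (tid INTEGER, i1 TEXT)")
    con.execute("INSERT INTO r1 SELECT tid, item FROM sales")
    large = {}
    k = 1
    while True:
        cols = [f"i{j}" for j in range(1, k + 1)]
        col_list = ", ".join(cols)
        typed = ", ".join(f"{c} TEXT" for c in cols)
        # L_k: count supporting TIDs per itemset and keep the large ones.
        con.execute(f"CREATE TABLE l{k} ({typed}, cnt INTEGER)")
        con.execute(
            f"INSERT INTO l{k} SELECT {col_list}, COUNT(*) FROM r{k} "
            f"GROUP BY {col_list} HAVING COUNT(*) >= ?",
            (min_count,),
        )
        found = con.execute(f"SELECT {col_list}, cnt FROM l{k}").fetchall()
        if not found:
            break
        for row in found:
            large[tuple(row[:-1])] = row[-1]
        # R_{k+1}: keep R_k rows whose itemset is large, then extend each with
        # a larger item from the same transaction (join on TID).
        match = " AND ".join(f"p.{c} = l.{c}" for c in cols)
        p_cols = ", ".join(f"p.{c}" for c in cols)
        new_typed = ", ".join(f"i{j} TEXT" for j in range(1, k + 2))
        con.execute(f"CREATE TABLE r{k + 1} (tid INTEGER, {new_typed})")
        con.execute(
            f"INSERT INTO r{k + 1} SELECT p.tid, {p_cols}, q.item "
            f"FROM r{k} p JOIN l{k} l ON {match} "
            f"JOIN sales q ON q.tid = p.tid AND q.item > p.i{k}"
        )
        k += 1
    con.close()
    return large


rows = [(1, "bread"), (1, "milk"), (2, "bread"), (2, "butter"),
        (3, "bread"), (3, "milk"), (3, "butter"), (4, "milk")]
large = setm(rows, min_count=2)
```

Because every intermediate row carries its TID, the loop needs one round of joins per itemset size, and the intermediate R_k tables can grow large; that is the cost the snippet above alludes to.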
M. Houtsma, Arun Swami, Set-Oriented Mining for Association Rules in Relational Databases. IEEE 95, pp. 25-33.
....and development in data mining evolves in several directions, such as association rules, time series, and classification. The direction of association rules is focused on the development of algorithms to find frequently occurring patterns in a database, see among others [AgIS 93, AgSr 94a, HoSw 95, MaTV 94, SrAg 96]. In time series databases, one tries to find all common patterns embedded in a database of sequences of events [AgSr 94b]. The classification of tuples in a number of groups on the basis of common characteristics and the derivation of rules from a group is another direction in ....
....the data mining system has the task of coming up with strategies that limit the number of queries passed to the database system, and the database system has the task of processing the received queries. The advantages of such an architecture in the field of association rules have been discussed in [HoSw 95] and in the field of classification in [WWVS 96]. However, although search strategies attempt to minimize the number of queries passed to a database system in order to extract knowledge, they still generate a large number of queries. Consequently, the architecture ....
[Article contains additional citation context not shown here]
Houtsma, M., Swami, A., Set-Oriented Mining for Association Rules in Relational Databases, in Proc. 11th Int. Conf. on Data Engineering, 1995, pp. 25-33.
....is new. The authors cited above each propose essentially to implement data mining algorithms using only certain stereotypical queries, variously called "count-by-group primitives", SIPs or "two-way table queries", to access the data. The focus in each case is on classification algorithms. Houtsma and Swami (1995) have proposed a somewhat similar idea in the context of association rule mining. We will also focus on classification, and we will use the count-by-group terminology throughout. 1.1. An outline of the problem. The performance of an algorithm depends on the architecture it runs on: since our ....
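To show what restricting an algorithm to count-by-group queries looks like, the sketch below computes the information gain of a candidate split attribute, as a decision-tree learner might, using nothing but one GROUP BY query over a training table. The table train, its columns and the data are hypothetical; the attribute name is interpolated into the SQL, so it must come from a trusted list of column names.

```python
import math
import sqlite3


def count_by_group(con, attr):
    """Count-by-group primitive: class counts for each value of `attr`.
    `attr` is interpolated into the SQL, so it must be a trusted column name."""
    sql = f"SELECT {attr}, cls, COUNT(*) FROM train GROUP BY {attr}, cls"
    return con.execute(sql).fetchall()


def entropy(counts):
    total = sum(counts)
    return -sum(n / total * math.log2(n / total) for n in counts if n)


def info_gain(con, attr):
    """Information gain of splitting on `attr`, from one count-by-group query."""
    by_value, by_class = {}, {}
    for value, cls, n in count_by_group(con, attr):
        by_value.setdefault(value, []).append(n)
        by_class[cls] = by_class.get(cls, 0) + n
    total = sum(by_class.values())
    children = sum(sum(ns) / total * entropy(ns) for ns in by_value.values())
    return entropy(list(by_class.values())) - children


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE train (outlook TEXT, windy TEXT, cls TEXT)")
con.executemany(
    "INSERT INTO train VALUES (?, ?, ?)",
    [("sunny", "yes", "no"), ("sunny", "no", "no"),
     ("rain", "yes", "yes"), ("rain", "no", "yes")],
)
```

On this toy table, info_gain(con, "outlook") is 1.0 (outlook separates the classes perfectly) and info_gain(con, "windy") is 0.0. The learner never touches individual rows; only aggregated counts cross the interface to the database.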
Houtsma, M. and Swami, A. 1995. Set-Oriented Mining for Association Rules in Relational Databases. Proc. Int. Conf. on Data Engineering, pp. 25-33.
....and can be done efficiently in a reasonable time. However, the first subproblem is very tedious and computationally expensive for very large databases, and this is the case for many real-life applications. Many efficient algorithms have been proposed for finding the frequent patterns in a database [1, 2, 4, 5, 6, 9, 10, 11, 13, 15, 17]. Maintenance of association rules is an important problem. When new transactions are added to the set of old transactions, how can we efficiently update the association rules discovered in the old transactions? Naturally, when new transactions are added to a database, some of the ....
....a larger number of candidates and a larger number of itemsets. Association rule algorithms generally differ in a) the generation of the candidates, b) the counting of the support of a candidate itemset, c) the number of scans over the database, and d) the data structures employed. Readers are referred to [1, 2, 4, 5, 6, 9, 10, 11, 13, 15, 17] for some algorithms for discovering large itemsets. 2.2 Update of Association Rules. Table 1 summarizes the notation used in the remainder of the paper. Updating association rules was first introduced in [7]. Given DB, db, |DB|, |db|, minsup and L_DB, the problem of updating association rules is ....
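The update problem stated above can be approached with a simple observation: with a fixed fractional minsup, an itemset that is large in DB ∪ db must be large in DB or in db, because otherwise its total count would fall below minsup·(|DB| + |db|). The sketch below relies on that fact. It is a simplified illustration, not necessarily the method of [7]; the function names and data are hypothetical, and the brute-force `large_in` helper stands in for a real miner. Itemsets already in L_DB only require scanning the increment db, and only itemsets that are newly large in db force a rescan of DB.

```python
from itertools import combinations


def count(itemset, transactions):
    return sum(1 for t in transactions if itemset <= t)


def large_in(transactions, minsup):
    """Brute-force large itemsets of a small transaction list: {itemset: count}."""
    items = sorted(set().union(*transactions)) if transactions else []
    out = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            c = count(frozenset(combo), transactions)
            if c >= minsup * len(transactions):
                out[frozenset(combo)] = c
    return out


def update_large(DB, db, minsup, L_DB):
    """L_DB maps each itemset large in DB to its count in DB."""
    n = len(DB) + len(db)
    result = {}
    # Old large itemsets: DB counts are known, so only db is scanned.
    for x, c_old in L_DB.items():
        c = c_old + count(x, db)
        if c >= minsup * n:
            result[x] = c
    # Any other winner must be large in db; only these need a rescan of DB.
    for x, c_new in large_in(db, minsup).items():
        if x not in L_DB:
            c = c_new + count(x, DB)
            if c >= minsup * n:
                result[x] = c
    return result


DB = [frozenset(t) for t in ({"bread", "milk"}, {"bread", "butter"},
                             {"bread", "milk", "butter"}, {"milk"})]
db = [frozenset({"butter", "milk"}), frozenset({"butter", "milk"})]
L_DB = large_in(DB, 0.5)
updated = update_large(DB, db, 0.5, L_DB)
```

In this example {bread, milk} and {bread, butter} drop out after the update, while {butter, milk} becomes large; the result matches mining DB + db from scratch.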
Maurice Houtsma and Arun Swami. Set-oriented mining of association rules in relational databases. In Proceedings of ICDE'95, pages 25--33, 1995.
....if they have k-2 items in common, while Apriori generates it if k-2 items of two large (k-1)-itemsets are the same. After the candidates are generated, they are counted by scanning the database. Set-Oriented Mining (SETM) [HS95] uses SQL commands to mine association rules. The number of scans over the database is equal to the length of the maximal itemset. The candidate set C_k is generated by the natural join of L_{k-1} with L_1 on the attribute TID, and it is implemented by a merge sort join. The candidates are ....
Maurice Houtsma and Arun Swami. Set-oriented mining of association rules in relational databases. In Proceedings of 11 th Intl. Conf. on Data Engineering (ICDE'95), Taipei, Taiwan, March 1995.
....Index Terms: Data mining, knowledge discovery in databases, association rules, multiple level association rules, algorithms, performance. 1 INTRODUCTION. Mining of association rules from large data sets has been a focused topic in recent data mining research [1], [3], [4], [2], [9], [8], [10], [11], [14], [22], [15], [16], [18], [17], [19], [20], [21], [24]. Many applications of association mining require that mining be performed at multiple levels of abstraction. For example, besides finding that 80 percent of customers who purchase milk may also purchase bread, it is interesting to allow users to ....
M. Houtsma and A. Swami, Set-Oriented Mining for Association Rules in Relational Databases, Proc. 1995 Int'l Conf. Data Eng., pp. 25-34, Taipei, Taiwan, Mar. 1995.
....of all frequent itemsets. This can be easily accomplished once the maximal elements have been identified, by making an additional database pass and gathering the support of all uncounted subsets. 3 Related Work. Several algorithms for mining associations have been proposed in the literature [1, 2, 6, 15, 19, 20, 21, 23, 26, 27]. The Apriori algorithm [2] is the best-known previous algorithm, and it uses an efficient candidate generation procedure, such that only the frequent itemsets at a level are used to construct candidates at the next level. However, it requires multiple database scans, as many as the longest frequent ....
....The AS-CPA algorithm and its sampling versions [20] build on top of Partition and produce a much smaller set of potentially frequent candidates. It requires at most two database scans. Approaches using only general-purpose DBMS systems and relational algebra operations have also been studied [14, 15]. Detailed architectural alternatives in the tight integration of association mining with DBMS were presented in [25]. They also pointed out the benefits of using the vertical database layout. All the above algorithms generate all possible frequent itemsets. Methods for finding the maximal elements ....
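The extra pass mentioned in the snippet above, which recovers all frequent itemsets once the maximal ones are known, can be sketched as follows. Every subset of a frequent itemset is itself frequent, so enumerating the non-empty subsets of each maximal frequent itemset yields exactly the frequent itemsets, and one more scan supplies their supports. For simplicity this sketch recounts all subsets rather than only the uncounted ones; the function name and data are illustrative.

```python
from collections import Counter
from itertools import combinations


def all_frequent_from_maximal(maximal, transactions):
    """Expand maximal frequent itemsets into all their non-empty subsets and
    count them in a single extra pass over the transactions."""
    subsets = set()
    for m in maximal:
        items = sorted(m)
        for k in range(1, len(items) + 1):
            subsets.update(frozenset(c) for c in combinations(items, k))
    support = Counter()
    for t in transactions:  # the one additional database pass
        for s in subsets:
            if s <= t:
                support[s] += 1
    return {s: support[s] for s in subsets}


transactions = [frozenset(t) for t in ({"bread", "milk"}, {"bread", "butter"},
                                       {"bread", "milk", "butter"}, {"milk"})]
maximal = [{"bread", "butter"}, {"bread", "milk"}]
frequent = all_frequent_from_maximal(maximal, transactions)
```

Two maximal itemsets of size two expand here into five frequent itemsets, which is why methods that search only for maximal elements can avoid counting most of the lattice during the main search.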
[Article contains additional citation context not shown here]
M. Houtsma and A. Swami. Set-oriented mining of association rules in relational databases. In 11th Intl. Conf. Data Engineering, 1995.
No context found.
M. A. W. Houtsma and A. N. Swami. Set-oriented mining for association rules in relational databases. International Conference on Data Engineering, pages 25--33, 1995.
No context found.
M. Houtsma and A. Swami. Set-oriented mining of association rules in relational databases. In 11th Intl. Conf. Data Engineering, 1995.
No context found.
M. Houtsma and A. Swami. Set-oriented mining of association rules in relational databases. In 11th Intl. Conf. Data Engineering, 1995.
No context found.
M. Houtsma and A. Swami. Set-oriented mining of association rules in relational databases. In 11th Intl. Conf. Data Engineering, 1995.
No context found.
M. Houtsma and A. Swami. Set-oriented mining of association rules in relational databases. In 11th Intl. Conf. Data Engineering, 1995.
No context found.
Houtsma, M., Swami, A.: Set-oriented Mining for Association Rules in Relational Databases. Proc. of 1995 Int. Conf. Data Engineering (1995)
No context found.
M. Houtsma and A. Swami. Set-oriented mining of association rules in relational databases. In 11th Intl. Conf. Data Engineering, 1995.
No context found.
Maurice A. W. Houtsma and A. N. Swami. Set-oriented mining for association rules in relational databases. In 11th International Conference on Data Engineering (ICDE'95), pages 25--33, 1995.
No context found.
Houtsma, M. A. W. and Swami, A. 1995. Set-oriented mining for association rules in relational databases. In 11th International Conference on Data Engineering (Taipei, Taiwan, March 6-10 1995).
No context found.
Houtsma, M., Swami, A., Set-Oriented Mining for Association Rules in Relational Databases, in Proc. 11th Int. Conf. on Data Engineering, 1995, pp. 25-33.
CiteSeer.IST - Copyright Penn State and NEC