by Sunita Sarawagi, Shiby Thomas, Rakesh Agrawal
In SIGMOD
http://sage.chungbuk.ac.kr/Damine/Papers/Integration/Integ01.ps
Abstract:
Data mining on large data warehouses is becoming increasingly important. In support of this trend, we consider a spectrum of architectural alternatives for coupling mining with database systems. These alternatives include: loose coupling through a SQL cursor interface; encapsulation of a mining algorithm in a stored procedure; caching the data to a file system on-the-fly and mining; tight coupling using primarily user-defined functions; and SQL implementations for processing in the DBMS. We comprehensively study the option of expressing the mining algorithm in the form of SQL queries, using association rule mining as a case in point. We consider four options in SQL-92 and six options in SQL enhanced with object-relational extensions (SQL-OR). Our evaluation of the different architectural alternatives shows that from a performance perspective, the Cache-Mine option is superior, although the performance of the SQL-OR option is within a factor of two. Both the Cache-Mine and the SQL-OR approaches incur a higher storage penalty than the loose-coupling approach, which performance-wise is a factor of 3 to 4 worse than Cache-Mine. The SQL-92 implementations were too slow to qualify as a competitive option. We also compare these alternatives on the basis of qualitative factors such as automatic parallelization, development ease, portability and inter-operability.
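As a hedged illustration of two of the alternatives named above, the sketch below counts frequent 2-itemsets in two ways using Python's built-in sqlite3 module (chosen only because it ships with Python). The first function is loose coupling: the application re-scans the table through a SQL cursor on every pass and does all counting itself. The second pushes the work into the DBMS with plain SQL-92, joining a candidate table with two copies of the transaction table, resembling the join-based SQL-92 formulations the abstract refers to. The table layout T(tid, item), the toy data, the support threshold, and all function names are hypothetical; this is not the authors' code, and it omits rule generation and itemsets larger than pairs.

```python
import sqlite3
from collections import Counter
from itertools import combinations, groupby

# Hypothetical toy data in the single-table layout T(tid, item):
# one row per (transaction id, item) pair, no duplicate pairs.
TRANSACTIONS = [
    (1, "bread"), (1, "milk"),
    (2, "bread"), (2, "diapers"), (2, "beer"), (2, "eggs"),
    (3, "milk"), (3, "diapers"), (3, "beer"), (3, "cola"),
    (4, "bread"), (4, "milk"), (4, "diapers"), (4, "beer"),
    (5, "bread"), (5, "milk"), (5, "diapers"), (5, "cola"),
]
MIN_SUPPORT = 3  # absolute support: number of transactions (assumed value)


def make_db():
    """Create an in-memory database holding the transaction table T."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE T (tid INTEGER, item TEXT)")
    conn.executemany("INSERT INTO T VALUES (?, ?)", TRANSACTIONS)
    return conn


def scan_baskets(conn):
    """One pass over T through a SQL cursor, yielding each transaction's items."""
    rows = conn.execute("SELECT tid, item FROM T ORDER BY tid")
    for _, group in groupby(rows, key=lambda r: r[0]):
        yield {item for _, item in group}


def frequent_pairs_cursor(conn):
    """Loose coupling: the DBMS only serves rows; counting happens in the client."""
    # Pass 1: count single items and keep the frequent ones (F1).
    item_counts = Counter()
    for items in scan_baskets(conn):
        item_counts.update(items)
    f1 = sorted(i for i, c in item_counts.items() if c >= MIN_SUPPORT)
    # Pass 2: re-scan through a fresh cursor, counting candidate pairs built from F1.
    pair_counts = Counter()
    for items in scan_baskets(conn):
        pair_counts.update(combinations([i for i in f1 if i in items], 2))
    return {p: c for p, c in pair_counts.items() if c >= MIN_SUPPORT}


def frequent_pairs_sql(conn):
    """SQL-92 style: F1 and candidates C2 as tables, then a join for support counting."""
    conn.execute("CREATE TABLE F1 (item TEXT)")
    conn.execute(
        "INSERT INTO F1 SELECT item FROM T GROUP BY item HAVING COUNT(*) >= ?",
        (MIN_SUPPORT,),
    )
    conn.execute("CREATE TABLE C2 (item1 TEXT, item2 TEXT)")
    conn.execute(
        "INSERT INTO C2 SELECT a.item, b.item FROM F1 a, F1 b WHERE a.item < b.item"
    )
    # Join each candidate with two copies of T that share a transaction id.
    rows = conn.execute(
        """SELECT c.item1, c.item2, COUNT(*)
           FROM C2 c, T t1, T t2
           WHERE t1.item = c.item1 AND t2.item = c.item2 AND t1.tid = t2.tid
           GROUP BY c.item1, c.item2
           HAVING COUNT(*) >= ?""",
        (MIN_SUPPORT,),
    )
    return {(a, b): n for a, b, n in rows}


if __name__ == "__main__":
    conn = make_db()
    print(frequent_pairs_cursor(conn))
    print(frequent_pairs_sql(conn))
```

Both functions should return the same frequent pairs; what differs is where the counting runs, which is the axis along which the abstract compares the architectures. Note that `frequent_pairs_sql` creates tables, so it is meant to be called once per connection.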
Citations
Times cited | Title | Authors | Year
1449 | Mining association rules between sets of items in large databases | Agrawal, Imielinski, et al. | 1993
358 | Mining generalized association rules | Srikant, Agrawal | 1995
342 | Dynamic itemset counting and implication rules for market basket data | Brin, Motwani, et al. | 1997
299 | Mining sequential patterns: Generalizations and performance improvements | Srikant, Agrawal | 1996
272 | Sampling Large Databases for Association Rules | Toivonen | 1996
212 | Fast discovery of association rules | Agrawal, Mannila, et al. | 1996
210 | Understanding the New SQL: A Complete Guide | Melton, Simon | 1992
194 | New Algorithms for fast discovery of association rules | Zaki, Ogihara, et al. | 1997
173 | A database perspective on knowledge discovery | Imielinski, Mannila | 1996
157 | Parallel mining of association rules | Agrawal, Shafer | 1996
121 | A New SQL-like Operator for Mining Association Rules | Meo, Psaila, Ceri | 1996
85 | DMQL: A Data Mining Query Language for Relational Databases | Han, Fu, et al. | 1996
63 | Set-oriented mining of association rules | Houtsma, Swami | 1993
52 | The Quest data mining system | Agrawal, Mehta, et al. | 1996
49 | Developing tightly-coupled data mining applications on a relational database system | Agrawal, Shim | 1996
37 | Using the new DB2: IBM's Object-relational database system | Chamberlin | 1996
13 | Query Flocks: A Generalization of Association Rule Mining | Tsur, Ullman, et al. | 1998
9 | Discovery board application programming interface and query language for database mining | Imielinski, Virmani, Abdulghani | 1996
4 | Using DB2's object relational extensions for mining association rules | Rajamani, Iyer, et al. | 1997
2 | DB2 Universal Database Application Programming Guide, Version 5 | IBM Corporation | 1997
2 | Object oriented extensions in SQL3: a status report (SIGMOD Record) | Kulkarni | 1994
2 | Oracle RDBMS Database Administrator's Guide Volumes | Oracle | 1992
2 | SQL table function open architecture and data access middleware | Pirahesh, Reinwald | 1998