Abstract: The current generation of data mining tools has limited capacity and performance, since these tools tend to be sequential. This paper explores a migration path out of this bottleneck by considering an integrated hardware and software approach to parallelizing data mining. Our analysis shows that parallel data mining solutions require the following components: parallel data mining algorithms, parallel and distributed databases, parallel file systems, parallel I/O, tertiary storage, management of ...
Context of citations to this paper:
...results that we show, sustain the validity of a structured approach for DM application. It is also well recognized in the literature [9, 10] the need for a tighter integration of high performance DM systems with the support for management, filtering and selection of data. We...
.... definite need for tools and techniques to be able to rapidly develop parallel and out of core versions of various data mining techniques [28]. In this paper, we present the design and initial performance evaluation of a middleware for enabling rapid development of parallel data...
Cited by:
Processing Frequent Itemset Discovery Queries by - Division And Set
Compiler and Middleware Support for Scalable Data Mining - Agrawal, Jin, Li
Efficient Parallel Frequency Mining Based On A Novel Top-Down.. - Özkural (2002)
Active bibliography (related documents):
0.6: Glossary on Parallel Input/Output - Stockinger
0.5: User Interactivity in Very Large Scale Data Mining - Wrobel, Wettschereck..
0.5: Parallel Sequence Mining on Shared-Memory Machines - Mohammed Zaki Computer (2000)
Similar documents based on text:
1.4: The PKDD Discovery Challenges on Thrombosis Data - Berka (2001)
1.0: Indexing and Data Access Methods for Database Mining - Ramesh, Maniatty, Zaki (2001)
0.6: Feasible Itemset Distributions in Data Mining: - Application
Related documents from co-citation:
4: Data Mining: Concepts and Techniques - Han, Kamber - 1998
4: Scalable parallel data mining for association rules - Han, Karypis et al. - 1997
3: Parallel and distributed association mining: A survey - Zaki - 1999
BibTeX entry:
W.A. Maniatty and M.J. Zaki, A Requirements Analysis for Parallel KDD Systems, IPDPS'2000 Data Mining Workshop, Cancun, Mexico, May 2000. http://citeseer.ist.psu.edu/295031.html
@article{ maniatty00requirements,
author = "William A. Maniatty and Mohammed J. Zaki",
title = "A Requirements Analysis for Parallel {KDD} Systems",
journal = "Lecture Notes in Computer Science",
volume = "1800",
pages = "358+",
year = "2000",
url = "citeseer.ist.psu.edu/295031.html" }
Citations (may not include all citations):
298: Parallel database systems: The future of high-performance da.. - DeWitt, Gray - 1992
262: Multidimensional access methods - Gaede, Günther - 1998
156: reliable secondary storage - Chen, High-performance - 1994
145: Sprint: A scalable parallel classifier for data mining - Shafer, Agrawal et al. - 1996
115: Scalable parallel data mining for association rules - Han, Karypis et al. - 1997
100: Database mining: A performance perspective - Agrawal, Imielinski et al. - 1993
89: The GAMMA database machine project - DeWitt - 1990
88: A database perspective on knowledge discovery - Imielinski, Mannila - 1996
76: The Galley parallel file system - Nieuwejaar, Kotz - 1997
75: Active storage for large-scale data mining and multimedia - Riedel, Gibson et al. - 1997
66: DMQL: A data mining query language for relational databases - Han - 1996
58: A trace-driven comparison of algorithms for parallel prefetc.. - Kimbrel - 1996
56: Parallel mining of association rules - Agrawal, Shafer - 1996
52: A fast distributed algorithm for mining association rules - Cheung - 1996
49: On implementing MPI-IO portably and with high performance - Thakur, Gropp et al. - 1999
47: Parallel database systems: Open problems and new issues - Valduriez - 1993
47: Principles of Distributed Database Systems - Özsu, Valduriez - 1999
43: The case for intelligent disks - Keeton, Patterson et al. - 1998
42: Mining very large databases with parallel processing - Freitas, Lavington - 1998
41: PIOUS: a scalable parallel I/O system for distributed comput.. - Moyer, Sunderam - 1994
39: Parallel and distributed association mining: A survey - Zaki - 1999
36: hint generation through speculative execution - Chang, Gibson - 1999
34: Developing tightly-coupled data mining applications on a rel.. - Agrawal, Shim - 1996
32: A benchmark of non-stop SQL on the debit credit transaction - Group - 1988
32: a highly parallel database system - Boral, Bubba - 1990
28: Data surveyor: Searching the nuggets in parallel - Holsheimer, Kersten et al. - 1996
25: DataMine: Application programming interface and query langua.. - Imielinski, Virmani et al. - 1996
16: Integrating association rule mining with databases: alternat.. - Sarawagi, Thomas et al. - 1998
15: Large-scale parallel data clustering - Judd, McKinley et al. - 1996
14: Large-Scale Parallel Data Mining - Zaki, Ho - 2000
13: Strategies for parallel data mining - Skillicorn - 1999
11: NASD scalable storage systems - Gibson - 1999
11: Parallel classification for data mining on shared-memory mul.. - Zaki, Ho et al. - 1999
11: MAFIA: Efficient and scalable subspace clustering for very l.. - Nagesh, Goil et al. - 1999
8: ParFiSys: A parallel file system for MPP - Carretero - 1996
8: The integrated delivery of large-scale data mining: The ACSy.. - Williams - 2000
8: Mining algorithms for sequential patterns in parallel: Hash .. - Shintani, Kitsuregawa - 1998
7: Designing a kernel for data mining - Anand - 1997
6: Multidimensional array I/O in Panda - Seamons, Winslett - 1996
4: ViPIOS: The Vienna parallel input/output system - Schikuta, Fuerle et al. - 1998
4: Parallel sequence mining on SMP machines - Zaki - 2000
4: Performance analysis of parallel systems: Approaches and ope.. - Reed - 1998
3: Parallel algorithms for fast discovery of association rules - Zaki - 1997
3: Advanced Database Machine Architectures - Hsiao - 1983
2: Advances in Distributed Data Mining - Kargupta, Chan - 2000
2: Foundations of an inductive query language - Siebes - 1995
2: Dictionary on parallel input/output - Stockinger - 1998
1: ScalParC: A scalable and parallel classification algorithm f.. - Initiative, www et al. - 1998
1: Parallel out-of-core divide and conquer techniques with appli.. - Sreenivas, Alsabti et al. - 1999
1: Adding inter-transaction parallelism to existing DBMS: Early.. - Lorie - 1989
1: chapter Software RAID and Parallel File Systems - Cortes, Cluster et al. - 1999
1: Large-scale file systems with the flexibility of databases - Choudhary, Kotz - 1996
1: A clustering algorithm on distributed memory machines - Dhillon, Modha - 2000
1: Includes pointers to his Parallel I/O Bibliography - Kotz, archive - 1998
1: Intensive data management in parallel systems: A survey - Khan - 1999
Documents on the same site (http://www.cs.rpi.edu/~zaki/papers.html):
Parallel Classification for Data Mining on Shared-Memory.. - Zaki (1998)
Efficient Enumeration of Frequent Sequences - Zaki (1998)
PlanMine: Predicting Plan Failures using Sequence Mining - Zaki, Lesh, Ogihara (1999)
CiteSeer.IST - Copyright Penn State and NEC