Data Mining on an OLTP System (Nearly) for Free

by Erik Riedel, Christos Faloutsos, Greg Ganger, David Nagle
http://www.cs.cmu.edu/afs/cs.cmu.edu/user/christos/www/PUBLICATIONS/sigmod00-dm-cmu-99-151.pdf

Abstract:

This paper proposes a scheme for scheduling disk requests that takes advantage of the ability of high-level functions to operate directly at individual disk drives. We show that such a scheme makes it possible to support a Data Mining workload on an OLTP system almost for free: there is only a small impact on the throughput and response time of the existing workload. Specifically, we show that an OLTP system has the disk resources to provide a consistent one third of its sequential bandwidth to a background Data Mining task with close to zero impact on OLTP throughput and response time at high transaction loads. At low transaction loads, we show much lower impact than observed in previous work. This means that a production OLTP system can be used for Data Mining tasks without the expense of a second dedicated system. Our scheme takes advantage of close interaction with the on-disk scheduler by reading blocks for the Data Mining workload as the disk head “passes over” them while satisfying demand blocks from the OLTP request stream. We show that this scheme provides a consistent level of throughput for the background workload even at very high foreground loads. Such a scheme is of most benefit in combination with an Active Disk environment that allows the background Data Mining application to also take advantage of the processing power and memory available directly on the disk drives.
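
The idea of reading background blocks as the head “passes over” them can be illustrated with a toy model. The sketch below is not the paper's scheduler: it assumes a one-dimensional disk where a block is identified only by its position, ignores rotational latency and transfer time, and serves demand requests strictly in arrival order. The function name `serve_with_free_blocks` and its parameters are hypothetical, chosen only for this illustration.

```python
def serve_with_free_blocks(head, demand_requests, pending_background):
    """Toy model: serve demand (OLTP) requests in order and opportunistically
    read any pending background (Data Mining) block the head travels across.

    Assumption: the disk is a 1-D line of block positions; rotation and
    transfer time are ignored, so this only illustrates the idea.
    """
    pending = set(pending_background)
    picked_up = []
    for target in demand_requests:
        lo, hi = min(head, target), max(head, target)
        # Background blocks lying on the path of this seek are read "for free".
        on_path = sorted(b for b in pending if lo <= b <= hi)
        if target < head:
            on_path.reverse()  # head is moving toward lower positions
        picked_up.extend(on_path)
        pending.difference_update(on_path)
        head = target  # the demand request itself is always satisfied
    return head, picked_up, pending


head, got, left = serve_with_free_blocks(0, [50, 10, 90], {5, 20, 60, 95})
print(head, got, left)
```

In this toy run the demand stream alone determines where the head goes; the background scan simply makes progress on whatever blocks happen to lie along the way, which is why the foreground workload is unaffected in the model. Block 95 is never crossed, so it stays pending until a later demand request carries the head past it.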
