External Memory Algorithms and Data Structures (1998)
| Citations: | 286 - 24 self |
BibTeX
@MISC{Vitter98externalmemory,
author = {Jeffrey Scott Vitter},
title = { External Memory Algorithms and Data Structures},
year = {1998}
}
Years of Citing Articles
OpenURL
Abstract
Data sets in large applications are often too massive to fit completely inside the computer's internal memory. The resulting input/output communication (or I/O) between fast internal memory and slower external memory (such as disks) can be a major performance bottleneck. In this paper, we survey the state of the art in the design and analysis of external memory algorithms and data structures (which are sometimes referred to as "EM" or "I/O" or "out-of-core" algorithms and data structures). EM algorithms and data structures are often designed and analyzed using the parallel disk model (PDM). The three machine-independent measures of performance in PDM are the number of I/O operations, the CPU time, and the amount of disk space. PDM allows for multiple disks (or disk arrays) and parallel CPUs, and it can be generalized to handle tertiary storage and hierarchical memory. We discuss several important paradigms for how to solve batched and online problems efficiently in external memory. Programming tools and environments are available for simplifying the programming task. The TPIE system (Transparent Parallel I/O programming Environment) is both easy to use and efficient in terms of execution speed. We report on some experiments using TPIE in the domain of spatial databases. The newly developed EM algorithms and data structures that incorporate the paradigms we discuss are significantly faster than methods currently used in practice.







