The fundamental limits of performance for a general model of information retrieval from databases are studied. In the scenarios considered, a large quantity of information is to be stored on some physical storage device. Requests for information are modeled as a randomly generated sequence with a known distribution. The requests are assumed to be "context-dependent", i.e., to vary according to the sequence of previous requests. The state of the physical storage device is also assumed to depend on the history of previous requests. In general, the logical structure of the information to be stored does not match the physical structure of the storage device; consequently, there are nontrivial limits on the minimum achievable average access time, where the average is taken over the possible sequences of user requests. The paper applies basic information-theoretic methods to establish these limits and, for a wide class of systems, demonstrates constructive procedures that approach them. Allowing redundancy greatly lowers the achievable access times, even when the amount added is an arbitrarily small fraction of the total amount of information in the database. The achievable limits are computed both with and without redundancy; when redundancy is allowed, the limits essentially coincide with lower limits for more general storage systems.