PARTIAL-MATCH RETRIEVAL ALGORITHMS*
Abstract. We examine the efficiency of hash-coding and tree-search algorithms for retrieving from a file of k-letter words all words which match a partially-specified input query word (for example, retrieving all six-letter English words of the form S**R*H where "*" is a "don’t care" character). We precisely characterize those balanced hash-coding algorithms with minimum average number of lists examined. Use of the first few letters of each word as a list index is shown to be one such optimal algorithm. A new class of combinatorial designs (called associative block designs) provides better hash functions with a greatly reduced worst-case number of lists examined, yet with optimal average behavior maintained. Another efficient variant involves storing each word in several lists. Tree-search algorithms are shown to be approximately as efficient as hash-coding algorithms, on the average. In general, these algorithms require time about $O(n^{(k-s)/k})$ to respond to a query word with s letters specified, given a file of n k-letter words. Previous algorithms either required time $O(sn/k)$ or else used exorbitant amounts of storage.
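As an illustration of the prefix-indexed scheme the abstract mentions, the following is a minimal sketch, not the paper's own implementation. It assumes uppercase words over a 26-letter alphabet, and the names `build_index`, `partial_match`, and `prefix_len` are hypothetical. Each word is stored in the list named by its first few letters, and a query examines only the lists whose names agree with the specified letters in that prefix.

```python
from collections import defaultdict
from itertools import product

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # assumed alphabet
WILDCARD = "*"                           # the "don't care" character


def build_index(words, prefix_len):
    """Store each word in the list named by its first prefix_len letters."""
    index = defaultdict(list)
    for word in words:
        index[word[:prefix_len]].append(word)
    return index


def matches(word, query):
    """True if word agrees with query at every specified position."""
    return len(word) == len(query) and all(
        q == WILDCARD or q == w for w, q in zip(word, query)
    )


def partial_match(index, prefix_len, query):
    """Return (sorted matching words, number of lists examined)."""
    # A specified prefix letter pins one choice; a wildcard allows any letter.
    choices = [ALPHABET if q == WILDCARD else q for q in query[:prefix_len]]
    hits, examined = [], 0
    for combo in product(*choices):
        examined += 1
        for word in index.get("".join(combo), ()):
            if matches(word, query):
                hits.append(word)
    return sorted(hits), examined


words = ["SEARCH", "STARCH", "SCORCH", "SWITCH", "BRANCH", "SMIRCH"]
index = build_index(words, prefix_len=2)
print(partial_match(index, 2, "S**R*H"))
print(partial_match(index, 2, "SE****"))
```

In this sketch, each wildcard inside the indexed prefix multiplies the number of lists examined by the alphabet size, while each specified letter there leaves it unchanged; specified letters outside the prefix only filter words within the lists already selected.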
A SCATTER STORAGE SCHEME FOR DICTIONARY LOOKUPS
Scatter storage schemes are examined with respect to their applicability to dictionary lookup procedures. Of particular interest are virtual scatter methods which combine the advantages of rapid search speed and reasonable storage requirements. The theoretical aspects of computing hash addresses are developed, and several algorithms are evaluated. Finally, experiments with an actual text lookup process are described, and a possible library application is discussed. A document retrieval system must have some means of recording the subject matter of each document in its data base. Some systems store the actual text words, while others store keywords or similar content indicators. The SMART system (1) uses concept numbers for this purpose, each number indicating that a certain word appears in the document. Two advantages are apparent. First, a concept number can be held in a fixed-sized storage element. This produces faster processing than if variable-sized keywords were used. Second, the amount of storage required to hold