W. Wong and D. Lee. Implementations of partial document ranking using inverted files. Information Processing and Management, 29(5):647--669, September 1993.
....and meta search scenarios, and we are not aware of previous applications in a peer-to-peer environment. On the other hand, there has also been significant work in the IR community, much of it preceding the above, on pruning techniques for vector space queries. Some early work is described in [10, 22, 37, 49, 52], and more closely related recent work is in [1, 2]. One difference between these two strains of work is that in the IR case, a random lookup of a posting is much more expensive than scanning it. Thus, the recent pruning techniques from IR typically restrict index access to scans, resulting in ....
W. Wong and D. Lee. Implementations of partial document ranking using inverted files. Information Processing and Management, 29(5):647--669, September 1993.
....for a small sample. Previous work on pruning techniques for top-k can be divided into two fairly disjoint sets of literature. In the IR community, researchers have studied pruning techniques for the fast evaluation of vector space queries since at least the 1980s. Some early work is described in [11, 21, 38, 41]. Most relevant to our work are the techniques by Persin, Zobel, and Sacks-Davis [32] and the follow-up work in [1, 2], which study effective early termination (pruning) schemes for the cosine measure based on the idea of sorting the postings in the inverted lists by their contribution to the ....
W. Wong and D. Lee. Implementations of partial document ranking using inverted files. Information Processing and Management, 29(5):647--669, September 1993.
....and meta search scenarios, and we are not aware of previous applications in a peer-to-peer environment. On the other hand, there has also been significant work in the IR community, much of it preceding the above, on pruning techniques for vector space queries. Some early work is described in [7, 17, 24, 35, 38], and more closely related recent work is in [1, 2]. One difference between these two strains of work is that in the IR case, a random lookup of a posting is much more expensive than scanning. Thus, recent pruning techniques from IR typically restrict access to scans, resulting in more limited ....
W. Wong and D. Lee. Implementations of partial document ranking using inverted files. Information Processing and Management, 29(5):647--669, September 1993.
....limit on the number of terms considered or on the total number of pointers decoded; or to place an upper bound on the term frequency f_t, and only process terms that appear in fewer than x of the documents, for some predetermined value x. Buckley and Lewit [1], Lucarella [8], and Wong and Lee [13] have also considered various stopping rules, and derive conditions under which inverted file entries can be discarded without affecting the r top documents (although their order within the ranking may be different). Here we are prepared to allow approximate as well as exact rules, acknowledging ....
W.Y.P. Wong and D.K. Lee. Implementations of partial document ranking using inverted files. Information Processing & Management, 29(5):647--669, 1993.
....after some threshold of computational resources has been expended. One approach has been to count disk I/O and stop after a threshold has been reached [49]. The key to this approach is to sort the terms in the query based on some indicator of term goodness and process the terms in this order. By doing this, query processing stops after the important terms have been processed. Sorting the terms is really analogous to sorting their posting lists. ....
Table 10: Training Sets vs Number of Unique Terms
training set size    number of unique terms
160                  140078
80                   97408
40                   69651
20                   48730
10                   34953
....the quality of the result. Using term frequency is a means of implementing a dynamic stop word list in which high frequency terms are eliminated without using a static set of stop words. 3.2.2 Cutoff Based on Maximum Estimated Weight. Two other measures of sorting the query terms are described in [49]. The first computes the maximum term frequency of a given query term as $tf_{max}$ and uses the following as a means of sorting the query: $tf_{max} \times idf$. The $tf_{max}$ is computed as the highest term frequency for the query term in any of the documents that contain this term. The idea is that a term that ....
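The term ordering and resource cutoff described in the two excerpts above can be sketched as follows. This is a minimal illustration, not the cited authors' implementation: the in-memory `index` layout (term mapped to a dictionary of document id and term frequency), the function names, the logarithmic idf, and the use of a posting count as a stand-in for disk I/O are all assumptions made for the example.

```python
import math

def order_query_terms(query_terms, index, num_docs):
    """Sort query terms by tf_max * idf, best term first.

    `index` is a hypothetical layout: term -> {doc_id: term_frequency}.
    """
    scored = []
    for term in query_terms:
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur in the collection
        tf_max = max(postings.values())            # highest tf in any document
        idf = math.log(num_docs / len(postings))   # assumed idf definition
        scored.append((tf_max * idf, term))
    scored.sort(reverse=True)
    return [term for _, term in scored]

def rank_with_budget(query_terms, index, num_docs, max_postings):
    """Accumulate tf * idf scores term by term and stop before the
    posting budget (a proxy for a disk I/O threshold) is exceeded."""
    scores = {}
    postings_read = 0
    for term in order_query_terms(query_terms, index, num_docs):
        postings = index[term]
        if postings_read + len(postings) > max_postings:
            break  # budget exhausted: remaining, less useful terms are skipped
        idf = math.log(num_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] = scores.get(doc_id, 0.0) + tf * idf
        postings_read += len(postings)
    return sorted(scores, key=scores.get, reverse=True)

index = {
    "rare": {1: 5, 2: 1},
    "common": {1: 1, 2: 1, 3: 1, 4: 1},
}
ordered = order_query_terms(["common", "rare", "missing"], index, num_docs=4)
top = rank_with_budget(["common", "rare"], index, num_docs=4, max_postings=2)
```

Because the terms are processed in decreasing order of estimated usefulness, the cutoff drops the terms that are least likely to change the ranking.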
W. Yee, P. Wong, and D. Lee. Implementations of partial document ranking using inverted files. Information Processing and Management, 29(5):647--689, 1993.
....employed: 1. Boolean weighting [121]: the weight is assigned one if the keyword occurs in the document and zero otherwise. 2. Term frequency (TF): the number of occurrences of the keyword record in the document. 3. Normalization [76]: TF is normalized so as to lie between 0 and 1. 4. TF-IDF [122, 133 149]: the product of Term Frequency (TF) and Inverse Document Frequency (IDF). IDF measures the discrimination of a document from the remainder of the collected documents. A high value indicates the keyword occurs frequently in a document but infrequently in the remainder of the collected documents. ....
Wong W.Y. and Lee D.L., "Implementations of Partial Document Ranking Using Inverted Files", Information Processing & Management, 29(5):647--669, 1993.
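The four weighting schemes listed in this excerpt can be illustrated with a short sketch. The excerpt does not say how TF is normalized or which IDF formula is used, so dividing by the largest term count in the document and taking the logarithm of N over the document frequency are assumptions, as are the function names.

```python
import math
from collections import Counter

def boolean_weight(term, doc_tokens):
    """1 if the term occurs in the document, 0 otherwise."""
    return 1 if term in doc_tokens else 0

def term_frequency(term, doc_tokens):
    """Number of occurrences of the term in the document."""
    return Counter(doc_tokens)[term]

def normalized_tf(term, doc_tokens):
    """TF scaled into [0, 1] by the most frequent term (assumed scheme)."""
    counts = Counter(doc_tokens)
    if not counts:
        return 0.0
    return counts[term] / max(counts.values())

def tf_idf(term, doc_tokens, corpus):
    """TF multiplied by log(N / df), an assumed IDF definition."""
    df = sum(1 for doc in corpus if term in doc)
    if df == 0:
        return 0.0
    return term_frequency(term, doc_tokens) * math.log(len(corpus) / df)

corpus = [["a", "a", "b"], ["b", "c"], ["b"]]
doc = corpus[0]
weights = {
    "boolean": boolean_weight("a", doc),
    "tf": term_frequency("a", doc),
    "normalized": normalized_tf("b", doc),
    "tf_idf": tf_idf("a", doc, corpus),
}
```

A term such as "b" that appears in every document receives a TF-IDF weight of zero under this definition, which matches the intuition that it does not discriminate between documents.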
....Determining what these functions actually look like might be done experimentally or analytically. The problem with this scheme is that, short of processing all of the term weights, it gives us no guarantees on the correctness of the final ranking obtained. This scheme was proposed by Wong and Lee [93], who describe two estimation techniques for determining how many term weights must be processed to achieve a given level of retrieval effectiveness. An alternative to this ad hoc stopping condition would be a stopping condition that takes advantage of the organization of the term weights. Each ....
Wong, W. Y. P. and Lee, D. L. Implementations of partial document ranking using inverted files. Inf. Process. & Mgmnt., 29(5):647--669, 1993.
....of TREC queries against the TREC data [22]. 3.2 Partial Result Set Retrieval. Another way to improve run time performance is to stop processing after some threshold of computational resources has been expended. One approach has been to count disk I/O and stop after a threshold has been reached [33]. The key to this approach is to sort the terms in the query based on some indicator of term goodness and process the terms in this order. By doing this, query processing stops after the important terms have been processed. Sorting the terms is really analogous to sorting their posting lists. ....
....the quality of the result. Using term frequency is a means of implementing a dynamic stop word list in which high frequency terms are eliminated without using a static set of stop words. 3.2.2 Cutoff Based on Maximum Estimated Weight. Two other measures of sorting the query terms are described in [33]. The first computes the maximum term frequency of a given query term as $tf_{max}$ and uses the following as a means of sorting the query: $tf_{max} \times idf$. The idea is that a term that appears frequently in all the documents in which it appears is probably of more importance than a term that ....
W. Yee, P. Wong, and D. Lee. Implementations of partial document ranking using inverted files. Information Processing and Management, 29(5):647--689, 1993.
....against the TREC data [25]. 4.2 Partial Result Set Retrieval. Another way to improve run time performance is to stop processing after some threshold of computational resources has been expended. One approach has been to count disk I/O and stop after a threshold of disk I/O has been reached [35]. The key to this approach is to sort the terms in the query based on some indicator of term goodness and process the terms in this order. By doing this, query processing stops after the important terms have been processed. Sorting the terms is really analogous to sorting their posting lists. ....
....the quality of the result. Using term frequency is a means of implementing a dynamic stop word list in which high frequency terms are eliminated without using a static set of stop words. 4.2.2 Cutoff Based on Maximum Estimated Weight. Two other measures of sorting the query terms are described in [35]. The first computes the maximum term frequency of a given query term as $tf_{max}$ and uses the following as a means of sorting the query: $tf_{max} \times idf$. The idea is that a term that appears frequently in all the documents in which it appears is probably of more importance than a term that ....
W. Yee, P. Wong, and D. Lee. Implementations of partial document ranking using inverted files. Information Processing and Management, 29(5):647--689, 1993.
....Determining what these functions actually look like might be done experimentally or analytically. The problem with this scheme is that, short of processing all of the belief values, it gives us no guarantees on the correctness of the final ranking obtained. This scheme was proposed by Wong and Lee [61], who describe two estimation techniques for determining how many belief values must be processed to achieve a given level of retrieval effectiveness. An alternative to this ad hoc stopping condition would be a stopping condition that takes advantage of the organization of the belief values. Each ....
....update performance to have the worst query processing performance due to fragmentation of the long inverted lists. My inverted list implementation will have similar incremental update characteristics, but is actually intended to provide superior query processing performance. Wong and Lee [61] describe an inverted list implementation where the inverted list entries are sorted by within document term frequency and each inverted list is divided into pages. The potential similarity contribution of a given inverted list page can be estimated from the inverse document frequency of the term ....
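The list organization described in this excerpt, with entries sorted by within-document term frequency and split into pages, might look like the sketch below. The excerpt is truncated before it gives the estimation formula, so bounding a page's contribution by idf times the first (largest) term frequency on the page is an assumption, as are the function names and the page size.

```python
def build_pages(postings, page_size):
    """Sort (doc_id, tf) entries by decreasing tf and split them into pages.

    `postings` is a hypothetical mapping doc_id -> within-document frequency.
    Ties on tf are broken by doc_id so the layout is deterministic.
    """
    ordered = sorted(postings.items(), key=lambda entry: (-entry[1], entry[0]))
    return [ordered[i:i + page_size] for i in range(0, len(ordered), page_size)]

def page_upper_bound(page, idf):
    """Largest contribution any entry on the page can make (assumed estimate).

    Entries are sorted by decreasing tf, so the first entry has the page's
    highest term frequency.
    """
    return idf * page[0][1]

postings = {10: 1, 11: 7, 12: 3, 13: 3}
pages = build_pages(postings, page_size=2)
bounds = [page_upper_bound(page, idf=2.0) for page in pages]
```

Since the bounds never increase from one page to the next, a query processor can stop reading a list as soon as the bound falls below whatever threshold it is using.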
W. Y. P. Wong and D. L. Lee. Implementations of partial document ranking using inverted files. Inf. Process. & Mgmnt., 29(5):647--669, 1993.
....will become much more competitive in speed if it doesn't have to perform false drop elimination. However, there are still many interesting problems to be investigated. For instance, we are investigating search heuristics to further improve search performance and reduce the effect of false drops [19]. We are investigating methods which use the signature file to produce a coarse ranking and apply exact ranking (without false drops) only to the top ranked documents, thus obtaining low storage overhead and high precision [7]. Acknowledgment. The authors are indebted to the anonymous reviewers ....
Wong, W.Y.P. & Lee, D.L., "Implementations of partial document ranking using inverted files," Information Processing and Management 29 (Sept. 1993), 647--669.
.... (e.g. see [Tur94, VH97]). Of course, efficiency is also a concern for IR systems, and there has been significant research on indexing techniques and query evaluation heuristics that improve the query response time while maintaining a constant level of effectiveness (e.g. see [Fal85, ZMSD92, WL93, Per94, Bro95, TF95]). This research, however, has not investigated the system mechanisms that underlie the indexing and querying approaches. Database systems developers have long realized that efficient access to data requires smart algorithms for disk allocation and disk scheduling, buffer ....
....for calculating the ranking) are stored. Inverted lists are traditionally ordered by document identifiers, as many query evaluation algorithms use those identifiers (e.g. see [ZMSD92, MZ94, Bro95]). An alternative organization, used in this paper, is a frequency ordering of the inverted lists [WL93, Per94], where those documents in which the terms occur most frequently are stored first in the lists. This organization has the advantage that documents found early in the inverted lists are likely to be highly ranked, and has been shown to give significant performance advantages [WL93, Per94] ....
W.Y.P. Wong and D.L. Lee. Implementation of partial document ranking using inverted files. Information Processing & Management, 29(5), Oct. 1993.
W. Y. P. Wong and D. L. Lee. Implementation of partial document ranking using inverted files. Information Processing & Management, 29(5), Oct. 1993.