C. Buckley, A. Singhal, and M. Mitra. Using Query Zoning and Correlation within SMART: TREC-5. In Proceedings of TREC-5, pages 105-118, 1997.
....i) $f(t_k, c_i)$ or the maximum $f_{\max}(t_k) = \max_{i=1}^{m} f(t_k, c_i)$ of their category-specific values are usually computed. 3 A definition of n-grams. We start by precisely characterizing what we mean by statistical phrases. The same definition has been used in a number of IR contexts (e.g. [2, 21]) but never in the case of TC (see Section 7 for a detailed discussion). Footnote 2: In this paper we make the general assumption that a document $d_j$ can in principle belong to zero, one or many of the categories in $C$; this assumption is indeed verified in the Reuters-21578 benchmark we use for our ....
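The max-based globalization in the snippet above collapses the m category-specific scores of a term into one value. A minimal sketch, assuming `f` stands in for whatever term-goodness function is used (the snippet does not name it); the function name and the toy scores are hypothetical:

```python
from typing import Callable, Sequence


def f_max(term: str, categories: Sequence[str],
          f: Callable[[str, str], float]) -> float:
    """Globalize a category-specific score f(t_k, c_i) by its maximum over the m categories."""
    return max(f(term, c) for c in categories)


# Hypothetical category-specific scores, for illustration only.
toy_scores = {("oil", "crude"): 0.9, ("oil", "grain"): 0.1}
best = f_max("oil", ["crude", "grain"], lambda t, c: toy_scores[(t, c)])
```

Taking the maximum rewards a term that is highly discriminative for even a single category, whereas an averaged score would dilute it across unrelated categories.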
....the experimental results we have obtained, we should remark that this is not the only approach to the evaluation of n-grams for TC. A possible alternative approach consists in generating only a subset of prospectively good n-grams (i.e. n-grams selected according to a particular statistical filter [2, 4, 21] or heuristics [6, 11, 23]), using them in document indexing, and checking the difference in effectiveness that a given classifier exhibits with respect to the standard bag-of-words case. This latter method undoubtedly has the advantage of better computational efficiency; for instance, a heuristics ....
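The "statistical filter" alternative mentioned above can be illustrated by generating word n-grams and keeping only the frequent ones. This sketch assumes the filter is a plain document-frequency threshold (`min_df`, a hypothetical parameter); the filters used in [2, 4, 21] may differ:

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def word_ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """All contiguous word n-grams of a token sequence (empty if the text is shorter than n)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def frequent_ngrams(docs: Iterable[List[str]], n: int, min_df: int) -> Set[Tuple[str, ...]]:
    """Keep n-grams whose document frequency is at least min_df."""
    df: Counter = Counter()
    for tokens in docs:
        df.update(set(word_ngrams(tokens, n)))  # count each n-gram once per document
    return {gram for gram, count in df.items() if count >= min_df}
```

Only the surviving n-grams would then be added to the bag-of-words index before training the classifier.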
C. Buckley, A. Singhal, and M. Mitra. Using query zoning and correlation within SMART: TREC-5. In E. M. Voorhees and D. K. Harman, editors, Proceedings of TREC-5, 5th Text Retrieval Conference, Gaithersburg, US, 1996.
....of document and query vectors, $d_i^{T} q = \sum_{k=1}^{t} q_k d_{ik}$ (1), where $q_k$ is the weight of term $k$ in the query, $d_{ik}$ is the weight of term $k$ in document $i$, and $t$ is the number of terms in the index. We used SMART Lnu weights for document terms (Buckley, Singhal, Mitra, Salton, 1996; Buckley, Singhal, Mitra, 1997), and SMART ltc weights (Buckley, Salton, Allan, Singhal, 1995) for query terms. Lnu weights attempt to match the probability of retrieval given a document length with the probability of relevance given that length (Singhal, Buckley, Mitra, 1996). Our implementation of Lnu ....
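A minimal sketch of the inner-product similarity (1) with Lnu document weights and ltc query weights, assuming the usual SMART definitions: L = (1 + ln tf) / (1 + ln of the document's average tf), no idf for documents, and pivoted unique normalization by (1 - slope) * pivot + slope * (number of unique terms), where `pivot` is assumed to be the collection-wide average number of unique terms per document; ltc is (1 + ln tf) * ln(N / df) followed by cosine normalization. Function names are hypothetical; the default slope of 0.2 is the value this listing reports as effective in TREC 4 and TREC 5:

```python
import math
from collections import Counter
from typing import Dict, List


def lnu_weights(doc_tokens: List[str], pivot: float, slope: float = 0.2) -> Dict[str, float]:
    """Lnu document weights: log-average tf, no idf, pivoted unique normalization."""
    tf = Counter(doc_tokens)
    if not tf:
        return {}
    avg_tf = sum(tf.values()) / len(tf)
    norm = (1.0 - slope) * pivot + slope * len(tf)
    return {t: (1.0 + math.log(c)) / (1.0 + math.log(avg_tf)) / norm for t, c in tf.items()}


def ltc_weights(query_tokens: List[str], df: Dict[str, int], n_docs: int) -> Dict[str, float]:
    """ltc query weights: (1 + ln tf) * ln(N / df), then cosine normalization."""
    tf = Counter(query_tokens)
    raw = {t: (1.0 + math.log(c)) * math.log(n_docs / df[t])
           for t, c in tf.items() if df.get(t, 0) > 0}
    length = math.sqrt(sum(w * w for w in raw.values()))
    return {t: w / length for t, w in raw.items()} if length > 0 else raw


def similarity(doc_w: Dict[str, float], query_w: Dict[str, float]) -> float:
    """Inner product sum_k q_k * d_ik; query terms absent from the document contribute 0."""
    return sum(w * doc_w.get(t, 0.0) for t, w in query_w.items())


doc = lnu_weights(["oil", "oil", "price"], pivot=2.0)
query = ltc_weights(["oil"], df={"oil": 1, "price": 5}, n_docs=10)
score = similarity(doc, query)
```

Raising `slope` (e.g. to the 0.3 reported for TREC 6 in the next snippet) makes the normalization factor grow faster with the number of unique terms, so longer documents are penalized more.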
....most of the findings were based on the performance of initial retrieval only and did not investigate the effect of automatically expanding the feedback query with phrase index terms.⁵ Though Lnu weights with a slope of 0.2 proved effective in both TREC 4 and TREC 5 (Buckley et al. 1996; Buckley, Singhal, Mitra, 1997), we found a slope of 0.3 to be more effective with respect to initial retrieval in our TREC 6 experiments (Sumner et al. 1998). As is the case with phrases, the Lnu weight experiments did not investigate their effects on retrieval beyond the first feedback iteration. In our past TREC experiments, we ....
Buckley, C., Singhal, A., & Mitra, M. (1997). Using query zoning and correlation within SMART: TREC 5. In E.
....We used this technique in previous TRECs (see [Evans et al. 1994, 1996]), but in TREC 6 we focused primarily on the evaluation of methods for extracting positive features. We note that the Cornell group has begun using similar techniques for profile training ("query zone") in recent TRECs [Buckley et al. 1997]. This was preceded by the work of the Xerox group, which explored a concept of the local region (see [Schutze et al. 1995]). In our TREC 6 experiments we simulated the routing of documents by using vector space retrieval, modified to use distribution statistics (IDF) from a reference corpus ....
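The snippet above describes routing with IDF taken from a reference corpus rather than from the documents being routed. A minimal sketch, assuming the common idf = ln(N / df) form (the exact formula used in those runs is not given here); the function name is hypothetical:

```python
import math
from collections import Counter
from typing import Dict, Iterable, List


def reference_idf(reference_docs: Iterable[List[str]]) -> Dict[str, float]:
    """idf(t) = ln(N / df(t)), with N and df counted once on a fixed reference corpus."""
    df: Counter = Counter()
    n_docs = 0
    for tokens in reference_docs:
        n_docs += 1
        df.update(set(tokens))  # count each term once per document
    return {t: math.log(n_docs / count) for t, count in df.items()}


idf = reference_idf([["a", "b"], ["a"]])
```

Because the table is computed once, each incoming document can be weighted immediately without waiting for statistics from the stream itself; how unseen terms are handled is left open here.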
Buckley, Chris, Singhal, Amit, Mitra, Mandar, "Using Query Zoning and Correlation Within SMART: TREC 5". In Voorhees, E.M., and Harman, D.K. (Editors), The Fifth Text REtrieval Conference (TREC-5). NIST Special Publication 500-238. Washington, DC: U.S. Government Printing Office, 1997, 105-118.
....words, provided this root form is in our online dictionary. Passage Feedback with IRIS (9) where $q_k$ is the weight of term $k$ in the query, $d_{ik}$ is the weight of term $k$ in document $i$, and $t$ is the number of terms in the index. We used SMART Lnu weights for document terms (Buckley et al. 1996; Buckley et al. 1997), and SMART ltc weights (Buckley et al. 1995) for query terms. Lnu weights attempt to match the probability of retrieval given a document length with the probability of relevance given that length (Singhal et al. 1996). Our implementation of Lnu weights was the same as that of Buckley et al. ....
Buckley, C., Singhal, A., & Mitra, M. (1997). Using query zoning and correlation within SMART: TREC 5. In E. M. Voorhees & D. K. Harman (Eds.), The Fifth Text REtrieval Conference (TREC-5) (NIST Spec. Publ. 500-238, pp. 105-118). Washington, DC: U.S. Government Printing Office.
....would be adequate as a first step. Because text segmentation is not straightforward and the process itself can have ambiguous outcomes, several previous attempts use single characters or all bigrams (adjacent overlapping character pairs) as the representation in Chinese [Chie95, LiLy96, BGHR9x, BuSM9x] and in Japanese [FuCr93, OgIw95]. These approaches are simple and efficient, provide exhaustive indexing, and rely on neither a segmentation procedure nor large dictionaries. Thus, it would be interesting to compare the effectiveness of retrieval among these three types of ....
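The two segmentation-free representations named above (single characters and overlapping bigrams) can be stated in a few lines. A minimal sketch, assuming the input is an unbroken run of CJK characters; punctuation and whitespace handling, which a real indexer needs, are omitted, and the function names are hypothetical:

```python
from typing import List


def char_unigrams(text: str) -> List[str]:
    """Index every single character as a term."""
    return list(text)


def char_bigrams(text: str) -> List[str]:
    """Index every pair of adjacent, overlapping characters as a term."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


terms = char_bigrams("信息检索")  # "information retrieval"
```

Every true two-character word in the text is guaranteed to appear among the bigrams, which is the exhaustiveness the snippet mentions; the cost is many spurious cross-word bigrams such as 息检.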
Buckley, C., Singhal, A., & Mitra, M. (199x). Using query zoning and correlation within SMART: TREC 5. In: Overview of the Fifth Text REtrieval Conference (TREC-5). Harman, D.K. (Ed.). To be published.
....IR systems therefore must either index each individual character or be provided with information about word boundaries. Word boundaries cannot be determined with complete accuracy, and it is unclear how errors in word segmentation degrade IR performance. At the recent TREC 5 conference, Buckley, Singhal, and Mitra (1996) reported excellent retrieval results in the Chinese track using simply the "character as word" segmentation algorithm discussed below. This led them to suggest that segmentation is a minor issue for retrieving Chinese and shouldn't be a major focus. However, while (Broglio, Callan, Croft 1996) ....
....Our baseline IR system was SMART, a publicly available vector space system developed at Cornell. We made minor modifications to SMART to enable the system to process the extended character set used in the Chinese collection; these changes are similar to those described in Cornell's TREC 5 paper (Buckley, Singhal, Mitra 1996). For each experiment, we segmented the entire Chinese collection using a word segmentation algorithm and indexed the collection using SMART. In each case, the queries were also segmented using the same algorithm used to segment the collection.¹ Character as word. A simple initial ....
[Article contains additional citation context not shown here]
Buckley, C.; Singhal, A.; and Mitra, M. 1996. Using query zoning and correlation within SMART: TREC 5. in (Harman 1996a).
....of document and query vectors, $d_i^{T} q = \sum_{k=1}^{t} q_k d_{ik}$ (1), where $q_k$ is the weight of term $k$ in the query, $d_{ik}$ is the weight of term $k$ in document $i$, and $t$ is the number of terms in the index. We used SMART Lnu weights for document terms (Buckley, Singhal, Mitra, Salton, 1996; Buckley, Singhal, Mitra, 1997), and SMART ltc weights (Buckley, Salton, Allan, Singhal, 1995) for query terms. Lnu weights attempt to match the probability of retrieval given a document length with the probability of relevance given that length (Singhal, Buckley, Mitra, 1996). Our implementation of Lnu ....
....most of the findings were based on the performance of initial retrieval only and did not investigate the effect of automatically expanding the feedback query with phrase index terms.⁵ Though Lnu weights with a slope of 0.2 proved effective in both TREC 4 and TREC 5 (Buckley et al. 1996; Buckley, Singhal, Mitra, 1997), we found a slope of 0.3 to be more effective with respect to initial retrieval in our TREC 6 experiments (Sumner et al. 1998). As is the case with phrases, the Lnu weight experiments did not investigate their effects on retrieval beyond the first feedback iteration. In our past TREC experiments, we ....
Buckley, C., Singhal, A., & Mitra, M. (1997). Using query zoning and correlation within SMART: TREC 5. In E. M. Voorhees & D. K. Harman (Eds.), The Fifth Text REtrieval Conference (TREC-5).
....$d_i^{T} q = \sum_{k=1}^{t} q_k d_{ik}$ (1), where $q_k$ is the weight of term $k$ in the query, $d_{ik}$ is the weight of term $k$ in document $i$, and $t$ is the number of terms in the index. Document term weights are SMART Lnu weights, which were effective in both TREC 4 (Buckley, Singhal, Mitra, Salton, 1996) and TREC 5 (Buckley, Singhal, Mitra, 1997). According to Singhal, Buckley, and Mitra (1996), Lnu weights were created in an attempt to match the probability of retrieval given a document length with the probability of relevance given that length. Our implementation of Lnu weights was the same as that of Buckley et al. (1996, 1997), except ....
Buckley, C., Singhal, A., & Mitra, M. (1997). Using query zoning and correlation within SMART: TREC 5. In E. M. Voorhees & D. K. Harman (Eds.), The Fifth Text REtrieval Conference (TREC-5).
....for negative feedback information is the approach that we used last year also [1], preferring to balance the number of relevant and non-relevant documents used in training. An alternative approach is to use all of the collection other than the known relevant documents for negative information. Buckley et al. [3] have recently adopted an approach similar to ours, which they call query zoning. 2.2 Additional concepts. The same process described above was applied to find concepts based upon pairs of terms also. This is the same technique that we applied last year [1]. In this case, candidate pairs were ....
....query. The 250 new concepts were then added to that list and assigned the weight as described above. Note that this means that original query terms tended to have a weight that was 2-3 times that of the new concepts. Mistakes in that weighting were corrected in the next step. Buckley et al. [3] explored term co-occurrences in their recent work, too. They used twice as many non-relevant documents for statistics gathering and used different measures for term selection, but the idea is similar. 2.3 Weight adjustments. We again used the Dynamic Feedback Optimization approach of Buckley ....
Chris Buckley, Amit Singhal, and Mandar Mitra. Using query zoning and correlation within SMART: TREC 5. In D. Harman, editor, Proceedings of the Fifth Text REtrieval Conference (TREC-5). National Institute of Standards and Technology, 1996.
.... is a significant increase in retrieval effectiveness (for more details, see Sections 4 and 5). Recent experiments by several groups participating in TREC also suggest that, averaged over large query sets (with 50 queries), adhoc expansion yields significant improvements in overall performance [20, 1, 4, 11]. Thus, it seems worthwhile to continue using adhoc expansion in retrieval for short queries, while exploring ways to prevent query drift, the alteration of the focus of a search topic caused by improper expansion (as described in the above example). Since the presence of a large proportion of ....
....we use the Title field in addition.³ Our experiments use the SMART IR system. Documents and queries are indexed using single terms and statistical phrases. Term weights are computed using the Lnu.ltu weighting scheme proposed by Singhal et al. [19]. We start with the standard SMART Lnu.ltu run [5, 4, 2] and retrieve the top 1,000 documents for a query. ³ For these queries, the use of the Description field alone is problematic. See [22] for a discussion of this issue.
Task | Measure | No Pseudo Feedback | With Pseudo Feedback (Baseline Query Expansion)
TREC 3 | Avg. P | 0.2397 | 0.3335 (+39.1%)
 | P20 | ....
[Article contains additional citation context not shown here]
C. Buckley, A. Singhal, and M. Mitra. Using Query Zoning and Correlation within SMART: TREC5. In E. M. Voorhees and D. K. Harman, editors, Proceedings of the Fifth Text REtrieval Conference (TREC-5). NIST Special Publication 500-238, 1997.
....document; thus a good short document containing all the query terms will be ranked highly. We hoped that this would enable our blind feedback query expansion to be based on more relevant documents and thus be improved. The same phrase strategy (and phrases) used in all previous TRECs (for example [2, 3, 4, 1]) is used for TREC 8. Any pair of adjacent non-stopwords is regarded as a potential phrase. The final list of phrases is composed of those pairs of words occurring in 25 or more documents of the initial TREC 1 document set. Phrases are weighted with the same scheme as single terms. Note that no ....
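The statistical phrase rule above (adjacent non-stopword pairs kept if they occur in 25 or more documents) can be sketched as follows. Assumptions: adjacency is taken in the original token stream, so a stopword between two words breaks the pair; word order is kept and no stemming is applied. The actual SMART procedure may differ on these points, and the function names are hypothetical:

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

Pair = Tuple[str, str]


def candidate_pairs(tokens: List[str], stopwords: Set[str]) -> Set[Pair]:
    """Adjacent token pairs in which neither token is a stopword."""
    return {(left, right) for left, right in zip(tokens, tokens[1:])
            if left not in stopwords and right not in stopwords}


def build_phrase_list(docs: Iterable[List[str]], stopwords: Set[str],
                      min_df: int = 25) -> Set[Pair]:
    """Keep candidate pairs that occur in at least min_df documents."""
    df: Counter = Counter()
    for tokens in docs:
        df.update(candidate_pairs(tokens, stopwords))  # each pair counted once per document
    return {pair for pair, count in df.items() if count >= min_df}


corpus = [["oil", "price"]] * 25 + [["price", "of", "oil"]]
phrases = build_phrase_list(corpus, stopwords={"of"})
```

In the toy corpus, ("oil", "price") reaches exactly 25 documents and is kept, while "price of oil" yields no candidate because "of" is a stopword.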
Chris Buckley, Amit Singhal, and Mandar Mitra. Using query zoning and correlation within SMART: TREC 5. In D. K. Harman, editor, Proceedings of the Fifth Text REtrieval Conference (TREC-5). NIST Special Publication 500-238, 1997.
....is reasonably collection-independent and thus should be valid across a wide range of new collections. No human expertise in the subject matter is required for either the initial collection creation or the actual query formulation. The same phrase strategy (and phrases) used in all previous TRECs ([2, 1, 3, 4, 5]) is used for TREC 6. Any pair of adjacent non-stopwords is regarded as a potential phrase. The final list of phrases is composed of those pairs of words occurring in 25 or more documents of the initial TREC 1 document set. Phrases are weighted with the same scheme as single terms. Text ....
....of one term is not a reasonable predictor of the presence of another). The top 50 documents are re-ranked based on this new similarity, and the top 20 in the resulting ranking are assumed relevant. Using this refined set of 20 documents in the feedback process yielded good improvements at TREC 5 [5]. We also experimented with other techniques to improve the initial retrieval. We used natural language processing techniques to identify phrases in queries and documents, hoping that the high-quality phrases identified this way could be used to better predict the relevance of a document. We found ....
[Article contains additional citation context not shown here]
Chris Buckley, Amit Singhal, and Mandar Mitra. Using query zoning and correlation within SMART: TREC 5. In D. K. Harman, editor, Proceedings of the Fifth Text REtrieval Conference (TREC-5). NIST Special Publication 500-???, 1997.
....missing relevant documents. Rank the training data according to the original query, and assume that the top K (say 5,000) documents form the query zone. If some relevant document is ranked below the top K documents, include it in the query zone. This strategy was used in our TREC 5 participation [7]. Parameters involved: α, β, γ, and K.
• QZ 2: All documents with similarity to the original query greater than some threshold (S). Queries can have narrow or broad domains. A query from a very narrow domain should have a smaller query zone than a broad query. To capture this notion, we use ....
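The two query-zone definitions above can be sketched as follows. Assumptions: `ranking` lists every training document by decreasing similarity to the original query, and the function names are hypothetical. The snippet is cut off before saying whether QZ 2 also adds known relevant documents below the threshold, so this sketch does not:

```python
from typing import Dict, List, Set


def query_zone_top_k(ranking: List[str], relevant: Set[str], k: int) -> List[str]:
    """QZ 1: the top K documents for the original query, plus any known
    relevant document ranked below K."""
    return ranking[:k] + [d for d in ranking[k:] if d in relevant]


def query_zone_threshold(scores: Dict[str, float], s: float) -> Set[str]:
    """QZ 2: every document whose similarity to the original query exceeds S."""
    return {d for d, sim in scores.items() if sim > s}
```

With a fixed K every query gets a zone of roughly the same size, while the threshold S lets a narrow query end up with a smaller zone, which is the motivation the snippet gives for QZ 2.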
C. Buckley, A. Singhal, and M. Mitra. Using query zoning and correlation within SMART: TREC 5. In Proceedings of the Fifth Text REtrieval Conference (TREC-5). NIST Special Publication, 1997. To appear.
.... is a significant increase in retrieval effectiveness (for more details, see Sections 4 and 5). Recent experiments by several groups participating in TREC also suggest that, averaged over large query sets (with 50 queries), adhoc expansion yields significant improvements in overall performance [20, 1, 4, 11]. Thus, it seems worthwhile to continue using adhoc expansion in retrieval for short queries, while exploring ways to prevent query drift, the alteration of the focus of a search topic caused by improper expansion (as described in the above example). Since the presence of a large proportion of ....
....use the Title field in addition.³ Our experiments use the SMART IR system. Documents and queries are indexed using single terms and statistical phrases. Term weights are computed using the Lnu.ltu weighting scheme proposed by Singhal et al. [19]. We start with the standard SMART Lnu.ltu run [5, 4, 2] and retrieve the top 1,000 documents for a query. ³ For these queries, the use of the Description field alone is problematic. See [22] for a discussion of this issue.
Task | Measure | No Pseudo Feedback | With Pseudo Feedback (Baseline Query Expansion)
TREC 3 | Avg. P | 0.2397 | 0.3335 (+39.1%)
 | P20 | ....
[Article contains additional citation context not shown here]
C. Buckley, A. Singhal, and M. Mitra. Using Query Zoning and Correlation within SMART: TREC5. In E. M. Voorhees and D. K. Harman, editors, Proceedings of the Fifth Text REtrieval Conference (TREC-5). NIST Special Publication 500-238, 1997.
....collection-independent and thus should be valid across a wide range of new collections. No human expertise in the subject matter is required for either the initial collection creation or the actual query formulation. The same phrase strategy (and phrases) used in all previous TRECs (for example [2, 3, 4, 1]) is used for TREC 7. Any pair of adjacent non-stopwords is regarded as a potential phrase. The final list of phrases is composed of those pairs of words occurring in 25 or more documents of the initial TREC 1 document set. Phrases are weighted with the same scheme as single terms. When the text ....
....query in the proper form. It also has the side effect of making the task more difficult, since reading and typing the query might take 30 seconds (10% of the available time). High Precision Methodology. Our methodology for the TREC 7 HP task is very similar to those we've used in the past 3 TRECs [3, 4, 1]. The user's main task is to provide relevance judgements to be fed to our standard Rocchio relevance feedback algorithm. Direct modification of the query (adding/deleting terms to/from the query or directly modifying weights) was also occasionally (rarely) used by the searchers. The other ....
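The relevance judgements described above feed a Rocchio update. A minimal sketch of the textbook Rocchio formula, Q' = αQ + (β/|R|) Σ_{d∈R} d - (γ/|N|) Σ_{d∈N} d, with terms whose weight ends up non-positive dropped. Vectors are assumed to be sparse term-to-weight dicts, and the default α, β, γ values are illustrative only, not the parameters of the cited runs:

```python
from collections import defaultdict
from typing import Dict, List

Vector = Dict[str, float]


def rocchio(query: Vector, relevant: List[Vector], nonrelevant: List[Vector],
            alpha: float = 1.0, beta: float = 0.75, gamma: float = 0.15) -> Vector:
    """Q' = alpha*Q + beta*centroid(relevant) - gamma*centroid(nonrelevant);
    terms with weight <= 0 are dropped from the new query."""
    new_q: Dict[str, float] = defaultdict(float)
    for term, w in query.items():
        new_q[term] += alpha * w
    for docs, coef in ((relevant, beta), (nonrelevant, -gamma)):
        for doc in docs:  # an empty list contributes nothing
            for term, w in doc.items():
                new_q[term] += coef * w / len(docs)
    return {term: w for term, w in new_q.items() if w > 0}


updated = rocchio({"a": 1.0}, [{"a": 1.0, "b": 1.0}], [{"c": 1.0}],
                  alpha=1.0, beta=1.0, gamma=1.0)
```

In the usage line, term "b" enters the query from the relevant document, and "c" is pushed below zero by the non-relevant one and removed.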
Chris Buckley, Amit Singhal, and Mandar Mitra. Using query zoning and correlation within SMART: TREC 5. In D. K. Harman, editor, Proceedings of the Fifth Text REtrieval Conference (TREC-5). NIST Special Publication 500-238, 1997.
....of about 13% over the baseline. Optimization of pair-added queries yields even richer improvements than optimizing the non-pair-added queries, yielding an overall improvement of about 25% over our baseline. The above routing algorithm is quite similar to the routing algorithm we used in TREC 5 [2], except for minor variations. We tried various new techniques to improve upon the above routing algorithm, but none of the techniques we tried yielded better results than the above algorithm. Our first approach revolved around clustering the known relevant articles for a query. The main thought ....
....by Hearst's observations in [5], recently we have tried improving the quality of our relevance assumption by reranking the top fifty documents retrieved by the original query according to some precision criteria and using the top twenty documents from this reranked list in pseudo feedback [2]. One particular criterion that we have used is the presence of several query terms in a small window of text in a document (see Table 7). This year, we used a new method to rerank the top fifty documents to select the set of twenty documents used in pseudo feedback. This technique is based on a new ....
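The window-based reranking criterion described above can be sketched as below. Assumptions, all hypothetical: the score is the number of distinct query terms found in the best run of `window` consecutive tokens (20 by default), ties keep the original retrieval order, and, as the snippets report, the top 50 are reranked and the top 20 kept. The new method mentioned at the end of the snippet is not modeled:

```python
from typing import Dict, List, Set


def max_terms_in_window(tokens: List[str], query_terms: Set[str], window: int) -> int:
    """Largest number of distinct query terms in any run of `window` consecutive tokens."""
    best = 0
    for start in range(max(1, len(tokens) - window + 1)):
        best = max(best, len(query_terms & set(tokens[start:start + window])))
    return best


def select_feedback_docs(ranked_ids: List[str], docs: Dict[str, List[str]],
                         query_terms: Set[str], window: int = 20,
                         top_n: int = 50, keep: int = 20) -> List[str]:
    """Rerank the top_n retrieved documents by the window criterion and keep the best `keep`."""
    head = ranked_ids[:top_n]
    # sorted() stays stable with reverse=True, so equal scores keep their original rank order.
    reranked = sorted(head, key=lambda d: max_terms_in_window(docs[d], query_terms, window),
                      reverse=True)
    return reranked[:keep]
```

The intent is that a document with several query terms close together is more likely to be relevant than one where the terms are scattered, so the documents assumed relevant for pseudo feedback become more reliable.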
Chris Buckley, Amit Singhal, and Mandar Mitra. Using query zoning and correlation within SMART: TREC-5. In D. K. Harman, editor, Proceedings of the Fifth Text REtrieval Conference (TREC-5), 1997 (to appear).
....relevant documents above such non-relevant articles. This would enhance precision at the top ranks. Phrases have been found to be useful indexing units by most of the leading groups participating in the NIST- and DARPA-sponsored Text REtrieval Conferences for performance evaluations of IR systems [10, 5, 4, 1, 11, 22, 21, 20, 7, 8]. In this study, we re-examine the usefulness of phrases, particularly within the context of high-precision retrieval. Statistical and Syntactic Phrases. We consider two classes of phrases: (Footnote: This study was supported in part by the National Science Foundation under grant IRI-9624639.) 1. ....
.... term-weighting method is improved [5, 1]. In fact, a similar observation can be made about the use of phrases as well: in the past five years of TREC, overall retrieval effectiveness has more than doubled, but the added effectiveness due to statistical phrases has gone down from 7% to less than 1% [4]. The aim of this study is to examine the following questions in greater detail:
• Given a good basic document ranking scheme, what additional improvements can be obtained by using phrases in indexing and retrieval?
• Is there a significant difference in the benefits obtained from using ....
[Article contains additional citation context not shown here]
C. Buckley, A. Singhal, and M. Mitra. Using Query Zoning and Correlation within SMART: TREC5. In D. K. Harman, editor, Proceedings of the Fifth Text REtrieval Conference (TREC-5).
No context found.
C. Buckley, A. Singhal, and M. Mitra. Using Query Zoning and Correlation within SMART: TREC-5. In Proceedings of TREC-5, pages 105-118, 1997.
No context found.
C. Buckley, A. Singhal, and M. Mitra. Using Query Zoning and Correlation within SMART: TREC-5. In Proceedings of TREC-5, pages 105-118, 1997.
No context found.
C. Buckley, A. Singhal, and M. Mitra, "Using Query Zoning and Correlation Within SMART: TREC 5," Proc. TREC 5, Gaithersburg, MD, 1996.
No context found.
C. Buckley, A. Singhal, and M. Mitra. Using Query Zoning and Correlation within SMART: TREC-5. In Proceedings of TREC-5, pages 105-118, 1997.
No context found.
C. Buckley, A. Singhal, and M. Mitra. Using Query Zoning and Correlation within SMART: TREC-5. In Proceedings of TREC-5, pages 105-118, 1997.
No context found.
Chris Buckley, Amit Singhal, and Mandar Mitra. Using query zoning and correlation within SMART: TREC-5. In E. M. Voorhees and D. K. Harman, editors, Proceedings of TREC-5, pages 105-118, Gaithersburg MD, November 1996. NIST special publication 500-238, http://trec.nist.gov.
No context found.
C. Buckley, A. Singhal and M. Mitra, Using query zoning and correlation within SMART: TREC-5, in: E.M. Voorhees and D.K. Harman (Eds.), Proc. 5th Text Retrieval Conference (TREC-5), Gaithersburg MD, U.S. National Institute Standards Technology, NIST Special Publication 500-238, pp. 105-118, 1996.
No context found.
Chris Buckley, Amit Singhal, and Mandar Mitra. 1996. Using query zoning and correlation within SMART: TREC 5. In Proceedings of the Fifth Text Retrieval Conference (TREC-5).
CiteSeer.IST - Copyright Penn State and NEC