Singhal, A. (1997). Learning routing queries in a query zone. In Proceedings of the 20th International Conference on Research and Development in Information Retrieval.
....that give good overall performance for a document set requires careful fine-tuning. Even in the case of a 2-D feature space, it is easy to construct cases where, for a fixed set of parameters, a large number of non-relevant judgements move the query vector toward the non-relevant region. Singhal et al. [5] propose a dynamic query zoning scheme for learning queries in Rocchio's framework by using a restricted set of non-relevant documents in place of the entire set of non-relevant documents. In a recent study [6], Dunlop concludes that with Rocchio's formula using non-relevance feedback, the results ....
A. Singhal, M. Mitra, and C. Buckley. Learning routing queries in a query zone. In Proc. of SIGIR, Philadelphia, PA, 1997.
....in both the unclassified document and all classified documents. That is, it compares each of the test documents with each of the documents in the training set. The bigger the training set, the higher the computational cost. We will later argue in more detail why this is the case. Now, as earlier research [3, 13, 14, 25, 28] has shown, a prototype per category remedies this cost. 2.2 Rocchio. Instead of considering the similarity of an unclassified document to all documents in a category, a natural alternative is to somehow take a single representative document per category, called a prototype, and to compare the unclassified ....
....in category k and then by calculating the score for each of the co-occurring weights, dividing by the number of documents in the category. Then subtracting the average score for each of the weights in the document vectors not in this category, C k . This gives the prototype vector #c k [14, 25, 28]. w ik . (2.1) where is the number of documents in the category. If an instance in the prototype vector has a value below 0, it is set to 0. The # and # in (2.1) are parameters that adjust the relative impact of positive and negative training examples. For our experiments we adopt the ....
Buckley Singhal, Mitra. Learning routing queries in a query zone. Technical report, AT&T Labs Research, Dept. of CS, Cornell University, Sabir Research.
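The prototype construction described in the context above (average the weights of the documents in a category, subtract the average of the documents outside it, and set negative entries to 0) can be sketched as follows. This is a minimal illustration, not the citing authors' implementation: the sparse dictionary representation, the function names, and the parameter names beta and gamma (standing in for the two weighting parameters whose symbols were lost in the extract) are all assumptions.

```python
from typing import Dict, List

Vector = Dict[str, float]  # sparse mapping: term -> weight (assumed representation)


def centroid(docs: List[Vector]) -> Vector:
    """Average the term weights over a list of document vectors."""
    total: Vector = {}
    for doc in docs:
        for term, weight in doc.items():
            total[term] = total.get(term, 0.0) + weight
    n = len(docs)
    return {term: weight / n for term, weight in total.items()} if n else {}


def rocchio_prototype(
    positives: List[Vector],
    negatives: List[Vector],
    beta: float = 1.0,   # hypothetical name: impact of in-category documents
    gamma: float = 1.0,  # hypothetical name: impact of out-of-category documents
) -> Vector:
    """Build a category prototype: weighted positive centroid minus weighted
    negative centroid, with any entry below 0 set to 0."""
    pos = centroid(positives)
    neg = centroid(negatives)
    prototype: Vector = {}
    for term in set(pos) | set(neg):
        weight = beta * pos.get(term, 0.0) - gamma * neg.get(term, 0.0)
        prototype[term] = max(weight, 0.0)
    return prototype


in_category = [{"a": 1.0, "b": 1.0}, {"a": 3.0}]
out_of_category = [{"b": 2.0, "c": 4.0}]
proto = rocchio_prototype(in_category, out_of_category, beta=1.0, gamma=0.5)
```

In this toy input, term "a" occurs only in the category and keeps a positive weight, while "b" and "c" are dominated by the out-of-category documents and are clipped to 0.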
....this problem an appropriate set of training examples must be selected. We call this training subset the category zone. This notion of category zone is similar to the local regions described in Wiener et al. [39] and Ng et al. [27] but is inspired by the query zone proposed by Singhal et al. [34] for text routing. Their query zoning is based on the observation that in a large collection a query will have a set of documents that constitutes its domain. Nonrelevant documents that are outside the domain are easy to identify, but it is more difficult to differentiate between relevant and ....
....observation that in a large collection a query will have a set of documents that constitutes its domain. Nonrelevant documents that are outside the domain are easy to identify, but it is more difficult to differentiate between relevant and non-relevant documents within the query domain. Singhal et al. [34] define a procedure that tries to approximate the domain of the query and then they use this domain to train their routing method. We suggest that in text categorization, each category also has its own domain. It will be easier to train a learning algorithm with those documents from the category ....
[Article contains additional citation context not shown here]
Singhal A, Mitra M, and Buckley C. Learning routing queries in a query zone. Research and Development in Information Retrieval, Philadelphia, PA, pp. 25-32, July 1997.
....optimal query vector is the difference vector of the centroid vectors for the relevant and the non-relevant documents. The creation of the optimal query vector is similar to moving the original vector towards the relevant document vectors, and shifting away from the non-relevant document vectors [16]. In order to maintain the characteristics of the original query in this process, researchers have found it useful to include the original query in the feedback query creation process. Coefficients to control the effects of relevant and non-relevant document vectors have also been introduced to ....
Singhal A, Mitra M, and Buckley C, "Learning Routing Queries in a Query Zone", in Proceedings of SIGIR'97, pp 25-32, 1997
....ones, they do in k NN 1 neg , since each of the k most similar documents, however semantically distant, brings a little weight to the final sum of which the CSV consists. A similar observation lies at the heart of the use of query zoning techniques in the context of Rocchio classifiers [14, 12]; here, the idea is that in learning a concept, the most interesting negative instances of this concept are the least negative ones (i.e. the negative instances most similar to the positive ones) in that they are more difficult to separate from the positive instances. Similarly, support vector ....
A. Singhal, M. Mitra, and C. Buckley. Learning routing queries in a query zone. In Proceedings of SIGIR-97, 20th ACM International Conference on Research and Development in Information Retrieval, pages 25--32, Philadelphia, US, 1997.
....cation. Below, we discuss two algorithms which can be used for relevance feedback ranking. Rocchio's Algorithm. A basic relevance feedback technique in information retrieval is Rocchio's algorithm, which was developed in 1961, and has been shown to perform relatively well despite its simplicity [SMB97]. The idea behind Rocchio's algorithm is to find what is called the optimal query, which in theory ideally describes the representation of a query that would return the relevant documents. Since the optimal query represents what the user is looking for in a document, we can use it to calculate, by ....
....pivoted normalization = 1.0 slope) old normalization where the slope is the amount to tilt. In addition to normalizing the document vectors, training Rocchio on only the query domain (the set of documents in the topic of the query) instead of the entire corpus will also yield better results [SMB97]. In the original algorithm, non-relevant documents outside the query topic affected the formulation of the optimal query, but this could lead to problems. Consider the case where the user asks "Which Volkswagen models have performed well in crash safety tests?" The word car points us to the ....
A. Singhal, M. Mitra, C. Buckley. Learning Routing Queries in a Query Zone. Proceedings of SIGIR '97, August 1997, pp. 25-32.
....The right plot shows the optimal T9U threshold. As we will prove in Section 3.5.2, the optimal T9U threshold is the intersection of the probability densities of relevant and non-relevant document scores, weighted as 2r and n − r respectively. 6 For the query zoning method, see Section 3.6.2 or [12]. 3.5.2 Optimizing Linear Utility Functions. Let U be any linear utility function of the form $U(\lambda_1, \lambda_2, \lambda_3, \lambda_4) = \lambda_1 R^+ + \lambda_2 N^+ + \lambda_3 R^- + \lambda_4 N^-$ (10), where $\lambda_1, \lambda_2, \lambda_3, \lambda_4$ denote the gain or cost associated with each document that falls in the corresponding category. The linearity of ....
....query Qn. We do not use such an initial query elimination for the runs with decay since the initial query decays anyway. For the batch adaptive task, is set at the one fourth of . Since larger training sets are given for this task, the danger of overfitting is smaller. When using query zones, [12] have shown that = is a reasonable setting. This also explains why we set = also for the adaptive tasks. Thresholding document scores during filtering can be seen as a form of on-the-fly query zoning. Any nonrelevant documents retrieved in this way are indeed the most interfering with the ....
[Article contains additional citation context not shown here]
A. Singhal, C. Buckley, and M. Mitra. Learning Routing Queries in a Query Zone. In N. Belkin, D. Narasimhalu, and P. Willett, editors, Proceedings of the 20th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 25-32. ACM Press, New York, July 1997.
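The threshold optimization mentioned in the context above can be illustrated empirically. The sketch below does not reproduce the citing paper's density-based derivation; it simply scans a scored, judged training list and keeps the cut-off that maximizes a linear utility that rewards retrieved relevant documents and penalizes retrieved non-relevant ones. The function name, the parameter names, and the default gain of 2 and cost of 1 (chosen to mirror the shape of the T9U measure) are assumptions.

```python
from typing import List, Tuple


def best_threshold(
    scored: List[Tuple[float, bool]],  # (document score, is_relevant)
    gain_relevant: float = 2.0,        # assumed gain per retrieved relevant document
    cost_nonrelevant: float = 1.0,     # assumed cost per retrieved non-relevant document
) -> Tuple[float, float]:
    """Return (threshold, utility) such that retrieving every document with
    score >= threshold maximizes the linear utility on the given data.
    Retrieving nothing (threshold = +infinity, utility 0) is the baseline."""
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    best_utility = 0.0
    best_cut = float("inf")
    utility = 0.0
    for i, (score, relevant) in enumerate(ranked):
        utility += gain_relevant if relevant else -cost_nonrelevant
        # A cut is only meaningful after the last document of a tied score group.
        if i + 1 < len(ranked) and ranked[i + 1][0] == score:
            continue
        if utility > best_utility:
            best_utility, best_cut = utility, score
    return best_cut, best_utility


training = [(0.9, True), (0.8, False), (0.7, True), (0.4, False), (0.3, False)]
threshold, utility = best_threshold(training)
```

For this toy list the running utility is 2, 1, 3, 2, 1, so the best cut retrieves the top three documents.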
....general subject area, and T removes a specific subset of those documents, leaving a small set of highly related documents. This suggests a straightforward algorithm to achieve the same goal directly. This local clustering approach is similar to an unsupervised version of Rocchio with Query Zoning (Singhal, 1997). Further analysis of ICA on similar collections reveals other interesting behavior on large datasets. For example, it is known that ICA will attempt to find an unmixing matrix that is full rank. This is in conflict with the notion that these collections actually reside in a much smaller ....
Singhal, A. (1997). Learning routing queries in a query zone. In Proceedings of the 20th International Conference on Research and Development in Information Retrieval.
....for each classifier is selected from the training set. We call this training subset the category zone. This concept of category zone is similar to the local regions described in Weiner et al. [7] and Ng, Goh, Low [5] but is inspired by the query zone proposed by Singhal, Mitra, Buckley [6] for text routing. 4 Experiments. Our experiments use the OHSUMED collection [1], which is a subset of the MEDLINE collection that contains 348,543 records from 1987 to 1991. From this collection we selected the subset of 233,455 records that have title, abstract, and MeSH terms. We selected the ....
A. Singhal, M. Mitra, and C. Buckley. Learning routing queries in a query zone. In Proceedings of the 20th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 25-32, 1997.
....accuracy has been suggested by many researchers. Bartell [7] adds significant evidence that suggests this hypothesis is true. Earlier contributions include a widely used query expansion technique introduced by Rocchio [70]. This method has been improved using machine learning approaches [48, 49, 56, 80, 78] and heuristic optimization techniques [10] on very large collections. Most of the previous work for document classification has focused on supervised methods that use training documents with known relevance. Since relevance assessments are not always available, it is desirable to improve ....
A. Singhal, M. Mitra, and C. Buckley, "Learning Routing Queries in a Query Zone," Proceedings of ACM SIGIR, 25-32, 1997.
....but that have little to no surprise value represent topic features covering broad news topics such as politics, death, destruction, and warfare. We expect that topic and event level features can be combined in a meaningful way, perhaps with the topic features providing a form of query zone. [18]) Two of the runs in Figure 3 represent the tracking ability of queries generated using surprise values. Because many of the surprising features appear to be strong indicators of the event being discussed, we had expected they could be used to build superior tracking queries. Unfortunately, the ....
Amit Singhal, Mandar Mitra, and Chris Buckley. Learning routing queries in a query zone. In Proceedings of SIGIR '97, pages 25--32, 1997.
....domain related to the query [XuCr96]. Making the query longer may improve another round of 1st-stage retrieval. And retrieved-document local statistics reweight terms in the 2nd stage using the set of domain-related documents rather than the whole collection as used during the initial stage [SiMB97]. For collection enrichment, we

Query Type               Title (Short)   Desc (Medium)   All (Long)
Relv.Ret (at most 4674)  2983   0        3034   2        3162   6
Avg.Prec                 .2427  0        .2543  5        .2723  12
P 10                     .4480  0        .4600  3        .4960  11
P 20                     .3770  0        .3930  4        .4340  15
P 30                     .3353  0        .3613  8        .3947  18
R.Prec                   .2705  0        .2831  5        .2960  9
....
Singhal, A, Mitra, M & Buckley C (1997). Learning routing queries in a query zone. In: Proc. of 20th Ann. Intl. ACM-SIGIR Conf. On R&D in IR. Belkin, N.J, Narasimhalu, D, & Willett, P (eds). pp. 25-32.
....have little to no surprise value represent event class features covering broad news areas such as politics, death, destruction, and warfare. We expect that event class and event level features can be combined in a meaningful way, perhaps with the class features providing a form of query zone. [19]) Because many of the surprising features appear to be strong indicators of the event being discussed, we had expected they could be used to build superior tracking queries. Unfortunately, the evaluation does not support that hope. Two problems arise: 1) the surprising words do not provide a ....
A. Singhal, M. Mitra, and C. Buckley. Learning routing queries in a query zone. In Proceedings of SIGIR '97, pages 25--32, 1997.
....algorithms initialized using Rocchio's relevance feedback formula [12, 11]. We therefore compared the effectiveness of RankBoost with the latest in this series of algorithms. This version, which we will call Rocchio QZ DFO here, incorporates dynamic feedback optimization [3] and query zoning [18]. The algorithm has 5 parameterized phases and is described in detail elsewhere [16]. 6. EXPERIMENTS. We now report our experimental results on applying RankBoost to train ranking models. Study 1 focuses on issues of model fitting, while Study 2 focuses on the character of learned term weighting ....
Amit Singhal, Mandar Mitra, and Chris Buckley. Learning routing queries in a query zone. In Proceedings of the 20th Annual International Conference on Research and Development in Information Retrieval, pages 25--32, 1997.
....linear models was used to retrieve the top 5000 scoring AP88 documents. Then a second set of linear models was trained using the union of the judged AP88 documents and the top 5000 AP88 documents, with unjudged documents in the top 5000 being treated as non-relevant, i.e. a query zoning approach [13]. The second set of linear models was run back over the training documents to score them, and a threshold for each model was chosen that optimized the desired effectiveness measure (filtering measure F1 or F3). The models and thresholds were then applied to the test (AP89-90) documents, with the ....
Amit Singhal, Mandar Mitra, and Chris Buckley. Learning routing queries in a query zone. In Proceedings of the Twentieth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 25--32. Association for Computing Machinery, New York, July 1997.
....most such studies use a weak version of Rocchio's algorithm, not well suited for text filtering. In recent years, the IR community has proposed several modifications to Rocchio's algorithm that have vastly improved the performance of this algorithm: better term weighting [26, 35], query zoning [36], and dynamic feedback optimization [6] being the three most notable improvements. In this study, we adapt a state-of-the-art Rocchio's algorithm for the text filtering task, and compare it to a fairly new ML algorithm called boosting. We first develop a text filtering algorithm based on Freund ....
....in recent years [26, 35]. Better term weights in the training documents yield a better Rocchio query. A better Rocchio query along with better term weights for the test documents yields much improved scores (i.e. better ranking) for the test documents. 2. Query Zoning: Recently Singhal et al. [36] have proposed that only a selected set of non-relevant documents that have some relationship to a user's interest should be used in Rocchio's method. They proposed sampling of the non-relevant documents to form a query zone. 3. Dynamic Feedback Optimization: Buckley et al. [6] have shown that ....
[Article contains additional citation context not shown here]
Amit Singhal, Mandar Mitra, and Chris Buckley. Learning routing queries in a query zone. In Proceedings of the Twentieth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 25--32, July 1997.
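The query zoning idea summarized in the contexts above (train Rocchio only on non-relevant documents that have some relationship to the user's interest) can be sketched as a simple selection step. This is an illustration under stated assumptions, not the procedure of Singhal et al.: the zone is approximated here by ranking non-relevant documents by cosine similarity to a zone query and keeping the top few, and the names cosine, query_zone, and zone_size are hypothetical.

```python
import math
from typing import Dict, List

Vector = Dict[str, float]  # sparse mapping: term -> weight (assumed representation)


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def query_zone(zone_query: Vector, non_relevant: List[Vector], zone_size: int) -> List[Vector]:
    """Keep the zone_size non-relevant documents most similar to zone_query.
    Only these would then be used as negative examples in Rocchio training."""
    ranked = sorted(non_relevant, key=lambda doc: cosine(zone_query, doc), reverse=True)
    return ranked[:zone_size]


zone_query = {"car": 1.0, "safety": 1.0}
non_relevant_docs = [
    {"car": 1.0, "price": 1.0},  # on-topic but non-relevant
    {"football": 1.0},           # off-topic, easy to separate
    {"safety": 2.0},             # on-topic but non-relevant
]
zone = query_zone(zone_query, non_relevant_docs, zone_size=2)
```

The off-topic document is dropped from the zone, so it no longer pulls the learned query away from terms that the in-domain documents share.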
No context found.
Singhal, A. (1997). Learning routing queries in a query zone. In Proceedings of the 20th International Conference on Research and Development in Information Retrieval.
No context found.
Singhal, A., Mitra, M. & Buckley, C. (1997), Learning routing queries in a query zone, in `Proceedings SIGIR'97, 20th ACM International Conference on Research and Development in Information Retrieval', pp. 25--32.
No context found.
A. Singhal, M. Mitra and C. Buckley (1997). Learning Routing Queries in a Query Zone. Proceedings SIGIR '97, pp. 25-32.
No context found.
A. Singhal, M. Mitra, and C. Buckley. Learning routing queries in a query zone. In Proceedings of the 20th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 25--32, Philadelphia, PA, July 1997. ACM, ACM Press.
No context found.
A. Singhal, M. Mitra, and C. Buckley. Learning routing queries in a query zone. In Proceedings of SIGIR-97, pages 25--32, Philadelphia, US, 1997.
No context found.
A. Singhal, M. Mitra, and C. Buckley. Learning routing queries in a query zone. In Proceedings of SIGIR-97, pages 25--32, Philadelphia, US, 1997.
No context found.
Amit Singhal, Mandar Mitra, and Christopher Buckley. Learning routing queries in a query zone. In Proceedings of SIGIR-97, 20th ACM International Conference on Research and Development in Information Retrieval, pages 25-32, Philadelphia, US, 1997.
No context found.
Amit Singhal, Mandar Mitra, and Chris Buckley. 1997. Learning routing queries in a query zone. In Proceedings of the 20th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 25-32.
No context found.
Amit Singhal, Mandar Mitra, and Chris Buckley. Learning routing queries in a query zone. In Proceedings of the 20th International Conference on Research and Development in Information Retrieval (SIGIR-97), pages 25--32, Philadelphia, PA, July 1997.