Threshold calibration in CLARIT adaptive filtering (1999) [17 citations — 2 self]
Abstract:
In this paper, we describe the system and methods used for the CLARITECH entries in the TREC-7 Filtering Track. Our main aim was to study algorithms, designs, and parameters for Adaptive Filtering, as this comes closest to actual applications. For efficiency's sake, however, we adapted a system largely geared towards retrieval and introduced a few critical new components. The first of these components, the delivery ratio mechanism, is used to obtain a profile threshold when no feedback has been received. A second method, which we call beta-gamma regulation, is used for threshold updating. It takes into account the number of judged documents processed by the system as well as an expected bias in optimal threshold calculation. Several parameters were determined empirically: apart from the parameters pertaining to the new components, we also experimented with different choices for the reference corpus, and different "chunk" sizes for processing news stories. Gradually increasing chunk sizes over "time" appears to help profile learning. Finally, we examined the effect of terminating underperforming queries over the AP90 corpus and found that the utility metric over AP88–AP89 was a good predictor. All of the above innovations contributed to the success of the CLARITECH system in the adaptive filtering track.
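The two threshold components named in the abstract can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything here is an assumption rather than the authors' implementation: the function names, the parameter names, and the exponential interpolation used for beta-gamma regulation are all hypothetical. The first function picks an initial threshold so that a chosen fraction of reference-corpus documents would be delivered; the second moves the threshold from a lenient score (theta_zero) towards the utility-optimal score (theta_opt) as more judged documents accumulate.

```python
import math


def delivery_ratio_threshold(reference_scores, ratio):
    """Return the score cutoff that delivers roughly `ratio` of the reference documents.

    Hypothetical helper: the abstract only says the delivery ratio mechanism
    sets a profile threshold before any feedback has been received.
    """
    if not reference_scores:
        raise ValueError("reference_scores must not be empty")
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    ranked = sorted(reference_scores, reverse=True)
    k = max(1, math.ceil(ratio * len(ranked)))  # number of documents to deliver
    return ranked[k - 1]


def beta_gamma_threshold(theta_opt, theta_zero, n_judged, beta, gamma):
    """Interpolate between a lenient threshold and the utility-optimal one.

    Assumed form (not given in the abstract):
        alpha = beta + (1 - beta) * exp(-n_judged * gamma)
        theta = alpha * theta_zero + (1 - alpha) * theta_opt
    beta models the expected bias of theta_opt; gamma controls how quickly
    the number of judged documents pulls the threshold towards theta_opt.
    """
    alpha = beta + (1 - beta) * math.exp(-n_judged * gamma)
    return alpha * theta_zero + (1 - alpha) * theta_opt


if __name__ == "__main__":
    initial = delivery_ratio_threshold([0.9, 0.1, 0.5, 0.7], ratio=0.5)
    updated = beta_gamma_threshold(theta_opt=1.0, theta_zero=0.0,
                                   n_judged=20, beta=0.1, gamma=0.05)
    print(initial, updated)
```

Under these assumptions, a profile with no judged documents stays at the lenient threshold (alpha is 1), and with many judged documents the threshold settles at a point a fraction beta short of theta_opt, which is one way to read the abstract's "expected bias in optimal threshold calculation".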

