Results 1 - 10
of
38
Inferring query performance using pre-retrieval predictors
- In Proc. Symposium on String Processing and Information Retrieval
, 2004
"... Abstract. The prediction of query performance is an interesting and important issue in Information Retrieval (IR). Current predictors involve the use of relevance scores, which are time-consuming to compute. Therefore, current predictors are not very suitable for practical applications. In this pape ..."
Abstract
-
Cited by 46 (4 self)
- Add to MetaCart
Abstract. The prediction of query performance is an interesting and important issue in Information Retrieval (IR). Current predictors involve the use of relevance scores, which are time-consuming to compute. Therefore, current predictors are not very suitable for practical applications. In this paper, we study a set of predictors of query performance, which can be generated prior to the retrieval process. The linear and non-parametric correlations of the predictors with query performance are thoroughly assessed on the TREC disk4 and disk5 (minus CR) collections. According to the results, some of the proposed predictors have significant correlation with query performance, showing that these predictors can be useful to infer query performance in practical applications. 1
Learning to estimate query difficulty: including applications to missing content detection and distributed information retrieval
- In SIGIR ’05: Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
, 2005
"... In this article we present novel learning methods for estimating the quality of results returned by a search engine in response to a query. Estimation is based on the agreement between the top results of the full query and the top results of its sub-queries. We demonstrate the usefulness of quality ..."
Abstract
-
Cited by 39 (5 self)
- Add to MetaCart
In this article we present novel learning methods for estimating the quality of results returned by a search engine in response to a query. Estimation is based on the agreement between the top results of the full query and the top results of its sub-queries. We demonstrate the usefulness of quality estimation for several applications, among them improvement of retrieval, detecting queries for which no relevant content exists in the document collection, and distributed information retrieval. Experiments on TREC data demonstrate the robustness and the effectiveness of our learning algorithms.
University of glasgow at trec 2005: Experiments in terabyte and enterprise tracks with terrier
- In Proceedings of TREC-05
, 2005
"... With our participation in TREC 2005, we continue experiments using Terrier, a modular and scalable Information Retrieval (IR) framework, in 4 tasks from the Terabyte and Enterprise tracks. In the Terabyte track, we investigate new Divergence From Randomness weighting models, and a novel query expa ..."
Abstract
-
Cited by 20 (12 self)
- Add to MetaCart
With our participation in TREC 2005, we continue experiments using Terrier, a modular and scalable Information Retrieval (IR) framework, in 4 tasks from the Terabyte and Enterprise tracks. In the Terabyte track, we investigate new Divergence From Randomness weighting models, and a novel query expansion approach that can take into account various document fields, namely content, title and anchor text. In addition, we test a new selective query expansion mechanism which determines the appropriateness of using query expansion on a per-query basis, using statistical information from a low-cost query performance predictor. In the Enterprise track, we investigate combining document fields evidence with other information occurring in an Enterprise setting. In the email known item task, we also investigate temporal and thread priors suitable for email search. In the expert search task, for each candidate, we generate profiles of expertise evidence from the W3C collection. Moreover, we propose a model for ranking these candidate profiles in response to a query.
University of Glasgow at TREC2004: Experiments in Web, Robust and Terabyte tracks with Terrier
- In Proceedings of TREC 2004
, 2004
"... With our participation in TREC2004, we test Terrier, a modular and scalable Information Retrieval framework, in three tracks. For the mixed query task of the Web track, we employ a decision mechanism for selecting appropriate retrieval approaches on a per-query basis. For the robust track, in order ..."
Abstract
-
Cited by 17 (6 self)
- Add to MetaCart
With our participation in TREC2004, we test Terrier, a modular and scalable Information Retrieval framework, in three tracks. For the mixed query task of the Web track, we employ a decision mechanism for selecting appropriate retrieval approaches on a per-query basis. For the robust track, in order to cope with the poorlyperforming queries, we use two pre-retrieval performance predictors and a weighting function recommender mechanism. We also test a new training approach for the automatic tuning of the term frequency normalisation parameters. In the Terabyte track, we employ a distributed version of Terrier and test the effectiveness of techniques, such as using the anchor text, pseudo query expansion and selecting different weighting models for each query. 1
On ranking the effectiveness of searches
- In: Proc. of the 29th Annual Int’l ACM SIGIR Conf. on Research and Development in Information Retrieval
, 2006
"... There is a growing interest in estimating the effectiveness of search. Two approaches are typically considered: examining the search queries and examining the retrieved document sets. In this paper, we take the latter approach. We use four measures to characterize the retrieved document sets and est ..."
Abstract
-
Cited by 17 (0 self)
- Add to MetaCart
There is a growing interest in estimating the effectiveness of search. Two approaches are typically considered: examining the search queries and examining the retrieved document sets. In this paper, we take the latter approach. We use four measures to characterize the retrieved document sets and estimate the quality of search. These measures are (i) the clustering tendency as measured by the Cox-Lewis statistic, (ii) the sensitivity to document perturbation, (iii) the sensitivity to query perturbation and (iv) the local intrinsic dimensionality. We present experimental results for the task of ranking 200 queries according to the search effectiveness over the TREC (discs 4 and 5) dataset. Our ranking of queries is compared with the ranking based on the average precision using the Kendall τ statistic. The best individual estimator is the sensitivity to document perturbation and yields Kendall τ of 0.521. When combined with the clustering tendency based on the Cox-Lewis statistic and the query perturbation measure, it results in Kendall τ of 0.562 which to our knowledge is the highest correlation with the average precision reported to date.
Query Hardness Estimation Using Jensen-Shannon Divergence Among Multiple Scoring Functions
"... Abstract. We consider the issue of query performance, and we propose a novel method for automatically predicting the difficulty of a query. Unlike a number of existing techniques which are based on examining the ranked lists returned in response to perturbed versions of the query with respect to the ..."
Abstract
-
Cited by 13 (2 self)
- Add to MetaCart
Abstract. We consider the issue of query performance, and we propose a novel method for automatically predicting the difficulty of a query. Unlike a number of existing techniques which are based on examining the ranked lists returned in response to perturbed versions of the query with respect to the given collection or perturbed versions of the collection with respect to the given query, our technique is based on examining the ranked lists returned by multiple scoring functions (retrieval engines) with respect to the given query and collection. In essence, we propose that the results returned by multiple retrieval engines will be relatively similar for “easy ” queries but more diverse for “difficult ” queries. By appropriately employing Jensen-Shannon divergence to measure the “diversity ” of the returned results, we demonstrate a methodology for predicting query difficulty whose performance exceeds existing state-ofthe-art techniques on TREC collections, often remarkably so. 1
Reducing the risk of query expansion via robust constrained optimization
- Proceedings of the Eighteenth International Conference on Information and Knowledge Management (CIKM 2009). ACM. Hong
"... We introduce a new theoretical derivation, evaluation methods, and extensive empirical analysis for an automatic query expansion framework in which model estimation is cast as a robust constrained optimization problem. This framework provides a powerful method for modeling and solving complex expans ..."
Abstract
-
Cited by 9 (5 self)
- Add to MetaCart
We introduce a new theoretical derivation, evaluation methods, and extensive empirical analysis for an automatic query expansion framework in which model estimation is cast as a robust constrained optimization problem. This framework provides a powerful method for modeling and solving complex expansion problems, by allowing multiple sources of domain knowledge or evidence to be encoded as simultaneous optimization constraints. Our robust optimization approach provides a clean theoretical way to model not only expansion benefit, but also expansion risk, by optimizing over uncertainty sets for the data. In addition, we introduce risk-reward curves to visualize expansion algorithm performance and analyze parameter sensitivity. We show that a robust approach significantly reduces the number and magnitude of expansion failures for a strong baseline algorithm, with no loss in average gain. Our approach is implemented as a highly efficient post-processing step that assumes little about the baseline expansion method used as input, making it easy to apply to existing expansion methods. We provide analysis showing that this approach is a natural and effective way to do selective expansion, automatically reducing or avoiding expansion in risky scenarios, and successfully attenuating noise in poor baseline methods.
G.: Matching task profiles and user needs in personalized web search
- In: CIKM ’08: Proceeding of the 17th ACM conference on Information and knowledge mining
, 2008
"... Personalization has been deemed one of the major challenges in information retrieval with a significant potential for providing better search experience to individual users. Especially, the need for enhanced user models better capturing elements such as users ’ goals, tasks, and contexts has been id ..."
Abstract
-
Cited by 8 (0 self)
- Add to MetaCart
Personalization has been deemed one of the major challenges in information retrieval with a significant potential for providing better search experience to individual users. Especially, the need for enhanced user models better capturing elements such as users ’ goals, tasks, and contexts has been identified. In this paper, we introduce a statistical language model for user tasks representing different granularity levels of a user profile, ranging from very specific search goals to broad topics. We propose a personalization framework that selectively matches the actual user information need with relevant past user tasks, and allows to dynamically switch the course of personalization from re-finding very precise information to biasing results to general user interests. In the extreme, our model is able to detect when the user’s search and browse history is not appropriate for aiding the user in satisfying her current information quest. Instead of blindly applying personalization to all user queries, our approach refrains from undue actions in these cases, accounting for the user’s desire of discovering new topics, and changing interests over time. The effectiveness of our method is demonstrated by an empirical user study.
Using Coherence-based Measures to Predict Query Difficulty
"... Abstract. We investigate the potential of coherence-based scores to predict query difficulty. The coherence of a document set associated with each query word is used to capture the quality of a query topic aspect. A simple query coherence score, QC-1, is proposed that requires the average coherence ..."
Abstract
-
Cited by 7 (3 self)
- Add to MetaCart
Abstract. We investigate the potential of coherence-based scores to predict query difficulty. The coherence of a document set associated with each query word is used to capture the quality of a query topic aspect. A simple query coherence score, QC-1, is proposed that requires the average coherence contribution of individual query terms to be high. Two further query scores, QC-2 and QC-3, are developed by constraining QC-1 in order to capture the semantic similarity among query topic aspects. All three query coherence scores show the correlation with average precision necessary to make them good predictors of query difficulty. Simple and efficient, the measures require no training data and are competitive with language model-based clarity scores. 1
Towards Robust Query Expansion: Model Selection In The Language Modeling Framework
, 2007
"... We propose a language-model-based approach for addressing the performance robustness problem — with respect to free-parameters’ values — of pseudo-feedback-based queryexpansion methods. Given a query, we create a set of language models representing different forms of its expansion by varying the par ..."
Abstract
-
Cited by 6 (1 self)
- Add to MetaCart
We propose a language-model-based approach for addressing the performance robustness problem — with respect to free-parameters’ values — of pseudo-feedback-based queryexpansion methods. Given a query, we create a set of language models representing different forms of its expansion by varying the parameters ’ values of some expansion method; then, we select a single model using criteria originally proposed for evaluating the performance of using the original query, or for deciding whether to employ expansion at all. Experimental results show that these criteria are highly effective in selecting relevance language models that are not only significantly more effective than poor performing ones, but that also yield performance that is almost indistinguishable from that of manually optimized relevance models.

