Selecting Good Expansion Terms for Pseudo-Relevance Feedback
Cited by 21 (4 self)
Pseudo-relevance feedback assumes that the most frequent terms in the pseudo-feedback documents are useful for retrieval. In this study, we re-examine this assumption and show that it does not hold in reality: many expansion terms identified by traditional approaches are in fact unrelated to the query and harmful to retrieval. We also show that good expansion terms cannot be distinguished from bad ones merely by their distributions in the feedback documents and in the whole collection. We then propose to integrate a term classification process to predict the usefulness of expansion terms; multiple additional features can be integrated into this process. Our experiments on three TREC collections show that retrieval effectiveness can be much improved when term classification is used. In addition, we demonstrate that good terms should be identified directly according to their possible impact on retrieval effectiveness, i.e., using supervised rather than unsupervised learning.
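As a rough illustration of the assumption being questioned, the sketch below contrasts traditional frequency-based selection of expansion terms with a classification filter applied on top of it. It is a minimal sketch, not the authors' method: `frequent_terms`, `classified_terms`, and the `is_good` predicate are hypothetical names, and in the paper's setting `is_good` would be a supervised classifier trained on features of each candidate term rather than the toy lambda used here.

```python
from collections import Counter
from typing import Callable, Iterable, List, Set


def frequent_terms(feedback_docs: Iterable[List[str]], query: Set[str], n: int) -> List[str]:
    """Traditional PRF selection: the n most frequent non-query terms in the feedback docs."""
    counts: Counter = Counter()
    for doc in feedback_docs:
        counts.update(t for t in doc if t not in query)
    # Ties are broken by first occurrence (Counter.most_common preserves insertion order).
    return [term for term, _ in counts.most_common(n)]


def classified_terms(candidates: List[str], is_good: Callable[[str], bool]) -> List[str]:
    """Keep only the candidates that a (hypothetical) trained classifier labels as useful."""
    return [t for t in candidates if is_good(t)]


if __name__ == "__main__":
    docs = [
        ["jaguar", "car", "speed", "the"],
        ["jaguar", "car", "price", "the"],
        ["jaguar", "cat", "the"],
    ]
    candidates = frequent_terms(docs, {"jaguar"}, 2)
    print(candidates)  # ['the', 'car']
    # Toy stand-in for a supervised term classifier: reject the function word "the".
    print(classified_terms(candidates, lambda t: t != "the"))  # ['car']
```

In this toy run the single most frequent candidate is "the", exactly the kind of unrelated expansion term the abstract warns about; the filter step is where a learned usefulness prediction would replace the raw frequency ranking.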
Clickthrough-Based Translation Models for Web Search: from Word Models to Phrase Models
Cited by 7 (4 self)
Web search is challenging partly because search queries and Web documents use different language styles and vocabularies. This paper provides a quantitative analysis of this language discrepancy and explores the use of clickthrough data to bridge documents and queries. We assume that a query is parallel to the titles of the documents clicked on for that query. Two translation models are trained and integrated into retrieval models: a word-based translation model that learns the translation probability between single words, and a phrase-based translation model that learns the translation probability between multi-term phrases. Experiments are carried out on a real-world data set. The results show that retrieval systems that use the translation models significantly outperform systems that do not. The paper also demonstrates that standard statistical machine translation techniques, such as word alignment, bilingual phrase extraction, and phrase-based decoding, can be adapted to build a better Web document retrieval system.
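The word-based translation model can be pictured with IBM Model 1, a standard word-alignment model trained by expectation-maximization (EM). The sketch below treats each (query, clicked title) pair as a parallel sentence pair and estimates t(q | w), the probability that title word w translates to query word q. This illustration rests on our own assumptions rather than the paper's training setup: it omits the NULL word of the full model, starts from a uniform distribution, and `train_model1` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[List[str], List[str]]  # (query words, clicked-title words)


def train_model1(pairs: List[Pair], iterations: int = 10) -> Dict[Tuple[str, str], float]:
    """Estimate t(q | w) with IBM Model 1 EM (no NULL word)."""
    pairs = [(q, w) for q, w in pairs if q and w]  # skip empty queries or titles
    q_vocab = {q for query, _ in pairs for q in query}
    # Uniform initialisation over the query vocabulary.
    t = defaultdict(lambda: 1.0 / len(q_vocab))
    for _ in range(iterations):
        count: Dict[Tuple[str, str], float] = defaultdict(float)  # expected c(q, w)
        total: Dict[str, float] = defaultdict(float)              # expected c(w)
        # E-step: distribute each query word over the title words that could generate it.
        for query, title in pairs:
            for q in query:
                norm = sum(t[(q, w)] for w in title)
                for w in title:
                    delta = t[(q, w)] / norm
                    count[(q, w)] += delta
                    total[w] += delta
        # M-step: renormalise so that sum_q t(q | w) = 1 for every title word w.
        t = defaultdict(float, {(q, w): c / total[w] for (q, w), c in count.items()})
    return dict(t)


if __name__ == "__main__":
    pairs = [(["cheap", "flights"], ["cheap", "airfare"]), (["flights"], ["airfare"])]
    model = train_model1(pairs, iterations=5)
    print(round(model[("flights", "airfare")], 3), round(model[("cheap", "airfare")], 3))
```

On the two toy pairs, EM shifts probability mass so that "airfare" increasingly translates to "flights", because the second pair offers no other explanation for that query word; a phrase-based model would extend the same idea to multi-term units.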
Web directories as topical context
- In Proceedings of the 9th Dutch-Belgian Workshop on Information Retrieval (DIR), 2009
Cited by 3 (3 self)
In this paper we explore whether the Open Directory (DMOZ) can be used to classify queries into topical categories at different levels, and whether this topical context can improve retrieval performance. We set up a user study in which participants explicitly classify queries into topical categories, chosen either freely from DMOZ or from a list of suggestions created by several automatic topic categorization techniques. The results of this user study show that DMOZ categories are suitable for topic categorization, and that either free search or evaluation of a list of suggestions can be used to elicit the topical context. Free search leads to more specific topic categories than the list of suggestions. Participants show moderate agreement between their individual judgments, but broadly agree on the top levels of the chosen categories. Using the topic categories selected by free search as topical context leads to significant improvements over the baseline retrieval results. The more general categories selected from the suggestion list, and the top-level categories, do not lead to significant improvements.
Information theory
The main obstacle to providing focused search is the relative opaqueness of search requests: searchers tend to express their complex information needs in only a couple of keywords. Our overall aim is to find out if, and how, topic-based language models can lead to more effective web information retrieval. In this paper we explore the retrieval performance of a topic-based model that combines topical models with other language models based on cross-entropy. We first define our topical categories and train our topical models on the .GOV2 corpus by building parsimonious language models. We then test the topic-based model on the TREC8 small Web data collection for ad-hoc search. Our experimental results show that the topic-based model outperforms both the standard language model and the parsimonious model.
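One way to read "combines topical models with other language models based on cross-entropy" is to rank documents by the negative cross-entropy between the query model and a linear mixture of document, topic, and collection models. The sketch below assumes that mixture form and the fixed weights in `lambdas`; both are illustrative assumptions, as is the name `neg_cross_entropy`. The parsimonious estimation of the topic model is not shown: a plain dictionary of probabilities stands in for it.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

LanguageModel = Dict[str, float]


def mle(tokens: List[str]) -> LanguageModel:
    """Maximum-likelihood unigram model of a token list."""
    if not tokens:
        return {}
    counts = Counter(tokens)
    total = len(tokens)
    return {w: c / total for w, c in counts.items()}


def neg_cross_entropy(
    query: List[str],
    doc: List[str],
    topic: LanguageModel,
    collection: LanguageModel,
    lambdas: Tuple[float, float, float] = (0.6, 0.2, 0.2),  # assumed weights, summing to 1
    floor: float = 1e-12,  # guards log(0) for words unseen in every model
) -> float:
    """Score = sum_w P(w|q) * log(l_d P(w|d) + l_t P(w|topic) + l_c P(w|C)); higher is better."""
    q_model = mle(query)
    d_model = mle(doc)
    lam_d, lam_t, lam_c = lambdas
    score = 0.0
    for w, p_q in q_model.items():
        p = (lam_d * d_model.get(w, 0.0)
             + lam_t * topic.get(w, 0.0)
             + lam_c * collection.get(w, 0.0))
        score += p_q * math.log(max(p, floor))
    return score


if __name__ == "__main__":
    topic = {"tax": 0.3, "irs": 0.3, "form": 0.1}  # stand-in topical model
    collection = {"tax": 0.01, "form": 0.02, "music": 0.01, "irs": 0.005}
    query = ["tax", "form"]
    for doc in (["tax", "form", "irs"], ["music", "form"]):
        print(doc, round(neg_cross_entropy(query, doc, topic, collection), 3))
```

The topic component lets a document that matches the query's topical vocabulary score well even where its own maximum-likelihood estimates are sparse, which is the intuition behind adding topical models to the mixture.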
Finding Entities or Information Using Annotations
User-generated content often provides more information than just textual data: tags or annotations are added to label data, users can comment on the data in discussions, and links exist not only between pages but also between users and annotations. In this paper we explore the use of annotations in the form of categories to find entities or information in Wikipedia. The main differences between entity ranking and ad hoc retrieval are the assessment criteria and the provision of target categories for entity ranking topics. From analyzing the relevance assessment sets we can see that entity ranking ...
General
We describe a series of experiments conducted for our participation in the Relevance Feedback Track. For the phase 1 task we evaluate two traditional weighting models that are widely used in text retrieval, BM25 and DFR. For the phase 2 task we evaluate a statistics-based feedback model and our proposed feedback model. We are currently waiting for the track overview paper to carry out further analyses.
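For reference, the BM25 weighting model named above scores a document by summing, over the query terms, an inverse document frequency weight times a saturated, length-normalised term frequency. The sketch below uses the common defaults k1 = 1.2 and b = 0.75 and the non-negative IDF variant ln((N - n + 0.5)/(n + 0.5) + 1); these choices are assumptions, not the settings reported in the track paper, `bm25_score` is a hypothetical name, and DFR is not shown.

```python
import math
from collections import Counter
from typing import List


def bm25_score(query: List[str], doc: List[str], corpus: List[List[str]],
               k1: float = 1.2, b: float = 0.75) -> float:
    """Okapi BM25 score of `doc` for `query`, with statistics taken from `corpus`."""
    n_docs = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n_docs
    tf = Counter(doc)
    score = 0.0
    for term in query:  # each query occurrence contributes once
        df = sum(1 for d in corpus if term in d)
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)  # non-negative IDF variant
        f = tf[term]
        # Term-frequency saturation (k1) and document-length normalisation (b).
        score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avgdl))
    return score


if __name__ == "__main__":
    corpus = [["relevance", "feedback", "track"], ["bm25", "weighting", "model"], ["dfr", "weighting"]]
    for d in corpus:
        print(d, round(bm25_score(["weighting", "model"], d, corpus), 3))
```

A document without any query term scores exactly zero, and a rarer term ("model" here) contributes more than a common one ("weighting"), which is the behaviour the IDF factor is meant to produce.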
Explicit Extraction of Topical Context
This article studies one of the main bottlenecks in providing more effective information access: the poverty at the query end. We explore whether users can classify keyword queries into categories from the DMOZ directory at different levels, and whether this topical context can help retrieval performance. We conducted a user study in which participants classify queries into DMOZ categories, either by freely searching the directory or by selecting from a list of suggestions. Results of the study show that DMOZ categories are suitable for topic categorization. Both free search and list selection can be used to elicit topical context. Free search leads to more specific categories than list selection. Participants in our study show moderate agreement on the categories they select, but broad agreement on the higher levels of the chosen categories. The free search categories significantly improve retrieval effectiveness. The more general list selection categories and the top-level categories do not lead to significant improvements. Combining topical context with blind relevance feedback leads to better results than applying either of them separately. We conclude that DMOZ is a suitable resource for interacting with users on topical categories applicable to their query, and can lead to better search results.
A Study of Ontology-based Query Expansion
With enormous amounts of data emerging on the Web, traditional keyword search is challenged by the short queries users pose to vaguely describe their information needs. Query expansion has been researched for decades, and a variety of expansion strategies have improved retrieval effectiveness. At present, knowledge-based query expansion approaches are popular as the Web becomes more semantic. This paper studies the state of the art in ontology-based query expansion and expands on practical strategies to exploit the rich semantics of domain ontologies. On the one hand, it focuses on finding the success factors for ontology-based query expansion; on the other hand, it emphasizes the tradeoff between the gained retrieval effectiveness and the incurred computation cost.
Mining a Data Reasoning Model for Personalized Text Classification
- Feature Article by Luepol Pipanmaekaporn and Yuefeng Li
It is a big challenge to acquire correct user profiles for personalized text classification, since users may be unsure when describing their interests. Traditional approaches to user profiling adopt machine learning (ML) to automatically discover classification knowledge from explicit user feedback describing personal interests. However, in many cases the accuracy of ML-based methods cannot be significantly improved, due to the term independence assumption and the uncertainties associated with them. This paper presents a novel relevance feedback approach for personalized text classification. It applies data mining to discover knowledge from relevant and non-relevant text, and constrains the specific knowledge with reasoning rules to eliminate conflicting information. We also develop a Dempster-Shafer (DS) approach as the means to utilise the specific knowledge to build high-quality data models for classification. Experimental results on Reuters Corpus Volume 1 and TREC topics show that the proposed technique achieves encouraging performance compared with state-of-the-art relevance feedback models.
Index Terms: Personalized Text Classification, User Profiles, Relevance Feedback, Reasoning Model, Data Mining
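The Dempster-Shafer approach rests on Dempster's rule of combination, which fuses two mass functions over subsets of a frame of discernment and renormalises away the mass that falls on conflicting (empty) intersections. The generic rule is sketched below, assuming masses are given as dictionaries keyed by `frozenset`s; how the paper derives masses from mined knowledge is not shown, and `dempster_combine` is a hypothetical name.

```python
from itertools import product
from typing import Dict, FrozenSet

Mass = Dict[FrozenSet[str], float]  # focal element -> mass; masses sum to 1


def dempster_combine(m1: Mass, m2: Mass) -> Mass:
    """Combine two independent mass functions with Dempster's rule."""
    combined: Dict[FrozenSet[str], float] = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass assigned to the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources cannot be combined")
    # Normalise by 1 - K, where K is the total conflicting mass.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


if __name__ == "__main__":
    R, N = frozenset({"relevant"}), frozenset({"non-relevant"})
    theta = R | N  # the whole frame: "don't know"
    m1 = {R: 0.6, theta: 0.4}
    m2 = {R: 0.5, N: 0.3, theta: 0.2}
    for focal, mass in dempster_combine(m1, m2).items():
        print(sorted(focal), round(mass, 4))
```

With these two toy sources, 0.18 of the joint mass is conflicting (one says "relevant", the other "non-relevant"), and after renormalisation "relevant" receives 0.62 / 0.82, roughly 0.756.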
Contextual Query Classification For Personalizing Informational Search
- 2009
It is widely assumed that exploiting multiple sources of evidence in information retrieval improves search accuracy. For instance, personalized information retrieval exploits evidence from user profile elements such as interests, preferences and tasks in the retrieval process in order to better fit the user's expectations. Recent studies exploit another source of evidence, the user intent behind the query, classified as informational, navigational or transactional, in order to carry out a specific search. However, these strategies focus solely on exploiting document features to adjust the relevance estimation according to the user intent category. In this paper, we show how to incorporate both user intent and user profile evidence in the same framework for personalizing informational search. Our framework includes user intent prediction based on a contextual query classification method, and user profiling based on modeling the user's interests. Query classification relies on both query features and the category of the current query session, called the query profile. Personalized informational search then emphasizes the user profile in a personalized document ranking. Preliminary experiments carried out on a TREC data collection show that our approach is promising.

