Results 1 - 10 of 24
What Makes a Query Difficult?
2006 - Cited by 71 (6 self)
This work tries to answer the question of what makes a query difficult. It proposes a novel model that captures the main components of a topic and the relationship between those components and topic difficulty. The three components of a topic are the textual expression describing the information need (the query or queries), the set of documents relevant to the topic (the Qrels), and the entire collection of documents. We show experimentally that topic difficulty strongly depends on the distances between these components. In the absence of knowledge about one of the model components, the model is still useful by approximating the missing component based on the other components. We demonstrate the applicability of the difficulty model for several uses such as predicting query difficulty, predicting the number of topic aspects expected to be covered by the search results, and analyzing the findability of a specific domain.
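The abstract does not spell out the distance function between the components; one common way to realize such a distance is a divergence between term distributions, e.g. the Jensen-Shannon divergence between the Qrels' language model and the collection's. Below is a minimal Python sketch under that assumption; the function names and the example counts are hypothetical, not the paper's.

```python
import math
from collections import Counter

def _normalize(counts, vocab, eps=1e-9):
    # Smooth with a tiny epsilon so every vocabulary term has some mass.
    total = sum(counts.values()) + eps * len(vocab)
    return {w: (counts.get(w, 0) + eps) / total for w in vocab}

def _kl(p, q):
    return sum(pw * math.log(pw / q[w]) for w, pw in p.items())

def js_divergence(counts_a, counts_b):
    """Jensen-Shannon divergence between two term-count distributions."""
    vocab = set(counts_a) | set(counts_b)
    p = _normalize(counts_a, vocab)
    q = _normalize(counts_b, vocab)
    m = {w: (p[w] + q[w]) / 2 for w in vocab}
    return 0.5 * _kl(p, m) + 0.5 * _kl(q, m)

# Hypothetical signal: relevant documents whose term distribution looks
# like the whole collection (small distance) suggest a less focused,
# and plausibly harder, topic.
qrels = Counter({"jaguar": 5, "car": 3, "engine": 2})
collection = Counter({"car": 50, "engine": 40, "road": 30, "jaguar": 2})
print(js_divergence(qrels, collection))
```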
Improvements that don’t add up: Ad-hoc retrieval results since 1998
Proc. CIKM, 2009 - Cited by 45 (3 self)
The existence and use of standard test collections in information retrieval experimentation allows results to be compared between research groups and over time. Such comparisons, however, are rarely made. Most researchers only report results from their own experiments, a practice that allows lack of overall improvement to go unnoticed. In this paper, we analyze results achieved on the TREC Ad-Hoc, Web, Terabyte, and Robust collections as reported in SIGIR (1998–2008) and CIKM (2004–2008). Dozens of individual published experiments report effectiveness improvements, and often claim statistical significance. However, there is little evidence of improvement in ad-hoc retrieval technology over the past decade. Baselines are generally weak, often being below the median original TREC system. And in only a handful of experiments is the score of the best TREC automatic run exceeded. Given this finding, we question the value of achieving even a statistically significant result over a weak baseline. We propose that the community adopt a practice of regular longitudinal comparison to ensure measurable progress, or at least prevent the lack of it from going unnoticed. We describe an online database of retrieval runs that facilitates such a practice.
Querying the Web: A Multiontology Disambiguation Method
2006 - Cited by 40 (11 self)
The lack of explicit semantics in the current Web can lead to ambiguity problems: for example, current search engines return unwanted information because they do not take into account the exact meaning the user gives to the keywords. Though disambiguation is a very well-known problem in Natural Language Processing and other domains, traditional methods are not flexible enough to work in a Web-based context. In this paper we identify some desirable properties that a Web-oriented disambiguation method should fulfill, and make a proposal that meets them. The proposed method processes a set of related keywords in order to discover and extract their implicit semantics, obtaining their most suitable senses according to their context. The possible senses are extracted from the knowledge represented by a pool of ontologies available on the Web. The method applies an iterative disambiguation algorithm that uses a semantic relatedness measure based on Google frequencies. Our proposal makes the semantics of keywords explicit by means of ontology terms; this information can be used for different purposes, such as improving the search and retrieval of the underlying relevant information.
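The abstract does not give the relatedness formula itself. A well-known measure built from search-engine hit counts, in the same spirit, is the Normalized Google Distance of Cilibrasi and Vitányi, sketched below; the hit counts in the example are invented, and nothing here is taken from the paper.

```python
import math

def ngd(fx, fy, fxy, n):
    """Normalized Google Distance from page-hit counts.

    fx, fy -- hit counts for each term on its own
    fxy    -- hit count for a query containing both terms
    n      -- estimated total number of indexed pages
    Lower values indicate more closely related terms.
    """
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))

# Hypothetical counts for "apple" and "fruit".
print(ngd(fx=2.0e9, fy=8.0e8, fxy=3.0e8, n=5.0e10))
```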
Clustering and Diversifying Web Search Results with Graph-Based Word Sense Induction
"... Web search result clustering aims to facilitate information search on the Web. Rather than presenting the results of a query as a flat list, these are grouped on the basis of their similarity and subsequently shown to the user as a list of possibly labeled clusters. Each cluster is supposed to repre ..."
Abstract
-
Cited by 21 (8 self)
- Add to MetaCart
(Show Context)
Web search result clustering aims to facilitate information search on the Web. Rather than being presented as a flat list, the results of a query are grouped on the basis of their similarity and subsequently shown to the user as a list of possibly labeled clusters. Each cluster is supposed to represent a different meaning of the input query, thus accounting for language ambiguity (i.e., polysemy). However, Web clustering methods typically rely on some shallow notion of textual similarity between search result snippets. As a result, text snippets with no words in common tend to be clustered separately even if they share the same meaning, whereas snippets with words in common may be grouped together even if they refer to different meanings of the input query. In this paper, we present a novel approach to Web search result clustering based on the automatic discovery of word senses from raw text, a task referred to as Word Sense Induction (WSI). Key to our approach is to first acquire the senses (i.e., meanings) of an ambiguous query and then cluster the search results based on their semantic similarity to the induced word senses. Our experiments, conducted on datasets of ambiguous queries, show that our approach outperforms both Web clustering and search engines.
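One way to picture the clustering step the abstract describes: assign each snippet to the induced sense it is most similar to under a bag-of-words cosine. This is a minimal sketch, not the paper's implementation; the graph-based sense induction itself is not shown, and all names and example data are assumptions.

```python
import math
from collections import Counter

def cosine(a, b):
    dot = sum(a[w] * b.get(w, 0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster_by_sense(snippets, senses):
    """Group search-result snippets by their closest induced word sense.

    snippets -- list of token lists
    senses   -- {sense_label: list of words characterizing the sense}
    """
    clusters = {label: [] for label in senses}
    sense_bags = {label: Counter(words) for label, words in senses.items()}
    for tokens in snippets:
        bag = Counter(tokens)
        best = max(sense_bags, key=lambda s: cosine(bag, sense_bags[s]))
        clusters[best].append(" ".join(tokens))
    return clusters

# Hypothetical senses for the ambiguous query "jaguar".
senses = {"animal": ["cat", "wildlife", "habitat"],
          "car": ["car", "engine", "luxury"]}
snippets = [["jaguar", "luxury", "car", "review"],
            ["jaguar", "habitat", "wildlife", "fund"]]
print(cluster_by_sense(snippets, senses))
```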
Dragon Toolkit: Incorporating Auto-learned Semantic Knowledge into Large-Scale Text Retrieval and Mining
Proc. of the 19th IEEE International Conference on Tools with Artificial Intelligence (ICTAI), 2007 - Cited by 21 (3 self)
The majority of text retrieval and mining techniques are still based on exact feature (e.g. word) matching and are unable to incorporate text semantics. Many researchers believe that extension with semantic knowledge could improve the results, and various methods (most of them heuristic) have been proposed to account for concept hierarchy, synonymy, and other semantic relationships. However, the results of such semantic extensions have been mixed, ranging from slight improvements to decreases in effectiveness, most likely due to the lack of a formal framework. Instead, we propose a novel method that addresses semantic extension within the framework of language modeling. Our method extracts explicit topic signatures from documents and then statistically maps them onto single-word features. The incorporation of semantic knowledge then reduces to the smoothing of unigram language models using that knowledge. The Dragon Toolkit implements our method, and its effectiveness is demonstrated on three tasks: text retrieval, text classification, and text clustering.
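The reduction to smoothing can be written down concretely. A minimal sketch, assuming Jelinek-Mercer-style interpolation between the document's maximum-likelihood model and a semantic model induced from topic signatures; the function name, dictionary shapes, and interpolation weight are all assumptions, not the toolkit's API.

```python
def smoothed_unigram(word, doc_counts, topic_probs, word_given_topic, lam=0.7):
    """p(word | doc) interpolated with topic-signature knowledge.

    doc_counts       -- {word: count} for the document
    topic_probs      -- {topic: p(topic | doc)}, the extracted signatures
    word_given_topic -- {topic: {word: p(word | topic)}}, the learned
                        mapping of topic signatures onto single words
    lam              -- weight on the maximum-likelihood component
    """
    total = sum(doc_counts.values())
    p_ml = doc_counts.get(word, 0) / total if total else 0.0
    p_sem = sum(p_t * word_given_topic.get(t, {}).get(word, 0.0)
                for t, p_t in topic_probs.items())
    return lam * p_ml + (1 - lam) * p_sem
```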
Clustering Web Search Results with Maximum Spanning Trees
"... Abstract. We present a novel method for clustering Web search results based on Word Sense Induction. First, we acquire the meanings of a query by means of a graph-based clustering algorithm that calculates the maximum spanning tree of the co-occurrence graph of the query. Then we cluster the search ..."
Abstract
-
Cited by 7 (2 self)
- Add to MetaCart
We present a novel method for clustering Web search results based on Word Sense Induction. First, we acquire the meanings of a query by means of a graph-based clustering algorithm that calculates the maximum spanning tree of the co-occurrence graph of the query. Then we cluster the search results based on their semantic similarity to the induced word senses. We show that our approach improves on classical search result clustering methods in terms of both clustering quality and degree of diversification.
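The sense-acquisition step maps naturally onto a standard graph routine. A minimal sketch with networkx, assuming edge weights are co-occurrence strengths around the query; how the tree is subsequently split into senses is not shown, and the example edges are invented.

```python
import networkx as nx

def induce_sense_tree(cooccurrence_edges):
    """Maximum spanning tree of a query's word co-occurrence graph.

    cooccurrence_edges -- iterable of (word_a, word_b, weight) triples
    """
    graph = nx.Graph()
    graph.add_weighted_edges_from(cooccurrence_edges)
    return nx.maximum_spanning_tree(graph)

# Hypothetical co-occurrence weights around the query "beagle".
edges = [("dog", "puppy", 0.9), ("dog", "breed", 0.8),
         ("darwin", "voyage", 0.7), ("dog", "darwin", 0.1)]
print(sorted(induce_sense_tree(edges).edges(data="weight")))
```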
Polling the blogosphere: a rule-based approach to belief classification
In Proc. of the International Conference on Weblogs and Social Media (ICWSM), 2008 - Cited by 5 (1 self)
The research described here is part of a larger project with the objective of determining whether a writer believes a proposition to be true or false. This task requires a deep understanding of a proposition’s semantic context, which is far beyond NLP’s state of the art. In light of this difficulty, this paper presents a shallow semantic framework that addresses the sub-problem of finding a proposition’s truth-value at the sentence level. The framework consists of several classes of linguistic elements that, when linked to a proposition through specific lexico-syntactic connectors, change its truth-value. A pilot evaluation of a system implementing this framework yields promising results.
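The abstract names the mechanism (connectors that change a proposition's truth-value) without listing the rules. The toy sketch below illustrates the idea with two invented connector classes; the real framework's classes and lexicons are not given in the abstract.

```python
# Invented connector lexicons, for illustration only.
NEGATORS = {"not", "never", "denied", "false"}
HEDGES = {"doubt", "doubts", "allegedly", "supposedly"}

def truth_value(tokens, prop_start):
    """Shallow truth-value of the proposition starting at prop_start.

    Scans the words that precede the proposition and flips or weakens
    the default value 'true' whenever a connector is found.
    """
    value = "true"
    for word in tokens[:prop_start]:
        w = word.lower().strip(".,")
        if w in NEGATORS:
            value = "false" if value == "true" else "true"
        elif w in HEDGES:
            value = "uncertain"
    return value

print(truth_value("He denied that the vote was rigged".split(), 2))  # false
```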
IXA at CLEF 2008 Robust-WSD Task: Using Word Sense Disambiguation for (Cross Lingual) Information Retrieval
2009 - Cited by 3 (3 self)
This paper describes the participation of the IXA NLP group in the CLEF 2008 Robust-WSD Task. This was our first time at CLEF, and we participated in both the monolingual (English) and the bilingual (Spanish to English) subtasks. We tried several query and document expansion and translation strategies, with and without the word sense disambiguation results provided by the organizers. All expansions and translations were done using the English and Spanish wordnets as provided by the organizers; no other resource was used. We used Indri as the search engine, which we tuned on the training data. Our main goal was to improve (Cross Lingual) Information Retrieval results using WSD information, and we attained improvements in both the monolingual and bilingual subtasks, although the improvement was only significant for the bilingual subtask. Our best systems also ranked 4th and 3rd overall in the monolingual and bilingual subtasks, respectively.
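As a rough illustration of the expansion step, the sketch below adds WordNet synonyms to query terms using NLTK. This is a simplified stand-in under stated assumptions: the paper used the wordnets distributed by the task organizers together with the Indri engine, not NLTK.

```python
from nltk.corpus import wordnet as wn  # requires the NLTK WordNet data

def expand_query(terms, max_synonyms=3):
    """Return the query terms plus a few WordNet synonyms for each."""
    expanded = list(terms)
    for term in terms:
        synonyms = set()
        for synset in wn.synsets(term):
            for lemma in synset.lemma_names():
                if lemma.lower() != term.lower():
                    synonyms.add(lemma.replace("_", " "))
        expanded.extend(sorted(synonyms)[:max_synonyms])
    return expanded

print(expand_query(["bank", "loan"]))
```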
A combined similarity measure for determining similarity of model-based and descriptive requirements
In Proc. of the Artificial Intelligence Techniques in Software Engineering Workshop (AISEW) at ECAI, 2008 - Cited by 3 (1 self)
RSL is a requirements specification language that was developed in the ReDSeeDS project. The language allows requirements to be specified using a meta-model that defines both model-based and descriptive representations. Model-based representations provide activity and sequence diagrams similar to UML, while descriptive representations offer scenarios in constrained language and sentence lists. In this paper we tackle the problem of determining the similarity of requirements that use both types of representation. We argue that in this case a combination of similarity measures is needed. To support this claim, we assess similarity measures from different research areas with respect to their suitability for comparing requirements specifications written in RSL, and suggest a concept for a combined similarity measure.
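The combined measure itself is not specified in the abstract; the simplest concept that fits the description is a weighted average of heterogeneous measures, sketched below. Every name here is an assumption, including the two stand-in measures.

```python
def combined_similarity(req_a, req_b, measures, weights):
    """Weighted combination of similarity measures in [0, 1].

    measures -- functions mapping (req_a, req_b) to a score in [0, 1],
                e.g. a lexical measure over scenario sentences and a
                structural one over activity diagrams
    weights  -- one non-negative weight per measure
    """
    total = sum(weights)
    return sum(w * m(req_a, req_b) for m, w in zip(measures, weights)) / total

# Hypothetical usage with two stand-in measures.
lexical = lambda a, b: 0.8
structural = lambda a, b: 0.4
print(combined_similarity("req1", "req2", [lexical, structural], [2, 1]))
```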
Elhuyar-IXA: Semantic Relatedness and Cross-Lingual Passage Retrieval
CLEF 2009 Workshop, Part I, LNCS
"... retrieval ..."
(Show Context)