Cluster-based retrieval using language models
In Proceedings of SIGIR, 2004
Cited by 90 (6 self)
Previous research on cluster-based retrieval has been inconclusive as to whether it does bring improved retrieval effectiveness over document-based retrieval. Recent developments in the language modeling approach to IR have motivated us to re-examine this problem within this new retrieval framework. We propose two new models for cluster-based retrieval and evaluate them on several TREC collections. We show that cluster-based retrieval can perform consistently across collections of realistic size, and significant improvements over document-based retrieval can be obtained in a fully automatic manner and without relevance information provided by humans.
Pagerank without hyperlinks: structural re-ranking using links induced by language models
In Proceedings of SIGIR, 2005
Cited by 66 (10 self)
Inspired by the PageRank and HITS (hubs and authorities) algorithms for Web search, we propose a structural re-ranking approach to ad hoc information retrieval: we reorder the documents in an initially retrieved set by exploiting asymmetric relationships between them. Specifically, we consider generation links, which indicate that the language model induced from one document assigns high probability to the text of another; in doing so, we take care to prevent bias against long documents. We study a number of re-ranking criteria based on measures of centrality in the graphs formed by generation links, and show that integrating centrality into standard language-model-based retrieval is quite effective at improving precision at top ranks.
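A minimal Python sketch of the general idea described in this abstract, not the authors' implementation: each document links to the k other documents whose language models best generate its text, PageRank over those links yields a centrality score, and the final score multiplies centrality by query likelihood. The Dirichlet smoothing, the add-one collection model, the per-token averaging used here to avoid bias against long documents, the link direction, and the names `generation_links`, `pagerank`, `rerank`, `k`, and `mu` are all assumptions of this sketch.

```python
import math
from collections import Counter


def avg_logprob(text, doc, coll, coll_len, mu=1000.0):
    """Average per-token log-probability of `text` under a Dirichlet-smoothed
    language model induced from `doc`. Averaging instead of summing is an
    assumed way to keep long target texts from being penalized."""
    tf = Counter(doc)
    vocab = len(coll) + 1
    total = 0.0
    for w in text:
        # Add-one collection model (assumption) so unseen terms never give log(0).
        p_coll = (coll[w] + 1) / (coll_len + vocab)
        total += math.log((tf[w] + mu * p_coll) / (len(doc) + mu))
    return total / len(text)


def generation_links(docs, k=2, mu=1000.0):
    """Hypothetical helper: link each document o to the k other documents
    whose language models assign the highest normalized probability to o."""
    coll = Counter(w for d in docs for w in d)
    coll_len = sum(coll.values())
    links = []
    for o, text in enumerate(docs):
        scored = sorted(
            ((avg_logprob(text, docs[g], coll, coll_len, mu), g)
             for g in range(len(docs)) if g != o),
            reverse=True,
        )
        links.append([g for _, g in scored[:k]])
    return links


def pagerank(links, damping=0.85, iters=50):
    """Plain power-iteration PageRank over an adjacency list."""
    n = len(links)
    rank = [1.0 / n] * n
    for _ in range(iters):
        new = [(1.0 - damping) / n] * n
        for src, targets in enumerate(links):
            if targets:
                share = damping * rank[src] / len(targets)
                for t in targets:
                    new[t] += share
            else:  # dangling node: spread its mass uniformly
                for t in range(n):
                    new[t] += damping * rank[src] / n
        rank = new
    return rank


def rerank(query, docs, k=2, mu=1000.0):
    """Score each document by centrality * query likelihood; return indices."""
    coll = Counter(w for d in docs for w in d)
    coll_len = sum(coll.values())
    cen = pagerank(generation_links(docs, k, mu))
    scores = [
        # len(query) * average log-prob = total log p_d(query)
        cen[i] * math.exp(len(query) * avg_logprob(query, d, coll, coll_len, mu))
        for i, d in enumerate(docs)
    ]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

Documents are assumed to be pre-tokenized lists of terms, for example `rerank(["language", "model"], docs, k=2, mu=10.0)`. Because PageRank rewards in-links, a document that many others "select" as a good generator of their text gains centrality.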
Respect My Authority! HITS Without Hyperlinks, Utilizing Cluster-Based Language Models
2006
Cited by 33 (9 self)
We present an approach to improving the precision of an initial document ranking wherein we utilize cluster information within a graph-based framework. The main idea is to perform re-ranking based on centrality within bipartite graphs of documents (on one side) and clusters (on the other side), on the premise that these are mutually reinforcing entities. Links between entities are created via consideration of language models induced from them. We find that our cluster-document graphs give rise to much better retrieval performance than previously proposed document-only graphs do. For example, authority-based re-ranking of documents via a HITS-style cluster-based approach outperforms a previously proposed PageRank-inspired algorithm applied to document-only graphs. Moreover, we also show that computing authority scores for clusters constitutes an effective method for identifying clusters containing a large percentage of relevant documents.
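To illustrate the HITS-style mutual reinforcement described in this abstract, here is a minimal Python sketch of HITS on a bipartite cluster-document graph. It is not the authors' method: the edge weights are simply given as input (in the paper they come from language models, which this sketch does not reproduce), and the function name `hits_bipartite` and the L2 normalization are assumptions. Clusters act as hubs and documents as authorities; passing the transposed matrix instead scores clusters as authorities.

```python
import math


def hits_bipartite(weights, iters=50):
    """HITS on a bipartite graph (hypothetical helper).

    weights[c][d] >= 0 is the assumed strength of the edge from cluster c
    (hub side) to document d (authority side). Returns (hub, auth), each
    L2-normalized."""
    n_c, n_d = len(weights), len(weights[0])
    hub = [1.0] * n_c
    auth = [0.0] * n_d
    for _ in range(iters):
        # Authority of a document: weighted sum of the hub scores pointing at it.
        auth = [sum(weights[c][d] * hub[c] for c in range(n_c)) for d in range(n_d)]
        norm = math.sqrt(sum(a * a for a in auth)) or 1.0
        auth = [a / norm for a in auth]
        # Hub score of a cluster: weighted sum of the authorities it points to.
        hub = [sum(weights[c][d] * auth[d] for d in range(n_d)) for c in range(n_c)]
        norm = math.sqrt(sum(h * h for h in hub)) or 1.0
        hub = [h / norm for h in hub]
    return hub, auth
```

For example, with `hits_bipartite([[1, 1, 0], [0, 1, 1]])` (two clusters, three documents), document 1, which belongs to both clusters, receives the highest authority score, reflecting the "mutually reinforcing" premise.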
Using Part-of-speech Patterns to Reduce Query Ambiguity
Cited by 21 (1 self)
Query ambiguity is a generally recognized problem, particularly in Web environments where queries are commonly only one or two words in length. In this study, we explore one technique that finds commonly occurring patterns of parts of speech near a one-word query and allows them to be transformed into clarification questions. We use a technique derived from statistical language modeling to show that the clarification queries will reduce ambiguity much of the time, and often quite substantially.
Detecting and Browsing Events in Unstructured Text
2002
Cited by 15 (1 self)
Previews and overviews of large, heterogeneous information resources help users comprehend the scope of collections and focus on particular subsets of interest. For narrative documents, questions of "what happened? where? and when?" are natural points of entry. Building on our earlier work at the Perseus Project with detecting terms, place names, and dates, we have exploited co-occurrences of dates and place names to detect and describe likely events in document collections. We compare statistical measures for determining the relative significance of various events. We have built interfaces that help users preview likely regions of interest for a given range of space and time by plotting the distribution and relevance of various collocations. Users can also control the amount of collocation information in each view. Once particular collocations are selected, the system can identify key phrases associated with each possible event to organize browsing of the documents themselves.
A Search Result Clustering Method using Informatively Named Entities
In Proceedings of the 7th ACM International Workshop on Web Information and Data Management (WIDM), 2005
Cited by 12 (1 self)
Clustering the results of a search helps the user get an overview of the information returned. In this paper, we regard the clustering task as indexing the search results. Here, an index means a structured label list that makes it easier for the user to comprehend the labels and search results. To realize this goal, we make three proposals. The first is to use Named Entity (NE) extraction for term extraction. The second is a new label selection criterion based on importance in the search result and the relation between terms and search queries. The third is label categorization using category information of labels, which is generated by NE extraction. We implement a prototype system based on these proposals and find that it offers much higher performance than existing methods; we focus on news articles in this paper.
Interactive information retrieval using clustering and spatial proximity
User Modeling and User-Adapted Interaction, 2004
Cited by 11 (3 self)
A web-based search engine responds to a user’s query with a list of documents. This list can be viewed as the engine’s model of the user’s idea of relevance: the engine ‘believes’ that the first document is the most likely to be relevant, the second is slightly less likely, and so on. We extend this idea to an interactive setting where the system accepts the user’s feedback and adjusts its relevance model. We develop three specific models that are integrated as part of a system we call Lighthouse. The models incorporate document clustering and a spring-embedding visualization of inter-document similarity. We show that if a searcher were to use Lighthouse in ways consistent with the model, the expected effectiveness improves, i.e., the relevant documents are found more quickly in comparison to existing methods.
The Opposite of Smoothing: A Language Model Approach to Ranking Query-Specific Document Clusters
2008
Cited by 10 (5 self)
Exploiting information induced from (query-specific) clustering of top-retrieved documents has long been proposed as a means for improving precision at the very top ranks of the returned results. We present a novel language model approach to ranking query-specific clusters by the presumed percentage of relevant documents that they contain. While most previous cluster ranking approaches focus on the cluster as a whole, our model also exploits information induced from documents associated with the cluster. Our model substantially outperforms previous approaches for identifying clusters containing a high relevant-document percentage. Furthermore, using the model to produce a document ranking yields precision-at-top-ranks performance that is consistently better than that of the initial ranking upon which clustering is performed; the performance also compares favorably with that of a state-of-the-art pseudo-feedback retrieval method.
Organizing literature information for clinical decision support
In Proceedings of the 11th World Congress on Medical Informatics (MEDINFO 2004), Imaging Informatics, September 7-11, 2004
Cited by 8 (5 self)
Answers to clinical questions arising during healthcare practitioner/patient interaction can often be found in the National Library of Medicine’s (NLM) databases. Recent advances in wireless handheld computers promise to make them a widely used tool for delivering needed information to the practitioner at the point of service. This paper addresses challenges in organizing and presenting information obtained from NLM’s MEDLINE® database of indexed citations in a way that will help practitioners reduce literature search time on handheld computers. We study two clustering algorithms and two methods of labeling document clusters. Keywords: Handheld Computer; Computer-Assisted Decision Making.

