Towards Recency Ranking in Web Search
Cited by 11 (7 self)
In web search, recency ranking refers to ranking documents by relevance in a way that takes freshness into account. In this paper, we propose a retrieval system that automatically detects and responds to recency-sensitive queries. The system detects recency-sensitive queries using a high-precision classifier and responds to them with a machine-learned ranking model trained for such queries. We use multiple recency features that provide temporal evidence to represent document recency effectively. Furthermore, we propose several methodologies that are important for training recency-sensitive rankers. Finally, we develop new evaluation metrics for recency-sensitive queries. Our experiments demonstrate the efficacy of the proposed approaches.
Mining multifaceted overviews of arbitrary topics in a text collection
In Proc. SIGKDD’08, 2008
Cited by 8 (0 self)
A common task in many text mining applications is to generate a multi-faceted overview of a topic in a text collection. Such an overview not only serves directly as an informative summary of the topic but also provides a detailed view for navigating to different facets of the topic. Existing work has cast this problem as a categorization problem that requires training examples for each facet. This has three limitations: (1) all facets are predefined, which may not fit the needs of a particular user; (2) training examples for each facet are often unavailable; (3) such an approach only works for predefined types of topics. In this paper, we break these limitations and study a more realistic new setup of the problem, in which a user can flexibly describe each facet with keywords for an arbitrary topic, and we attempt to mine a multi-faceted overview in an unsupervised way. We take a probabilistic approach to solve this problem. Empirical experiments on different genres of text data show that our approach can effectively generate multi-faceted overviews for arbitrary topics; the generated overviews are comparable with those generated by supervised methods with training examples, and they are more informative than unstructured flat summaries. The method is general and can thus be applied to multiple text mining tasks in different application domains.
Entropy-biased Models for Query Representation On The Click Graph
2009
Cited by 7 (0 self)
Query log analysis has received substantial attention in recent years, and the click graph is an important technique for describing the relationship between queries and URLs. State-of-the-art approaches that model the click graph with raw click frequencies, however, do not eliminate noise, nor do they handle heterogeneous query-URL pairs well. In this paper, we investigate and develop a novel entropy-biased framework for modeling click graphs. The intuition behind this model is that different query-URL pairs should be treated differently: common clicks on less frequent but more specific URLs are of greater value than common clicks on frequent and general URLs. Based on this intuition, we utilize the entropy information of the URLs and introduce a new concept, the inverse query frequency (IQF), to weigh the importance (discriminative ability) of a click on a certain URL. The IQF weighting scheme has never been explicitly explored or statistically examined for any bipartite graph in the information retrieval literature. We not only formally define and quantify this scheme but also incorporate it with the click frequency and user frequency information on the click graph for an effective query representation. To illustrate our methodology, we conduct experiments with the AOL query log data for query similarity analysis and query suggestion tasks. Experimental results demonstrate that our entropy-biased models yield considerable improvements in performance.
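The abstract does not give the IQF formula. As an illustration only, here is a minimal sketch assuming an IDF-style form, iqf(u) = log(|Q| / n(u)), where |Q| is the number of distinct queries and n(u) is the number of distinct queries with clicks on URL u. The function names and the click-count-times-IQF query vector are hypothetical and are not the authors' exact entropy-biased definition.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# One (query, url, click_count) triple per query-URL edge of the click graph.
Click = Tuple[str, str, int]


def inverse_query_frequency(clicks: Iterable[Click]) -> Dict[str, float]:
    """IDF-style IQF weight per URL (assumed form, not the paper's exact one):
    iqf(url) = log(|Q| / n(url)), where |Q| is the number of distinct queries
    and n(url) is the number of distinct queries with clicks on url."""
    queries_per_url = defaultdict(set)
    all_queries = set()
    for query, url, count in clicks:
        if count > 0:
            queries_per_url[url].add(query)
            all_queries.add(query)
    total = len(all_queries)
    return {url: math.log(total / len(qs)) for url, qs in queries_per_url.items()}


def query_vector(clicks: Iterable[Click], query: str,
                 iqf: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical query representation: each clicked URL is weighted by
    click_count * iqf(url), so clicks on specific URLs count for more."""
    return {url: count * iqf[url]
            for q, url, count in clicks if q == query and count > 0}
```

With clicks `[("jaguar", "wikipedia.org", 5), ("cars", "wikipedia.org", 3), ("python", "wikipedia.org", 7), ("jaguar", "jaguar.com", 2), ("jaguar car", "jaguar.com", 4)]`, the general URL wikipedia.org, clicked from three of the four queries, gets log(4/3) ≈ 0.29, while the more specific jaguar.com, clicked from two, gets log 2 ≈ 0.69, matching the intuition that clicks on specific URLs are more discriminative.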
Time is of the essence: improving recency ranking using twitter data
In WWW, 2010
Cited by 7 (3 self)
Realtime web search refers to the retrieval of very fresh content, which is in high demand. An effective portal web search engine must support a variety of search needs, including realtime web search. However, supporting realtime web search introduces two challenges not encountered in non-realtime web search: quickly crawling relevant content and ranking documents with impoverished link and click information. In this paper, we advocate the use of realtime micro-blogging data to address both of these problems. We propose a method that uses the micro-blogging data stream to detect fresh URLs. We also use micro-blogging data to compute novel and effective features for ranking fresh URLs. We demonstrate that these methods improve the effectiveness of the portal web search engine for realtime web search.
Learning Latent Semantic Relations from Clickthrough Data for Query Suggestion
2008
Cited by 5 (1 self)
For a given query issued by a specific user, query suggestion aims to recommend relevant queries that potentially suit the information needs of that user. Due to the complexity of the Web structure and the ambiguity of users’ inputs, most suggestion algorithms suffer from poor recommendation accuracy. In this paper, aiming to provide semantically relevant queries for users, we develop a novel, effective, and efficient two-level query suggestion model by mining clickthrough data in the form of two bipartite graphs (user-query and query-URL) extracted from the clickthrough data. We first propose a joint matrix factorization method that utilizes the two bipartite graphs to learn a low-rank latent feature space for queries, and then build a query similarity graph based on these features. We then design an online ranking algorithm that propagates similarities on the query similarity graph, and finally recommend latent semantically relevant queries to users. Experimental analysis on the clickthrough data of a commercial search engine shows the effectiveness and efficiency of our method.
Global Ranking by Exploiting User Clicks
Cited by 5 (3 self)
It is now widely recognized that user interactions with search results can provide substantial relevance information about the documents displayed in the search results. In this paper, we focus on extracting relevance information from one source of user interactions, namely user click data, which records the sequence of documents clicked and not clicked in the result set during a user search session. We formulate the problem as a global ranking problem, emphasizing the importance of the sequential nature of user clicks, with the goal of predicting the relevance labels of all the documents in a search session. This is distinct from conventional learning-to-rank methods, which usually design a ranking model defined on a single document; in our model, the relational information among the documents, as manifested by an aggregation of user clicks, is exploited to rank all the documents jointly. In particular, we adapt several sequential supervised learning algorithms, including the conditional random field (CRF), the sliding window method, and the recurrent sliding window method, to the global ranking problem. Experiments on click data collected from a commercial search engine demonstrate that our methods can outperform the baseline models for search result re-ranking.
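The abstract names the sliding window and recurrent sliding window methods among the adapted sequential learners. The sketch below shows only the standard input construction for those two methods, assuming each document in a session (in ranked order) is described by a fixed-length feature vector. The `PAD` value, the function names, and the single click feature in the example are hypothetical; the authors' actual features and classifiers are not described here.

```python
from typing import List, Sequence

PAD = -1.0  # hypothetical filler for window positions outside the session


def sliding_windows(features: Sequence[Sequence[float]],
                    half_width: int) -> List[List[float]]:
    """Sliding window method: the input for position t concatenates the
    feature vectors of positions t - half_width .. t + half_width."""
    n = len(features)
    dim = len(features[0]) if n else 0
    windows = []
    for t in range(n):
        row: List[float] = []
        for s in range(t - half_width, t + half_width + 1):
            row.extend(features[s] if 0 <= s < n else [PAD] * dim)
        windows.append(row)
    return windows


def recurrent_windows(features: Sequence[Sequence[float]], half_width: int,
                      previous_labels: Sequence[float]) -> List[List[float]]:
    """Recurrent sliding window method: additionally append the labels of the
    half_width preceding positions. During training these are the true labels;
    at prediction time they are the classifier's own earlier outputs, which
    requires labeling positions left to right."""
    windows = sliding_windows(features, half_width)
    for t, row in enumerate(windows):
        for s in range(t - half_width, t):
            row.append(previous_labels[s] if s >= 0 else PAD)
    return windows
```

For a three-document session with one hypothetical clicked/not-clicked feature, `sliding_windows([[1.0], [0.0], [1.0]], 1)` yields `[[-1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, -1.0]]`; each row would then be fed to an ordinary classifier that predicts the relevance label at that position, so neighboring clicks inform each document's label.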
An Unsupervised Approach to Cluster Web Search Results based on Word Sense Communities
Cited by 3 (1 self)
Effectively organizing web search results into clusters is important for facilitating quick user navigation to relevant documents. Previous methods may rely on a training process and do not provide a measure of whether page clustering is actually required. In this paper, we reformulate the clustering problem as a word sense discovery problem. Given a query and a list of result pages, our unsupervised method detects word sense communities in the extracted keyword network. The documents are assigned to several refined word sense communities to form clusters. We use the modularity score of the discovered keyword community structure to measure the necessity of page clustering. Experimental results verify our method’s feasibility and effectiveness.
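The abstract uses the modularity score of the keyword community structure to judge whether clustering is needed but does not spell out the formula. A minimal sketch, assuming the standard Newman-Girvan modularity on an unweighted, undirected keyword graph, Q = Σ_c [L_c/m − (d_c/2m)²], where m is the number of edges, L_c the number of edges inside community c, and d_c the total degree of its nodes. Roughly, a score near 0 suggests no real community structure (one dominant sense), while a clearly positive score suggests distinct senses; any cut-off is a hypothetical choice, not the paper's.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple


def modularity(edges: List[Tuple[Hashable, Hashable]],
               community: Dict[Hashable, Hashable]) -> float:
    """Newman-Girvan modularity of a partition of an unweighted, undirected graph.

    edges lists each edge once (no self-loops); community maps node -> label.
    Uses Q = sum_c [ L_c / m - (d_c / (2m))^2 ].
    """
    m = len(edges)
    if m == 0:
        return 0.0
    intra = defaultdict(int)       # L_c: edges with both endpoints in c
    degree_sum = defaultdict(int)  # d_c: sum of node degrees in c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            intra[cu] += 1
    return sum(intra[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)
```

For two keyword triangles joined by a single edge, splitting the graph into its two triangles gives Q = 2 × (3/7 − 1/4) = 5/14 ≈ 0.36, whereas putting every keyword in one community gives Q = 0.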
Searching and Tagging: Two Sides of the Same Coin?
Cited by 2 (1 self)
This paper presents the duality hypothesis of search and tagging, two important behaviors of web users. The hypothesis states that if a user views a document D in the search results for query Q, the user would tend to assign document D a tag identical or similar to Q; similarly, if a user tags a document D with a tag T, the user would tend to view document D if it is in the search results obtained using T as a query. We formalize this hypothesis with a unified probabilistic model for search and tagging, and show that empirical results on several tasks over search log and tag data sets, including ad hoc search, query suggestion, and query trend analysis, all support this duality hypothesis. Since the availability of search logs is limited due to privacy concerns, our study opens up a highly promising direction of using tag data to approximate or supplement search log data for studying user behavior and improving search engine accuracy.
Beyond Hyperlinks: Organizing Information Footprints in Search Logs to Support Effective Browsing
Cited by 2 (0 self)
While current search engines serve known-item search such as homepage finding very well, they generally cannot support exploratory search effectively. In exploratory search, users do not know their information needs precisely and often lack the knowledge needed to formulate effective queries; thus querying alone, as supported by current search engines, is insufficient, and browsing related information would be very useful. Currently, browsing is mostly done by following hyperlinks embedded in Web pages. In this paper, we propose to leverage search logs to allow a user to browse beyond hyperlinks with a multi-resolution topic map constructed from search logs. Specifically, we treat search logs as “footprints” left by previous users in the information space and build a multi-resolution topic map to semantically capture and organize them at multiple granularities. Such a topic map allows a user to zoom in, zoom out, and navigate horizontally over the information space, and thus provides flexible and effective browsing capabilities for end users. To test the effectiveness of the proposed methods for supporting browsing, we implement them using real search logs and a commercial search engine. Our experimental results show that the proposed topic map effectively supports browsing beyond hyperlinks.
Query Result Clustering for Object-level Search
Cited by 2 (0 self)
Query result clustering has recently attracted a lot of attention as a way to provide users with a succinct overview of relevant results. However, little work has been done on organizing the query results for object-level search. Object-level search result clustering is challenging because we need to support diverse similarity notions over object-specific features (such as the price and weight of a product) of heterogeneous domains. To address this challenge, we propose a hybrid subspace clustering algorithm called Hydra. Hydra captures the user perception of diverse similarity notions from millions of Web pages and disambiguates different senses using feature-based subspace locality measures. By combining the wisdom of crowds and the wisdom of data, our proposed solution achieves robustness and efficiency over existing approaches. We extensively evaluate our proposed framework and demonstrate how to enrich user experiences in object-level search using a real-world product search scenario.

