Automatic identification of user goals in web search
, 2004
Cited by 149 (3 self)
There has been recent interest in studying the “goal” behind a user’s Web query, so that this goal can be used to improve the quality of a search engine’s results. Previous studies have mainly relied on manual query-log investigation to identify Web query goals. In this paper we study whether and how we can automate this goal-identification process. We first present results from a human subject study that strongly indicate the feasibility of automatic query-goal identification. We then propose two types of features for the goal-identification task: user-click behavior and anchor-link distribution. Our experimental evaluation shows that by combining these features we can correctly identify the goals for 90% of the queries studied.
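The user-click feature rests on the intuition that some queries send almost every click to one result while others spread clicks widely. A minimal sketch, assuming goals are split into navigational and informational (a common taxonomy; the abstract does not name the categories) and assuming click entropy as the skew measure. This is not necessarily the feature the paper defines, the threshold is hypothetical, and the anchor-link feature is not shown.

```python
import math
from collections import Counter

def click_entropy(clicked_urls):
    """Shannon entropy (in bits) of how a query's clicks spread over result URLs."""
    counts = Counter(clicked_urls)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def guess_goal(clicked_urls, threshold=1.0):
    """Guess 'navigational' when clicks concentrate on few URLs, else 'informational'.

    The 1.0-bit threshold is a hypothetical value for illustration only.
    """
    if not clicked_urls:
        return "unknown"
    if click_entropy(clicked_urls) < threshold:
        return "navigational"
    return "informational"
```

For example, a query whose log shows nine clicks on one URL and one on another has low entropy and is guessed navigational, while clicks spread evenly over four URLs (2 bits) come out informational.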
A Personalized Search Engine Based on Web-Snippet Hierarchical Clustering
, 2005
Cited by 126 (5 self)
In this paper we propose a hierarchical clustering engine, called SnakeT, that is able to organize on-the-fly the search results drawn from 16 commodity search engines into a hierarchy of labeled folders. The hierarchy offers a complementary view to the flat-ranked list of results returned by current search engines. Users can navigate through the hierarchy driven by their search needs. This is especially useful for informative, polysemous and poor queries.
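To make the idea of a "hierarchy of labeled folders" concrete, here is a minimal sketch that recursively groups snippets under their most frequent shared term and uses that term as the folder label. It is a generic frequent-term grouping swapped in for illustration, not SnakeT's actual clustering or labeling algorithm; the stopword list, `min_size` and `depth` are hypothetical parameters.

```python
from collections import Counter

# Hypothetical stopword list; a real engine would use a much larger one.
STOPWORDS = {"the", "a", "of", "and", "for", "in", "on", "to"}

def tokens(snippet):
    """Lowercased set of non-stopword terms in a snippet."""
    return {w for w in snippet.lower().split() if w not in STOPWORDS}

def build_folders(snippets, min_size=2, depth=2, used=frozenset()):
    """Recursively group snippets into labeled folders.

    Each folder is labeled by the most frequent term not already used by an
    ancestor folder; ties are broken alphabetically so the output is stable.
    """
    if depth == 0 or len(snippets) < min_size:
        return {}
    folders = {}
    remaining = list(snippets)
    while remaining:
        counts = Counter(t for s in remaining for t in tokens(s) - used)
        if not counts:
            break
        label = min(counts, key=lambda t: (-counts[t], t))
        if counts[label] < min_size:
            break
        members = [s for s in remaining if label in tokens(s)]
        remaining = [s for s in remaining if label not in tokens(s)]
        folders[label] = {
            "snippets": members,
            "children": build_folders(members, min_size, depth - 1, used | {label}),
        }
    if folders and remaining:
        # Snippets that share no frequent term with the others.
        folders["(other)"] = {"snippets": remaining, "children": {}}
    return folders
```

For a polysemous query such as "jaguar", snippets about cars and about the animal end up in a "jaguar" folder with "car" and "cat" sub-folders, and an unrelated snippet lands in "(other)".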
A survey on web clustering engines
, 2009
Cited by 82 (7 self)
Web clustering engines organize search results by topic, thus offering a complementary view to the flat-ranked list returned by conventional search engines. In this survey, we discuss the issues that must be addressed in the development of a Web clustering engine, including acquisition and preprocessing of search results, their clustering and visualization. Search results clustering, the core of the system, has specific requirements that cannot be addressed by classical clustering algorithms. We emphasize the role played by the quality of the cluster labels as opposed to optimizing only the clustering structure. We highlight the main characteristics of a number of existing Web clustering engines and also discuss how to evaluate their retrieval performance. Finally, some directions for future research are presented.
Learning to classify short and sparse text & web with hidden topics from large-scale data collections
- in WWW
, 2008
Cited by 78 (2 self)
This paper presents a general framework for building classifiers that deal with short and sparse text & Web segments by making the most of hidden topics discovered from large-scale data collections. The main motivation of this work is that many classification tasks working with short segments of text & Web, such as search snippets, forum & chat messages, blog & news feeds, product reviews, and book & movie summaries, fail to achieve high accuracy due to data sparseness. We therefore propose gaining external knowledge to make the data more related and to expand the coverage of classifiers so that they handle future data better. The underlying idea of the framework is that, for each classification task, we collect a large-scale external data collection called a “universal dataset”, and then build a classifier on both a (small) set of labeled training data and a rich set of hidden topics discovered from that data collection. The framework is general enough to be applied to different data domains and genres, ranging from Web search results to medical text. We carefully evaluated the framework on several hundred megabytes of Wikipedia (30M words) and MEDLINE (18M words) with two tasks, “Web search domain disambiguation” and “disease categorization for medical text”, and achieved significant quality improvements.
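The core move is to enrich a sparse snippet with features derived from topics learned on a large external collection. A minimal sketch of that feature-augmentation step only, assuming a tiny hand-written topic-word table standing in for topics an LDA-style model would learn from a "universal dataset". For simplicity it assigns a single most likely topic to the whole snippet (a mixture-of-unigrams shortcut), whereas real topic inference would estimate a per-document topic mixture; the table, smoothing constant and `__topic_k` token format are all hypothetical.

```python
import math

# Hypothetical topic-word probabilities, standing in for topics that a topic
# model would learn from a large external collection such as Wikipedia.
TOPICS = {
    0: {"bank": 0.2, "loan": 0.3, "interest": 0.3, "credit": 0.2},
    1: {"river": 0.4, "bank": 0.2, "water": 0.3, "fish": 0.1},
}
SMOOTH = 1e-3  # probability assumed for words a topic does not list

def dominant_topic(words):
    """Pick the topic under which the words have the highest log-likelihood."""
    def loglik(t):
        return sum(math.log(TOPICS[t].get(w, SMOOTH)) for w in words)
    return max(TOPICS, key=loglik)

def augment(text):
    """Return the snippet's tokens plus a synthetic hidden-topic feature."""
    words = text.lower().split()
    return words + [f"__topic_{dominant_topic(words)}"]
```

The augmented token lists could then be fed to any ordinary text classifier together with the small labeled set; "bank loan" and "river bank" share the word "bank" but receive different topic features.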
Exploiting query reformulations for web search result diversification
- In Proceedings of WWW
, 2010
Cited by 78 (22 self)
When a Web user’s underlying information need is not clearly specified from the initial query, an effective approach is to diversify the results retrieved for this query. In this paper, we introduce a novel probabilistic framework for Web search result diversification, which explicitly accounts for the various aspects associated with an underspecified query. In particular, we diversify a document ranking by estimating how well a given document satisfies each uncovered aspect and the extent to which different aspects are satisfied by the ranking as a whole. We thoroughly evaluate our framework in the context of the diversity task of the TREC 2009 Web track. Moreover, we exploit query reformulations provided by three major Web search engines (WSEs) as a means to uncover different query aspects. The results attest to the effectiveness of our framework when compared to state-of-the-art diversification approaches in the literature. Additionally, by simulating an upper-bound query reformulation mechanism from official TREC data, we draw useful insights regarding the effectiveness of the query reformulations generated by the different WSEs in promoting diversity.
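The abstract's two ingredients, how well a document satisfies each aspect and how much each aspect is still uncovered by the ranking so far, fit a greedy re-ranking loop. A minimal sketch, assuming all probabilities are already estimated and assuming a linear mix weighted by a hypothetical `lam`; this mirrors the greedy objective commonly associated with this framework (often called xQuAD), but the paper's exact estimators may differ.

```python
def diversify(candidates, p_rel, aspects, p_aspect, p_doc_aspect, k, lam=0.5):
    """Greedily pick k documents balancing relevance and aspect coverage.

    p_rel[d]            -- P(d|q): relevance of d to the original query
    p_aspect[s]         -- P(s|q): importance of aspect s (e.g. a reformulation)
    p_doc_aspect[(d,s)] -- P(d|q,s): how well d satisfies aspect s (default 0)
    """
    selected = []
    # not_covered[s] = product over already selected d of (1 - P(d|q,s))
    not_covered = {s: 1.0 for s in aspects}
    pool = list(candidates)
    while pool and len(selected) < k:
        def score(d):
            div = sum(p_aspect[s] * p_doc_aspect.get((d, s), 0.0) * not_covered[s]
                      for s in aspects)
            return (1 - lam) * p_rel[d] + lam * div
        best = max(pool, key=score)
        selected.append(best)
        pool.remove(best)
        for s in aspects:
            not_covered[s] *= 1 - p_doc_aspect.get((best, s), 0.0)
    return selected
```

With `lam=0` the loop reduces to ranking by relevance alone; raising `lam` lets a less relevant document that covers a still-unserved aspect overtake a near-duplicate of what is already ranked.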
Learn from web search logs to organize search results
- In SIGIR
, 2007
Cited by 69 (8 self)
Effective organization of search results is critical for improving the utility of any search engine. Clustering search results is an effective way to organize them, allowing a user to navigate to relevant documents quickly. However, two deficiencies keep this approach from always working well: (1) the clusters discovered do not necessarily correspond to the interesting aspects of a topic from the user’s perspective; and (2) the cluster labels generated are not informative enough to allow a user to identify the right cluster. In this paper, we propose to address these two deficiencies by (1) learning “interesting aspects” of a topic from Web search logs and organizing search results accordingly; and (2) generating more meaningful cluster labels using past query words entered by users. We evaluate our proposed method on log data from a commercial search engine. Compared with traditional methods of clustering search results, our method gives better result organization and more meaningful labels.
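The key shift is that past queries, rather than clusters found in the result text, define both the groups and their labels. A minimal sketch, assuming the related past queries have already been mined from the log and assuming simple Jaccard word overlap to assign each result to an aspect; the paper's actual aspect-learning and assignment steps are not reproduced here, and `min_sim` is a hypothetical cutoff.

```python
def jaccard(a, b):
    """Jaccard similarity of two word sets (0.0 when both are empty)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def organize(results, past_queries, min_sim=0.1):
    """Group results under the past query (aspect) they overlap with most.

    The past query string itself serves as the group label.
    """
    groups = {q: [] for q in past_queries}
    groups["(unassigned)"] = []
    for r in results:
        rt = set(r.lower().split())
        best_q, best_sim = None, 0.0
        for q in past_queries:
            sim = jaccard(rt, set(q.lower().split()))
            if sim > best_sim:
                best_q, best_sim = q, sim
        key = best_q if best_q is not None and best_sim >= min_sim else "(unassigned)"
        groups[key].append(r)
    # Drop empty groups so only populated aspects are shown.
    return {label: docs for label, docs in groups.items() if docs}
```

Because the labels are queries that users actually typed, such as "jaguar car price", they tend to be more recognizable than terms extracted from the result text.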
Towards effective browsing of large scale social annotations
- In WWW ’07: Proceedings of the 16th international conference on World Wide Web, 943–952
, 2007
Cited by 55 (3 self)
This paper is concerned with the problem of browsing social annotations. Today, many services (e.g., Del.icio.us, Flickr) help users manage and share their favorite URLs and photos based on social annotations. However, due to the exponential growth of social annotations, more and more users face the problem of how to effectively find desired resources in large annotation data. Existing methods such as tag clouds and annotation matching work well only on small annotation sets. Thus, an effective approach for browsing large-scale annotation sets and the associated resources is in great demand by both ordinary users and service providers. In this paper, we propose a novel algorithm, the Effective Large Scale Annotation Browser (ELSABer), to browse large-scale social annotation data. ELSABer helps users browse a huge number of annotations in a semantic, hierarchical and efficient way. More specifically, ELSABer has the following features: 1) the semantic relations between annotations are explored for browsing similar resources; 2) the hierarchical relations between annotations are constructed for browsing in a top-down fashion; 3) the distribution of social annotations is studied for efficient browsing. By incorporating personal and time information, ELSABer can be further extended for personalized and time-related browsing. A prototype system has been implemented and shows promising results.
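The abstract says hierarchical relations between annotations are constructed for top-down browsing but does not say how. As an illustration only, this sketch swaps in a classic co-occurrence subsumption heuristic: tag x becomes a parent of tag y when most resources tagged y are also tagged x, but not vice versa. This is not ELSABer's method, and the 0.8 threshold is a hypothetical setting.

```python
from collections import defaultdict
from itertools import permutations

def subsumption_edges(tagged_resources, threshold=0.8):
    """Return (parent, child) tag pairs from a co-occurrence subsumption test.

    x is treated as a parent of y when P(x|y) >= threshold and P(y|x) < 1,
    with probabilities estimated from how many resources carry each tag.
    """
    docs_with = defaultdict(set)
    for rid, tags in tagged_resources.items():
        for t in tags:
            docs_with[t].add(rid)
    edges = []
    for x, y in permutations(sorted(docs_with), 2):
        both = len(docs_with[x] & docs_with[y])
        p_x_given_y = both / len(docs_with[y])
        p_y_given_x = both / len(docs_with[x])
        if p_x_given_y >= threshold and p_y_given_x < 1:
            edges.append((x, y))
    return edges
```

The resulting edges form a tag hierarchy that a browser could expand top-down, for instance showing "java" and "python" under "programming".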
Clustering the tagged web
- In ACM WSDM ’09
, 2009
Cited by 53 (2 self)
Automatically clustering web pages into semantic groups promises improved search and browsing on the web. In this paper, we demonstrate how user-generated tags from large-scale social bookmarking websites such as del.icio.us can be used as a complementary data source to page text and anchor text for improving automatic clustering of web pages. This paper explores the use of tags in 1) K-means clustering in an extended vector space model that includes tags as well as page text and 2) a novel generative clustering algorithm based on latent Dirichlet allocation that jointly models text and tags. We evaluate the models by comparing their output to an established web directory. We find that the naive inclusion of tagging data improves cluster quality versus page text alone, but a more principled inclusion can substantially improve the quality of all models, with a statistically significant absolute F-score increase of 4%. The generative model outperforms K-means with a further 8% F-score increase.
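The first variant, K-means in an extended vector space, can be sketched by giving tags their own dimensions next to page words. A minimal sketch, assuming raw term counts, a hypothetical `tag_weight` knob, cosine similarity with spherical K-means, and seeding with the first k pages; the paper's weighting scheme and its LDA-based generative model are not reproduced.

```python
import math
from collections import Counter

def normalize(vec):
    """Scale a sparse vector (dict) to unit length."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()} if norm else {}

def page_vector(text, tags, tag_weight=1.0):
    """Extended vector space: page words plus tags in a separate 'tag:' namespace."""
    vec = Counter(text.lower().split())
    for t in tags:
        vec["tag:" + t.lower()] += tag_weight
    return normalize(vec)

def dot(a, b):
    """Dot product of sparse vectors; equals cosine similarity for unit vectors."""
    return sum(v * b.get(k, 0.0) for k, v in a.items())

def kmeans(vectors, k, iters=10):
    """Spherical K-means, seeded with the first k vectors as centroids."""
    centroids = [dict(v) for v in vectors[:k]]
    assign = [0] * len(vectors)
    for _ in range(iters):
        # Assign each page to the most similar centroid.
        assign = [max(range(k), key=lambda c: dot(v, centroids[c])) for v in vectors]
        # Recompute each centroid as the normalized sum of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:
                total = Counter()
                for v in members:
                    total.update(v)
                centroids[c] = normalize(total)
    return assign
```

Keeping tags in a separate `tag:` namespace means a tag never collides with the same word in page text, so its influence can be tuned independently through `tag_weight`.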
Facetmap: A scalable search and browse visualization
- IEEE Transactions on Visualization and Computer Graphics
Cited by 48 (3 self)
The dominant paradigm for searching and browsing large data stores is text-based: presenting a scrollable list of search results in response to textual search term input. While this works well for the Web, there is opportunity for improvement in the domain of personal information stores, which tend to have more heterogeneous data and richer metadata. In this paper, we introduce FacetMap, an interactive, query-driven visualization, generalizable to a wide range of metadata-rich data stores. FacetMap uses a visual metaphor for both input (selection of metadata facets as filters) and output. Results of a user study provide insight into tradeoffs between FacetMap’s graphical approach and the traditional text-oriented approach. Index Terms: Graphical visualization, interactive information retrieval, faceted metadata
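Using metadata facets as filters means each selection narrows the item set, and the remaining values of the other facets are what a visualization would display next. A minimal sketch of that filter-and-count logic only, assuming items are flat metadata dictionaries and that selections combine with AND; the field names are hypothetical, and FacetMap's graphical rendering is out of scope.

```python
from collections import Counter, defaultdict

def apply_facets(items, selected):
    """Keep items whose metadata matches every selected (facet, value) pair."""
    return [it for it in items if all(it.get(f) == v for f, v in selected.items())]

def facet_counts(items, exclude=()):
    """Count values per facet among the items, e.g. to size visual facet nodes."""
    counts = defaultdict(Counter)
    for it in items:
        for facet, value in it.items():
            if facet not in exclude:
                counts[facet][value] += 1
    return counts
```

After selecting `{"type": "email"}`, the counts for the remaining facets (here `year`) describe what is left to explore, which is the information a graphical facet view would encode visually.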
Exploiting internal and external semantics for the clustering of short texts using world knowledge
- In Proceedings of the 18th ACM Conference on Information and Knowledge Management
, 2009
Cited by 43 (11 self)
Clustering of short texts, such as snippets, presents great challenges for existing aggregated search techniques due to data sparseness and the complex semantics of natural language. As short texts do not provide sufficient term co-occurrence information, traditional text representation methods, such as the “bag of words” model, have several limitations when directly applied to short text tasks. In this paper, we propose a novel framework to improve the performance of short text clustering by exploiting the internal semantics of the original text and external concepts from world knowledge. The proposed method employs a hierarchical three-level structure to tackle the data sparsity of the original short texts and reconstructs the corresponding feature space by integrating multiple semantic knowledge bases: Wikipedia and WordNet. Empirical evaluation on Reuters and a real web dataset demonstrates that our approach achieves significant improvement over state-of-the-art methods.