Grouper: A Dynamic Clustering Interface to Web Search Results. Computer Networks. (1999)

by O Zamir, O Etzioni

Results 1 - 10 of 309

An Introduction to Information Retrieval

by Christopher D. Manning, et al., 2007
Cited by 1857 (13 self)
Abstract not found

From frequency to meaning : Vector space models of semantics

by Peter D. Turney, Patrick Pantel - Journal of Artificial Intelligence Research, 2010
Cited by 347 (3 self)
Computers understand very little of the meaning of human language. This profoundly limits our ability to give instructions to computers, the ability of computers to explain their actions to us, and the ability of computers to analyse and process text. Vector space models (VSMs) of semantics are beginning to address these limits. This paper surveys the use of VSMs for semantic processing of text. We organize the literature on VSMs according to the structure of the matrix in a VSM. There are currently three broad classes of VSMs, based on term–document, word–context, and pair–pattern matrices, yielding three classes of applications. We survey a broad range of applications in these three categories and we take a detailed look at a specific open source project in each category. Our goal in this survey is to show the breadth of applications of VSMs for semantics, to provide a new perspective on VSMs for those who are already familiar with the area, and to provide pointers into the literature for those who are less familiar with the field.

Citation Context

...ersen, & Tukey, 1992; Pantel & Lin, 2002b) or they may have a hierarchical structure (groups of groups) (Zhao & Karypis, 2002); they may be non-overlapping (hard) (Croft, 1977) or overlapping (soft) (Zamir & Etzioni, 1999). Clustering algorithms also differ in how clusters are compared and abstracted. With single-link clustering, the similarity between two clusters is the maximum of the similarities between their memb...

Scaling Question Answering to the Web

by Cody C. T. Kwok, Oren Etzioni, Daniel S. Weld, 2001
Cited by 238 (16 self)
The wealth of information on the web makes it an attractive resource for seeking quick answers to simple, factual questions such as "who was the first American in space?" or "what is the second tallest mountain in the world?" Yet today's most advanced web search services (e.g., Google and AskJeeves) make it surprisingly tedious to locate answers to such questions. In this paper, we extend question-answering techniques, first studied in the information retrieval literature, to the web and experimentally evaluate their performance. First we introduce MULDER, which we believe to be the first general-purpose, fully-automated question-answering system available on the web. Second, we describe MULDER's architecture, which relies on multiple search-engine queries, natural-language parsing, and a novel voting procedure to yield reliable answers coupled with high recall. Finally, we compare MULDER's performance to that of Google and AskJeeves on questions drawn from the TREC-8 question track. We find that MULDER's recall is more than a factor of three higher than that of AskJeeves. In addition, we find that Google requires 6.6 times as much user effort to achieve the same level of recall as MULDER.

Bringing order to the Web: Automatically categorizing search results.

by Hao Chen, Susan Dumais - Proc. CHI, 2000
Cited by 182 (2 self)
We developed a user interface that organizes Web search results into hierarchical categories. Text classification algorithms were used to automatically classify arbitrary search results into an existing category structure on-the-fly. A user study compared our new category interface with the typical ranked list interface of search results. The study showed that the category interface is superior both in objective and subjective measures. Subjects liked the category interface much better than the list interface, and they were 50% faster at finding information that was organized into categories. Organizing search results allows users to focus on items in categories of interest rather than having to browse through all the results sequentially.

Citation Context

...ally derived Web site structure. Chen et al.’s Cha-Cha system [4] also organized search results into automatically derived site structures using the shortest path from the root to the retrieved page. Manually-created systems are quite useful but require a lot of initial effort to create and are difficult to maintain. Automatically-derived structures often result in heterogeneous criteria for category membership and can be difficult to understand. A second way to organize documents is by clustering. Documents are organized into groups based on their overall similarity to one another. Zamir et al. [19, 20] grouped Web search results using suffix tree clustering. Hearst et al. [7, 8] used the scatter/gather technique to organize and browse documents. One problem with organizing search results in this way is the time required for on-line clustering algorithms. Single-link and group-average methods typically take O(n^2) time, while complete-link methods typically take O(n^3), where n is the number of documents returned. Linear-time algorithms like k-means are more efficient, being O(nkT), where k is the number of clusters and T the number of iterations. In addition, it is difficult to describe the re...
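The O(nkT) figure quoted above for k-means can be illustrated with a minimal sketch. This is plain Lloyd-style k-means written for this listing, not code from any of the cited systems; the function names, the seeding from the first k points, and the fixed iteration count are all assumptions made to keep the example short. The counter shows that the assignment step performs exactly n * k distance evaluations per iteration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, iterations: int):
    """Run a fixed number of k-means iterations.

    Returns (labels, centroids, distance_count). Seeding with the first k
    points is an illustrative assumption, not a recommended strategy.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    distance_count = 0
    for _ in range(iterations):
        # Assignment step: every point is compared with every centroid,
        # which costs n * k distance evaluations.
        for i, p in enumerate(points):
            best, best_d = 0, float("inf")
            for c, centroid in enumerate(centroids):
                d = squared_distance(p, centroid)
                distance_count += 1
                if d < best_d:
                    best, best_d = c, d
            labels[i] = best
        # Update step: recompute each centroid as the mean of its members.
        # Scanning the points once per cluster is O(nk), so it does not
        # change the overall O(nkT) bound.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids, distance_count


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (0.0, 0.5), (10.0, 10.5)]
labels, centroids, count = kmeans(points, k=2, iterations=3)
print(labels, count)  # count equals n * k * T = 6 * 2 * 3
```

By contrast, the single-link and group-average methods mentioned in the same passage need pairwise similarities between documents, which is where their quadratic cost comes from.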

Topical Locality in the Web

by Brian D. Davison - In Proceedings of the 23rd Annual International Conference on Research and Development in Information Retrieval (SIGIR 2000), 2000
Cited by 156 (8 self)
Most web pages are linked to others with related content. This idea, combined with another that says that text in, and possibly around, HTML anchors describes the pages to which they point, is the foundation for a usable World Wide Web. In this paper, we examine to what extent these ideas hold by empirically testing whether topical locality mirrors spatial locality of pages on the Web. In particular, we find the likelihood of linked pages having similar textual content to be high; the similarity of sibling pages increases when the links from the parent are close together; titles, descriptions, and anchor text represent at least part of the target page; and that anchor text may be a useful discriminator among unseen child pages. These results show the foundations necessary for the success of many web systems, including search engines, focused crawlers, linkage analyzers, and intelligent web agents.

Citation Context

...g other engines. While these services may do nothing more than present the results they obtained for the client, they may want to attempt to rank the results or perform additional processing. Grouper [33], for example, performs result clustering. While Inquirus [22] fetches all documents for analysis on full-text, a simpler version (perhaps with little available bandwidth) might decide to fetch only t...

Automatic identification of user goals in web search

by Uichin Lee, Zhenyu Liu, Junghoo Cho, 2004
Cited by 149 (3 self)
There has been recent interest in studying the “goal” behind a user’s Web query, so that this goal can be used to improve the quality of a search engine’s results. Previous studies have mainly focused on using manual query-log investigation to identify Web query goals. In this paper we study whether and how we can automate this goal-identification process. We first present our results from a human subject study that strongly indicate the feasibility of automatic query-goal identification. We then propose two types of features for the goal-identification task: user-click behavior and anchor-link distribution. Our experimental evaluation shows that by combining these features we can correctly identify the goals for 90% of the queries studied.

Citation Context

... endeavor, there has been a recent interest in identifying the “goal” of a user during a search [6, 7, 8], so that the identified goal can be used to improve page ranking [2, 3, 7], result clustering [9, 10, 11] and answer presentation [12, 13]. In their seminal studies, Broder [6] and Rose and Levinson [8] have independently found that the goal of a user can be classified into at least two categories: navig...

A Personalized Search Engine Based on Web-Snippet Hierarchical Clustering

by Paolo Ferragina, Antonio Gulli, 2005
Cited by 126 (5 self)
In this paper we propose a hierarchical clustering engine, called SnakeT, that is able to organize on-the-fly the search results drawn from 16 commodity search engines into a hierarchy of labeled folders. The hierarchy offers a complementary view to the flat-ranked list of results returned by current search engines. Users can navigate through the hierarchy driven by their search needs. This is especially useful for informative, polysemous and poor queries.

Hierarchical clustering of WWW image search results using visual, textual and link analysis

by Deng Cai, Xiaofei He, Zhiwei Li, Wei-ying Ma, Ji-rong Wen - ACM Multimedia, 2004
Cited by 93 (5 self)
We consider the problem of clustering Web image search results. Generally, the image search results returned by an image search engine contain multiple topics. Organizing the results into different semantic clusters facilitates users’ browsing. In this paper, we propose a hierarchical clustering method using visual, textual and link analysis. By using a vision-based page segmentation algorithm, a web page is partitioned into blocks, and the textual and link information of an image can be accurately extracted from the block containing that image. By using block-level link analysis techniques, an image graph can be constructed. We then apply spectral techniques to find a Euclidean embedding of the images which respects the graph structure. Thus for each image, we have three kinds of representations, i.e. visual feature based representation, textual feature based representation and graph based representation. Using spectral clustering techniques, we can cluster the search results into different semantic clusters. An image search example illustrates the potential of these techniques.

Citation Context

...e user who is looking for dog Pluto. A possible solution to this problem is to cluster search results into different groups with different topics. Many works have been done on web text search [19][28][29]. In this paper, we consider the problem of clustering image search results. In web image search, a good organization of the search results is as important as the search accuracy. In traditional Conte...

A survey on web clustering engines

by Claudio Carpineto, Stanislaw Osiński, Giovanni Romano, Dawid Weiss, 2009
Cited by 82 (7 self)
Web clustering engines organize search results by topic, thus offering a complementary view to the flat-ranked list returned by conventional search engines. In this survey, we discuss the issues that must be addressed in the development of a Web clustering engine, including acquisition and preprocessing of search results, their clustering and visualization. Search results clustering, the core of the system, has specific requirements that cannot be addressed by classical clustering algorithms. We emphasize the role played by the quality of the cluster labels as opposed to optimizing only the clustering structure. We highlight the main characteristics of a number of existing Web clustering engines and also discuss how to evaluate their retrieval performance. Some directions for future research are finally presented.

Engineering a lightweight suffix array construction algorithm (Extended Abstract)

by Giovanni Manzini, Paolo Ferragina
Cited by 81 (3 self)
In this paper we consider the problem of computing the suffix array of a text T [1, n]. This problem consists in sorting the suffixes of T in lexicographic order. The suffix array [16] (or pat array [9]) is a simple, easy to code, and elegant data structure used for several fundamental string matching problems involving both linguistic texts and biological data [4, 11]. Recently, the interest in this data structure has been revitalized by its use as a building block for three novel applications: (1) the Burrows-Wheeler compression algorithm [3], which is a provably [17] and practically [20] effective compression tool; (2) the construction of succinct [10, 19] and compressed [7, 8] indexes; the latter can store both the input text and its full-text index using roughly the same space used by traditional compressors for the text alone; and (3) algorithms for clustering and ranking the answers to user queries in web-search engines [22]. In all these applications the construction of the suffix array is the computational bottleneck both in time and space. This motivated our interest in designing yet another suffix array construction algorithm which is fast and "lightweight" in the sense that it uses small space...
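The problem statement in this abstract (sort the suffixes of T in lexicographic order) can be made concrete with a naive sketch. This is not the lightweight algorithm the paper describes: it simply sorts suffix start positions by comparing the suffixes themselves, which is slow and memory-hungry on long texts. The function name is hypothetical, and positions are 0-based here, whereas the abstract indexes the text as T [1, n].

```python
from typing import List


def suffix_array(text: str) -> List[int]:
    """Return the start positions of all suffixes of text, in lexicographic order.

    Naive construction for illustration only: each comparison may scan a whole
    suffix, and the slices used as sort keys take extra space.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


sa = suffix_array("banana")
print(sa)  # [5, 3, 1, 0, 4, 2]
for i in sa:
    print(i, "banana"[i:])  # a, ana, anana, banana, na, nana
```

The time and space cost of this direct approach on large inputs is the bottleneck the abstract refers to.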