HyPursuit: A hierarchical network search engine that exploits content-link hypertext clustering
Proceedings of the Seventh ACM Conference on Hypertext, 1996
Cited by 88 (2 self)
HyPursuit is a new hierarchical network search engine that clusters hypertext documents to structure a given information space for browsing and search activities. Our content-link clustering algorithm is based on the semantic information embedded in hyperlink structures and document contents. HyPursuit admits multiple, coexisting cluster hierarchies based on different principles for grouping documents, such as the Library of Congress catalog scheme and automatically created hypertext clusters. HyPursuit's abstraction functions summarize cluster contents to support scalable query processing. The abstraction functions satisfy system resource limitations with controlled information loss. The result of query processing operations on a cluster summary approximates the result of performing the operations on the entire information space. We constructed a prototype system comprising 100 leaf World Wide Web sites and a hierarchy of 42 servers that route queries to the leaf sites. Experience with our system suggests that abstraction functions based on hypertext clustering can be used to construct meaningful and scalable cluster hierarchies. We are also encouraged by preliminary results on clustering based on both document contents and hyperlink structures.
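The abstract does not give HyPursuit's similarity formula, so what follows is only a minimal Python sketch of the general idea of content-link clustering. It assumes a weighted sum of a term-vector cosine similarity and a simple link score (1.0 for a direct hyperlink, otherwise the overlap of the two pages' out-links), with single-link grouping at a fixed threshold. The function names, the weights `w_terms`/`w_links`, the `threshold`, and the example pages are hypothetical illustrations, not the paper's algorithm.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors (content similarity)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def link_similarity(u: str, v: str, links: dict) -> float:
    """Hypothetical link score: 1.0 if either page links to the other,
    otherwise the Jaccard overlap of the two pages' out-link sets."""
    out_u, out_v = links.get(u, set()), links.get(v, set())
    if v in out_u or u in out_v:
        return 1.0
    union = out_u | out_v
    return len(out_u & out_v) / len(union) if union else 0.0


def content_link_clusters(docs: dict, links: dict, w_terms=0.5, w_links=0.5, threshold=0.3):
    """Single-link clustering (via union-find): two pages end up in the same
    cluster if a chain of pairs with combined similarity >= threshold joins them."""
    parent = {d: d for d in docs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in combinations(docs, 2):
        score = w_terms * cosine(docs[u], docs[v]) + w_links * link_similarity(u, v, links)
        if score >= threshold:
            parent[find(u)] = find(v)

    groups = {}
    for d in docs:
        groups.setdefault(find(d), []).append(d)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical example: two linked pages about hypertext clustering and one unrelated page.
docs = {
    "a.html": Counter({"hypertext": 3, "cluster": 2}),
    "b.html": Counter({"hypertext": 2, "cluster": 1}),
    "c.html": Counter({"pricing": 4, "network": 1}),
}
links = {"a.html": {"b.html"}}

print(content_link_clusters(docs, links))  # [['a.html', 'b.html'], ['c.html']]
```

Mixing the two scores lets a hyperlink pull together pages whose vocabularies differ, and lets shared vocabulary group pages that never link to each other; the weights decide which signal dominates.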
Discover: A Resource Discovery System based on Content Routing
In Proc. 3rd International World Wide Web Conference, 1995
Cited by 29 (3 self)
We have built an HTTP-based resource discovery system called Discover that provides a single point ...
Content Routing: A Scalable Architecture for Network-Based Information Discovery
1996
Cited by 17 (1 self)
This thesis presents a new architecture for information discovery based on a hierarchy of content routers that provide both browsing and search services to end users. Content routers catalog information servers, which may in turn be other content routers. The resulting hierarchy of content routers and leaf servers provides a rich set of services to end users for locating information, including query refinement and query routing. Query refinement helps a user improve a query fragment to describe the user's interests more precisely. Once a query has been refined and describes a manageable result set, query routing automatically forwards the query to relevant servers. These services make use of succinct descriptions of server contents called content labels. A unique contribution of this research is the demonstration of a scalable discovery architecture based on a hierarchical approach to routing.
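The abstract describes routers that catalog servers through content labels, forward queries only to relevant servers, and help users refine query fragments. A minimal Python sketch of that idea follows. It assumes a content label is just a set of terms, that a router's label is the union of its children's labels, and that queries are conjunctive term sets. The `Server`, `route`, and `refine` names and the example servers are hypothetical; this is not the thesis's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """A leaf information server (no children) or a content router (has children).
    For a leaf, `terms` stands in for its content label (a simplifying assumption)."""
    name: str
    terms: set = field(default_factory=set)
    children: list = field(default_factory=list)

    def label(self) -> set:
        """Content label: a leaf's own terms, or the union of its children's labels.
        A union over-approximates, so a router match only means 'may contain'."""
        if not self.children:
            return set(self.terms)
        merged = set()
        for child in self.children:
            merged |= child.label()
        return merged


def route(node: Server, query: set) -> list:
    """Query routing: forward a conjunctive query only into subtrees whose
    content labels contain every query term; return the matching leaf servers."""
    if not query <= node.label():
        return []
    if not node.children:
        return [node]
    matches = []
    for child in node.children:
        matches.extend(route(child, query))
    return matches


def refine(root: Server, query: set) -> set:
    """Query refinement (simplified): suggest the other terms held by the
    leaves that match the query fragment, so the user can narrow it."""
    suggestions = set()
    for leaf in route(root, query):
        suggestions |= leaf.terms - query
    return suggestions


# Hypothetical hierarchy: a subject router over two leaves, plus one leaf under the root.
bio = Server("bio-server", {"genome", "protein"})
geo = Server("geo-server", {"map", "river"})
news = Server("news-server", {"election", "river"})
science = Server("science-router", children=[bio, geo])
root = Server("root-router", children=[science, news])

print([s.name for s in route(root, {"river"})])  # ['geo-server', 'news-server']
print(sorted(refine(root, {"river"})))           # ['election', 'map']
```

In this sketch the refinement step offers "map" and "election" as ways to narrow "river"; once the query is narrow enough, `route` skips every subtree whose label cannot match, which is the property that lets the hierarchy scale. A real router would precompute labels rather than rebuild them on each call.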
Content routing for distributed information servers
In Proceedings of the 4th International Conference on Extending Database Technology, 1994
Cited by 7 (1 self)
1 Introduction
The Internet contains over one million hosts that provide file service and other information servers specializing in topics such as news, technical reports, biology, geography, and politics. The Internet's vast collection of servers can be viewed as a distributed database containing a wealth of information. Unfortunately, this information is relatively inaccessible because there is no mechanism for browsing and searching it associatively. The difficulty of providing associative access to a large number of distributed information servers lies primarily in problems of scale. The scale of the Internet, which is today only a fraction of its eventual size, is so great that any comprehensive indexing plan based on a single global index is infeasible. In addition, the cost of distributing a query throughout the Internet is prohibitive. Finally, in very large-scale systems, the number of results for a typical user query is incomprehensibly large. For these reasons, we expect that efficient associative access will require both content routing and query refinement. Content routing is the process of directing user queries to appropriate servers. Query refinement helps a user formulate meaningful queries. These query services can be implemented by a content routing system that will: ...
Pricing Internet: The New Zealand Experience
1994
Cited by 6 (0 self)
We would like to thank the Computing Centre managers at the Universities of Auckland, ...
Indexing the Internet
1995
Navigation around information resources on the Internet is assisted by browsing software such as Gopher, Lynx, Mosaic, and Netscape. The variety and complexity of resources has prompted calls for metainformation, so that tools resident on the network can guide users to resources using classified or index-term approaches. Searching software used in association with the browsing software has functionality that may be compared with that of vendor-operated information retrieval systems. Although this functionality is relatively limited, when it is used with HTML documents, relevance feedback assists retrieval via associative mechanisms. Documents published electronically using SGML standards can contain their own surrogates and can therefore be self-indexed and self-classified by their authors. However, because the author's view does not necessarily represent the user's view, user-oriented retrieval aids are still necessary. Menu-linked approaches, via gophers or via home pages constructed for WWW browsing software, mean that users can superimpose their own classified view of resources. Therefore, rather than attempting an overall classification of what is on the Internet, multiple classification systems may be constructed, reflecting the orientations of the different disciplines of users. A model is suggested for constructing such classified approaches, with reference to the way they may use existing retrieval systems, automatic indexing, and thesauri that have been constructed manually or automatically.

