Results 1 - 10 of 4,349
Authoritative Sources in a Hyperlinked Environment - JOURNAL OF THE ACM, 1999
"... The network structure of a hyperlinked environment can be a rich source of information about the content of the environment, provided we have effective means for understanding it. We develop a set of algorithmic tools for extracting information from the link structures of such environments, and repo ..."
Cited by 3632 (12 self)
an algorithmic formulation of the notion of authority, based on the relationship between a set of relevant authoritative pages and the set of “hub pages” that join them together in the link structure. Our formulation has connections to the eigenvectors of certain matrices associated with the link graph
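The hub/authority relationship described in this snippet can be illustrated with a short iterative computation. The following is a minimal sketch, not the paper's own code: the function names `hits` and `normalize`, the dictionary-based graph format, and the fixed iteration count are all assumptions made for illustration. Each round sets a page's authority score to the sum of the hub scores of pages linking to it, sets a page's hub score to the sum of the authority scores of the pages it links to, and then rescales both score vectors to unit length.

```python
from math import sqrt


def normalize(scores):
    """Scale a score dict to unit Euclidean length (no-op if all scores are zero)."""
    norm = sqrt(sum(v * v for v in scores.values())) or 1.0
    return {page: v / norm for page, v in scores.items()}


def hits(links, iterations=50):
    """Hypothetical helper: links maps each page to the list of pages it links to.

    Returns (hub, authority) score dicts after a fixed number of update rounds.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority update: sum the hub scores of all pages pointing at the page.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]
        # Hub update: sum the (new) authority scores of all pages pointed to.
        new_hub = {p: sum(new_auth[d] for d in links.get(p, [])) for p in pages}
        auth = normalize(new_auth)
        hub = normalize(new_hub)

    return hub, auth


# Toy graph (made up for illustration): h1 links to a and b, h2 links only to a.
toy_links = {"h1": ["a", "b"], "h2": ["a"], "a": [], "b": []}
toy_hub, toy_auth = hits(toy_links)
```

In the toy graph, page `a` ends up with a higher authority score than `b` because both hubs point to it, and `h1` ends up with a higher hub score than `h2` because it points to both authorities. Repeating these updates is a form of power iteration, which is one way to see the connection to eigenvectors mentioned in the snippet.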
The Encyclopedia of Integer Sequences
"... This article gives a brief introduction to the On-Line Encyclopedia of Integer Sequences (or OEIS). The OEIS is a database of nearly 90,000 sequences of integers, arranged lexicographically. The entry for a sequence lists the initial terms (50 to 100, if available), a description, formulae, programs ..."
Cited by 871 (14 self)
, programs to generate the sequence, references, links to relevant web pages, and other
Focused crawling: a new approach to topic-specific Web resource discovery, 1999
"... The rapid growth of the World-Wide Web poses unprecedented scaling challenges for general-purpose crawlers and search engines. In this paper we describe a new hypertext resource discovery system called a Focused Crawler. The goal of a focused crawler is to selectively seek out pages that are relevan ..."
Cited by 637 (10 self)
Wrapper Induction for Information Extraction, 1997
"... The Internet presents numerous sources of useful information---telephone directories, product catalogs, stock quotes, weather forecasts, etc. Recently, many systems have been built that automatically gather and manipulate such information on a user's behalf. However, these resources are usually ..."
Cited by 624 (30 self)
are usually formatted for use by people (e.g., the relevant content is embedded in HTML pages), so extracting their content is difficult. Wrappers are often used for this purpose. A wrapper is a procedure for extracting a particular resource's content. Unfortunately, hand-coding wrappers is tedious. We
Inferring Web Communities from Link Topology, 1998
"... The World Wide Web grows through a decentralized, almost anarchic process, and this has resulted in a large hyperlinked corpus without the kind of logical organization that can be built into more traditionally-created hypermedia. To extract meaningful structure under such circumstances, we develop a ..."
Cited by 415 (4 self)
of central, "authoritative" pages linked together by "hub pages"; and they exhibit a natural type of hierarchical topic generalization that can be inferred directly from the pattern of linkage. Our investigation shows that although the process by which users of the Web create pages
WebWatcher: A Tour Guide for the World Wide Web - PROCEEDINGS OF IJCAI97, 1997
"... We explore the notion of a tour guide software agent for assisting users browsing the World Wide Web. A Web tour guide agent provides assistance similar to that provided by a human tour guide in a museum -- it guides the user along an appropriate path through the collection, based on its knowledge of ..."
Cited by 359 (8 self)
of the user's interests, of the location and relevance of various items in the collection, and of the way in which others have interacted with the collection in the past. This paper describes a simple but operational tour guide, called WebWatcher, which has given over 5000 tours to people browsing CMU
Learning to Probabilistically Identify Authoritative Documents - In Proceedings of the 17th International Conference on Machine Learning, 2000
"... We describe a model of document citation that learns to identify hubs and authorities in a set of linked documents, such as pages retrieved from the world wide web, or papers retrieved from a research paper archive. Unlike the popular HITS algorithm, which relies on dubious statistical assumpt ..."
Cited by 156 (2 self)
Personalizing search via automated analysis of interests and activities, 2005
"... We formulate and study search algorithms that consider a user’s prior interactions with a wide variety of content to personalize that user’s current Web search. Rather than relying on the unrealistic assumption that people will precisely specify their intent when searching, we pursue techniques that ..."
Cited by 303 (29 self)
that leverage implicit information about the user’s interests. This information is used to re-rank Web search results within a relevance feedback framework. We explore rich models of user interests, built from both search-related information, such as previously issued queries and previously visited Web pages
REAL LIFE, REAL USERS, AND REAL NEEDS: A STUDY AND ANALYSIS OF USER QUERIES ON THE WEB, 2000
"... We analyzed transaction logs containing 51,473 queries posed by 18,113 users of Excite, a major Internet search service. We provide data on: (i) sessions: changes in queries during a session, number of pages viewed, and use of relevance feedback; (ii) queries: the number of search terms, and the u ..."
Cited by 286 (25 self)
Silk from a Sow's Ear: Extracting Usable Structures from the Web, 1996
"... In its current implementation, the World-Wide Web lacks much of the explicit structure and strong typing found in many closed hypertext systems. While this property has directly fueled the explosive acceptance of the Web, it further complicates the already difficult problem of identifying usable str ..."
Cited by 270 (9 self)
of the Xerox's WWW space. Linear equations and spreading activation models are employed to arrange Web pages based upon functional categories, node types, and relevancy. Keywords: Information Visualization, World Wide Web, Hypertext.