Results 1  10
of
471
Authoritative Sources in a Hyperlinked Environment
 JOURNAL OF THE ACM
, 1999
"... The network structure of a hyperlinked environment can be a rich source of information about the content of the environment, provided we have effective means for understanding it. We develop a set of algorithmic tools for extracting information from the link structures of such environments, and repo ..."
Abstract

Cited by 3632 (12 self)
 Add to MetaCart
(Show Context)
The network structure of a hyperlinked environment can be a rich source of information about the content of the environment, provided we have effective means for understanding it. We develop a set of algorithmic tools for extracting information from the link structures of such environments, and report on experiments that demonstrate their effectiveness in a variety of contexts on the World Wide Web. The central issue we address within our framework is the distillation of broad search topics, through the discovery of “authoritative ” information sources on such topics. We propose and test an algorithmic formulation of the notion of authority, based on the relationship between a set of relevant authoritative pages and the set of “hub pages ” that join them together in the link structure. Our formulation has connections to the eigenvectors of certain matrices associated with the link graph; these connections in turn motivate additional heuristics for linkbased analysis.
Focused crawling: a new approach to topicspecific Web resource discovery
, 1999
"... The rapid growth of the WorldWide Web poses unprecedented scaling challenges for generalpurpose crawlers and search engines. In this paper we describe a new hypertext resource discovery system called a Focused Crawler. The goal of a focused crawler is to selectively seek out pages that are relevan ..."
Abstract

Cited by 637 (10 self)
 Add to MetaCart
(Show Context)
The rapid growth of the WorldWide Web poses unprecedented scaling challenges for generalpurpose crawlers and search engines. In this paper we describe a new hypertext resource discovery system called a Focused Crawler. The goal of a focused crawler is to selectively seek out pages that are relevant to a predefined set of topics. The topics are specified not using keywords, but using exemplary documents. Rather than collecting and indexing all accessible Web documents to be able to answer all possible adhoc queries, a focused crawler analyzes its crawl boundary to find the links that are likely to be most relevant for the crawl, and avoids irrelevant regions of the Web. This leads to significant savings in hardware and network resources, and helps keep the crawl more uptodate. To achieve such goaldirected crawling, we designed two hypertext mining programs that guide our crawler: a classifier that evaluates the relevance of a hypertext document with respect to the focus topics, ...
A Faster Algorithm for Betweenness Centrality
 Journal of Mathematical Sociology
, 2001
"... The betweenness centrality index is essential in the analysis of social networks, but costly to compute. Currently, the fastest known algorithms require #(n ) time and #(n ) space, where n is the number of actors in the network. ..."
Abstract

Cited by 554 (5 self)
 Add to MetaCart
(Show Context)
The betweenness centrality index is essential in the analysis of social networks, but costly to compute. Currently, the fastest known algorithms require #(n ) time and #(n ) space, where n is the number of actors in the network.
TopicSensitive PageRank
, 2002
"... In the original PageRank algorithm for improving the ranking of searchquery results, a single PageRank vector is computed, using the link structure of the Web, to capture the relative "importance" of Web pages, independent of any particular search query. To yield more accurate search resu ..."
Abstract

Cited by 543 (10 self)
 Add to MetaCart
(Show Context)
In the original PageRank algorithm for improving the ranking of searchquery results, a single PageRank vector is computed, using the link structure of the Web, to capture the relative "importance" of Web pages, independent of any particular search query. To yield more accurate search results, we propose computing a set of PageRank vectors, biased using a set of representative topics, to capture more accurately the notion of importance with respect to a particular topic. By using these (precomputed) biased PageRank vectors to generate queryspecific importance scores for pages at query time, we show that we can generate more accurate rankings than with a single, generic PageRank vector. For ordinary keyword search queries, we compute the topicsensitive PageRank scores for pages satisfying the query using the topic of the query keywords. For searches done in context (e.g., when the search query is performed by highlighting words in a Web page), we compute the topicsensitive PageRank scores using the topic of the context in which the query appeared.
Rank Aggregation Methods for the Web
, 2010
"... We consider the problem of combining ranking results from various sources. In the context of the Web, the main applications include building metasearch engines, combining ranking functions, selecting documents based on multiple criteria, and improving search precision through word associations. Wed ..."
Abstract

Cited by 478 (6 self)
 Add to MetaCart
(Show Context)
We consider the problem of combining ranking results from various sources. In the context of the Web, the main applications include building metasearch engines, combining ranking functions, selecting documents based on multiple criteria, and improving search precision through word associations. Wedevelop a set of techniques for the rank aggregation problem and compare their performance to that of wellknown methods. A primary goal of our work is to design rank aggregation techniques that can effectively combat "spam," a serious problem in Web searches. Experiments show that our methods are simple, efficient, and effective.
The Web as a graph: measurements, models, and methods
, 1999
"... . The pages and hyperlinks of the WorldWide Web may be viewed as nodes and edges in a directed graph. This graph is a fascinating object of study: it has several hundred million nodes today, over a billion links, and appears to grow exponentially with time. There are many reasons  mathematical, ..."
Abstract

Cited by 373 (11 self)
 Add to MetaCart
(Show Context)
. The pages and hyperlinks of the WorldWide Web may be viewed as nodes and edges in a directed graph. This graph is a fascinating object of study: it has several hundred million nodes today, over a billion links, and appears to grow exponentially with time. There are many reasons  mathematical, sociological, and commercial  for studying the evolution of this graph. In this paper we begin by describing two algorithms that operate on the Web graph, addressing problems from Web search and automatic community discovery. We then report a number of measurements and properties of this graph that manifested themselves as we ran these algorithms on the Web. Finally, we observe that traditional random graph models do not explain these observations, and we propose a new family of random graph models. These models point to a rich new subfield of the study of random graphs, and raise questions about the analysis of graph algorithms on the Web. 1 Overview Few events in the history of comput...
Trawling the Web for emerging cybercommunities
 Computer Networks
, 1999
"... Abstract: The web harbors a large number of communities groups of contentcreators sharing a common interest each of which manifests itself as a set of interlinked web pages. Newgroups and commercial web directories together contain of the order of 20000 such communities; our particular interest ..."
Abstract

Cited by 369 (8 self)
 Add to MetaCart
Abstract: The web harbors a large number of communities groups of contentcreators sharing a common interest each of which manifests itself as a set of interlinked web pages. Newgroups and commercial web directories together contain of the order of 20000 such communities; our particular interest here is on emerging communities those that have little or no representation in such fora. The subject of this paper is the systematic enumeration of over 100,000 such emerging communities from a web crawl: we call our process trawling. We motivate a graphtheoretic approach to locating such communities, and describe the algorithms, and the algorithmic engineering necessary to find structures that subscribe to this notion, the challenges in handling such a huge data set, and the results of our experiment.
Graph structure in the web
, 2003
"... The study of the web as a graph is not only fascinating in its own right, but also yields valuable insight into web algorithms for crawling, searching and community discovery, and the sociological phenomena which characterize its evolution. We report on experiments on local and global properties of ..."
Abstract

Cited by 304 (7 self)
 Add to MetaCart
The study of the web as a graph is not only fascinating in its own right, but also yields valuable insight into web algorithms for crawling, searching and community discovery, and the sociological phenomena which characterize its evolution. We report on experiments on local and global properties of the web graph using two Altavista crawls each with over 200 million pages and 1.5 billion links. Our study indicates that the macroscopic structure of the web is considerably more intricate than suggested by earlier experiments on a smaller scale.