## Authoritative Sources in a Hyperlinked Environment (1999)

Venue: Journal of the ACM

Citations: 3631 (12 self)

### Citations

4669 | The anatomy of a large-scale hypertextual Web search engine
- Brin, Page
- 1998
Citation Context ...tionless” version of the www link structure. Both of these approaches are based principally on counting node degrees, parallel to the structure of Garfield’s impact factor. In contrast, Brin and Page [8] have recently proposed a ranking measure based on a node-to-node weight-propagation scheme and its analysis via eigenvectors. Specifically, they begin from a model of a user randomly following hyperl...

3774 | Indexing by Latent Semantic Analysis
- Deerwester, Dumais, et al.
- 1990
Citation Context ...s been applied to citation data by Noma [43], for jointly clustering citing and cited documents. In the context of information retrieval, the Latent Semantic Indexing methodology of Deerwester et al. [16] applied a centroid scaling approach to a vector-space model of documents [48, 49]; this allowed them to represent terms and documents in a common low-dimensional space, in which natural geometrically ...

3408 | Principal Component Analysis
- Jolliffe
- 1986
Citation Context ...y have a positive co-citation value. Pitkow and Pirolli [47] apply this algorithm to study the link-based relationships among a collection of www pages. One can also use principal components analysis [31, 34] and related dimension-reduction techniques such as multidimensional scaling to cluster a collection of nodes. In this framework, one begins with a matrix M containing the similarity information betwe...

1537 | Spectral Graph Theory
- Chung
- 1996
Citation Context ...es to clustering that have been applied to link structures. The area of spectral graph partitioning was initiated by the work of Donath and Hoffman [18] and Fiedler [23]; see the recent book by Chung [12] for an overview. Spectral graph partitioning methods relate sparsely connected partitions of an undirected graph G to the eigenvalues and eigenvectors of its adjacency matrix A. Each eigenvector o...

958 | Analysis of a complex of statistical variables into principal components
- Hotelling
- 1933
Citation Context ...y have a positive co-citation value. Pitkow and Pirolli [47] apply this algorithm to study the link-based relationships among a collection of www pages. One can also use principal components analysis [31, 34] and related dimension-reduction techniques such as multidimensional scaling to cluster a collection of nodes. In this framework, one begins with a matrix M containing the similarity information betwe...

799 | Automatic Text Processing
- Salton
- 1988
Citation Context ...nd cited documents. In the context of information retrieval, the Latent Semantic Indexing methodology of Deerwester et al. [16] applied a centroid scaling approach to a vector-space model of documents [48, 49]; this allowed them to represent terms and documents in a common low-dimensional space, in which natural geometrically defined clusters often separate multiple senses of a query term. ...

777 | Scatter/gather: A cluster-based approach to browsing large document collections.
- Cutting, Karger, et al.
- 1992
Citation Context ...a collection of large clusters that we never explicitly represent. At a very high level, our motivation in this sense is analogous to that of an information retrieval technique such as Scatter/Gather [14], which seeks to represent very large document clusters through text-based methods. In section 3, we related the hubs and authorities we computed to the principal eigenvectors of the matrices $A^T A$ and...
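
The relationship mentioned in this context, between hub and authority weights and principal eigenvectors, can be illustrated with a short power-iteration sketch. This is a minimal illustration rather than the paper's own code: the function name `hits`, the edge-list input, the helper `_normalize`, and the fixed iteration count are all assumptions made for the example.

```python
import math

def hits(edges, n, iterations=50):
    """Iteratively update hub and authority weights on a directed graph.

    edges: list of (source, target) pairs; n: number of nodes (0..n-1).
    The input format and fixed iteration count are illustrative assumptions.
    """
    hubs = [1.0] * n
    auths = [1.0] * n
    for _ in range(iterations):
        # Authority update: sum the hub weights of pages pointing to each page.
        new_auths = [0.0] * n
        for src, dst in edges:
            new_auths[dst] += hubs[src]
        # Hub update: sum the authority weights of pages each page points to.
        new_hubs = [0.0] * n
        for src, dst in edges:
            new_hubs[src] += new_auths[dst]
        auths = _normalize(new_auths)
        hubs = _normalize(new_hubs)
    return hubs, auths

def _normalize(vec):
    # Scale to unit Euclidean length; leave an all-zero vector unchanged.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec

# Toy graph: pages 0 and 1 link to page 2; page 0 also links to page 3.
edges = [(0, 2), (0, 3), (1, 2)]
hubs, auths = hits(edges, n=4)
```

On this toy graph, page 2 ends up with the larger authority weight and page 0 with the larger hub weight, since page 0 points to both linked-to pages.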

471 | Improved algorithms for topic distillation in a hyperlinked environment.
- Bharat, Henzinger
- 1998
Citation Context ... of this phenomenon is the subject of on-going work; in Sections 8 and 9, we briefly discuss current work on the use of textual content for the purpose of focusing our approach to link-based analysis [6, 10, 11]. The use of non-principal eigenvectors, combined with basic term-matching, can be a simple way to extract collections of authoritative pages that are more relevant to a specific query topic. For exam...

438 | Co-citation in the scientific literature: a new measure of the relationship between two documents.
- Small
- 1973
Citation Context ...ers from this similarity function. Two basic similarity functions on documents to emerge from the study of bibliometrics are bibliographic coupling (due to Kessler [36]) and co-citation (due to Small [52]). For a pair of documents p and q, the former quantity is equal to the number of documents cited by both p and q, and the latter quantity is the number of documents that cite both p and q. Co-citation...
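
The two similarity functions defined in this context can be written down directly. The sketch below is only an illustration: the dictionary-of-sets representation, the function names, and the toy data are assumptions, not anything taken from the cited papers.

```python
def bibliographic_coupling(cites, p, q):
    """Number of documents cited by both p and q."""
    return len(cites.get(p, set()) & cites.get(q, set()))

def co_citation(cites, p, q):
    """Number of documents that cite both p and q."""
    return sum(1 for refs in cites.values() if p in refs and q in refs)

# Hypothetical toy data: each key cites the documents in its set.
cites = {
    "a": {"x", "y"},
    "b": {"x", "y", "z"},
    "c": {"y"},
}
coupling_ab = bibliographic_coupling(cites, "a", "b")  # shared references of a and b
cocitation_xy = co_citation(cites, "x", "y")           # documents citing both x and y
```

Here `a` and `b` share the references `x` and `y`, while `x` and `y` are cited together by `a` and `b` but not by `c`.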

430 | A new status index derived from sociometric analysis
- Katz
- 1953
Citation Context ... weights, corresponding to the strength of different endorsements; let A denote the matrix whose (i, j)th entry represents the strength of the endorsement from a node i ∈ V to a node j ∈ V. Katz [35] proposed a measure of standing based on path-counting, a generalization of ranking based on in-degree. For nodes i and j, let $P_{ij}^{\langle r \rangle}$ denote the number of paths of length exactly r from i to j. Let b...

415 | Inferring Web Communities from Link Topology
- Gibson, Kleinberg, et al.
- 1998
Citation Context ...linkage information contained in the AltaVista index. With Gibson and Raghavan, we have used the algorithms described here to explore the structure of “communities” of hubs and authorities on the www [28]. We find that the notion of topic generalization discussed in Section 7 provides one valuable perspective from which to view the overlapping organization of such communities. In a separate direction,...

334 | The World-Wide Web.
- Berners-Lee, Cailliau, et al.
- 1994

322 | Latent semantic indexing: A probabilistic analysis
- Papadimitriou, Raghavan, et al.
- 1998
Citation Context ...erstanding of a range of link-based algorithms. Some work of this type has been undertaken in the context of the latent semantic indexing technique in information retrieval [16]: Papadimitriou et al. [44] have provided a theoretical analysis of latent semantic indexing applied to a basic probabilistic model of term use in documents. In another direction, motivated in part by our work here, Frieze, Kan...

316 | Automatic resource compilation by analyzing hyperlink structure and associated text
- Chakrabarti, Dom, et al.
- 1998
Citation Context ...of the page being pointed to when assessing the relevance of that page. The use of anchor text appeared in one of the oldest www search engines, McBryan’s World Wide Web Worm [40]; it is also used in [8, 11, 10]. Another direction of work on the integration of links into www search is the construction of search formalisms capable of handling queries that involve predicates over both text and links. Arocena, ...

270 | Silk from a sow's ear: extracting usable structures from the web
- Pirolli, Pitkow, et al.
- 1996
Citation Context ...e rely on combinations of textual and link-based information. Combinations of such measures have been studied by Shaw [50, 51] in the context of bibliometrics. More recently, Pirolli, Pitkow, and Rao [46] have used a combination of link topology and textual similarity to group together and categorize pages on the www. Finally, we discuss two other general eigenvector-based approaches to clustering tha...

267 | Bibliographic coupling between scientific papers.
- Kessler
- 1963
Citation Context ...ts, and a method for producing clusters from this similarity function. Two basic similarity functions on documents to emerge from the study of bibliometrics are bibliographic coupling (due to Kessler [36]) and co-citation (due to Small [52]). For a pair of documents p and q, the former quantity is equal to the number of documents cited by both p and q, and the latter quantity is the number of documents...

261 | Citation analysis as a tool in journal evaluation.
- Garfield
- 1972
Citation Context ...y are concerned with evaluating standing in a particular type of social network — that of papers or journals linked by citations. The most well-known measure in this field is Garfield’s impact factor [26], used to provide a numerical assessment of journals in Journal Citation Reports of the Institute for Scientific Information. Under the standard definition, the impact factor of a journal j in a given...

237 | Fast Monte-Carlo algorithms for finding low rank approximations
- Frieze, Kannan, et al.
- 2004
Citation Context ...tion, motivated in part by our work here, Frieze, Kannan, and Vempala have analyzed sampling methodologies capable of approximating the singular value decomposition of a large matrix very efficiently [24]; understanding the concrete connections between their work and our sampling methodology in Section 2 would be very interesting. Finally, the further development of link-based methods to handle inform...

213 | Structural analysis of hypertexts: identifying hierarchies and useful metrics.
- Botafogo, Rivlin, et al.
- 1992
Citation Context ...Hypertext and WWW Rankings. There have been several approaches to ranking pages in the context of hypertext and the www. In work predating the emergence of the www, Botafogo, Rivlin, and Shneiderman [7] worked with focused, stand-alone hypertext environments. They defined the notions of index nodes and reference nodes — an index node is one whose out-degree is significantly larger than the average o...

181 | Lower bounds for the partitioning of graphs.
- Donath, Hoffman
- 1973
Citation Context ...discuss two other general eigenvector-based approaches to clustering that have been applied to link structures. The area of spectral graph partitioning was initiated by the work of Donath and Hoffman [18] and Fiedler [23]; see the recent book by Chung [12] for an overview. Spectral graph partitioning methods relate sparsely connected partitions of an undirected graph G to the eigenvalues and eigenv...

180 | Clustering categorical data: an approach based on dynamical systems
- Gibson, Kleinberg, et al.
- 2000
Citation Context ... with Gibson and Raghavan, we have investigated extensions of the present work to the analysis of relational data, and considered a natural, non-linear analogue of spectral heuristics in this setting [29]. There are a number of interesting further directions suggested by this research, in addition to the currently on-going work mentioned above. We will restrict ourselves here to three such directions. Fir...

170 | Strong regularities in World Wide Web surfing.
- Huberman, Pirolli, et al.
- 1998

165 | Citation influence for journal aggregates of scientific publications: Theory, with application to the literature of physics.
- Pinski, Narin
- 1976
Citation Context ... period of measurement (see e.g. Egghe [21]), we observe that the impact factor is a ranking measure based fundamentally on a pure counting of the in-degrees of nodes in the network. Pinski and Narin [45] proposed a more subtle citation-based measure of standing, stemming from the observation that not all citations are equally important. They argued that a journal is “influential” if, recursively, it ...

137 | GENVL and WWWW: Tools for Taming the Web
- McBryan
Citation Context ...yperlink as a descriptor of the page being pointed to when assessing the relevance of that page. The use of anchor text appeared in one of the oldest www search engines, McBryan’s World Wide Web Worm [40]; it is also used in [8, 11, 10]. Another direction of work on the integration of links into www search is the construction of search formalisms capable of handling queries that involve predicates ove...

125 | The Connectivity Server: fast access to linkage information on the Web
- Bharat, Broder, et al.
- 1998

116 | Bibliometrics of the World Wide Web: an exploratory analysis of the intellectual structure of cyberspace
- Larson
- 1996
Citation Context ...e number of documents cited by both p and q, and the latter quantity is the number of documents that cite both p and q. Co-citation has been used as a measure of the similarity of www pages by Larson [37] and by Pitkow and Pirolli [47]. Weiss et al. [56] define link-based similarity measures for pages in a hypertext environment that generalize co-citation and bibliographic coupling to allow for arbi...

112 | The quest for correct information on the web: hyper search engines.
- Marchiori
- 1997
Citation Context ...al heuristics. Specifically, in his framework, the relevance of a page in hypertext to a particular query is based in part on the relevance of the pages it links to. Marchiori’s HyperSearch algorithm [39] is based on such a methodology applied to www pages: A relevance score for a page p is computed by a method that incorporates the relevance of pages reachable from p, diminished by a damping factor t...

110 | HyPursuit: A Hierarchical Network Search Engine that Exploits Content-Link Hypertext Clustering.
- Weiss, Velez, et al.
- 1996
Citation Context ...he latter quantity is the number of documents that cite both p and q. Co-citation has been used as a measure of the similarity of www pages by Larson [37] and by Pitkow and Pirolli [47]. Weiss et al. [56] define link-based similarity measures for pages in a hypertext environment that generalize co-citation and bibliographic coupling to allow for arbitrarily long chains of links. Several methods have...

108 | How to personalize the web.
- Barrett, Maglio, et al.
- 1997
Citation Context ...al questions that can be asked about www traffic, involving both the modeling of such traffic and the development of algorithms and tools to exploit information gained from traffic patterns (see e.g. [2, 3, 33]). It would be interesting to ask how the approach developed here might be integrated into a study of user traffic patterns on the www. Second, the power of eigenvector-based heuristics is not somethi...

108 | Webquery: Searching and visualizing the web through connectivity
- Carriere, Kazman
- 1997
Citation Context ...in-degree is significantly larger than the average in-degree. They also proposed measures of centrality based on node-to-node distances in the graph defined by the link structure. Carrière and Kazman [9] proposed a ranking measure on www pages, for the goal of re-ordering search results. The rank of a page in their model is equal to the sum of its in-degree and its out-degree; thus, it makes use of a...
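
The degree-based rank described in this context (in-degree plus out-degree) is simple enough to sketch in a few lines. The function name `degree_rank` and the edge-list input below are assumptions made for illustration and do not reflect the cited system's implementation.

```python
from collections import Counter

def degree_rank(edges):
    """Rank each page by the sum of its in-degree and out-degree."""
    rank = Counter()
    for src, dst in edges:
        rank[src] += 1  # out-degree contribution
        rank[dst] += 1  # in-degree contribution
    return rank

# Hypothetical link structure among four pages.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "c")]
ranking = degree_rank(edges).most_common()  # pages sorted by descending rank
```

Because the measure adds both degrees, it does not distinguish a page that is pointed to often from one that simply contains many links.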

105 | ParaSite: Mining Structural Information on the Web.
- Spertus
- 1997

101 | Searching for Information in a Hypertext Medical Handbook.
- Frisse
- 1987
Citation Context ...nvokes a text-based search and then computes numerical scores for the pages in a relatively small subgraph constructed from the initial search results. Other Link-Based Approaches to WWW Search. Frisse [25] considered the problem of document retrieval in singly-authored, stand-alone works of hypertext. He proposed basic heuristics by which hyperlinks can enhance notions of relevance and hence the perfor...

86 | The structure of scientific literatures, I: Identifying and graphing specialties.
- Small, Griffith
- 1974
Citation Context ...o allow for arbitrarily long chains of links. Several methods have been proposed in this context to produce clusters from a set of nodes annotated with such similarity information. Small and Griffith [54] use breadth-first search to compute the connected components of the undirected graph in which two nodes are joined by an edge if and only if they have a positive co-citation value. Pitkow and Pirolli...
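
The clustering step described in this context, breadth-first search over a graph whose edges join documents with positive co-citation, can be sketched as follows. The data layout, the function name `cocitation_components`, and the toy citation data are assumptions for illustration; only cited documents appear as nodes in this sketch.

```python
from collections import deque
from itertools import combinations

def cocitation_components(cites):
    """Connected components of the graph joining co-cited documents."""
    # Join two documents whenever at least one document cites both.
    neighbors = {}
    for refs in cites.values():
        for doc in refs:
            neighbors.setdefault(doc, set())
        for p, q in combinations(refs, 2):
            neighbors[p].add(q)
            neighbors[q].add(p)
    seen, components = set(), []
    for start in sorted(neighbors):
        if start in seen:
            continue
        # Breadth-first search from an unvisited document.
        seen.add(start)
        queue, component = deque([start]), set()
        while queue:
            doc = queue.popleft()
            component.add(doc)
            for other in neighbors[doc] - seen:
                seen.add(other)
                queue.append(other)
        components.append(component)
    return components

# Hypothetical data: each key cites the documents in its set.
cites = {"a": {"x", "y"}, "b": {"y", "z"}, "c": {"u", "v"}, "d": {"w"}}
components = cocitation_components(cites)
```

In the toy data, `x` and `z` are never cited together, yet they land in the same cluster because each is co-cited with `y`.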

73 | Introduction to Informetrics
- Egghe, Rousseau
- 1990
Citation Context ...$A^T)^{-1} e$. Before discussing the relation of these measures to our work, we consider the way in which they were extended by research in the field of bibliometrics. Scientific Citations. Bibliometrics [22] is the study of written documents and their citation structure. Research in bibliometrics has long been concerned with the use of citations to produce quantitative estimates of the importance and “im...

71 | Applications of a Web query language.
- Arocena, Mendelzon, et al.
- 1997
Citation Context ...work on the integration of links into www search is the construction of search formalisms capable of handling queries that involve predicates over both text and links. Arocena, Mendelzon, and Mihaila [1] have developed a framework supporting www queries that combines standard keywords with conditions on the surrounding link structure. Clustering of Link Structures. Link-based clustering in the context...

63 | An input-output approach to clique identification.
- Hubbell
- 1965
Citation Context ... obtain a direct matrix formulation of this measure: $s_j$ is proportional to the $j$th column sum of the matrix $(I - bA)^{-1} - I$, where $I$ denotes the identity matrix and all entries of $A$ are 0 or 1. Hubbell [32] proposed a similar model of standing by studying the equilibrium of a certain weight-propagation scheme on nodes of the network. Recall that $A_{ij}$, the $(i, j)$th entry of our matrix $A$, represents the st...
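
The matrix formulation quoted in this context can be checked numerically on a small graph. Assuming the standard expansion $(I - bA)^{-1} - I = \sum_{r \ge 1} b^r A^r$, which holds when the series converges, the $j$th column sum weights each path of length $r$ ending at $j$ by $b^r$. The sketch below accumulates that series directly instead of inverting a matrix; the function name `katz_standing`, the truncation length `max_len`, and the toy graph are assumptions for illustration.

```python
def katz_standing(adj, b, max_len=50):
    """Column sums of b*A + b^2*A^2 + ..., truncated at max_len terms.

    adj[i][j] is 1 if i endorses j, else 0. The truncation is exact for
    acyclic graphs; otherwise it approximates the series when it converges.
    """
    n = len(adj)
    term = [1.0] * n       # row vector of ones, i.e. 1^T (bA)^0
    standing = [0.0] * n
    for _ in range(max_len):
        # Multiply the current row vector by b*A.
        term = [b * sum(term[i] * adj[i][j] for i in range(n)) for j in range(n)]
        standing = [s + t for s, t in zip(standing, term)]
    return standing

# Toy graph with edges 0->1, 0->2, 1->2.
adj = [[0, 1, 1],
       [0, 0, 1],
       [0, 0, 0]]
standing = katz_standing(adj, b=0.5)
```

For node 2 there are two paths of length 1 and one path of length 2, so its score is 0.5 · 2 + 0.25 · 1 = 1.25, while node 0, which nobody endorses, scores 0.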

37 | Experiments in topic distillation
- Chakrabarti, Dom, et al.
- 1998

30 | Cocited author mapping as a valid representation of intellectual structure.
- McCain
- 1986
Citation Context ...lgebra (e.g. [30]) in fact provide a precise sense in which projection onto the first k eigenvectors produces the minimum distortion over all k-dimensional projections of the data. Small [53], McCain [41], and others have applied this technique to journal and author co-citation data. The application of dimension-reduction techniques to cluster www pages based on co-citation has been employed by Larson...

26 | lawfulness on the electronic frontier
- Pitkow, Pirolli
Citation Context ...both p and q, and the latter quantity is the number of documents that cite both p and q. Co-citation has been used as a measure of the similarity of www pages by Larson [37] and by Pitkow and Pirolli [47]. Weiss et al. [56] define link-based similarity measures for pages in a hypertext environment that generalize co-citation and bibliographic coupling to allow for arbitrarily long chains of links. S...

17 | The analysis of square matrices of scientometric transaction
- Price
- 1981
Citation Context ...iestimates for the next iteration. Finally, there has been work aimed at the troublesome issue of how to handle journal self-citations (the diagonal elements of the matrix A); see e.g. de Solla Price [15] and Noma [42]. Let us consider the connections between this previous work and our algorithm to compute hubs and authorities. We also begin by observing that pure in-degree counting, as manifested by ...

14 | An improved method for analyzing square scientometric transaction matrices
- Noma
- 1982
Citation Context ... the next iteration. Finally, there has been work aimed at the troublesome issue of how to handle journal self-citations (the diagonal elements of the matrix A); see e.g. de Solla Price [15] and Noma [42]. Let us consider the connections between this previous work and our algorithm to compute hubs and authorities. We also begin by observing that pure in-degree counting, as manifested by the impact fac...

14 | The synthesis of specialty narratives from cocitation clusters, Journal of the American Society for Information Science 37(3
- Small
- 1986
Citation Context ...s of linear algebra (e.g. [30]) in fact provide a precise sense in which projection onto the first k eigenvectors produces the minimum distortion over all k-dimensional projections of the data. Small [53], McCain [41], and others have applied this technique to journal and author co-citation data. The application of dimension-reduction techniques to cluster www pages based on co-citation has been emplo...

14 | Web search using automated classification.
- Chekuri, Goldwasser, et al.
- 1997

11 | Mathematical relations between impact factors and average number of citations
- Egghe
- 1988
Citation Context ... of citations received by papers published in the previous two years of journal j [22]. Disregarding for now the question of whether two years is the appropriate period of measurement (see e.g. Egghe [21]), we observe that the impact factor is a ranking measure based fundamentally on a pure counting of the in-degrees of nodes in the network. Pinski and Narin [45] proposed a more subtle citation-based ...

10 | Algebraic Connectivity of Graphs
- Fiedler
- 1973
Citation Context ... general eigenvector-based approaches to clustering that have been applied to link structures. The area of spectral graph partitioning was initiated by the work of Donath and Hoffman [18] and Fiedler [23]; see the recent book by Chung [12] for an overview. Spectral graph partitioning methods relate sparsely connected partitions of an undirected graph G to the eigenvalues and eigenvectors of its adj...

9 | Joint-space analysis of "pick-any" data: Analysis of choices from an unconstrained set of alternatives
- Levine
- 1979
Citation Context ...esponding to the large negative coordinates of the same eigenvector. In a different direction, centroid scaling is a clustering method designed for representing two types of objects in a common space [38]. Consider, for example, a set of people who have provided answers to the questions of a survey — one may wish to represent both the people and the possible answers in a common space, in a way so that...

7 | Measuring the relative standing of disciplinary journals
- Doreian
- 1988
Citation Context ...distribution of the following random process: beginning with an arbitrary journal j, one chooses a random reference that has appeared in j and moves to the journal specified in the reference. Doreian [19, 20] showed that one can obtain a measure of standing that corresponds very closely to influence weights by repeatedly iterating the computation underlying Hubbell’s measure of standing: In the first iter...

7 | The gossip problem
- Berman
- 1973

6 | The Connectivity Server: Fast Access to
- Bharat, Broder, et al.
- 1998
Citation Context ...eveloped here; see Bharat and Henzinger [6] and Chakrabarti et al. [10, 11]. The implementation of the Bharat-Henzinger system made use of the recently developed Connectivity Server (Bharat et al. [5]), which provides very efficient retrieval for linkage information contained in the AltaVista index. With Gibson and Raghavan, we have used the algorithms described here to explore the structure of “c...

5 | Co-citation Analysis and the Invisible College
- Noma
- 1984
Citation Context ...rdinates in the representations they produce; rather, the goal is to infer a notion of similarity among a set of objects by geometric means. Centroid scaling has been applied to citation data by Noma [43], for jointly clustering citing and cited documents. In the context of information retrieval, the Latent Semantic Indexing methodology of Deerwester et al. [16] applied a centroid scaling approach to ...

5 | Subject and Citation Indexing. Part I: The clustering structure of composite representations in the cystic fibrosis document collection
- Shaw
Citation Context ...ow and Pirolli [47]. The clustering of documents or hyperlinked pages can of course rely on combinations of textual and link-based information. Combinations of such measures have been studied by Shaw [50, 51] in the context of bibliometrics. More recently, Pirolli, Pitkow, and Rao [46] have used a combination of link topology and textual similarity to group together and categorize pages on the www. Finall...

5 | Subject and Citation Indexing. Part II: The optimal, cluster-based retrieval performance of composite representations
- Shaw
Citation Context ...ow and Pirolli [47]. The clustering of documents or hyperlinked pages can of course rely on combinations of textual and link-based information. Combinations of such measures have been studied by Shaw [50, 51] in the context of bibliometrics. More recently, Pirolli, Pitkow, and Rao [46] have used a combination of link topology and textual similarity to group together and categorize pages on the www. Finall...

4 | Flow-interception problems. In Facility location: a survey of applications and methods
- Berman, Hodgson, et al.
- 1995
Citation Context ...al questions that can be asked about www traffic, involving both the modeling of such traffic and the development of algorithms and tools to exploit information gained from traffic patterns (see e.g. [2, 3, 33]). It would be interesting to ask how the approach developed here might be integrated into a study of user traffic patterns on the www. Second, the power of eigenvector-based heuristics is not somethi...

4 | A measure of standing for citation networks within a wider environment
- Doreian
- 1994
Citation Context ...distribution of the following random process: beginning with an arbitrary journal j, one chooses a random reference that has appeared in j and moves to the journal specified in the reference. Doreian [19, 20] showed that one can obtain a measure of standing that corresponds very closely to influence weights by repeatedly iterating the computation underlying Hubbell’s measure of standing: In the first iter...

3 | On the citation influence methodology of Pinski and
- Geller
- 1978
Citation Context ...atural parallel between this and our self-referential construction of hubs and authorities; we will discuss the connections below. The concrete construction of Pinski and Narin, as modified by Geller [27], is the following. The measure of standing of journal $j$ will be called its influence weight and denoted $w_j$. The matrix $A$ of connection strengths will have entries specified as follows: $A_{ij}$ denotes...

1 | Web search using automated classification (poster)
- Chekuri, Goldwasser, et al.
- 1997
Citation Context ...rom one another in the graph $G_\sigma$ for a variety of reasons. For example, (1) The query string $\sigma$ may have several very different meanings. E.g. "jaguar" (a useful example we learned from Chandra Chekuri [13]). (2) The string may arise as a term in the context of multiple technical communities. E.g. "randomized algorithms". (3) The string may refer to a highly polarized issue, involving groups that are...

1 | Sources in a Hyperlinked Environment
- Carrière, Kazman
- 1997