Chakrabarti, S., Dom, B. E., Kumar, S. R., Raghavan, P., Rajagopalan, S., Tomkins, A., Gibson, D., and Kleinberg, J. Mining the web's link structure. Computer, 32(8):60--67, 1999.
....is simple: using the HTTP [16] protocol, web clients running on users' machines request objects from web servers. Previous studies have examined many aspects of the web, including web workloads [2, 8, 15, 29], characterizing web objects [3, 11], and even modeling the hyperlink structure of the web [6, 21]. These studies suggest that most web objects are small (5-10KB), but the distribution of object sizes is heavy-tailed and very large objects exist. Web objects are accessed with a Zipf popularity distribution, as are web servers. The number of web objects is enormous (in the billions) and rapidly ....
S. Chakrabarti, B.E. Dom, S. R. Kumar, P. Raghavan, S. Rajagopalan, A. Tomkins, D. Gibson, and J. Kleinberg. Mining the Web's link structure. Computer, 32(8), 1999.
....if it provides links to good authorities. In recent research work on link analysis of hyperlinked documents, HITS is applied to the research area of topic distillation and several kinds of link weights are involved to indicate the significance of links in hyperlinked documents. In the Clever system [10], weights tuned empirically are added to distinguish same-site links and others. In [3] the metrics of similarity of whole contents in linked documents are applied on link weights and the use of text surrounding the links as keyword-based evidence to determine a weight for each link is proposed ....
....according to their hub or authority values when a general link analysis algorithm is applied. One may assign weights on each link of a page to reduce the effects of noise links. In recent research on topic distillation and link analysis, several weighting mechanisms have been proposed in [3][7][10]. However, due to the lack of query terms, these weighting mechanisms do not work well when we want to mine the information structure of a Web site. Therefore, a new weighting mechanism, which considers the significance of links and the contents of information they carry in a Web site, is needed. ....
S. Chakrabarti, B. Dom, S. Kumar, P. Raghavan, S. Rajagopalan, A. Tomkins, D. Gibson, and J. M. Kleinberg. Mining the Web's link structure. IEEE Computer, 32(8), pages 60-67, August 1999.
....3. DISCUSSION OF RELATED WORK As mentioned in the beginning, there is a large amount of recent academic and industrial work on web search technology. Most closely related to our work is the also quite extensive literature on the analysis and utilization of the hyperlink structure of the web; see [3, 15, 26] for an overview. In general, link based ranking is closely related to other applications of link analysis, such as clustering, categorization, or data mining, although a discussion of these issues is beyond the scope of this paper. Various heuristics for link based ranking, such as counting ....
S. Chakrabarti, B. Dom, R. Kumar, P. Raghavan, S. Rajagopalan, A. Tomkins, David Gibson, and J. Kleinberg. Mining the web's link structure. IEEE Computer, 32(8):60--67, 1999.
....characterization project. I believe that links vary widely in nature, even on a given page, and are not necessarily a good basis for relevance assessments. They are a most convenient structure to use, however. There is a large body of literature on the topic of focused crawling, with Chakrabarti [27, 28, 29, 30, 31, 33, 34, 35, 36, 37, 38] probably being the most prolific. It is probably beneficial to start a focused crawl with a hub, in Kleinberg's terminology [69]. The ARC system [30] augments Kleinberg's HITS technology [53] by adding the textual content of the link anchors to the mix, as did Brin and Page [19]. Others [45, ....
S. Chakrabarti, B. Dom, D. Gibson, S. R. Kumar, A. Tomkins, J. Kleinberg, P. Raghavan, and S. Rajagopalan. Mining the Web's link structure. IEEE Computer, 32(8):60--67, 1999.
....growth and diversification of the Web. Probably the most widely recognized example of this kind is the PageRank index employed by the Google search engine [6]. PageRank is but one of many models and algorithms to rank Web resources according to their position in a link structure (see, e.g., [25, 20, 9, 1, 5, 8]). Our goal is to supplement rankings with a meaningful visualization of the graph they are computed on. While graph visualization is an active area of research as well [10, 19], its integration with quantitative analysis is only beginning to receive attention. It is, however, rather difficult to ....
S. Chakrabarti, B. E. Dom, R. Kumar, P. Raghavan, S. Rajagopalan, A. S. Tomkins, D. Gibson, and J. M. Kleinberg. Mining the Web's link structure. IEEE Computer, 32(8):60-67, 1999.
.... that occur in a minimum number of observed objects [29]. Discovery of common substructures in graphs, such as in the above-mentioned Subdue system, is another example of this kind of discovery task [10]. Web mining is a KDD/DM-related issue that has attracted much attention in recent years [5] [6]. Although the World Wide Web is not a database in the classical sense, it is itself an enormously large resource of data. It poses a considerable challenge to KDD/DM system developers, since data on the Web are highly unstructured and stored in a number of different formats. However, some minimum structure ....
Chakrabarti et al., Mining the Web's Link Structure. IEEE Computer, August 1999, pp. 60-67.
....families of graphs [11, 13, 15]. However, these techniques do not seem to be applicable to web graphs, and we thus rely on standard coding and text compression techniques in our construction. For recent surveys of information retrieval on the Web with emphasis on link-based methods, we refer to [5, 8]. Examples of ranking techniques based on link structure are the Pagerank algorithm of Brin and Page [4] and the HITS algorithm of Kleinberg [16]. Recent large-scale studies of the graph structure of the web are reported in [6, 18, 17]. The Connectivity Server of Bharat et al. [2] used in the ....
S. Chakrabarti, B. Dom, R. Kumar, P. Raghavan, S. Rajagopalan, A. Tomkins, David Gibson, and J. Kleinberg. Mining the web's link structure. IEEE Computer, 32(8):60--67, 1999.
....model. The algorithm assumes no knowledge of the model, and is well defined regardless of the model's accuracy. 1. Introduction Kleinberg's seminal paper [20] on hubs and authorities introduced a natural paradigm for classifying and ranking web pages, setting off an avalanche of subsequent work [7, 8, 10, 15, 22, 9, 3, 12, 19, 2, 5, 27, 1, 13]. Kleinberg's ideas were implemented in HITS as part of the CLEVER project [7, 10]. Around the same time, Brin and Page [6, 25, 18] developed a highly successful search engine, Google [17], which orders search results according to ....
.... Kleinberg's seminal paper [20] on hubs and authorities introduced a natural paradigm for classifying and ranking web pages, setting off an avalanche of subsequent work [7, 8, 10, 15, 22, 9, 3, 12, 19, 2, 5, 27, 1, 13]. Kleinberg's ideas were implemented in HITS as part of the CLEVER project [7, 10]. Around the same time, Brin and Page [6, 25, 18] developed a highly successful search engine, Google [17], which orders search results according to ....
Soumen Chakrabarti, Byron E. Dom, S. Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, Andrew Tomkins, David Gibson, and Jon Kleinberg. Mining the Web's link structure. Computer, 32(8):60--67, 1999.
....execution of user queries. Many academic and industrial researchers have looked at web search technology over the last few years, including crawling strategies, storage, indexing, ranking techniques, and a significant amount of work on the structural analysis of the web and web graph; see [1, 7] for overviews of some recent work and [26, 2] for basic techniques. Thus, highly efficient crawling systems are needed in order to download the hundreds of millions of web pages indexed by the major search engines. In fact, search engines compete against each other primarily based on the size and ....
S. Chakrabarti, B. Dom, R. Kumar, P. Raghavan, S. Rajagopalan, A. Tomkins, David Gibson, and J. Kleinberg. Mining the web's link structure. IEEE Computer, 32(8):60--67, 1999.
....• Descriptions: Page descriptions can either be constructed from the metatags or submitted by Webmasters or reviewers. • Hyperlinks: Hyperlinks contain high-quality semantic clues to a page's topic. A hyperlink to a page represents an implicit endorsement of the page being pointed to [10]. • Hyperlink text: Hyperlink text is normally a title or brief summary of the target page. • Keywords: Keywords can be extracted from full-text documents or metatags. • Page title: The title tag, which is only valid in a head section, defines the title of an HTML document. • Text with ....
....in the search results or can be used to find related Web pages [14]. Analyzing the interconnections of a series of related pages can identify the authorities and hubs for a particular topic. A simple method to update a nonnegative authority weight $x_p$ and a nonnegative hub weight $y_p$ is given in [10]. If a page is pointed to by many good hubs, its authority weight is updated by the following formula: $x_p = \sum_{q \text{ such that } q \to p} y_q$ (1), where the notation $q \to p$ indicates that $q$ links to $p$. Similarly, if a page points to many good authorities, its hub weight is updated via $y_p = \sum_{q}$ ....
Soumen Chakrabarti, Byron E. Dom, S. Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, Andrew Tomkins, David Gibson, and Jon Kleinberg. Mining the Web's link structure. IEEE Computer, 32(8):60-67, August 1999.
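The authority and hub updates quoted in the context above can be sketched in a few lines of Python. This is only an illustrative sketch, not code from any of the cited papers: the excerpt is cut off before the hub formula, so the hub rule below is the standard HITS form (a hub's weight is the sum of the authority weights of the pages it links to), and the function names `hits` and `normalize`, the dictionary input format, the iteration count, and the Euclidean normalization are all assumptions made for the example.

```python
from math import sqrt


def normalize(scores):
    """Scale scores to unit Euclidean length (assumed normalization step)."""
    norm = sqrt(sum(v * v for v in scores.values()))
    if norm == 0:
        return dict(scores)
    return {page: v / norm for page, v in scores.items()}


def hits(links, iterations=50):
    """Iterate the authority/hub updates on a small link graph.

    links: dict mapping each page q to the list of pages p that q links to.
    Returns (authority, hub) dicts keyed by page.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    auth = {p: 1.0 for p in pages}
    hub = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority update: x_p = sum of y_q over all q that link to p.
        new_auth = {p: 0.0 for p in pages}
        for q, targets in links.items():
            for p in targets:
                new_auth[p] += hub[q]

        # Hub update (standard HITS form, assumed here because the excerpt
        # is truncated): y_p = sum of x_q over all q that p links to.
        new_hub = {p: sum(new_auth[q] for q in links.get(p, ())) for p in pages}

        auth = normalize(new_auth)
        hub = normalize(new_hub)

    return auth, hub


if __name__ == "__main__":
    # Hypothetical toy graph: two hub pages pointing at two candidate authorities.
    graph = {"h1": ["a", "b"], "h2": ["a"]}
    authority, hub_score = hits(graph)
    print(sorted(authority.items(), key=lambda kv: -kv[1]))
    print(sorted(hub_score.items(), key=lambda kv: -kv[1]))
```

In the toy graph, page "a" is pointed to by both hubs and so ends up with a higher authority weight than "b", while "h1", which links to both authorities, ends up with the higher hub weight.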
....the hub-authority model in addition to focusing techniques in order to perform the crawl effectively. Starting with these methods, there has been some recent work on various crawlers which use similar concepts [8, 14] in order to improve the efficiency of the crawl. Other related work may be found in [2, 4, 5, 11, 9, 10]. The basic idea of crawling selectively is quite attractive from a performance perspective; however, the focused crawling technique [6] relies on a restrictive model based on certain pre-ordained notions of how the linkage structure of the world wide web behaves. Specifically, the work on focused ....
S. Chakrabarti et al. Mining the Web's Link Structure. IEEE Computer, 32(8):60-67, August 1999.
....more effort is spent exploring the structure of the documents, including the link structure. The hyperlinks of the web offer rich latent information that does not usually appear in a text document. An important application which takes advantage of this is the Hyperlink Induced Topic Search (HITS) [8,9,10]. The Hyperlink Induced Topic Search, or HITS, does not crawl the web but relies on a search engine. The data that the HITS method processes comes from a search engine and the ranking is performed on a subgraph of the web that includes pages that match the query from a search engine and pages ....
S. Chakrabarti, et al. Mining the Web's link structure. IEEE Computer, 32(8), pages 60-67, August 1999.
....it before knowing it is useless. The result of keyword searching is imprecise and inconsistent. Unless used in a deliberately restricted way, HTML is not useful for structured data queries. Another approach is to build a hyperlink structure on the web and use tools to mine the hyperlink structure [4]. However, this method also uses keywords to interpret the content and needs a human to view the pages. 1.2.2 Why Not Use SQL Relational database technology and Structured Query Language (SQL) can answer the query we present above. However, they still have problems for precise ....
Soumen Chakrabarti, Byron E. Dom, S. Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, Andrew Tomkins, David Gibson, and Jon Kleinberg. Mining the web's link structure. Computer, 32(8):60--67, August 1999.
....• Descriptions: Web page descriptions can either be constructed from the metatags or submitted by Webmasters or reviewers. • Hyperlinks: Hyperlinks contain high-quality semantic clues to a page's topic. A hyperlink to a Web page represents an implicit endorsement of the page being pointed to [7]. • Keywords: Keywords can be extracted from full-text documents or metatags. • Page title: The title tag, which is only valid in a head section, defines the title of an HTML document. • The first sentence: The first sentence of a Web document is also likely to give crucial information ....
....the best source of information on the topic; and • Hubs, which provide collections of links to authorities. Authorities and hubs are given top ranks of the search results. Analyzing the interconnections of a series of related pages can identify the authorities and hubs for a particular topic [5, 6, 7, 12, 23]. 4. INFORMATION RETRIEVAL (IR) IR techniques are widely used in Web document searches [16]. Among them, relevance feedback and data clustering are two of the most popular techniques used by search engines. Relevance Feedback An initial query is usually a wild guess. Retrieved query results ....
Soumen Chakrabarti, Byron E. Dom, S. Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, Andrew Tomkins, David Gibson, and Jon Kleinberg. Mining the Web's link structure. IEEE Computer, 32(8):60-67, August 1999.
....of Web pages with respect to a given topic (instead of a global importance without referring to any topic) [33]. Note that a query can be considered as a topic. IBM's Clever Project aims to develop a search engine that employs the technique of computing Web page authorities for a given query [7]. Another way to utilize the linkage information is as follows. When a page A has a link to page B, a set of terms known as anchor terms is usually associated with the link. The purpose of using the anchor terms is to provide information regarding the contents of page B to facilitate navigation ....
S. Chakrabarti, et al. Mining the Web's Link Structure. IEEE Computer, Vol. 32, No. 8, August 1999, pp. 60-67.
.... exist linked either by interest or by other social factors and internet auctions are a success; and directory services, such as Yahoo, although facing the overwhelming task of manually indexing up to 300 million web pages, are the preferred starting point for navigation as they are more reliable (Chakrabarti, 1999). In our proposal, users have at their disposal available information about several themes, consisting of a compilation of internet addresses. These compilations are called rough guides, and they are pretty much the same as their city guide or tourist guide analogs. Rough guides can be used ....
Chakrabarti, S. "Mining the Web's link structure", Computer, 32, 8, (1999); pp. 60-67.
....tag that provides information about a Web page. • Descriptions: Web page descriptions can be constructed from the page title, the first sentence of a page, metatags, or descriptions from Webmasters or reviewers. • Hyperlinks: Hyperlinks contain high-quality semantic clues to a page's topic [2]. A hyperlink of a Web page represents an implicit endorsement of the page being pointed to. However, exploiting this link information is challenging because it is highly noisy. 3.2 Classifying Web Documents The WebClass builds a height-three modified decision tree, which splits the root, ....
Soumen Chakrabarti, et al., "Mining the Web's link structure," IEEE Computer, Vol. 32, No. 8, pp. 60-67, August 1999.
....earlier attempt to integrate information from text, hyperlinks and DOM structure for topic distillation. In our case, segmentation is not the end goal but falls naturally out of the analysis. Our work is most closely related to work on hyperlink induced topic search (HITS) [13], topic distillation [2, 7], and the PageRank algorithm used in Google [4]. These algorithms work at the coarse-grained hypertext graph level and are reviewed in Section 2. 2 Preliminaries Kleinberg's original HITS algorithm [13] started with a query q which was sent to a text search engine. The returned set of pages Rq (called ....
....set centroid. Discarding authorities on such grounds seems reasonable, but hubs are often mixed, and the B&H heuristic may discard some valuable links based on the aggregate relevance of the entire page. Thus, B&H pruning may reduce the score of relevant authorities. By the time the Clever system [7] was built, mixed hubs were already in evidence. Because HITS modeled a page as a single node with a single h score, high authority scores could diffuse from relevant links to less relevant links. In Clever, links within a fixed number of tokens of query terms were assigned a large edge weight ....
S. Chakrabarti, B. E. Dom, S. Ravi Kumar, P. Raghavan, S. Rajagopalan, A. Tomkins, D. Gibson, and J. Kleinberg. Mining the Web's link structure. IEEE Computer, 32(8):60--67, Aug. 1999.
....15 . Contamination can be reduced by recognizing that hyperlinks that contain "award" or "awards" near the anchor text are more relevant for this query than other edges. The Automatic Resource Compilation (ARC) and Clever systems incorporate such query-dependent modification of edge weights [15, 17]. Query results are significantly improved. In user studies, the results compared favorably with lists compiled by humans, such as Yahoo and Infoseek 16 . 6.1.4 Outlier filtering Bharat and Henzinger have invented another way to integrate textual content and thereby avoid contamination of ....
S. Chakrabarti, B. E. Dom, S. Ravi Kumar, P. Raghavan, S. Rajagopalan, A. Tomkins, D. Gibson, and J. Kleinberg. Mining the Web's link structure. IEEE Computer, 32(8):60--67, Aug. 1999. Feature article.
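The query-dependent edge weighting mentioned in the contexts above (links within a fixed number of tokens of query terms receive a larger weight) might be sketched as follows. This is a hedged illustration only, not the ARC or Clever implementation: the function name `edge_weight`, the window size, the boost value, and the whitespace tokenization in the usage lines are hypothetical choices, since the excerpts do not give the actual parameters.

```python
def edge_weight(tokens, anchor_pos, query_terms, window=5, boost=4.0):
    """Return a weight for the link whose anchor sits at tokens[anchor_pos].

    The link gets a base weight of 1.0, plus `boost` when any query term
    occurs within `window` tokens on either side of the anchor.
    (`window` and `boost` are assumed values chosen for illustration.)
    """
    lo = max(0, anchor_pos - window)
    hi = min(len(tokens), anchor_pos + window + 1)
    nearby = {t.lower() for t in tokens[lo:hi]}
    terms = {t.lower() for t in query_terms}
    return 1.0 + boost if nearby & terms else 1.0


if __name__ == "__main__":
    # Hypothetical page text; the anchor is the final token, "here".
    page_tokens = "see the list of award winners here".split()
    print(edge_weight(page_tokens, 6, ["award", "awards"]))  # boosted link
    print(edge_weight(page_tokens, 6, ["prize"]))            # base weight
```

Weights of this kind would replace the implicit weight of 1 on each edge in the hub and authority sums, so that links surrounded by query-related text contribute more than unrelated links on the same page.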
S. Chakrabarti, B. E. Dom, S. R. Kumar, P. Raghavan, S. Rajagopalan, A. Tomkins, D. Gibson, and J. Kleinberg. Mining the Web's link structure. Computer, 32(8):60--67, 1999.
S. Chakrabarti, B. E. Dom, R. Kumar, P. Raghavan, S. Rajagopalan, A. Tomkins, D. Gibson, J. M. Kleinberg. Mining the Web's Link Structure. IEEE Computer, Vol. 32, No. 8, pp. 60-67, 1999.
Soumen Chakrabarti, Byron E. Dom, S. Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, Andrew Tomkins, David Gibson and Jon Kleinberg. Mining the Web's link structure. Computer, 32(8):60--67, 1999.
S. Chakrabarti, B. E. Dom, S. R. Kumar, P. Raghavan, S. Rajagopalan, A. Tomkins, D. Gibson, and J. M. Kleinberg. Mining the Web's link structure. Computer, 32(8):60--67, 1999.
S. Chakrabarti, B. Dom, R. Kumar, P. Raghavan, S. Rajagopalan, A. Tomkins, D. Gibson, and J. Kleinberg. Mining the Web's link structure. IEEE Computer, 32(8):60--67, 1999.
S. Chakrabarti, B.E. Dom, S. Ravi Kumar, P. Raghavan, S. Rajagopalan, A. Tomkins, D. Gibson, and J. Kleinberg. "Mining the Web's Link Structure". IEEE Computer, Vol. 32(8):pages 60--67, August 1999.
Soumen Chakrabarti, Byron E. Dom, S. Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, Andrew Tomkins, David Gibson, and Jon Kleinberg, Mining the Web's link structure, in IEEE Computer (Vol. 32 No. 8: 60-67), 1999
Soumen Chakrabarti, Byron E. Dom, Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, Andrew S. Tomkins, David Gibson, and Jon M. Kleinberg. Mining the Web's link structure. IEEE Computer, 32(8):60--67, 1999.
S. Chakrabarti et al., "Mining the Web's Link Structure," Computer, Aug. 1999, pp. 60-67.
Chakrabarti, S., Dom, B., Kumar, S., Raghavan, P., Rajagopalan, S., Tomkins, A., Gibson, D., & Kleinberg, J. Mining the Web's Link Structure. IEEE Computer, August 1999.
Soumen Chakrabarti, Byron E. Dom, David Gibson, Jon M. Kleinberg, S. Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, and Andrew Tomkins. Mining the Web's link structure. IEEE Computer, pages 60--67, August 1999.
Chakrabarti, S., Dom, B., Kumar, S. R., Raghavan, P., Rajagopalan, S., Tomkins, A., Gibson, D., and Kleinberg, J.: Mining the Web's Link Structure. IEEE Computer, 32(8) (1999), pp. 60-67.
S. Chakrabarti, B. E. Dom, S. Ravi Kumar, P. Raghavan, S. Rajagopalan, A. Tomkins, D. Gibson, and J. Kleinberg. Mining the Web's link structure. Computer, 32(8):60--67, 1999.
Soumen Chakrabarti, Byron E. Dom, S. Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, Andrew Tomkins, David Gibson, and Jon Kleinberg. Mining the Web's link structure. Computer, 32(8):60-67, 1999.
S. Chakrabarti, B. Dom, S.R. Kumar, P. Raghavan, S. Rajagopalan, A. Tomkins, D. Gibson, and J.M. Kleinberg. Mining the web's link structure. IEEE Computer, 32:60--67, 1999.
Chakrabarti, S.; Dom, B. E.; Gibson, D.; Kleinberg, J. M.; Kumar, S. R.; Raghavan, P.; Rajagopalan, S.; and Tomkins, A. 1999b. Mining the web's link structure. IEEE Computer 60--67.
Soumen Chakrabarti, Byron E. Dom, David Gibson, Jon M. Kleinberg, S. Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, and Andrew Tomkins. Mining the web's link structure. IEEE Computer, pages 60--67, August 1999.
Soumen Chakrabarti, Byron Dom, S. Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, Andrew Tomkins, David Gibson, and Jon Kleinberg. Mining the web's link structure. IEEE Computer Magazine, 32(8):60--67, 1999.
CiteSeer.IST - Copyright Penn State and NEC