20 citations found.
S. Chakrabarti, B. E. Dom, D. Gibson, R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Experiments in topic distillation. In Proc. SIGIR Workshop on Hypertext Information Retrieval on the Web, 1998.

This paper is cited in the following contexts:
On Approximation Algorithms for Data Mining Applications - Afrati (2002)

....authorities. Thus hubs and authorities have this mutually dependent relationship: good hubs link to many authorities and good authorities are linked to by many hubs. Work in [47, 66] has shown that the concepts of hubs and authorities are a fundamental structural feature of the web. The CLEVER system [27] builds on the algorithmic framework of hubs and authorities. Techniques in both [20] and [61] for measuring the importance of a web page use iterative procedures. In [62] it is explained how these techniques are essentially spectral methods. The algorithm in [20] can be thought of as seeking an ....

S. Chakrabarti, B. Dom, D. Gibson, S. R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Experiments in topic distillation. In SIGIR workshop on hypertext information retrieval, 1998.
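The mutual reinforcement in the Afrati excerpt (good hubs point to good authorities, and good authorities are pointed to by good hubs) can be shown with a short power-iteration sketch of Kleinberg's HITS updates. This is the textbook formulation and not code from any of the cited papers. The function name `hits`, the fixed iteration count, the unit-length normalization, and the toy graph are all assumptions made for illustration. With this normalization, the authority vector generally converges toward the principal eigenvector of A^T A, where A_ij = 1 when page i links to page j. That convergence is the spectral view the excerpt refers to.

```python
import math


def hits(edges, iterations=50):
    """Hub and authority scores for a directed graph (textbook HITS sketch).

    edges: iterable of (source, target) pairs meaning "source links to target".
    Each round applies the mutual-reinforcement updates
        authority(p) = sum of hub(q) over pages q that link to p
        hub(p)       = sum of authority(q) over pages q that p links to
    and rescales both score vectors to unit Euclidean length.
    """
    edges = list(edges)
    nodes = {n for edge in edges for n in edge}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            auth[dst] += hub[src]  # good hubs confer authority on their targets
        hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            hub[src] += auth[dst]  # pointing at good authorities makes a good hub
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for n in scores:
                scores[n] /= norm
    return hub, auth


# Hypothetical toy graph: three hub pages pointing at two authority pages.
edges = [("h1", "a1"), ("h1", "a2"), ("h2", "a1"), ("h2", "a2"), ("h3", "a1")]
hub, auth = hits(edges)
print(sorted(auth, key=auth.get, reverse=True)[:2])  # a1 ranks above a2
```

Page a1 is linked to by all three hubs, so it outranks a2. Hubs h1 and h2 cover both authorities, so they outrank h3.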


TREC2002 Web, Novelty and Filtering Track Experiments using .. - Kwok, Deng, Dinstl, Chan

....containing what the user wants that is named in the query. Topic distillation is new, and is concerned with locating the most useful pages (out of many) that best and comprehensively describe a user's topic, either by content or via links. Previous investigations on topic distillation such as [2,3,4,5] mostly tie the process to quality identification. They employed Kleinberg's HITS [6] algorithm as the primary method, and added content weighting as secondary improvement. Authority and hub pages found were identified with topic distillation answers. In this experiment, we employ page content ....

Chakrabarti, S., Dom, B., Gibson, D., Kumar, R., Raghavan, P., Rajagopalan, S. & Tomkins, A. (1998b). Experiments in topic distillation. ACM SIGIR'98 Post Conf. Workshop on Hypertext IR for the Web.


A Probabilistic Model for Optimal Searching of the Deep Web - Mukherjee

....matrix for the system, $P$, be as below:

$$P = \begin{pmatrix} 0.2 & 0.5 & 0.3 \\ 0.1 & 0.0 & 0.5 \\ 0.6 & 0.8 & 0.9 \end{pmatrix}$$

We set the parameter M to, say, M = 2 for restricting the search domain and assume that the value of c is 0.4. With the above settings, the expression for Z to be maximized turns out as

$$Z = -0.2X_{11} + 0.1X_{12} - 0.1X_{13} - 0.3X_{21} - 0.4X_{22} + 0.1X_{23} + 0.2X_{31} + 0.4X_{32} + 0.5X_{33} \qquad (10)$$

subject to the constraints

$$X_{11} + X_{12} + X_{13} \le 2 \qquad (11a)$$
$$X_{21} + X_{22} + X_{23} \le 2 \qquad (11b)$$
$$X_{31} + X_{32} + X_{33} \le 2 \qquad (11c)$$

The solution to this integer programming will give $X_{12} = X_{23} = X_{32} = X_{33} = 1$ and $X_{ij} = 0$ for all other i and j. Thus the best predicted result is when K1 is assigned to L2, K2 to L3, K3 to L2 and L3, and entire query ....

[Article contains additional citation context not shown here]

S. Chakrabarti, B. Dom, D. Gibson, S. Kumar, P. Raghavan, S. Rajagopalan, A. Tomkins, "Experiments in Topic Distillation", Proceedings of the ACM SIGIR Workshop on Hypertext Information Retrieval from the Web, ACM Press, April 1998
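The Mukherjee excerpt above gives only the final answer of a small integer program, so a brute-force check is useful. This sketch rests on our reading of the garbled text, not on the paper's stated model. It assumes the objective is Z = sum of (p_ij - c) * X_ij with c = 0.4, and that each row may have at most M = 2 entries set to 1. It also reads P row by row as [[0.2, 0.5, 0.3], [0.1, 0.0, 0.5], [0.6, 0.8, 0.9]]. The entries 0.3 and 0.9 are illegible in the source and are inferred from the objective coefficients. Under these assumptions the enumeration recovers the quoted solution X12 = X23 = X32 = X33 = 1. The name `solve_by_enumeration` is hypothetical.

```python
from itertools import product

# Assumed reading of the garbled matrix, row by row; P[0][2] = 0.3 and
# P[2][2] = 0.9 are inferred from the objective coefficients, not legible.
P = [
    [0.2, 0.5, 0.3],
    [0.1, 0.0, 0.5],
    [0.6, 0.8, 0.9],
]
C = 0.4  # value of c in the excerpt
M = 2    # at most M ones per row, as in constraints (11a)-(11c)


def solve_by_enumeration(p, c, m):
    """Maximize the assumed objective Z = sum_ij (p_ij - c) * x_ij over 0/1 x.

    Subject to sum_j x_ij <= m for every row i. Exhaustive search over
    2**(rows*cols) candidates, so this is only meant for toy sizes.
    """
    rows, cols = len(p), len(p[0])
    best_z, best_x = float("-inf"), None
    for bits in product((0, 1), repeat=rows * cols):
        x = [bits[i * cols:(i + 1) * cols] for i in range(rows)]
        if any(sum(row) > m for row in x):
            continue  # violates a row constraint
        z = sum((p[i][j] - c) * x[i][j] for i in range(rows) for j in range(cols))
        if z > best_z:
            best_z, best_x = z, x
    return best_z, best_x


best_z, best_x = solve_by_enumeration(P, C, M)
# 1-based (i, j) index pairs with x_ij = 1, matching the X_ij notation
chosen = [(i + 1, j + 1) for i, row in enumerate(best_x) for j, v in enumerate(row) if v]
print(chosen, round(best_z, 2))  # [(1, 2), (2, 3), (3, 2), (3, 3)] 1.1
```

Read row i as keyword Ki and column j as Lj. The result then matches the excerpt's "K1 to L2, K2 to L3, K3 to L2 and L3". Because the constraints are per row, each row could also be solved on its own by keeping its (at most M) largest positive coefficients.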


Background Readings for Collection Synthesis - Bibliography (2002)

....Another way to put it is that a focused crawler tries to guess whether the next link might lead to something relevant to the topic at hand. The main issues for a focused crawler are how to determine whether a downloaded page is on topic, and in what order URLs should be visited. Chakrabarti [35] says that the Web's link structure is largely hand-built, and thus link structure is a good basis for relevance assessments. But is this true? This would be a good question for research in a link characterization project. I believe that links vary widely in nature, even on a given page, and are not necessarily a good basis for relevance assessments. They are a most convenient structure to use, however. There is a large body of literature on the topic of focused crawling, with Chakrabarti [27, 28, 29, 30, 31, 33, 34, 35, 36, 37, 38] probably being the most prolific. It is probably beneficial to start a focused crawl with a "hub," in Kleinberg's terminology [69]. The ARC system [30] augments Kleinberg's HITS technology [53] by adding the textual content of the link anchors to the mix, as did Brin and Page [19]. Others [45, ....

[Article contains additional citation context not shown here]

S. Chakrabarti, B. E. Dom, D. Gibson, R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Experiments in topic distillation. In Proceedings of the ACM SIGIR Workshop on Hypertext Information Retrieval on the Web, Melbourne, Australia, 1998. ACM. Available: <http://www.almaden.ibm.com/cs/k53/abstract.html>.


Collection Synthesis - Bergmark (2002)   (4 citations)

....on the fly. Initial results show this might be something worth implementing. Vary the initial centroid length. So far, 20-term centroids look better than 40-term centroids. Vary the minimum acceptable overlap between document terms and centroid terms. Chakrabarti's paper on CLEVER [14] suggests 3 further things we might try (prioritize links so off-site links are higher; don't follow links from documents almost identical to ones previously seen; use only a single point of entry into any given site). 8. ACKNOWLEDGMENTS This work was funded in part by the NSF grant on Project ....

S. Chakrabarti, B. E. Dom, D. Gibson, R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Experiments in topic distillation. In Proceedings of the ACM SIGIR Workshop on Hypertext Information Retrieval on the Web, Melbourne, Australia, 1998. ACM. Available: <http://www.almaden.ibm.com/cs/k53/abstract.html>.
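Two of the crawl heuristics listed in the Bergmark excerpt can be sketched as a best-first URL frontier: raise the priority of off-site links, and enter each site only once. The frontier also addresses the "in what order should URLs be visited" question raised in the Bibliography excerpt. This is a hypothetical design, not the CLEVER crawler. The `Frontier` class, the additive `offsite_bonus`, and the reading of "single point of entry" as "fetch at most one URL per host" are our assumptions. The third heuristic, skipping near-duplicate documents, is left out.

```python
import heapq
from urllib.parse import urlsplit


class Frontier:
    """Best-first URL frontier for a toy focused crawler (hypothetical design).

    Higher scores are popped first. Off-site links receive an additive bonus,
    and each host is entered through at most one URL. Near-duplicate
    filtering is not implemented here.
    """

    def __init__(self, offsite_bonus=1.0):
        self.offsite_bonus = offsite_bonus
        self._heap = []
        self._seen_urls = set()
        self._entered_hosts = set()
        self._counter = 0  # tie-breaker: equal scores pop in insertion order

    def push(self, url, relevance, from_url=None):
        """Queue url with an estimated topic relevance (higher is better)."""
        if url in self._seen_urls:
            return
        self._seen_urls.add(url)
        score = relevance
        if from_url is not None and urlsplit(url).hostname != urlsplit(from_url).hostname:
            score += self.offsite_bonus  # prefer links that leave the current site
        heapq.heappush(self._heap, (-score, self._counter, url))  # min-heap on -score
        self._counter += 1

    def pop(self):
        """Return the best URL whose host has not been entered yet, or None."""
        while self._heap:
            _, _, url = heapq.heappop(self._heap)
            host = urlsplit(url).hostname
            if host in self._entered_hosts:
                continue  # single point of entry per site
            self._entered_hosts.add(host)
            return url
        return None


frontier = Frontier()
frontier.push("http://a.example/start", relevance=0.5)
frontier.push("http://a.example/page", relevance=0.9, from_url="http://a.example/start")
frontier.push("http://b.example/page", relevance=0.2, from_url="http://a.example/start")
order = [frontier.pop() for _ in range(3)]
print(order)
```

The off-site link is fetched first despite its low raw relevance. The seed URL is never popped, because its host has already been entered by then.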


Authoritative Sources in a Hyperlinked Environment - Kleinberg (1999)   (565 citations)

....one treats the text surrounding a hyperlink as a descriptor of the page being pointed to when assessing the relevance of that page. The use of anchor text appeared in one of the oldest WWW search engines, McBryan's World Wide Web Worm [McBryan 1994]; it is also used in Brin and Page [1998] and Chakrabarti et al. [1998a; 1998b]. Another direction of work on the integration of links into WWW search is the construction of search formalisms capable of handling queries that involve predicates over both text and links. Arocena et al. [1997] have developed a framework supporting WWW queries that combines standard ....

.... The problem of returning more specific answers in the presence of this phenomenon is the subject of ongoing work; in Sections 8 and 9, we briefly discuss current work on the use of textual content for the purpose of focusing our approach to link-based analysis [Bharat and Henzinger 1998; Chakrabarti et al. 1998a; 1998b]. The use of nonprincipal eigenvectors, combined with basic term matching, can be a simple way to extract collections of authoritative pages that are more relevant to a specific query topic. For example, consider the following fact: Among the sets of hubs and authorities corresponding to ....

[Article contains additional citation context not shown here]

CHAKRABARTI, S., DOM, B., GIBSON, D., KUMAR,S.R.,RAGHAVAN, P., RAJAGOPALAN, S., AND TOMKINS, A. 1998. Experiments in topic distillation. In Proceedings of the ACM SIGIR Workshop on Hypertext Information Retrieval on the Web (Melbourne, Australia). ACM, New York.


Authoritative Sources in a Hyperlinked Environment - Kleinberg (1997)   (565 citations)

....based on anchor text, in which one treats the text surrounding a hyperlink as a descriptor of the page being pointed to when assessing the relevance of that page. The use of anchor text appeared in one of the oldest WWW search engines, McBryan's World Wide Web Worm [40]; it is also used in [8, 11, 10]. Another direction of work on the integration of links into WWW search is the construction of search formalisms capable of handling queries that involve predicates over both text and links. Arocena, Mendelzon, and Mihaila [1] have developed a framework supporting WWW queries that combines ....

.... Web Virtual Library: Mathematics. The problem of returning more specific answers in the presence of this phenomenon is the subject of ongoing work; in Sections 8 and 9, we briefly discuss current work on the use of textual content for the purpose of focusing our approach to link-based analysis [6, 10, 11]. The use of non-principal eigenvectors, combined with basic term matching, can be a simple way to extract collections of authoritative pages that are more relevant to a specific query topic. For example, consider the following fact: Among the sets of hubs and authorities corresponding to the ....

[Article contains additional citation context not shown here]

S. Chakrabarti, B. Dom, D. Gibson, S.R. Kumar, P. Raghavan, S. Rajagopalan and A. Tomkins, "Experiments in Topic Distillation," ACM SIGIR Workshop on Hypertext Information Retrieval on the Web, 1998.


Improved Classification via Connectivity Information - Broder, Krauthgamer..

....substantially higher accuracy than a classifier that does not use link information. The fact that links provide useful information on the Web has most clearly been demonstrated by the HITS technique of Kleinberg (and various extensions thereof) for finding hub and authority web sites [2, 8, 4], and the PageRank algorithm used by Google for ranking web sites [3]. The specific idea of using link information to improve a classifier was suggested and tested through experiments by Chakrabarti, Dom, and Indyk in [5]. They showed that an approach based on iteratively relabeling pages using ....

S. Chakrabarti, B. Dom, D. Gibson, S.R. Kumar, S. Rajagopalan, and P. Raghavan. Experiments in topic distillation. In ACM SIGIR workshop on hypertext information retrieval on the Web, 1998.


Annotate - A Web-based Knowledge Management Support System for.. - Ginsburg (1998)   (2 citations)

....In recent work, based on Jon Kleinberg's linear algebra algorithm (Gibson et al., 1998; Kleinberg, 1998), specialized methods are introduced to identify hubs (pages which reference topic authorities) and the authority pages for a given topic. This technique is also used by IBM's CLEVER project (Chakrabarti et al., 1998). Less mathematical link mining is carried out by Pirolli, Pitkow and Rao (Pirolli et al., 1996) and by Spertus in her ParaSite system, which makes use of the document's depth in the server filesystem hierarchy as well. A related attempt to link mine was carried out by Li (Li, 1998). He ....

Chakrabarti, S., Dom, B. E., Gibson, D., Kumar, R., Raghavan, P., Rajagopalan, S., and Tomkins, A. (1998). Experiments in topic distillation. Technical report, IBM Almaden Research Center, San Jose, CA.


A Comparison of Techniques to Find Mirrored Hosts on the WWW - Bharat, Broder, Dean, al. (1999)   (17 citations)

....techniques that work well on classical collections can return nonauthoritative results when applied to the Web. Recent web IR research has begun exploiting connectivity (i.e., linkage between documents) to compute a document's authority score. Examples of such techniques are PageRank [13], CLEVER [9, 8, 11], and Topic Distillation [4]. The assumption underlying these algorithms is that each linking document provides an independent testimonial to the quality of the linked document. Mirroring violates this independence assumption. Hence documents pointed to by mirrors get artificially higher scores. ....

....neither feasible, nor (as we believe) necessarily more useful than comparing URL strings and connectivity. See also [19] for a discussion on the applicability of clustering algorithms to the WWW. Connectivity information has been applied to the problem of improving precision of WWW search results [13, 9, 8, 11, 4]. The graph model used by these methods is perturbed by duplicate pages. We believe our hconn and conn algorithms represent the first attempt to use connectivity information to find mirrors. Furthermore, these two algorithms can be easily integrated within the methods cited above to alleviate ....

S. Chakrabarti, B. Dom, D. Gibson, S. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Experiments in topic distillation. In ACM SIGIR'98 Post-Conference Workshop on Hypertext Information Retrieval for the Web, 1998.


Finding Related Pages in the World Wide Web - Dean, Henzinger (1999)   (79 citations)

....that HITS computes the principal eigenvector of the matrix $AA^T$, where A is the adjacency matrix of the above graph, and suggested using non-principal eigenvectors for finding related pages. Finally, he gave anecdotal evidence that HITS might work well. Consecutively, a sequence of papers [10, 8] presented improvements on HITS and used it to populate a given hierarchy of categories. These improvements are not directly relevant to the task of finding related pages. 6 Conclusion We have presented two different algorithms for finding related pages in the WWW. They significantly outperform ....

S. Chakrabarti, B. Dom, D. Gibson, S.R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Experiments in topic distillation. In ACM SIGIR'98 Post-Conference Workshop on Hypertext Information Retrieval for the Web, 1998.


From Resource Discovery to Knowledge Discovery on the Internet - Zaïane (1998)

....$PR(P_k)/C(P_k)$, where d is a damping factor between 0 and 1. Intuitively, PageRank represents the probability of choosing a page after random browsing [23, 28]. CLEVER is also heavily based on link frequency. It ranks documents by measuring links between them. The main purpose of CLEVER, as stated in [26], is not simply to rank documents in a search result, but to find authoritative resources in a pool of web pages. The objective is to build directories like the Yahoo directories, but to find the authoritative entries in each category automatically. Starting from categories (i.e., category hierarchy) ....

....document indicate the popularity of the document, while links coming out of a document indicate the richness or maybe the variety of topics covered in the document. This can be compared to bibliographical citations. When a paper is cited often, it ought to be important. The PageRank [23] and CLEVER [26] methods take advantage of this information conveyed by the links to find pertinent web pages. The Multi-Layered Database (MLDB) approach [61, 144] proposes a multi-level database representation of the Internet where levels are obtained via successive transformations and generalizations performed on ....

S. Chakrabarti, B. Dom, D. Gibson, S.R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Experiments in topic distillation. In ACM SIGIR workshop on Hypertext Information Retrieval on the Web, Melbourne, Australia, 1998.
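The garbled formula at the start of the Zaïane excerpt appears to be the tail of the PageRank recurrence. The following is a minimal sketch assuming the original Brin and Page form, PR(A) = (1 - d) + d * sum over k of PR(P_k)/C(P_k). Here the P_k are the pages linking to A, and C(P_k) is the number of out-links of P_k. The default d = 0.85, the fixed iteration count, and the choice to ignore dangling pages (pages with no out-links) are assumptions of this sketch. The function name `pagerank` and the three-page graph are illustrative only.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate PR(p) = (1 - d) + d * sum(PR(q) / C(q)) over pages q linking to p.

    links: dict mapping each page to the list of pages it links to, so
    C(q) = len(links[q]). Simplification (an assumption of this sketch):
    pages without out-links pass on no rank; there is no dangling-node fix.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    pr = {p: 1.0 for p in pages}
    for _ in range(iterations):
        incoming = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for t in targets:
                incoming[t] += pr[src] / len(targets)  # PR(q) / C(q)
        pr = {p: (1 - d) + d * incoming[p] for p in pages}
    return pr


# Hypothetical three-page web: A -> B, A -> C, B -> C, C -> A.
ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
print({p: round(r, 3) for p, r in sorted(ranks.items())})
```

C collects rank from both A and B, so it ends up highest. In this (1 - d) + d * ... form, the scores of a graph without dangling pages sum to the number of pages rather than to 1.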


The Web as a graph - Kumar, Raghavan, Rajagopalan..   (27 citations)  Self-citation (Kumar Raghavan Rajagopalan Tomkins)

No context found.

S. Chakrabarti, B. Dom, S. R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Experiments in topic distillation. SIGIR workshop on Hypertext IR, 1998.


Searching the Workplace Web - Fagin, Kumar, McCurley, Novak.. (2003)   (5 citations)  Self-citation (Kumar)

....corpus of web documents (e.g., site search). In Internet search, many of the most significant techniques that lead to quality results are successful because they exploit the reflection of social forces in the way people present information on the web. The most famous of these are probably the HITS [24, 10, 11] and the PageRank [7] algorithms. These methods are motivated by the observation that a hyperlink from a document to another is an implicit conveyance of authority to the target page. The HITS algorithm takes advantage of a social process in which authors create hubs of links to authoritative ....

....from another page and edit the contents without changing the tags in order to preserve the common look and feel of an intranet. This index contains terms from approximately 4,343,510 non-duplicate documents. Anchortext index. Anchortext is well known to be valuable in the context of web search [11, 5], and has also been recently used to classify documents [19]. For us, anchortext is defined as the text that appears within the bounds of a hypertext link; thus, we chose not to include pre- and post-text surrounding the link. For each document we take all of the anchortext for links to that ....

Soumen Chakrabarti, Byron E. Dom, David Gibson, Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, and Andrew Tomkins. Experiments in topic distillation. In SIGIR Workshop on Hypertext Information Retrieval, pages 13-21, 1998.


The Shape of the Web and Its Implications for.. - Efe, Raghavan, Chu.. (2000)   (1 citation)  Self-citation (Raghavan)

....required text-based heuristics. These were the problems of topic drift and topic generalization. In both cases, the HITS algorithm drifts toward more heavily linked regions in the graph, and some auto-control mechanism is needed to prevent this drift. A simple idea used in the CLEVER project [13, 15] is based on the observation that text around the anchor of a link generally gives a good idea about the page being pointed to (e.g., "click here to post a message on our message board"). By comparing the search terms against the text around the link, a relevance weight is computed for each link. ....

Soumen Chakrabarti, Byron E. Dom, David Gibson, Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, and Andrew Tomkins. "Experiments in Topic Distillation," ACM SIGIR'98 Post-Conference Workshop on Hypertext Information Retrieval for the Web, 1998.
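The Efe et al. excerpt describes computing a per-link relevance weight by comparing the search terms with the text around the link. One simple way to do this is to count query-term hits near the anchor. The tokenizer, the 10-token window, and the "1 + number of matches" weighting are assumptions of this sketch, not parameters taken from the CLEVER papers. The function names are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase alphanumeric tokens; deliberately crude for this sketch."""
    return re.findall(r"[a-z0-9]+", text.lower())


def link_weight(before, anchor, after, query, window=10):
    """Relevance weight for one hyperlink, from the text around its anchor.

    before/after: page text just before/after the link; anchor: its anchor text.
    Counts occurrences of query terms in the anchor plus up to `window` tokens
    on each side and returns 1 + count, so a link with no matching context
    keeps weight 1, i.e. behaves like an ordinary unweighted edge.
    """
    terms = set(tokenize(query))
    left = tokenize(before)
    context = left[max(0, len(left) - window):] + tokenize(anchor) + tokenize(after)[:window]
    return 1 + sum(1 for token in context if token in terms)


query = "search engines"
print(link_weight("Our guide to", "search engines", "and web directories", query))
print(link_weight("", "click here", "to post a message on our message board", query))
```

Weights like these could replace the Boolean entries of the adjacency matrix A in the HITS updates. This is the kind of non-Boolean generalization the Kumar et al. excerpts describe.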


The Web as a graph - Kumar, Raghavan, Rajagopalan.. (2000)   (27 citations)  Self-citation (Kumar Raghavan Rajagopalan Tomkins)

....these weights matter, not their actual magnitudes. In practice, the relative ordering of hub/authority scores becomes stable with far fewer iterations than needed to stabilize the actual magnitudes. Typically, five iterations of the algorithm are enough to achieve this stability. In subsequent work [6, 11, 13], the HITS algorithm has been generalized by modifying the entries of A so that they are no longer Boolean. These modifications take into account the content of the pages in the base set, the internet domains in which they reside, and so on. Nevertheless, most of these modifications retain the ....

S. Chakrabarti, B. Dom, S. R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Experiments in topic distillation. SIGIR workshop on Hypertext IR, 1998.


Trawling the Web for Emerging Cyber-Communities - Kumar, Raghavan.. (1999)   (95 citations)  Self-citation (Kumar Raghavan Rajagopalan Tomkins)

No context found.

Chakrabarti et al. (2) 98. Soumen Chakrabarti, Byron Dom, David Gibson, S. Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan and Andrew Tomkins. Experiments in Topic Distillation. In Proceedings of the ACM SIGIR workshop on Hypertext Information Retrieval on the Web (1998), Melbourne, Australia.


The Web as a graph: measurements, models, and methods - Kleinberg, Kumar.. (1999)   (6 citations)  Self-citation (Kumar Raghavan Rajagopalan Tomkins)

....range of queries. For instance, when tested on the sample query "search engines," the top authorities returned by HITS were Yahoo!, Excite, Magellan, Lycos, and AltaVista, even though none of these pages (at the time of the experiment) contained the phrase "search engines." In subsequent work [5, 9, 10], the HITS algorithm has been generalized by modifying the entries of A so that they are no longer Boolean. These modifications take into account the content of the pages in the base set, the internet domains in which they reside, and so on. Nevertheless, most of these modifications retain the ....

S. Chakrabarti, B. Dom, S. R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Experiments in topic distillation. SIGIR workshop on hypertext IR, 1998.


Navigation-Aided Retrieval - Shashank Pandit

No context found.

S. Chakrabarti, B. E. Dom, D. Gibson, R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Experiments in topic distillation. In Proc. SIGIR Workshop on Hypertext Information Retrieval on the Web, 1998.


Recognizing Nepotistic Links on the Web - Davison (2000)   (11 citations)

No context found.

Chakrabarti, S.; Dom, B. E.; Gibson, D.; Kumar, R.; Raghavan, P.; Rajagopalan, S.; and Tomkins, A. 1998a. Experiments in Topic Distillation. In ACM SIGIR Workshop on Hypertext Information Retrieval on the Web.

CiteSeer.IST - Copyright Penn State and NEC