S. Chakrabarti, B. Dom and P. Indyk. Enhanced hypertext categorization using hyperlinks. In SIGMOD Conference, ACM, 1998


This paper is cited in the following contexts:


Predicting Next Page Access By Time Length Reference In The.. - Yalçinkaya (2002)   (Correct)

....page to another page in a web site. In other words, it is focused on the structure of the hyperlinks within the web itself. Most research on the web structure mining can be thought of as a mixture of content and structure mining and adds content information to the link structures, such as Clever System [10] and Google [11]. Web Usage Mining focuses on techniques that could predict user behavior while the user interacts through the web. We define the mined data in this category as the secondary data since they are all the result of interactions. We could classify them into the usage data that reside ....

....[Figure: Application Areas of Web Usage Mining (General, Business Intelligence, System Improvement, Site Modification, Personalization)] As shown in the figure, usage patterns extracted from web data have been applied to a wide range of research areas. Projects such as WebSIFT [9], WUM [10], and SpeedTracer [30] have focused on web usage mining in general. 2.1 General Web Usage Mining. The aim of a general web usage mining system is to discover general behavior and patterns from the log files by adapting well-known data mining techniques or new approaches proposed. Most of the ....

[Article contains additional citation context not shown here]

S. Chakrabarti, B. Dom and P. Indyk. Enhanced hypertext categorization using hyperlinks. In SIGMOD Conference, ACM, 1998


SimRank: A Measure of Structural-Context Similarity - Jeh, Widom (2002)   (13 citations)  (Correct)

....show that the use of structure can greatly improve Web search versus text alone. The algorithms in [10, 16] analyze individual pages with respect to the global structure, whereas our similarity measure analyzes relationships between pairs of pages. In the classifier for Web pages presented in [5], the classification of the neighbors of a Web page p is used to improve upon the textual classification of p through a probabilistic model. In contrast, SimRank computes scores for pairs of pages (instead of a single page) by comparing their neighbors. Our algorithm is not limited to discrete ....

Soumen Chakrabarti, Byron Dom, and Piotr Indyk. Enhanced hypertext categorization using hyperlinks. In Proceedings of ACM SIGMOD International Conference on Management of Data, Seattle, Washington, June 1998.
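
The context above says only that SimRank scores pairs of pages by comparing their neighbors. A minimal sketch of that idea follows, using the commonly cited SimRank recurrence (a page is fully similar to itself; two distinct pages are similar to the extent that the pages linking to them are similar). The decay constant 0.8, the fixed iteration count, the function name `simrank`, and the toy graph are all assumptions for illustration, not details taken from this listing.

```python
from itertools import product


def simrank(in_neighbors, decay=0.8, iterations=10):
    """Score every pair of nodes by the similarity of their in-neighbors.

    in_neighbors maps each node to the list of nodes that link to it.
    decay and iterations are illustrative defaults (assumptions).
    """
    nodes = list(in_neighbors)
    # Start from the identity: a node is similar only to itself.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, repeat=2)}
    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                new_sim[(a, b)] = 1.0
                continue
            in_a, in_b = in_neighbors[a], in_neighbors[b]
            if not in_a or not in_b:
                # A node with no in-links has nothing to compare.
                new_sim[(a, b)] = 0.0
                continue
            # Average similarity over all pairs of in-neighbors, then decay.
            total = sum(sim[(i, j)] for i in in_a for j in in_b)
            new_sim[(a, b)] = decay * total / (len(in_a) * len(in_b))
        sim = new_sim
    return sim


# Hypothetical toy graph: "a" and "b" are both linked to by "hub"; "c" is linked to by "a".
toy_in_neighbors = {"hub": [], "a": ["hub"], "b": ["hub"], "c": ["a"]}
scores = simrank(toy_in_neighbors)
```

In this toy graph, "a" and "b" share their only in-neighbor, so their score is governed entirely by the decay constant, while pairs involving "hub" score zero because "hub" has no in-links.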


Towards Structural Logistic Regression: Combining.. - Popescul, Ungar.. (2002)   (2 citations)  (Correct)

....central modeling component and calls statistical modeling from within the inner structure navigation loop to supply new predicates. Various other techniques have been applied to learning hypertext classifiers. These vary in methodology and information exploited. For example, Chakrabarti et al. [2] use predicted labels of neighboring documents to reinforce classification decisions for a given document. Glover et al. [8] provide an analysis of the utility of text in citing documents for classification. Yang et al. [17] include a more complete discussion of hypertext classification methods as ....

Soumen Chakrabarti, Byron E. Dom, and Piotr Indyk. Enhanced hypertext categorization using hyperlinks. In Laura M. Haas and Ashutosh Tiwary, editors, Proceedings of the ACM International Conference on Management of Data, SIGMOD-98, pages 307-318. ACM Press, 1998.


Using Web Structure for Classifying and Describing Web.. - Glover.. (2002)   (19 citations)  (Correct)

....Eric J. Glover, Kostas Tsioutsiouliklis, Steve Lawrence, David M. Pennock, Gary W. Flake. {compuman, kt, lawrence, dpennock, flake}@research.nj.nec.com; kt@cs.princeton.edu. NEC Research Institute, 4 Independence Way, Princeton, NJ 08540; Computer Science Department, Princeton University, Princeton, NJ 08540. ABSTRACT: The structure of the web is increasingly being used to improve organization, search, and analysis of information on the web. For example, Google uses the text in citing documents (documents that link to the target document) for search. We analyze the relative utility of document ....

[Article contains additional citation context not shown here]

S. Chakrabarti, B. Dom, and P. Indyk. Enhanced hypertext categorization using hyperlinks. SIGMOD Record (ACM Special Interest Group on Management of Data), 27(2):307-318, June 1998.


Information Retrieval on the World Wide Web and.. - Barfourosh.. (2002)   (Correct)

....varying between 14-26 topics. The number of subtopics in lower levels depends on the topic and available web pages about it. There is some research that aims to automate classification and construction of these hierarchies, and some of these projects result in an acceptable level of precision [23, 39, 41, 42]. Portals are very efficient for finding common information, but they are unable to organize everything, so specific information is not nearly as easy to find [19]. This is one of the first rules to know when deciding between a crawler and a portal in order to find information. Some of the ....

....application of classification is in constructing directories in portals. Web searchers may find directories easier to use when finding some general information. Automatic construction and maintenance of such portals is one of the leading research areas in applying classification in web IR systems [23, 39, 40]. 4.1.1 Naive Bayes classifier. In this model of learning [7, 44] it is assumed that text documents are generated from a parametric model. This model estimates the model parameters from training data. For each new document, using the estimated parameters and Bayes' rule, the classifier calculates ....

S. Chakrabarti, B. Dom and P. Indyk, Enhanced hypertext categorization using hyperlinks, Proceedings of ACM SIGMOD, 1998.
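
The Naive Bayes context above describes two steps: estimating model parameters from training data, then applying Bayes' rule to each new document. The sketch below illustrates those steps with a multinomial model and add-one smoothing. Both of those choices, along with the function names and the four toy documents, are assumptions for illustration; the excerpt does not say which variant the citing paper uses.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """Estimate parameters: per-class document counts and per-class word counts."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in labeled_docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, word_counts, vocabulary


def classify(tokens, model):
    """Apply Bayes' rule in log space and return the most probable class."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in class_counts.items():
        score = math.log(doc_count / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in tokens:
            if token not in vocabulary:
                continue  # ignore words never seen in training
            # Add-one smoothing (an assumption) avoids log(0) for unseen pairs.
            count = word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training set, made up for illustration.
training_docs = [
    ("web crawler index".split(), "search"),
    ("crawler index ranking".split(), "search"),
    ("schema xml query".split(), "database"),
    ("xml query index".split(), "database"),
]
model = train_naive_bayes(training_docs)
predicted = classify("crawler ranking".split(), model)
```

Working in log space keeps the product of many small word probabilities from underflowing, which matters once documents have more than a handful of tokens.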


Searching the Web - Arasu, Cho, Garcia-Molina, Paepcke.. (2001)   (6 citations)  (Correct)

....of) Web pages. The user can browse the hierarchy to locate the information that he is interested in. These hierarchies are typically manually compiled. The problem of automatically classifying documents based on some example classifications has been well studied in IR. Chakrabarti, Dom and Indyk [14] have extended the IR techniques to include hyperlink information. The basic idea is to use not only the words in the document but also the categories of the neighboring pages for classification. They have shown experimentally that their technique enhances the accuracy of classification. A ....

Soumen Chakrabarti, Byron Dom, and Piotr Indyk. Enhanced hypertext categorization using hyperlinks. In Proceedings of the International Conference on Management of Data, 1998.
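
The context above summarizes the basic idea as using both the words in a document and the categories of its neighboring pages. One simple way to picture that combination is sketched below: a text-only class probability is multiplied by the likelihood of each neighbor's category given the candidate class, then renormalized. This is an illustrative simplification and not the model from the cited paper; the function name, the independence assumption across neighbors, and all of the probabilities are hypothetical.

```python
import math


def combine_text_and_neighbors(text_probs, neighbor_labels, link_probs):
    """Combine text evidence with neighbor-category evidence for one page.

    text_probs: class -> probability from a text-only classifier.
    neighbor_labels: known or predicted categories of neighboring pages.
    link_probs: class -> {neighbor class -> probability of linking to it}.
    Neighbors are treated as independent given the class (an assumption).
    """
    log_scores = {}
    for label, p_text in text_probs.items():
        log_score = math.log(p_text)
        for neighbor_label in neighbor_labels:
            log_score += math.log(link_probs[label][neighbor_label])
        log_scores[label] = log_score
    # Normalize back to probabilities, subtracting the max for stability.
    max_score = max(log_scores.values())
    unnormalized = {l: math.exp(s - max_score) for l, s in log_scores.items()}
    total = sum(unnormalized.values())
    return {l: v / total for l, v in unnormalized.items()}


# Hypothetical numbers: text alone slightly favors "politics".
text_probs = {"sports": 0.45, "politics": 0.55}
link_probs = {
    "sports": {"sports": 0.8, "politics": 0.2},
    "politics": {"sports": 0.3, "politics": 0.7},
}
posterior = combine_text_and_neighbors(
    text_probs, ["sports", "sports", "politics"], link_probs
)
```

With these made-up numbers, the text-only classifier leans toward "politics", but two of the three neighbors are "sports" pages, so the combined estimate flips to "sports". With an empty neighbor list the function returns the text-only probabilities unchanged.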


Schema Extraction and Levelization for XML Data - Kim   (Correct)

.... is of increasing importance. The extraction of the schema from a semistructured data follows either unsupervised categorization or supervised categorization. The former we call clustering, the latter classification. Many researchers focus on classification of semistructured data or documents [4, 10]. One example of classification is summarization of documents [8]. The approach presented in this paper is instead to cluster data. In general, semistructured data sets are clustered according to XML elements. After a schema implicit in the cluster of semistructured data is extracted, ....

S. Chakrabarti, B. Dom, and P. Indyk. Enhanced hypertext categorization using hyperlinks. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 307-318, 1998.


Background Readings for Collection Synthesis - Bibliography (2002)   (Correct)

....used. Northern Light (northernlight.com) is cited in [40] as organizing its folders according to these categories: source (sites, domains), type (personal page, product review), language, and subject. Other research in hypertext classifiers and feature extraction is described by Chakrabarti [29, 32]. This work computes discrimination values of words found in documents linking to and linked from a starting set of documents. Reference is made to Salton for the discrimination analysis. Stata [101] describes the usefulness of a very fast term vector database in topic distillation and Web page ....

S. Chakrabarti, B. Dom, and P. Indyk. Enhanced hypertext categorization using hyperlinks. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 307--318, Seattle Wa., 1998.


Evaluating Strategies for Similarity Search on the Web - Haveliwala, Gionis, Klein.. (2002)   (20 citations)  Self-citation (Indyk)   (Correct)

....et al. [12] propose algorithms, which we discussed in Sections 1 and 5.1, for finding related pages based on the connectivity of the Web only and not on the text of pages. The idea of using hyperlink text for document representation has been exploited in the past to attack a variety of IR problems [1, 3, 8, 9, 11, 18]. The novelty of our paper, however, consists in the fact that we do not make any a priori assumption about what are the best features for document representation. Rather, we develop an evaluation methodology that .... [Footnote: For display, the terms were unstemmed with the most commonly occurring variant.]

S. Chakrabarti, B. Dom, and P. Indyk. Enhanced Hypertext Categorization Using Hyperlinks. Proceedings of SIGMOD, 1998.


The Structure of Broad Topics on the Web - Chakrabarti, Joshi, Punera.. (2002)   (13 citations)  Self-citation (Chakrabarti)   (Correct)

....concerned about the absolute accuracy (although larger accuracy is obviously better). By using a heldout data set, we can estimate which topics are frequently confused by our classifier, and suitably discount such errors when interpreting collected data. Recent hypertext classification algorithms [8] may reduce the error by using hyperlink features, but this may produce misleading numbers: first the classifier uses link structure to estimate classes, then we correlate the class with link structure; the results are likely to show artificial coupling between text and link features. A ....

....Bayesian text classifiers build a class conditional estimate of document probability, Pr(d | c), in terms of the textual tokens appearing in d. Pages on the Web are not isolated entities, and the (estimated) topics of the neighbors N(u) of page u may lend valuable evidence to estimate the topic of u [8, 19]. Thus we need to estimate a joint distribution for Pr(c(u), c(N(u))), which is a direct application of the topic citation matrix. Enhanced focused crawling: A focused crawler is given a topic taxonomy, a trained topic classifier, and sample URLs from a specific topic. The goal is to ....

S. Chakrabarti, B. Dom, and P. Indyk. Enhanced hypertext categorization using hyperlinks. In SIGMOD Conference. ACM, 1998. Online at http://www.cs.berkeley.edu/~soumen/sigmod98.ps.


Accelerated Focused Crawling through Online Relevance.. - Chakrabarti, Punera.. (2002)   (14 citations)  Self-citation (Chakrabarti)   (Correct)

....than two random pages, and that HREFs close together on a page link to documents which are more similar than targets which are far apart. Menczer has made similar observations [23]. The HyperClass hypertext classifier also uses such locality patterns for better semisupervised learning of topics [7], as does IBM's Automatic Resource Compilation (ARC) and Clever topic distillation systems [6, 8]. Two important advances have been made beyond the baseline best first focused crawler: the use of context graphs by Diligenti et al. [14] and the use of reinforcement learning by Rennie and McCallum ....

S. Chakrabarti, B. Dom, and P. Indyk. Enhanced hypertext categorization using hyperlinks. In SIGMOD Conference. ACM, 1998. Online at http://www.cs.berkeley.edu/~soumen/sigmod98.ps.


CiteSeer.IST - Copyright Penn State and NEC