24 citations found.
S. Chakrabarti. Data mining for hypertext: a tutorial survey. ACM SIGKDD Explorations Newsletter, 1(2):1--11, January 2000.


This paper is cited in the following contexts:
Usage Mining for and on the Semantic Web - Stumme, Hotho

....A distinction is generally made between Web mining that operates on the Web resources themselves (often further differentiated into content and structure mining) and mining that operates on visitors' usage of these resources. Web content mining is a form of text mining (for recent overviews, see [10, 45]). It concentrates on the content of individual pages, which is contained in the HTML, script, etc. code that generates a page. It can therefore take advantage of the semistructured nature of these types of text. The hyperlink structure between those pages is, on the one hand, a structure over and ....

S. Chakrabarti. Data mining for hypertext: A tutorial survey. SIGKDD Explorations, 1:1--11, 2000.


Towards Semantic Web Mining - Berendt, Hotho, Stumme (2002)   (10 citations)

....from human understandable content to machine understandable semantics. Three areas of Web mining are commonly distinguished: content mining, structure mining, and usage mining (see Fig. 1). Content text of Web pages. Web content mining is a form of text mining (for an overview, see [3]). The primary Web resource that is being mined is an individual page. Web content mining can take advantage of the semi-structured nature of Web page text. The HTML tags of today's Web pages, and even more so the XML markup of tomorrow's Web pages, bear information that concerns not only layout, ....

S. Chakrabarti. Data mining for hypertext: A tutorial survey. SIGKDD Explorations, 1:1-11, 2000.


Mining Newsgroups Using Networks Arising From Social.. - Agrawal, Rajagopalan.. (2003)   (4 citations)

....Network 1. INTRODUCTION Information retrieval has recently witnessed remarkable advances, fueled almost entirely by the growth of the web. The fundamental feature distinguishing recent forms of information retrieval from the classical forms [25] is the pervasive use of link information [3] [5]. Drawing upon the success of search engines that use linkage information extensively, we study the feasibility of extending link-based methods to new application domains. Interactions between individuals have two components: Copyright is held by the author/owner(s). WWW2003, May 20-24, 2003, ....

S. Chakrabarti. Data mining for hypertext: A tutorial survey. SIGKDD Explorations, 1, 2000.


Hyperlink Ensembles: A Case Study in Hypertext Classification - Fürnkranz (2001)

.... 7] Prerequisite for this type of research is the availability of extended document indexing techniques that allow fast access to both outgoing and incoming hyperlinks of a page [6]. An excellent overview of recent research in the area, which is nowadays referred to as web mining, can be found in [11]. Not surprisingly, the potential of hyperlinks as additional information sources for text categorization tasks has also been looked at in recent research. Previous work on hypertext categorization in one way or another addressed this problem by merging (parts of) the text of the predecessor pages ....

Soumen Chakrabarti. Data mining for hypertext: A tutorial survey. SIGKDD explorations, 1(2):1--11, January 2000.


Frequent Term-Based Text Clustering - Beil, Ester (2002)   (5 citations)

....advantage in today's information society. To structure large sets of hypertexts available in a company's intranet, again methods of text clustering can be used. Compared to previous applications of clustering, three major challenges must be addressed for clustering (hyper)text databases (see also [1]): Very high dimensionality of the data (10,000 terms/dimensions): this requires the ability to deal with sparse data spaces or a method of dimensionality reduction. Very large size of the databases (in particular, of the world wide web): therefore, the clustering algorithms must be very ....

Chakrabarti S.: Data mining for hypertext: A tutorial survey, ACM SIGKDD Explorations, 2000, pp. 1-11.


An Approach to Build a Cyber-Community Hierarchy - Reddy, Kitsuregawa (2002)   (1 citation)

....to identify the patterns in collections. Also, most of the search engines perform both link as well as text analysis to improve the quality of search results. Based on link analysis many researchers proposed schemes [17, 18, 19, 20, 21, 22, 23] to find related information from the Web. Chakrabarti [24] provides a survey of the research works in the area of hypertext mining. Community detection: The principal component of the cocitation analysis measures the number of documents that have cited a given pair of documents together. In [25] a cocitation analysis technique [26] has been extended ....

S.Chakrabarti, Data mining for hypertext: A tutorial survey, ACM SIGKDD Explorations, 1(2), pp. 1-11, 2000.
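
The cocitation measure described in the Reddy and Kitsuregawa excerpt above (the number of documents that cite a given pair of documents together) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function name `cocitation_counts`, the input shape, and the toy citation map are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citations):
    """Count, for every pair of cited documents, how many documents cite both.

    `citations` maps a citing document to the documents it cites
    (hypothetical input shape chosen for this sketch).
    """
    counts = Counter()
    for cited in citations.values():
        # De-duplicate and sort so each unordered pair has one canonical key.
        for pair in combinations(sorted(set(cited)), 2):
            counts[pair] += 1
    return counts


# Toy data: three citing papers and the documents they reference.
toy_citations = {
    "p1": ["a", "b", "c"],
    "p2": ["a", "b"],
    "p3": ["b", "c"],
}
toy_counts = cocitation_counts(toy_citations)
```

In the toy data, documents "a" and "b" are cited together by "p1" and "p2", so their cocitation count is 2. Pairs that never appear together simply have a count of 0.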


C4-1: Building a community hierarchy for the Web based on.. - Reddy, Kitsuregawa (2002)

....to identify patterns in collections. Also, most of the search engines perform both link as well as text analysis to improve the quality of search results. Based on link analysis many researchers proposed schemes [16, 17, 18, 19, 20, 21, 22] to find related information from the Web. Chakrabarti [23] surveys research works in the area of hypertext mining. Community detection: In [24] cocitation analysis technique [25] has been extended to cluster the Web pages by considering that a hyperlink provides a semantic linkage between pages in the same manner that citations link documents to ....

S.Chakrabarti, Data mining for hypertext: A tutorial survey, ACM SIGKDD Explorations, 1(2), pp. 1-11, 2000.


Information Retrieval on the World Wide Web and.. - Barfourosh.. (2002)

.... impressive job in extending their reach, though web growth itself has exceeded the crawling ability of search engines [5, 6]. Even the largest popular search engines, such as Alta Vista and HotBot, index less than 18% of the accessible web as of February 1999 [6], down from 35% in late 1997 [7]. Figure 2 shows the percentage of the web coverage by the end of 2001. Today Google is probably the biggest search engine, and has gathered more than 2 billion pages and covers only about 40 percent of the publicly available web pages. Internet Host Number (Source: Internet Software Consortium ....

....in the vector model space are the weight of the words in document. Latent Semantic Indexing (LSI) is a variation of the vector space model of information retrieval that uses techniques of singular value decomposition (SVD) to reduce the dimensionality of the vector space. Soumen Chakrabarti [7] This method considers the co-occurrence of words in a document, so it can derive a relationship between words and inherent concepts. The similarity of two documents is calculated by the cosine of the reduced vector space of two documents. Since this approach uses conceptual matching rather than ....

[Article contains additional citation context not shown here]

S. Chakrabarti, Data mining for hypertext: A tutorial survey, SIGKDD: SIGKDD Explorations: Newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining, ACM 1(2): 1-11, 2000.


Exploiting Confusion Matrices for Automatic Generation of Topic.. - Godbole (2002)   (1 citation)

....grouping of classes enables us to use SVMs for higher accuracy at the level of smaller groups of classes (level 2 classifiers in the experimental section). Using Naive Bayes classifiers itself gives us quite a boost in accuracy. Consider the multinomial model for Naive Bayesian text classification [Cha00]: $\Pr(d \mid c) = \binom{\ell_d}{\{n(d,t)\}} \prod_{t \in d} \theta_{c,t}^{n(d,t)}$. Here $\theta_{c,t} = \frac{1 + \sum_{d \in D_c} n(d,t)}{|T| + \sum_{d \in D_c} \sum_{\tau} n(d,\tau)}$. We note here that the parameter smoothing quantity $\theta_{c,t}$ is dependent on the number of word occurrences in documents of a class c and the total size of the vocabulary T. In breaking up the classification ....

Soumen Chakrabarti. Data mining for hypertext: A tutorial survey. In ACM SIGKDD Explorations, 1(2), 2000.
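
The multinomial Naive Bayes model quoted in the Godbole excerpt above can be illustrated with a small Python sketch. It assumes the standard Laplace-smoothed estimate (one added to each term count, vocabulary size added to the denominator) and drops the multinomial coefficient, which is the same for every class and so does not affect the chosen class. The function names, the input shape, and the toy data are hypothetical and are not taken from the cited papers.

```python
import math
from collections import Counter


def train_multinomial_nb(docs_by_class):
    """Fit class priors and Laplace-smoothed term probabilities.

    `docs_by_class` maps a class label to a list of token lists
    (hypothetical input shape chosen for this sketch).
    """
    vocab = {t for docs in docs_by_class.values() for d in docs for t in d}
    total_docs = sum(len(docs) for docs in docs_by_class.values())
    model = {}
    for label, docs in docs_by_class.items():
        counts = Counter(t for d in docs for t in d)
        # Denominator: vocabulary size plus all term occurrences in the class.
        denom = len(vocab) + sum(counts.values())
        log_theta = {t: math.log((1 + counts[t]) / denom) for t in vocab}
        model[label] = (math.log(len(docs) / total_docs), log_theta)
    return model


def classify(model, tokens):
    """Return the class with the highest log prior plus log likelihood."""
    def score(label):
        log_prior, log_theta = model[label]
        # Terms never seen in training are skipped in this sketch.
        return log_prior + sum(log_theta[t] for t in tokens if t in log_theta)
    return max(model, key=score)


# Toy training data, invented for illustration only.
toy_training = {
    "sports": [["goal", "match", "team"], ["team", "win"]],
    "tech": [["code", "bug"], ["code", "release", "team"]],
}
toy_model = train_multinomial_nb(toy_training)
```

Working in log space avoids numeric underflow when documents are long, and the smoothed probabilities for each class sum to one over the vocabulary.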


Document Classification as an Internet service: Choosing the best .. - Godbole (2001)

....in section [3.3]. In our particular setting of document classification across multiple sites, we make no assumptions about the kinds of individual classifiers used. For evaluation purposes, however, we use a Naive Bayes text classifier using the multinomial model [2] at each of the sites. We impose a small requirement upon each of the classifiers. Along with the predictions on the test instance and all the evidences, the classifier should provide us with a class probability vector. In a practical setting, evidence selection is left to the site itself and it ....

....Navigator, nbsp, quot. 3.2 Experimental setting The experiments were conducted in a simulated distributed site setting. Different classifiers representing different sites were run on the same machine. The classifiers used the Naive Bayes method of text classification using the multinomial model [2]. For all classification purposes, the training and test data were obtained after a standard 70/30 split. The 70% of the data was then split across all the participating sites. All experimental test measurements as reported in section [3.3] were those which were obtained on the remaining 30% of the ....

[Article contains additional citation context not shown here]

Soumen Chakrabarti. Data mining for hypertext - a tutorial survey. In SIGKDD Explorations - Volume 1, Issue 2, pages 1 -- 11, Jan 2000.


Web Mining Research: A Survey - Kosala, Blockeel (2000)   (36 citations)

....search service when they want to find specific information on the Web. When a user uses search service he or she usually inputs a simple keyword query and the query response is the list of pages ranked based on their similarity to the query. However today's search tools have the following problems [23]. The first problem is low precision, which is due to the irrelevance of many of the search results. This results in a difficulty finding the relevant information. The second problem is low recall, which is due to the inability to index all the information available on the Web. This results in a ....

....in a special issue of the Machine Learning journal. They also indicate some fertile research areas for both communities. Garofalakis et al. [55] review some data mining techniques and the algorithms for Web mining that specifically take into account the hyperlink information. Chakrabarti [23] provides a survey of data mining for hypertext. His paper mainly surveys the statistical techniques for Web content across the continuum of supervised, semi-supervised and unsupervised learning, and social network analysis techniques for Web structure mining. Levy and Weld [84] wrote a survey in ....

S. Chakrabarti. Data mining for hypertext: A tutorial survey. ACM SIGKDD Explorations, 1(2):1-11, 2000.


Web Mining and Knowledge Discovery of Usage Patterns - Wang (2000)

....structure between the documents for document representation. As for the database view, in order to have the better information management and querying on the Web, the mining always tries to infer the structure of the Web site or to transform a Web site to become a database. S. Chakrabarti [19] provides an in-depth survey of the research on the application of the techniques from machine learning, statistical pattern recognition, and data mining to analyzing hypertext. It's a good resource to be aware of the recent advances in content mining research. 6 Multimedia data mining is part of ....

S. Chakrabarti. Data Mining for hypertext: A tutorial survey


Emergent Semantics Systems - Aberer, Catarci.. (2004)

No context found.

S. Chakrabarti. Data mining for hypertext: a tutorial survey. ACM SIGKDD Explorations Newsletter, 1(2):1--11, January 2000.


Part-Of-Speech Enhanced Context Recognition - Rasmus Elsborg (2004)

No context found.

S. Chakrabarti, "Data mining for hypertext: a tutorial survey," SIGKDD: SIGKDD Explorations: Newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining, ACM, vol. 1, pp. 1-- 11, 2000.


Emergent Semantics Systems - Aberer, Catarci, Cudre-Mauroux..

No context found.

S. Chakrabarti. Data mining for hypertext: a tutorial survey. ACM SIGKDD Explorations Newsletter, 1(2):1--11, January 2000.


Emergent Semantics Principles and Issues - Karl Aberer Philippe

No context found.

S. Chakrabarti. Data mining for hypertext: a tutorial survey. ACM SIGKDD Explorations Newsletter, 1(2):1--11, January 2000.


Pruning the Vocabulary for Better Context Recognition - Madsen, Sigurdsson, Hansen, .. (2004)

No context found.

S. Chakrabarti. Data mining for hypertext: a tutorial survey. SIGKDD: SIGKDD Explorations: Newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining, ACM, 1:1--11, 2000.


Web Mining - Fürnkranz (2004)   (1 citation)

No context found.

S. Chakrabarti. Data mining for hypertext: A tutorial survey. SIGKDD explorations, 1(2):1--11, January 2000.


Extraction Techniques for Mining Services from Web Sources - Davulcu, Mukherjee.. (2002)   (2 citations)

No context found.

S. Chakrabarti. Data mining for hypertext: A tutorial survey. SIGKDD: SIGKDD Explorations: Newsletter of the Special Interest Group (SIG) on Knowledge Discovery and Data Mining, ACM, 1, 2000.


A Dynamic Adaptive Self-Organising Hybrid Model for Text.. - Hung, Wermter (2003)   (1 citation)

No context found.

S. Chakrabarti, "Data mining for hypertext: a tutorial survey", ACM SIGKDD Explorations, vol. 1, no. 2, 2000, pp. 1-11.


A Self-Organising Hybrid Model for Dynamic Text Clustering - Hung, Wermter (2003)

No context found.

Chakrabarti, S. Data mining for hypertext: a tutorial survey. ACM SIGKDD Explorations, 2000, 1(2):1-11


Emergent Semantics Principles and Issues - Aberer, Cudré-Mauroux.. (2004)

No context found.

S. Chakrabarti. Data mining for hypertext: a tutorial survey. ACM SIGKDD Explorations Newsletter, 1(2):1--11, January 2000.


Reverse Engineering for Web Data: From Visual to Semantic.. - Chung, Gertz (2002)   (9 citations)

No context found.

S. Chakrabarti. Data mining for hypertext: A tutorial survey. In SIGKDD Explorations: Newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining, ACM, 1, 2000.


Hierarchical Document Clustering Using Frequent Itemsets - Fung (2003)   (3 citations)

No context found.

S. Chakrabarti. Data mining for hypertext: A tutorial survey. SIGKDD Explorations: Newsletter of the Special Interest Group (SIG) on Knowledge Discovery & Data Mining, ACM, 1:1--11, 2000.


CiteSeer.IST - Copyright Penn State and NEC