
TopCat: Data Mining for Topic Identification in a Text Corpus (2002)  (9 citations)
Chris Clifton, Robert Cooley, Jason Rennie
Principles of Data Mining and Knowledge Discovery




 
View or download:
mit.edu/~jrennie/p...cattkde2000.ps.gz
mit.edu/people/jre...cattkde2000.ps.gz

From:  mit.edu/~jrennie/publications


Abstract: TopCat (Topic Categories) is a technique for identifying topics that recur in articles in a text corpus. Natural language processing techniques are used to identify key entities in individual articles, allowing us to represent an article as a set of items. This allows us to view the problem in a database/data mining context: identifying related groups of items. This paper presents a novel method for identifying related items based on "traditional" data mining techniques. Frequent itemsets are...
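
The abstract is cut off before it describes the method, so the following is only a minimal sketch of the general idea it names: treat each article as a set of entities and look for groups of entities that co-occur in several articles (frequent itemsets). The function name, the entity names, and the support threshold are all hypothetical, and the brute-force enumeration stands in for whatever mining algorithm the paper actually uses.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(articles, min_support, max_size=3):
    """Count every entity combination up to max_size and keep those
    that occur in at least min_support articles (brute force)."""
    counts = Counter()
    for items in articles:
        ordered = sorted(items)  # fixed order so each combination is counted once
        for size in range(1, max_size + 1):
            for combo in combinations(ordered, size):
                counts[frozenset(combo)] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}


# Hypothetical articles, each already reduced to its set of named entities.
articles = [
    {"Clinton", "Lewinsky", "Starr"},
    {"Clinton", "Starr", "Congress"},
    {"Iraq", "UN", "Clinton"},
    {"Iraq", "UN", "Baghdad"},
]

result = frequent_itemsets(articles, min_support=2)
for itemset, n in sorted(result.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(itemset), n)
```

With this toy input, the pairs {Clinton, Starr} and {Iraq, UN} each appear in two articles and survive the threshold; such co-occurring entity groups are the kind of "related items" a topic could be built from.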

Cited by:
A Simple Algorithm for Topic Identification in 0-1 Data - Seppänen, Bingham, Mannila (2003)
A Fast Fixed-Point Algorithm For Independent Component.. - Bingham, Hyvärinen (2000)
Using Generic Corpora to Learn Domain-Specific Terminology - Vogel (2003)

Similar documents (at the sentence level):
25.0%:   TopCat: Data Mining for Topic Identification in a Text Corpus - Clifton, Cooley (1999)

Active bibliography (related documents):
0.4:   Web Usage Mining: Discovery and Application of Interestin.. - Cooley (2000)
0.4:   Algorithms for Association Rule Mining - A General.. - Hipp, Güntzer.. (2000)
0.3:   Incremental Mining of Constrained Associations - Thomas, Chakravarthy (1998)

Similar documents based on text:
0.6:   GeoNODE: An End-to-End System from Research Components - Clifton, Griffith, Holland (1991)
0.3:   MITRE TDT-2000 Segmentation System - Greiff, Morgan, Fish, Richards.. (2000)
0.3:   Multilingual Topic Detection Using A Parallel Corpus - Lam, Meng, Hui (2000)

Related documents from co-citation:
4:   part I: An adaptive algorithm based on neuromimetic architecture - Jutten, Herault et al. - 1991
4:   A generative model for sparse discrete binary data with non-uniform categorical .. - Girolami - 2000
4:   Similarity of attributes by external probes - Das, Mannila et al. - 1997

BibTeX entry:

Clifton, C. and Cooley, R., TopCat: data mining for topic identification in a text corpus. In Proceedings of the 3rd European Conference of Principles and Practice of Knowledge Discovery in Databases, Prague, Czech Republic, 1999. http://citeseer.ist.psu.edu/article/clifton02topcat.html

@inproceedings{ clifton99topcat,
    author = "Chris Clifton and Robert Cooley",
    title = "TopCat: Data Mining for Topic Identification in a Text Corpus",
    booktitle = "Principles of Data Mining and Knowledge Discovery",
    pages = "174-183",
    year = "1999",
    url = "citeseer.ist.psu.edu/article/clifton02topcat.html" }
Citations (may not include all citations):
1256   Introduction to Modern Information Retrieval - Salton, McGill - 1983
921   Mining association rules between sets of items in large data.. - Agrawal, Imielinski et al. - 1993
376   Text categorization with support vector machines: Learning w.. - Joachims - 1998
268   Mining generalized association rules - Srikant, Agrawal - 1995
215   A comparative study on feature selection in text categorizat.. - Yang, Pedersen - 1997
202   Introduction to wordnet: an on-line lexical database - Miller, Fellbaum et al. - 1990
162   Multilevel hypergraph partitioning: Applications in VLSI dom.. - Karypis, Aggarwal et al. - 1997
54   Automatic structuring and retrieval of large text files - Salton, Allan et al. - 1994
51   Word sense disambiguation using conceptual density - Agirre, Rigau - 1996
44   Natural language processing for information retrieval - Lewis, Jones - 1996
43   Beyond market baskets: Generalizing association rules to dep.. - Silverstein, Brin et al. - 1998
39   Fast and intuitive clustering of web documents - Zamir, Etzioni et al. - 1997
28   Discovering trends in text databases - Lent, Agrawal et al. - 1997
27   Mixed initiative development of language processing systems - Day, Aberdeen et al. - 1997
20   Retrieval performance in FERRET: A conceptual information re.. - Mauldin - 1991
16   A WordNet-based algorithm for word sense disambiguation - Li, Szpakowicz et al. - 1995
15   Maximal association rules: a new tool for mining for keyword.. - Feldman, Aumann et al. - 1997
13   Language-oriented information retrieval - Lewis, Croft et al. - 1989
11   A method for word sense disambiguation of unrestricted text - Mihalcea, Moldovan - 1999
11   Exploiting background information in knowledge discovery fro.. - Feldman, Hirsh - 1998
11   Clustering based on association rule hypergraphs - Sam, George et al. - 1997
10   Generating association rules from semi-structured documents .. - Singh, Scheuermann et al. - 1997
9   Information extraction as a basis for high-precision text cl.. - Riloff, Lehnert - 1994
7   Computing Analysis of Present-day American English - Francis, Kucera - 1967
7   Automated Library and Information Systems - Porter, for et al. - 1980
6   An efficient algorithm for the incremental updation of associat.. - Thomas, Bodagala et al. - 1997
5   Efficient algorithms for discovering frequent sets in increment.. - Feldman, Aumann et al. - 1997
5   GeoNODE: Visualizing news in geospatial context - Hyland, Clifton et al. - 1999
4   Query flocks: A generalization of association rule mining - Tsur, rey et al. - 1998
4   European Conference on Machine Learning Workshop on Text Min.. - Kodratoff - 1998
4   ICML-99 Workshop on Machine Learning in Text Data Analysis - Mladenić, Marko et al. - 1999
4   IJCAI'99 Workshop on Text Mining - Feldman, Hirsh - 1999
3   Classification of news stories using support vector machines - Cooley - 1999
3   classification and signature generation for organizing large t.. - Chakrabarti, Dom et al. - 1998
1   tdt2 dec98 official results 19990204/index - detection, phase et al.
1   1st European Symposium on Principles of Data Mining and Knowl.. - Ahonen, Heinonen et al. - 1997





Documents on the same site (http://www.ai.mit.edu/~jrennie/publications.html):
A Machine Learning Approach to Building.. - McCallum, Nigam.. (1999)
Automating the Construction of Internet Portals with.. - McCallum, Nigam.. (2000)
ifile: An Application of Machine Learning to E-Mail Filtering - Rennie (2000)
