An Investigation of Linguistic Features and Clustering Algorithms for Topical Document Clustering (2000)
Vasileios Hatzivassiloglou, Luis Gravano, Ankineedu Maganti
SIGIR 2000



Abstract: We investigate four hierarchical clustering methods (single-link, complete-link, groupwise-average, and single-pass) and two linguistically motivated text features (noun phrase heads and proper names) in the context of document clustering. A statistical model for combining similarity information from multiple sources is described and applied to DARPA's Topic Detection and Tracking phase 2 (TDT2) data. This model, based on log-linear regression, alleviates the need for extensive search in order...
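
Illustrative sketch (not from the paper): the three agglomerative criteria named in the abstract differ only in how they score a pair of clusters. The Python below is a minimal sketch under stated assumptions: pairwise document similarities are already available as a symmetric matrix, merging stops at a fixed similarity threshold, and "groupwise-average" is read as the mean over cross-cluster pairs. The function names, the threshold, and the toy matrix are hypothetical, and single-pass clustering is not covered.

```python
from itertools import combinations


def cluster_similarity(a, b, sim, linkage):
    """Score two clusters (lists of document indices) under one linkage rule."""
    pair_sims = [sim[i][j] for i in a for j in b]
    if linkage == "single":    # most similar cross-cluster pair
        return max(pair_sims)
    if linkage == "complete":  # least similar cross-cluster pair
        return min(pair_sims)
    if linkage == "average":   # mean over all cross-cluster pairs (assumed reading)
        return sum(pair_sims) / len(pair_sims)
    raise ValueError(f"unknown linkage: {linkage}")


def agglomerate(sim, linkage, threshold):
    """Merge the most similar clusters until none reaches the threshold."""
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > 1:
        # Score every pair of current clusters and keep the best one.
        scored = [
            (cluster_similarity(clusters[x], clusters[y], sim, linkage), x, y)
            for x, y in combinations(range(len(clusters)), 2)
        ]
        best, x, y = max(scored)
        if best < threshold:
            break
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Hypothetical similarity matrix for four documents.
SIM = [
    [1.0, 0.9, 0.4, 0.1],
    [0.9, 1.0, 0.6, 0.2],
    [0.4, 0.6, 1.0, 0.7],
    [0.1, 0.2, 0.7, 1.0],
]

if __name__ == "__main__":
    for rule in ("single", "complete", "average"):
        print(rule, agglomerate(SIM, rule, threshold=0.5))
```

On this toy matrix, single-link chains all four documents into one cluster, while complete-link and the average rule stop at two clusters. This is the usual qualitative difference between the criteria and says nothing about the paper's reported results.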

BibTeX entry:

Vasileios Hatzivassiloglou, Luis Gravano, and Ankineedu Maganti. 2000. An investigation of linguistic features and clustering algorithms for topical document clustering. In Proceedings of the 23rd Annual ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR-00), pages 224--231, Athens, Greece, July. http://citeseer.ist.psu.edu/hatzivassiloglou00investigation.html

@inproceedings{hatzivassiloglou00investigation,
    author = "Vasileios Hatzivassiloglou and Luis Gravano and Ankineedu Maganti",
    title = "An investigation of linguistic features and clustering algorithms for topical document clustering",
    booktitle = "Proceedings of the 23rd Annual ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR-00)",
    pages = "224--231",
    year = "2000",
    url = "http://citeseer.ist.psu.edu/hatzivassiloglou00investigation.html"
}
