Abstract: We investigate four hierarchical clustering methods (single-link, complete-link, groupwise-average, and single-pass) and two linguistically motivated text features (noun phrase heads and proper names) in the context of document clustering. A statistical model for combining similarity information from multiple sources is described and applied to DARPA's Topic Detection and Tracking phase 2 (TDT2) data. This model, based on log-linear regression, alleviates the need for extensive search in order...
Similar documents based on text:
0.3: Categorizing Web Queries According to - Geographical Locality Luis (2003)
0.2: Characterizing Web Resources for Improved Search - Position Paper - Gravano (2000)
0.1: SIMFINDER: A Flexible Clustering Tool for Summarization - Hatzivassiloglou.. (2001)
BibTeX entry:
Vasileios Hatzivassiloglou, Luis Gravano, and Ankineedu Maganti. 2000. An investigation of linguistic features and clustering algorithms for topical document clustering. In Proceedings of the 23rd Annual ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR-00), pages 224--231, Athens, Greece, July. http://citeseer.ist.psu.edu/hatzivassiloglou00investigation.html
@inproceedings{ hatzivassiloglou00investigation,
author = "Vasileios Hatzivassiloglou and Luis Gravano and Ankineedu Maganti",
title = "An investigation of linguistic features and clustering algorithms for topical document clustering",
booktitle = "SIGIR 2000",
pages = "224-231",
year = "2000",
url = "citeseer.ist.psu.edu/hatzivassiloglou00investigation.html" }
Citations (may not include all citations):
463: Term weighting approaches in automatic text retrieval - Salton, Buckley (1988)
282: Finding Groups in Data: An Introduction to Cluster Analysis - Kaufman, Rousseeuw (1990)
243: Information Retrieval: Data Structures and Algorithms - Frakes, Baeza-Yates (1992)
166: A re-examination of text categorization methods - Yang, Liu (1999)
93: Integration of heterogeneous databases without common domain.. - Cohen (1998)
63: Nonlinear Regression Analysis and its Applications - Bates, Watts (1988)
59: Reexamining the cluster hypothesis: Scatter/Gather on retrie.. - Hearst, Pedersen (1996)
40: The Statistical Analysis of Discrete Data - Santner, Duffy (1989)
38: A study on retrospective and on-line event detection - Yang, Pierce et al. (1998)
31: Progress in the application of natural language processing t.. - Smeaton (1992)
27: Disambiguation of proper names in text - Wacholder, Ravin et al. (1997)
26: Towards multidocument summarization by reformulation: Progre.. - McKeown, Klavans et al. (1999)
16: the application of syntactic methodologies in automatic text.. - Salton, Smith (1989)
15: MITRE: Description of the Alembic system as used for MUC - Aberdeen, Burger et al. (1995)
12: Interpreting nominal compounds for information retrieval - Gay, Croft (1990)
9: Text-based approaches for the categorization of images - Sable, Hatzivassiloglou (1999)
8: The beta-binomial mixture model and its application to TDT t.. - Lowe (1999)
7: Simplex NPs clustered by head: A method for identifying sign.. - Wacholder (1998)
6: UMass approaches to detection and tracking at TDT - Papka, Allan et al. (1999)
3: Available from http://www - of, Technology et al. (1998)
1: Topic Detection and Tracking Principal Investigators meeting - Liberman (1998)
Documents on the same site (http://www.cs.columbia.edu/~gravano/publications.html):
Metadata for Digital Libraries: Architecture and.. - Baldonado, Chang.. (1997)
Requirements for Deadlock-Free, Adaptive Packet Routing - Cypher, Gravano (1992)
The Stanford Digital Library Metadata Architecture - Baldonado, Chang, Gravano.. (1997)
CiteSeer.IST - Copyright Penn State and NEC