Abstract: In this paper we describe an approach to model selection in unsupervised learning. This
approach determines both the feature set and the number of clusters. To this end we first derive
an objective function that explicitly incorporates this generalization. We then evaluate two
schemes for model selection - one using this objective function (a Bayesian estimation scheme
that selects the best model structure using the marginal or integrated likelihood) and the second
based on a technique using a...
Context of citations to this paper:
...and evaluation. Finally, Section 5 describes some future work in this area. 2 Models for Unsupervised Learning In previous work [14, 13] we addressed the problem of flat (non-hierarchical) partitional (each data element belongs to one and only one cluster) clustering. Here...
...of the diversity of two distributions known as mutual information (MI). This criterion, which we will refer to as MIC, has been suggested in [21] for unsupervised clustering, and we found it rather informative. We have not chosen another criterion such as predicting occurrences of...
Cited by:
The Organisation and Retrieval of Document Collections: A.. - Vinokourov (2003)
An Information-Theoretic External Cluster-Validity Measure - Dom (2001)
A Probabilistic Framework for the Hierarchic Organisation.. - Vinokourov, Girolami (2002)
Active bibliography (related documents):
0.5: Generalized Model Selection For Unsupervised Learning In.. - Vaithyanathan, Dom (1999)
0.3: Humane Interfaces to Video - Lippman, Vasconcelos, Iyengar
0.1: Evolutionary Model Selection in Unsupervised Learning - Kim, Street, Menczer (2002)
Similar documents based on text:
0.3: Feature Weighting in k-Means Clustering - Modha, Spangler (2002)
0.2: Thumbs up? Sentiment Classification using Machine.. - Pang, Lee, Vaithyanathan (2002)
0.1: Generalized Opinion Pooling - Ashutosh Garg Jayram
Related documents from co-citation:
3: Algorithms for model-based Gaussian hierarchical clustering - Fraley - 1999
2: Elements of Information Theory - Cover, Thomas - 1991
2: A Bayesian method for the induction of probabilistic networks from data - Cooper, Herskovits - 1992
BibTeX entry:
Shivakumar Vaithyanathan and Byron Dom. Generalized model selection for unsupervised learning in high dimensions. In S. A. Solla, T. K. Leen, and K. R. Muller, editors, Proceedings of Neural Information Processing Systems. MIT Press, November 1999. http://citeseer.ist.psu.edu/vaithyanathan99generalized.html
@misc{ vaithyanathan99generalized,
author = "S. Vaithyanathan and B. Dom",
title = "Generalized model selection for unsupervised learning in high dimensions",
text = "Shivakumar Vaithyanathan and Byron Dom. Generalized model selection for
unsupervised learning in high dimensions. In S. A. Solla, T. K. Leen, and
K. R. Muller, editors, Proceedings of Neural Information Processing Systems.
MIT Press, November 1999.",
year = "1999",
url = "citeseer.ist.psu.edu/vaithyanathan99generalized.html" }
Citations (may not include all citations):
2528: Maximum Likelihood from Incomplete Data Via the EM Algorithm - Dempster - 1977
2319: Elements of Information Theory - Cover, Thomas - 1991
2133: Pattern Classification and Scene Analysis - Duda, Hart - 1973
568: Indexing by Latent Semantic Analysis - Deerwester, Dumais et al. - 1990
475: Estimating the Dimension of A Model - Schwarz - 1978
417: Stochastic Complexity in Statistical Inquiry - Rissanen - 1989
340: Bayesian Theory - Bernardo, Smith - 1994
113: Gather: A Cluster-based Approach to Browsing Large Document .. - Cutting - 1992
97: Pivoted Document Length Normalization - Singhal, Buckley et al. - 1996
80: Learning to Classify Text from Labeled and Unlabeled Documen.. - Nigam - 1998
58: Distributional Clustering of Words for Text Classification - Baker, McCallum - 1998
54: Efficient Approximations for the Marginal Likelihood of Baye.. - Chickering, Heckerman - 1997
34: Clustering using Monte Carlo cross-validation - Smyth - 1996
29: SONIA: A Service for Organizing Networked Information Autono.. - Sahami - 1998
27: An Experimental Comparison of Several Clustering and Initial.. - Meila, Heckerman
20: Distribution of content words and phrases in text and langua.. - Katz - 1996
13: Model Selection in Unsupervised Learning with Applications t.. - Vaithyanathan, Dom - 1998
12: Natural Language Engineering - Church, Gale - 1995
7: Unsupervised classification with stochastic complexity - Rissanen, Ristad - 1992
3: Clustering images using relative entropy for efficient retri.. - Iyengar - 1998
Documents on the same site (http://www.almaden.ibm.com/cs/k53/ir.html):
Inferring Web Communities from Link Topology - Gibson, Kleinberg, Raghavan (1998)
Enhanced Hypertext Categorization Using Hyperlinks - Chakrabarti, Dom, Indyk (1998)
CiteSeer.IST - Copyright Penn State and NEC