Translation-Invariant Mixture Models for Curve Clustering (2003)
| Venue: | In Proc. Ninth ACM SIGKDD Inter. Conf. on Knowledge Discovery and Data Mining, Washington D.C., August 24–27 |
| Citations: | 10 - 2 self |
BibTeX
@INPROCEEDINGS{Chudova03translation-invariantmixture,
author = {Darya Chudova and Scott Gaffney and Eric Mjolsness and Padhraic Smyth},
title = {Translation-Invariant Mixture Models for Curve Clustering},
booktitle = {Proc. Ninth ACM SIGKDD Inter. Conf. on Knowledge Discovery and Data Mining, Washington D.C., August 24--27},
year = {2003},
pages = {79--88},
publisher = {ACM Press}
}
Abstract
In this paper we present a family of algorithms that can simultaneously align and cluster sets of multidimensional curves defined on a discrete time grid. Our approach assumes that the data are being generated from a finite mixture of curve models. Each mixture component uses (a) a mean curve based on a flexible non-parametric representation, (b) additive measurement noise, (c) randomly selected discrete-valued shifts of each curve with respect to the independent variable (i.e., typically along the time axis), and (d) random real-valued offsets of each curve with respect to the observed variable. We show that the Expectation-Maximization (EM) algorithm can be used to simultaneously recover both the curve models for each cluster, and the most likely shifts, offsets, and cluster memberships for each curve. We demonstrate how Bayesian estimation methods can improve the results for small sample sizes by enforcing smoothness in the cluster mean curves. We evaluate the methodology on two real-world data sets, time-course gene expression data and storm trajectory data. Experimental results show that models that incorporate curve alignment systematically provide improvements in predictive power on test data sets. The proposed approach provides a non-parametric, computationally efficient, and robust methodology for clustering broad classes of curve data.
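As a rough illustration of the kind of computation the abstract describes, the sketch below evaluates a joint posterior over cluster membership and discrete time shift for a single one-dimensional curve, roughly what an E-step of such a model might compute. It is not taken from the paper: the function name `shift_cluster_posterior`, the extended-grid layout of the mean curves, the uniform prior over shifts, the shared noise level `sigma`, and the plug-in handling of the offset (subtracting the mean residual instead of integrating over a random offset) are all assumptions made for illustration.

```python
import numpy as np

def shift_cluster_posterior(y, means, weights, sigma):
    """Posterior over (cluster k, shift s) for one 1-D curve y.

    Hypothetical helper, not the paper's implementation.
    means   : array (K, T + S), one mean curve per cluster on an extended grid
    weights : array (K,), mixture weights summing to 1
    sigma   : noise standard deviation (assumed shared by all clusters)
    """
    y = np.asarray(y, dtype=float)
    means = np.asarray(means, dtype=float)
    K, L = means.shape
    T = y.shape[0]
    S = L - T  # largest allowed shift; shifts are 0..S
    log_post = np.empty((K, S + 1))
    for k in range(K):
        for s in range(S + 1):
            resid = y - means[k, s:s + T]
            # Assumption: plug in the best-fitting offset (mean residual)
            # rather than marginalising over a random offset.
            resid = resid - resid.mean()
            log_lik = -0.5 * np.sum(resid ** 2) / sigma ** 2
            # Uniform prior over the S + 1 shifts (assumption).
            log_post[k, s] = np.log(weights[k]) + log_lik - np.log(S + 1)
    log_post -= log_post.max()  # stabilise before exponentiating
    post = np.exp(log_post)
    return post / post.sum()
```

The returned array has one row per cluster and one column per shift; a full EM procedure would use these weights in the M-step to re-estimate the mean curves, which this sketch does not attempt.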







