Data Clustering: 50 Years Beyond K-Means (2008)

by Anil K. Jain
Citations: 289 (7 self-citations)

BibTeX

@MISC{Jain08dataclustering:,
    author = {Anil K. Jain},
    title = {Data Clustering: 50 Years Beyond K-Means},
    year = {2008}
}


Abstract

Organizing data into sensible groupings is one of the most fundamental modes of understanding and learning. As an example, a common scheme of scientific classification puts organisms into taxonomic ranks (domain, kingdom, phylum, class, etc.). Cluster analysis is the formal study of algorithms and methods for grouping, or clustering, objects according to measured or perceived intrinsic characteristics or similarity. Cluster analysis does not use category labels that tag objects with prior identifiers, i.e., class labels. The absence of category information distinguishes data clustering (unsupervised learning) from classification or discriminant analysis (supervised learning). The aim of clustering is exploratory in nature: to find structure in data. Clustering has a long and rich history in a variety of scientific fields. One of the most popular and simple clustering algorithms, K-means, was first published in 1955. Although K-means was proposed over 50 years ago and thousands of clustering algorithms have been published since then, K-means is still widely used. This speaks to the difficulty of designing a general-purpose clustering algorithm and to the ill-posed nature of the clustering problem. We provide a brief overview of clustering, summarize well-known clustering methods, discuss the major challenges and key issues in designing clustering algorithms, and point out some of the emerging and useful research directions, including semi-supervised clustering, ensemble clustering, simultaneous feature selection during data clustering, and large scale data clustering.
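
As a rough illustration of the label-free grouping the abstract describes, the following minimal Python sketch implements the commonly taught Lloyd-style K-means iteration (assign each point to its nearest center, then move each center to the mean of its members). It is not taken from the paper: the function name `kmeans`, its parameters, the random initialization from data points, and the toy data are all assumptions made for illustration.

```python
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd-style K-means on a list of equal-length numeric tuples.

    Hypothetical helper for illustration; not the paper's code.
    Returns (labels, centers).
    """
    rng = random.Random(seed)
    # Assumption: initialize centers from k randomly chosen data points.
    centers = rng.sample(points, k)
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest center
        # (squared Euclidean distance). No class labels are used.
        new_labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers


# Toy data: two visually separated groups, with no labels supplied.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
          (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
labels, centers = kmeans(points, k=2)
print(labels)
print(centers)
```

The cluster indices themselves are arbitrary (which group is called 0 depends on the initialization); only the grouping is meaningful. The result can also depend on the initial centers and on the choice of `k`, which is one concrete sense in which clustering is hard to pose as a single well-defined problem.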

Keyphrases

data clustering, cluster analysis, unsupervised learning, key issue, semi-supervised clustering, intrinsic characteristic, discriminant analysis, general purpose, large scale data clustering, taxonomic rank, ensemble clustering, useful research direction, prior identifier, sensible grouping, scientific field, class label, rich history, common scheme, simultaneous feature selection, category information distinguishes data clustering, scientific classification, fundamental mode, category label, major challenge, formal study, simple clustering algorithm, ill-posed problem, brief overview, supervised learning
