Classification in Networked Data: A toolkit and a univariate case study (2006)

by Sofus A. Macskassy , Foster Provost
Citations: 197 (10 self)

BibTeX

@MISC{Macskassy06classificationin,
    author = {Sofus A. Macskassy and Foster Provost},
    title = {Classification in Networked Data: A toolkit and a univariate case study},
    year = {2006}
}

Abstract

This paper is about classifying entities that are interlinked with entities for which the class is known. After surveying prior work, we present NetKit, a modular toolkit for classification in networked data, and a case study of its application to networked data used in prior machine learning research. NetKit is based on a node-centric framework in which classifiers comprise a local classifier, a relational classifier, and a collective inference procedure. Various existing node-centric relational learning algorithms can be instantiated with appropriate choices for these components, and new combinations of components realize new algorithms. The case study focuses on univariate network classification, for which the only information used is the structure of class linkage in the network (i.e., only links and some class labels). To our knowledge, no previous work has systematically evaluated the power of class linkage alone for classification in machine learning benchmark data sets. The results demonstrate that very simple network-classification models perform quite well, well enough that they should be used regularly as baseline classifiers for studies of learning with networked data. The simplest method (which performs remarkably well) highlights the close correspondence between several existing methods introduced for different purposes, specifically Gaussian-field classifiers, Hopfield networks, and relational-neighbor classifiers. The case study also shows that there are two sets of techniques that are preferable in different situations, namely when few versus many labels are known initially. We also demonstrate that link selection plays an important role similar to traditional feature selection.
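
To make the three-component decomposition concrete, the following is a minimal Python sketch of a univariate network classifier that uses only links and a few known labels. It is an illustration under stated assumptions, not NetKit's actual API: the function names (`class_prior`, `neighbor_vote`, `collective_inference`), the adjacency-dictionary graph format, and the synchronous update loop are all hypothetical choices made for this example. The local classifier is assumed to be a plain class prior, the relational classifier a relational-neighbor style weighted average of neighbors' class estimates, and the collective inference procedure a fixed number of simultaneous re-estimation passes.

```python
from collections import defaultdict


def build_graph(edge_list):
    """Build a symmetric weighted adjacency dict from (u, v, weight) triples."""
    graph = defaultdict(dict)
    for u, v, w in edge_list:
        graph[u][v] = w
        graph[v][u] = w
    return dict(graph)


def class_prior(labels):
    """Local classifier (assumed): class frequencies among the known labels."""
    counts = defaultdict(float)
    for c in labels.values():
        counts[c] += 1.0
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}


def neighbor_vote(node, graph, beliefs, classes):
    """Relational classifier (assumed): weighted average of neighbors' beliefs."""
    scores = {c: 0.0 for c in classes}
    for nbr, w in graph.get(node, {}).items():
        for c in classes:
            scores[c] += w * beliefs[nbr][c]
    total = sum(scores.values())
    if total == 0.0:
        return None  # no neighbors, so there is no relational evidence
    return {c: s / total for c, s in scores.items()}


def collective_inference(graph, labels, iterations=50):
    """Collective inference (assumed): repeated synchronous re-estimation.

    Known labels stay clamped; unknown nodes start at the class prior.
    """
    classes = sorted(set(labels.values()))
    prior = class_prior(labels)
    beliefs = {}
    for n in graph:
        if n in labels:
            beliefs[n] = {c: 1.0 if c == labels[n] else 0.0 for c in classes}
        else:
            beliefs[n] = dict(prior)
    for _ in range(iterations):
        updated = {}
        for n in graph:
            if n in labels:
                updated[n] = beliefs[n]
            else:
                estimate = neighbor_vote(n, graph, beliefs, classes)
                updated[n] = estimate if estimate is not None else beliefs[n]
        beliefs = updated
    return beliefs


# Toy chain a - b - c - d with only the two endpoints labeled.
graph = build_graph([("a", "b", 1.0), ("b", "c", 1.0), ("c", "d", 1.0)])
beliefs = collective_inference(graph, {"a": "x", "d": "y"})
predicted = {n: max(p, key=p.get) for n, p in beliefs.items()}
```

On the toy chain, the unlabeled node next to `a` ends up leaning toward class `x` and the one next to `d` toward class `y`. Swapping in a different function for any one of the three roles yields a different classifier, which is the kind of modularity the abstract describes.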

Keyphrases

networked data    univariate case study    case study    node-centric framework    modular toolkit    appropriate choice    important role    class label    relational classifier    relational-neighbor classifier    new combination    prior work    different situation    univariate network classification    class linkage    new algorithm    link selection    present netkit    machine learning benchmark data set    class-linkage alone    simple network-classification model    node-centric relational learning algorithm    baseline classifier    local classifier    collective inference procedure    traditional feature selection    gaussian-field classifier    prior machine    close correspondence    many label    hopfield network    different purpose   
