Author Disambiguation using Error-driven Machine Learning with a Ranking Loss Function

by Aron Culotta, Pallika Kanani, Robert Hall, Michael Wick, Andrew McCallum
Citations: 19 (7 self)

BibTeX

@MISC{Culotta_authordisambiguation,
    author = {Aron Culotta and Pallika Kanani and Robert Hall and Michael Wick and Andrew McCallum},
    title = {Author Disambiguation using Error-driven Machine Learning with a Ranking Loss Function},
    year = {}
}


Abstract

Author disambiguation is the problem of determining whether records in a publications database refer to the same person. A common supervised machine learning approach is to build a classifier to predict whether a pair of records is coreferent, followed by a clustering step to enforce transitivity. However, this approach ignores powerful evidence obtainable by examining sets (rather than pairs) of records, such as the number of publications or co-authors an author has. In this paper we propose a representation that enables these first-order features over sets of records. We then propose a training algorithm well-suited to this representation that is (1) error-driven, in that training examples are generated from incorrect predictions on the training data, and (2) rank-based, in that the classifier induces a ranking over candidate predictions. We evaluate our algorithms on three author disambiguation datasets and demonstrate error reductions of up to 60% over the standard binary classification approach.
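
As a rough illustration of the contrast drawn in the abstract, the sketch below clusters records with a pairwise decision followed by transitive closure, then computes features over each resulting set of records. It is not the authors' implementation: the toy records, the shared-co-author rule standing in for a trained pairwise classifier, and the names `pairwise_coreferent`, `transitive_closure`, and `set_features` are all assumptions made for this example.

```python
from itertools import combinations

# Hypothetical toy records: (record id, author-name string, set of co-author names).
RECORDS = [
    (0, "A. Smith", {"B. Jones", "C. Lee"}),
    (1, "Alice Smith", {"B. Jones"}),
    (2, "A. Smith", {"D. Kim"}),
]

def pairwise_coreferent(r1, r2):
    """Stand-in for a trained pairwise classifier (assumption): two records
    are linked when they share at least one co-author."""
    return len(r1[2] & r2[2]) > 0

def transitive_closure(records, same):
    """Cluster records by merging every pair the classifier links (union-find)."""
    parent = {r[0]: r[0] for r in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(records, 2):
        if same(a, b):
            parent[find(a[0])] = find(b[0])

    clusters = {}
    for r in records:
        clusters.setdefault(find(r[0]), []).append(r)
    # Order clusters by the id of their first record for stable output.
    return sorted(clusters.values(), key=lambda c: c[0][0])

def set_features(cluster):
    """Features defined over a whole set of records rather than a pair."""
    coauthors = set().union(*(r[2] for r in cluster))
    return {"num_publications": len(cluster), "num_coauthors": len(coauthors)}

clusters = transitive_closure(RECORDS, pairwise_coreferent)
features = [set_features(c) for c in clusters]
```

Counts such as `num_publications` and `num_coauthors` only exist once records are grouped, which is why a purely pairwise classifier cannot use them directly.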

Keyphrases

author disambiguation, ranking loss function, error-driven machine learning, author disambiguation datasets, training data, standard binary classification approach, clustering step, publication database refer, first-order feature, common supervised machine, powerful evidence, candidate prediction, incorrect prediction, error reduction, training example
