A comparison of string distance metrics for name-matching tasks (2003)

by William W. Cohen, Pradeep Ravikumar, Stephen E. Fienberg
Citations: 448 (11 self)

BibTeX

@INPROCEEDINGS{Cohen03acomparison,
    author = {William W. Cohen and Pradeep Ravikumar and Stephen E. Fienberg},
    title = {A comparison of string distance metrics for name-matching tasks},
    booktitle = {},
    year = {2003},
    pages = {73--78}
}

Abstract

Using an open-source Java toolkit of name-matching methods, we experimentally compare string distance metrics on the task of matching entity names. We investigate several metrics proposed by different communities, including edit-distance metrics, fast heuristic string comparators, token-based distance metrics, and hybrid methods. Overall, the best-performing method is a hybrid scheme combining a TFIDF weighting scheme, which is widely used in information retrieval, with the Jaro-Winkler string-distance scheme, which was developed in the probabilistic record linkage community.
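
To make the Jaro-Winkler scheme mentioned in the abstract concrete, the following is a minimal Python sketch of the commonly described Jaro and Jaro-Winkler similarities. It is not taken from the authors' Java toolkit, and the function names and the `prefix_scale` and `max_prefix` parameters (set here to the commonly quoted values 0.1 and 4) are assumptions made for illustration. The TFIDF half of the hybrid scheme is not shown.

```python
def jaro(s, t):
    """Jaro similarity in [0, 1]; illustrative sketch, not the toolkit's code."""
    if s == t:
        return 1.0
    if not s or not t:
        return 0.0
    # Characters count as matching only if they are equal and lie
    # within this many positions of each other.
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_matched = [False] * len(s)
    t_matched = [False] * len(t)
    matches = 0
    for i, ch in enumerate(s):
        lo = max(0, i - window)
        hi = min(len(t), i + window + 1)
        for j in range(lo, hi):
            if not t_matched[j] and t[j] == ch:
                s_matched[i] = True
                t_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters that appear in a different order,
    # counted in pairs.
    s_seq = [c for c, m in zip(s, s_matched) if m]
    t_seq = [c for c, m in zip(t, t_matched) if m]
    transpositions = sum(a != b for a, b in zip(s_seq, t_seq)) // 2
    return (matches / len(s)
            + matches / len(t)
            + (matches - transpositions) / matches) / 3.0


def jaro_winkler(s, t, prefix_scale=0.1, max_prefix=4):
    """Jaro similarity boosted for strings that share a common prefix."""
    base = jaro(s, t)
    prefix = 0
    for a, b in zip(s[:max_prefix], t[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1.0 - base)


if __name__ == "__main__":
    # Classic textbook pair: one transposition, shared prefix "MAR".
    print(round(jaro_winkler("MARTHA", "MARHTA"), 3))  # 0.961
```

The prefix boost reflects the heuristic that personal names which agree on their first few characters are more likely to refer to the same entity, which is one reason this comparator suits short name strings.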

Keyphrases

distance metric; name-matching task; best-performing method; different metric; entity name; java toolkit; token-based distance metric; probabilistic record linkage community; hybrid scheme; information retrieval; hybrid method; heuristic string comparators; name-matching method; tfidf weighting scheme; jaro-winkler string-distance scheme; different community; edit-distance metric
