Results 1 - 10 of 492,982

Probabilistic String Similarity Joins

by Jeffrey Jestes, Feifei Li, Zhepeng Yan, Ke Yi
"... Edit distance based string similarity join is a fundamental operator in string databases. Increasingly, many applications in data cleaning, data integration, and scientific computing have to deal with fuzzy information in string attributes. Despite the intensive efforts devoted in processing (determ ..."
Cited by 13 (0 self)

Adaptive Duplicate Detection Using Learnable String Similarity Measures

by Mikhail Bilenko, Raymond J. Mooney - In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2003), 2003
"... The problem of identifying approximately duplicate records in databases is an essential step for data cleaning and data integration processes. Most existing approaches have relied on generic or manually tuned distance metrics for estimating the similarity of potential duplicates. In this paper, we p ..."
Cited by 332 (14 self)
's domain. We present two learnable text similarity measures suitable for this task: an extended variant of learnable string edit distance, and a novel vector-space based measure that employs a Support Vector Machine (SVM) for training. Experimental results on a range of datasets show that our framework can

Survey of Scalable String Similarity Joins

by Khalid F. Alfatmi, Archana S. Vaidya
"... Abstract — Similarity Join is an important operation in data integration and cleansing, record linkage, data deduplication and pattern matching. It finds similar string pairs from two collections of strings. A number of approaches have been proposed as well as compared for string similarity joins. The ..."

Weighted Set-Based String Similarity

by Marios Hadjieleftheriou, Divesh Srivastava
"... Consider a universe of tokens, each of which is associated with a weight, and a database consisting of strings that can be represented as subsets of these tokens. Given a query string, also represented as a set of tokens, a weighted string similarity query identifies all strings in the database whos ..."
Cited by 6 (0 self)
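As a rough illustration of the setting this abstract describes (weighted tokens, strings as token sets, a query that is also a token set), here is a minimal Python sketch. The excerpt is truncated before it names the similarity function, so weighted Jaccard and the threshold parameter are assumptions made for illustration, and `weighted_jaccard`, `similarity_query`, and the sample weights are hypothetical names and values, not the authors' method.

```python
from typing import Dict, FrozenSet, List


def weighted_jaccard(q: FrozenSet[str], s: FrozenSet[str],
                     weight: Dict[str, float]) -> float:
    """Weight of the shared tokens divided by the weight of all tokens."""
    union = q | s
    if not union:
        return 0.0
    shared = q & s
    return sum(weight[t] for t in shared) / sum(weight[t] for t in union)


def similarity_query(query: FrozenSet[str], database: List[FrozenSet[str]],
                     weight: Dict[str, float],
                     threshold: float) -> List[FrozenSet[str]]:
    """Linear scan: keep every string whose score reaches the threshold."""
    return [s for s in database
            if weighted_jaccard(query, s, weight) >= threshold]


# Hypothetical token weights (for example, rarer tokens weigh more).
weight = {"main": 1.0, "st": 0.5, "street": 0.5, "maine": 2.0}
database = [frozenset({"main", "street"}), frozenset({"maine", "st"})]
matches = similarity_query(frozenset({"main", "st"}), database, weight, 0.5)
```

A linear scan like this is only a baseline for checking results; it says nothing about how an index would answer the same query efficiently.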

String Similarity Metrics for Ontology Alignment

by Michelle Cheatham, Pascal Hitzler
"... Abstract. Ontology alignment is an important part of enabling the semantic web to reach its full potential. The vast majority of ontology alignment systems use one or more string similarity metrics, but often the choice of which metrics to use is not given much attention. In this work we evaluate a ..."

String Similarity Joins: An Experimental Evaluation

by Yu Jiang, Guoliang Li, Jianhua Feng, Wen-syan Li
"... String similarity join is an important operation in data integration and cleansing that finds similar string pairs from two collections of strings. More than ten algorithms have been proposed to address this problem in the recent two decades. However, existing algorithms have not been thoroughly com ..."
Cited by 8 (3 self)
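The operation described in this abstract, finding similar string pairs across two collections, can be sketched with a naive nested-loop join. This is a baseline under stated assumptions, not one of the algorithms the paper evaluates: edit distance with a fixed threshold is assumed as the similarity criterion, and `edit_distance`, `similarity_join`, and `max_dist` are hypothetical names.

```python
from typing import List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity_join(left: List[str], right: List[str],
                    max_dist: int) -> List[Tuple[str, str]]:
    """Return all pairs (s, t) with edit distance at most max_dist."""
    pairs = []
    for s in left:
        for t in right:
            # Length filter: the distance is at least the length difference.
            if abs(len(s) - len(t)) > max_dist:
                continue
            if edit_distance(s, t) <= max_dist:
                pairs.append((s, t))
    return pairs


pairs = similarity_join(["sam", "samuel"], ["sam", "samual", "tom"], 1)
```

The nested loop compares every pair, so its cost grows with the product of the two collection sizes; the length filter only skips pairs that cannot possibly qualify.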

String Similarity Measures and Joins with Synonyms

by Jiaheng Lu, Chunbin Lin, Wei Wang, Chen Li, Haiyong Wang
"... A string similarity measure quantifies the similarity between two text strings for approximate string matching or comparison. For example, the strings “Sam” and “Samuel” can be considered similar. Most existing work that computes the similarity of two strings only considers syntactic similarities, ..."
Cited by 1 (0 self)

Alignment-Based Discriminative String Similarity

by unknown authors
"... A character-based measure of similarity is an important component of many natural language processing systems, including approaches to transliteration, coreference, word alignment, spelling correction, and the identification of cognates in related vocabularies. We propose an alignment-based discrimi ..."
-based discriminative framework for string similarity. We gather features from substring pairs consistent with a character-based alignment of the two strings. This approach achieves exceptional performance; on nine separate cognate identification experiments using six language pairs, we more than double the precision

Automatic Construction of Weighted String Similarity Measures

by Jörg Tiedemann
"... String similarity metrics are used for several purposes in text-processing. One task is the extraction of cognates from bilingual text. In this paper three approaches to the automatic generation of language dependent string matching functions are presented. ..."
Cited by 32 (1 self)

Experiments with String Similarity Measures in the EBMT Framework

by Natalia Elita, Monica Gavrila, Cristina Vertan - In Proceedings of the RANLP 2007 Conference, 2007
"... Measuring string similarity is a frequently used technique in various Language Technology (LT) applications, such as: Spell checkers, Translation ..."
Cited by 1 (0 self)