Results 1 - 10 of 492,982
Probabilistic String Similarity Joins
Cited by 13 (0 self)
Edit distance based string similarity join is a fundamental operator in string databases. Increasingly, many applications in data cleaning, data integration, and scientific computing have to deal with fuzzy information in string attributes. Despite the intensive efforts devoted in processing
Adaptive Duplicate Detection Using Learnable String Similarity Measures
In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2003), 2003
"... The problem of identifying approximately duplicate records in databases is an essential step for data cleaning and data integration processes. Most existing approaches have relied on generic or manually tuned distance metrics for estimating the similarity of potential duplicates. In this paper, we p ..."
Cited by 332 (14 self)
... 's domain. We present two learnable text similarity measures suitable for this task: an extended variant of learnable string edit distance, and a novel vector-space based measure that employs a Support Vector Machine (SVM) for training. Experimental results on a range of datasets show that our framework can ...
Survey of Scalable String Similarity Joins
Similarity join is an important operation in data integration and cleansing, record linkage, data deduplication, and pattern matching. It finds similar string pairs from two collections of strings. A number of approaches have been proposed and compared for string similarity joins
Weighted Set-Based String Similarity
Cited by 6 (0 self)
Consider a universe of tokens, each of which is associated with a weight, and a database consisting of strings that can be represented as subsets of these tokens. Given a query string, also represented as a set of tokens, a weighted string similarity query identifies all strings in the database
String Similarity Metrics for Ontology Alignment
Ontology alignment is an important part of enabling the semantic web to reach its full potential. The vast majority of ontology alignment systems use one or more string similarity metrics, but often the choice of which metrics to use is not given much attention. In this work we evaluate
String Similarity Joins: An Experimental Evaluation
Cited by 8 (3 self)
String similarity join is an important operation in data integration and cleansing that finds similar string pairs from two collections of strings. More than ten algorithms have been proposed to address this problem in the past two decades. However, existing algorithms have not been thoroughly
String Similarity Measures and Joins with Synonyms
Cited by 1 (0 self)
A string similarity measure quantifies the similarity between two text strings for approximate string matching or comparison. For example, the strings “Sam” and “Samuel” can be considered similar. Most existing work that computes the similarity of two strings only considers syntactic similarities
Alignment-Based Discriminative String Similarity
"... A character-based measure of similarity is an important component of many natural language processing systems, including approaches to transliteration, coreference, word alignment, spelling correction, and the identification of cognates in related vocabularies. We propose an alignment-based discrimi ..."
We propose an alignment-based discriminative framework for string similarity. We gather features from substring pairs consistent with a character-based alignment of the two strings. This approach achieves exceptional performance; on nine separate cognate identification experiments using six language pairs, we more than double the precision
Automatic Construction of Weighted String Similarity Measures
Cited by 32 (1 self)
String similarity metrics are used for several purposes in text processing. One task is the extraction of cognates from bilingual text. In this paper, three approaches to the automatic generation of language-dependent string matching functions are presented.
Experiments with String Similarity Measures in the EBMT Framework
In Proceedings of the RANLP 2007 Conference, 2007
Cited by 1 (0 self)
Measuring string similarity is a frequently used technique in various Language Technology (LT) applications, such as: Spell checkers, Translation