Abstract: A highly efficient ANSI-C facility is described for intelligently comparing
a query string with a series of database strings. The bipartite
weighted matching approach it takes tolerates ordering violations that are
common in practice, yet problematic for simple automaton or string edit distance methods.
The method is based on characters and polygraphs
and does not require the words in a query to be properly formed.
Database characters are processed at a rate of approximately
2.5 million per second...
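The abstract does not spell out LikeIt's cost model, so the following is only a minimal C99 sketch of the general idea under assumptions made here: every polygraph (taken to be any substring of 1 to MAX_GRAM characters) that occurs in both strings is matched to an occurrence in the other string at a cost equal to the distance between their positions, and every occurrence left unmatched costs a flat UNMATCHED penalty. The names polygraph_cost, gram_positions and line_match and the constants MAX_LEN, MAX_GRAM and UNMATCHED are hypothetical and do not come from the LikeIt sources. The sketch relies on the standard fact that, when matching points on a line at cost |x - y|, some order-preserving matching is optimal, so each polygraph's positions can be matched with a small dynamic program instead of a general bipartite matching algorithm.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LEN 64   /* hypothetical: longest string this sketch accepts */
#define MAX_GRAM 3   /* hypothetical: polygraphs of length 1..3 */
#define UNMATCHED 4  /* hypothetical: penalty per polygraph left unmatched */

/* Store in out[] every start position of the k-gram g within s[0..n). */
static int gram_positions(const char *s, int n, const char *g, int k, int *out)
{
    int count = 0;
    for (int i = 0; i + k <= n; i++)
        if (memcmp(s + i, g, (size_t)k) == 0)
            out[count++] = i;
    return count;
}

/* Cheapest way to match all m sorted positions in a[] to distinct positions
   in sorted b[] (m <= n) at cost |a_i - b_j|. For this cost on a line an
   order-preserving matching is optimal, so an O(m*n) DP suffices. */
static long line_match(const int *a, int m, const int *b, int n)
{
    long prev[MAX_LEN + 1], cur[MAX_LEN + 1];
    for (int j = 0; j <= n; j++) {
        prev[j] = 0;        /* matching zero points costs nothing */
        cur[j] = LONG_MAX;  /* initialized so no indeterminate value is copied */
    }
    for (int i = 1; i <= m; i++) {
        cur[i - 1] = LONG_MAX;  /* i points cannot fit into i-1 slots */
        for (int j = i; j <= n; j++) {
            /* either pair a[i-1] with b[j-1], or leave b[j-1] unused */
            long take = prev[j - 1] + labs((long)a[i - 1] - b[j - 1]);
            cur[j] = take < cur[j - 1] ? take : cur[j - 1];
        }
        for (int j = 0; j <= n; j++)
            prev[j] = cur[j];
    }
    return prev[n];
}

/* Hypothetical polygraph matching cost between query q and database string d:
   0 for identical strings, larger as shared polygraphs move apart or go missing. */
long polygraph_cost(const char *q, const char *d)
{
    int nq = (int)strlen(q), nd = (int)strlen(d);
    int pq[MAX_LEN], pd[MAX_LEN];
    long total = 0;

    assert(nq <= MAX_LEN && nd <= MAX_LEN);
    for (int k = 1; k <= MAX_GRAM; k++) {
        /* side 0 visits each distinct k-gram of q; side 1 those only in d */
        for (int side = 0; side < 2; side++) {
            const char *s = side == 0 ? q : d;
            int ns = side == 0 ? nq : nd;
            for (int i = 0; i + k <= ns; i++) {
                const char *g = s + i;
                if (gram_positions(s, i + k - 1, g, k, pq) > 0)
                    continue;  /* this k-gram already occurred earlier in s */
                if (side == 1 && gram_positions(q, nq, g, k, pq) > 0)
                    continue;  /* already handled while scanning q */
                int mq = gram_positions(q, nq, g, k, pq);
                int md = gram_positions(d, nd, g, k, pd);
                total += mq <= md ? line_match(pq, mq, pd, md)
                                  : line_match(pd, md, pq, mq);
                total += (long)UNMATCHED * abs(mq - md);
            }
        }
    }
    return total;
}
```

Under these assumed weights, polygraph_cost("abcxyz", "xyzabc") is 60 while polygraph_cost("abcxyz", "abcqrs") is 72, so swapping the two halves of a string scores as closer than replacing one half; Levenshtein edit distance ranks the same pairs the other way round (6 versus 3). That is the kind of ordering violation the abstract refers to. The loops are deliberately naive and make no attempt at LikeIt's reported throughput.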
Context of citations to this paper:
...is the number of insertions, deletions, or substitutions of letters required to transform one string into another. In LikeIt [28, 29], a string distance is based on an algorithm that tries to build an optimal weighted matching of the letters and multigraphs (groups of...
... the edit distance between strings r and s is O(len(r) · len(s)), where len(r) and len(s) are the lengths of r and s, respectively [YK97]. Definition 2.2 (Similar String Join Problem): Given two sets R and S of strings and a predefined value k, we want to find pairs of...
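The quoted contexts refer to the classic edit distance (Levenshtein distance) and to its O(len(r) · len(s)) cost. A minimal C99 sketch of that dynamic program follows, together with the naive nested-loop solution of the similar string join problem stated in Definition 2.2, which simply compares every pair. The function names edit_distance and similar_join and the output layout of similar_join are hypothetical illustrations; they are not taken from LikeIt or from the citing papers.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Levenshtein distance: minimum number of single-character insertions,
   deletions, or substitutions turning r into s. Two rolling rows keep memory
   at O(len(s)); time is O(len(r) * len(s)). Returns -1 if allocation fails. */
int edit_distance(const char *r, const char *s)
{
    size_t m = strlen(r), n = strlen(s);
    int *prev = malloc((n + 1) * sizeof *prev);
    int *cur = malloc((n + 1) * sizeof *cur);
    if (prev == NULL || cur == NULL) {
        free(prev);
        free(cur);
        return -1;
    }
    for (size_t j = 0; j <= n; j++)
        prev[j] = (int)j;              /* turn "" into s[0..j): j insertions */
    for (size_t i = 1; i <= m; i++) {
        cur[0] = (int)i;               /* turn r[0..i) into "": i deletions */
        for (size_t j = 1; j <= n; j++) {
            int sub = prev[j - 1] + (r[i - 1] != s[j - 1]);  /* match/substitute */
            int del = prev[j] + 1;                           /* delete r[i-1] */
            int ins = cur[j - 1] + 1;                        /* insert s[j-1] */
            int best = sub < del ? sub : del;
            cur[j] = best < ins ? best : ins;
        }
        int *tmp = prev;               /* the finished row becomes prev */
        prev = cur;
        cur = tmp;
    }
    int d = prev[n];
    free(prev);
    free(cur);
    return d;
}

/* Hypothetical naive similar string join: tries all |R| * |S| pairs, writes
   up to max_pairs (index in R, index in S) pairs whose edit distance is at
   most k, and returns how many pairs qualified in total. */
size_t similar_join(const char *const *R, size_t nr,
                    const char *const *S, size_t ns, int k,
                    size_t (*pairs)[2], size_t max_pairs)
{
    size_t found = 0;
    assert(pairs != NULL || max_pairs == 0);
    for (size_t a = 0; a < nr; a++)
        for (size_t b = 0; b < ns; b++) {
            int d = edit_distance(R[a], S[b]);
            if (d >= 0 && d <= k) {
                if (found < max_pairs) {
                    pairs[found][0] = a;
                    pairs[found][1] = b;
                }
                found++;
            }
        }
    return found;
}
```

With this definition, edit_distance("abcxyz", "xyzabc") is 6 while edit_distance("abcxyz", "abcqrs") is 3: moving a block of characters costs as much as rewriting it, which is the ordering sensitivity the abstract attributes to edit distance methods. The naive join performs |R| · |S| distance computations, each quadratic in string length.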
Cited by:
Sentence 782 of The New C Standard - Jones (2003)
Efficient Record Linkage in Large Data Sets - Jin, Li, Mehrotra (2003)
Efficient Similarity String Joins in Large Data Sets - Jin, Li, Mehrotra (2002)
Active bibliography (related documents):
0.9: A Bipartite Matching Approach to Approximate String.. - Buss, Yianilos (1995)
0.2: Hash Table Sizes for Storing N-Grams for Text Processing - Gu, Berleant
0.2: CACTUS - Clustering Categorical Data Using Summaries - Ganti, Gehrke, Ramakrishnan (1999)
Similar documents based on text:
0.4: Face Recognition using Mixture-Distance and Raw Images - Steve Lawrence Peter
0.2: Solving the Minimum-Cost Matching Problem for.. - Buss.. (1994)
0.1: STARI: A Technique for High-Bandwidth Communication - Greenstreet (1993)
Related documents from co-citation:
5: Techniques for Automatically Correcting Words in Text - Kukich - 1992
4: Data-Mining and Visualization of Traditional and Multimedia Datasets - Faloutsos, Lin et al. - 1995
4: Approximate String Joins in a Database - Gravano, Ipeirotis et al. - 2001
BibTeX entry:
Peter Yianilos. The LikeIt intelligent string comparison facility. Technical Report 97-093, NEC Research Institute, 1997, http://www.neci.nj.nec.com/homepages/pny/papers/likeit/main.html. http://citeseer.ist.psu.edu/yianilos97likeit.html
@techreport{ yianilos97likeit,
author = "Peter Yianilos",
title = "The {LikeIt} Intelligent String Comparison Facility",
institution = "NEC Research Institute",
number = "97-093",
year = "1997",
url = "http://citeseer.ist.psu.edu/yianilos97likeit.html"
}
Citations (may not include all citations):
347: Fast pattern matching in strings - Knuth, Morris et al. - 1977
118: GLIMPSE: A tool to search through entire file systems - Manber, Wu - 1994
83: Approximate string matching - Hall, Dowling - 1980
31: Gauging similarity with n-grams: Language-independent categor.. - Damashek - 1995
15: Two special cases of the assignment problem - Karp, Li - 1975
10: time minimum-cost matching algorithms for quasi-convex tours - Buss, Yianilos et al. - 1994
7: Definition, computation and application of symbol string similarity func.. - Yianilos - 1978
6: Macromolecules: The Theory and Practice of Sequence Comparis.. - Sankoff, Kruskal - 1983
4: A large bibliography on theory/foundations of computer scien.. - Seiferas - 1996
4: Acquaintance: A novel vector-space n-gram technique for docu.. - Huffman, Damashek - 1995
3: Fast text searching allowing errors - Wu, Manber - 1993
1: The PF474 -- a coprocessor for string comparison - Rosenthal - 1984
1: NEC Research Institute - matching, approximate et al. - 1995
1: First look -- friendly program doesn't need exact match to f.. - Miller - 1987
1: Commercial Software for the IBM-PC - Technology, Friendly - 1987
CiteSeer.IST - Copyright Penn State and NEC