by Justin Zobel, Philip Dart
http://www.cs.ubc.ca/local/reading/proceedings/spe91-95/spe/./vol25/issue3/spe948jz.pdf

Abstract:

Approximate string matching is used for spelling correction and personal name matching. In this paper we show how to use string matching techniques in conjunction with lexicon indexes to find approximate matches in a large lexicon. We test several lexicon indexing techniques, including n-grams and permuted lexicons, and several string matching techniques, including string similarity measures and phonetic coding. We propose methods for combining these techniques, and show experimentally that these combinations yield good retrieval effectiveness while keeping index size and retrieval time low. Our experiments also suggest that, in contrast to previous claims, phonetic codings are markedly inferior to string distance measures, which are demonstrated to be suitable for both spelling correction and personal name matching.

KEY WORDS: pattern matching; string indexing; approximate matching; compressed inverted files; Soundex
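To make the idea of combining a lexicon index with a string distance measure concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: an uncompressed n-gram index selects candidates that share n-grams with the query (coarse step), and plain Levenshtein edit distance ranks them (fine step). The function names, the `#` padding character, the `min_shared` threshold, and the toy lexicon are all hypothetical choices made for this example.

```python
from collections import defaultdict


def ngrams(word, n=2):
    """Return the set of n-grams of `word`, padded with '#' at both ends."""
    padded = "#" * (n - 1) + word.lower() + "#" * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def build_index(lexicon, n=2):
    """Map each n-gram to the set of lexicon words that contain it."""
    index = defaultdict(set)
    for word in lexicon:
        for gram in ngrams(word, n):
            index[gram].add(word)
    return index


def edit_distance(a, b):
    """Levenshtein distance, computed one table row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def approximate_matches(query, index, n=2, min_shared=2, limit=5):
    """Coarse step: count shared n-grams. Fine step: rank by edit distance."""
    shared = defaultdict(int)
    for gram in ngrams(query, n):
        for word in index.get(gram, ()):
            shared[word] += 1
    candidates = [w for w, count in shared.items() if count >= min_shared]
    # Ties in edit distance are broken alphabetically for a stable order.
    ranked = sorted(candidates,
                    key=lambda w: (edit_distance(query.lower(), w.lower()), w))
    return ranked[:limit]


# Hypothetical toy lexicon, for illustration only.
lexicon = ["zobel", "sobel", "nobel", "dart", "darts", "salton"]
index = build_index(lexicon)
print(approximate_matches("zoble", index))  # prints ['zobel']
```

The coarse step keeps the expensive distance computation away from most of the lexicon: only words that share at least `min_shared` n-grams with the query are ever compared. Lowering `min_shared` to 1 admits more candidates (here, `nobel` and `sobel` as well) at the cost of more distance computations.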

Citations

957 Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer – Salton
236 Fast text searching allowing errors – Wu, Manber - 1992
223 Techniques for automatically correcting words in text – Kukich - 1992
106 Approximate string matching with q-grams and maximal matches – Ukkonen - 1992
105 Approximate string matching – Hall, Dowling - 1980
70 Data compression in full-text retrieval systems – Bell, Moffat, et al. - 1993
57 An efficient indexing technique for full-text database systems – Zobel, Moffat, et al. - 1992
41 Automatic spelling correction using a trigram similarity measure – Angell, Freund, et al. - 1983
39 PATRICIA-Practical algorithm to retrieve information coded in alphanumeric – Morrison - 1968
25 Getty’s synoname and its cousins: a survey of applications of personal name-matching algorithms – Borgman, Siegfried - 1992
22 Parameterised compression for sparse bitmaps – Moffat, Zobel - 1992
16 Searching large lexicons for partially specified terms using compressed inverted files – Zobel, Moffat, Sacks-Davis - 1993
13 Spelling correction for the telecommunications network for the deaf – Kukich - 1992
11 An approximate string matching algorithm – Kim, Shawe-Taylor - 1992
11 Fast approximate string matching – Owolabi, McGregor - 1988
10 Processing truncated terms in document retrieval systems – Bratley, Choueka - 1982
7 Searching for historical word forms in text databases using spelling correction methods: reverse error and phonetic coding methods – Rogers, Willett - 1991
7 ‘Fisching fore werds’: Phonetic retrieval of written text in information systems. Program: automated library and information systems – Gadd - 1988
6 Handbook of Algorithms and Data Structures – Gonnet, Baeza-Yates - 1991
4 PHONIX: The algorithm. Program – Gadd - 1990