by Justin Zobel, Philip Dart
Download: http://www.cs.ubc.ca/local/reading/proceedings/spe91-95/spe/./vol25/issue3/spe948jz.pdf
Abstract:
Approximate string matching is used for spelling correction and personal name matching. In this paper we show how to use string matching techniques in conjunction with lexicon indexes to find approximate matches in a large lexicon. We test several lexicon indexing techniques, including n-grams and permuted lexicons, and several string matching techniques, including string similarity measures and phonetic coding. We propose methods for combining these techniques, and show experimentally that these combinations yield good retrieval effectiveness while keeping index size and retrieval time low. Our experiments also suggest that, in contrast to previous claims, phonetic codings are markedly inferior to string distance measures, which are demonstrated to be suitable for both spelling correction and personal name matching.
KEY WORDS: pattern matching; string indexing; approximate matching; compressed inverted files; Soundex
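To make the combination described in the abstract concrete, the following is a minimal sketch of one way an n-gram lexicon index can be paired with a string distance measure: the index selects candidate words that share n-grams with the query, and edit distance then ranks them. It is an illustration only, not the authors' implementation. The function names, the `#` padding character, the bigram size, and the `min_shared` and `top_k` parameters are all assumptions made for this example, and the index here is an ordinary in-memory dictionary rather than a compressed inverted file.

```python
from collections import defaultdict


def ngrams(word, n=2):
    """Return the set of n-grams of a word, padded with '#' at both ends."""
    padded = f"#{word}#"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def build_index(lexicon, n=2):
    """Map each n-gram to the set of lexicon words containing it."""
    index = defaultdict(set)
    for word in lexicon:
        for gram in ngrams(word, n):
            index[gram].add(word)
    return index


def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def approximate_matches(query, index, n=2, min_shared=2, top_k=3):
    """Coarse n-gram filter followed by fine ranking on edit distance."""
    counts = defaultdict(int)
    for gram in ngrams(query, n):
        for word in index.get(gram, ()):
            counts[word] += 1
    # Keep only words sharing at least min_shared n-grams with the query.
    candidates = [w for w, c in counts.items() if c >= min_shared]
    return sorted(candidates, key=lambda w: (edit_distance(query, w), w))[:top_k]


# Hypothetical toy lexicon, chosen only for illustration.
lexicon = ["pattern", "patten", "matching", "batching", "lexicon", "soundex"]
index = build_index(lexicon)
print(approximate_matches("patern", index))
```

The two-stage design keeps the expensive distance computation off most of the lexicon: only words that pass the n-gram filter are compared against the query character by character.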
Citations
957 | Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer – Salton
236 | Fast text searching allowing errors – Wu, Manber - 1992
223 | Techniques for automatically correcting words in text – Kukich - 1992
106 | Approximate string matching with q-grams and maximal matches – Ukkonen - 1992
105 | Approximate string matching – Hall, Dowling - 1980
70 | Data compression in full-text retrieval systems – Bell, Moffat, et al. - 1993
57 | An efficient indexing technique for full-text database systems – Zobel, Moffat, et al. - 1992
41 | Automatic spelling correction using a trigram similarity measure – Angell, Freund, et al. - 1983
39 | PATRICIA - Practical algorithm to retrieve information coded in alphanumeric – Morrison - 1968
25 | Getty’s synoname and its cousins: a survey of applications of personal name-matching algorithms – Borgman, Siegfried - 1992
22 | Parameterised compression for sparse bitmaps – Moffat, Zobel - 1992
16 | Searching large lexicons for partially specified terms using compressed inverted files – Zobel, Moffat, Sacks-Davis - 1993
13 | Spelling correction for the telecommunications network for the deaf – Kukich - 1992
11 | An approximate string matching algorithm – Kim, Shawe-Taylor - 1992
11 | Fast approximate string matching – Owolabi, McGregor - 1988
10 | Processing truncated terms in document retrieval systems – Bratley, Choueka - 1982
7 | Searching for historical word forms in text databases using spelling correction methods: reverse error and phonetic coding methods – Rogers, Willett - 1991
7 | ‘Fisching fore werds’: Phonetic retrieval of written text in information systems. Program: automated library and information systems – Gadd - 1988
6 | Handbook of Algorithms and Data Structures – Gonnet, Baeza-Yates - 1991
4 | PHONIX: The algorithm. Program – Gadd - 1990