| James L. Peterson. Computer programs for detecting and correcting spelling errors. Commun. ACM, 23(12):676--687, 1980. |
....has been given to the detection and correction of such errors. Different methods have been proposed, some completely automatic, others meant to assist humans in proofreading, some practical and usable, others of theoretical interest only. For a good overview of this field of research we suggest [Peterson 80], while [Pollock 82] contains an extensive bibliography. Despite years of research, the detection and correction of typographical errors remains a problem not entirely resolved. Commercial software as well as state-of-the-art techniques described in the literature can only propose approximate ....
Peterson, J. L., Computer programs for detecting and correcting spelling errors, Comm. ACM, 1980, 23, pp. 676-687.
....proposed by de la Briandais [16], and the term trie was coined by Fredkin as a derivative of information retrieval [22]. Tries are widely used to represent a set of strings [1, 5, 29], that is, for dictionary management. The range of applications encompasses natural language processing [7, 36], pattern matching [20, 42], searching for reserved words in a compiler [2], IP routing tables [34], and text compression [9]. Balanced tries (represented in a single array) are used as a data structure for dynamic hashing [18]. This broad range of applications justifies the view of tries as ....
J. L. Peterson. Computer programs for detecting and correcting spelling errors. Communications of the ACM, 23(12):676--687, 1980.
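The excerpt above lists applications of tries but does not show the structure itself. Below is a minimal sketch of the dictionary-management use, assuming a plain Python dict of children per node. The names `Trie`, `insert`, `contains` and `with_prefix` are hypothetical and do not come from any of the cited papers.

```python
from __future__ import annotations


class TrieNode:
    """One node per character position; `children` maps a character to a subtree."""

    def __init__(self) -> None:
        self.children: dict[str, TrieNode] = {}
        self.is_word = False  # True if the path from the root spells a stored word


class Trie:
    """Minimal trie representing a set of strings, e.g. a spelling dictionary."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def contains(self, word: str) -> bool:
        node = self._find(word)
        return node is not None and node.is_word

    def with_prefix(self, prefix: str) -> list[str]:
        """All stored words beginning with `prefix`, in sorted order."""
        node = self._find(prefix)
        found: list[str] = []
        if node is not None:
            self._collect(node, prefix, found)
        return sorted(found)

    def _find(self, s: str) -> TrieNode | None:
        # Follow one edge per character; stop early if a character is missing.
        node = self.root
        for ch in s:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def _collect(self, node: TrieNode, prefix: str, found: list[str]) -> None:
        if node.is_word:
            found.append(prefix)
        for ch, child in node.children.items():
            self._collect(child, prefix + ch, found)


t = Trie()
for w in ["spell", "spelling", "speller", "spill"]:
    t.insert(w)
print(t.contains("spell"), t.contains("spel"))  # True False
print(t.with_prefix("spel"))  # ['spell', 'speller', 'spelling']
```

A lookup costs one step per character of the query, independent of how many words are stored. This is the property that makes tries attractive for dictionary lookup.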
.... of the most studied problems in computer science [5, 26, 17, 15, 9, 12]. The main approach is based on edit distance [28]. Edit distance is the minimum number of operations on individual characters (e.g., substitutions, insertions, and deletions) needed to transform one string of symbols into another [39, 17, 27]. In the survey by [17], the authors consider two different problems, one under the definition of equivalence and a second using similarity. Their definition of equivalence allows only small differences in the two strings. For example, they allow alternate spellings of the same word, and ignore ....
J. Peterson. Computer programs for detecting and correcting spelling errors. Communications of the ACM, 23(12):676--687 (1980).
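The edit-distance definition quoted above can be made concrete with the standard unit-cost (Levenshtein) dynamic program. This is a minimal sketch, not any cited author's code. The function name `edit_distance` is hypothetical, and each substitution, insertion or deletion is assumed to cost 1.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance between `a` and `b`."""
    # prev[j] = distance between the prefix of `a` processed so far and b[:j];
    # initially that prefix is "" and turning "" into b[:j] takes j insertions.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j - 1] + (ca != cb),  # substitution (free when characters match)
                prev[j] + 1,               # deletion of ca
                curr[j - 1] + 1,           # insertion of cb
            ))
        prev = curr
    return prev[-1]


print(edit_distance("kitten", "sitting"))  # 3
```

Only two rows of the table are kept, so memory is linear in the length of `b` while time remains proportional to the product of the two lengths.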
.... 1988; Chang and Lampe, 1992; Du and Chang, 1994]. The main approach is based on edit distance [Levenshtein, 1966]. Edit distance is the minimum number of operations on individual characters (e.g., substitutions, insertions, and deletions) needed to transform one string of symbols into another [Peterson, 1980; Hall and Dowling, 1980; Kukich, 1992]. In the survey by Hall and Dowling (1980), the authors consider two different problems, one under the definition of equivalence and a second using similarity. Their definition of equivalence allows only small differences in the two strings. For example, ....
J. Peterson. Computer programs for detecting and correcting spelling errors. Communications of the ACM, 23(12):676--687, 1980.
....Approximate Matching: Nearest Neighbor(s), e.g., find the closest city to Berlin. Applications: • Geographic Information Systems (GIS) • Searching for names or addresses (Smith or Schmit) • Searching bibliographic entries (CACM for Comm. of ACM) • Spelling error correction [Pet80, DLS83, AFW83] • OCR error correction [JSB91] • Query by image content • Fingerprints [ON89]. Two approaches: 1. points embedded in a coordinate space; 2. triangular inequality. 4.4.1 Coordinate space methods • k-d tree [FBF77]: log time • Grid spiral search [BWY80] ....
J.L. Peterson. Computer programs for detecting and correcting spelling errors. CACM, 23(12):676--687, December 1980.
....misspellings, and random character errors, and they tend to be both language-specific and domain-specific. The purely statistical characterisation of text in terms of its constituent N-grams has been applied to text analysis and document processing, including spelling and error correction [3], text compression [4], language identification [5], and text search and retrieval [6]. Based on this statistical characterisation, M. Damashek [7] proposes a simple but novel vector space technique that makes sorting, clustering and retrieval feasible in a large multilingual collection of ....
Peterson, J.L.: Computer Programs for Detecting and Correcting Spelling Errors. Comm ACM 23 (1980) 676.
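To illustrate the N-gram characterisation mentioned above, here is a small sketch that builds character trigram profiles and compares them with a Dice-style overlap. The padding scheme, the choice of n = 3 and the Dice measure are illustrative assumptions. This is not Damashek's vector space technique, and the function names are hypothetical.

```python
from collections import Counter


def ngram_profile(text: str, n: int = 3) -> Counter:
    """Counts of overlapping character n-grams. The text is lower-cased and
    padded with one space on each side so word boundaries yield n-grams too."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def ngram_overlap(a: str, b: str, n: int = 3) -> float:
    """Dice-style similarity of two n-gram profiles, between 0.0 and 1.0."""
    pa, pb = ngram_profile(a, n), ngram_profile(b, n)
    shared = sum((pa & pb).values())  # multiset intersection: min of the counts
    total = sum(pa.values()) + sum(pb.values())
    return 2 * shared / total if total else 1.0


print(ngram_overlap("spelling", "speling"))  # 0.8
```

A misspelling such as "speling" keeps most of the trigrams of the correct word, which is why N-gram statistics can flag and rank spelling errors without a full edit-distance computation.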
....algorithms rather than editing commands. In essence, we are performing a smart differencing of versions. This is substantially different from the edit distance algorithms typically used to detect differences, as in Unix diff or spell checking algorithms [Hall and Dowling 1980; Kukich 1992; Peterson 1980]. These algorithms operate on a string representation and support the four operations of substitution, transposition, insertion, and deletion. The algorithms are not robust with respect to ordering changes. Also, they result in identifying changes in terms of string differences, which do not assist ....
Peterson, J. 1980. Computer programs for detecting and correcting spelling errors. Communications of the ACM 23, 676-687.
....HCI research into human errors has generally focussed on cognitive errors and their causes, ignoring physical errors of the kind produced by erratic motor control. This is despite evidence that keyboarding errors are an important and significant source of errors, particularly in large databases (Peterson, 1980). Finding data produced by people with disabilities is even more difficult. Evaluation of keyboards, mice and applications is almost invariably carried out either with expert users, or with users who are typical of the intended user population, or of sub-classes of that population (Karat, 1988) ....
Peterson, J. (1980). Computer programs for detecting and correcting spelling errors. Communications of the A.C.M., 23(12):676--687.
....Inexact sequence comparison is also used extensively in the comparison of biological macromolecules, where noisy versions of protein strings are compared with their exact forms typically stored in large protein databases. The literature on spelling correction is extensive; indeed, the reviews [2,20] list hundreds of publications that tackle the problem from various perspectives. Damerau [See references in 2,20] was probably the first researcher to observe that most of the errors that were found in strings were either a single Substitution, Insertion and Deletion (SID) or reversal ....
.... are the ones used to compute the Longest Common Subsequence (LCS) of two strings due to Hirschberg [3,4], Hunt and Szymanski [5], and others [21]. The complexity of the LCS problem has been studied by Aho et al. [1]. String correction using the GLD as a criterion has been done for strings [2,20,21], substrings [9], dictionaries treated as generalized tries [7], and for grammars [21]. Besides these deterministic techniques, various probabilistic methods have been studied in the literature [2,20,10]. Pat. Recog. with Subst. Insert. Delet. and Gen. Transpos. Errors. Page 3 Although some ....
[Article contains additional citation context not shown here]
J. L. Peterson, Computer programs for detecting and correcting spelling errors, Comm. Assoc. Comput. Mach., 23:676-687 (1980).
....of 1023 words, comprises a very large proportion of written English text. Thus, in both string processing and string recognition it is not uncommon to represent the dictionary as a finite set of words, and using this model, string correction can be achieved using a suitable similarity metric [14-18,31,32,37,39,41]. The advantages of using a finite dictionary in text recognition applications are many. First of all, the accuracy of the recognition is very high. Secondly, a noisy string is never recognized as a word which is not in the language, and thus, the question of meaningless decisions is irrelevant. ....
.... keyboard optimization algorithms and the associated data structures that are encountered, such as suffix trees and their generalizations (see the references listed above). Models which utilize the positional bigrams (and their variants) of the language have also been reported (see references in [37,41]). In this paper, we shall present a solution which, to our knowledge, is the first reported solution to the string taxonomy problem. In particular, we shall assume that we are dealing with a finite dictionary, H = {X_1, ..., X_J}. We intend to partition H into K equi-sized sub-dictionaries. The ....
[Article contains additional citation context not shown here]
J. L. Peterson, Computer programs for detecting and correcting spelling errors, Comm. Assoc. Comput. Mach., 23:676-687 (1980).
....Inexact sequence comparison is also used extensively in the comparison of biological macromolecules, where noisy versions of protein strings are compared with their exact forms typically stored in large protein databases. The literature on spelling correction is extensive; indeed, the reviews [HD80,Pe80] list hundreds of publications that tackle the problem from various perspectives. Damerau [See references in Oo87] was probably the first researcher to observe that most of the errors that were found in strings were either a single Substitution, Insertion and Deletion (SID) or reversal ....
.... are the ones used to compute the Longest Common Subsequence (LCS) of two strings (Hirschberg [Hi77], Hunt and Szymanski [HS77], and others [SK83]). The complexity of the LCS problem has been studied by Aho et al. [AHU76]. String correction using the GLD as a criterion has been done for strings [Pe80,SK83], substrings [KO83], dictionaries treated as generalized tries [KO83], and for grammars [SK83]. Besides these deterministic techniques, various probabilistic methods have been studied in the literature [Pe80,KO84]. Although some work has been done to extend the traditional set of SID operations to ....
[Article contains additional citation context not shown here]
J. L. Peterson, Computer programs for detecting and correcting spelling errors, Comm. Assoc. Comput. Mach., 23:676-687 (1980).
....deletion and insertion operations of the individual symbols of the alphabet. In the literature, these operations are the most popular, because the general string editing problem has been studied using these operations Optimal and Information Theoretic Syntactic Pattern Recognition. Page 6 [3,4,5,6,7,8,9,18,19,20,21,22,23,24,25,26,27,35,36,37,38,39,40,41,42,43], and furthermore, these operations can also be used to study problems involving subsequences and supersequences [6,27,37,38,39]. Also, since most errors found in text-based systems are substitution, deletion and insertion errors (and even the common transposition error can be modelled easily as a ....
....to achieve the computation with a single quadratic array. We omit these details here in the interest of brevity. ii) A note about the modus operandi of the proof of computing Pr[Y U] is not out of place. The techniques used here are not merely straightforward dynamic programming principles [5,6,18,24,25,26,27,37,38,41,42,43,44] applicable to the scenario when the computational operators are the arithmetic addition and multiplication respectively. First of all, observe that the quantities computed are probabilities, and hence, at every point, the rigid constraints imposed by the laws of probability must be satisfied. ....
[Article contains additional citation context not shown here]
J. L. Peterson, Computer programs for detecting and correcting spelling errors, Comm. Assoc. Comput. Mach., 23:676-687 (1980).
....the longest common subsequence problem and sequence correction using the GLD as a criterion for strings, substrings, dictionaries treated as generalized tries and for grammars. Also, although some work has been done to extend the traditional set of SID operations to include adjacent transpositions [5, 9, 10], our work tackles the unsolved problem of constrained editing for Generalized Transposition (GT) errors. 1.1 Notation. A is a finite alphabet, and A* is the set of strings over A. θ is the null symbol, where θ ∉ A, and is distinct from the empty string. Let Ã = A ∪ {θ}. A string X ∈ A* of ....
J. L. Peterson, Computer programs for detecting and correcting spelling errors, Comm. Assoc. Comput. Mach., 23:676-687 (1980).
.... how it should be searched so as to maximize performance at minimal cost (search time). This problem has been studied previously, and in the context of word recognition many methods have been proposed and explored [SHC83, BT80, ST79, SB82]. Several ways of organising a dictionary are described in [Pet80]. In this paper, we formulate the problem of organising the dictionary and searching it as that of content-addressable memories in a one-layer neural network. To our knowledge, we are the first to have formulated the problem this way [Jag90a]. Specifically, words in the dictionary are stored in ....
Peterson J.L. (1980). Computer programs for detecting and correcting spelling errors. Communications of the ACM 23,12 (Dec 1980), 676-687.
....the strategies presented here. I. PROBLEM STATEMENT In the comparison of text patterns, phonemes and biological macromolecules, a question that has attracted much interest is that of quantifying the dissimilarity between strings. A review of such distance measures and their applications is in [1,6,12], and in an excellent book edited by Sankoff et al. [13]. The most promising of all measures used to compare strings is the one that relates them using various edit operations [13, pp. 37-39]; the edit operations most frequently considered are the deletion, insertion and substitution of individual ....
....one string to another. The GLD is closely related to other measures involving the strings, such as their longest common subsequence [2,3]. Algorithms to compute the GLD have been proposed [1,3,4,7,9,13,14], and its application to the pattern recognition of strings and substrings has been reported [1,5,6,11,12]. Unfortunately, the GLD is inadequate for recognizing noisy subsequences [11] and skeletal images [8]. In [8], Marzal and Vidal showed how an auxiliary related measure could be used in handwritten character recognition. This measure, the Normalized Edit Distance (NED) between strings X and Y, is ....
J. L. Peterson, Computer programs for detecting and correcting spelling errors, Comm. Assoc. Comput. Mach., 23:676-687 (1980).
....of spelling conventions and are of three general types, those that result from not knowing what the accepted spelling for the word is, those that are inadvertent writing or typing errors, and those that result from transmission or recognition errors, as in optical character recognition. Peterson [27] discusses computational algorithms for recognizing spelling errors and approaches to correcting the errors. Recognition of spelling errors generally depends on word lists against which each incoming form is tested. If no match is found, a short list of potentially correct spellings is presented ....
Peterson JL. Computer Programs for Detecting and Correcting Spelling Errors. Communications of the ACM, Vol. 23, No. 12, 1980, 676-687.
....$c(x_t, \epsilon) + d_c(x^{t-1}, y^v), \; c(\epsilon, y_v) + d_c(x^t, y^{v-1}) \}$ (1), where $d_c(\epsilon, \epsilon) = 0$. The edit distance may be computed in $O(t \cdot v)$ time using dynamic programming [18, 31]. Many excellent reviews of the string edit distance literature are available [10, 14, 21, 29]. Several variants of the edit distance have been proposed, including the constrained edit distance [19] and the normalized edit distance [17]. A stochastic interpretation of string edit distance was first provided by Bahl and Jelinek [2], but without an algorithm for learning the edit costs. The ....
Peterson, J. Computer programs for detecting and correcting spelling errors. Comm. ACM 23 (1980), 676--687.
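The recurrence in the excerpt above is cut off before its first term. Assuming that term is the usual substitution term $c(x_t, y_v) + d_c(x^{t-1}, y^{v-1})$, the $O(t \cdot v)$ table fill can be sketched as follows. The cost function `c`, the `EPS` marker standing in for $\epsilon$, and the two example cost functions are illustrative assumptions, not the learned costs of the cited paper.

```python
from typing import Callable

EPS = ""  # stands in for the empty symbol epsilon in c(x, EPS) and c(EPS, y)


def weighted_edit_distance(x: str, y: str, c: Callable[[str, str], float]) -> float:
    """Fill d[i][j] = d_c(x[:i], y[:j]) for all prefixes in O(t * v) time."""
    t, v = len(x), len(y)
    d = [[0.0] * (v + 1) for _ in range(t + 1)]  # d[0][0] = d_c(eps, eps) = 0
    for i in range(1, t + 1):
        d[i][0] = d[i - 1][0] + c(x[i - 1], EPS)  # delete x_i
    for j in range(1, v + 1):
        d[0][j] = d[0][j - 1] + c(EPS, y[j - 1])  # insert y_j
    for i in range(1, t + 1):
        for j in range(1, v + 1):
            d[i][j] = min(
                d[i - 1][j - 1] + c(x[i - 1], y[j - 1]),  # substitute (or match)
                d[i - 1][j] + c(x[i - 1], EPS),           # delete x_i
                d[i][j - 1] + c(EPS, y[j - 1]),           # insert y_j
            )
    return d[t][v]


def unit_cost(a: str, b: str) -> float:
    """Every edit costs 1; matching a symbol with itself is free."""
    return 0.0 if a == b else 1.0


def indel_cost(a: str, b: str) -> float:
    """Substitution costs as much as a deletion plus an insertion."""
    if a == b:
        return 0.0
    return 1.0 if EPS in (a, b) else 2.0


print(weighted_edit_distance("kitten", "sitting", unit_cost))   # 3.0
print(weighted_edit_distance("kitten", "sitting", indel_cost))  # 5.0
```

With `indel_cost`, a substitution is never cheaper than a deletion followed by an insertion. The distance therefore reduces to $t + v - 2\,\mathrm{LCS}(x, y)$: here $6 + 7 - 2 \cdot 4 = 5$. This illustrates the link between edit distance and the longest common subsequence mentioned in other excerpts on this page.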
....to decode the encoded text backwards, this is easily done only if the Huffman code also has the suffix property (which is usually not the case), so that no codeword is the prefix or the suffix of any other. A code having both prefix and suffix properties is called an affix code (as in Peterson [14]). Bidirectional decoding may be useful in the following applications: 1) KWIC display. In full-text information retrieval systems a query consists of one or several keywords, the occurrences of which are to be located. This is done with the help of a concordance which contains for every ....
Peterson J.L., Computer programs for detecting and correcting spelling errors, Comm. ACM 23 (1980) 676--687.
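The affix-code definition above can be checked mechanically. The sketch below assumes codewords are given as bit strings. It uses a sort-and-compare-neighbours test for prefix-freeness and applies the same test to reversed codewords for suffix-freeness. The function names are hypothetical.

```python
def is_prefix_free(code: list[str]) -> bool:
    """True if no codeword is a prefix of another codeword.

    After sorting, any codeword that is a prefix of another is also a prefix
    of its immediate successor, so checking neighbours suffices. A repeated
    codeword counts as a violation."""
    words = sorted(code)
    return all(not nxt.startswith(cur) for cur, nxt in zip(words, words[1:]))


def is_suffix_free(code: list[str]) -> bool:
    """Suffix-freeness is prefix-freeness of the reversed codewords."""
    return is_prefix_free([w[::-1] for w in code])


def is_affix_code(code: list[str]) -> bool:
    return is_prefix_free(code) and is_suffix_free(code)


print(is_affix_code(["0", "10", "11"]))   # False: "0" is a suffix of "10"
print(is_affix_code(["0", "11", "101"]))  # True
```

A code such as {0, 10, 11} can be decoded left to right but not right to left. An affix code such as {0, 11, 101} can be decoded in both directions, which is what the bidirectional applications in the excerpt rely on.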
....finite set of words, and using this model, string correction can be achieved using a suitable similarity metric [11,16-18,21-23,25,27-30,33,35]. (Footnote 8: Our present paper also gives a formal basis for some of the experimental results which we presented in [19].) The advantages of using a finite dictionary in text recognition applications are many. First of all, the accuracy of the recognition is very high. Secondly, a noisy string is never recognized as a word which is not in the language, avoiding meaningless decisions. Finally, the time complexity ....
....The operations which we shall consider in this paper are the substitution, deletion and insertion operations of the individual symbols of the alphabet. In the literature, these operations are the most popular, because the general string editing problem has been studied using these operations [2,11,14,16-19,22,23,27,30,34,35,38,39], and furthermore, these operations can also be used to study problems involving subsequences and supersequences [1,13,14,23,33]. Also, since most errors found in text-based systems are substitution, deletion and insertion errors (and even the common transposition error can be modelled easily as a ....
[Article contains additional citation context not shown here]
J. L. Peterson, Computer programs for detecting and correcting spelling errors, Comm. Assoc. Comput. Mach., 23:676-687 (1980).
....Inexact sequence comparison is also used extensively in the comparison of biological macromolecules, where noisy versions of protein strings are compared with their exact forms typically stored in large protein databases. The literature on sequence correction is extensive; indeed, the reviews [HD80, Pe80] list hundreds of publications that tackle the problem from various perspectives. Damerau [See references in Oo87a,b] was probably the first researcher to observe that most of the errors that were found in strings were either a single Substitution, Insertion and Deletion (SID) or reversal ....
....other inter-substring edit operations. Related to these algorithms are the ones used to compute the Longest Common Subsequence (LCS) of two strings (Hirschberg [Hi77], Hunt and Szymanski [HS77], and others [SK83]). String correction using the GLD as a criterion has been done for noisy strings [Pe80,SK83], substrings [KO83c] and subsequences [Oo87b], and also for strings in which the dictionaries are treated as generalized tries [KO81] and grammars [SK83]. Besides these, various probabilistic methods have been studied in the literature [Pe80,KO84]. Although some work has been done to extend the ....
[Article contains additional citation context not shown here]
J. L. Peterson, Computer programs for detecting and correcting spelling errors, Comm. Assoc. Comput. Mach., 23:676-687 (1980).
....of Y. macromolecules, where noisy versions of protein strings are compared with their exact forms typically stored in large protein databases. The literature on spelling correction is extensive; indeed, the reviews [HD80,Pe80] list hundreds of publications that tackle the problem from various perspectives. Damerau [See references in HD80,Pe80] was probably the first researcher to observe that most of the errors that were found in strings were either a single Substitution, Insertion and Deletion (SID) or reversal ....
....where noisy versions of protein strings are compared with their exact forms typically stored in large protein databases. The literature on spelling correction is extensive; indeed, the reviews [HD80,Pe80] list hundreds of publications that tackle the problem from various perspectives. Damerau [See references in HD80,Pe80] was probably the first researcher to observe that most of the errors that were found in strings were either a single Substitution, Insertion and Deletion (SID) or reversal (transposition) error. Thus the question of computing the dissimilarity between strings was reduced to comparing them using ....
[Article contains additional citation context not shown here]
J. L. Peterson, Computer programs for detecting and correcting spelling errors, Comm. Assoc. Comput. Mach., 23:676-687 (1980).
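Damerau's observation that most errors are a single substitution, insertion, deletion or reversal suggests adding adjacent transposition to the unit-cost edit operations. One common way to do this is sketched below: the "optimal string alignment" variant, in which no substring is edited more than once. This is a swapped-in illustration, not the Generalized Transposition method of the citing papers, and the name `osa_distance` is hypothetical.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: unit-cost substitution, insertion,
    deletion, plus transposition of two adjacent characters."""
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i  # delete all of a[:i]
    for j in range(m + 1):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            # Reversal of the last two characters, e.g. "eh" -> "he".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[n][m]


print(osa_distance("teh", "the"))  # 1 (plain Levenshtein distance would be 2)
```

The restriction that a substring is never edited twice makes this cheaper than the unrestricted Damerau-Levenshtein distance, at the cost of the triangle inequality. For example, it gives 3 for "ca" versus "abc", where the unrestricted distance is 2.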
....of 1023 words, comprises a very large proportion of written English text. Thus, in both string processing and string recognition it is not uncommon to represent the dictionary as a finite set of words, and using this model, string correction can be achieved using a suitable similarity metric [14-18,31,32,37,39,41]. The advantages of using a String Taxonomy Using Learning Automata Page 4 finite dictionary in text recognition applications are many. First of all, the accuracy of the recognition is very high. Secondly, a noisy string is never recognized as a word which is not in the language, and thus, the ....
.... keyboard optimization algorithms and the associated data structures that are encountered, such as suffix trees and their generalizations (see the references listed above). Models which utilize the positional bigrams (and their variants) of the language have also been reported (see references in [37,41]). In this paper, which we believe is a pioneering work in the field, we shall assume that we are dealing with a finite dictionary, H = {X_1, ..., X_J}. We intend to partition H into K equi-sized sub-dictionaries. The problem of partitioning H into unequally sized sub-dictionaries is still open. ....
[Article contains additional citation context not shown here]
J. L. Peterson, Computer programs for detecting and correcting spelling errors, Comm. Assoc. Comput. Mach., 23:676-687 (1980).
....are represented as vectors in V-dimensional space, where V is the size of the vocabulary of the collection. For the English language, we can expect V to range from 2,000 up to and exceeding 100,000 (the vocabulary of everyday English and the size of a very detailed dictionary, respectively [Pet80]). The coordinates of such vectors are called term weights and can be binary (1 if the term appears in the document; 0 if not) or real-valued, with values increasing with the importance (e.g., occurrence frequency) of the term in the document. Consider two documents d1 and d2, with vectors ....
J.L. Peterson. Computer programs for detecting and correcting spelling errors. CACM, 23(12):676--687, December 1980.
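The excerpt above breaks off just before comparing the vectors of d1 and d2. The cosine of the angle between them is one common choice, but that is an assumption here, not necessarily the measure the cited paper goes on to use. The sketch below stores only non-zero coordinates, so the V-dimensional vectors stay small. Both binary and frequency weights are supported, and the names `term_vector` and `cosine` are hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter


def term_vector(doc: str, binary: bool = False) -> dict[str, float]:
    """Sparse term-weight vector over whitespace-separated, lower-cased terms.
    Weights are raw occurrence counts, or 1.0 for every present term if `binary`.
    Absent terms (weight 0) are simply not stored."""
    counts = Counter(doc.lower().split())
    return {term: 1.0 if binary else float(freq) for term, freq in counts.items()}


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine of the angle between two sparse vectors; 0.0 if either is all-zero."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


d1 = term_vector("spelling errors in text", binary=True)
d2 = term_vector("detecting spelling errors", binary=True)
print(round(cosine(d1, d2), 3))  # 0.577
```

The two binary vectors share two of their terms, so the dot product is 2 and the norms are 2 and √3, giving 2 / (2√3) ≈ 0.577.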
....each word can be completely distinguished from its synonyms, using only h bits for the hash code and p (4 bytes, usually) for the ptr-to-word. The advantages of storing ptr-to-word instead of storing the actual word are two: (1) space is saved (a word from the dictionary is 8 characters long [19]), and (2) the records of the intermediate file have fixed length; thus, there is no need for a word delimiter and there is no danger of a word crossing bucket boundaries. Searching is done in a similar way as with DCBS. The only difference is that, whenever a matching hashcode is found in Step 2, ....
....we plan to store technical papers. Each paper has L = 30 Kb size, with a vocabulary of D = 1000 words (these estimates are based on measurements on actual technical reports). The vocabulary of the collection is pessimistically estimated as V = 100,000, which is the vocabulary of the best dictionaries [19]. Leaving 50 Mb for the index and other overheads, we have 250 Mb, which can hold 250 Mb / 30 Kb ≈ 8,300 = N documents. Graph G6.7 plots the analytical results for this setting. Notice that the proposed methods require even smaller overhead: BSSF needs 8 , CBS needs 20 , and DCBS and NFD need ....
Peterson, J.L., "Computer Programs for Detecting and Correcting Spelling Errors," CACM, vol. 23, no. 12, pp. 676-687, Dec. 1980.
....records containing identified nicknames. Since misspellings are introduced by the database generator, we explored the possibility of improving the results by running a spelling correction program over some fields. Spelling correction algorithms have received a large amount of attention for decades [Peterson, 1980; Kukich, 1992]. Most of the spelling correction algorithms we considered use a corpus of correctly spelled words from which the correct spelling is selected. Since we only have a corpus for the names of the cities in the U.S.A. (18670 different names), we attempted correcting only the spelling of ....
J. L. Peterson. Computer Programs for Detecting and Correcting Spelling Errors. Communications of the ACM, 23(12):676--687, 1980.
....), $c(x_t, \epsilon) + d_c(x^{t-1}, y^v), \; c(\epsilon, y_v) + d_c(x^t, y^{v-1}) \}$ (1), where $d_c(\epsilon, \epsilon) = 0$. The edit distance may be computed in $O(t \cdot v)$ time using dynamic programming [16, 28]. Many excellent reviews of the string edit distance literature are available [10, 13, 19, 26]. Several variants of the edit distance have been proposed, including the constrained edit distance [17] and the normalized edit distance [15]. A stochastic interpretation of string edit distance was first provided by Bahl and Jelinek [2], but without an algorithm for learning the edit costs. The ....
Peterson, J. Computer programs for detecting and correcting spelling errors. Comm. ACM 23 (1980), 676--687.
....the dictionary is the set of likely proposals for correction. Unfortunately, most implementations consider only one error per word in order to speed up this method: to correct a word of n characters over an alphabet of m characters, it only handles m(2n+1) + n - 1 strings. Peterson's SPELL [Peterson80] efficiently uses this algorithm. But the most successful approach is yielded by metric methods. They compare a misspelled word with the words drawn from a dictionary and calculate a distance between these words. They can correct several errors in a word. The concept was introduced by ....
J.L. Peterson. Computer programs for detecting and correcting spelling errors. Communication of the ACM, vol. 23, num. 12, pp. 676-687, December 1980.
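The count m(2n+1) + n - 1 in the excerpt above follows from enumerating every single error. A word of n letters has n deletions, n - 1 adjacent transpositions, n(m - 1) substitutions and m(n + 1) insertions. The sketch below generates these candidates and keeps those found in a dictionary. It assumes n ≥ 1, a lower-case alphabet with m = 26, and a word drawn from that alphabet. The function names are hypothetical, and this is not a reproduction of Peterson's SPELL.

```python
from __future__ import annotations

import string


def single_edit_candidates(word: str, alphabet: str = string.ascii_lowercase) -> list[str]:
    """Every string produced by exactly one deletion, adjacent transposition,
    substitution (by a different letter) or insertion applied to `word`.
    Duplicates are kept, so the length is m(2n+1) + n - 1 for n >= 1."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletions = [left + right[1:] for left, right in splits if right]            # n
    transpositions = [left + right[1] + right[0] + right[2:]
                      for left, right in splits if len(right) > 1]               # n - 1
    substitutions = [left + ch + right[1:] for left, right in splits if right
                     for ch in alphabet if ch != right[0]]                       # n(m - 1)
    insertions = [left + ch + right for left, right in splits for ch in alphabet]  # m(n + 1)
    return deletions + transpositions + substitutions + insertions


def corrections(word: str, dictionary: set[str]) -> list[str]:
    """Dictionary words reachable from `word` by exactly one edit."""
    return sorted(set(single_edit_candidates(word)) & dictionary)


print(len(single_edit_candidates("teh")))                 # 184 = 26 * (2*3 + 1) + 3 - 1
print(corrections("teh", {"the", "ten", "tea", "cat"}))   # ['tea', 'ten', 'the']
```

The generate-and-test approach is only linear in the alphabet size and word length per query. However, it finds nothing that is two or more errors away, which is the limitation the excerpt contrasts with metric methods.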
....1977). Murphy (1990) describes a stepwise technique for constructing tries that optimize the r-error digital search problem (where r digits of the key may be erroneous) but concludes that the problem of finding the optimal tree is NP-complete. Other spelling correction systems have used hashing (Peterson, 1980) and digrams, trigrams, or in general n-grams (Riseman and Hanson, 1974). However, it would be difficult to apply these indexing strategies to the diversity of operations that cross-indexing supports. Smith and Steen (1981) proposed using a bitmap approach similar to cross-indexing, for their ....
Peterson, J.L. (1980) Computer programs for detecting and correcting spelling errors. Comm. Association for Computing Machinery 23(12): 676--687; December.
....dictionary lookup programs. The Obvious Password Detector uses a statistical characteristic of the dictionary to eliminate words which are likely to be in the dictionary; OPUS uses Bloom filters. Both these techniques are probabilistic in nature and simply extend previous work (see, for example, [5][9]); similar techniques can be used with proactive password programs, and in fact either system could be incorporated into such a program. Of the three proactive password programs described above, the standard UNIX program passwd is clearly the weakest; this is not surprising, given the ....
Peterson, J. L., "Computer Programs for Detecting and Correcting Spelling Errors," Communications of the ACM 23(12) (Dec. 1980) pp. 676-687.
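The excerpt above mentions that OPUS checks candidate passwords against a dictionary with Bloom filters. Below is a minimal Bloom filter sketch to show why such a check is probabilistic. It never misses a stored word, but it may occasionally report a word that was never added. The bit-array size, the number of hash positions, and the double-hashing scheme derived from SHA-256 are illustrative assumptions, not OPUS's actual parameters. The class name is hypothetical.

```python
import hashlib


class BloomFilter:
    """Bit-array set sketch: no false negatives, tunable false-positive rate."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, word: str) -> list[int]:
        # Derive k bit positions from one SHA-256 digest by double hashing
        # (h1 + i * h2); an illustrative choice of hash functions.
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, word: str) -> None:
        for p in self._positions(word):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, word: str) -> bool:
        # True only if every one of the word's k bits is set.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(word))


bf = BloomFilter(num_bits=1 << 16, num_hashes=5)
for w in ["password", "letmein", "qwerty"]:
    bf.add(w)
print(bf.might_contain("password"))  # True (a Bloom filter has no false negatives)
```

Only the bit array is stored, never the words themselves, so a large dictionary of weak passwords fits in a small, fixed amount of memory. The price is a small, size-dependent chance of rejecting a password that is not actually in the dictionary.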
No context found.
James L. Peterson. Computer programs for detecting and correcting spelling errors. Commun. ACM, 23(12):676--687, 1980.
No context found.
J. L. Peterson, Computer programs for detecting and correcting spelling errors, Communications ACM, vol. 23, pp. 676-687, 1980.
No context found.
Peterson J.L. Computer Programs for detecting and correcting spelling errors. Comm. of ACM vol.23 no.12, 1980.
No context found.
Peterson J.L. (1980). Computer Programs for detecting and correcting spelling errors, Comm. of ACM , vol.23, No.12.
No context found.
J.L. Peterson. 1980. Computer programs for detecting and correcting spelling errors. Communications of the ACM, 23(12):676--687.
CiteSeer.IST - Copyright Penn State and NEC