Results 1-10 of 59
Finite-State Transducers in Language and Speech Processing
 Computational Linguistics
, 1997
Cited by 392 (42 self)
Finite-state machines have been used in various domains of natural language processing. We consider here the use of a type of transducers that supports very efficient programs: sequential transducers. We recall classical theorems and give new ones characterizing sequential string-to-string transducers. Transducers that output weights also play an important role in language and speech processing. We give a specific study of string-to-weight transducers, including algorithms for determinizing and minimizing these transducers very efficiently, and characterizations of the transducers admitting determinization and the corresponding algorithms. Some applications of these algorithms in speech recognition are described and illustrated.
Weighted finite-state transducers in speech recognition
 COMPUTER SPEECH & LANGUAGE
, 2002
Cited by 211 (5 self)
We survey the use of weighted finite-state transducers (WFSTs) in speech recognition. We show that WFSTs provide a common and natural representation for hidden Markov models (HMMs), context-dependency, pronunciation dictionaries, grammars, and alternative recognition outputs. Furthermore, general transducer operations combine these representations flexibly and efficiently. Weighted determinization and minimization algorithms optimize their time and space requirements, and a weight pushing algorithm distributes the weights along the paths of a weighted transducer optimally for speech recognition. As an example, we describe a North American Business News (NAB) recognition system built using these techniques that combines the HMMs, full cross-word triphones, a lexicon of 40,000 words, and a large trigram grammar into a single weighted transducer that is only somewhat larger than the trigram word grammar and that runs NAB in real-time on a very simple decoder. In another example, we show that the same techniques can be used to optimize lattices for second-pass recognition. In a third example, we show how general automata operations can be used to assemble lattices from different recognizers to improve recognition performance.
MONA: Monadic Second-Order Logic in Practice
 IN TOOLS AND ALGORITHMS FOR THE CONSTRUCTION AND ANALYSIS OF SYSTEMS, FIRST INTERNATIONAL WORKSHOP, TACAS '95, LNCS 1019
, 1995
Cited by 149 (20 self)
The purpose of this article is to introduce Monadic Second-order Logic as a practical means of specifying regularity. The logic is a highly succinct alternative to the use of regular expressions. We have built a tool MONA, which acts as a decision procedure and as a translator to finite-state automata. The tool is based on new algorithms for minimizing finite-state automata that use binary decision diagrams (BDDs) to represent transition functions in compressed form. A byproduct of this work is a new bottom-up algorithm to reduce BDDs in linear time without hashing. The potential
The Design Principles of a Weighted Finite-State Transducer Library
, 2002
Cited by 110 (20 self)
We describe the algorithmic and software design principles of an object-oriented library for weighted finite-state transducers. By taking advantage of the theory of rational power series, we were able to achieve high degrees of generality, modularity and irredundancy, while attaining competitive efficiency in demanding speech processing applications involving weighted automata of more than 10^7 states and transitions. Besides its mathematical foundation, the design also draws from important ideas in algorithm design and programming languages: dynamic programming and shortest-paths algorithms over general semirings, object-oriented programming, lazy evaluation and memoization.
A Rational Design for a Weighted Finite-State Transducer Library
 LECTURE NOTES IN COMPUTER SCIENCE
, 1998
State Complexity of Regular Languages
 Journal of Automata, Languages and Combinatorics
, 2000
Cited by 63 (9 self)
State complexity is a descriptive complexity measure for regular languages. We investigate the problems related to the state complexity of regular languages and their operations. In particular, we compare the state complexity results on regular languages with those on finite languages.
Fast String Correction with Levenshtein-Automata
 INTERNATIONAL JOURNAL OF DOCUMENT ANALYSIS AND RECOGNITION
, 2002
Cited by 37 (5 self)
The Levenshtein-distance between two words is the minimal number of insertions, deletions or substitutions that are needed to transform one word into the other. Levenshtein-automata of degree n for a word W are defined as finite state automata that recognize the set of all words V where the Levenshtein-distance between V and W does not exceed n. We show how to compute, for any fixed bound n and any input word W, a deterministic Levenshtein-automaton of degree n for W in time linear in the length of W. Given an electronic dictionary that is implemented in the form of a trie or a finite state automaton, the Levenshtein-automaton for W can be used to control search in the lexicon in such a way that exactly the lexical words V are generated where the Levenshtein-distance between V and W does not exceed the given bound. This leads to a very fast method for correcting corrupted input words of unrestricted text using large electronic dictionaries. We then introduce a second method that avoids the explicit computation of Levenshtein-automata and leads to even better efficiency. We also describe how to extend both methods to variants of the Levenshtein-distance where further primitive edit operations (transpositions, merges and splits) may be used.
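The distance defined in this abstract can be illustrated with a minimal Python sketch. It uses the textbook quadratic dynamic program, not the paper's linear-time automaton construction; the function names `levenshtein` and `within_degree` are illustrative assumptions. `within_degree(v, w, n)` answers the same membership question that a Levenshtein-automaton of degree n for w would answer for v.

```python
def levenshtein(v: str, w: str) -> int:
    """Minimal number of insertions, deletions or substitutions turning v into w."""
    # prev[j] holds the distance between the current prefix of v and w[:j].
    prev = list(range(len(w) + 1))
    for i, cv in enumerate(v, start=1):
        curr = [i]  # distance from v[:i] to the empty prefix of w
        for j, cw in enumerate(w, start=1):
            curr.append(min(
                prev[j] + 1,               # delete cv
                curr[j - 1] + 1,           # insert cw
                prev[j - 1] + (cv != cw),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def within_degree(v: str, w: str, n: int) -> bool:
    """True if v belongs to the language of words at distance <= n from w."""
    return levenshtein(v, w) <= n
```

For example, `within_degree("chold", "child", 1)` holds because a single substitution repairs the word.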
On the complexity of Hopcroft’s state minimization algorithm
 Lecture Notes in Computer Science
, 2004
Cited by 16 (2 self)
Hopcroft’s algorithm for minimizing a deterministic automaton has complexity O(n log n). We show that this complexity bound is tight. More precisely, we provide a family of automata of size n = 2^k on which the algorithm runs in time k 2^k. These automata have a very simple structure and are built over a one-letter alphabet. Their sets of final states are defined by de Bruijn words.
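As a rough illustration of the algorithm this abstract analyzes, here is a simplified Python sketch of Hopcroft-style partition refinement for a complete DFA. The function name `hopcroft_partition` and the dictionary encoding of the transition function are assumptions made for this example. The sketch scans every block for each splitter, so it does not reach the O(n log n) bound; it only shows the refinement scheme with the "smaller half" rule.

```python
def hopcroft_partition(states, alphabet, delta, finals):
    """Return the set of equivalence classes (frozensets) of a complete DFA.

    delta maps (state, letter) -> state and must be defined for every pair.
    """
    # Inverse transitions: which states reach q on letter a.
    inv = {(q, a): set() for q in states for a in alphabet}
    for q in states:
        for a in alphabet:
            inv[(delta[(q, a)], a)].add(q)

    finals = frozenset(finals)
    nonfinals = frozenset(states) - finals
    partition = {b for b in (finals, nonfinals) if b}
    worklist = set(partition)

    while worklist:
        splitter = worklist.pop()
        for a in alphabet:
            # States whose a-transition lands inside the splitter.
            pred = set().union(*(inv[(q, a)] for q in splitter))
            for block in list(partition):
                inside, outside = block & pred, block - pred
                if not inside or not outside:
                    continue  # the splitter does not cut this block
                partition.remove(block)
                partition.update((inside, outside))
                if block in worklist:
                    worklist.remove(block)
                    worklist.update((inside, outside))
                else:
                    # "Smaller half" rule: only the smaller part is queued.
                    worklist.add(min(inside, outside, key=len))
    return partition
```

On a one-letter cyclic automaton with four states and final states {0, 2}, the result has two classes, {0, 2} and {1, 3}; with the single final state {0}, all four states stay distinct.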
Online Construction of Subsequence Automata for Multiple Texts
, 2000
Cited by 13 (3 self)
We consider a deterministic finite automaton which accepts all subsequences of a set of texts, called a subsequence automaton. We show an online algorithm for constructing the subsequence automaton for a set of texts. It runs in O(|Σ|(m + k) + N) time using O(|Σ|m) space, where |Σ| is the size of the alphabet, m is the size of the resulting subsequence automaton, k is the number of texts, and N is the total length of the texts. It can be used to preprocess a given set S of texts in such a way that for any subsequent query w ∈ Σ*, it returns in O(|w|) time the number of texts in S which contain w as a subsequence. We also show an upper bound on the size of the automaton compared to the minimum automaton.
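The query described in this abstract can be sketched in Python with a simpler stand-in: one next-occurrence table per text, instead of the single merged automaton the paper constructs. The names `build_next_table` and `count_containing` are illustrative assumptions. A query then costs O(k|w|) rather than the paper's O(|w|), because each text is scanned separately.

```python
def build_next_table(text: str) -> list[dict[str, int]]:
    """Subsequence automaton for one text, as a list of transition maps.

    State i means "the first i characters of text are used up";
    nxt[i][c] is the state reached after matching the next c at or after i.
    """
    nxt: list[dict[str, int]] = [{} for _ in range(len(text) + 1)]
    for i in range(len(text) - 1, -1, -1):
        nxt[i] = dict(nxt[i + 1])  # inherit later occurrences
        nxt[i][text[i]] = i + 1    # the nearest occurrence of text[i] wins
    return nxt


def count_containing(tables: list[list[dict[str, int]]], w: str) -> int:
    """Number of preprocessed texts that contain w as a subsequence."""
    count = 0
    for nxt in tables:
        state = 0
        for c in w:
            state = nxt[state].get(c)
            if state is None:  # no further occurrence of c: w is rejected
                break
        else:
            count += 1
    return count
```

With `tables = [build_next_table(t) for t in ["abcab", "bca", "cc"]]`, the query `count_containing(tables, "ba")` counts the first two texts, since "cc" contains neither letter.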