Results 1 - 10 of 59
Finite-State Transducers in Language and Speech Processing
- Computational Linguistics
, 1997
Cited by 392 (42 self)
Finite-state machines have been used in various domains of natural language processing. We consider here the use of a type of transducers that supports very efficient programs: sequential transducers. We recall classical theorems and give new ones characterizing sequential string-to-string transducers. Transducers that output weights also play an important role in language and speech processing. We give a specific study of string-to-weight transducers, including algorithms for determinizing and minimizing these transducers very efficiently, and characterizations of the transducers admitting determinization and the corresponding algorithms. Some applications of these algorithms in speech recognition are described and illustrated.
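The sequential (deterministic input) transducers this abstract studies can be sketched minimally in code. The class and the toy rewrite rules below are illustrative assumptions for exposition, not the paper's algorithms:

```python
# A minimal sketch of a sequential (deterministic) string-to-string
# transducer: at most one transition per (state, symbol), each emitting
# an output string, plus a final output string per accepting state.

class SequentialTransducer:
    def __init__(self, start, transitions, final_output):
        self.start = start
        # transitions: (state, symbol) -> (next_state, output_string)
        self.transitions = transitions
        # final_output: state -> string appended on acceptance
        self.final_output = final_output

    def apply(self, word):
        state, out = self.start, []
        for ch in word:
            if (state, ch) not in self.transitions:
                raise ValueError(f"no transition on {ch!r} from state {state}")
            state, piece = self.transitions[(state, ch)]
            out.append(piece)
        if state not in self.final_output:
            raise ValueError(f"state {state} is not final")
        return "".join(out) + self.final_output[state]

# Toy example: rewrite each 'a' to 'b' and double each 'b'.
t = SequentialTransducer(
    start=0,
    transitions={(0, "a"): (0, "b"), (0, "b"): (0, "bb")},
    final_output={0: ""},
)
print(t.apply("abba"))  # -> "bbbbbb"
```

Because the machine is deterministic on its input, `apply` runs in a single left-to-right pass, which is the efficiency property the abstract highlights.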
Weighted finite-state transducers in speech recognition
- COMPUTER SPEECH & LANGUAGE
, 2002
Cited by 211 (5 self)
We survey the use of weighted finite-state transducers (WFSTs) in speech recognition. We show that WFSTs provide a common and natural representation for hidden Markov models (HMMs), context-dependency, pronunciation dictionaries, grammars, and alternative recognition outputs. Furthermore, general transducer operations combine these representations flexibly and efficiently. Weighted determinization and minimization algorithms optimize their time and space requirements, and a weight pushing algorithm distributes the weights along the paths of a weighted transducer optimally for speech recognition. As an example, we describe a North American Business News (NAB) recognition system built using these techniques that combines the HMMs, full cross-word triphones, a lexicon of 40 000 words, and a large trigram grammar into a single weighted transducer that is only somewhat larger than the trigram word grammar and that runs NAB in real-time on a very simple decoder. In another example, we show that the same techniques can be used to optimize lattices for second-pass recognition. In a third example, we show how general automata operations can be used to assemble lattices from different recognizers to improve recognition performance.
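The weight bookkeeping behind WFST-based recognition can be illustrated in the tropical semiring, where weights add along a path and alternative paths combine by taking the minimum. The toy acceptor below is an illustrative assumption, not the NAB system described in the paper:

```python
# Score an input string against a small weighted acceptor in the tropical
# semiring: path weight = sum of arc weights (+ final weight), and the
# best accepting path wins. This is a Viterbi-style dynamic program over
# prefix positions of the input.

def score(arcs, finals, start, word):
    # arcs: state -> list of (symbol, weight, next_state)
    # finals: state -> final weight
    INF = float("inf")
    best = {start: 0.0}
    for ch in word:
        nxt = {}
        for state, w in best.items():
            for sym, aw, dst in arcs.get(state, []):
                if sym == ch:
                    nxt[dst] = min(nxt.get(dst, INF), w + aw)
        best = nxt
    return min((w + finals[s] for s, w in best.items() if s in finals),
               default=INF)

arcs = {
    0: [("a", 1.0, 1), ("a", 0.5, 2)],
    1: [("b", 0.2, 3)],
    2: [("b", 1.0, 3)],
}
finals = {3: 0.0}
print(score(arcs, finals, 0, "ab"))  # -> 1.2  (path 0-1-3 beats 0-2-3)
```

Determinization and weight pushing, as surveyed above, reshape such a machine so that this search needs to track only one state per prefix.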
MONA: Monadic Second-Order Logic in Practice
- IN PRACTICE, IN TOOLS AND ALGORITHMS FOR THE CONSTRUCTION AND ANALYSIS OF SYSTEMS, FIRST INTERNATIONAL WORKSHOP, TACAS '95, LNCS 1019
, 1995
Cited by 149 (20 self)
The purpose of this article is to introduce Monadic Second-order Logic as a practical means of specifying regularity. The logic is a highly succinct alternative to the use of regular expressions. We have built a tool MONA, which acts as a decision procedure and as a translator to finite-state automata. The tool is based on new algorithms for minimizing finite-state automata that use binary decision diagrams (BDDs) to represent transition functions in compressed form. A byproduct of this work is a new bottom-up algorithm to reduce BDDs in linear time without hashing.
The Design Principles of a Weighted Finite-State Transducer Library
, 2002
Cited by 110 (20 self)
We describe the algorithmic and software design principles of an object-oriented library for weighted finite-state transducers. By taking advantage of the theory of rational power series, we were able to achieve high degrees of generality, modularity and irredundancy, while attaining competitive efficiency in demanding speech processing applications involving weighted automata of more than 10^7 states and transitions. Besides its mathematical foundation, the design also draws from important ideas in algorithm design and programming languages: dynamic programming and shortest-paths algorithms over general semirings, object-oriented programming, lazy evaluation and memoization.
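The "lazy evaluation and memoization" idea mentioned above can be sketched with an on-the-fly subset construction: each deterministic state is computed only when first reached, then cached. The NFA below is an illustrative assumption, not an example from the library:

```python
# Lazily determinize an NFA: dfa_step builds each subset-construction
# transition on demand and memoizes it, so only the reachable part of the
# (potentially exponential) DFA is ever materialized.

from functools import lru_cache

nfa = {  # state -> symbol -> set of next states
    0: {"a": {0, 1}},
    1: {"b": {2}},
    2: {},
}

@lru_cache(maxsize=None)
def dfa_step(subset, symbol):
    # One subset-construction transition, computed lazily and cached.
    return frozenset(s for q in subset for s in nfa[q].get(symbol, ()))

def accepts(word, start=frozenset({0}), finals=frozenset({2})):
    state = start
    for ch in word:
        state = dfa_step(state, ch)
    return bool(state & finals)

print(accepts("aab"))  # -> True
print(accepts("ba"))   # -> False
```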
A Rational Design for a Weighted Finite-State Transducer Library
- LECTURE NOTES IN COMPUTER SCIENCE
, 1998
State Complexity of Regular Languages
- Journal of Automata, Languages and Combinatorics
, 2000
Cited by 63 (9 self)
State complexity is a descriptive complexity measure for regular languages. We investigate the problems related to the state complexity of regular languages and their operations. In particular, we compare the state complexity results on regular languages with those on finite languages.
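A classic state-complexity gap of the kind such surveys study: the language "the k-th symbol from the end is 'a'" has an NFA with k+1 states, but its minimal DFA needs 2^k states. The code below (an illustrative sketch, not from the paper) runs the subset construction and counts the reachable DFA states:

```python
# Count DFA states for "the k-th symbol from the end is 'a'" over {a, b}.
# NFA: state 0 loops on both letters; on 'a' it also guesses that the
# final k-symbol suffix starts here (go to state 1); states 1..k count
# down the suffix, and state k is accepting.

def dfa_size(k):
    def step(subset, sym):
        out = set()
        for q in subset:
            if q == 0:
                out.add(0)
                if sym == "a":
                    out.add(1)
            elif q < k:
                out.add(q + 1)
        return frozenset(out)

    # Breadth-less reachability over subset-construction states.
    start = frozenset({0})
    seen, stack = {start}, [start]
    while stack:
        s = stack.pop()
        for sym in "ab":
            t = step(s, sym)
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return len(seen)

for k in range(1, 6):
    print(k, dfa_size(k))  # grows as 2^k
```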
Fast String Correction with Levenshtein-Automata
- INTERNATIONAL JOURNAL OF DOCUMENT ANALYSIS AND RECOGNITION
, 2002
Cited by 37 (5 self)
The Levenshtein distance between two words is the minimal number of insertions, deletions or substitutions that are needed to transform one word into the other. Levenshtein automata of degree n for a word W are defined as finite-state automata that recognize the set of all words V where the Levenshtein distance between V and W does not exceed n. We show how to compute, for any fixed bound n and any input word W, a deterministic Levenshtein automaton of degree n for W in time linear in the length of W. Given an electronic dictionary that is implemented in the form of a trie or a finite-state automaton, the Levenshtein automaton for W can be used to control search in the lexicon in such a way that exactly the lexical words V are generated where the Levenshtein distance between V and W does not exceed the given bound. This leads to a very fast method for correcting corrupted input words of unrestricted text using large electronic dictionaries. We then introduce a second method that avoids the explicit computation of Levenshtein automata and leads to even better efficiency. We also describe how to extend both methods to variants of the Levenshtein distance where further primitive edit operations (transpositions, merges and splits) may be used.
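The bounded-distance lookup described above can be imitated, far less efficiently, with the plain Levenshtein dynamic program: compute the distance from the query to every dictionary word and keep those within the bound n. The tiny dictionary is an illustrative assumption; the paper's automata avoid this full scan:

```python
# Two-row dynamic program for Levenshtein distance, then a naive
# bounded-distance dictionary lookup built on top of it.

def levenshtein(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def correct(word, lexicon, n):
    # All lexicon entries within distance n of the (possibly corrupted) word.
    return sorted(v for v in lexicon if levenshtein(word, v) <= n)

lexicon = {"hello", "help", "hell", "shell", "world"}
print(correct("helo", lexicon, 1))  # -> ['hell', 'hello', 'help']
```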
On the complexity of Hopcroft’s state minimization algorithm
- of Lecture Notes in Computer Science
, 2004
Cited by 16 (2 self)
Hopcroft’s algorithm for minimizing a deterministic automaton has complexity O(n log n). We show that this complexity bound is tight. More precisely, we provide a family of automata of size n = 2^k on which the algorithm runs in time k·2^k. These automata have a very simple structure and are built over a one-letter alphabet. Their sets of final states are defined by de Bruijn words.
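The minimization being analyzed can be sketched with partition refinement. Note this is Moore's simpler O(n^2) variant, not Hopcroft's O(n log n) algorithm itself; the example DFA is an illustrative assumption:

```python
# Moore-style state minimization: start from the {final, non-final}
# partition and refine until every block's transition behaviour is uniform.

def minimize(states, alphabet, delta, finals):
    # Initial partition: final vs non-final states.
    part = {q: (q in finals) for q in states}
    while True:
        # Signature of a state: its own block plus its successors' blocks.
        sig = {q: (part[q],) + tuple(part[delta[(q, a)]] for a in alphabet)
               for q in states}
        blocks = sorted(set(sig.values()))
        new = {q: blocks.index(sig[q]) for q in states}
        if len(set(new.values())) == len(set(part.values())):
            return new  # stable: maps each state to its block index
        part = new

# DFA where states 1 and 2 are equivalent (same behaviour on every letter).
states = [0, 1, 2, 3]
delta = {(0, "a"): 1, (0, "b"): 2,
         (1, "a"): 3, (1, "b"): 3,
         (2, "a"): 3, (2, "b"): 3,
         (3, "a"): 3, (3, "b"): 3}
blocks = minimize(states, "ab", delta, finals={3})
print(blocks[1] == blocks[2])        # -> True (states 1 and 2 merge)
print(len(set(blocks.values())))     # -> 3
```

Hopcroft's refinement achieves O(n log n) by always splitting with respect to the smaller half, which is exactly the bookkeeping the paper's de Bruijn family drives to its worst case.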
Online Construction of Subsequence Automata for Multiple Texts
, 2000
Cited by 13 (3 self)
We consider a deterministic finite automaton which accepts all subsequences of a set of texts, called a subsequence automaton. We show an online algorithm for constructing a subsequence automaton for a set of texts. It runs in O(|Σ|(m + k) + N) time using O(|Σ|m) space, where |Σ| is the size of the alphabet, m is the size of the resulting subsequence automaton, k is the number of texts, and N is the total length of the texts. It can be used to preprocess a given set S of texts in such a way that, for any subsequent query w ∈ Σ*, it returns in O(|w|) time the number of texts in S which contain w as a subsequence. We also show an upper bound on the size of the automaton compared to the minimum automaton.