Improving WFST-based G2P Conversion with Alignment Constraints and RNNLM N-best Rescoring
Abstract (Cited by 2, 1 self)
This work introduces a modified WFST-based multiple-to-multiple EM-driven alignment algorithm for Grapheme-to-Phoneme (G2P) conversion, and preliminary experimental results applying a Recurrent Neural Network Language Model (RNNLM) as an N-best rescoring mechanism for G2P conversion. The alignment algorithm leverages the WFST framework and introduces several simple structural constraints which yield a small but consistent improvement in Word Accuracy (WA) on a selection of standard baselines. The RNNLM rescoring further extends these gains and achieves state-of-the-art performance on four standard G2P datasets. The system is also shown to be significantly faster than existing solutions. Finally, the complete WFST-based G2P framework is provided as an open-source toolkit.
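The N-best rescoring step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the hypothesis scores, the toy RNNLM, and the interpolation weight `alpha` are all assumptions for the example; in practice the RNNLM would be a trained recurrent network and `alpha` would be tuned on a development set.

```python
# Hedged sketch of RNNLM N-best rescoring for G2P: each hypothesis carries a
# WFST decoder score (log-prob), an RNNLM assigns a phoneme-sequence score,
# and the two are log-linearly interpolated before re-ranking.

def rescore_nbest(nbest, rnnlm_score, alpha=0.5):
    """nbest: list of (phoneme_seq, wfst_score) pairs, scores as log-probs.
    rnnlm_score: callable mapping a phoneme sequence to a log-prob.
    alpha: interpolation weight on the RNNLM score (assumed; tuned on dev).
    Returns the list re-ranked by the interpolated score, best first."""
    rescored = [
        (seq, (1 - alpha) * wfst + alpha * rnnlm_score(seq))
        for seq, wfst in nbest
    ]
    return sorted(rescored, key=lambda item: item[1], reverse=True)

# Toy stand-in for a trained RNNLM (illustration only): penalizes length.
def toy_rnnlm(seq):
    return -0.5 * len(seq)

# Two hypothetical pronunciations with WFST log-prob scores.
nbest = [
    (("F", "OW", "N", "IH", "M"), -3.2),
    (("F", "OW", "N", "EH", "M"), -3.0),
]
best = rescore_nbest(nbest, toy_rnnlm)[0][0]
```

Here the RNNLM scores are equal (both sequences have five phonemes), so the interpolated ranking is decided by the WFST scores and the second hypothesis wins.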
Compiling Large-Context Phonetic Decision Trees into Finite-State Transducers
Abstract
Recent work has shown that the use of finite-state transducers (FSTs) has many advantages in large vocabulary speech recognition. Most past work has focused on the use of triphone phonetic decision trees. However, numerous applications use decision trees that condition on wider contexts; for example, many systems at IBM use 11-phone phonetic decision trees. Alas, large-context phonetic decision trees cannot be compiled straightforwardly into FSTs due to memory constraints. In this work, we discuss memory-efficient techniques for manipulating large-context phonetic decision trees in the FST framework. First, we describe a lazy expansion technique that is applicable when expanding small word graphs. For general applications, we discuss how to construct large-context transducers via a sequence of simple, efficient finite-state operations; we also introduce a memory-efficient implementation of determinization.
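The lazy expansion idea for small word graphs can be sketched as below. This is an assumed, simplified illustration rather than the paper's method: the decision tree is a stand-in function, contexts are fixed-width left contexts, and the word graph is reduced to a list of phone paths. The point it demonstrates is that tree contexts are materialized (and cached) only for states actually reached in the graph, so the full large-context transducer is never built.

```python
# Hedged sketch of lazy decision-tree expansion over a small word graph:
# context-dependent states are expanded on demand and memoized, so contexts
# unreachable in the graph are never materialized.

from functools import lru_cache

def tree_leaf(left_context, phone):
    # Hypothetical stand-in for a real decision-tree traversal that maps a
    # (context, phone) pair to a context-dependent leaf/model id.
    return hash((left_context, phone)) % 1000

@lru_cache(maxsize=None)
def expand_state(left_context, phone):
    # Called only for (context, phone) pairs reachable in the word graph;
    # returns the leaf id and the shifted left context for the next step.
    return tree_leaf(left_context, phone), left_context[1:] + (phone,)

def expand_graph(word_graph, context_width=2):
    """word_graph: list of phone-sequence paths (a toy proxy for a word
    graph). Returns, for each path, the sequence of expanded leaf ids."""
    expanded = []
    for phones in word_graph:
        ctx = ("<s>",) * context_width  # sentence-start padding (assumed)
        leaves = []
        for p in phones:
            leaf, ctx = expand_state(ctx, p)
            leaves.append(leaf)
        expanded.append(leaves)
    return expanded

out = expand_graph([("a", "b"), ("a", "c")])
```

Because `expand_state` is memoized, the shared prefix `("a",)` of the two paths is expanded once; only the contexts the graph actually visits ever exist in memory.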