by Mitchel Weintraub (SRI), Eric Fosler (ICSI), Charles Galles (DoD), Yu-hung Kao (TI), Sanjeev Khudanpur (JHU), Murat Saraclar (JHU), Steven Wegmann (Dragon)
Download: http://www-speech.sri.com/papers/ws96-pronunciation.ps.gz
Abstract:
Today's recognizers are primarily based on single pronunciations for most words. This means that the burden of modeling phonetic variability falls entirely on acoustic modeling. In addition, certain types of pronunciation variation (phone deletion/reduction, dialect) are impossible to model well at the acoustic level. We suspect that one of the difficulties in recognizing conversational speech (compared to read speech) is the greater variability of pronunciation. We propose to capture this variability by modeling the pronunciations for each word. The goal of this project is to automatically learn a model of word pronunciation from data. We focus on frequent words that appear many times in the Switchboard and Callhome corpora, since a small number of words make up a large fraction of the total errors. We can hope to learn these pronunciations automatically since these words occur many times in the training data. All past attempts in this area have treated pronunciation variants as mutually independent, i.e., under the assumption that any speaker would choose one of the given variants with a given probability, independent of related choices in the same phonological context, conversation, by the same speaker. Such an approach is simple to implement, but increases the number of parameters and the