A discriminative training algorithm for hidden Markov models
- IEEE Trans. Speech Audio Proc.,vol
, 2004
Building an ASR System for Noisy Environments: SRI's 2001 SPINE Evaluation System
- Proceedings of the International Conference on Spoken Language Processing
, 2002
Cited by 6 (1 self)
We describe SRI's recognition system as used in the 2001 DARPA Speech in Noisy Environments (SPINE) evaluation. The SPINE task involves recognition of speech in simulated military environments. The task had some unique challenges, including segmentation of foreground speech from noisy background, the need for robust acoustic models to handle noisy speech, and development of language models from limited training data. In developing the SRI evaluation system for this task, we addressed each of these challenges using a combination of state-of-the-art techniques, including several types of feature normalization, model adaptation, class-based language modeling, multi-pass segmentation and recognition, and word posterior-based decoding and system combination.
Speech recognition engineering issues in speech to speech translation system design for low resource languages and domains
- Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on
, 2006
Cited by 2 (1 self)
Engineering automatic speech recognition (ASR) for speech-to-speech (S2S) translation systems, especially targeting languages and domains that do not have readily available spoken language resources, is immensely challenging for a number of reasons. In addition to contending with the conventional data-hungry speech acoustic and language modeling needs, these designs have to accommodate varying requirements imposed by the domain needs and characteristics, target device and usage modality (such as phrase-based or spontaneous free-form interactions, with or without visual feedback), and huge spoken language variability arising from socio-linguistic and cultural differences of the users. Using case studies of creating speech translation systems between English and languages such as Pashto and Farsi, this paper describes some of the practical issues and the solutions that were developed for multilingual ASR development. These include novel acoustic and language modeling strategies such as language-adaptive recognition, active-learning-based language modeling, class-based language models that can better exploit resource-poor language data, efficient search strategies, including N-best and confidence generation to aid multiple-hypothesis translation, use of dialog information and clever interface choices to facilitate ASR, and audio interface design for meeting both usability and robustness requirements.
Progress on mandarin conversational telephone speech recognition
- in International Symposium on Chinese Spoken Language Processing
, 2004
Cited by 1 (1 self)
Over the past decade, there has been good progress on English conversational telephone speech (CTS) recognition, built on the Switchboard and Fisher corpora. In this paper, we present our efforts on extending language-independent technologies into Mandarin CTS, as well as addressing language-dependent issues such as tone. We will show the impact of each of the following factors: (a) simplified Mandarin phone set, (b) pitch features, (c) auto-retrieved web texts for augmenting n-gram training, (d) speaker adaptive training, (e) maximum mutual information estimation, (f) decision-tree-based parameter sharing, (g) cross-word co-articulation modeling, and (h) combining MFCC and PLP decoding outputs using confusion networks. We have reduced the Chinese character error rate (CER) of the BBN-2003 development test set from 53.8% to 46.8% after (a)+(b)+(c)+(f)+(g) are combined. Further reduction in CER is anticipated after integrating all improvements.
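For readers comparing the two CER figures quoted above, the following is a minimal sketch of how a character error rate and its relative reduction are commonly computed. It assumes the usual Levenshtein-based definition (edits divided by reference length); the abstract does not state the scoring tool used, and the function names here are illustrative only.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def character_error_rate(ref, hyp):
    """CER in percent, assuming edits are normalized by reference length."""
    return 100.0 * edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline, improved):
    """Relative error reduction in percent."""
    return 100.0 * (baseline - improved) / baseline


# Figures quoted in the abstract: 53.8% reduced to 46.8%.
absolute = 53.8 - 46.8
relative = relative_reduction(53.8, 46.8)
print(f"absolute: {absolute:.1f} points, relative: {relative:.1f}%")
```

Under this definition, the reported change corresponds to 7.0 points absolute, or about 13.0% relative.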
GRADIENT BOOSTING LEARNING OF HIDDEN MARKOV MODELS
In this paper, we present a new training algorithm, gradient boosting learning, for Gaussian mixture density (GMD) based acoustic models. This algorithm is based on a function approximation scheme from the perspective of optimization in function space rather than parameter space, i.e., stage-wise additive expansions of GMDs are used to search for optimal models instead of gradient descent optimization of model parameters. In the proposed approach, a GMD starts from a single Gaussian and is built up by sequentially adding new components. Each new component is globally selected to produce optimal gain in the objective function. MLE and MMI are unified under the H-criterion, which is optimized by the extended BW (EBW) algorithm. A partial extended EM algorithm is developed for stage-wise optimization of new components. Experimental results on the WSJ task demonstrate that the new algorithm leads to improved model quality and recognition performance.
Modeling Lexical Tones for Mandarin Large Vocabulary Continuous Speech Recognition
, 2006