Results 1 - 6 of 6
The RWTH 2010 Quaero ASR evaluation system for English, French, and German
- in Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, 2011
Cited by 6 (6 self)
Recognizing Broadcast Conversational (BC) speech data is a difficult task, which can be regarded as one of the major challenges beyond the recognition of Broadcast News (BN). This paper presents the automatic speech recognition systems developed by RWTH for the English, French, and German languages, which attained the best word error rates for English and German, and competitive results for the French task, in the 2010 Quaero evaluation for BC and BN data. At the same time, the RWTH German system used the least amount of training data among all participants. Large reductions in word error rate were obtained by incorporating the new Bottleneck Multilayer Perceptron (MLP) features for all three languages. Additional improvements were obtained for the German system by applying a new language modeling technique that decomposes words into sublexical components.
Index Terms: automatic speech recognition, multilayer perceptrons
Speech Recognition for Machine Translation in Quaero, 2011
Cited by 4 (1 self)
This paper describes the speech-to-text systems that provided the automatic transcriptions used in the Quaero 2010 evaluation of Machine Translation from speech. Quaero (www.quaero.org) is a large research and industrial innovation program focusing on technologies for automatic analysis and classification of multimedia and multilingual documents. The ASR transcript is the result of a ROVER combination of systems from three teams (KIT, RWTH, LIMSI+VR) for the French and German languages. The case-sensitive word error rates (WER) of the combined systems were 20.8 % and 18.1 %, respectively, on the 2010 evaluation data, corresponding to relative WER reductions of 14.6 % and 17.4 % over the best component system.
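The relative reductions quoted in this abstract can be related to the absolute WERs with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and the best-component WERs it derives (roughly 24.4 % for French and 21.9 % for German) are implied by the rounded figures in the abstract, not values reported in it.

```python
def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction, expressed as a fraction of the baseline WER."""
    return (baseline_wer - new_wer) / baseline_wer


def implied_baseline(new_wer: float, rel_reduction: float) -> float:
    """Invert the formula above to recover the baseline WER.

    new_wer = baseline * (1 - rel_reduction)  =>  baseline = new_wer / (1 - rel_reduction)
    """
    return new_wer / (1.0 - rel_reduction)


# Figures taken from the abstract: combined-system WER (%) and relative reduction.
reported = {"French": (20.8, 0.146), "German": (18.1, 0.174)}

# Derived values only; subject to rounding in the published figures.
implied = {
    lang: implied_baseline(wer, rel) for lang, (wer, rel) in reported.items()
}

if __name__ == "__main__":
    for lang, base in implied.items():
        wer, rel = reported[lang]
        print(f"{lang}: combined {wer:.1f} %, implied best component {base:.1f} %, "
              f"absolute gain {base - wer:.1f} points")
```

The calculation shows why a relative reduction of about 15 % at these error levels corresponds to an absolute gain of only a few WER points.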
AUTOMATIC STATE DISCOVERY FOR UNSTRUCTURED AUDIO SCENE CLASSIFICATION
Cited by 1 (1 self)
In this paper we present a novel scheme for unstructured audio scene classification that possesses three highly desirable and powerful features: autonomy, scalability, and robustness. Our scheme is based on our recently introduced machine learning algorithm called Simultaneous Temporal And Contextual Splitting (STACS), which discovers the appropriate number of states and efficiently learns accurate Hidden Markov Model (HMM) parameters for the given data. STACS-based algorithms train HMMs up to five times faster than Baum-Welch, avoid the overfitting problem commonly encountered when learning large state-space HMMs with Expectation Maximization (EM) methods such as Baum-Welch, and achieve superior classification results on a very diverse dataset with minimal pre-processing. Furthermore, our scheme has proven to be highly effective for building real-world applications and has been integrated into a commercial surveillance system as an event detection component.
Index Terms: Hidden Markov Models, audio classification, topology learning
A semi-Markov model for speech segmentation with an utterance-break prior
- in Proc. Interspeech, 2014
Cited by 1 (1 self)
Speech segmentation is the problem of finding the end points of a speech utterance for passing to an automatic speech recognition (ASR) system. The quality of this segmentation can have a large impact on the accuracy of the ASR system; in this paper we demonstrate that it can have an even larger impact on downstream natural language processing tasks, in this case machine translation. We develop a novel semi-Markov model that segments audio streams into speech utterances optimised for the desired distribution of sentence lengths in the target domain. We compare this with existing state-of-the-art methods and show that it not only improves ASR performance but also yields significant benefits for a speech translation task.
Learning Latent Variable and Predictive Models of Dynamical Systems, 2009
Despite the single author listed on the cover, this dissertation is not the product of one person alone. I would like to acknowledge many, many people who influenced me, my life and my work. They have all aided this research in different ways over the years and helped it come to a successful conclusion. Geoff Gordon, my advisor, has taught me a lot over the years: how to think methodically and analyze a problem, how to formulate problems mathematically, and how to choose interesting problems. From the outset, he has helped me develop the ideas that went into the thesis. Andrew Moore, my first advisor, got me started in machine learning and data mining and helped make this field fun and accessible to me, and his guidance and mentoring were crucial for work done early in my Ph.D. Both Geoff and Andrew are the very best kind of advisor I could have asked for: really smart, knowledgeable, caring and hands-on. They showed me how to be a good researcher while staying relaxed, calm and happy. Though I wasn’t always able to strike that balance, the example they set was essential for me to be able to make it through without burning out in the process. All the members of the AUTON lab deserve much thanks, especially Artur Dubrawski
The RWTH Aachen German and English LVCSR systems for IWSLT-2013
In this paper, German and English large vocabulary continuous speech recognition (LVCSR) systems developed by RWTH Aachen University for the IWSLT-2013 evaluation campaign are presented. Good improvements are obtained with state-of-the-art monolingual and multilingual bottleneck features. In addition, an open-vocabulary approach using morphemic sub-lexical units is investigated, along with language model adaptation, for the German LVCSR system. For both languages, competitive WERs are achieved using system combination.