Acoustical and Environmental Robustness in Automatic Speech Recognition
, 1990
Cited by 145 (8 self)
This dissertation describes a number of algorithms developed to increase the robustness of automatic speech recognition systems with respect to changes in the environment. These algorithms attempt to improve the recognition accuracy of speech recognition systems when they are trained and tested in different acoustical environments, and when a desk-top microphone (rather than a close-talking microphone) is used for speech input. Without such processing, mismatches between training and testing conditions produce an unacceptable degradation in recognition accuracy. Two kinds of
Conversational Interfaces: Advances and Challenges
, 2000
Cited by 61 (4 self)
The last decade has witnessed the emergence of a new breed of human computer interfaces that combines several human language technologies to enable information access and transactional processing using spoken dialogue. In this paper, I discuss my view on the research issues involved in the development of such interfaces, describe the recent work done in this area at the MIT Laboratory for Computer Science, and outline some of the unmet research challenges, including the need to work in real domains, spoken language generation, and portability across domains and languages.
Environmental Adaptation for Robust Speech Recognition
, 1994
Cited by 17 (0 self)
Acknowledgments
Chapter 1 Introduction
1.1. Approaches to Overcoming Environmental Variability
1.1.1. Re-Training
1.1.2. Multi-Style Training
1.1.3. Environmental Compensation Using Dynamic Adaptation
1.2. Towards Environment-Independent Recognition
1.2.1. Sources of Environmental Variability
1.2.2. Performance Evaluation
1.3. Dissertation Outline
Chapter 2 Overview of Environmental Robustness in Speech Recognition
2.1. Sources of Degradation...
Acoustic And Language Modeling Of Human And Nonhuman Noises For Human-To-Human Spontaneous Speech Recognition
, 1995
Cited by 12 (3 self)
In this paper several improvements of our speech-to-speech translation system JANUS on spontaneous human-to-human dialogs are presented. Common phenomena in spontaneous speech are described, followed by a classification of different types of noises. To handle the variety of spontaneous effects in human-to-human dialogs, special noise models are introduced representing both human and nonhuman noises, as well as word fragments. It will be shown that both the acoustic and the language modeling of these noises increase the recognition performance significantly. In the experiments, a clustering of the noise classes is performed and the resulting cluster variants are compared, which makes it possible to determine the best tradeoff between sensitivity and trainability of the models.
1. INTRODUCTION
Recently, a large number of studies have been devoted to the task of recognizing and understanding spontaneous speech. Compared to read speech, some specific problems exist when spontaneous speech is to be ...
FST-based recognition techniques for multi-lingual and multi-domain spontaneous speech
- Proceedings of the European Conference on Speech Communication and Technology
, 2001
Cited by 8 (1 self)
In this paper we present techniques for building multi-domain and multi-lingual recognizers within a finite-state transducer (FST) framework. The flexibility of the FST approach is also demonstrated on the task of incorporating networks modeling different types of non-speech events into an existing word lattice network. The ability to create robust multi-domain and/or multi-lingual recognizers for spontaneous speech will enable a conversational system to switch seamlessly and automatically among different domains and/or languages. Preliminary results using a bi-domain recognizer exhibit only small recognition accuracy degradation in comparison to domain-dependent recognition. Similarly promising results were observed using a bilingual recognizer which performs simultaneous language identification and recognition. When using the FST techniques to add non-speech models to the recognizer, experiments show a 10% reduction in word error rate across all utterances and a 30% reduction on utterances containing non-speech events.
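As a reading aid for the word error rate figures quoted in this abstract, here is a minimal sketch of how word error rate and a relative reduction are typically computed. It assumes the reported percentages are relative reductions, which the abstract does not state; the function names and the example sentences are hypothetical and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]  # deleting all i reference words
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline error rate that was removed."""
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical example: one substitution and one deletion in five words.
wer = word_error_rate("show me flights to boston", "show me flight to")
# Hypothetical rates: going from 20% to 18% WER is a 10% relative reduction.
gain = relative_reduction(0.20, 0.18)
```

Under this reading, a 10% reduction means the new error rate is 0.9 times the baseline rate, not that it dropped by ten percentage points.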
A Robust Loose Coupling for Speech Recognition and Natural Language Understanding
- IEEE, Bob O'Hara and Al
, 1995
Cited by 4 (0 self)
The focus of this thesis proposal is to improve the ability of a computational system to understand spoken utterances in a dialogue with a human. Available computational methods for word recognition do not perform as well on spontaneous speech as we would hope. Even a state-of-the-art recognizer achieves slightly worse than 70% word accuracy on (nearly) spontaneous speech in a conversation about a specific problem. To address this problem, I will explore novel methods for post-processing the output of a speech recognizer in order to correct errors. I adopt statistical techniques for modeling the noisy channel from the speaker to the listener in order to correct some of the errors introduced there. The statistical model accounts for frequent errors such as simple word/word confusions and short phrasal problems (one-to-many word substitutions and many-to-one word concatenations). To use the model, a search algorithm is required to find the most likely correction of a given word sequence ...
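The noisy-channel idea in this abstract can be illustrated with a small sketch. It is not the author's model: it handles only one-to-one word confusions (not the one-to-many or many-to-one cases mentioned above), and the probability tables, the `FLOOR` smoothing constant, and the `correct` function are all hypothetical. The search picks the intended word sequence w that maximizes P(w) * P(recognized | w) using a bigram language model and Viterbi decoding.

```python
import math

# Hypothetical channel model: P(recognized word | intended word).
CHANNEL = {
    "right": {"right": 0.7, "write": 0.3},
    "write": {"write": 0.8, "right": 0.2},
    "a": {"a": 0.9, "the": 0.1},
    "the": {"the": 0.9, "a": 0.1},
    "letter": {"letter": 1.0},
}

# Hypothetical bigram language model: P(word | previous word); "<s>" marks the start.
BIGRAM = {
    ("<s>", "write"): 0.6, ("<s>", "right"): 0.4,
    ("write", "a"): 0.5, ("write", "the"): 0.5,
    ("right", "a"): 0.1, ("right", "the"): 0.1,
    ("a", "letter"): 0.5, ("the", "letter"): 0.5,
}
FLOOR = 1e-6  # probability assigned to unseen bigrams


def correct(recognized):
    """Viterbi search for the most likely intended word sequence."""
    # best[w] = (log probability, path) of the best hypothesis ending in word w.
    best = {"<s>": (0.0, [])}
    for r in recognized:
        nxt = {}
        for w, emit in CHANNEL.items():
            p_emit = emit.get(r, 0.0)
            if p_emit == 0.0:
                continue  # intended word w is never recognized as r
            for prev, (score, path) in best.items():
                s = score + math.log(BIGRAM.get((prev, w), FLOOR)) + math.log(p_emit)
                if w not in nxt or s > nxt[w][0]:
                    nxt[w] = (s, path + [w])
        if not nxt:
            # Word unknown to the channel model: pass it through unchanged.
            score, path = max(best.values(), key=lambda sp: sp[0])
            nxt = {r: (score, path + [r])}
        best = nxt
    return max(best.values(), key=lambda sp: sp[0])[1]


corrected = correct(["right", "a", "letter"])
```

With these toy tables the language model outweighs the channel model's preference for leaving "right" unchanged, so the recognized sequence is corrected to "write a letter".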
Recognition Confidence Measures: Detection of Misrecognitions and Out-of-Vocabulary Words
, 1994
Cited by 3 (0 self)
This paper describes and evaluates a new technique for measuring confidence in word strings produced by speech recognition systems. It detects misrecognized and out-of-vocabulary words in spontaneous spoken dialogs. The system uses multiple, diverse knowledge sources including acoustics, semantics, pragmatics and discourse to determine if a word string is misrecognized. When likely misrecognitions are detected, a series of tests distinguishes out-of-vocabulary words from other error sources. The work is part of a larger effort to automatically recognize and understand new words when spoken in a spontaneous spoken dialog.
LIESHOU: A Mandarin Conversational Task Agent for the Galaxy-II Architecture
, 2003
Multilinguality is an important component of spoken dialogue systems, both because it makes the systems available to a wider audience and because it leads to a more flexible system dialogue strategy. This thesis concerns the development of a Chinese language capability for the ORION system, which is one of many spoken dialogue systems available within the GALAXY-II architecture. This new system, which we call LIESHOU, interacts with Mandarin-speaking users and performs off-line tasks, initiating later contact with a user at a pre-negotiated time. The development and design of LIESHOU closely followed the design of similar multilingual GALAXY-II domains, such as MUXING (Chinese JUPITER), and PHRASEBOOK (Translation Guide for Foreign Travelers). The successful deployment of LIESHOU required the design and implementation of four main components - speech recognition, natural language understanding, language generation, and speech synthesis. These four components were implemented using the SUMMIT speech recognition system, TINA Natural Language understanding system, GENESIS-II language generation system, and ENVOICE speech synthesis system respectively. The development of the necessary resources for each of these components is described in detail, and a system evaluation is given for the final implementation.
Thesis Supervisor: Stephanie Seneff
Title: Principal Research Scientist
Acknowledgments
I would like to extend my thanks and gratitude to my advisor, Stephanie Seneff, who I have had the honor of working with for the past four years. I am so proud to have been part of Orion and Lieshou since their infancy, and it has been such a great experience working and learning from her. This thesis would not have been possible without her invaluable help. I would also like to thank e...
LINGUINI -- Acquiring Individual Interest Profiles by Means of Adaptive Natural Language Dialog
, 2006
User information is needed by adaptive systems in order to tailor information and product offers to the needs and preferences of individual users. Personalized Recommender Systems are adaptive systems that automatically generate recommendations on the basis of individual user profiles. Most existing Recommender Systems, however, are based on rather simple and mainly standardized profile information, which often delimits the adequacy of the recommendations they generate for an individual user. More adequate recommendations could be generated on the basis of more individual and representative user profiles that also integrate complex information, for example about personal interests or lifestyle. Furthermore, most adaptive systems acquire profile information only for their own purposes and do not allow for an exchange of this information with other applications the user wants to use. Above all, existing explicit profiling methods suffer from severe drawbacks which limit their utilizability in practice.

