Results 1 - 10
of
59
Connectionist Probability Estimation in HMM Speech Recognition
- IEEE Transactions on Speech and Audio Processing
, 1992
"... This report is concerned with integrating connectionist networks into a hidden Markov model (HMM) speech recognition system, This is achieved through a statistical understanding of connectionist networks as probability estimators, first elucidated by Herve Bourlard. We review the basis of HMM speech ..."
Abstract
-
Cited by 45 (9 self)
- Add to MetaCart
This report is concerned with integrating connectionist networks into a hidden Markov model (HMM) speech recognition system, This is achieved through a statistical understanding of connectionist networks as probability estimators, first elucidated by Herve Bourlard. We review the basis of HMM speech recognition, and point out the possible benefits of incorporating connectionist networks. We discuss some issues necessary to the construction of a connectionist HMM recognition system, and describe the performance of such a system, including evaluations on the DARPA database, in collaboration with Mike Cohen and Horacio Franco of SRI International. In conclusion, we show that a connectionist component improves a state of the art HMM system. ii Part I INTRODUCTION Over the past few years, connectionist models have been widely proposed as a potentially powerful approach to speech recognition (e.g. Makino et al. (1983), Huang et al. (1988) and Waibel et al. (1989)). However, whilst connec...
Realtime Personal Positioning System for Wearable Computers
"... Context awareness is an important functionality for wearable computers. In particular, the computer should know where the person is in the environment. This paper proposes an image sequence matching technique for the recognition of locations and previously visited places. As in single word recogniti ..."
Abstract
-
Cited by 40 (3 self)
- Add to MetaCart
Context awareness is an important functionality for wearable computers. In particular, the computer should know where the person is in the environment. This paper proposes an image sequence matching technique for the recognition of locations and previously visited places. As in single word recognition in speech recognition, a dynamic programming algorithm is proposed for the calculation of the similarity of different locations. The system runs on a stand alone wearable computer such as a Libretto PC. Using a training sequence a dictionary of locations is created automatically. These locations are then be recognized by the system in realtime using a hatmounted camera.
Precise N-Gram Probabilities from Stochastic Context-Free Grammars
, 1994
"... We present an algorithm for computing n-gram probabilities from stochastic context-free grammars, a procedure that can alleviate some of the standard problems associated with n-grams (estimation from sparse data, lack of linguistic structure, among others). The method operates via the computation o ..."
Abstract
-
Cited by 34 (5 self)
- Add to MetaCart
We present an algorithm for computing n-gram probabilities from stochastic context-free grammars, a procedure that can alleviate some of the standard problems associated with n-grams (estimation from sparse data, lack of linguistic structure, among others). The method operates via the computation of substring expectations, which in turn is accomplished by solving systems of linear equations derived from the grammar. The procedure is fully implemented and has proved viable and useful in practice.
Dynamic Programming Search for Continuous Speech Recognition
, 1999
"... . Initially introduced in the late 1960s and early 1970s, dynamic programming algorithms have become increasingly popular in automatic speech recognition. There are two reasons why this has occurred: First, the dynamic programming strategy can be combined with avery e#cient and practical pruning str ..."
Abstract
-
Cited by 30 (0 self)
- Add to MetaCart
. Initially introduced in the late 1960s and early 1970s, dynamic programming algorithms have become increasingly popular in automatic speech recognition. There are two reasons why this has occurred: First, the dynamic programming strategy can be combined with avery e#cient and practical pruning strategy so that very large search spaces can be handled. Second, the dynamic programming strategy has turned out to be extremely #exible in adapting to new requirements. Examples of such requirements are the lexical tree organization of the pronunciation lexicon and the generation of a word graph instead of the single best sentence. In this paper, we attempt to systematically review the use of dynamic programming search strategies for small#vocabulary and large#vocabulary continuous speech recognition. The following methods are described in detail: search using a linear lexicon, search using a lexical tree, language-model look-ahead and word graph generation. 1 Introduction Search strategie...
Speaker-Independent Continuous Speech Dictation
- SPEECH COMMUNICATION
, 1994
"... In this paper we report on progress made at LIMSI in speaker-independent large vocabulary speech dictation using newspaper-based speech corpora in English and French. The recognizer makes use of continuous density HMMs with Gaussian mixtures for acoustic modeling and n-gram statistics estimated on n ..."
Abstract
-
Cited by 26 (12 self)
- Add to MetaCart
In this paper we report on progress made at LIMSI in speaker-independent large vocabulary speech dictation using newspaper-based speech corpora in English and French. The recognizer makes use of continuous density HMMs with Gaussian mixtures for acoustic modeling and n-gram statistics estimated on newspaper texts for language modeling. Acoustic modeling uses cepstrum-based features, context-dependent phone models (intra and interword), phone duration models, and sex-dependent models. For English the ARPA Wall Street Journal-based CSR corpus is used and for French the BREF corpus containing recordings of texts from the French newspaper Le Monde is used. Experiments were carried out with both these corpora at the phone level and at the word level with vocabularies containing up to 20,000 words. Word recognition experiments are also described for the ARPA RM task which has been widely used to evaluate and compare systems.
A Spoken Language System For Information Retrieval
- In Proceedings of ICLSP'94
"... Spoken language systems aim to provide a natural interface between humans and computers by using simple and natural dialogues to enable the user to access stored information. The LIMSI spoken language work is being pursued in several task domains. In this paper we present a system for vocal access t ..."
Abstract
-
Cited by 24 (11 self)
- Add to MetaCart
Spoken language systems aim to provide a natural interface between humans and computers by using simple and natural dialogues to enable the user to access stored information. The LIMSI spoken language work is being pursued in several task domains. In this paper we present a system for vocal access to database for a French version of the Air Travel Information Services (ATIS) task. The ATIS task is a designated common task for data collection and evaluation within the ARPA Speech and Natural Language program. A complete spoken language system including a speech recognizer, a natural language component, a database query generator and a natural language response generator is described. The speaker independent continuous speech recognizer makes use of task-independent acoustic models trained on the BREF corpus and a task-specific languagemodel. A case-frame approach is used for the natural language component. This component determines the meaning of the query and builds an appropriate sema...
Hybrid HMM/ANN Systems for Speech Recognition: Overview and New Research Directions
- in Adaptive Processing of Sequences and Data Structures, ser. Lecture Notes in Artificial Intelligence (1387
, 1998
"... ..."
Multimodal Interfaces
- Artificial Intelligence Review Journal, special issue
, 1994
"... In this paper, we present an overview of research in our laboratories on Multimodal Human Computer Interfaces. The goal for such interfaces is to free human computer interaction from the limitations and acceptance barriers due to rigid operating commands and keyboards as only/main I/O-device. Instea ..."
Abstract
-
Cited by 23 (3 self)
- Add to MetaCart
In this paper, we present an overview of research in our laboratories on Multimodal Human Computer Interfaces. The goal for such interfaces is to free human computer interaction from the limitations and acceptance barriers due to rigid operating commands and keyboards as only/main I/O-device. Instead we move to involve all available human communication modalities. These human modalities include Speech, Gesture and Pointing,
Decoder Technology For Connectionist Large Vocabulary Speech Recognition
, 1995
"... The search problem in large vocabulary continuous speech recognition (LVCSR) is to locate the most probable string of words for a spoken utterance given the acoustic signal and a set of sentence models. Searching the space of possible utterances is difficult because of the large vocabulary size and ..."
Abstract
-
Cited by 23 (3 self)
- Add to MetaCart
The search problem in large vocabulary continuous speech recognition (LVCSR) is to locate the most probable string of words for a spoken utterance given the acoustic signal and a set of sentence models. Searching the space of possible utterances is difficult because of the large vocabulary size and the complexity imposed when long-span language models are used. This report describes an efficient search procedure and its software embodiment in a decoder, NOWAY, which has been incorporated in ABBOT, a hybrid connectionist/ hidden Markov model (HMM) LVCSR system [15]. The search algorithm is based on stack decoding and uses both likelihood- and posterior-based pruning. The use of the posterior-based phone deactivation pruning techniques is well-suited to hybrid connectionist/HMM systems because posterior phone probabilities are directly computed by the connectionist acoustic model. The single-pass decoder has been evaluate on the large vocabulary North American Business News task using a...

