A tutorial on hidden Markov models and selected applications in speech recognition
- Proceedings of the IEEE
, 1989
Cited by 3117 (0 self)
Although initially introduced and studied in the late 1960s and early 1970s, statistical methods of Markov source or hidden Markov modeling have become increasingly popular in the last several years. There are two strong reasons why this has occurred. First the models are very rich in mathematical structure and hence can form the theoretical basis for use in a wide range of applications. Second the models, when applied properly, work very well in practice for several important applications. In this paper we attempt to carefully and methodically review the theoretical aspects of this type of statistical modeling and show how they have been applied to selected problems in machine recognition of speech.
Signal modeling techniques in speech recognition
- PROCEEDINGS OF THE IEEE
, 1993
Cited by 99 (5 self)
We have seen three important trends develop in the last five years in speech recognition. First, heterogeneous parameter sets that mix absolute spectral information with dynamic, or time-derivative, spectral information, have become common. Second, similarity transform techniques, often used to normalize and decorrelate parameters in some computationally inexpensive way, have become popular. Third, the signal parameter estimation problem has merged with the speech recognition process so that more sophisticated statistical models of the signal's spectrum can be estimated in a closed-loop manner. In this paper, we review the signal processing components of these algorithms. These algorithms are presented as part of a unified view of the signal parameterization problem in which there are three major tasks: measurement, transformation, and statistical modeling. This paper is by no means a comprehensive survey of all possible techniques of signal modeling in speech recognition. There are far too many algorithms in use today to make an exhaustive survey feasible (and cohesive). Instead, this paper is meant to serve as a tutorial on signal processing in state-of-the-art speech recognition systems and to review those techniques most commonly used. In keeping with this goal, a complete mathematical description of each algorithm has been included in the paper.
Lossy Source Coding
- IEEE Trans. Inform. Theory
, 1998
Cited by 46 (1 self)
Lossy coding of speech, high-quality audio, still images, and video is commonplace today. However, in 1948, few lossy compression systems were in service. Shannon introduced and developed the theory of source coding with a fidelity criterion, also called rate-distortion theory. For the first 25 years of its existence, rate-distortion theory had relatively little impact on the methods and systems actually used to compress real sources. Today, however, rate-distortion theoretic concepts are an important component of many lossy compression techniques and standards. We chronicle the development of rate-distortion theory and provide an overview of its influence on the practice of lossy source coding. Index Terms: Data compression, image coding, speech coding, rate distortion theory, signal coding, source coding with a fidelity criterion, video coding.
Structured Audio: Creation, Transmission, and Rendering of Parametric Sound Representations
- PROC. IEEE
, 1998
Extraction of Visual Features for Lipreading
- IEEE Transactions on Pattern Analysis and Machine Intelligence
, 2002
Cited by 36 (1 self)
The multi-modal nature of speech is often ignored in human-computer interaction, but lip deformation and other body motion, such as head and arm motion, all convey additional information. We integrate speech cues from many sources and this improves intelligibility, especially when the acoustic signal is degraded. This paper shows how this additional, often complementary, visual speech information can be used for speech recognition. Three methods for parameterising lip image sequences for recognition using hidden Markov models are compared. Two of these are top-down approaches that fit a model of the inner and outer lip contours and derive lipreading features from a principal component analysis of shape, or shape and appearance respectively. The third, bottom-up, method uses a non-linear scale-space analysis to form features directly from the pixel intensity. All methods are compared on a multi-talker visual speech recognition task of isolated letters.
SPASM: a Real-Time Vocal Tract Physical Model Editor/Controller and Singer
- Computer Music Journal
, 1992
Cited by 23 (4 self)
New Applications of the Sound Description Interchange Format
- Proc. ICMC-98, Ann Arbor
, 1998
Cited by 23 (2 self)
This paper describes the goals and design of SDIF and its standard frame types, followed by a review of recent SDIF work at CNMAT, IRCAM, and IUA.
Effects of Temporal Correction on Intelligibility of Foreign-Accented English
- Journal of Phonetics
, 1997
Cited by 11 (2 self)
This study investigates the contribution of the temporal patterning of speech to the reduced intelligibility of foreign-accented utterances. Short English phrases spoken by a native Chinese speaker were instrumentally modified, using LPC resynthesis and dynamic time warping, so as to align the duration of acoustic segments with tokens of the same phrases spoken by a native English speaker, while retaining the spectral and source characteristics of the Chinese speaker. Similarly, the native speaker's productions were distorted to match the durational patterns of the nonnative speaker. Intelligibility of these stimuli was measured based on native English listeners' performance in a forced-choice identification test with four alternatives: the correct phrase plus three phonetically similar distractor phrases suggested by listening to the Chinese productions. Intelligibility of the unmodified Chinese-accented phrases was poor (39% correct), but improved significantly (to 58%) after tempora...
The prosody of speech: Melody and rhythm
- The Handbook of Phonetic Sciences, Nr. 5 in Blackwell Handbooks in Linguistics, chap
, 1997
Cited by 7 (0 self)
The word ‘prosody’ comes from ancient Greek, where it was used for a “song sung with instrumental music”. In later times the word was used for the “science of versification” and the “laws of metre”, governing the modulation of the human voice in reading poetry aloud. In modern phonetics the word ‘prosody’ and its adjectival
A Method to Combine Acoustic and Morphological Constraints in the Speech Production Inverse Problem
, 1995
Cited by 6 (1 self)
This paper approaches the articulatory-to-acoustic speech production inverse case. A framework based on an explicit combination of vocal-tract morphological and acoustic constraints is proposed. The solution is based on a Fourier analysis of the vocal-tract log-area function: the relationship between the log-area Fourier cosine coefficients and the corresponding formants is used to formulate an acoustic constraint. The same set of coefficients is then used to express a morphological constraint. This representation of both acoustic and morphological constraints in the same parameter space allows an efficient solution for the inverse problem. The basis of the acoustic constraint formulation was first proposed by Mermelstein (1967). However, at that time, the combination with morphological constraints was not realized. The method is tested for some vowels. The results confirm the validity of the method, but they also show the need for dynamic constraints. Zusammenfassung. Diese Arbeit...

