Production Models As A Structural Basis For Automatic Speech Recognition
1996
Cited by 21 (1 self)
We postulate in this paper that highly structured speech production models will have much to contribute to the ultimate success of speech recognition in view of the weaknesses of the theoretical foundation underpinning current technology. These weaknesses are analyzed in terms of phonological modeling and of phonetic-interface modeling. We conclude by suggesting that many of the advantages to be gained from interaction between speech production and speech recognition communities will develop from integrating models from the production community with the probabilistic analysis-by-synthesis strategy currently used by the technology community. RÉSUMÉ (translated from French): In this article, we propose that speech production models will contribute greatly to the eventual success of automatic recognition models, which are currently limited by the weaknesses of the theoretical basis of present technology. We analyze these weaknesses at the level of phonological models and ...
A Multichannel Articulatory Database and its Application for Automatic Speech Recognition
In Proceedings 5th Seminar of Speech Production, 2000
Cited by 21 (1 self)
The goal of this research is to improve the performance of a speaker-independent Automatic Speech Recognition (ASR) system by using directly measured articulatory parameters in the training phase. This paper examines the need for a multi-channel/multi-speaker articulatory database and describes the design of such a database and the processes involved in its creation.
Deep Belief Networks for phone recognition
Cited by 17 (9 self)
Hidden Markov Models (HMMs) have been the state-of-the-art techniques for acoustic modeling despite their unrealistic independence assumptions and the very limited representational capacity of their hidden states. There are many proposals in the research community for deeper models that are capable of modeling the many types of variability present in the speech generation process. Deep Belief Networks (DBNs) have recently proved to be very effective for a variety of machine learning problems and this paper applies DBNs to acoustic modeling. On the standard TIMIT corpus, DBNs consistently outperform other techniques and the best DBN achieves a phone error rate (PER) of 23.0% on the TIMIT core test set.
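For readers unfamiliar with the metric quoted above: phone error rate is conventionally computed as the edit (Levenshtein) distance between the recognized and the reference phone sequences, divided by the reference length. The sketch below illustrates that convention only. It is not the paper's scoring code, and the function name and the example phone strings are hypothetical.

```python
def phone_error_rate(reference, hypothesis):
    """Edit distance (substitutions + deletions + insertions) / len(reference)."""
    if not reference:
        raise ValueError("reference must be non-empty")
    # prev[j] is the edit distance between the reference prefix processed
    # so far and the first j phones of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_phone in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_phone in enumerate(hypothesis, start=1):
            cost = 0 if ref_phone == hyp_phone else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(reference)


# Hypothetical example: one substitution (ae -> eh) and one deleted "sil".
ref = ["sil", "dh", "ah", "k", "ae", "t", "sil"]
hyp = ["sil", "dh", "ah", "k", "eh", "t"]
per = phone_error_rate(ref, hyp)
print(f"PER = {100 * per:.1f}%")
```

Because insertions count as errors, a PER computed this way can exceed 100% when the hypothesis is much longer than the reference.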
A Syllable, Articulatory-Feature, and Stress-Accent Model of Speech Recognition
2002
Cited by 11 (4 self)
Current-generation automatic speech recognition (ASR) systems assume that words are readily decomposable into constituent phonetic components ("phonemes"). A detailed linguistic dissection of state-of-the-art speech recognition systems indicates that the conventional phonemic "beads-on-a-string" approach is of limited utility, particularly with respect to informal, conversational material. The study shows that there is a significant gap between the observed data and the pronunciation models of current ASR systems. It also shows that many important factors affecting recognition performance are not modeled explicitly in these systems.
Phonetic Classification And Recognition Using HMM Representation Of Overlapping Articulatory Features For All Classes Of English Sounds
Proceedings ICASSP-94, 1994
Cited by 10 (0 self)
Our recent efforts in developing a feature-based general statistical framework intended for unlimited-vocabulary speech recognition are reported. The design of the feature-based atomic units of speech is aimed at a parsimonious scheme to share the inter-word and inter-phone speech data. Our basic design philosophy has been motivated by the theory of distinctive features and by a new form of phonology which argues for use of multi-dimensional articulatory structures [1]. The work reported here is a significant extension of our earlier studies [4, 5] in three aspects. First, a comprehensive set of features is developed, enabling the recognizer to operate on all classes of English sounds. Second, a more efficient strategy is devised for feature-based lexical representation. Third, more extensive evaluation results, including both the phonetic classification and phonetic recognition results, are reported. 1. INTRODUCTION As for the design of any speech recognizer, two major issues are of conc...
Techniques for modelling Phonological Processes in Automatic Speech Recognition
2001
Cited by 6 (0 self)
Declaration: This dissertation is the result of my own work and includes nothing which is the outcome of work done in collaboration, except where stated. It has not been submitted in whole or part for a degree at any other university. The length of this thesis including footnotes and appendices does not exceed 29,500 words and includes no more than 40 figures. Systems which automatically transcribe carefully dictated speech are now commercially available, but their performance degrades dramatically when the speaking style of users becomes more relaxed or conversational. This dissertation focuses on techniques that aim to improve the robustness of statistical speech transcription systems to conversational speaking styles. The dissertation shows first that the performance degradation occurring as speech becomes more conversational is severe and is partially attributable to differences in the acoustic realizations of sentences. Hypothesizing that the quantifiably wider range of
A Gaussian Mixture Model Spectral Representation for Speech Recognition
Cited by 1 (0 self)
Summary: Most modern speech recognition systems use either Mel-frequency cepstral coefficients or perceptual linear prediction as acoustic features. Recently, there has been some interest in alternative speech parameterisations based on using formant features. Formants are the resonant frequencies in the vocal tract which form the characteristic shape of the speech spectrum. However, formants are difficult to reliably and robustly estimate from the speech signal and in some cases may not be clearly present. Rather than estimating the resonant frequencies, formant-like features can be used instead. Formant-like features use the characteristics of the spectral peaks to represent the spectrum. In this work, novel features are developed based on estimating a Gaussian mixture model (GMM) from the speech spectrum. This approach has previously been used successfully as a speech codec. The EM algorithm is used to estimate the parameters of the GMM. The extracted parameters (the means, standard deviations and component weights) can be related to the formant locations, bandwidths and magnitudes. As the features directly represent the linear spectrum, it is possible to apply techniques for vocal tract length normalisation and additive noise
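The idea in this abstract can be illustrated by treating a normalised magnitude spectrum as a probability mass over frequency bins and fitting a one-dimensional GMM to it with weighted EM. The following is a minimal sketch under stated assumptions, not the thesis's implementation: the function name, the quantile-based initialisation, the fixed `init_sigma`, and the synthetic two-peak spectrum are all hypothetical choices made for illustration.

```python
import math


def fit_spectrum_gmm(freqs, magnitudes, n_components, n_iter=50,
                     init_sigma=100.0, min_sigma=1.0):
    """Fit a 1-D GMM to a magnitude spectrum by weighted EM.

    Returns a list of (weight, mean_hz, sigma_hz) tuples sorted by mean.
    """
    total = sum(magnitudes)
    p = [m / total for m in magnitudes]  # spectrum as a probability mass

    # Assumed initialisation: means at evenly spaced quantiles of spectral mass.
    targets = [(m + 0.5) / n_components for m in range(n_components)]
    means, cum, t = [], 0.0, 0
    for f, pk in zip(freqs, p):
        cum += pk
        while t < n_components and cum >= targets[t]:
            means.append(f)
            t += 1
    while len(means) < n_components:  # guard against rounding in the cumsum
        means.append(freqs[-1])
    sigmas = [init_sigma] * n_components
    weights = [1.0 / n_components] * n_components

    for _ in range(n_iter):
        sum_r = [0.0] * n_components
        sum_rf = [0.0] * n_components
        sum_rff = [0.0] * n_components
        # E-step: responsibilities per bin, computed in the log domain.
        for f, pk in zip(freqs, p):
            logs = [math.log(w) - math.log(s) - 0.5 * ((f - mu) / s) ** 2
                    for w, mu, s in zip(weights, means, sigmas)]
            top = max(logs)
            dens = [math.exp(v - top) for v in logs]
            norm = sum(dens)
            for m in range(n_components):
                r = pk * dens[m] / norm  # responsibility weighted by bin mass
                sum_r[m] += r
                sum_rf[m] += r * f
                sum_rff[m] += r * f * f
        # M-step: re-estimate weights, means and standard deviations.
        for m in range(n_components):
            mass = max(sum_r[m], 1e-12)
            weights[m] = mass
            means[m] = sum_rf[m] / mass
            var = sum_rff[m] / mass - means[m] ** 2
            sigmas[m] = math.sqrt(max(var, min_sigma ** 2))

    return sorted(zip(weights, means, sigmas), key=lambda c: c[1])


# Synthetic spectrum with two formant-like peaks (hypothetical values).
freqs = [10.0 * k for k in range(401)]  # 0 to 4000 Hz in 10 Hz bins
spectrum = [math.exp(-0.5 * ((f - 500.0) / 80.0) ** 2)
            + math.exp(-0.5 * ((f - 1500.0) / 120.0) ** 2) for f in freqs]
components = fit_spectrum_gmm(freqs, spectrum, n_components=2)
for weight, mean_hz, sigma_hz in components:
    print(f"weight={weight:.2f} mean={mean_hz:.0f} Hz sigma={sigma_hz:.0f} Hz")
```

In this reading, each component mean plays the role of a peak location, the standard deviation a bandwidth, and the weight the share of spectral mass under the peak. Real speech spectra would need a more careful initialisation than the one assumed here.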
Pseudo-Articulatory Representations In Speech Synthesis And Recognition
Pseudo-Articulatory Representations are increasingly being used in work on speech synthesis and recognition. The value of such representations lies in their derivation from linguistic abstractions -- they are based on articulatory idealizations used by linguists to describe speech. Iles [4] has demonstrated that using these representations it is possible to overcome the many-to-one problem in mapping articulatory configuration to acoustic signal. In this paper we show how the representations facilitate the details of speech processing, for both synthesis and recognition, and we give details of work in progress on recognition. The role of Pseudo-Articulatory Representations in the development of an integrated approach to synthesis and recognition is also discussed.
Maximum likelihood in statistical estimation of dynamic systems: Decomposition algorithm and simulation results
Signal Processing
1996
In this paper, we describe an efficient decomposition algorithm for parameter estimation of linear dynamical systems with the state-space formulation which contain a "drive" term as a free, unknown system parameter. The dynamical system can be viewed as a natural extension from the discrete-state hidden Markov model to its continuous-state counterpart. The focus of this paper is on unified techniques for efficient estimation of the parameters of such a model. The Expectation-Maximization (EM) algorithm is developed, in conjunction with the conventional Kalman smoothing estimators, for estimating the system parameters by maximum likelihood. The algorithm developed is applicable to either stationary or non-stationary versions of the dynamic system. In particular, a decomposition technique is described in detail and is shown to reduce effectively the computational load in parameter estimation for high-dimensional systems. Simulation results are presented which demonstrate the accuracy of the proposed parameter estimation technique. © 1997 Elsevier Science B.V.
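To make the model class concrete, the sketch below runs a scalar Kalman filter on a state-space model with a constant additive drive term, assumed here to take the form x[t] = a*x[t-1] + u + w[t], y[t] = c*x[t] + v[t]. This exact form, the function name, and all numeric values are assumptions for illustration. Only the forward filtering pass is shown, with the drive term treated as known. The paper's EM estimation of the parameters, its Kalman smoothing step, and its decomposition technique are not reproduced.

```python
def kalman_filter_with_drive(observations, a, u, c, q, r, m0, p0):
    """Scalar Kalman filter for x[t] = a*x[t-1] + u + w, y[t] = c*x[t] + v.

    q and r are the process and observation noise variances;
    m0 and p0 are the prior mean and variance of the initial state.
    Returns the filtered state means, one per observation.
    """
    mean, var = m0, p0
    estimates = []
    for y in observations:
        # Predict: propagate through the dynamics, including the drive term.
        mean_pred = a * mean + u
        var_pred = a * a * var + q
        # Update: correct the prediction with the new observation.
        gain = var_pred * c / (c * c * var_pred + r)
        mean = mean_pred + gain * (y - c * mean_pred)
        var = (1.0 - gain * c) * var_pred
        estimates.append(mean)
    return estimates


# Noise-free simulation (hypothetical values): the state approaches u / (1 - a).
a, u, c = 0.9, 1.0, 1.0
true_states, x = [], 0.0
for _ in range(200):
    x = a * x + u
    true_states.append(x)
observations = [c * s for s in true_states]

# The filter starts from a deliberately wrong prior mean (5.0 instead of 0.0).
estimates = kalman_filter_with_drive(observations, a, u, c,
                                     q=0.01, r=0.1, m0=5.0, p0=1.0)
print(f"final estimate {estimates[-1]:.4f}, true state {true_states[-1]:.4f}")
```

With a stable `a`, the drive term sets the level the state settles at, which is why treating it as a free parameter matters for fitting.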

