Results 1 - 8 of 8
Discounted likelihood linear regression for rapid speaker adaptation
Computer Speech & Language, 2001
Abstract

Cited by 19 (2 self)
Rapid adaptation schemes that employ the EM algorithm may suffer from overtraining problems when used with small amounts of adaptation data. An algorithm to alleviate this problem is derived within the information geometric framework of Csiszár and Tusnády, and is used to improve MLLR adaptation on NAB and Switchboard adaptation tasks. It is shown how this algorithm approximately optimizes a discounted likelihood criterion.
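The discounting idea can be illustrated with a toy one-dimensional version of mean adaptation. The single-Gaussian bias setup, the unit variances, and the discount value `tau` below are illustrative assumptions, not the paper's actual MLLR formulation:

```python
import numpy as np

def ml_bias(x, mu):
    """Maximum-likelihood bias estimate for adapting a Gaussian mean mu
    (unit covariance assumed): the sample mean of the residuals."""
    return (x - mu).mean(axis=0)

def discounted_bias(x, mu, tau=10.0):
    """Discounted estimate: the frame count is inflated by tau, shrinking
    the bias toward zero (the unadapted model) when data is scarce."""
    n = len(x)
    return (x - mu).sum(axis=0) / (n + tau)

rng = np.random.default_rng(0)
mu = np.zeros(2)
x = rng.normal(loc=[1.0, -1.0], scale=1.0, size=(3, 2))  # tiny adaptation set

print(ml_bias(x, mu))          # fits the 3 frames exactly (overtraining risk)
print(discounted_bias(x, mu))  # pulled toward the unadapted model
```

With only 3 frames and `tau = 10`, the discounted estimate is 3/13 of the ML estimate, which is the kind of conservative update that protects against overtraining on very small adaptation sets.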
Rapid Speech Recognizer Adaptation to New Speakers
1999
Abstract

Cited by 9 (2 self)
This paper summarizes the work of the "Rapid Speech Recognizer Adaptation" team in the workshop held at Johns Hopkins University in the summer of 1998. The project addressed the modeling of dependencies between units of speech with the goal of making more effective use of small amounts of data for speaker adaptation. A variety of statistical dependence models were investigated, including (i) a Gaussian multiscale process governed by a stochastic linear dynamical system on a tree, (ii) a simple hierarchical tree-structured prior, (iii) explicit correlation models and (iv) Markov Random fields. In particular, we investigated dependence models of the bias components of "cascade" transforms of the Gaussian means, which improved the accuracy of the widely used adaptation by transform (constrained re-estimation). Modeling methodologies are contrasted, and comparative performance on the Switchboard task is presented under identical test conditions for supervised and unsupervised adaptation with controlled amounts of adaptation speech.
Information Geometry and Maximum Likelihood Criteria
Princeton University, 1996
Abstract

Cited by 3 (1 self)
This paper presents a brief comparison of two information geometries as they are used to describe the EM algorithm used in maximum likelihood estimation from incomplete data. The Alternating Minimization framework based on the I-geometry developed by Csiszár is presented first, followed by the em-algorithm of Amari. Following a comparison of these algorithms, a discussion of a variation in likelihood criterion is presented. The EM algorithm is usually formulated so as to improve the marginal likelihood criterion (as described in Section 2.1). Closely related algorithms also exist which are intended to maximize different likelihood criteria. The 1-Best criterion, for example, leads to the Viterbi training algorithm used in Hidden Markov Modeling. This criterion has an information geometric description that results from a minor modification of the marginal likelihood formulation. The techniques discussed here are not given in rigorous detail, but rather at a level intended to allow comparison between them; the works cited in the bibliography should be consulted for complete and correct presentations of all methods discussed.
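As a concrete illustration of the marginal likelihood criterion that EM improves, the sketch below runs EM on a two-component 1-D Gaussian mixture and records the log-likelihood at each iteration; the alternating-projection property guarantees the sequence never decreases. The mixture setup is an assumed toy example, not taken from the paper:

```python
import numpy as np

def gauss(x, mu, var):
    """1-D Gaussian density."""
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def em_step(x, pi, mu, var):
    """One EM iteration for a 2-component 1-D Gaussian mixture;
    the E-step and M-step are the two alternating (KL-)projections."""
    # E-step: posterior responsibilities given the current model
    p = np.stack([pi[k] * gauss(x, mu[k], var[k]) for k in range(2)])
    r = p / p.sum(axis=0)
    # M-step: re-estimate parameters given the responsibilities
    n = r.sum(axis=1)
    pi = n / n.sum()
    mu = (r * x).sum(axis=1) / n
    var = (r * (x - mu[:, None]) ** 2).sum(axis=1) / n
    return pi, mu, var

def log_lik(x, pi, mu, var):
    """Marginal log-likelihood of the incomplete (observed) data."""
    return np.log(sum(pi[k] * gauss(x, mu[k], var[k]) for k in range(2))).sum()

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2, 1, 100), rng.normal(2, 1, 100)])
pi, mu, var = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])
lls = []
for _ in range(20):
    lls.append(log_lik(x, pi, mu, var))
    pi, mu, var = em_step(x, pi, mu, var)
# lls is non-decreasing: each alternating-minimization step
# cannot lower the marginal likelihood
```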
Clustering Wide-Contexts and HMM Topologies for Spontaneous Speech Recognition
2001
Abstract

Cited by 1 (0 self)
In most speech recognition systems today, all the acoustic variation associated with a phoneme is characterized in terms of the identity of its neighboring phonemes. The neighbors influence only the state observation density of a fixed Hidden Markov Model. Other sources of variation are captured implicitly by using Gaussian mixture models for the state observations. Consequently, these models can be very broad, particularly for casual spontaneous speech. In this thesis, we explore conditioning of phonemes on higher-level linguistic structure, specifically syllable- and word-level structure, to learn models for phonemes that are more specific to the context, reporting experimental results on a large-vocabulary (35k words) conversational speech task (Switchboard). In particular, this thesis makes three main contributions related to wide-context conditioning. First, we demonstrate that syllable- and word-level structure can be incorporated into current acoustic models to improve recognition accuracy over triphones. For a fixed number of parameters, these models are computationally more efficient than pentaphones, both in training and in testing. In addition, use of syllable and word features leads to a small but significant improvement in performance. The wide contexts used in our acoustic model can implicitly capture resyllabification effects to a certain extent. However, we find that explicitly modeling resyllabification does not improve recognition further, because there are only a small number of phones that exhibit acoustic differences after resyllabification. The second contribution addresses the difficulties that arise when a large number of additional conditioning features are used. As the number of conditioning features increases, the training cost can increase exponentially.
Moreover, a large fraction of the training labels tends to have too few examples to have reliable statistics associated with them, and this could potentially cause decision trees to learn bad clusters. A new method has been developed for clustering with multiple stages, where each stage clusters a different subset of features and also has a choice of using the partitions learned in the previous stages. Apart from reducing the risk of unreliable statistics, it is designed to ameliorate the data fragmentation problem and is computationally less expensive. This method was successfully demonstrated with pentaphones, resulting in equivalent performance at a lower cost. Finally, a new algorithm is described to design context-specific HMMs. The idea is to model reduction of a phone for certain contexts, and to learn a more constrained topology. Using contextual information, the algorithm clusters HMM paths where each path has a different number of states. An HMM distance measure has been formulated to prune out the paths which are similar. During decoding, the paths are allocated dynamically for each subword unit according to their context. We investigated this algorithm to model phone topologies, finding improved characterization of speech given known word sequences but no significant improvement in word error rate.
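The reliability concern in decision-tree clustering can be sketched as a minimum-count guard on a likelihood-gain split criterion: a context question is only accepted if both sides of the split retain enough frames for trustworthy statistics. The 1-D Gaussian statistics, the `min_count` threshold, and the synthetic context question below are illustrative assumptions, not the thesis's actual clustering procedure:

```python
import numpy as np

def gauss_loglik(x):
    """Log-likelihood of 1-D data under its own ML Gaussian fit."""
    n = len(x)
    var = x.var() + 1e-8  # floor to avoid log(0) on degenerate clusters
    return -0.5 * n * (np.log(2 * np.pi * var) + 1.0)

def split_gain(x, mask, min_count=50):
    """Likelihood gain from splitting x by a binary context question.
    Splits leaving too few frames on either side are rejected, which is
    the reliability guard that multi-stage clustering aims to strengthen."""
    a, b = x[mask], x[~mask]
    if len(a) < min_count or len(b) < min_count:
        return -np.inf
    return gauss_loglik(a) + gauss_loglik(b) - gauss_loglik(x)

rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(0, 1, 200), rng.normal(3, 1, 200)])
question = np.arange(len(x)) < 200  # a question that separates the two modes

print(split_gain(x, question))                 # positive gain: accepted
print(split_gain(x, np.arange(len(x)) < 10))   # too few examples: rejected
```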
Convergence of DLLR Rapid Speaker Adaptation Algorithms
ISCA Archive
Abstract
Discounted Likelihood Linear Regression (DLLR) is a speaker adaptation technique for cases where there is insufficient data for MLLR adaptation. Here, we provide an alternative derivation of DLLR by using a censored EM formulation which postulates additional adaptation data which is hidden. This derivation shows that DLLR, if allowed to converge, provides maximum likelihood solutions. Thus the robustness of DLLR to small amounts of data is obtained by slowing down the convergence of the algorithm and by allowing termination of the algorithm before overtraining occurs. We then show that discounting the observed adaptation data by postulating additional hidden data can also be extended to MAP estimation of MLLR-type adaptation transformations.
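The claimed connection between hidden-data discounting and MAP estimation can be sketched for a single bias vector: augmenting the observed frames with `tau` pseudo-frames that sit exactly at the unadapted mean yields the same estimate as a zero-mean Gaussian prior on the bias. The unit-variance setup and the value of `tau` here are illustrative assumptions, not the paper's actual transformation estimation:

```python
import numpy as np

def bias_with_hidden_data(x, mu, tau):
    """Bias estimate after augmenting x with tau postulated hidden frames
    whose residual is zero (they sit at the unadapted mean mu)."""
    n = len(x)
    return (x - mu).sum(axis=0) / (n + tau)

def bias_map(x, mu, tau):
    """MAP bias estimate under a zero-mean Gaussian prior whose precision
    is tau times the (unit) observation precision."""
    n = len(x)
    return n / (n + tau) * (x - mu).mean(axis=0)

rng = np.random.default_rng(3)
mu = np.zeros(3)
x = rng.normal(0.5, 1.0, size=(5, 3))

# the two formulations give the identical estimate
assert np.allclose(bias_with_hidden_data(x, mu, 8.0), bias_map(x, mu, 8.0))
```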
RAPID SPEECH RECOGNIZER ADAPTATION TO NEW SPEAKERS
Abstract
This paper summarizes the work of the "Rapid Speech Recognizer Adaptation" team in the workshop held at Johns Hopkins University in the summer of 1998. The project addressed the modeling of dependencies between units of speech with the goal of making more effective use of small amounts of data for speaker adaptation. A variety of methods were investigated and their effectiveness in a rapid adaptation task defined on the SWITCHBOARD conversational speech corpus is reported.