M. Bacchiani. Using maximum likelihood linear regression for segment clustering and speaker identification. Proceedings of ICSLP 2000.
....of component $j$ at time $t$ given the complete observation sequence. The gradient of (4) with respect to $A$ has the expression

$$T A^{-T} - \sum_{t=1}^{T} \sum_{j=1}^{N} \left[ \gamma_t(j)\, \Sigma_j^{-1} A X_t X_t^T - \gamma_t(j)\, \Sigma_j^{-1} \mu_j X_t^T \right] \qquad (5)$$

Following the terminology from [2], we define the sufficient statistics for feature space MLLR by:

$$K = \sum_{t=1}^{T} \sum_{j=1}^{N} \gamma_t(j)\, \Sigma_j^{-1} \mu_j X_t^T$$

and

$$G_i = \sum_{t=1}^{T} \sum_{j=1}^{N} \frac{\gamma_t(j)}{\sigma_{ji}^2}\, X_t X_t^T, \qquad i = 1, \ldots, n$$

where $\Sigma_j = \mathrm{diag}(\sigma_{j1}^2, \ldots, \sigma_{jn}^2)$. By rewriting (5) in ....
M. Bacchiani. Using maximum likelihood linear regression for segment clustering and speaker identification. Proceedings of ICSLP 2000, Beijing, 2000.
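The sufficient statistics K and G_i named in the excerpt above can be accumulated in a few lines of NumPy. The following is only a minimal sketch under stated assumptions, not the cited paper's implementation: the function name `fmllr_statistics` and its argument layout are hypothetical, the covariances are taken to be diagonal as in the excerpt, the symbol that extraction dropped before the second subscript j is assumed to be the Gaussian mean mu_j, and X_t is treated as a plain d-dimensional vector (the excerpt does not say whether it is extended with a bias term).

```python
import numpy as np


def fmllr_statistics(X, gamma, means, variances):
    """Accumulate K and G_i for feature space MLLR (illustrative sketch).

    X         : (T, d) observation vectors X_t
    gamma     : (T, N) posteriors gamma_t(j) of component j at time t
    means     : (N, n) component means mu_j (assumed symbol)
    variances : (N, n) diagonal entries sigma_{ji}^2 of Sigma_j
    Returns K with shape (n, d) and G with shape (n, d, d), where G[i] is G_i.
    """
    X = np.asarray(X, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)

    # Sigma_j^{-1} mu_j for diagonal covariances, one row per component.
    weighted_means = means / variances                      # (N, n)

    # K = sum_t sum_j gamma_t(j) Sigma_j^{-1} mu_j X_t^T
    K = (gamma @ weighted_means).T @ X                      # (n, d)

    # Per-frame weight sum_j gamma_t(j) / sigma_{ji}^2 for each dimension i.
    inv_var_weights = gamma @ (1.0 / variances)             # (T, n)

    # G_i = sum_t (sum_j gamma_t(j) / sigma_{ji}^2) X_t X_t^T
    G = np.einsum("ti,tp,tq->ipq", inv_var_weights, X, X)   # (n, d, d)
    return K, G
```

Summing over components before forming the outer products keeps the cost at one outer product per frame and dimension rather than one per frame, dimension, and component.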
....[4] to obtain progressively more accurate acoustic models and uses these in a rescoring framework. In contrast to Switchboard, voicemail messages are generally too short to allow direct application of the normalization techniques. A novel message clustering algorithm based on MLLR likelihood [1] is used to guarantee sufficient data for normalization. The final transcripts, obtained after 6 recognition passes, have a word error rate of 28.7%, a 6.2% accuracy improvement. Gender dependency provides 1.6% of this gain. VTLN then additively improves accuracy by 1.0% when applied only on ....
M. Bacchiani. Using maximum likelihood linear regression for segment clustering and speaker identification. In Proceedings of the Sixth International Conference on Spoken Language Processing, volume 4, pages 536--539, Beijing, 2000.
....of a clustering configuration remains unclear even given the supervisory information. Therefore, in the experiments using normalization in training, unsupervised clustering was applied to the training data as well. Two clustering approaches were investigated; they are compared in more detail in [7]. The first used Text Independent Gaussian Mixture Models (TIGMMs) to represent messages and used an agglomerative clustering approach with a likelihood-based distance metric. The models were estimated on the speech frames of the messages only. To distinguish speech from silence and noises, the ....
....$m$ denotes the $m$-th mean of the mixture model of message $i$, $n$ denotes the $n$-th mean of the mixture model of message $j$, $M$ denotes the mixture model of message $i$ and $N$ denotes the mixture model of message $j$. The second clustering approach was the MLLR-based algorithm described in detail in [7]. This algorithm uses the MLLR adaptation statistics to directly optimize the MLLR adaptation likelihood of the cluster data. In contrast to the TIGMM approach, this clustering approach is consistent as it optimizes the same objective used in MLLR adaptation. In addition, this approach is ....
M. Bacchiani, "Using Maximum Likelihood Linear Regression for Segment Clustering and Speaker Identification," In Proceedings of the International Conference on Spoken Language Processing, Vol. 4, pp. 536-539, 2000.
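The agglomerative clustering mentioned in the excerpts above can be illustrated with a generic bottom-up merging loop. This is a hedged sketch only: the excerpts do not give the distance formula, the merge rule, or the stopping criterion, so `distance` is a caller-supplied placeholder, average linkage is an assumed merge rule, and `threshold` is an assumed stopping parameter. None of these names come from the cited work.

```python
from typing import Callable, List, Sequence, TypeVar

Item = TypeVar("Item")


def agglomerative_cluster(
    items: Sequence[Item],
    distance: Callable[[Item, Item], float],
    threshold: float,
) -> List[List[int]]:
    """Greedy bottom-up clustering; returns clusters as lists of item indices."""
    # Start with every item (e.g. every message) in its own cluster.
    clusters: List[List[int]] = [[i] for i in range(len(items))]

    def linkage(a: List[int], b: List[int]) -> float:
        # Average pairwise distance between two clusters (assumed merge rule).
        total = sum(distance(items[i], items[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = linkage(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        d, p, q = best
        # Stop once even the closest pair is too far apart.
        if d > threshold:
            break
        # Merge the pair; q > p, so deleting q leaves index p valid.
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters
```

For instance, calling it on the numbers `[0.0, 0.1, 5.0, 5.2]` with absolute difference as the distance and a threshold of `1.0` groups the first two and the last two items. In a message clustering setting, the items would be per-message models and the distance a likelihood-based measure between them.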
....[10] to obtain progressively more accurate acoustic models and uses these in a rescoring framework. In contrast to Switchboard, voicemail messages are generally too short to allow direct application of the normalization techniques. A novel message clustering algorithm based on MLLR likelihood [11] is used to guarantee sufficient data for normalization. The final transcripts, obtained after 6 recognition passes, have a word error rate of 28.7%, a 6.2% accuracy improvement. Gender dependency provides 1.6% of this gain. VTLN then additively improves accuracy by 1.0% when applied only on ....
M. Bacchiani, "Using maximum likelihood linear regression for segment clustering and speaker identification," in Proceedings of the Sixth International Conference on Spoken Language Processing, vol. 4, (Beijing), pp. 536--539, 2000.
....1999) to obtain progressively more accurate acoustic models and uses these in a rescoring framework. In contrast to Switchboard, voicemail messages are generally too short to allow direct application of the normalization techniques. A novel message clustering algorithm based on MLLR likelihood (Bacchiani, 2000) is used to guarantee sufficient data for normalization. The final transcripts, obtained after 6 recognition passes, have a word error rate of 28.7%, a 6.2% accuracy improvement. Gender dependency provides 1.6% of this gain. VTLN then additively improves accuracy by 1.0% when applied only on ....
Bacchiani, M. 2000. Using maximum likelihood linear regression for segment clustering and speaker identification. In Proceedings of the Sixth International Conference on Spoken Language Processing, volume 4, pages 536--539, Beijing.