M. Bacchiani. Using maximum likelihood linear regression for segment clustering and speaker identification. Proceedings of ICSLP 2000.


This paper is cited in the following contexts:
Linear Feature Space Projections for Speaker Adaptation - Saon, Zweig, Padmanabhan (2001)

....of component j at time t given the complete observation sequence. The gradient of (4) with respect to A has the expression

$$T A^{-T} - \sum_{t=1}^{T} \sum_{j=1}^{N} \left( \gamma_t(j)\, \Sigma_j^{-1} A X_t X_t^T - \gamma_t(j)\, \Sigma_j^{-1} \mu_j X_t^T \right) \qquad (5)$$

Following the terminology from [2], we define the sufficient statistics for feature space MLLR by:

$$K = \sum_{t=1}^{T} \sum_{j=1}^{N} \gamma_t(j)\, \Sigma_j^{-1} \mu_j X_t^T \quad \text{and} \quad G_i = \sum_{t=1}^{T} \sum_{j=1}^{N} \frac{\gamma_t(j)}{\sigma_{ji}^2}\, X_t X_t^T, \quad i = 1, \ldots, n$$

where $\Sigma_j = \mathrm{diag}(\sigma_{j1}^2, \ldots, \sigma_{jn}^2)$. By rewriting (5) in ....

M. Bacchiani. Using maximum likelihood linear regression for segment clustering and speaker identification. Proceedings of ICSLP 2000, Beijing, 2000.
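
The sufficient statistics named in the excerpt above (K, and one matrix G_i per feature dimension i) are plain weighted sums over frames and Gaussian components, so they can be accumulated in a single pass. The following is a minimal sketch, not code from any of the cited papers. It assumes diagonal covariances, that the garbled symbols in the excerpt are the component mean mu_j and the weight gamma_t(j) / sigma^2_{ji}, and that the posteriors gamma_t(j) have already been computed. The function name `fmllr_statistics` and its argument layout are hypothetical.

```python
def fmllr_statistics(frames, gammas, means, variances):
    """Accumulate K and G_i for diagonal-covariance Gaussians.

    frames:    list of T feature vectors X_t, each of dimension n
    gammas:    T x N list of posteriors gamma_t(j) (assumed precomputed)
    means:     list of N mean vectors mu_j
    variances: list of N vectors of diagonal variances sigma^2_{ji}

    Returns (K, G), where K is n x n and G[i] is the n x n matrix G_i.
    """
    n = len(frames[0])
    K = [[0.0] * n for _ in range(n)]
    G = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for x, gamma_t in zip(frames, gammas):
        for j, g in enumerate(gamma_t):
            for i in range(n):
                # Row i of Sigma_j^{-1} scales by 1 / sigma^2_{ji}.
                w = g / variances[j][i]
                for k in range(n):
                    # K += gamma_t(j) * Sigma_j^{-1} mu_j X_t^T
                    K[i][k] += w * means[j][i] * x[k]
                    for l in range(n):
                        # G_i += gamma_t(j) / sigma^2_{ji} * X_t X_t^T
                        G[i][k][l] += w * x[k] * x[l]
    return K, G


# Toy usage: one 2-dimensional frame, one Gaussian component.
K, G = fmllr_statistics(
    frames=[[1.0, 2.0]],
    gammas=[[1.0]],
    means=[[3.0, 4.0]],
    variances=[[1.0, 2.0]],
)
```

Keeping one G_i per dimension reflects the diagonal-covariance assumption: each row of the transform A sees the outer products X_t X_t^T weighted by a different variance.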


SCANMail: Audio Navigation in the Voicemail Domain - Bacchiani, Hirschberg.. (2001)   (1 citation)  Self-citation (Bacchiani)

....[4] to obtain progressively more accurate acoustic models and uses these in a rescoring framework. In contrast to Switchboard, voicemail messages are generally too short to allow direct application of the normalization techniques. A novel message clustering algorithm based on MLLR likelihood [1] is used to guarantee sufficient data for normalization. The final transcripts, obtained after 6 recognition passes, have a word error rate of 28.7%, a 6.2% accuracy improvement. Gender dependency provides 1.6% of this gain. VTLN then additively improves accuracy with 1.0% when applied only on ....

M. Bacchiani. Using maximum likelihood linear regression for segment clustering and speaker identification. In Proceedings of the Sixth International Conference on Spoken Language Processing, volume 4, pages 536--539, Beijing, 2000.


Automatic Transcription Of Voicemail At AT&T - Bacchiani (2001)   Self-citation (Bacchiani)

....of a clustering configuration remains unclear even given the supervisory information. Therefore, in the experiments using normalization in training, unsupervised clustering was applied to the training data as well. Two clustering approaches were investigated that were compared in more detail in [7]. The first used Text Independent Gaussian Mixture Models (TIGMMs) to represent messages and used an agglomerative clustering approach with a likelihood-based distance metric. The models were estimated on the speech frames of the messages only. To distinguish speech from silence and noises, the ....

....m denotes the m-th mean of the mixture model of message i, n denotes the n-th mean of the mixture model of message j, M denotes the mixture model of message i and N denotes the mixture model of message j. The second clustering approach was the MLLR-based algorithm described in detail in [7]. This algorithm uses the MLLR adaptation statistics to directly optimize the MLLR adaptation likelihood of the cluster data. In contrast to the TIGMM approach, this clustering approach is consistent as it optimizes the same objective used in MLLR adaptation. In addition, this approach is ....

M. Bacchiani, "Using Maximum Likelihood Linear Regression for Segment Clustering and Speaker Identification," In Proceedings of the International Conference on Spoken Language Processing, Vol. 4, pp. 536-539, 2000.


SCANMail: Browsing and Searching Speech Data by Content - Hirschberg, Bacchiani..   (4 citations)  Self-citation (Bacchiani)

....[10] to obtain progressively more accurate acoustic models and uses these in a rescoring framework. In contrast to Switchboard, voicemail messages are generally too short to allow direct application of the normalization techniques. A novel message clustering algorithm based on MLLR likelihood [11] is used to guarantee sufficient data for normalization. The final transcripts, obtained after 6 recognition passes, have a word error rate of 28.7%, a 6.2% accuracy improvement. Gender dependency provides 1.6% of this gain. VTLN then additively improves accuracy with 1.0% when applied only on ....

M. Bacchiani, "Using maximum likelihood linear regression for segment clustering and speaker identification," in Proceedings of the Sixth International Conference on Spoken Language Processing, vol. 4, (Beijing), pp. 536--539, 2000.


SCANMail: Audio Navigation in the Voicemail Domain - Bacchiani, Hirschberg.. (2001)   (1 citation)  Self-citation (Bacchiani)

....1999) to obtain progressively more accurate acoustic models and uses these in a rescoring framework. In contrast to Switchboard, voicemail messages are generally too short to allow direct application of the normalization techniques. A novel message clustering algorithm based on MLLR likelihood (Bacchiani, 2000) is used to guarantee sufficient data for normalization. The final transcripts, obtained after 6 recognition passes, have a word error rate of 28.7%, a 6.2% accuracy improvement. Gender dependency provides 1.6% of this gain. VTLN then additively improves accuracy with 1.0% when applied only on ....

Bacchiani, M. 2000. Using maximum likelihood linear regression for segment clustering and speaker identification. In Proceedings of the Sixth International Conference on Spoken Language Processing, volume 4, pages 536--539, Beijing.


Linear Feature Space Projections For Speaker Adaptation - George Saon Geoffrey (2001)

No context found.

M. Bacchiani. Using maximum likelihood linear regression for segment clustering and speaker identification. Proceedings of ICSLP 2000.
