Pitch prediction from MFCC vectors for speech reconstruction
Proc. ICASSP, 2004
Cited by 7 (4 self)
Abstract
This work proposes a technique for reconstructing an acoustic speech signal solely from a stream of mel-frequency cepstral coefficients (MFCCs). Previous speech reconstruction methods have required an additional pitch element, but this work proposes two maximum a posteriori (MAP) methods for predicting pitch from the MFCC vectors themselves. The first method is based on a Gaussian mixture model (GMM), while the second scheme utilises the temporal correlation available from a hidden Markov model (HMM) framework. A formal measurement of both frame classification accuracy and RMS pitch error shows that an HMM-based scheme with 5 clusters per state correctly classifies over 94% of frames and has an RMS pitch error of 3.1 Hz relative to a reference pitch. Informal listening tests and analysis of spectrograms reveal that speech reconstructed solely from the MFCC vectors is almost indistinguishable from speech reconstructed using the reference pitch.
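The GMM-based prediction described above can be illustrated with a minimal sketch. It assumes a simplification not stated in the abstract: each mixture component ("cluster") has a diagonal MFCC covariance and carries a single pitch value, with 0 Hz marking an unvoiced cluster. The frame's pitch and voicing decision are then taken from the cluster with the highest posterior P(k | x). The `Cluster` class and the `map_predict` and `log_gaussian_diag` functions are hypothetical names chosen for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Cluster:
    weight: float       # mixture weight P(k)
    mean: List[float]   # mean MFCC vector of the cluster
    var: List[float]    # diagonal MFCC variances (assumed diagonal for simplicity)
    pitch: float        # cluster pitch in Hz; 0.0 marks an unvoiced cluster (assumption)


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of x under a diagonal-covariance Gaussian."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def map_predict(x: Sequence[float], clusters: Sequence[Cluster]) -> Tuple[float, bool]:
    """Return (pitch_hz, voiced) from the cluster with the highest posterior P(k | x).

    P(k | x) is proportional to P(k) * p(x | k), so comparing
    log P(k) + log p(x | k) is enough; the normaliser p(x) is shared.
    """
    best = max(
        clusters,
        key=lambda c: math.log(c.weight) + log_gaussian_diag(x, c.mean, c.var),
    )
    return best.pitch, best.pitch > 0.0
```

With a full covariance linking the MFCC vector and pitch, the selected component would instead contribute its conditional mean μ_f + Σ_fx Σ_xx⁻¹ (x − μ_x) for voiced frames; the diagonal version keeps the example short.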
MAP prediction of pitch from MFCC vectors for speech reconstruction
Proc. ICSLP, 2004
Cited by 1 (1 self)
Abstract
This work proposes a method of predicting pitch and voicing from mel-frequency cepstral coefficient (MFCC) vectors. Two maximum a posteriori (MAP) methods are considered. The first models the joint distribution of the MFCC vector and pitch using a Gaussian mixture model (GMM), while the second method also models the temporal correlation of the pitch contour using a combined hidden Markov model (HMM)-GMM framework. Monophone-based HMMs are connected in the form of an unconstrained monophone grammar, which enables pitch to be predicted from unconstrained speech input. Evaluation on 130,000 MFCC vectors reveals a voicing classification accuracy of over 92% and an RMS pitch error of 10 Hz. The predicted pitch contour is also applied to MFCC-based speech reconstruction, and the resulting speech is almost indistinguishable from that reconstructed using a reference pitch.
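The temporal modelling in the HMM-GMM scheme can be illustrated with Viterbi decoding. This is a minimal sketch under assumptions not taken from the paper: each hidden state stands for one pitch cluster (rather than the monophone HMM states used in the work), per-frame emission log-likelihoods are supplied by a frame-level model such as a GMM, and a "sticky" transition matrix encodes the expectation that pitch and voicing change smoothly. `viterbi` and `smooth_pitch` are hypothetical names.

```python
from typing import List, Sequence


def viterbi(log_emit: Sequence[Sequence[float]],
            log_trans: Sequence[Sequence[float]],
            log_init: Sequence[float]) -> List[int]:
    """Return the most probable state sequence (assumes at least one frame).

    log_emit[t][s]  : log p(x_t | state s)
    log_trans[i][j] : log P(state j at t+1 | state i at t)
    log_init[s]     : log P(state s at t = 0)
    """
    n_states = len(log_init)
    score = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    backptr: List[List[int]] = []
    for t in range(1, len(log_emit)):
        new_score: List[float] = []
        ptrs: List[int] = []
        for j in range(n_states):
            # Best predecessor state for landing in state j at frame t.
            best_i = max(range(n_states), key=lambda i: score[i] + log_trans[i][j])
            new_score.append(score[best_i] + log_trans[best_i][j] + log_emit[t][j])
            ptrs.append(best_i)
        score = new_score
        backptr.append(ptrs)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptrs in reversed(backptr):
        state = ptrs[state]
        path.append(state)
    path.reverse()
    return path


def smooth_pitch(log_emit: Sequence[Sequence[float]],
                 log_trans: Sequence[Sequence[float]],
                 log_init: Sequence[float],
                 state_pitch: Sequence[float]) -> List[float]:
    """Map the decoded state sequence to a pitch contour (0.0 = unvoiced)."""
    return [state_pitch[s] for s in viterbi(log_emit, log_trans, log_init)]
```

With sticky transitions, an isolated frame whose emission slightly favours the unvoiced state can be decoded as voiced, which frame-by-frame MAP classification would not do.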

