Results 1 - 3 of 3
Personalizing a Speech Synthesizer by Voice Adaptation
Abstract - Cited by 3 (0 self)
A voice adaptation system enables users to quickly create new voices for a text-to-speech system, allowing for the personalization of the synthesis output. The system adapts to the pitch and spectrum of the target speaker, using a probabilistic, locally linear conversion function based on a Gaussian Mixture Model. Numerical and perceptual evaluations reveal insights into how adaptation quality correlates with the amount of training data and the number of free parameters. A new joint density estimation algorithm is compared to a previous approach. Numerical errors are studied on the basis of broad phonetic categories. A data augmentation method for training data with incomplete phonetic coverage is investigated and found to maintain high speech quality while partially adapting to the target voice.
VOICE CONVERSION: A CRITICAL SURVEY
Abstract - Cited by 1 (1 self)
Voice conversion is an emergent problem in voice and speech processing with increasing commercial interest, due to applications such as Speech-to-Speech Translation (SST) and personalized Text-To-Speech (TTS) systems. A voice conversion system should map the acoustical features of sentences pronounced by a source speaker to values corresponding to the voice of a target speaker, in such a way that the processed output is perceived as a sentence uttered by the target speaker. In the last two decades the number of scientific contributions to the voice conversion problem has grown considerably, and a solid overview of the historical process as well as of the proposed techniques is indispensable for those willing to contribute to the field. The goal of this text is to provide a critical survey that combines historical presentation with technical discussion while pointing out the advantages and drawbacks of each technique, and to discuss future directions, especially the development of a perceptual benchmark process for voice conversion systems.

