An Overview of the VUB Entry for the 2008 Blizzard Challenge
Cited by 2 (2 self)
In this paper, we describe the configuration of our synthesizer, as used for the Blizzard Challenge for the first time. Two new UK English voices were built for the DSSP synthesizer, our in-house unit selection synthesizer, which uses non-uniform units and a symbolic description of target prosody. Listening tests indicate reasonable quality, although there is still room for improvement. Index Terms: speech synthesis, unit selection, evaluation of synthesized speech
Optionality In Evaluating Prosody Prediction
This paper concerns the evaluation of prosody prediction at the symbolic level, in particular the locations of pitch accents and intonational boundaries. One evaluation method is to ask an expert to annotate text prosodically, and to compare the system's predictions with this reference. However, this ignores the issue of optionality: there is usually more than one acceptable way to place accents and boundaries. Therefore, predictions that do not match the reference are not necessarily wrong. We propose dealing with this issue by means of a 3-class annotation which includes a class for optional accents/boundaries. We show, in a prosody prediction experiment using a memory-based learner, that evaluating against a 3-class annotation derived from multiple independent 2-class annotations allows us to identify the real prediction errors and to better estimate the real performance. Next, it is shown that a 3-class annotation produced directly by a single annotator yields a reasonable approximation of the more expensive 3-class annotation derived from multiple annotations. Finally, the results of a larger scale experiment confirm our findings.
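The evaluation idea in this abstract can be illustrated with a minimal sketch. The abstract does not state how the 3-class annotation is derived from the multiple 2-class annotations, so the unanimity rule used here (all annotators agree on "present" gives obligatory, all agree on "absent" gives impossible, anything else gives optional) is an assumption, as are all function names, label names, and the toy data.

```python
from typing import List, Sequence

# Hypothetical label names; the paper's own class names are not given in the abstract.
OBLIGATORY, OPTIONAL, IMPOSSIBLE = "obligatory", "optional", "impossible"


def derive_three_class(annotations: Sequence[Sequence[int]]) -> List[str]:
    """Merge several binary annotations (1 = accent/boundary present) per token.

    Assumed rule: unanimous 1 -> obligatory, unanimous 0 -> impossible,
    any disagreement -> optional.
    """
    length = len(annotations[0])
    if any(len(a) != length for a in annotations):
        raise ValueError("all annotations must cover the same tokens")
    labels = []
    for votes in zip(*annotations):
        if all(votes):
            labels.append(OBLIGATORY)
        elif not any(votes):
            labels.append(IMPOSSIBLE)
        else:
            labels.append(OPTIONAL)
    return labels


def two_class_accuracy(pred: Sequence[int], ref: Sequence[int]) -> float:
    """Strict comparison against a single reference annotation."""
    return sum(p == r for p, r in zip(pred, ref)) / len(ref)


def three_class_accuracy(pred: Sequence[int], labels: Sequence[str]) -> float:
    """A prediction is wrong only if it contradicts an obligatory or impossible slot."""
    correct = sum(
        label == OPTIONAL
        or (p == 1 and label == OBLIGATORY)
        or (p == 0 and label == IMPOSSIBLE)
        for p, label in zip(pred, labels)
    )
    return correct / len(labels)


# Toy data: three annotators, five tokens (made up for illustration).
annotator_a = [1, 0, 1, 0, 0]
annotator_b = [1, 1, 1, 0, 0]
annotator_c = [1, 0, 0, 0, 0]
prediction = [1, 1, 0, 0, 1]

labels = derive_three_class([annotator_a, annotator_b, annotator_c])
strict = two_class_accuracy(prediction, annotator_a)
lenient = three_class_accuracy(prediction, labels)
```

On this toy data the strict comparison against one annotator counts three mismatches, while the 3-class comparison counts only the final token as an error, because the other two mismatches fall on slots where the annotators themselves disagree.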
The INESC-ID Blizzard Entry: Unsupervised Voice Building and Synthesis
This paper describes the INESC-ID participation in the Blizzard Challenge 2008, which consisted of building the two English voices. We have been developing a new European Portuguese TTS system, called DIXI, for the last two years. This year, the system was already stable enough to be used in the challenge, after a partial adaptation to support synthesis in English. The major motivation for our participation in this year’s edition of the challenge was to evaluate to what extent our unsupervised and less resource-demanding voice building methods, very successfully applied in limited-domain applications, can be used in open-domain synthesis. Index Terms: speech synthesis, cluster unit selection, automatic voice building methods.

