41 citations found.
T. Dutoit, An Introduction to Text-to-Speech Synthesis, Kluwer Academic Publishers, Dordrecht, 1997.

This paper is cited in the following contexts:

Corpus-Based Unit Selection for Natural-Sounding Speech Synthesis - Yi (2003)

....based on the query and formulate a reply via language generation. The symbolic response is then transformed back to the speech domain by the process of speech synthesis described above. 1.1 Background Although speech synthesis has had a long history [71, 87] progress is still being made [43, 152] and recent attention in the field has been primarily focused on concatenating real human speech selected from a large corpus or inventory. The method of concatenation [107] is not a new idea and has been applied to other parametric speech representations including, for example, linear prediction ....

T. Dutoit, Ed., An introduction to text-to-speech synthesis, Kluwer Academic Publishers, 1996.


Applying Talking Head Technology To A Web Based Weather Service - Dam, de Souza (2002)

.... and silence are used for the purpose of contrast and emphasis. They serve somewhat the same purpose the white of this paper does for the print, but they do it in a more dramatic way [15]. Pitch variation can serve to emphasise certain words or syllables in order to highlight their importance [8]. Researchers and people involved in speech have described the speech correlates of emotion. This is a complex task given that there is no consensus on what the emotions are, but most seem to agree on the nature of basic emotions [16]. The same remarks can be made concerning how the speech ....

T. Dutoit. An Introduction to Text-to-Speech Synthesis. Kluwer Academic Publishers, Dordrecht, The Netherlands, 1997.


CORA: An Anthropomorphic Robot Assistant for Human.. - Iossifidis.. (2002)

....lower resolution of the fovea cameras with respect to the human fovea. 2.2 Acoustic and Phonetic Abilities Cora's acoustic system consisting of microphones on the hardware side and the speech recognizer ears [7] [8] on the software side forms together with the speech synthesizer called mbrola [3] [1] a dialog system. 2.3 Head Configuration For grasping the relative configuration of body, arm and head is crucial: Humans typically use visually controlled arm and hand movements with the position of the head above and behind the grasping position. This and the fact that most visually ....

Thierry Dutoit. An Introduction to Text-to-Speech Synthesis. Dordrecht: Kluwer, 1997.


Speech Synthesis Using HMM-Based Acoustic Unit Inventory - Matousek

....voiced sounds) or by Gaussian noise source. Prosody matching is straightforward, since pitch, duration and intensity are explicit parameters of the model. Spectral discontinuities encountered in the concatenation points are simple to smooth by linear interpolating the difference between PARCORs [7]. The resulting synthetic signal is intelligible, more natural than in previous approach. However it is accompanied by a characteristic buzziness caused by pulse like nature of the excitation signal. An alternative solution to this problem is the usage of the LP residual excitation signal (so ....

Dutoit T. (1997): "An Introduction to Text-to-Speech Synthesis"; Kluwer Academic Publishers, Dordrecht.


Generating Multilingual Personalized Descriptions.. - Androutsopoulos.. (2001)   (5 citations)

....exhibits are selected by approaching them. Once an exhibit has been selected, the system retrieves from the database all the information that is relevant to the exhibit, and using natural language generation [McDonald 2000; McKeown 1995; Reiter Dale 1997, 2000] and speech synthesis techniques [Dutoit 1997] it produces an appropriate textual or spoken description. Figure 3 shows an English description generated by the current demonstrator, and Figure 4 shows the same description in Greek. The structure of the database and the generation process will be discussed further in Section 3. visitor ....

T. Dutoit. An Introduction to Text-To-Speech Synthesis. Kluwer Academic Press, 1997.


AIBO's first words. The social learning of language and meaning - Steels, Kaplan (2001)   (2 citations)

....might occur in the dialog. Although the recognition rate is high, it is not perfect and so provisions must be made in the language game script to overcome the problem of recognition error. The speech synthesis system is a state of the art text to speech synthesiser, similar to the one described in [Dutoit, 1997]. An associative memory stores relations between object views and words. The different views of an object form an implicit category [Kaplan, 1998] based on the fact that they are named the same way. Word learning takes place by reinforcement learning [Sutton and Barto, 1998]. When the ....

Dutoit, T. (1997). An introduction to Text-To-Speech Synthesis. Kluwer Academic Publishers, Dordrecht.


Lifelike Gesture Synthesis and Timing for Conversational Agents - Wachsmuth, Kopp (2001)

....marked with prominence values which support the subsequent generation of prosodic parameters (phoneme length; intonation) to produce a linguistic representation of text input. From this representation, speech output is generated by the use of MBROLA and the German diphone database provided for it [3]. MBROLA is a real time concatenative speech synthesizer which is based on the multi band resynthesis, pitch-synchronous overlap add procedure (MBR PSOLA). To achieve a variety of alterations in intonation and speech timing, pitch can be varied by a factor within the range of 0.5 to 2.0, and phone ....

Dutoit, T. (1997) An Introduction to Text-to-Speech Synthesis. Dordrecht: Kluwer Academic Publishers.


General-Purpose Architectures for Media Processing.. - Parthasarathy..

....based on cepstral coefficients attributable to the shape of the vocal tract. We run the benchmark with the RASTA front end filtering technique that allows for removal of additive noise and spectral distortion. We use the ex5 c1.wav input file from the UCLA MediaBench suite. Speech synthesis [33] is the process of converting text to speech and consists of two parts: (1) the natural language processing, and (2) the digital signal processing. Our speech synthesis benchmark focuses on the former. Specifically, our benchmark is based on the alpha version of the FreeSpeech text to speech ....

Thierry Dutoit. An Introduction to Text-to-Speech Synthesis. Kluwer Academic Publishers, 1996.


A Multi-Strategy Approach to Improving Pronunciation by Analogy - Marchand, Damper   (5 citations)

....can actually outperform traditional rules. However, this possibility is not usually given much credence. For instance, Divay and Vitale (1997) recently wrote: "To our knowledge, learning algorithms, although promising, have not (yet) reached the level of rule sets developed by humans" (p. 520). Dutoit (1997) takes this further, stating "such training based strategies are often assumed to exhibit much more intelligence than they do in practice, as revealed by their poor transcription scores" (footnote 14, p. 115). Pronunciation by analogy (PbA) is a data driven technique for the automatic ....

Dutoit, Thierry (1997). Introduction to Text-to-Speech Synthesis. Dordrecht, The Netherlands: Kluwer.


The Need for Increased Speech Synthesis Research: Report .. - Sproat, Ostendorf..

....abbreviation can be expanded as doctor or drive (and in other ways too). Thus, it must be able to enumerate the possible expansions of this abbreviation and to disambiguate among them given the context. Text analysis methods for TTS have been reviewed in a number of places, including [Klatt, 1987, Dutoit, 1997, Sproat, 1998]. Here, we will only briefly describe the various approaches that have been taken to the more prominent problems, namely word pronunciation and homograph disambiguation. The architecturally simplest approach to word pronunciation involves letter to sound rules. These are rules that ....

Dutoit, T. (1997). An Introduction to Text-to-Speech Synthesis. Kluwer, Dordrecht.


Domain Specific Text Processing for Speech Synthesis - Heyman (2001)

....speech has had flaws demanding a certain tolerance of the listener. Improved synthesis techniques have made the quality of the generated speech increasingly higher and it has lately become possible to make use of it in commercial applications, for example telecommunication services [Allen, 1992] [Dutoit, 1997]. Now, it is not necessarily the acoustic quality of what is being said that is likely to disturb the listener, but rather the failure of the Text to Speech (TTS) system to correctly read out words and expressions that constitute some linguistic problem. When Text to Speech synthesis is used to ....

....task can be divided into two broad processes, linguistic analysis and synthesis. The goal of the analysis part is to make a narrow phonetic transcription of the text, with additional information about prosody. The synthesizing part should use the transcription to produce natural sounding speech [Dutoit, 1997]. This introduction will describe the analysis; the synthesis will be briefly outlined at the end. The part of the system that performs the analysis comprises several components that work on different levels; there is some initial pre processing, morphological, contextual and syntactic analyses, ....

[Article contains additional citation context not shown here]

Dutoit, T., (1997), An Introduction to Text-to-Speech Synthesis, Kluwer Academic Publishers, Dordrecht.


The Mbrola Project: Towards A Set Of High Quality.. - Dutoit, Pagel..   (32 citations)  Self-citation (Dutoit)

....2. MBROLA ALGORITHM The MBROLA 2.00 program uses a technique known as Multi Band Resynthesis OverLap Add which produces speech by diphone (triphone or polyphone will be available in future versions) concatenation (for an introduction to concatenative approaches to TTS synthesis, refer to [2]). Like the well known PSOLA methods (TD PSOLA for Time Domain Pitch Synchronous OverLap Add [3] or PIOLA [4] standing for Pitch Inflected Overlap Add, or MBR PSOLA [5] standing for Multi Band Resynthesis Pitch Synchronous OverLap Add) it adds overlapping frames directly in the time domain. ....

T. Dutoit. An Introduction to Text-to-Speech Synthesis. Kluwer Academic Publishers, Boston, 1996. Forthcoming textbook.


Modeling Improved Prosody Generation from High-Level.. - Xydas, al. (2005)

No context found.

T. Dutoit, An Introduction to Text-to-Speech Synthesis, Kluwer Academic Publishers, Dordrecht, 1997.


Tone-Group F0 selection for modeling focus prominence - In Small-Footprint Speech

No context found.

Dutoit, T., 1997. An Introduction to Text-to-Speech Synthesis. Kluwer Academic Publishers, Dordrecht.


Text-To-Speech Technologies for Mobile Telephony Services - Paulseph-John Farrugia..

No context found.

Thierry Dutoit. An Introduction to Text-To-Speech Synthesis, volume 3 of Text, Speech and Language Technology. Kluwer Academic Publishers, P.O. Box 322, 3300 AH Dordrecht, The Netherlands, 1997.


An Architecture for Voice-Enabled Interfaces over Local.. - Bagein Pietquin Ris (2003)

No context found.

T. Dutoit, "An Introduction to Text-To-Speech Synthesis", Kluwer Academic Publishers, Dordrecht, 1997.


Enabling Speech Based Access to Information - Management Systems Over (2003)

No context found.

T. Dutoit, "An Introduction to Text-To-Speech Synthesis", Kluwer Academic Publishers, Dordrecht, 1997.


XML Representation Languages as a Way of Interconnecting TTS.. - Schröder, Breuer (2004)

No context found.

T. Dutoit, An Introduction to Text-to-Speech Synthesis. Dordrecht: Kluwer Academic, 1997.


Emotional Speech Synthesis for Emotionally-Rich Virtual Worlds - Schröder (2003)

No context found.

Thierry Dutoit. An Introduction to Text-to-Speech Synthesis. Kluwer Academic Publishers, Dordrecht, 1997.


"May I talk to you? :-)" - Facial Animation from Text - Albrecht, Haber, Kähler.. (2002)

No context found.

T. Dutoit. An Introduction to Text-to-Speech Synthesis. Kluwer Academic Publishers, Dordrecht, 1997.


First Implementation of VHML on the Java Text-to-Speech Synthesiser - De Souza

No context found.

T. Dutoit. An Introduction to Text-to-Speech Synthesis. Kluwer Academic Publishers, Dordrecht, The Netherlands, 1997.


A Novel Discontinuity Metric for Unit Selection Text-to-Speech.. - Bellegarda

No context found.

T. Dutoit, An Introduction to Text-to-Speech Synthesis, Norwell, MA: Kluwer, 1997.


Subjective Evaluation Of Join Cost Smoothing Methods - Jithendra Vepa And

No context found.

T. Dutoit, An Introduction to Text-to-Speech Synthesis, Kluwer Academic Publishers, The Netherlands, 1997.


Prosody Modelling for Syllable-Based Speech Synthesis - Kopecek, Pala

No context found.

T. Dutoit, An Introduction to Text-to-Speech Synthesis, Kluwer Academic Publishers, 1997.

CiteSeer.IST - Copyright Penn State and NEC