70 citations found.
Klatt, D. H. Review of Text-to-Speech Conversion for English. Journal of the Acoustical Society of America, 82, 3 (1987), 737-793.

This paper is cited in the following contexts:

Intrinsic Phone Durations are Speaker-Specific - Hartmut Pfitzinger

....on ANNs or HMMs. While the first model type requires explicit rules formulated by an expert, the other three types extract their knowledge from phone duration distributions calculated from large spoken language resources. Much work in the field of constructing duration models was done by Klatt [2], Kohler [3] and van Santen [4]. The goal of any duration model to generate natural sounding timing cannot in any practical sense be achieved because durational phenomena are too complex (van Santen 1993 [1, p. 1398]). Five years later he writes: there is a sizable amount of durational ....

Dennis H. Klatt, "Review of text-to-speech conversion for English," J. of the Acoustical Society of America, vol. 82, no. 3, pp. 737--793, 1987.


Corpus-Based Unit Selection for Natural-Sounding Speech Synthesis - Yi (2003)

....form the test set for synthesis evaluations. 153 6.2 Makeup of automatically acquired intonation classes. 159 6.3 Recognition error analysis for different synthesis configurations. 168 6.4 Word error rates for speech synthesized with rescoring. 170 Introduction Speech synthesis [71] is an automatic encoding process carried out by machine through which symbols conveying linguistic information are converted into an acoustic pressure waveform. The speech message may originate from a representation of the meaning to be communicated. The process of language generation produces a ....

....Now, the machine can then retrieve information based on the query and formulate a reply via language generation. The symbolic response is then transformed back to the speech domain by the process of speech synthesis described above. 1.1 Background Although speech synthesis has had a long history [71, 87], progress is still being made [43, 152] and recent attention in the field has been primarily focused on concatenating real human speech selected from a large corpus or inventory. The method of concatenation [107] is not a new idea and has been applied to other parametric speech representations ....

[Article contains additional citation context not shown here]

D. Klatt, "Review of text to speech conversion for English," Journal of the Acoustical Society of America, vol. 82, pp. 737--793, 1987.


Compression Of Acoustic Inventories Using Asynchronous.. - Kain, van Santen (2002)   (1 citation)

....of each phoneme, resulting in a sorted list of frequencies of the form ### ## , ## ## , ## ## , ## ## , ## ## , ## ## , ##, where # is a constant upper limit frequency beyond which no spectral modification is desired. When formants were not visible, we used formant frequencies from locus theory [3]. During synthesis, # has access to both the original features## ## and the desired features # # , using them to calculate a non uniform sampling of the original frequency locations. To obtain the final spectrum, the magnitude and unwrapped phases are interpolated at the new frequencies. ....

D. Klatt, "Review of text-to-speech conversion for english," J. Acoust. Soc. Am., vol. 82, no. 3, pp. 737--793, September 1987.


Phonological Parsing for Bi-directional.. - Meng (1995)   (11 citations)

....the power to speak and listen can create a user friendly, hands free and eyes free environment for the user, and the speech medium can provide an efficient and economical mode of transmission. Great strides have been made in many areas of speech research over the past few decades. Speech synthesizers [41] have achieved a reasonable degree of clarity and naturalness, and are striving to cover unlimited vocabularies. Speech recognizers are now capable of speaker independent, large vocabulary, continuous speech recognition. The speech input may either be read or spontaneous. 1 Vocabulary sizes can ....

....attempts to capture letter sound regularities for the development of pronunciation and spelling systems. 1.4 Previous Work 1.4.1 Letter to Sound Generation A myriad of approaches have been applied to the problem of letter to sound generation. Excellent reviews can be found in [18], [29] and [41]. The various approaches have given rise to a wide range of letter to sound generation accuracies. Many of these accuracies are based on different corpora, and some corpora may be more difficult than others. Furthermore, certain systems are evaluated by human subjects, while others have their ....

Klatt, D., "Review of Text-to-speech Conversion for English," JASA 82 (3), Acoustic Society of America, pp. 737-793, 1987.


Whistler: A Trainable Text-To-Speech System - Huang, Acero, Adcock, Hon..   (17 citations)

....of intelligibility, they typically sound unnatural. The process of deriving these rules is not only labor intensive but also difficult to generalize to a new language, a new voice, or a new speech style. For prosody modeling, most TTS systems use linguistic rules to define the prosody parameters [5,11]. Only limited natural language processing is generally used prior to prosody parameter generation. These rule based prosody models tend to sound robotic. Moreover, while these rules may have been derived from speech of a donor speaker, the resulting synthetic prosody typically does not resemble ....

Klatt D. "Review of text-to-speech conversion for English". Journal of the Acoustical Society of America, 82(3):737-793, 1987.


Flexible Speech Synthesis Using Weighted Finite State Transducers - Bulyko (1996)   (1 citation)

....applications of HMMs include automatic speech segmentation [100, 55, 38] and smoothing waveforms at the concatenation points [70]. 2.4 Unit Selection 2.4.1 Synthesis Unit Choosing the inventory of units is a subject of ongoing research. Diphone based systems have been offered for many years [51]. A diphone database contains the transitions between all pairs of phones that can exist in a given language. While English has approximately 50 phones the total number of diphones ranges between 1500 and 2000, as some combinations never occur. Diphone based systems can produce very intelligible ....

D. Klatt. Review of text-to-speech conversion for english. Journal of the Acoustical Society of America, 82(3):737-793, 1987.


Development and Comparison of Three Syllable Stress Classifiers - Karen Jenkin Michael (1996)   (4 citations)

....Zero or no stress (NS) is assigned to all remaining syllables. Syllable stress is extremely useful in speech processing. Most pioneering work in utilizing stress has been in the area of text to speech synthesis, where the need for producing intelligible and natural sounding speech is paramount [5]. In the areas of speech recognition and understanding in spite of the potential benefits from prosodic information its involvement up until now there has been limited [1]. Syllable stress can be useful in facilitating lexical access in isolated word recognition systems. It is estimated that 75 ....

Klatt, D., "Review of text-to-speech conversion for English", J. Acoust. Soc. America, vol.82, no.3, pp.737-793, 1987.


Dixi - Portuguese Text-To-Speech System - Oliveira, Viana, Trancoso (1991)

....of Portuguese. With the exception of lexical stress assignment, the linguist and phonetic module was built using a rule compiler combined with a set of auxiliary functions written in the C language. The use of a rule compiler has the advantage of imposing a more structured rule definition [6] and enabling the system development by researchers with less programming skills. SCYLA, Speech Compiler for Your LAnguage, the rule compiler developed by CSELT [7] was selected because of three basic features of its multi level structures, allowing each procedure to access simultaneously all the ....

Klatt, D. H. (1987), "Review of Text-to-Speech Conversion for English", Journal of the Acoustical Society of America, 82(3), 737-793.


SignSynth: A Sign Language Synthesis Application Using .. - Angus Grieve-Smith.. (2001)

....Angus B. Grieve Smith Linguistics Department, Humanities 526, The University of New Mexico, Albuquerque, NM 87131 USA grvsmth unm.edu Abstract. Development of sign synthesis (also known as text to sign) can benefit from studying the history of its older cousin, speech synthesis. As Klatt [1] outlines the basic architecture of a speech synthesis application, I will discuss the architecture of a sign synthesis application and mention some of the applications and prototypes currently available. I will focus on SignSynth, a CGI based articulatory sign synthesis prototype I am ....

....boundaries [8]. 3. Sign Synthesis Architecture Sign synthesis is essentially the same task as speech synthesis; the difference is the form of the output. The architecture of a sign synthesis application is thus almost identical to a speech synthesis application. 3.1 Basic Architecture Klatt [1] describes the basic architecture of a speech synthesis application: Input text is acted on by some analysis routines that produce an abstract underlying linguistic representation. This representation is fed into synthesis routines that produce output speech. The architecture is summarized in ....

[Article contains additional citation context not shown here]

Klatt, D.: Review of Text-to-Speech Conversion for English. Journal of the Acoustic Society of America 82 (1987)


Chapter 1 - Introductionequation Section From

....with millions of names and acronyms. Moreover, in order to sound natural, the intonation of the sentences must be appropriately generated. The development of TTS synthesis can be traced back to the 1930s when Dudley's Voder, developed by Bell Laboratories, was demonstrated at the World's Fair [18]. Taking advantage of increasing computation power and storage technology, TTS researchers have been able to generate high quality commercial multilingual text to speech systems, although the quality is inferior to human speech for general purpose applications. The basic TTS components are shown ....

Klatt, D., "Review of Text-to-Speech Conversion for English," Journal of Acoustical Society of America, 1987, 82, pp. 737-793.


Prosodic Modeling for Improved Speech Recognition and Understanding - Wang (2001)

....interaction with the user. Such actions usually include responding to a user query, asking for additional information, requesting clarification, or simply prompting the user to speak, etc. The importance of prosody to the naturalness and intelligibility of speech is evident in speech synthesis (Klatt 1987). It is not surprising that much prosodic modeling work has been carried out on the prediction side for such applications. In a typical speech synthesis system, some linguistic analysis is usually performed, and prosodic tags (such as phrase boundaries, pitch accents, boundary tones, lexical ....

Klatt, D. H. (1987). Review of text-to-speech conversion for English. Journal of the Acoustical Society of America 82 (3), 737--793.


Prosody Prediction For Speech Synthesis Using Transformational.. - Fordyce   (3 citations)

.... production component requires more concentration than natural speech communication between humans [53]. It has been proposed by many researchers that one important area in which the intelligibility and naturalness of synthesized speech can be improved is the prosody of the synthesized speech [28, 32, 46, 53]. Prosody is an important component of human speech that helps carry important semantic, syntactic, and discourse information of the utterance. Prosody is found in the acoustic characteristics of human speech which include pauses, fundamental frequency (F0) contours, and energy of the sounds. ....

....pre recorded waveforms or canned speech, and the synthesis of an acoustic waveform from unrestricted text. Both methods have a relatively long and varied history 1 . The first method is currently used in many telephony applications, where a caller hears prerecorded utterances such as 1 See [28] for a more complete account of the history of speech synthesis. 6 The number you requested, 555 1212, can be automatically dialed for an additional 35 cents. in response to a request for information. Pre recorded speech can be natural sounding and effective in many applications. This method ....

[Article contains additional citation context not shown here]

Klatt, D. (1987). "Review of Text-to-Speech conversion for English." Journal of the Acoustic Society of America, 82(3).


The Need for Increased Speech Synthesis Research: Report .. - Sproat, Ostendorf..

....a general sense of the history of the field. To this end, we present in this section a brief overview of work on speech synthesis and text analysis for text to speech synthesis. Complete reviews are found elsewhere in published materials, from which much of the current material is drawn, including [Klatt, 1987, Allen, 1992, Olive, 1997] 13 3.1.1 The Synthesis of Speech Sounds The history of speech synthesis can be traced to the late 18th century and the first attempts by Kratzenstein and von Kempelen to build mechanical devices to mimic the sounds produced by the human vocal apparatus. These ....

....of speech are collectively referred to as prosody. Aspects of prosody in speech production have been investigated in various phonetic and linguistic traditions for centuries, but their investigation in the context of speech synthesis is, of course, much more recent. For example, according to [Klatt, 1987], the first implemented algorithm for determining a pitch contour was done by Ignatius Mattingly in 1966, and was incorporated into the Holmes rule based synthesizer. Intonation patterns were constructed using three basic intonational tunes (falling, rising, and fall rise) which were aligned ....

[Article contains additional citation context not shown here]

Klatt, D. (1987). Review of text-to-speech conversion for English. Journal of the Acoustical Society of America, 82:737--793.


Domain Specific Text Processing for Speech Synthesis - Heyman (2001)

.... be found in the lexicon are usually transcribed by rule, but there are other possibilities as well, for example the analogy strategy, where a viable pronunciation for a previously unseen word is created by using the knowledge of similar letter patterns in words the system knows how to pronounce [Klatt, 1987]. While lexicon based transcription methods usually need so called Letter to Sound (LTS) rules to fall back upon, rule based methods have to keep an exception lexicon with words whose pronunciation diverge from the language specific rules, for example loan words. There are expert rule based ....

Klatt, D. H., (1987), "Review of text-to-speech conversion for English" in Journal of Acoustical Society of America, vol. 82, no. 3, pp. 737-793.


Using Acoustic-Phonetic Information - John-Paul Hosom University

No context found.

Klatt, D. H. Review of Text-to-Speech Conversion for English. Journal of the Acoustical Society of America, 82, 3 (1987), 737-793.


High-Quality and Flexible Speech Synthesis with Segment Selection.. - Toda (2003)

No context found.

D.H. Klatt. Review of text-to-speech conversion for English. J. Acoust. Soc. Am., Vol. 82, No. 3, pp. 737--793, 1987.


Mitsubishi Electric Research Laboratories - Cambridge Research Center

No context found.

Dennis H. Klatt. 1987. Review of text-to-speech conversion for English. Journal of the Acoustical Society of America, 82(3):737--792, September.


Mitsubishi Electric Research Laboratories - Http Www Merl

No context found.

Dennis H. Klatt. Review of text-to-speech conversion for English. J. Acoust. Soc. Am., 82(3), 1987.


Conversational Interfaces: Advances and Challenges - Zue, Glass (2000)   (22 citations)

No context found.

D. Klatt, "Review of text-to-speech conversion for English," J. Acoust. Soc. Am., 82(3), 737--793, 1987.


Prosodic Modeling for Improved Speech Recognition and Understanding - Wang (2001)

No context found.

Klatt, D. H. (1987). Review of text-to-speech conversion for English. Journal of the Acoustical Society of America 82 (3), 737--793.


A Novel Syllable Duration Modeling - Approach For Mandarin (2001)

No context found.

D. H. Klatt (1987), "Review of text-to-speech conversion for English," J. Acoust. Soc. Amer. 82, pp.137-181.


Duration Control by Asymmetric Causal Retro-Causal Neural.. - Ca Glayan Erdem (2002)

No context found.

Klatt, D., 1987. Review of text-to-speech conversion for English. Journal of the Acoustical Society of America, 82 (3), 737-793.


Formant Re-Synthesis Of Dysarthric Speech - Alexander Kain Xiaochuan

No context found.

Dennis H Klatt, "Review of text-to-speech conversion for english," JASA, vol. 82, no. 3, pp. 737--793, 1987.


Intonation Modeling For Indian Languages - Sreenivasa Rao And (2004)

No context found.

D. H. Klatt, "Review of text-to-speech conversion for English," Journal of Acoustic Society of America, vol. 82(3), pp. 737--793, Sep. 1987.


Speech Production And Perception Models And Their - Applications To Synthesis

No context found.

Klatt, D. "Review of text-to-speech conversion for English," JASA vol. 82, no. 3, pp. 737-793, 1987.

CiteSeer.IST - Copyright Penn State and NEC