Cahn, J. E., Generating Expression in Synthesized Speech, Master's Thesis, MIT, 1989. http://www.media.mit.edu/~cahn/masters-thesis.html
.... enable the creation of new voices from existing voices [145, 66, 67]. System back channel while the user is still speaking would increase the naturalness of the interactions, extending previous work on intelligent barge in [136]. The question remains whether expressive or emotional speech synthesis [23, 96] can be achieved with concatenative methods [58, 113, 18]. Although speech synthesis has considerably improved in the past century or so, it is difficult to predict whether synthetic speech will ever equal or surpass human speech in naturalness. If it indeed does become indistinguishable or ....
J. Cahn, "Generating expressions in synthesized speech," M.S. thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, 1989.
....same remarks can be made concerning how the speech correlates of emotion vary. Although the type of variation is often consistent in that when one considers that for anger the median fundamental frequency, F0, will be high, there is some discordance with respect to how high the median F0 should be [4]. Table 1 shows values for a few emotions as described by J. Stallo [21]; these are expressed as a percentage relative to neutral speech and indicate by how much the default values, relating to the speech rate, pitch average, pitch range and volume, need to be modified. Emotion Speech Rate Pitch ....
J. E. Cahn. Generating expression in synthesized speech. Master's thesis, Massachusetts Institute of Technology, 1989.
....Subtle differences between the subtractive model and the physical speech process, and the lack of pitch and timing cues, create this lack of expression that is important to the human perception of natural speech. It is very difficult to arrive at algorithmic representations of these sorts of cues [16]. 4) Formant and Granular Synthesis: Driving a formant filter with a periodic source of impulses results in an output that is a sum of delayed formant time responses; this suggests a time domain method of generating voiced speech where copies of formant filter impulse responses are overlaid. The ....
....phoneme alphabet rather small (human languages have anywhere from around ten to around 60 phonemes) the resulting bit stream is extremely compact. However, there are not yet widespread models for synthetic speech that allow the presentation of cues to speaker identity [25] or emotional quality [16] in a robust, yet compact, fashion. Speech analysis synthesis is nonetheless an appropriate transmission technique for applications in which these cues are not of paramount importance. 3) Model Parameter Estimation: In theory, any of the synthesis methods described in Section II may be the ....
J. Cahn, "Generating expression in synthesized speech," M.S. thesis, Massachusetts Institute of Technology Media Lab, Cambridge, 1990.
....in the temporal parameters must be seen against the length of the investigated speech samples. It seems to be a popular paradigm in emotional speech research to study one sentence utterances. This explains why pausing behaviour is not taken into account in the tables in [5], [3] and [2]. Cowan in [4], however, remarks that fewer and shorter pauses occur in emotional speech, whereas we found more shorter pauses. This difference in results shows that pausing strategies, and especially breathing, might be aspects worth studying in an emotional context, as well as tempo, which seems to raise more ....
Cahn, J. Generating Expression in Synthesized Speech. Master's thesis, MIT, Cambridge, MA, 1989.
....The work of Terzopoulos [39] is a typical example where the modeling of facial muscles and tissue enables a variety of facial expressions. Although speech synthesizers are already commercially available, the ability to generate expression modulated speech is still a subject of research ([6]). Finally, a good example of attributing expressiveness to media objects is the work of Yin Yin Wong ([42]) with expressive typography. In this case, text dynamically changes its shape, typeface, color, and screen position in order to convey temporally the expressive dimension of the message. 6 ....
Janet Cahn. Generating expression in synthesized speech. Master's thesis, M.I.T. Media Arts and Sciences Program, May 1989.
....range, loudness and tempo. For example, fear is often manifested by high pitch levels, a wide pitch range, a large number of pitch variations and an increased speaking rate. Such vocal characteristics can be translated into appropriate intonational parameters to model emotion in synthesized speech [15]. In addition to the muscular and vocal manifestations of emotion, faces can also respond by changing color. For example, a face often blushes under embarrassment, excitement or physical exercise, or turns pale when being frightened. Patel's system [102] provides a color parameter as part of the ....
J. Cahn. Generating expression in synthesized speech. Master's thesis, Massachusetts Institute of Technology, Cambridge, Massachusetts, 1989.
....restores half of it; and the lip model restores a third. It should be noted that the evaluation techniques presented here in the context of analysis of linguistic transmission can also be used for para linguistic information (for example, emotion). In the auditory modality, for example, Cahn [22] analyzed confusions in the perception of the emotional content of sentences. In her study, five sentences were synthesized, each presented in six emotions in a variety of random orders. The observers then identified which of the six were intended. Similar experiments can be carried out with ....
J. Cahn. Generating expression in synthesized speech. Master's thesis, Massachusetts Institute of Technology, 1989.
.... features are systematically related to discourse information units corresponding to topic or theme (that is, what the discourse segment is about) and comment or rheme (that is, what novel information the utterance supplies). Listeners may also detect the speaker's affect from prosodic features [Cahn, 1989]. Affects seem to be differentiated mainly by pitch (while frequency is a physical property of a sound, pitch is a subjective one), loudness (the perceived intensity of a sound), pitch contour (the global envelope of the pitch), tempo (rate of speech) and pause [Cahn, 1989]. 2.3.2 Notational ....
.... from prosodic features [Cahn, 1989]. Affects seem to be differentiated mainly by pitch (while frequency is a physical property of a sound, pitch is a subjective one), loudness (the perceived intensity of a sound), pitch contour (the global envelope of the pitch), tempo (rate of speech) and pause [Cahn, 1989]. 2.3.2 Notational System The notation for intonation contours that we use is derived from J. Pierrehumbert [Pierrehumbert, 1980]. We follow [Pierrehumbert and Hirschberg, 1990], [Prevost and Steedman, to appear] in assuming that different intonational tunes are used to convey various ....
Cahn, J. (1989). Generating expression in synthesized speech. Master's thesis, Massachusetts Institute of Technology, Cambridge, Massachusetts.
....tract. Although typically used as a text to speech system, it was chosen over other systems because it gives the user low level control over the vocalizations through physiologically based parameter settings. These parameters make it possible to convey affective information through vocalizations (Cahn 1990), and to convey personality by designing a custom voice for the robot. As such, Kismet's voice is that of a young child. The system also has the ability to play back files in a .wav format, so the robot can produce infant like vocalizations (laughter, coos, gurgles, etc.) that the synthesizer ....
....to verbally reach for a particular toy by addressing the caregiver with what she interprets as a request for that toy, Kismet has learned to perform an act of meaning. 12.4.1 Emotive Vocalizations With respect to giving Kismet the ability to express emotive vocalizations, Janet Cahn's work (Cahn 1990) is a valuable resource. In this work, she identifies the acoustical correlates of emotion in speech. These acoustic parameters alter the pitch, timing, voice quality, and articulation of the speech signal. Some of these correlates are due to physiological changes in the body as a result of being ....
Cahn, J. (1990), Generating Expression in Synthesized Speech, Master's thesis, MIT Media Lab.
....its speech rate fast. Sadness, however, is characterized by a low pitch level, a narrow pitch range and very small pitch variations. Its intensity is soft, its mean low, its range narrow and its fluctuations small. Its speech rate is slow with the highest number of pauses of the longest duration [CAH89], [LAD85], [WIL81]. To define the syntactic structure of intonation, we are using Janet Pierrehumbert's notation [HIR86]. Under this definition, intonation consists of a linear sequence of accents. Utterances are decomposed into intonational and intermediate phrases. Both of them consist of ....
.... to hesitation) or to signal punctuation marks (such as a comma or exclamation marks) [DIT74]. The number of pauses affects the speech rate: a sad person has a slow speech rate due in part to a large number of long pauses, while a frightened person's speech shows very few pauses of short duration [CAH89]. Thus the occurrence of punctuators and their type (i.e. their corresponding facial expressions) are emotion dependent: a happy person has the tendency to punctuate his speech by smiling. Certain types of head movements occur during pauses. A boundary point (between intermediate phrases, for ....
J. Cahn, "Generating expression in synthesized speech", Masters Thesis, M.I.T., 1989.
....a great deal of attention whereas the third category has received relatively little. We are studying a component of the third category; we are interested in automatically classifying affect through speech analysis. Although there has been much research to identify acoustic correlates of affect [14, 17, 3, 4, 12, 8], the authors are not aware of any previous work which attempts automatic classification of affect by explicitly modeling these acoustic features. In this paper we report on initial experiments to determine useful acoustic features for automatic affect classification of speech. The task was to ....
Cahn, J.E. Generating expression in synthesized speech. Masters thesis, MIT Media Laboratory, May 1989.
....explores improvements to the affective component of synthesized speech. It is embodied in the Affect Editor program, which is intended to show that variations in affect can be generated in synthetic speech and to point the way towards improving the recognizability and naturalness of the affect [2]. Its success in generating recognizable affect was confirmed by an experiment in which the intended affect was perceived for the majority of presentations. Affect should be a concern in speech synthesis for theoretical and practical reasons. Its role in human speech is to provide the context in ....
Janet E. Cahn. Generating Expression in Synthesized Speech. Master's thesis, Massachusetts Institute of Technology, May 1989. Unpublished.
....in the work on emotional and expressive synthesized speech. Thus far, all the text to speech efforts have attempted to imitate the prosodic output rather than generate it from underlying causes. HAMLET [MAN88] simulates six emotions by rule, using unrestricted text. The Affect Editor [Cah88, Cah89] applies a parameterized model to syntactically analyzed text. Varying its acoustical and linguistic parameters produces many kinds and shadings of vocal affect. The emotion rules in HAMLET and the parameters in the Affect Editor describe the acoustical evidence of the effects of emotion on ....
....in their proper relation. Ultimately, a production model is the better path. Instead of imitating the acoustical features of emotional speech, it will predict, explain and produce them as the emergent consequence of the physiological and cognitive biases and capacities of an individual speaker [Cah89]. 4.2 Concept to speech synthesis While text to speech systems must perform a reverse engineering feat from text back to concept, concept to speech systems start with task knowledge and some form of a discourse model (Figure 4 2). Their main challenge is to map attentional and logical ....
Janet E. Cahn. Generating Expression in Synthesized Speech. Master's thesis, Massachusetts Institute of Technology, May 1989. Unpublished.
....program, which takes an abstract description of emotional speech and produces affect generation instructions for a speech synthesizer. Its success in generating recognizable affect was confirmed by an experiment in which the affect intended was perceived as such for the majority of presentations [Cahn (1989)]. Affect is desirable in synthesized speech for reasons of naturalness, efficiency and general utility. Hearers expect affect in speech. After all, it is part of human speech. It illuminates the intentions of the speaker and is part of the context in which an utterance is interpreted. Affective ....
Cahn, J. E. (1989). Generating expression in synthesized speech. Master's thesis, Massachusetts Institute of Technology.
Cahn, J. E., Generating Expression in Synthesized Speech, Master's Thesis, MIT, 1989. http://www.media.mit.edu/~cahn/masters-thesis.html
J. Cahn, "Generating expression in synthesized speech", Master's thesis, Massachusetts Institute of Technology, May 1989.
J. E. Cahn. Generating expression in synthesized speech. Master's thesis, Massachusetts Institute of Technology, 1989.
Cahn, J.E. 1990. Generating expression in synthesized speech. Technical Report. Boston: MIT Media Lab.
Cahn, J. E. (1990). Generating expression in synthesized speech. Technical report, MIT Media Lab., Boston.
Cahn, Janet E., Generating Expression in Synthesized Speech. Master's Thesis, Massachusetts Institute of Technology. May, 1989.