ToBI Or Not ToBI?
, 2002
Abstract
Cited by 5 (0 self)
In the decade that has passed since the introduction of the ToBI system for the transcription of prosody, speech technology has moved out of the laboratory and into commercial applications. For several reasons, virtually none of the commercial products have made large-scale use of prosody. Nevertheless, researchers in both recognition and synthesis continue to agree that better utilization of prosody is essential to improving the performance and acceptability of commercial systems. In this paper, we review the current state of prosody in commercial systems, and examine how the ongoing discussions related to what and how to transcribe with respect to prosody have simultaneously advanced and inhibited the field. In particular, we argue that, in hindsight, the ToBI system contains several flaws that have limited its acceptance and application.
Automatic generation of prosody: comparing two superpositional systems
- In Proceedings of Speech Prosody 2004
, 2004
Abstract
Cited by 2 (0 self)
We face many options when designing a system that automatically generates prosody from linguistic and paralinguistic information. The literature provides several candidate phonetic models, phonological models and mapping tools to actually implement the system. We detail here some dimensions along which these models have to be compared. We also show that systems employing quite similar phonetic models can still have radically different approaches. We present results of a first evaluation comparing two systems using a superpositional model of melody on a common multilingual prosodic database of spoken math formulae. We conclude that prosodic models and intonation theories could certainly benefit from well-defined tasks and fair benchmarks.
Evidence for attractors in English intonation.
Abstract
Cited by 2 (2 self)
B. Braun, JASA. Although the pitch of the human voice is continuously variable, some linguists contend that intonation in speech is restricted to a small, limited set of patterns. We test this claim by asking subjects to mimic a block of 100 randomly generated intonation contours and then to imitate themselves in several successive sessions. The produced f0 contours gradually converge towards a limited set of distinct, previously recognized basic English intonation patterns. These patterns are ‘attractors’ in the space of possible English intonation contours. The convergence does not occur immediately. Seven of the ten participants show continued convergence toward their attractors after the first iteration. Subjects retain and use information beyond phonological contrasts, suggesting that intonational phonology is not a complete description of their mental representation of intonation.
Capturing data and realistic 3D models for cued speech analysis and audiovisual synthesis.
- In Auditory-Visual Speech Processing Workshop
, 2005
Abstract
Cited by 1 (0 self)
We have implemented a complete text-to-speech synthesis system by concatenation that addresses French Manual Cued Speech (FMCS). It uses two separate dictionaries, one for multimodal diphones with audio and facial articulation, and the other with the gestures between two consecutive FMCS keys (“dikeys”). Dictionaries were built from real data. This paper presents our methodology and the final results, illustrated by the accompanying videos. We recorded and analyzed the 3D trajectories of 50 hand and 63 facial fleshpoints during the production of 238 utterances carefully designed to cover all possible diphones of French. Linear and non-linear statistical models of hand and face deformations and postures were developed using both separate and joint corpora. Additional data allowed us to capture the shape of the hand and face with a higher spatial density (2,600 points for the hand and forearm and 2,000 for the face), as well as their appearance. We succeeded in building new high-density articulated models that were compatible with the previous emerging set of control parameters. This allows the output synthesis parameters to drive the more realistic 3D models instead of the low-density ones.

