The Rise/Fall/Connection Model of Intonation (1994)

by Paul Taylor

Results 1 - 10 of 22

Analysis and Synthesis of Intonation using the Tilt Model

by Paul Taylor - Journal of the Acoustical Society of America
Cited by 68 (3 self)
This paper introduces the tilt intonational model and describes how this model can be used to automatically analyse and synthesize intonation. In the model, intonation is represented as a linear sequence of events, which can be pitch accents or boundary tones. Each event is characterised by continuous parameters representing amplitude, duration and tilt (a measure of the shape of the event). The paper describes an event detector, in effect an intonational recognition system, which produces a transcription of an utterance's intonation. The features and parameters of the event detector are discussed and performance figures are shown on a variety of read and spontaneous speaker-independent conversational speech databases. Given the event locations, algorithms are described which produce an automatic analysis of each event in terms of the Tilt parameters. Synthesis algorithms are also presented which generate F0 contours from Tilt representations. The accuracy of these is shown by comparing...

Characterizing and Recognizing Spoken Corrections in Human-Computer Dialogue

by Gina-Anne Levow - In Proceedings of the 36th Annual Meeting of the Association of Computational Linguistics, COLING/ACL 98, 1998
Cited by 51 (5 self)
Miscommunication in speech recognition systems is unavoidable, but a detailed characterization of user corrections will enable speech systems to identify when a correction is taking place and to more accurately recognize the content of correction utterances. In this paper we investigate the adaptations of users when they encounter recognition errors in interactions with a voice-in/voice-out spoken language system. In analyzing more than 300 pairs of original and repeat correction utterances, matched on speaker and lexical content, we found overall increases in both utterance and pause duration from original to correction. Interestingly, corrections of misrecognition errors (CME) exhibited significantly heightened pitch variability, while corrections of rejection errors (CRE) showed only a small but significant decrease in pitch minimum. CMEs demonstrated much greater increases in measures of duration and pitch variability than CREs. These contrasts allow the development of decision t...

Intonation and Dialogue Context as Constraints for Speech Recognition

by Paul Taylor, Simon King, Stephen Isard, Helen Wright - Language and Speech, 1998
Cited by 29 (4 self)
This paper describes a way of using intonation and dialogue context to improve the performance of an automatic speech recognition (ASR) system. Our experiments were run on the DCIEM Maptask corpus, a corpus of spontaneous task-oriented dialogue speech. This corpus has been tagged according to a dialogue analysis scheme that assigns each utterance to one of 12 "move types", such as "acknowledge", "query-yes/no" or "instruct". Most ASR systems use a bigram language model to constrain the possible sequences of words that might be recognised. Here we use a separate bigram language model for each move type. We show that when the "correct" move-specific language model is used for each utterance in the test set, the word error rate of the recogniser drops. Of course
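
The idea of keeping one bigram language model per move type can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `train_move_bigrams` and `log_prob`, the `<s>`/`</s>` boundary tokens, the toy utterances and the add-one smoothing are all assumptions made for illustration, since the abstract does not say how the models were estimated.

```python
import math
from collections import Counter, defaultdict


def train_move_bigrams(utterances):
    """Collect bigram counts separately for each move type.

    utterances: iterable of (move_type, list_of_words) pairs (hypothetical format).
    Returns (counts, vocab), where counts maps (move_type, previous_word)
    to a Counter of following words.
    """
    counts = defaultdict(Counter)
    vocab = set()
    for move, words in utterances:
        tokens = ["<s>"] + list(words) + ["</s>"]
        vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            counts[(move, prev)][cur] += 1
    return counts, vocab


def log_prob(counts, vocab, move, words):
    """Log probability of a word sequence under one move type's bigram model.

    Uses add-one smoothing (an assumption, chosen only to keep the sketch short).
    """
    tokens = ["<s>"] + list(words) + ["</s>"]
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        following = counts.get((move, prev), Counter())
        numerator = following[cur] + 1
        denominator = sum(following.values()) + len(vocab)
        total += math.log(numerator / denominator)
    return total


# Toy training data, invented for this example.
toy_data = [
    ("acknowledge", ["okay"]),
    ("acknowledge", ["right", "okay"]),
    ("instruct", ["go", "left"]),
    ("instruct", ["go", "right"]),
]
toy_counts, toy_vocab = train_move_bigrams(toy_data)
ack_score = log_prob(toy_counts, toy_vocab, "acknowledge", ["okay"])
instruct_score = log_prob(toy_counts, toy_vocab, "instruct", ["okay"])
```

On this toy data the word sequence "okay" scores higher under the "acknowledge" model than under the "instruct" model, which is the kind of constraint a move-specific model would contribute during recognition.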

Synthesizing Conversational Intonation from a Linguistically Rich Input

by Paul Taylor, Alan W. Black - In Proc. ESCA Workshop on Speech Synthesis, 1994
Cited by 21 (11 self)
This paper describes a general system which maps from a phonological specification of an utterance's intonation to an F0 contour. The system can accommodate a variety of feature-based phonological description schemes. Speaker-dependent characteristics can be modelled and an automatic method of determining these is described.

1 Introduction

This paper describes the phonetic part of the intonation component of the Chatr speech synthesizer developed at ATR. Chatr is a concept-to-speech system which is used as the final stage in a spoken language translation system [2]. The input to Chatr is not text-based, but rather a complex linguistic description of an utterance that is produced from the language generation component of the translation system, and so reliable syntactic, semantic and pragmatic information is available for each utterance to be synthesized. The utterances are part of a conversation, so it is not acceptable to produce "neutral declarative" intonation: a much wider range ...

The tilt intonation model

by Paul Taylor - Proc. ICSLP 98, 1998
Cited by 14 (2 self)
The tilt intonation model facilitates automatic analysis and synthesis of intonation. The analysis algorithm detects intonational events in F0 contours and parameterises them in terms of the continuously varying Tilt parameters. We describe the analysis system and give results for speaker-independent spontaneous dialogue speech. We then describe a synthesis algorithm which can generate F0 contours given a tilt parameterisation of an utterance. We give results showing how well the automatically produced contours match natural ones. The paper concludes with a discussion of the linguistic relevance of the tilt parameters and shows that this is both a useful and natural way of representing intonation.

Generating F0 Contours For Speech Synthesis Using The Tilt Intonation Theory

by Kurt Dusterhoff, Alan W. Black - In Proceedings of ESCA Workshop on Intonation, 1997
Cited by 11 (3 self)
This paper presents a method for generating F0 contours for a speech synthesis system using the Tilt intonation theory ([10], [9]). The Tilt theory offers an abstract description of natural F0 contours which may be derived automatically from natural speech. Given a speech database labelled with Tilt events, this paper shows how that data may be used to train a model which can adequately predict Tilt parameters from features available in a text to speech system and hence produce natural sounding F0 contours. After a short description of the Tilt theory, the database used and the necessary features used to generate the parameters are presented. For comparison, this work is contrasted with a previous similar experiment on the same database using the ToBI intonation labelling system [2]. The Tilt method not only produces better results (RMSE 32.5 and correlation 0.60) but as it offers automatic labelling of data, it promises the ability to more easily train from general speech databases...
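
The two figures quoted in this abstract, RMSE and correlation, are standard ways of comparing a generated F0 contour with a natural one. The sketch below shows how such scores can be computed in Python. It assumes two equal-length lists of F0 samples taken at the same time points; the function names `rmse` and `pearson` are hypothetical, and the abstract does not state the units used or how unvoiced frames were handled, so neither is modelled here.

```python
import math


def rmse(predicted, reference):
    """Root mean squared error between two equal-length F0 sequences."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - r) ** 2 for p, r in zip(predicted, reference)]
    return math.sqrt(sum(squared) / len(squared))


def pearson(predicted, reference):
    """Pearson correlation coefficient between two equal-length F0 sequences."""
    if len(predicted) != len(reference) or len(predicted) < 2:
        raise ValueError("sequences must have equal length of at least 2")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_r = sum(reference) / n
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(predicted, reference))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    if var_p == 0 or var_r == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_r)
```

A lower RMSE means the generated contour is closer to the natural one in absolute terms, while a higher correlation means the two contours rise and fall together even if they differ by an offset or scale.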

A Multilingual Prosodic Database

by Estelle Campione, Jean Véronis - Proc. of ICSLP'98, 1998
Cited by 11 (1 self)
We present a prosodic corpus in five languages (French, English, Italian, German and Spanish) comprising 4 hours and 20 minutes of speech and involving 50 different speakers (5 male and 5 female per language). The recordings on which the corpus is based are extracted from the EUROM 1 speech database and consist of passages of about five sentences. The corpus was stylized automatically by an algorithm which factors out microprosodic effects and represents the intonation contour of utterances by a series of target points. Once interpolated by a smooth curve (spline), these points produce a contour indistinguishable from the original when re-synthesized, apart from a few detection errors. A symbolic coding of the 50000 pitch movements of the corpus is also provided, along with the time-alignment of orthographic transcription to signal at word level. The entire corpus was verified and manually corrected by experts for each language. It will be made available at production cost for research...

A Stochastic Model Of Intonation For Text-To-Speech Synthesis

by Jean Véronis, Philippe Di Cristo, Fabienne Courtois, Cédric Chaumette - Proceedings Eurospeech '97 (Rhodes, 1998
Cited by 5 (2 self)
This paper presents a stochastic model of intonation contours for use in text-to-speech synthesis. The model has two modules, a linguistic module that generates abstract prosodic labels from text, and a phonetic module that generates an F0 curve from the abstract prosodic labels. This model differs from previous work in the abstract prosodic labels used, which can be automatically derived from the training corpus. This feature makes it possible to use large corpora or several corpora of different speech styles, in addition to making it easy to adapt to new languages. The present paper focuses on the linguistic module, which does not require full syntactic analysis of the text but simply relies on part-of-speech tagging. The results were validated on French by means of a perception test. Listeners did not perceive a signif...
Footnote: This paper is based on a communication presented at Eurospeech'97 (Véronis et al. 1997) and has been recommended by the Editorial Board of Speech Communication.

A Statistical Study of Pitch Target Points in Five Languages

by Estelle Campione, Jean Véronis - In Proceedings of ICSLP’98, 1998
Cited by 5 (4 self)
We present the results of a large-scale statistical study of pitch target points in five languages, on a corpus comprising 4 hours 20 minutes of speech and involving 50 different speakers. The entire corpus has been stylized automatically by a technique reducing the F0 contour to a series of target points representing the significant pitch changes. It was then entirely verified by experts using a resynthesis method, in order to ensure that there was no audible difference with the original. The set of ca. 50000 pitch target points thus obtained was then analyzed from a statistical point of view. In this paper we describe the main results of this study, in terms of frequency distribution of target points, pitch movements and relation of pitch movements to time interval. Our study reveals interesting differences across languages and sex.

1. INTRODUCTION

Large sets of prosodic data on many languages would be a useful resource for theoretical studies, as well as for practical application...

Using Neural Networks To Locate Pitch Accents

by Paul Taylor - In Proc. of 4th European Conference on Speech Communication and Technology. Madrid, 1995
Cited by 5 (1 self)
This paper describes a technique for finding intonational events (pitch accents and boundary tones) from waveforms. The technique works in a bottom-up manner by using a recurrent neural network to perform a classification of each frame in the input waveform. An autosegmental description, consisting of intonational events, syllables and the links between them, is then produced from this frame-based classification. The technique correctly identifies 85.7% of pitch accents and boundary tones.

1. INTRODUCTION

In order to use prosodic information in speech recognition, it is necessary to have algorithms which can automatically extract prosodic information from speech waveforms. This paper describes a technique for automatically extracting a representation of an utterance's intonation from its waveform. The rise/fall/connection (RFC) labelling system [9] achieves good results on a low-level intonation labelling task. However, this system falls short of producing the type of output that is ...