The Rise/Fall/Connection Model of Intonation, 1994
Cited by 30 (6 self)
This paper describes a new model of intonation for English. The paper proposes that intonation can be described using a sequence of rise, fall and connection elements. Pitch accents and boundary rises are described using rise and fall elements, and connection elements are used to describe everything else. Equations can be used to synthesize fundamental frequency (F0) contours from these elements. An automatic labelling system is described which can derive a rise/fall/connection description from any utterance without using prior knowledge or top-down processing. Synthesis and analysis experiments are described using utterances from six speakers of various English accents. An analysis/resynthesis experiment is described which shows that the contours produced by the model are similar to within 3.6 to 7.3 Hz of the originals. An assessment of the automatic labeller shows 72% to 92% agreement between automatic and hand labels. The paper concludes with a comparison between this model and o...
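As a rough illustration of how a contour might be built from a sequence of rise, fall and connection elements, here is a minimal sketch. The abstract does not give the model's equations, so the quadratic S-curve used for rises and falls, the straight line used for connections, the 10 ms frame step, and the names `rfc_element` and `synthesize` are all assumptions made for this example, not the paper's definitions.

```python
FRAME = 0.01  # assumed frame step in seconds (illustrative only)

def rfc_element(kind, amplitude, start_f0, n):
    """Return n + 1 F0 samples (Hz) for one element.

    amplitude is positive for a rise and negative for a fall.
    """
    samples = []
    for i in range(n + 1):
        x = i / n  # normalised time in [0, 1]
        if kind == "connection":
            shape = x  # assumed: straight-line interpolation
        else:
            # assumed: smooth quadratic S-curve running from 0 to 1
            shape = 2 * x * x if x < 0.5 else 1 - 2 * (1 - x) ** 2
        samples.append(start_f0 + amplitude * shape)
    return samples

def synthesize(elements, start_f0):
    """Chain (kind, amplitude_hz, duration_s) elements into one contour."""
    contour = []
    f0 = start_f0
    for kind, amplitude, duration in elements:
        n = max(1, round(duration / FRAME))
        segment = rfc_element(kind, amplitude, f0, n)
        # Skip the first sample of later segments so joins are not duplicated.
        contour.extend(segment if not contour else segment[1:])
        f0 = segment[-1]
    return contour

contour = synthesize(
    [("rise", 40.0, 0.2), ("fall", -60.0, 0.3), ("connection", -10.0, 0.5)],
    start_f0=120.0,
)
```

Each element starts where the previous one ended, so the resulting contour is continuous by construction.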
Effects of Tone and Focus on the Formation and Alignment of F0 Contours, 1999
Cited by 29 (8 self)
The present study examines how lexical tone and focus contribute to the formation and alignment of f0 contours in speech. This was done through an investigation of f0 contour formation in short Mandarin sentences. These sentences all consisted of five syllables with varying tones on the middle three syllables. The sentences were produced by eight Mandarin speakers with four different focus patterns: focus on the first, second, or last word, or with no narrow focus. The f0 patterns of these sentences were examined through point-by-point f0 tracing, graphical comparison of averaged f0 contours, f0-contour-syllable alignment analysis, and analysis of maximum, minimum f0, and slope of f0 contours. The results indicate that (a) while the lexical tone of a syllable is the most important determining factor for the local f0 contour of the syllable, focus extensively modulates the global shape of the f0 curve, which in turn affects the height and even the shape of local contours; (b) the tones of adjacent syllables also extensively influence both the shape and height of the f0 contour of a syllable, with the preceding tone exerting more influence than the following tone; (c) despite extensive variations in shape and height, the f0 contour of a tone remains closely aligned with the associated syllable; and (d) both focus and tonal interaction may generate substantial f0 decline over the course of an utterance. These findings seem to be able to reduce the unpredictability in the formation and alignment of f0 contours in speech.
Pitch targets and their realization: Evidence from Mandarin Chinese, 2001
Cited by 22 (8 self)
In this paper we propose a preliminary framework for accounting for certain surface F0 variations in speech. The framework consists of definitions for pitch targets and rules of their implementation. Pitch targets are defined as the smallest operable units associated with linguistically functional pitch units, and they are comparable to segmental phones. The implementation rules are based on possible articulatory constraints on the production of surface F0 contours. Due to these constraints, the implementation of a simple pitch target may result in surface F0 forms that only partially reflect the underlying pitch targets. We will also discuss possible implications of this framework on our understanding of various observed F0 patterns, including carryover and anticipatory variations, downstep, declination, and F0 peak alignment. Finally, we will consider possible interactions between local and non-local pitch targets.
1.0 Introduction To understand the acoustic manifestation of s...
A Phonetic Model of English Intonation, 1992
Cited by 14 (6 self)
This thesis proposes a phonetic model of English intonation which is a system for linking the phonological and F0 descriptions of an utterance. It is argued that such a model should take the form of a rigorously defined formal system which does not require any human intuition or expertise to operate. It is also argued that this model should be capable of both analysis (F0 to phonology) and synthesis (phonology to F0). Existing phonetic models are reviewed and it is shown that none meet the specification for the type of formal model required. A new phonetic model is presented that has three levels of description: the F0 level, the intermediate level and the phonological level. The intermediate level uses the three basic elements of rise, fall and connection to model F0 contours. A mathematical equation is specified for each of these elements so that a continuous F0 contour can be created from a sequence of elements. The phonological system uses H and L to describe high and low pi...
Generating F0 Contours For Speech Synthesis Using The Tilt Intonation Theory, 1997
In Proceedings of ESCA Workshop on Intonation
Cited by 11 (3 self)
This paper presents a method for generating F0 contours for a speech synthesis system using the Tilt intonation theory ([10], [9]). The Tilt theory offers an abstract description of natural F0 contours which may be derived automatically from natural speech. Given a speech database labelled with Tilt events, this paper shows how that data may be used to train a model which can adequately predict Tilt parameters from features available in a text-to-speech system and hence produce natural-sounding F0 contours. After a short description of the Tilt theory, the database used and the necessary features used to generate the parameters are presented. For comparison, this work is contrasted with a previous similar experiment on the same database using the ToBI intonation labelling system [2]. The Tilt method not only produces better results (RMSE 32.5 and correlation 0.60) but as it offers automatic labelling of data, it promises the ability to more easily train from general speech databases...
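The RMSE and correlation figures quoted above are standard measures for comparing a generated F0 contour against a reference. A minimal sketch of how such scores could be computed follows; it assumes both contours are sampled at the same frame times and have equal length, and the function names and sample values are hypothetical, not taken from the paper.

```python
import math

def rmse(predicted, reference):
    """Root-mean-square error between two equal-length F0 contours (Hz)."""
    if len(predicted) != len(reference):
        raise ValueError("contours must have the same number of frames")
    total = sum((p - r) ** 2 for p, r in zip(predicted, reference))
    return math.sqrt(total / len(reference))

def correlation(predicted, reference):
    """Pearson correlation between two equal-length F0 contours."""
    if len(predicted) != len(reference):
        raise ValueError("contours must have the same number of frames")
    n = len(reference)
    mean_p = sum(predicted) / n
    mean_r = sum(reference) / n
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(predicted, reference))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    return cov / math.sqrt(var_p * var_r)

# Made-up values: the prediction has the right shape but sits 3 Hz too high.
reference = [100.0, 110.0, 120.0, 130.0]
predicted = [103.0, 113.0, 123.0, 133.0]
error_hz = rmse(predicted, reference)
corr = correlation(predicted, reference)
```

The two measures are complementary: a constant offset raises the RMSE but leaves the correlation untouched, which is why both tend to be reported together.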
Variation Adds to Prosodic Typology, 2002
Cited by 5 (0 self)
Variation has not been a major concern of prosodic typologists. Frequently, it is treated as noise in the data and held to conceal what is really important about the prosodic structure of the language. Consequently, most investigations are restricted to a single standard variety and cross-speaker variation is ignored or masked by statistical processing. The results are often assumed to be representative of the language as a whole. Recent research challenges this approach. Acoustic correlates of rhythm class, for instance, show that dialects of one language can differ as much in their rhythmic structures as two different languages. One dialect can be classified as 'stress-timed' and the other as 'syllable-timed'. Furthermore, considerable cross-speaker variation occurs within dialects. In this paper, I review a selection of data on prosodic variation across dialects and speakers. Then I present data on intonational variation. Examination of cross-speaker and cross-dialect variation in these data leads to new results on dialect-specific characteristics of intonation as well as to cross-dialect and cross-language generalisations.
Towards a Comprehensive Investigation of Factors Relevant to Peak Alignment Using a Unit Selection Corpus
Proc. Interspeech Pittsburgh
Cited by 3 (3 self)
This paper aims to demonstrate the use of a unit selection corpus, the IMS German Festival synthesis system [1], in carrying out a comprehensive investigation of factors influencing specific aspects of the phonetic realization of tonal categories. The study restricts itself to the alignment of peaks in H*L pitch accents in German. First results confirm not only well-known effects of syllable structure, e.g., peaks occurring relatively early when there is a sonorant onset or relatively late when there is a sonorant in the coda, but also attest to the special status of the nuclear pitch accent vs. accents occurring earlier in the intonation phrase. Furthermore, instances of H*L in syllables directly at the phrase boundaries (initial or final) are shown to behave significantly differently from those that are located farther away. A similar effect is observed when another pitch accent follows the H*L peak in the very next syllable as opposed to a distance of two or more syllables. In these cases it also matters whether a low or high target is following (the peaks occur relatively later when followed by an L target). The results should have the benefit of both describing the specific characteristics of the voice providing the corpus (allowing a more detailed phonetic realization of tonal categories during the synthesis process) and offering general insights into which factors are relevant to the alignment of H*L peaks in German. Index Terms: intonation synthesis, peak alignment, German
F0 Peak Delay: When, Where, and Why It Occurs
Cited by 1 (0 self)
Peak delay refers to the phenomenon that an F0 peak sometimes occurs after the syllable it is associated with. This study investigates the relationship between tone, speaking rate and peak delay in Mandarin. Sentences containing H, R, or weakened H (h) were recorded at normal, fast and slow speaking rates. At normal rate, peak delay occurred regularly in both R and h but only occasionally in H; at slow rate, peak delay continued to occur regularly in R but only occasionally in h and rarely in H; at fast rate, peak delay occurred not only regularly in R and h, but also frequently in H. Peak-alignment analyses revealed that peak delay tended to occur whenever F0 rose sharply near the syllable offset. The results were interpreted as indicating that peak delay is due to an articulatory constraint that limits how fast the larynx can reverse the direction of pitch movement.
1. INTRODUCTION If a tone, pitch accent, stress or focal prominence is carried by a syllable, by the simplest assum...
The Acquisition of Prosody in Speech Production: English and French, 1998
1. Introduction
2. The corpus
  2.1 Method
  2.2 Materials
    2.2.1 'Elephants Never Forget'
    2.2.2 'Furnish the House'
    2.2.3 'The Treasure Chest of Tales'
  2.3 Subjects
  2.4 Recording and digitisation procedures
  2.5 Prosodic transcription
    2.5.1 Overview of the labelled files
3. Additional data collected for the investigation of French intonation
4. Findings
  4.1 The phonetic realisation of rhythmic patterns in acquisition
    4.1.1 Method
    4.1.2 Results
  4.2 The phonetic realisation of fundamental frequency falls in acquisition
    4.2.1 Method
    4.2.2 Results
  4.3 The phonetic realisation of rises and falls in French adult speakers
    4.3.1 Method
    4.3.2 Results
    4.3.3 Summary
5. Evaluation
  5.1 Recruitment of subjects
  5.2 Digital Signal Processing
  5.3 Cross-linguistic differences in children's responses to the games
6. Future research and research grant applications
7. Non-technical Summary
8. References
Appendix - Experimental Materials
Generating F0 Contours For Speech Synthesis From Prosodic And Syllabic Content, 1997
This paper describes a method for generating F0 contours from utterances labelled using the Tilt intonation theory [8], [9]. The method uses classification and regression trees (CART) to predict the five Tilt parameters: starting F0, amplitude, duration, tilt, and peak position. The goal of the experiment is to predict the parameters such that natural intonation contours may be generated from them. Contours generated by this method from a test subset of an American English database have a correlation of 0.60 and a 32.5 Hz RMS error when compared with smoothed versions of the original F0 contour. These results are comparable to other F0 generation methods which use ToBI intonation labels (0.62 and 34.8 Hz, 33 Hz).
1. INTRODUCTION One of the tasks involved in speech synthesis is the generation of prosody. As synthetic speech quality advances, adjustments in pitch and duration represent steps toward more natural speech. The experiment discussed here presents one method of generating na...
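The abstract names the Tilt parameters without defining them. As a hedged sketch, the amplitude, duration and tilt values can be derived from the rise and fall portions of an intonation event roughly as follows. The formulas follow the commonly cited formulation of the Tilt model and are an assumption here, since this listing does not spell them out; the function name and the example values are hypothetical.

```python
def tilt_parameters(rise_amp, fall_amp, rise_dur, fall_dur):
    """Summarise one intonation event by its amplitude, duration and tilt.

    rise_amp / fall_amp: F0 excursion of the rise and fall parts (Hz).
    rise_dur / fall_dur: duration of the rise and fall parts (seconds).
    Assumed definition: tilt is the mean of an amplitude ratio and a
    duration ratio, each ranging from -1 (pure fall) to +1 (pure rise).
    """
    amplitude = abs(rise_amp) + abs(fall_amp)
    duration = rise_dur + fall_dur
    tilt_amp = (abs(rise_amp) - abs(fall_amp)) / amplitude if amplitude else 0.0
    tilt_dur = (rise_dur - fall_dur) / duration if duration else 0.0
    return {
        "amplitude": amplitude,
        "duration": duration,
        "tilt": (tilt_amp + tilt_dur) / 2,
    }

pure_rise = tilt_parameters(30.0, 0.0, 0.2, 0.0)
pure_fall = tilt_parameters(0.0, -30.0, 0.0, 0.2)
rise_fall = tilt_parameters(20.0, -20.0, 0.1, 0.1)
```

Under this definition a symmetric rise-fall accent gets a tilt of 0, while values near +1 or -1 indicate events that are mostly rise or mostly fall. Starting F0 and peak position, the other two parameters listed above, are measured directly from the contour and are not computed here.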

