The Rise/Fall/Connection Model of Intonation
, 1994
Cited by 30 (6 self)
This paper describes a new model of intonation for English. The paper proposes that intonation can be described using a sequence of rise, fall and connection elements. Pitch accents and boundary rises are described using rise and fall elements, and connection elements are used to describe everything else. Equations can be used to synthesize fundamental frequency (F0) contours from these elements. An automatic labelling system is described which can derive a rise/fall/connection description from any utterance without using prior knowledge or top-down processing. Synthesis and analysis experiments are described using utterances from six speakers of various English accents. An analysis/resynthesis experiment is described which shows that the contours produced by the model are similar to within 3.6 to 7.3 Hz of the originals. An assessment of the automatic labeller shows 72% to 92% agreement between automatic and hand labels. The paper concludes with a comparison between this model and o...
The Computational Processing of Intonational Prominence: A Functional Prosody Perspective
, 1997
Cited by 16 (2 self)
Intonational prominence, or accent, is a fundamental prosodic feature that is said to contribute to discourse meaning. This thesis outlines a new, computational theory of the discourse interpretation of prominence, from a FUNCTIONAL PROSODY perspective. Functional prosody makes the following two important assumptions: first, there is an aspect of prominence interpretation that centrally concerns discourse processes, namely the discourse focusing nature of prominence; and second, the role of prominence in language processing in general, and discourse processing in particular, is not essentially separate from the processing of other grammatical, nonprosodic information. This thesis develops a computational theory of prominence interpretation by explaining how prominence serves as an inference cue in discourse processing. Prominence signals changes in the attentional status of entities in a discourse model, while nonprominence signals that the realized entities are already in discourse fo...
A Phonetic Model of English Intonation
, 1992
Cited by 14 (6 self)
This thesis proposes a phonetic model of English intonation which is a system for linking the phonological and F0 descriptions of an utterance. It is argued that such a model should take the form of a rigorously defined formal system which does not require any human intuition or expertise to operate. It is also argued that this model should be capable of both analysis (F0 to phonology) and synthesis (phonology to F0). Existing phonetic models are reviewed and it is shown that none meet the specification for the type of formal model required. A new phonetic model is presented that has three levels of description: the F0 level, the intermediate level and the phonological level. The intermediate level uses the three basic elements of rise, fall and connection to model F0 contours. A mathematical equation is specified for each of these elements so that a continuous F0 contour can be created from a sequence of elements. The phonological system uses H and L to describe high and low pi...
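The abstract above says that each rise, fall and connection element has its own equation, so that a sequence of elements yields a continuous F0 contour, but it does not give those equations. The sketch below only illustrates the idea. The S-shaped curve built from two quadratic halves, the straight-line connection, and the names `element_contour` and `synthesize` are all assumptions made for this example, not the thesis's definitions.

```python
def element_contour(kind, start_hz, amplitude_hz, duration_s, step_s=0.01):
    """Sample one element as (time, F0) pairs, with time relative to the element start.

    amplitude_hz is positive for a rise and negative for a fall.
    """
    n = max(1, round(duration_s / step_s))
    points = []
    for i in range(n + 1):
        x = i / n  # normalised time in [0, 1]
        if kind == "connection":
            shape = x  # assumed: straight line between the two end points
        else:
            # Assumed S-shape: two quadratic halves that meet at the midpoint.
            shape = 2 * x * x if x < 0.5 else 1 - 2 * (1 - x) ** 2
        points.append((x * duration_s, start_hz + amplitude_hz * shape))
    return points


def synthesize(elements, start_hz):
    """Chain elements so that each one starts where the previous one ended.

    The shared boundary sample appears twice, once per neighbouring element.
    """
    contour, t0, f0 = [], 0.0, start_hz
    for kind, amplitude_hz, duration_s in elements:
        segment = element_contour(kind, f0, amplitude_hz, duration_s)
        contour.extend((t0 + t, hz) for t, hz in segment)
        t0 += duration_s
        f0 += amplitude_hz
    return contour


# Hypothetical utterance: a pitch accent (rise then fall) followed by a slow decline.
contour = synthesize(
    [("rise", 40.0, 0.2), ("fall", -60.0, 0.3), ("connection", -10.0, 0.5)],
    start_hz=120.0,
)
print(contour[0], contour[-1])
```

Because every element begins at the F0 value where its predecessor ended, the contour is continuous by construction, which is the property the abstract highlights.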
Prosodic Word Boundary Detection Using Statistical Modeling Of Moraic Fundamental Frequency Contours And Its Use For Continuous Speech Recognition
- in Proc. ICASSP
, 1999
Cited by 8 (2 self)
A new method for prosodic word boundary detection in continuous speech was developed based on the statistical modeling of moraic transitions of fundamental frequency (F0) contours, previously proposed by the authors. In the developed method, F0 contours of prosodic words were modeled separately according to their accent types. An input utterance was matched against the models and divided into its constituent prosodic words, which yields the prosodic word boundaries. The method was first applied in boundary detection experiments on the ATR continuous speech corpus. With the mora boundary locations given in the corpus, the total detection rate reached 91.5%. The method was then integrated into a continuous speech recognition scheme with unlimited vocabulary. An improvement of a few percentage points was observed in mora recognition for the above corpus. Although all the experiments were done in closed conditions due to corpus availability, the results indicated the usefulness of the proposed method. ...
A Study On Pitch Pattern Generation Using HMM-Based Statistical Information
- in Proc. ICSLP-94
, 1994
Cited by 2 (0 self)
This paper describes a novel pitch pattern generation method for speech synthesis using Hidden Markov Models (HMMs). In the proposed method, the F0 contours of minor phrases are modeled by HMMs (pitch-HMMs). The pitch-HMMs are trained using F0 and ΔF0, considering phonetic environments (e.g. accent type, mora count, mora position, phonemic category, etc.). To evaluate the pitch-HMMs, accent identification experiments are performed. The results indicate that the pitch-HMMs can capture the movement in F0 contours appropriately. In the F0 contour generation experiments, the proposed method yields an averaged root mean square error of 132 cents (equivalent to 9.2 Hz at 120 Hz) between the original and the generated F0 contours. Furthermore, an application of the proposed method to a text-to-speech system is also discussed. 1. INTRODUCTION A good generation model of fundamental frequency (F0) contours is essential for speech systems. Recently, several methods of F0 contour modeling based on stati...
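The abstract quotes its error both in cents and in Hz at a 120 Hz reference. A minimal sketch of that conversion follows, using the standard definition that 1200 cents equal one octave (a frequency ratio of 2). The function name `cents_to_hz_offsets` is hypothetical, and the abstract does not say which direction or approximation its authors used to arrive at 9.2 Hz.

```python
def cents_to_hz_offsets(cents, reference_hz):
    """Return the Hz distance reached by moving `cents` up and down from reference_hz."""
    ratio = 2.0 ** (cents / 1200.0)  # 1200 cents = one octave = ratio 2
    upward = reference_hz * (ratio - 1.0)
    downward = reference_hz * (1.0 - 1.0 / ratio)
    return upward, downward


up, down = cents_to_hz_offsets(132, 120.0)
print(f"132 cents at 120 Hz: +{up:.2f} Hz upward, -{down:.2f} Hz downward")
```

Since cents are a logarithmic unit, the upward and downward offsets differ: about 9.5 Hz up and 8.8 Hz down here. The quoted 9.2 Hz lies between the two, which is consistent with an averaged or linearised conversion.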
Automatic generation of prosody: comparing two superpositional systems
- In Proceedings of Speech Prosody 2004
, 2004
Cited by 2 (0 self)
We face many options when designing a system that automatically generates prosody from linguistic and paralinguistic information. The literature provides several candidate phonetic models, phonological models and mapping tools to actually implement the system. We detail here some dimensions along which these models have to be compared. We also show that systems employing quite similar phonetic models can still have radically different approaches. We present results of a first evaluation comparing two systems using a superpositional model of melody on a common multilingual prosodic database of spoken math formulae. We conclude that prosodic models and intonation theories could certainly benefit from well-defined tasks and fair benchmarks.
Issues in Thai text-to-speech synthesis: the NECTEC approach
- Proc. of NECTEC Annual Conference, Bangkok
, 2000
Cited by 1 (0 self)
ABSTRACT: This paper presents the essential issues in developing text-to-speech synthesis for Thai: text analysis, prosody generation and speech synthesis. In text analysis, the problems of Thai text processing can be decomposed into models of sentence extraction, phrase boundary determination and grapheme-to-phoneme conversion. Prosody generation includes rules for syllable duration and F0 contour generation, which realize the synthetic speech at the suprasegmental level. In speech synthesis, the definition and construction of the acoustic inventory structure ‘demisyllable’ are presented. Furthermore, three signal-processing algorithms (amplitude normalization, segment boundary smoothing and prosodic modification) are also presented. KEY WORDS: Thai text-to-speech synthesis, text analysis, prosody generation, speech synthesis, demisyllable
Phonetic Representations for Intonation
Cited by 1 (0 self)
Introduction Besides its intrinsic ability to help the listener in segmenting utterances into linguistically-relevant parts of speech, intonation is an essential means for signaling "... of how we feel about what we say, or how we feel when we say." [Bol89, p.1]. How the cognitive representation of the utterance is encoded in the speech signal is still an open question. Two main approaches may contribute to our understanding of how intonation contributes to both the capture of the overall meaning of the message and the speaker's position vis-à-vis its own discourse: (a) a bottom-up approach aims to extract salient prosodic events with no linguistic a-priori. Such tentative approaches, linking these events to phonological constructs, face the problem of automatic extraction [Mer93, HNE91], the perceptual relevance of these events [tCC90, HR94] and the coherence of labelling [SBP+92, HdCng], (b) a top-down a
Survey of Data-Driven Approaches to Speech Synthesis
, 1998
Cited by 1 (0 self)
In this paper, we examine the use of data-driven and machine-learning approaches for speech synthesis. We begin with an overview of a complete speech synthesis system which can be divided into two main subsystems: a natural language processing subsystem and a speech signal processing subsystem. We then describe, in turn, the major components within the two subsystems and examine some of the different approaches that have been explored. In the main portion of the paper, we focus on the speech signal processing subsystem. Here, we describe, compare, and analyze the use of different data-driven and machine-learning approaches as exemplified by appropriate components of the following three speech synthesis systems: Bell Labs' synthesizer, Microsoft's Whistler, and ATR's Chatr. The Bell Labs system can be characterized as more of a traditional diphone-based approach to concatenative synthesis while the Whistler and Chatr systems are representative of the more recent trend towards increasing use of data-driven and machine-learning methods. We finally conclude with some discussion and observations. Many references are used throughout the paper, but the primary references used for the three speech synthesis systems are the following: Bell Labs

