Parallel Encoding of Focus and Interrogative Meaning in Mandarin Intonation
Cited by 20 (12 self)
Despite much research, disagreements abound regarding the detailed characteristics of question intonation in different languages or even in the same language. The present study investigates question intonation in Mandarin by also considering the role of focus, which is frequently ignored in previous research. In Experiment 1, native speakers of Mandarin produced statements, yes/no questions, particle questions, wh-questions, rhetorical questions and confirmation questions with narrow focus on the initial, medial or final word of the sentence, or on none of the words. Detailed F0 contour analyses showed that focus generated the same pitch range modification in questions as in statements, i.e., expanding the pitch range of the focused word, suppressing (compressing and lowering) that of the post-focus words, but leaving that of the pre-focus words largely unaffected. When the effects of focus (as well as other functions also potentially present) were controlled by subtracting statement F0 contours from those of corresponding yes/no questions, the resulting difference curves resembled exponential or even double-exponential functions. Further F0 analyses also revealed an interaction between focus and interrogative meaning in the form of a boost to the pitch raising by question starting from
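The subtraction step described in this abstract can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes both contours are already time-normalized to the same number of samples, it assumes the subtraction is done in semitones (the abstract does not state the scale), and it fits only a single exponential with a simple log-linear least-squares fit. All function names and the 100 Hz reference are hypothetical.

```python
import numpy as np

def hz_to_semitones(f0_hz, ref_hz=100.0):
    """Convert F0 in Hz to semitones relative to an assumed reference."""
    return 12.0 * np.log2(np.asarray(f0_hz, dtype=float) / ref_hz)

def difference_curve(question_f0_hz, statement_f0_hz):
    """Question-minus-statement F0 difference, in semitones, sample by sample."""
    q = hz_to_semitones(question_f0_hz)
    s = hz_to_semitones(statement_f0_hz)
    if q.shape != s.shape:
        raise ValueError("contours must be time-normalized to the same length")
    return q - s

def fit_exponential(t, diff):
    """Fit diff ~ a * exp(b * t) by linear least squares on log(diff).

    Only valid when every difference value is strictly positive.
    """
    t = np.asarray(t, dtype=float)
    diff = np.asarray(diff, dtype=float)
    if np.any(diff <= 0):
        raise ValueError("log-linear fit needs strictly positive differences")
    b, log_a = np.polyfit(t, np.log(diff), 1)  # slope first, then intercept
    return float(np.exp(log_a)), float(b)
```

A positive fitted `b` would indicate that the question-related raising grows toward the end of the utterance. A double-exponential shape would need a nonlinear fit, which this sketch does not attempt.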
Closely related languages, different ways of realizing focus
- In Proceedings of Interspeech 2009
, 2009
Cited by 11 (4 self)
We investigated how focus was prosodically realized in Taiwanese, Taiwan Mandarin and Beijing Mandarin by monolingual and bilingual speakers. Acoustic analyses showed that all speakers raised pitch and intensity of focused words, but only Beijing Mandarin speakers lowered pitch and intensity of post-focus words. Cross-group differences in duration were mixed. When listening to stimuli from their own language groups, subjects from Beijing had a focus recognition rate of over 80%, while those from Taiwan had a rate of less than 70%. This difference is mainly due to the presence or absence of post-focus compression. These findings have implications for prosodic typology, language contact and bilingualism.
Index Terms: focus, language contact, bilingualism
Extraction of pragmatic and semantic salience from spontaneous spoken English
, 2005
Cited by 6 (0 self)
This paper computationalizes two linguistic concepts, contrast and focus, for the extraction of pragmatic and semantic salience from spontaneous speech. Contrast and focus have been widely investigated in modern linguistics, as categories that link intonation and information/discourse structure. This paper demonstrates the automatic tagging of contrast and focus for the purpose of robust spontaneous speech understanding in a tutorial dialogue system. In particular, we propose two new transcription tasks, and demonstrate automatic replication of human labels in both tasks. First, we define the focus kernel to represent those words that contain novel information neither presupposed by the interlocutor nor contained in the preceding words of the utterance. We propose detecting the focus kernel based on a word dissimilarity measure, part-of-speech tagging, and prosodic measurements including duration, pitch, energy, and our proposed spectral balance cepstral coefficients. In order to measure the word dissimilarity, we test a linear combination of ontological and statistical dissimilarity measures previously published in the computational linguistics literature. Second, we propose identifying symmetric contrast, which consists of a set of words that are parallel or symmetric in linguistic structure but distinct or contrastive in meaning. The symmetric contrast identification is performed in a way similar to the focus kernel detection. The effectiveness of the proposed extraction of symmetric contrast and focus kernel has been tested on a Wizard-of-Oz corpus collected in the tutoring dialogue scenario. The corpus consists of 630 non-single word/phrase utterances, containing approximately 5700 words and 48 minutes of speech. The tests used
Prosodic Encoding of Topic and Focus in Mandarin
- In Proceedings of Speech Prosody 2006
, 2006
Cited by 4 (2 self)
In this study, we investigate whether and how focus and topic can be separately encoded in Mandarin. A total of 60 sentences with three lengths and five tone combinations were recorded in four topic-focus conditions: initial focus, new topic, implicit topic and given topic, by six speakers. The results of acoustic analysis show that new topic is encoded with a raised pitch range on the initial word. Focus, in contrast, is encoded with an expanded pitch range on the focused word and a suppressed pitch range on the subsequent words.
Principles of tone research
- International Symposium on Tonal Aspects of Language, La
Cited by 4 (1 self)
Current advances in tone research are rather uneven. The major obstacles to faster progress in the area are not in the lack of technological means, but in the mindset of our discipline. This paper discusses ways to improve tone research by considering a set of basic principles both in research methodology and in theoretical thinking.
Speech prosody as articulated communicative functions
- In Proceedings of Speech Prosody 2006
Cited by 4 (1 self)
Speech prosody, just like the segmental aspect of speech, conveys communicative meanings by encoding functional contrasts. The contrasts are realized through articulation, a biomechanical process with specific constraints. Prosodic phonology or any other theory of prosody therefore cannot be autonomous from either communicative functions or biophysical mechanisms. Successful modeling of speech prosody can be achieved only if communicative functions and biophysical mechanisms are treated as the core rather than the margins of prosody.
Detecting changes in key and range for the automatic modelling and coding of intonation
- In Speech Prosody 2008
Cited by 2 (1 self)
The analysis of authentic speech, unlike that of laboratory speech, needs to take into account the fact that the fundamental frequency patterns corresponding to the intonation of utterances can be of two types: local pitch characteristics determined by the surface phonological representation of the intonation, and longer-term characteristics corresponding to less well understood changes in pitch key and range. In this paper, a number of acoustic correlates of changes in pitch key and range are examined and compared to subjective annotations, and a preliminary attempt is made to estimate these changes automatically.
Analysis and Automatic Recognition of Tones in Mandarin Chinese
, 2007
Cited by 2 (0 self)
In tonal languages, words are not simply defined by their phonemic sequence, but
also by the intonational pattern with which they are spoken. In Mandarin Chinese, each
word is a sequence of syllables, and each syllable is a sequence of phonemes plus an
intonational component called a tone. Syllables can have one of five tones: high,
rising, low, falling, and neutral. The first four tones have distinct ideal shapes, while
the neutral tone is more of a 'none of the above' tone and is notoriously difficult to recognize.
We first tackle the question of how important it is to recognize tones in Mandarin
Chinese. We propose an information-theoretic measure to compare the relative importance of phonological contrasts in any language, and use it to show that tones are
at least as important as vowels in conveying information in Mandarin.
With the importance of the problem settled, we move on to a large and thorough
investigation of possible acoustic features to recognize tones. We carry out hundreds
of experiments, each involving the classification of over a hundred thousand syllables. This is
at least an order of magnitude larger than similar previous experiments.
Traditionally, features for Mandarin tone recognition have been based on the pitch,
duration, and overall intensity of a syllable, and we do indeed find a set of features
based on these that achieve an overall syllable classification rate of 58.9% when we add
the effect of local acoustic context; this is a useful baseline.
We investigate a fourth source of features: voice quality. We first determine, using a small experiment with twenty possible voice quality measures, that features
based on band energy consistently work better for tone recognition than those based
on more complicated methods like harmonic-amplitude differences and glottal flow
experiments. We then investigate band energy features using several large experiments to find a set of features that improves classification accuracy to 63.7%. As
we had hoped, most of the improvement is for neutral and low tones; for example,
the F score for the neutral tone increases from 0.345 without band energy to 0.619 with
it. This opens up a host of new features for future speech researchers in industry and
academia to investigate and use.
We investigate making additional use of context: if we know the tones of the surrounding syllables, we can increase classification accuracy to 67.2%. (This provides a
useful upper bound for our experiments, and further underlines the significance of our
improvements in accuracy.) While we do not have such ideal contextual information,
we can use estimates of it to increase accuracy to 65.0%.
Finally, we investigate the hypothesis that syllables that are better articulated are
easier to recognize. We verify this to be true on a small corpus of lab speech from
Xu (1999), where syllables in focussed words are recognized with over 99% accuracy,
and are able to use this to improve classification accuracy of all syllables. However,
in news broadcast speech, we find that while stronger syllables are recognized better,
the difference is not enough to suggest an algorithm that makes use of the difference.
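The abstract reports a per-tone F score alongside overall accuracy but does not define it. As a minimal sketch, assuming the usual F1 measure (the harmonic mean of precision and recall), both quantities could be computed from tone labels as follows. The function names are hypothetical and any labels passed in are the caller's own data, not results from the thesis.

```python
from collections import Counter

def per_tone_f1(true_tones, predicted_tones):
    """Return {tone: F1} assuming F1 = 2*TP / (2*TP + FP + FN) for each tone."""
    if len(true_tones) != len(predicted_tones):
        raise ValueError("label sequences must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(true_tones, predicted_tones):
        if truth == guess:
            tp[truth] += 1
        else:
            fp[guess] += 1   # guessed tone was wrongly predicted
            fn[truth] += 1   # true tone was missed
    scores = {}
    for tone in set(true_tones) | set(predicted_tones):
        denom = 2 * tp[tone] + fp[tone] + fn[tone]
        scores[tone] = 2 * tp[tone] / denom if denom else 0.0
    return scores

def accuracy(true_tones, predicted_tones):
    """Fraction of syllables whose predicted tone matches the true tone."""
    if not true_tones or len(true_tones) != len(predicted_tones):
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(t == p for t, p in zip(true_tones, predicted_tones))
    return correct / len(true_tones)
```

Reporting F1 per tone rather than accuracy alone makes a weak class such as the neutral tone visible even when it is rare in the data.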
Xu. The PENTA Model of Speech Melody: Transmitting Multiple Communicative Functions in Parallel
Existing models of intonation typically define intonational components primarily in form and only secondarily in function. They also typically try to link observed F0 contours directly to intonational meanings. The PENTA model of speech melody presented in this paper deviates from this tradition. First, it makes a clear separation between the meaning-bearing components of intonation, which are functionally defined, and the primitives of speech melody, which are defined purely in form (i.e., devoid of meaning) and readily implementable in articulation. Second, it specifies mechanisms for concurrent transmission of multiple intonational functions. Third, it specifies a continuous link between articulatory mechanisms of F0 contour generation and the functional components of speech melody.
INTRODUCTION
An important goal in studying intonation is to identify its components and understand how they function in speech. Much of the research toward this goal is done by observing various aspects of the acoustic signals, including the fundamental frequency (F0), amplitude, duration, voice quality, and spectral characteristics. Of these, by far the most researched is F0, which is the most direct correlate of speech melody. To identify tonal and intonational components from F0, much effort has been devoted to figuring out how observed F0 curves should be divided into individual intonational components. Various proposals have been made, as seen in a variety of intonation models. Despite extensive differences among them, however, most of the existing models of intonation share two critical assumptions. First, intonational components are defined primarily in form and only secondarily in function. Second, the form-defined intonational components are directly linked to meaning. These assumptions are behind proposed intonational components such as nucleus, head and tail in the British model