LANDMARK-BASED SPEECH RECOGNITION: REPORT OF THE 2004 Johns Hopkins Summer Workshop, 2005
Articulatory Tradeoffs Reduce Acoustic Variability during American English /r/ Production, 1999
Cited by 9 (7 self)
The American English phoneme /r/ has long been associated with large amounts of articulatory variability during production. This paper investigates the hypothesis that the articulatory variations used by a speaker to produce /r/ in different contexts exhibit systematic tradeoffs, or articulatory trading relations, that act to maintain a relatively stable acoustic signal despite the large variations in vocal tract shape. Acoustic and articulatory recordings were collected from seven speakers producing /r/ in five phonetic contexts. For every speaker, the different articulator configurations used to produce /r/ in the different phonetic contexts showed systematic tradeoffs, as evidenced by significant correlations between the positions of transducers mounted on the tongue. Analysis of acoustic and articulatory variabilities revealed that these tradeoffs act to reduce acoustic variability, thus allowing relatively large contextual variations in vocal tract shape for /r/ without seriously ...
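The trading-relation argument in this abstract can be illustrated with a toy calculation. The sketch below is not the paper's analysis: the two "transducer positions", the linear acoustic proxy, and all numbers are hypothetical stand-ins. It only shows why a negative correlation between two articulatory measures can lower the variance of an acoustic quantity that depends on both.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tokens = 200

# Hypothetical articulatory measures (arbitrary units), built so that
# position_b tends to compensate for position_a (a trading relation).
position_a = rng.normal(size=n_tokens)
position_b = -0.8 * position_a + 0.2 * rng.normal(size=n_tokens)

# Pearson correlation between the two measures.
r = np.corrcoef(position_a, position_b)[0, 1]

# Hypothetical acoustic proxy: assumed to depend on the sum of both measures.
acoustic_with_tradeoff = position_a + position_b

# Break the pairing by shuffling position_b: same marginal spread, no tradeoff.
shuffled_b = rng.permutation(position_b)
acoustic_without_tradeoff = position_a + shuffled_b

var_with = acoustic_with_tradeoff.var()
var_without = acoustic_without_tradeoff.var()
print(f"r = {r:.2f}, variance with tradeoff = {var_with:.3f}, "
      f"without = {var_without:.3f}")
```

Shuffling keeps each measure's own variability but removes the covariation, so the difference between the two variances isolates the effect of the tradeoff in this toy setting.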
The Need for Increased Speech Synthesis Research: Report of the 1998 NSF Workshop for Discussing Research Priorities and Evaluation Strategies in Speech Synthesis
Cited by 1 (0 self)
This report outlines what these areas are, and what kind of research is needed.
Priorities and Evaluation Strategies in Speech Synthesis Contents, 1999
Contributions from some of the participants in the NSF Speech Synthesis Workshop,
PARAFAC analysis of the three dimensional tongue shape - JASA, 2003
this paper is to demonstrate that PARAFAC successfully represents the three-dimensional shape of the tongue surface extracted from coronal magnetic resonance (MR) image stacks. Two types of measurement vectors are analyzed: a vector of 3D pseudo-fleshpoint coordinates extracted uniformly from the length and width of the tongue surface, and a vector of 2D pseudo-fleshpoint coordinates extracted from a curve along the tongue surface close to the midsagittal plane. The 2D pseudo-fleshpoint coordinates are structurally similar to the type of data analyzed by Nix et al. (1996). Measurement data are indexed by speaker identity, phonemic vowel identity, and two-dimensional measurement position. A variety of data pre-processing strategies were attempted; the method that yields the best results is similar but not identical to the preprocessing methods of Nix et al. (1996)
Principal Components Representation of the
- Phonetica, 2002
This paper uses principal components (PC) analysis to represent coronal tongue contours for the eleven vowels of English in two consonant contexts (/s/, /l/), based upon five replicated measurements in three sessions for each of six subjects. Curves from multiple sessions and speakers were overlaid before analysis onto a common (x, y) coordinate system by extensive preprocessing of the curves including: extension (padding) or truncation within session, translation, and truncation to a common x-range. Four PC's plus a mean level allow accurate representation of coronal tongue curves, but PC shapes depend strongly on the degree of padding or truncation. The PC's successfully reduced the dimensionality of the curves and reflected vowel height, consonant context, and physiological features.
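As a rough illustration of the "PCs plus a mean level" representation described in this abstract, the sketch below runs a principal components decomposition on synthetic curves sampled on a common x-grid. The data, the grid size, and the function names `pca_curves` and `reconstruct` are assumptions made for illustration; the paper's padding, truncation, and translation preprocessing is not reproduced.

```python
import numpy as np

def pca_curves(curves, n_components):
    """PCA of curves given as an (n_curves, n_points) array of y-values."""
    mean_curve = curves.mean(axis=0)
    centered = curves - mean_curve
    # SVD of the centered data: rows of vt are the principal component shapes.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    scores = centered @ components.T          # per-curve PC weights
    explained = s[:n_components] ** 2 / np.sum(s ** 2)
    return mean_curve, components, scores, explained

def reconstruct(mean_curve, components, scores):
    """Rebuild curves from the mean curve plus weighted PC shapes."""
    return mean_curve + scores @ components

# Synthetic stand-in data: 30 curves built from 4 underlying shapes.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
basis = np.stack([np.sin(np.pi * x), np.sin(2 * np.pi * x), x, x ** 2])
curves = 1.0 + rng.normal(size=(30, 4)) @ basis

mean_curve, components, scores, explained = pca_curves(curves, 4)
approx = reconstruct(mean_curve, components, scores)
max_error = np.max(np.abs(curves - approx))
print("explained variance ratios:", np.round(explained, 3))
print("max reconstruction error:", max_error)
```

Because the synthetic curves are generated from four shapes, four components reproduce them up to floating-point error; real contours would leave a residual.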
Nayak, “Accelerated three-dimensional upper airway MRI using compressed sensing” - Magn. Reson. Med., 2009
upper airway has provided insights into vocal tract shaping and data for its modeling. Small movements of articulators can lead to large changes in the produced sound; improving the resolution of these data sets, within the constraints of a sustained speech sound (6–12 s), is therefore an important area for investigation. The purpose of the study is to provide a first application of compressed sensing (CS) to high-resolution 3D upper airway MRI using spatial finite difference as the sparsifying transform, and to experimentally determine the benefit of applying constraints on image phase. Estimates of image phase are incorporated into the CS reconstruction to improve the sparsity of the finite difference of the solution. In a retrospective subsampling experiment with no sound production, 5× and 4× were the highest acceleration factors that produced acceptable image quality when using a phase constraint and when not using a phase constraint, respectively.
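Two ideas from this abstract, finite differences as a sparsifying transform and retrospective subsampling at a given acceleration factor, can be sketched on a toy 2D image. This is only an illustration under stated assumptions: the phantom, the random line-selection mask, and the helper names are hypothetical, the study used 3D data, and no CS reconstruction or phase constraint is implemented here (only a zero-filled inverse FFT).

```python
import numpy as np

def finite_difference_magnitude(img):
    """Sum of absolute forward differences along both axes (edges replicated)."""
    dx = np.diff(img, axis=0, append=img[-1:, :])
    dy = np.diff(img, axis=1, append=img[:, -1:])
    return np.abs(dx) + np.abs(dy)

def random_line_mask(shape, acceleration, rng):
    """Keep a random subset of k-space rows: 1 row in `acceleration` on average."""
    n_rows = shape[0]
    kept = rng.choice(n_rows, size=n_rows // acceleration, replace=False)
    mask = np.zeros(shape, dtype=bool)
    mask[kept, :] = True
    return mask

# Hypothetical piecewise-constant phantom (a bright rectangle on a dark field).
image = np.zeros((64, 64))
image[16:48, 20:40] = 1.0

# Sparsity: the finite difference has far fewer nonzeros than the image itself.
img_nonzero = np.count_nonzero(image)
grad_nonzero = np.count_nonzero(finite_difference_magnitude(image))

# Retrospective subsampling: discard rows of fully sampled k-space.
rng = np.random.default_rng(0)
kspace = np.fft.fft2(image)
mask = random_line_mask(kspace.shape, acceleration=4, rng=rng)
acceleration = mask.size / mask.sum()

# Zero-filled reconstruction (no CS solver): shows aliasing, not a final image.
zero_filled = np.fft.ifft2(kspace * mask)
print(img_nonzero, grad_nonzero, acceleration)
```

A CS reconstruction would replace the zero-filled step with an optimization that penalizes the finite-difference magnitude while staying consistent with the retained k-space samples.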
Approximants: Evidence from Two English Dialects
A surprising dissimilarity is found in the perception of approximant sounds by speakers of American English (AE) and Standard Southern British English (SSBE) dialects. Eighteen subjects (6 AE and 12 SSBE speakers) performed an identification task in which they judged whether stimuli were more like /r/ or /w/. The stimuli comprised five sounds copy-synthesised from a source /r/, where formant values (F1-F3) were manually adjusted as follows:
A: F1=355 F2=1201 F3=1682 (/r/-like formants)
B: F1=355 F2=963 F3=1682 (F2 at midpoint of /r/ and /w/; F3 /r/-like)
C: F1=355 F2=1201 F3=2541 (F2 /r/-like; F3 raised to /w/-like height)
D: F1=355 F2=725 F3=1682 (F2 lowered to /w/-like height; F3 /r/-like)
E: F1=355 F2=725 F3=2541 (/w/-like formants)
The only significant difference (t=2.031, p<.05) between the two dialect groups' performance occurred with Stimulus D, in which F3 was typical for /r/ and F2 was typical for /w/. AE speakers identified this stimulus as /r/ 90% of the time and SSBE speakers only 59% of the time. Such a disparity is unexpected given that alveolar approximant /r/ in both dialects is generally characterised
and Change: Preliminary Findings from an Ultrasound Study of Derhoticization in
Scottish English is often cited as a rhotic dialect of English. However, in the 1970s and 1980s, researchers noticed that postvocalic /r/ was in attrition in Glasgow (Macafee, 1983) and Edinburgh (Romaine, 1978; Johnston and Speitel, 1983). Recent research (Stuart-Smith, 2003) confirms that postvocalic /r/ as a canonical phonetically rhotic consonant is being lost in working-class Glaswegian speech. However, auditory and acoustic analysis revealed that the situation was more complicated than simple /r/ vs. zero variation. The derhoticized quality of /r/ seemed to vary socially; in particular, male working-class speakers often produced intermediate sounds that were difficult to identify. It is clear that although auditory and acoustic analyses are useful, they can only hint at what is going on in the vocal tract. A direct articulatory study is thus motivated. Instrumental phonetic studies that examine the vocal tract during the production of sustained rhotic consonants, and laboratory-based studies of American English /r/, have identified a complex relationship between articulation and acoustics, including articulatory differences with minimal acoustic consequences (starting with Delattre and Freeman, 1968). In other words, different gestural configurations can be used to generate a canonically rhotic consonant. A pilot study (Scobbie and

