LANDMARK-BASED SPEECH RECOGNITION: REPORT OF THE 2004 Johns Hopkins Summer Workshop, 2005
Levels of representation in the electrophysiology of speech perception - Cognitive Science, 2001
Cited by 5 (1 self)
Mapping from acoustic signals to lexical representations is a complex process mediated by a number of different levels of representation. This paper reviews properties of the phonetic and phonological levels, and hypotheses about how category structure is represented at each of these levels, and evaluates these in light of relevant electrophysiological studies of phonetics and phonology. The paper examines evidence for two alternative views of how infant phonetic representations develop into adult representations, a structure-changing view and a structure-adding view, and suggests that each may be better suited to different kinds of phonetic categories. Electrophysiological results are beginning to provide information about phonological representations, but less is known about how the more abstract representations at this level could be coded in the brain.
Can automatic speech recognition learn more from human speech perception? - Trends in Speech Technology, 2005
Cited by 2 (0 self)
Although a great deal of progress has been made during the last two decades in automatic speech recognition (ASR), the performance of these ASR systems, as measured by word recognition and concept understanding error rates, is still much worse than that achieved by humans, even for carefully read and articulated speech in quiet conditions. This performance gap (between machines and humans) increases even more in noisy conditions and for conversational speech. Steadily increasing computational speed and computer memory tend to impose fewer and fewer constraints on the types and the amount of recognition processing that can be brought to bear on a particular recognition task. In spite of the increased computation and memory, the state-of-the-art technology in automatic speech recognition appears to have reached a plateau in the past few years. New techniques and principles need to be invented or applied in order to substantially reduce the current performance gap in speech recognition between humans and machines. This paper presents some ideas intended to stimulate further research on applying knowledge and principles derived from studies of human speech perception to automatic speech recognition. Although the mechanisms of human speech perception (HSP) are not fully understood, some findings from neuroscience, physiology, cognitive science and psychology could potentially lead to new understanding and thereby stimulate the development of new techniques and architectures for automatic speech recognition that, eventually, will bridge and reduce the performance gap between machines and humans.
P. Perrier. Control and representations in speech production
Cited by 1 (0 self)
In this paper the issue of the nature of the representations of the speech production task in the speaker's brain is addressed in a production-perception interaction framework. Since speech is produced to be perceived, it is hypothesized that its production is associated for the speaker with the generation of specific physical characteristics that are for the listeners the objects of speech perception. Hence, in the first part of the paper, four reference theories of speech perception are presented, in order to guide and to constrain the search for possible correlates of the speech production task in the physical space: the Acoustic Invariance Theory, the Adaptive Variability Theory, the Motor Theory and the Direct-Realist Theory. Possible interpretations of these theories in terms of representations of the speech production task are proposed and analyzed. In the second part, a few selected experimental studies are presented, which shed some light on this issue. In the conclusion, on the basis of the joint analysis of theoretical and experimental aspects presented in the paper, it is proposed that representations of the speech production task are multimodal, and that a hierarchy exists among the different modalities, the acoustic modality having the highest level of priority. It is also suggested that these representations are not associated with invariant characteristics, but with regions of the acoustic, orosensory and motor control spaces.
The Hungarian palatal stop: Phonological considerations and phonetic data
This study examines the trajectories of dorsal tongue movements during symmetrical /VCa/-sequences, where /V/ was one of the Hungarian long or short vowels /i, a, u/ and C was either the voiceless palatal or velar stop consonant. The general aims of this study were to deliver a data-driven account of (a) the evidence for the division between dorsality and coronality and (b) the potential role coarticulatory factors could play in the relative frequency of velar palatalization processes in genetically unrelated languages. Results suggest a clear-cut demarcation between the behaviour of purely dorsal velars and the coronal palatals. Moreover, factors arising from a general movement economy might contribute to the palatalization processes mentioned.
Cognitive Components of Speech at Different Time Scales
Cognitive component analysis (COCA) is defined as unsupervised grouping of data leading to a group structure well-aligned with that resulting from human cognitive activity. We focus here on speech at different time scales, looking for possible hidden ‘cognitive structure’. Statistical regularities have earlier been revealed at multiple time scales corresponding to phoneme, gender, height and speaker identity. We here show that the same simple unsupervised learning algorithm can detect these cues. Our basic features are 25-dimensional short-time Mel-frequency weighted cepstral coefficients, assumed to model the basic representation of the human auditory system. The basic features are aggregated in time to obtain features at longer time scales. Simple energy-based filtering is used to achieve a sparse representation. Our hypothesis is basically ecological: we hypothesize that features that are essentially independent in a reasonable ensemble can be efficiently coded using a sparse independent component representation. The representations are indeed shown to be very similar between supervised learning (invoking cognitive activity) and unsupervised learning (statistical regularities), hence lending additional support to our cognitive component hypothesis.
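The two preprocessing steps named in this abstract (aggregation in time, then energy-based filtering) can be illustrated with a minimal sketch. The abstract does not say how frames are aggregated or how the energy filter is defined, so everything here is an assumption: non-overlapping window averaging stands in for the aggregation, energy is taken as the sum of squared coefficients, and the helper names `aggregate` and `keep_high_energy` are hypothetical. The independent component step is not shown.

```python
def aggregate(frames, window):
    """Average consecutive feature frames in non-overlapping windows (assumed scheme)."""
    out = []
    for start in range(0, len(frames) - window + 1, window):
        block = frames[start:start + window]
        dim = len(block[0])
        out.append([sum(f[d] for f in block) / window for d in range(dim)])
    return out


def keep_high_energy(frames, fraction):
    """Keep roughly the given fraction of frames with the largest energy.

    Energy is the sum of squared coefficients (an assumption). Frames tied
    with the cutoff energy are all kept, so slightly more may be returned.
    """
    energies = [sum(x * x for x in f) for f in frames]
    keep = max(1, int(round(fraction * len(frames))))
    cutoff = sorted(energies, reverse=True)[keep - 1]
    return [f for f, e in zip(frames, energies) if e >= cutoff]


# Toy 2-dimensional frames for illustration; the abstract uses 25 dimensions.
frames = [[1.0, 1.0], [3.0, 3.0], [0.0, 0.0], [0.0, 2.0], [5.0, 5.0], [5.0, 7.0]]
long_scale = aggregate(frames, window=2)              # features at a longer time scale
sparse = keep_high_energy(long_scale, fraction=0.34)  # retain the most energetic frames
```

Dropping low-energy frames is only one way to obtain a sparse representation; zeroing small coefficients within each frame would be another reading of "energy-based filtering".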
THE INDETERMINACY/ATTESTATION MODEL OF METATHESIS
This paper addresses three key observations relating to crosslinguistic patterns of metathesis. First, the order of sounds resulting from metathesis can differ from language to language such that a similar combination of sounds can be realized in one order in one language, but in the reverse order in another language. Second, for some sound combinations, only one order is commonly attested as the result of metathesis, while for other combinations, either order can be observed. Third, the acoustic/auditory cues to the identification of the sequence resulting from metathesis are often better than those of the expected, yet nonoccurring, order. These patterns receive a straightforward explanation when we consider the phonetic nature of the sounds involved as well as the speaker/hearer’s knowledge of native sound patterns and their frequency of occurrence. Neither factor alone is sufficient to provide a predictive account of metathesis. This study shows, however, that by taking into account both factors, we are able to understand why certain sound combinations tend to undergo metathesis, why others are common results of metathesis, why patterns of metathesis differ across languages, and, importantly, why metathesis occurs in the first place.
Using Decision Trees To Construct Optimal Acoustic Cues
This paper presents an approach to the optimization of acoustic cues used for stop identification in the context of an acoustic-phonetic decoding system which uses automatic acoustic event extractors (a formant tracking algorithm and a burst analyzer). The acoustic cues have been designed on the basis of acoustic studies on stops and spectrogram reading experiments. This ensures that these cues have a certain amount of discriminating power, but we know neither the optimal thresholds nor which combination of cues is the most efficient.
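As a rough illustration of how a decision tree can pick a threshold for a single acoustic cue, the sketch below searches for the split that minimizes weighted Gini impurity, which is the basic step a tree repeats at each node. The abstract does not state the paper's splitting criterion, so Gini impurity is an assumption, the function names are hypothetical, and the cue values and labels are invented for illustration only.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty or pure list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted_impurity) for the best single split on one cue.

    Returns (None, impurity_of_all_labels) if no split improves on the unsplit data.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # equal cue values cannot be separated by a threshold
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best[1]:
            best = ((lo + hi) / 2.0, impurity)  # midpoint between neighbours
    return best


# Invented burst-frequency values (Hz) and stop labels, not data from the paper.
cue_hz = [1200.0, 1500.0, 1700.0, 3200.0, 3600.0, 4100.0]
stops = ["p", "p", "p", "t", "t", "t"]
threshold, impurity = best_threshold(cue_hz, stops)
```

Applying the same search to every candidate cue and recursing on each side of the chosen split would give a full tree, which is how a combination of cues could be ranked.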
Place of Articulation Cues for Voiced and Voiceless
In this paper, the acoustic correlates of the labial and alveolar place of articulation for both plosive and fricative consonants are investigated, and the results are analyzed in terms of vowel context, voicing and manner of articulation. Several measurements, including formant and noise measurements, are reported for CVs spoken by two male and two female talkers. It was found that the spectral amplitude of frication noise relative to F1 at vowel onset results in 84% or better correct classification for the fricatives in three vowel contexts. For plosives, a measure which quantifies the amplitude of noise at high frequencies relative to F1 at vowel onset (Av-Ahi [8]) resulted in 81% or better correct classification in the three vowel contexts. Formant frequency cues, on the other hand, were not reliable measures for all vowel contexts.
IN NOISE BY COCHLEAR IMPLANT USERS
This dissertation was produced in accordance with guidelines which permit the inclusion as part of the dissertation the text of an original paper or papers submitted for publication. The dissertation must still conform to all other requirements explained in the “Guide for the Preparation of Master’s Theses and Doctoral Dissertations at The University of Texas at Dallas.” It must include a comprehensive abstract, a full introduction and literature review, and a final overall conclusion. Additional material (procedural and design data as well as descriptions of equipment) must be provided in sufficient detail to allow a clear and precise judgment to be made of the importance and originality of the research reported. It is acceptable for this dissertation to include as chapters authentic copies of papers already published, provided these meet type size, margin and legibility requirements. In such cases, connecting texts which provide logical bridges between different manuscripts are mandatory. Where the student is not the sole author of a manuscript, the student is required to make an explicit statement in the introductory material to that manuscript describing the student’s contribution to the work and

