Automatic speech recognition and speech variability: A review
, 2007
Cited by 32 (7 self)
Major progress is being recorded regularly on both the technology and exploitation of automatic speech recognition (ASR) and spoken language systems. However, there are still technological barriers to flexible solutions and user satisfaction under some circumstances. This is related to several factors, such as the sensitivity to the environment (background noise), or the weak representation of grammatical and semantic knowledge. Current research is also emphasizing deficiencies in dealing with variation naturally present in speech. For instance, the lack of robustness to foreign accents precludes the use by specific populations. Also, some applications, like directory assistance, particularly stress the core recognition technology due to the very high active vocabulary (application perplexity). There are actually many factors affecting the speech realization: regional, sociolinguistic, or related to the environment or the speaker herself. These create a wide range of variations that may not be modeled correctly (speaker, gender, speaking rate, vocal effort, regional accent, speaking style, non-stationarity, etc.), especially when resources for system training are scarce. This paper outlines current advances related to these topics.
The SRI EduSpeak System: Recognition and pronunciation scoring for language learning
- PROC. OF INTEGRATING SPEECH TECHNOLOGY IN LANGUAGE LEARNING
Cited by 20 (0 self)
The EduSpeak system is a software development toolkit that enables developers of interactive language education software to use state-of-the-art speech recognition and pronunciation scoring technology. We first report results on the application of adaptation techniques to recognize both native and nonnative speech in a speaker-independent manner. We discuss our pronunciation scoring paradigm and show experimental results in the form of correlations between the pronunciation quality estimators included in the toolkit and grades given by human listeners. We review phone-level pronunciation estimation schemes and describe the phone-level mispronunciation detection functionality that we have incorporated in the toolkit. Finally, we mention some of the EduSpeak toolkit system features that facilitate the creation and deployment of computer-assisted language learning (CALL) applications.
Feedback in Computer Assisted Pronunciation Training: When Technology Meets Pedagogy
- in Proceedings of CALL Conference “CALL professionals and the future of CALL research”
, 2002
Cited by 19 (0 self)
This paper is organized around two main endeavours. On the one hand, we examine currently available Computer Assisted Pronunciation Training (CAPT) systems with a view to establishing whether they meet pedagogically sound requirements. In this respect, we show that many commercial systems tend to prefer technological novelties to the detriment of pedagogical criteria that could benefit the learner more. On the other hand, we more narrowly focus on the crucial issue of computer-generated feedback, which still represents a big challenge for state-of-the-art CAPT technology and discuss its impact on learning. In the final part of the paper, we present the PROO project (Programma voor Onderwijsonderzoek), which is aimed at establishing the effects of erroneous feedback on the acquisition of L2 pronunciation.
Prosodic Features for Automatic Text-Independent Evaluation of Nativeness for Language Learners
, 2000
Cited by 15 (1 self)
Predicting the degree of nativeness of a student utterance is an important issue in computer-aided language learning. This task has been addressed by many studies focusing on the segmental assessment of the speech signal. To achieve improved correlations between human and automatic nativeness scores, other aspects of speech should also be considered, such as prosody. The goal of this study is to evaluate the use of prosodic information to help predict the degree of nativeness of pronunciation, independent of the text. A supervised strategy based on human grades is used in an attempt to select promising features for this task. Preliminary results show improvements in the correlation between human and automatic scores.
The pedagogy-technology interface in Computer Assisted Pronunciation Training
- Computer Assisted Language Learning
, 2002
Cited by 15 (0 self)
In this paper, we examine the relationship between pedagogy and technology in Computer Assisted Pronunciation Training (CAPT) courseware. First, we will analyse available literature on second language pronunciation teaching and learning in order to derive some general guidelines for effective training. Second, we will present an appraisal of various CAPT systems with a view to establishing whether they meet pedagogical requirements. In this respect, we will show that many commercial systems tend to prefer technological novelties to the detriment of pedagogical criteria that could benefit the learner more. While examining the limitations of today's technology, we will consider possible ways to deal with these shortcomings. Finally, we will combine the information thus gathered to suggest some recommendations for future CAPT.
Effect of speech recognition-based pronunciation feedback on second language pronunciation ability
- University of Abertay
, 2000
Cited by 14 (1 self)
This study's goal was to determine whether receiving a particular type of feedback on nativeness of second-language accent positively influenced pronunciation over time. Forty-five native speakers of American English of beginning to intermediate Spanish ability were randomly assigned to three groups. The first group was asked to practise Spanish using speech recognition-based software that provided scores of nativeness of pronunciation. The second group practised with software that was identical but with no feedback indicating pronunciation scores. The third group did not practise with the software. The subjects' speech was recorded at the beginning of the study and again after three weeks, and scores based on log posterior probabilities were calculated. The speech recognizer outputs these scores, which have been shown to correlate well with human listeners' nonnativeness judgements.
Automatic Speech Recognition for second language learning: How and why it actually works
- In Proceedings of the International Congress of Phonetic Sciences
, 2003
Cited by 10 (0 self)
In this paper, we examine various studies and reviews on the usability of Automatic Speech Recognition (ASR) technology as a tool to train pronunciation in the second language (L2). We show that part of the criticism that has been addressed to this technology is not warranted, being rather the result of limited familiarity with ASR technology and with broader Computer Assisted Language Learning (CALL) courseware design matters. In our analysis we also consider actual problems of state-of-the-art ASR technology, with a view to indicating how ASR can be employed to develop courseware that is both pedagogically sound and reliable.
Computing and Evaluating Syntactic Complexity Features for Automated Scoring of Spontaneous Non-Native Speech
Cited by 9 (3 self)
This paper focuses on identifying, extracting and evaluating features related to syntactic complexity of spontaneous spoken responses as part of an effort to expand the current feature set of an automated speech scoring system in order to cover additional aspects considered important in the construct of communicative competence. Our goal is to find effective features, selected from a large set of features proposed previously and some new features designed in analogous ways from a syntactic complexity perspective that correlate well with human ratings of the same spoken responses, and to build automatic scoring models based on the most promising features by using machine learning methods. On human transcriptions with manually annotated clause and sentence boundaries, our best scoring model achieves an overall Pearson correlation with human rater scores of r=0.49 on an unseen test set, whereas correlations of models using sentence or clause boundaries from automated classifiers are around r=0.2.
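Several of the abstracts in this listing report agreement between automatic and human scores as a Pearson correlation. As a rough illustration of what that figure measures, the sketch below computes Pearson's r for two score lists. The function name `pearson_r` and the sample scores are hypothetical and are not taken from any of the listed papers.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up illustration data: human ratings vs. automatic scores for six responses.
human_scores = [1, 2, 2, 3, 4, 4]
model_scores = [1.4, 1.9, 2.8, 2.5, 3.1, 3.9]
r = pearson_r(human_scores, model_scores)
print(f"r = {r:.2f}")
```

A value near 1 means the automatic scores rank and space the responses much as the human raters do; a value near 0 means little linear agreement.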
Speech is Like a Box of Chocolates...
- In: Proceedings of the 15th International Congress of Phonetic Sciences
, 2003
Cited by 9 (2 self)
Pronunciation variability is present in both native and foreign words. Since pronunciation variability constitutes a problem for automatic speech recognition (ASR) systems, modeling pronunciation variation for ASR has been the topic of various studies. In most studies, modeling pronunciation variation was attempted within the standard framework used in mainstream ASR systems. Given that some assumptions made within this framework are not in line with the properties of speech signals and the findings in human speech recognition, and that the improvements obtained by modeling pronunciation variation within this framework have generally been small, it might be better to look for a new paradigm in which pronunciation variation can be modeled more accurately. In this paper a novel paradigm for ASR is presented, which has many potential advantages for modeling pronunciation variation.
Boosting of prosodic and pronunciation features to detect mispronunciations of non-native children
- in Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on
Cited by 8 (6 self)