Crosslingual adaptation of semi-continuous HMMs using acoustic regression classes and sub-simplex projection
- COST278 and ISCA Tutorial and Research Workshop (ITRW) On Applied Spoken Language Interaction in Distributed Environments
Cited by 4 (1 self)
With the demand for automatic speech recognition (ASR) systems in many markets, the question of porting an ASR system to a new language is of practical interest. Transferring already existing hidden Markov models (HMMs) from a source to the target language is seen as a key step in this task. Typically, such a crosslingual model adaptation task consists of a three-step procedure. It starts with polyphone decision tree specialisation (PDTS), which specialises the phonetic-acoustic decision tree of the source models to the target language. In a second step, initial target language models are predicted from the adjusted decision tree. Finally, the predicted acoustic models are adapted to the target language using a limited amount of target data. In this work we focus on the final model adaptation step for a system architecture employing semi-continuous HMMs (SCHMMs). In contrast to continuous density HMMs (CDHMMs), adaptation techniques for SCHMMs are not as well developed. In particular, no powerful transformation-based adaptation method is available for adjusting the information-bearing mixture weights of the common prototype densities. To overcome this problem we introduce a novel adaptation scheme for SCHMMs. The method relies on the projection of retrained model parameters onto a solution sub-simplex which is obtained through acoustic regression classes derived from the decision tree of the source models. The performance of the procedure is demonstrated by the transfer of multilingual Spanish-English-German models to Slovenian and to French. In the full paper, reference results for a standard maximum likelihood linear regression (MLLR) approach are also given.
GEOVAQA: A VOICE ACTIVATED GEOGRAPHICAL QUESTION ANSWERING SYSTEM
, 2006
Cited by 2 (0 self)
In this paper we present GeoVAQA, a Restricted Domain Spoken Question Answering system covering Spanish geography. The system consists of a web-based application that accepts spoken questions about Spanish geography and returns a concise textual answer. In our system, spoken questions are recognised by an automatic speech recognition (ASR) system. We have used RAMSES, a Spanish recogniser based on semi-continuous HMMs, to perform this task. The transcribed question is the input to GeoTALP-QA, a multilingual Geographical Domain QA system. The retrieval mechanism of the QA system uses Natural Language Processing tools, a Geographical Knowledge Base and the search engine Google to get snippets. A spoken question is recorded by the web client and transmitted over the Internet to the demo server, which transcribes the speech and retrieves a concise answer and a set of relevant snippets.
Phone Transition Acoustic Modeling: Application to Speaker-Independent and Spontaneous Speech Systems
- ICSLP
, 2000
Cited by 2 (1 self)
HMM-based large vocabulary speech recognition systems usually have a very large number of statistical parameters. For better estimation, the number of parameters is reduced by sharing them across models. The parameter sharing is decided by regression trees which are built using phonetic classes designed either by a human expert or by data-driven methods. In situations where neither of these is reliable, it may be useful to have techniques for non-decision-tree-based state tying which perform comparably to those based on traditional methods. In this paper we propose two methods for non-decision-tree-based parameter learning in HMM-based systems. In the first method (context-dependent state tying), we restructure acoustic models to explicitly capture the transitions between phones in continuous speech. In the second method (transition-based subword units), we redefine the basic sound units used to model speech so that they model transitions between sounds explicitly. Experiments show that context-dependent state tying is a viable option for large vocabulary systems. They also show that using transition-based subword units can improve performance on spontaneous speech.
A Second Opinion Approach For Speech Recognition Verification
Cited by 2 (1 self)
In order to improve the reliability of speech recognition results, a verifying system that takes advantage of the information given by an alternative recognition step is proposed. The alternative results are considered as a second opinion about the nature of the speech recognition process. Some features are extracted from both opinion sources and compiled, through a fuzzy inference system, into a more discriminant confidence measure able to verify correct results and disregard wrong ones. This approach is tested in a keyword spotting task taken from the Spanish SpeechDat database. Results show a considerable reduction of false rejections at a fixed false alarm rate compared to baseline systems.
Multidialectal Spanish acoustic modeling for speech recognition
Cited by 2 (0 self)
Multidialectal Acoustic Modeling: a Comparative Study
, 2006
In this paper, multidialectal acoustic modeling based on sharing data across dialects is addressed. A comparative study of different methods of combining data based on decision tree clustering algorithms is presented. The approaches differ in how they evaluate the similarity of sounds between dialects and in the decision tree structure applied. The proposed systems are tested with Spanish dialects across Spain and Latin America. All proposed multidialectal systems improve on monodialectal performance by using data from another dialect, but the way data is shared is shown to be critical. The best combination of similarity measure and tree structure achieves an improvement of 7% over the results obtained with monodialectal systems.
CROSSLINGUAL ADAPTATION OF SEMI-CONTINUOUS HMMS USING ACOUSTIC SUB-SIMPLEX PROJECTION
With the demand for automatic speech recognition (ASR) systems in many markets, the question of porting an ASR system to a new language is of practical interest. Transferring already existing hidden Markov models (HMMs) from a source to the target language is seen as a key step in this task. Typically, such a crosslingual model adaptation task consists of a three-step procedure. It starts with polyphone decision tree specialisation (PDTS), which specialises the phonetic-acoustic decision tree of the source models to the target language. In a second step, initial target language models are predicted from the adjusted decision tree. Finally, the predicted acoustic models are adapted to the target language using a limited amount of target data. In this work we focus on the final model adaptation step for a system architecture employing semi-continuous HMMs (SCHMMs). In contrast to continuous density HMMs (CDHMMs), adaptation techniques for SCHMMs are not as well developed. In particular, no powerful transformation-based adaptation method is available for adjusting the information-bearing mixture weights of the common prototype densities. To overcome this problem we introduce a novel adaptation scheme for SCHMMs. The method relies on the projection of retrained model parameters onto a solution sub-simplex which is obtained through acoustic regression classes derived from the decision tree of the source models. The performance of the procedure is demonstrated by the transfer of multilingual Spanish-English-German models to Slovenian and to French. In the full paper, reference results for a standard maximum likelihood linear regression (MLLR) approach are also given.
CONSTRAINT INDUCTION OF PHONETIC-ACOUSTIC DECISION TREES FOR CROSSLINGUAL ACOUSTIC MODELLING
In this work we focus on the construction of phonetic-acoustic decision trees for crosslingual use. Several modifications to the standard decision tree growing procedure are proposed, aimed at integrating phonological and acoustical knowledge from source and target languages. This results in multilingual source models which already reflect characteristics of the target language though not trained on target speech data. Test results confirm the validity of this approach through improved system performance. Index Terms: crosslingual, decision tree, acoustic modelling
CROSSLINGUAL ADAPTATION OF SEMI-CONTINUOUS HMMS USING ACOUSTIC REGRESSION CLASSES AND SUB-SIMPLEX PROJECTION
With the demand for automatic speech recognition (ASR) systems in many markets, the question of porting an ASR system to a new language is of practical interest. In this task, the adaptation of hidden Markov models (HMMs) is seen as a key step in transferring the models from a source to a target language. In this work we introduce a novel adaptation scheme for semi-continuous HMMs (SCHMMs) and apply it to a crosslingual model adaptation task. The task consists in transferring multilingual Spanish-English-German HMMs to Slovenian. Test results show that substantial improvements over unadapted models can be achieved, confirming the efficiency of the method.