Results 1 - 10 of 20
Unlimited vocabulary speech recognition for agglutinative languages
In Proc. HLT-NAACL, 2006
"... It is practically impossible to build a word-based lexicon for speech recognition in agglutinative languages that would cover all the relevant words. The problem is that words are generally built by concatenating several prefixes and suffixes to the word roots. Together with compounding and inflecti ..."
Abstract
-
Cited by 21 (11 self)
- Add to MetaCart
(Show Context)
It is practically impossible to build a word-based lexicon for speech recognition in agglutinative languages that would cover all the relevant words. The problem is that words are generally built by concatenating several prefixes and suffixes to the word roots. Together with compounding and inflections, this leads to millions of different, but still frequent, word forms. Due to inflections, ambiguity and other phenomena, it is also not trivial to automatically split the words into meaningful parts. Rule-based morphological analyzers can perform this splitting, but because of their handcrafted rules they also suffer from an out-of-vocabulary problem. In this paper we apply a recently proposed, fully automatic and largely language- and vocabulary-independent method to build subword lexica for three agglutinative languages. We demonstrate language portability by building a successful large vocabulary speech recognizer for each language, and show superior recognition performance compared to the corresponding word-based reference systems.
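
The subword approach replaces each lexicon word with a sequence of automatically learned morphs. As a minimal sketch of how such an inventory is applied once it has been learned, the Python snippet below runs a Viterbi search for the lowest-cost segmentation of a word; the morph_cost values are invented here, standing in for the negative log-probabilities an unsupervised learner would produce.

    import math

    # Invented unigram morph inventory: morph -> negative log-probability.
    # In the paper's setting these come from an unsupervised learner
    # trained on a large text corpus.
    morph_cost = {"talo": 5.0, "i": 3.0, "ssa": 4.0, "issa": 6.5, "s": 7.0, "sa": 6.0}

    def segment(word):
        """Viterbi search for the lowest-cost split of word into known morphs."""
        n = len(word)
        best = [math.inf] * (n + 1)  # best[i]: cost of segmenting word[:i]
        back = [0] * (n + 1)         # back[i]: start of the last morph used
        best[0] = 0.0
        for i in range(1, n + 1):
            for j in range(i):
                piece = word[j:i]
                if piece in morph_cost and best[j] + morph_cost[piece] < best[i]:
                    best[i] = best[j] + morph_cost[piece]
                    back[i] = j
        if math.isinf(best[n]):
            return [word]            # out-of-inventory: keep the whole word
        out, i = [], n
        while i > 0:
            out.append(word[back[i]:i])
            i = back[i]
        return out[::-1]

    print(segment("taloissa"))  # -> ['talo', 'issa']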
Unsupervised segmentation of words into morphemes – Morpho Challenge 2005: An Introduction and Evaluation Report
In Proceedings of the PASCAL Challenge ‘06 Workshop on Unsupervised Segmentation of Words into Morphemes, 2006
"... ..."
(Show Context)
Weighted linear prediction for speech analysis in noisy conditions
In Proc. Interspeech, 2009
"... Abstract Following earlier work, we modify linear predictive (LP) speech analysis by including temporal weighting of the squared prediction error in the model optimization. In order to focus this so called weighted LP model on the least noisy signal regions in the presence of stationary additive no ..."
Abstract
-
Cited by 6 (6 self)
- Add to MetaCart
(Show Context)
Following earlier work, we modify linear predictive (LP) speech analysis by including temporal weighting of the squared prediction error in the model optimization. In order to focus this so-called weighted LP model on the least noisy signal regions in the presence of stationary additive noise, we use short-time signal energy as the weighting function. We compare the noisy spectrum analysis performance of weighted LP and its recently proposed variant, the latter guaranteed to produce stable synthesis models. As a practical test case, we use automatic speech recognition to verify that the weighted LP methods improve upon the conventional FFT and LP methods by making spectrum estimates less prone to corruption by additive noise.
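
The method's central step is a weighted least-squares fit of the predictor. Below is a rough numpy sketch of weighted LP under the weighting the abstract names (short-time signal energy), with the weighted covariance normal equations assembled explicitly; the model order p and energy window m are arbitrary illustration values.

    import numpy as np

    def wlp(x, p=10, m=20):
        """Weighted linear prediction via the weighted covariance method.

        Minimizes sum_n w(n) e(n)^2 over coefficients a_1..a_p, where
        e(n) = x(n) - sum_k a_k x(n-k) and w(n) is the short-time energy
        of the m samples preceding n."""
        n = len(x)
        # Short-time energy weight at each prediction instant.
        w = np.array([np.sum(x[max(0, t - m):t] ** 2) for t in range(p, n)])
        w += 1e-9  # keep the normal equations solvable in silent regions
        R = np.zeros((p, p))
        r = np.zeros(p)
        for i in range(1, p + 1):
            r[i - 1] = np.sum(w * x[p - i:n - i] * x[p:n])
            for k in range(1, p + 1):
                R[i - 1, k - 1] = np.sum(w * x[p - i:n - i] * x[p - k:n - k])
        return np.linalg.solve(R, r)  # predictor coefficients a_1..a_p

    # Toy usage: fit an all-pole model to a noisy decaying resonance.
    rng = np.random.default_rng(0)
    t = np.arange(400)
    x = np.sin(0.3 * t) * np.exp(-0.002 * t) + 0.05 * rng.standard_normal(400)
    print(wlp(x, p=4))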
Unsupervised segmentation of words into morphemes – Morpho Challenge 2005: Application to automatic speech recognition
In Proc. ICSLP, 2006
"... Within the EU Network of Excellence PASCAL, a challenge was organized to design a statistical machine learning algorithm that segments words into the smallest meaning-bearing units of language, morphemes. Ideally, these are basic vocabulary units suitable for different tasks, such as speech and text ..."
Abstract
-
Cited by 6 (3 self)
- Add to MetaCart
(Show Context)
Within the EU Network of Excellence PASCAL, a challenge was organized to design a statistical machine learning algorithm that segments words into the smallest meaning-bearing units of language, morphemes. Ideally, these are basic vocabulary units suitable for different tasks, such as speech and text understanding, machine translation, information retrieval, and statistical language modeling. Twelve research groups participated in the challenge and submitted segmentation results obtained by their algorithms. In this paper, we evaluate the application of these segmentation algorithms to large vocabulary speech recognition, using statistical n-gram language models based on the proposed word segments instead of entire words. Experiments were done for two agglutinative and morphologically rich languages: Finnish and Turkish. We also investigate combining various segmentations to improve the performance of the recognizer. Index Terms: speech recognition, language modeling, morphemes, unsupervised learning.
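
A hedged sketch of the data preparation this evaluation implies: each word is replaced by its proposed segments, with an explicit boundary token so that recognized morph sequences can be joined back into words before scoring. The segmentation dictionary and the <w> marker below are invented for illustration.

    from collections import Counter

    # Invented output of one challenge segmentation algorithm.
    segmentation = {
        "taloissa": ["talo", "issa"],
        "talossa": ["talo", "ssa"],
        "autossa": ["auto", "ssa"],
    }

    def morph_stream(sentence):
        """Rewrite a word stream as a morph stream, keeping word boundaries
        as an explicit <w> token so words can be reassembled after decoding."""
        out = ["<w>"]
        for word in sentence.split():
            out += segmentation.get(word, [word]) + ["<w>"]
        return out

    def bigram_counts(tokens):
        return Counter(zip(tokens, tokens[1:]))

    print(bigram_counts(morph_stream("talossa autossa")))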
Missing feature reconstruction and acoustic model adaptation combined for large vocabulary continuous speech recognition
In Proc. EUSIPCO, 2008
"... Methods for noise robust speech recognition are often evaluated in small vocabulary speech recognition tasks. In this work, we use missing feature reconstruction for noise compensation in large vocabulary continuous speech recognition task with speech data recorded in noisy environments such as cafe ..."
Abstract
-
Cited by 3 (3 self)
- Add to MetaCart
(Show Context)
Methods for noise robust speech recognition are often evaluated on small vocabulary speech recognition tasks. In this work, we use missing feature reconstruction for noise compensation in a large vocabulary continuous speech recognition task, with speech data recorded in noisy environments such as cafeterias. In addition, we combine missing feature reconstruction with constrained maximum likelihood linear regression (CMLLR) acoustic model adaptation and propose a new method for finding noise-corrupted speech components for the missing feature approach. Using missing feature reconstruction on noisy speech is found to improve speech recognition performance significantly. The relative error reduction of 36% compared to the baseline is comparable to the error reductions achieved with acoustic model adaptation, and results improve further when reconstruction and adaptation are used in parallel.
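
The reconstruction step can be sketched as conditional estimation under a clean-speech model: unreliable spectral components are replaced by their expected value given the reliable ones. The paper's cluster-based method uses a mixture model; the single-Gaussian sketch below, with made-up model parameters and mask, shows only that estimation step.

    import numpy as np

    def reconstruct(x, reliable, mu, cov):
        """Replace unreliable components of a feature vector x (e.g. log mel
        energies) by their conditional mean under a clean-speech Gaussian
        N(mu, cov). A mixture of such models, as in cluster-based
        reconstruction, would weight several of these estimates."""
        r, u = reliable, ~reliable
        cov_rr = cov[np.ix_(r, r)]
        cov_ur = cov[np.ix_(u, r)]
        x_hat = x.copy()
        x_hat[u] = mu[u] + cov_ur @ np.linalg.solve(cov_rr, x[r] - mu[r])
        return x_hat

    # Toy usage with a made-up 4-band clean-speech model.
    mu = np.array([1.0, 2.0, 3.0, 4.0])
    cov = np.eye(4) + 0.5 * np.ones((4, 4))
    x = np.array([1.1, 2.2, 9.0, 9.5])           # upper bands hit by noise
    mask = np.array([True, True, False, False])  # e.g. from a local SNR test
    print(reconstruct(x, mask, mu, cov))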
Compact n-gram models by incremental growing and clustering of histories
In Proc. Interspeech, 2006
"... This work concerns building n-gram language models that are suitable for large vocabulary speech recognition in devices that have a restricted amount of memory and space available. Our target language is Finnish, and in order to evade the problems of its rich morphology, we use sub-word units, morph ..."
Abstract
-
Cited by 2 (2 self)
- Add to MetaCart
(Show Context)
This work concerns building n-gram language models that are suitable for large vocabulary speech recognition on devices with a restricted amount of memory and space. Our target language is Finnish, and in order to evade the problems of its rich morphology, we use sub-word units, morphs, as model units instead of words. In the proposed model we apply incremental growing and clustering of the morph n-gram histories. By selecting the histories using maximum a posteriori estimation, and clustering them with the information radius measure, we obtain a clustered varigram model. We show that for restricted model sizes this model gives better cross-entropy and speech recognition results than conventional n-gram models, and also better recognition results than non-clustered varigram models built with another recently introduced method. Index Terms: language models, clustering, information radius, speech recognition
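
The clustering distance named in the abstract, the information radius (Jensen-Shannon divergence), is easy to state concretely. A small sketch, with invented next-morph distributions and an invented merging threshold:

    import numpy as np

    def information_radius(p, q):
        """Information radius (Jensen-Shannon divergence, in bits) between
        two next-morph distributions; the clustering distance named above."""
        m = 0.5 * (p + q)
        def kl(a, b):
            nz = a > 0
            return np.sum(a[nz] * np.log2(a[nz] / b[nz]))
        return 0.5 * kl(p, m) + 0.5 * kl(q, m)

    # Invented next-morph distributions of two n-gram histories.
    p = np.array([0.50, 0.30, 0.20, 0.00])
    q = np.array([0.45, 0.35, 0.15, 0.05])
    threshold = 0.05  # made-up merging threshold
    if information_radius(p, q) < threshold:
        # Cluster the two histories: they now share one distribution
        # (a count-weighted average would be used with real statistics).
        shared = 0.5 * (p + q)
        print("merged:", shared)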
TinyLex: Static n-gram index pruning with perfect recall
In Proceedings of the 17th ACM Conference on Information and Knowledge Management (CIKM), 2008
"... Inverted indexes using sequences of characters (n-grams) as terms provide an error-resilient and language-independent way to query for arbitrary substrings and perform approximate matching in a text, but present a number of practical problems: they have a very large number of terms, they exhibit pat ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
(Show Context)
Inverted indexes using sequences of characters (n-grams) as terms provide an error-resilient and language-independent way to query for arbitrary substrings and perform approximate matching in a text, but present a number of practical problems: they have a very large number of terms, they exhibit pathologically expensive worst-case query times on certain natural inputs, and they cannot cope with very short query strings. In word-based indexes, static index pruning has been successful in reducing index size while maintaining precision, at the expense of recall. Taking advantage of the unique inclusion structure of n-gram terms of different lengths, we show that the lexicon size of an n-gram index can be reduced by 7 to 15 times without any loss of recall, and without any increase in either index size or query time. Because the lexicon is typically stored in main memory, this substantially reduces the memory required for queries. Simultaneously, our construction is also the first overlapping n-gram index to place tunable worst-case bounds on false positives and to permit efficient queries on strings of any length. Using this construction, we also demonstrate the first feasible n-gram index using words rather than characters as units, and its applications to phrase searching.
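
For orientation, the sketch below shows the baseline overlapping character n-gram index the paper builds on: posting sets are intersected for every n-gram of the query and candidates are then verified, so reported matches contain no false positives. The lexicon pruning with perfect recall that is TinyLex's actual contribution is omitted here.

    from collections import defaultdict

    N = 3  # character n-gram length

    def build_index(docs):
        """Map each character 3-gram to the set of documents containing it."""
        index = defaultdict(set)
        for doc_id, text in enumerate(docs):
            for i in range(len(text) - N + 1):
                index[text[i:i + N]].add(doc_id)
        return index

    def query(index, docs, s):
        """Documents containing substring s: intersect the posting sets of
        every 3-gram of s, then verify candidates, so no false positives
        are returned."""
        grams = [s[i:i + N] for i in range(len(s) - N + 1)]
        if not grams:  # query shorter than N: scan directly
            return sorted(d for d, t in enumerate(docs) if s in t)
        cand = set.intersection(*(index.get(g, set()) for g in grams))
        return sorted(d for d in cand if s in docs[d])

    docs = ["unsupervised morph segmentation", "supervised learning", "morpheme"]
    print(query(build_index(docs), docs, "morph"))  # -> [0, 2]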
Higher Order Statistics in Play-out Analysis
"... Playing out the game from the current state to the end many times randomly, provides statistics that can be used for selecting the best move. This play-out analysis has proved to work well in games such as Backgammon, Bridge, and Go. This paper introduces a method that selects relevant patterns of m ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
(Show Context)
Playing out the game from the current state to the end many times randomly provides statistics that can be used for selecting the best move. This play-out analysis has proved to work well in games such as Backgammon, Bridge, and Go. This paper introduces a method that selects relevant patterns of moves in order to collect higher order statistics, which can be used to improve the quality of the play-outs. Play-out analysis avoids the horizon effect of regular game-tree search. The proposed method should be especially effective when the game can be decomposed into a number of subgames. The game of Y is a two-player board game played on a graph, with the task of connecting three edges of the graph together. Preliminary experiments on Y did not yet show a significant improvement over the first-order approach, but a door has been opened for further improvement. The game of Y might prove to be a good testbed for machine learning.
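
First-order play-out analysis is straightforward to sketch. The toy below applies it to one-pile Nim (take 1-3 stones, last stone wins), a stand-in game chosen here for brevity; the higher-order pattern statistics proposed in the paper are not implemented.

    import random

    def moves(stones):
        return [k for k in (1, 2, 3) if k <= stones]

    def playout(stones, to_move):
        """Play uniformly random moves to the end; return the winner."""
        while True:
            stones -= random.choice(moves(stones))
            if stones == 0:
                return to_move
            to_move = 1 - to_move

    def best_move(stones, player, n_playouts=5000):
        """First-order play-out analysis: score each legal move by the
        fraction of random continuations won, and pick the best. The
        paper's proposal additionally gathers statistics over patterns
        of moves inside the play-outs, which this sketch omits."""
        scores = {}
        for k in moves(stones):
            rest = stones - k
            if rest == 0:
                return k  # immediate win
            wins = sum(playout(rest, 1 - player) == player
                       for _ in range(n_playouts))
            scores[k] = wins / n_playouts
        return max(scores, key=scores.get)

    random.seed(1)
    print(best_move(10, player=0))  # should take 2, leaving a multiple of 4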
Comparison of noise robust methods in large vocabulary speech recognition
"... In this paper, a comparison of three fundamentally differ-ent noise robust approaches is carried out. The recognition performances of multicondition training, Data-driven Paral-lel Model Combination (DPMC), and cluster-based missing data reconstruction methods implemented in a large vocab-ulary cont ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
(Show Context)
In this paper, a comparison of three fundamentally different noise robust approaches is carried out. The recognition performances of multicondition training, Data-driven Parallel Model Combination (DPMC), and cluster-based missing data reconstruction, implemented in a large vocabulary continuous speech recognition system, are evaluated with Finnish language speech data consisting of real recordings in noisy environments. All three methods improve the recognition accuracy substantially in poor signal-to-noise ratio (SNR) conditions when compared to a baseline system trained on clean speech. DPMC and missing data reconstruction give the best performance in high SNR conditions. In low SNR conditions, the multicondition trained system ranks best, DPMC second, and missing data reconstruction third.
Statistical Language Modeling for Automatic Speech Recognition of Agglutinative Languages
"... Automatic Speech Recognition (ASR) systems utilize statistical acoustic and language models to find the most probable word sequence when the speech signal is given. Hidden Markov Models (HMMs) are used as acoustic models and language model probabilities are approximated using n-grams where the proba ..."
Abstract
- Add to MetaCart
(Show Context)
Automatic Speech Recognition (ASR) systems utilize statistical acoustic and language models to find the most probable word sequence given the speech signal. Hidden Markov Models (HMMs) are used as acoustic models, and language model probabilities are approximated using n-grams, where the probability of a word is conditioned on the n-1 previous words.
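
As a reminder of what the n-gram approximation means in practice, here is a minimal maximum likelihood sketch; real ASR language models add smoothing and back-off, which are omitted.

    from collections import Counter

    def train_ngram(corpus, n=3):
        """Maximum likelihood n-gram model: P(w | n-1 previous words) is
        count(history + w) / count(history)."""
        grams, hists = Counter(), Counter()
        for sentence in corpus:
            toks = ["<s>"] * (n - 1) + sentence.split() + ["</s>"]
            for i in range(len(toks) - n + 1):
                grams[tuple(toks[i:i + n])] += 1
                hists[tuple(toks[i:i + n - 1])] += 1
        return lambda hist, w: grams[hist + (w,)] / hists[hist]

    p = train_ngram(["the cat sat", "the cat ran", "the dog sat"])
    print(p(("the", "cat"), "sat"))  # -> 0.5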