Statistical Language Modeling Using the CMU-Cambridge Toolkit
1997
Cited by 264 (3 self)
The CMU Statistical Language Modeling toolkit was released in 1994 to facilitate the construction and testing of bigram and trigram language models. It is currently in use in over 40 academic, government and industrial laboratories in over 12 countries. This paper presents a new version of the toolkit. We outline the conventional language modeling technology, as implemented in the toolkit, and describe the extra efficiency and functionality that the new toolkit provides compared with previous software for this task. Finally, we give an example of the use of the toolkit in constructing and testing a simple language model.
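To make "constructing and testing" a bigram model concrete, here is a small self-contained Python sketch. It counts bigrams over padded sentences and reports test-set perplexity. It is not the toolkit's method: the toolkit builds back-off models with discounting, while this sketch substitutes a simple linear interpolation with an add-one unigram. The weight `lam=0.7` and all function names are hypothetical.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count unigrams and bigrams over sentences padded with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_prob(w_prev, w, unigrams, bigrams, lam=0.7):
    """Interpolate the ML bigram estimate with an add-one unigram estimate
    (a stand-in for the discounting/back-off a real toolkit would use)."""
    total = sum(unigrams.values())
    vocab_size = len(unigrams) + 1  # +1 reserves mass for one unknown word
    p_uni = (unigrams[w] + 1) / (total + vocab_size)
    p_bi = bigrams[(w_prev, w)] / unigrams[w_prev] if unigrams[w_prev] else 0.0
    return lam * p_bi + (1 - lam) * p_uni

def perplexity(sentences, unigrams, bigrams):
    """2 ** (average negative log2 probability per predicted token)."""
    log_sum, n = 0.0, 0
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]
        for w_prev, w in zip(tokens, tokens[1:]):
            log_sum += math.log2(bigram_prob(w_prev, w, unigrams, bigrams))
            n += 1
    return 2 ** (-log_sum / n)

train = [["the", "cat", "sat"], ["the", "dog", "sat"]]
uni, bi = train_bigram(train)
print(perplexity([["the", "dog", "sat"]], uni, bi))
```

Lower perplexity on held-out text means the model predicts that text better. Replacing the interpolation with a proper back-off scheme would change the estimates, but the overall train/evaluate loop would stay the same.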
A novel use of statistical parsing to extract information from text
ANLP, 2000
Cited by 78 (4 self)
Since 1995, a few statistical parsing algorithms have demonstrated a breakthrough in parsing accuracy, as measured against the UPenn Treebank as a gold standard. In this paper, we report on adapting a lexicalized, probabilistic context-free parser to information extraction, and we evaluate this new technique on the MUC-7 template element and template relation tasks.
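At its core, a probabilistic context-free parser scores a parse tree as the product of the probabilities of the rules it uses. The Python sketch below shows only this unlexicalized core, using a hypothetical toy grammar (`RULES`) and tree. A lexicalized model like the one in the paper also conditions rule probabilities on head words, and that part is omitted here.

```python
import math

# Hypothetical toy PCFG: P(rhs | lhs) for each rule (lhs, rhs).
RULES = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("DT", "NN")): 0.6,
    ("NP", ("NNP",)): 0.4,
    ("VP", ("VBD", "NP")): 1.0,
    ("DT", ("the",)): 1.0,
    ("NN", ("company",)): 1.0,
    ("NNP", ("Smith",)): 1.0,
    ("VBD", ("joined",)): 1.0,
}

def tree_log_prob(tree):
    """Sum log P(rule) over every node of a (label, child, ...) tree."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        # Preterminal node: rule rewrites the tag as a single word.
        return math.log(RULES[(label, (children[0],))])
    rhs = tuple(child[0] for child in children)
    return math.log(RULES[(label, rhs)]) + sum(tree_log_prob(c) for c in children)

tree = ("S",
        ("NP", ("NNP", "Smith")),
        ("VP", ("VBD", "joined"), ("NP", ("DT", "the"), ("NN", "company"))))
print(math.exp(tree_log_prob(tree)))  # approximately 0.24 (= 0.4 * 0.6)
```

The sketch works in log space so that long derivations do not underflow. A parser would search over all trees for a sentence and keep the highest-scoring one.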
Lexical Modeling of Non-Native Speech for Automatic Speech Recognition
2000
Cited by 24 (1 self)
This paper examines the recognition of non-native speech in JUPITER, a speaker-independent, spontaneous-speech conversational system. Because the non-native speech in this domain is limited and varied, speaker- and accent-specific methods are impractical. We therefore chose to model all of the non-native data with a single model. In particular, this paper describes an attempt to better model non-native lexical patterns. These patterns are incorporated by applying context-independent phonetic confusion rules, whose probabilities are estimated from training data. Using this approach, the word error rate on a non-native test set is reduced from 20.9% to 18.8%.
1. INTRODUCTION
Speech recognition accuracy has been observed to be drastically lower for non-native speakers of the target language than for native speakers [3, 13, 14]. Research on both non-native accent modeling and dialect-specific modeling shows that large gains in performance can be achieved when the acoustics [1, 9, 14] and ...
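For scale, the drop in word error rate from 20.9% to 18.8% is a relative reduction of about 10% ((20.9 - 18.8) / 20.9 ≈ 0.100). The Python sketch below shows one way context-independent confusion probabilities could be estimated and then applied. It assumes (canonical, realized) phone pairs are already available from an alignment, marks deletions with `''`, and expands a baseform into weighted pronunciation variants. The data, phone labels, pruning threshold, and function names are all hypothetical, and this is not the paper's exact procedure.

```python
from collections import Counter, defaultdict
from itertools import product

def estimate_confusions(aligned_pairs):
    """Estimate P(realized | canonical) from (canonical, realized) phone pairs.
    A realized phone of '' denotes a deletion."""
    counts = defaultdict(Counter)
    for canon, real in aligned_pairs:
        counts[canon][real] += 1
    return {c: {r: n / sum(rs.values()) for r, n in rs.items()}
            for c, rs in counts.items()}

def expand_pronunciation(baseform, probs, min_prob=0.01):
    """Apply the rules independently to each phone (context-independent)
    and return {variant: probability}, pruning unlikely variants."""
    # Phones never seen in training are kept unchanged with probability 1.
    options = [probs.get(phone, {phone: 1.0}).items() for phone in baseform]
    variants = {}
    for combo in product(*options):
        prob = 1.0
        for _, p in combo:
            prob *= p
        phones = tuple(r for r, _ in combo if r)  # drop deleted phones
        if prob >= min_prob:
            variants[phones] = variants.get(phones, 0.0) + prob
    return variants

pairs = ([("th", "th")] * 6 + [("th", "t")] * 4
         + [("iy", "iy")] * 8 + [("iy", "ih")] * 2)
probs = estimate_confusions(pairs)
for variant, p in expand_pronunciation(("th", "r", "iy"), probs).items():
    print(" ".join(variant), round(p, 2))
```

Because the rules are context-independent, each phone is rewritten on its own. That keeps the number of parameters small, which suits the limited non-native data described above.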
Language Model Representations For Beam-Search Decoding
In Proceedings of ICASSP '95, 1995
Cited by 17 (1 self)
This paper presents an efficient way of representing a bigram language model for a beam-search-based, continuous-speech, large-vocabulary HMM recognizer. The tree-based topology considered takes advantage of a factorization of the bigram probability derived from the bigram interpolation scheme, and of a tree organization of all the words that can follow a given one. Moreover, an optimization algorithm is used to considerably reduce the space requirements of the language model. Experimental results are provided for two 10,000-word dictation tasks: radiological reporting (perplexity 27) and newspaper dictation (perplexity 120). In the former domain, 93% word accuracy is achieved with real-time response and 23 Mb process space. In the newspaper dictation domain, 88.1% word accuracy is achieved with 1:41 real-time response and 38 Mb process space. All recognition tests were performed on an HP-735 workstation.
1. INTRODUCTION
Many current ASR systems generate initial hypotheses through a b...
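An interpolated bigram can be written as P(w|v) = f'(w|v) + λ(v)·P(w), where f' is a discounted relative frequency that is nonzero only for words actually observed after v. The λ(v)·P(w) term uses the same unigram distribution for every history. A network can therefore route it through one shared unigram branch, while each history v needs explicit arcs only for its observed successors. The Python sketch below builds those two pieces and checks that the two paths together give a proper distribution. Absolute discounting with `beta=0.5` is an assumption made here, since the abstract does not specify the discounting scheme, and all names are hypothetical.

```python
from collections import Counter, defaultdict

def build_factored_bigram(tokens, beta=0.5):
    """Return (successors, lam, unigram) for P(w|v) = f'(w|v) + lam[v] * P(w).
    Absolute discounting is assumed here for illustration only."""
    uni = Counter(tokens)
    total = sum(uni.values())
    unigram = {w: c / total for w, c in uni.items()}
    pair_counts = defaultdict(Counter)
    for v, w in zip(tokens, tokens[1:]):
        pair_counts[v][w] += 1
    successors, lam = {}, {}
    for v, succ in pair_counts.items():
        c_v = sum(succ.values())
        # Explicit arcs exist only for words actually seen after v.
        successors[v] = {w: (c - beta) / c_v for w, c in succ.items()}
        # Mass removed by discounting goes to the shared unigram branch.
        lam[v] = beta * len(succ) / c_v
    return successors, lam, unigram

def prob(w, v, successors, lam, unigram):
    """Sum of the two paths: v's own successor arc (if any) plus the
    shared unigram branch weighted by lam[v]."""
    return successors[v].get(w, 0.0) + lam[v] * unigram.get(w, 0.0)

tokens = "a b a c a b".split()
successors, lam, unigram = build_factored_bigram(tokens)
print(prob("b", "a", successors, lam, unigram))  # seen successor
print(prob("c", "b", successors, lam, unigram))  # unseen: unigram branch only
```

The size saving comes from storing the unigram tree once, instead of giving every history an arc for every word in a 10,000-word vocabulary.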
Multilingual Stochastic N-Gram Class Language Models
1996
Cited by 16 (2 self)
Stochastic language models are widely used in continuous speech recognition systems where a priori probabilities of word sequences are needed. These probabilities are usually given by n-gram word models, estimated on very large training texts. As n increases, it becomes harder to find reliable statistics, even with huge texts. Grouping words into classes is one way to overcome this problem. We have developed an automatic, language-independent classification procedure that can optimize the classification of tens of millions of untagged words within a few hours on a Unix workstation. With this language-independent approach, three corpora, each containing about 30 million words of newspaper text in French, German and English, have been mapped into different numbers of classes. From these classifications, bigram and trigram class language models have been built. The perplexities of held-out test texts have been assessed, showing that trigram class models give lower values than those ob...
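A common form of class bigram factors the word bigram as P(w_i | w_{i-1}) = P(w_i | c(w_i)) · P(c(w_i) | c(w_{i-1})). The class transition statistics are shared by every word in a class, so they stay reliable even when individual word pairs are rare. The Python sketch below computes maximum-likelihood estimates for this form from a fixed, hand-made word-to-class mapping. The paper's automatic classification procedure is not reproduced, and the mapping and names are hypothetical.

```python
from collections import Counter

def train_class_bigram(tokens, word_class):
    """ML estimates for P(w|v) = P(w | c(w)) * P(c(w) | c(v)).
    Assumes every class used as a history appears before the last token."""
    classes = [word_class[w] for w in tokens]
    word_counts = Counter(tokens)
    class_counts = Counter(classes)
    class_hist = Counter(classes[:-1])            # classes seen as histories
    class_pairs = Counter(zip(classes, classes[1:]))

    def prob(w, v):
        cw, cv = word_class[w], word_class[v]
        p_word_given_class = word_counts[w] / class_counts[cw]
        p_class_transition = class_pairs[(cv, cw)] / class_hist[cv]
        return p_word_given_class * p_class_transition

    return prob

tokens = "the cat sat the dog ran".split()
word_class = {"the": "DET", "cat": "N", "dog": "N", "sat": "V", "ran": "V"}
prob = train_class_bigram(tokens, word_class)
print(prob("sat", "dog"))  # "dog sat" never occurs, yet gets 0.5
```

The example shows why classes help with sparse data: "dog sat" never occurs in the training text, but it still receives probability because nouns are followed by verbs there.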
On adaptive decision rules and decision parameter adaptation for automatic speech recognition
Proc. IEEE, 2000
Cited by 16 (3 self)
Recent advances in automatic speech recognition have been achieved by designing a plug-in maximum a posteriori decision rule, in which the forms of the acoustic and language model distributions are specified and the parameters of the assumed distributions are estimated from a collection of speech and language training corpora. Maximum-likelihood point estimation is by far the most prevalent training method. However, because of unknown speech distributions, sparse training data, high spectral and temporal variability in speech, and possible mismatch between training and testing conditions, a dynamic training strategy is needed. To cope with changing speakers and speaking conditions in real operational settings for high-performance speech recognition, such paradigms incorporate a small amount of speaker- and environment-specific adaptation data into the training process. Bayesian adaptive learning is an optimal way to combine ...
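A standard textbook example of Bayesian adaptation is MAP re-estimation of a Gaussian mean. With a conjugate prior centered on the speaker-independent mean μ0 and a prior weight τ, the MAP mean after n adaptation frames x_t is (τ·μ0 + Σ x_t) / (τ + n). With little data it stays near the prior, and it approaches the maximum-likelihood mean as data accumulates. This suits the sparse-adaptation-data setting described above. The Python sketch is included for illustration and may not match the paper's exact formulation. The value of τ and the function name are hypothetical.

```python
def map_adapt_mean(prior_mean, tau, frames):
    """MAP estimate of a Gaussian mean under a conjugate prior.

    prior_mean: speaker-independent mean vector (list of floats)
    tau: prior weight; larger values trust the prior more (hypothetical value)
    frames: adaptation vectors assigned to this Gaussian
    """
    n = len(frames)
    dims = range(len(prior_mean))
    sums = [sum(frame[d] for frame in frames) for d in dims]
    return [(tau * prior_mean[d] + sums[d]) / (tau + n) for d in dims]

prior = [0.0, 0.0]
print(map_adapt_mean(prior, 10.0, [[1.0, 2.0]] * 5))     # pulled toward prior
print(map_adapt_mean(prior, 10.0, [[1.0, 2.0]] * 1000))  # close to ML [1, 2]
```

With no adaptation frames, the estimate reduces to the prior mean, so a model never moves far on the basis of very little speaker-specific data.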
The Use of Clustering Techniques for Language Modeling - Application to Asian Languages
Cited by 15 (11 self)
Cluster-based n-gram modeling is a variant of normal word-based n-gram modeling. It attempts to make use of the similarities between words. In this paper, we present an empirical study of clustering techniques for Asian language modeling. Clustering is used to improve the performance (i.e. perplexity) of language models as well as to compress language models. Experimental tests are presented for cluster-based trigram models on a Japanese newspaper corpus, and on a Chinese heterogeneous corpus.
Is N-Best Dead?
In Proceedings of the Human Language Technology Workshop, 1994
Cited by 15 (2 self)
We developed a faster search algorithm that avoids the use of the N-Best paradigm until after more powerful knowledge sources have been used. We found, however, that there was little or no decrease in word errors. We then showed that the use of the N-Best paradigm is still essential for the use of still more powerful knowledge sources, and for several other purposes that are outlined in the paper.
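One common way to use an N-best list with more powerful knowledge sources is to rescore each hypothesis with a weighted sum of log scores and then re-rank the list. The Python sketch below assumes a first-pass acoustic score, a first-pass language model score, and one extra knowledge-source score per hypothesis. The field names, weights, and scores are hypothetical, and this is not the paper's system.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    words: str
    acoustic: float     # log-likelihood from the first-pass decoder
    lm: float           # log-probability from the first-pass language model
    extra: float = 0.0  # log-score from a more expensive knowledge source

def rescore(nbest, lm_weight=1.0, extra_weight=1.0, word_penalty=0.0):
    """Return hypotheses sorted best-first by a weighted sum of log scores."""
    def total(h):
        return (h.acoustic + lm_weight * h.lm + extra_weight * h.extra
                + word_penalty * len(h.words.split()))
    return sorted(nbest, key=total, reverse=True)

nbest = [
    Hypothesis("show me flights to boston", acoustic=-100.0, lm=-20.0, extra=-15.0),
    Hypothesis("show me flights to austin", acoustic=-101.0, lm=-20.5, extra=-8.0),
]
print(rescore(nbest, extra_weight=0.0)[0].words)  # first-pass ranking
print(rescore(nbest)[0].words)                    # after adding the extra source
```

Because rescoring only touches a short list, the extra knowledge source can be arbitrarily expensive or non-left-to-right. That is hard to achieve when the same source must be integrated directly into the search.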
Improvements In Tree-Based Language Model Representation
In Proc. of EUROSPEECH, 1995
Cited by 14 (10 self)
This paper describes an efficient way of representing a bigram language model with a finite-state network used by a beam-search-based, continuous-speech HMM recognizer. In a previous paper [1], a compact tree-based organization of the search space was presented, which could be further reduced through an optimization algorithm. There, it was pointed out that for a 10,000-word newspaper dictation task, the minimization step could require considerable time and space on a standard workstation. In this paper, a new compilation technique that takes the particular tree-based topology into account is described. Results show that, without additional time and space costs, the new technique produces networks that are equivalent to the tree-based ones but almost as small as the optimized one.
1 INTRODUCTION
The most widely used Language Models (LMs) in speech recognition are n-gram models, owing both to their easy inference from the training corpus and to their easy integration with the decoding algorithms commonly used...
Permugram Language Models
1995
Cited by 12 (4 self)
In natural languages, the words within an utterance are often correlated over large distances. Long-spanning contextual effects of this type cannot be captured efficiently and robustly by the traditional N-gram approaches of stochastic language modelling. We present a new kind of stochastic grammar: the permugram model. A permugram model is obtained by linear interpolation of a large number of conventional bigram, trigram, or polygram models that operate on different permutations of the input word sequence under consideration. In this way, stochastic dependencies between word pairs or word triples, whether adjacent or distant in the input text, can be captured simultaneously without requiring very large N-grams. Using the permugram model, we achieved test-set perplexity reductions of 5-10% compared with interpolated N-gram models, depending on the application.
1. INTRODUCTION
In natural languages, the words within an utterance are often correlated over large distances; fo...
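The Python sketch below illustrates the idea of interpolating bigram models that each operate on a different permutation of the word sequence. Permutations such as "even positions, then odd positions" make words that are two apart adjacent, so an ordinary bigram can model their dependency. Several details are assumptions, because the abstract does not specify them: the interpolation is done at the sequence level (P(W) = Σ_k λ_k · P_k(π_k(W))), each component is an add-one-smoothed bigram trained on permuted training sentences, and the permutation set, weights, and names are hypothetical.

```python
import math
from collections import Counter

def identity(ws):
    return list(ws)

def reverse(ws):
    return list(reversed(ws))

def evens_then_odds(ws):
    # Words two positions apart in the original become adjacent.
    return list(ws[0::2]) + list(ws[1::2])

class AddOneBigram:
    """Bigram model with add-one smoothing over vocab plus </s>."""
    def __init__(self, sentences, vocab):
        self.V = len(vocab) + 1  # predicted outcomes: vocab words and </s>
        self.hist, self.pairs = Counter(), Counter()
        for ws in sentences:
            toks = ["<s>"] + ws + ["</s>"]
            self.hist.update(toks[:-1])
            self.pairs.update(zip(toks, toks[1:]))

    def logprob(self, ws):
        toks = ["<s>"] + ws + ["</s>"]
        return sum(math.log((self.pairs[(a, b)] + 1) / (self.hist[a] + self.V))
                   for a, b in zip(toks, toks[1:]))

def build(train, perms, vocab):
    """One component model per permutation, trained on permuted sentences."""
    return [(p, AddOneBigram([p(s) for s in train], vocab)) for p in perms]

def permugram_logprob(ws, components, weights):
    """log sum_k weights[k] * P_k(perm_k(ws)), computed with log-sum-exp."""
    logs = [math.log(lam) + model.logprob(perm(ws))
            for (perm, model), lam in zip(components, weights)]
    m = max(logs)
    return m + math.log(sum(math.exp(x - m) for x in logs))

train = [["the", "cat", "sat"], ["the", "dog", "sat"], ["a", "dog", "ran"]]
vocab = {w for s in train for w in s}
comps = build(train, [identity, reverse, evens_then_odds], vocab)
print(permugram_logprob(["the", "dog", "ran"], comps, [0.5, 0.25, 0.25]))
```

Each permutation is a bijection on sequences of a given length, so each component still defines a proper distribution over word sequences. A mixture with weights summing to 1 therefore does as well.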

