30 citations found.
Mitch Weintraub, Francoise Beaufays, et al., "Neural Network Based Measures of Confidence for Word Recognition". Proc. ICASSP-97, Vol. 2, pp. 887-890, 1997.

This paper is cited in the following contexts:

A Boosting Approach for Confidence Scoring - Moreno, Logan, Raj (2001)   (1 citation)  (Correct)

....features correlated with word confidence, including basic features of adjacent words. One of a variety of classifiers is then applied to this vector to determine confidence for the word. Features based on the acoustic model (e.g. see [3]), the language model (e.g. [4]), the decoding process (e.g. [5, 6, 7, 8, 9]) and word semantics [10, 11] have been proposed. Classifiers investigated include simple thresholding [7], linear discriminant analysis followed by a linear threshold [3, 11], Bayes classifiers [8], neural networks [5, 3, 6, 12], generalized linear models [9, 13] and decision trees [6, 11]. In ....

....[3]), the language model (e.g. [4]), the decoding process (e.g. [5, 6, 7, 8, 9]) and word semantics [10, 11] have been proposed. Classifiers investigated include simple thresholding [7], linear discriminant analysis followed by a linear threshold [3, 11], Bayes classifiers [8], neural networks [5, 3, 6, 12], generalized linear models [9, 13] and decision trees [6, 11]. In this paper we explore the use of boosting techniques to classify confidence feature vectors. Boosting combines hundreds or even thousands of very simple classifiers (called weak learners in the Machine Learning literature) by a ....

M. Weintraub, F. Beaufays, Z. Rivlin, Y. Konig, and A. Stolcke, "Neural-network based measures of confidence for word recognition," in Proc. ICASSP, 1997.


Rejection Measures for Handwriting Sentence Recognition - Marukatat, Artieres.. (2002)   (4 citations)  (Correct)

....decoding step, for example the lattice density [7, 16], N-best lists [14] or language models [15], etc. Some other works use a post classifier to combine features such as likelihood and other statistics gathered from the decoding process (e.g. the number of letters in word, etc.) into one measure [6, 8, 13]. By far, however, the most popular techniques are based on the building of a so-called anti-model or alternate model [1, 2, 4, 9]. Such an anti-model is used to normalize the likelihood of an unknown observation sequence $O_1^T = (o_1, o_2, \ldots, o_T)$ by computing a ratio between the joint ....

M. Weintraub, F. Beaufays, Z. Rivlin, Y. Konig, and A. Stolcke. Neural-Network Based Measures of Confidence for Word Recognition. In International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pages 887--890, Munich, Germany, 1997.


Combination Of Confidence Measures In Isolated Word Recognition - Dolfing, Wendemuth (1998)   (2 citations)  (Correct)

....of confidence measures applied to the acoustic model are [2, 10], to the decoding process [4], and to language model and word graphs [6, 9, 13]. It is possible to combine several confidence measures of the same and/or neighboring word hypotheses to solve the decision problem as demonstrated by [3, 6, 10, 11]. However, complex combination strategies do not significantly outperform simpler linear feature combinations [6]. In Section 2 and 3, we introduce the procedure to arrive at the best classification given the model parameters. Section 4 and 5 introduce the experimental setup and results, ....

M. Weintraub, F. Beaufays, Z. Rivlin, Y. Konig, and A. Stolcke. Neural-network based measures of confidence for word recognition. In Proc. ICASSP, volume II, pages 887--890, Munich, Germany, April 1997.


Advances in Confidence Measures for Large Vocabulary - Wendemuth, Rose, Dolfing (1999)   (9 citations)  (Correct)

....of confidence measures applied to the acoustic model are [3, 11], to the decoding process [5], and to language model and word graphs [8, 10, 14]. It is possible to combine several confidence measures of the same and/or neighboring word hypotheses to solve the decision problem as demonstrated by [4, 8, 11, 12]. However, complex combination strategies do not significantly outperform simpler linear feature combinations [8]. In Section 2 and 3, we introduce the procedure to arrive at the best classification given the model parameters. Section 4 and 5 introduce the experimental setup and results, ....

M. Weintraub, F. Beaufays, Z. Rivlin, Y. Konig, and A. Stolcke. Neural-network based measures of confidence for word recognition. In Proc. ICASSP, volume II, pages 887--890, Munich, Germany, April 1997.


Confidence Measures for Large Vocabulary Continuous.. - Wessel, Schlüter.. (2001)   (1 citation)  (Correct)

....to compute confidence measures. Other authors try to use more probabilistic methods to solve the normalization problem without additional models. In [7] and [22] word graphs are used to compute posterior probabilities. For the computation of these probabilities N-best lists are used in [17] and [21] instead. An example for the use of additional models is the normalization of the word scores produced by a speech recognizer with the scores obtained with a phoneme recognizer [26] or filler models [21]. These models are often referred to as garbage models. These approaches are clearly ....

....probabilities. For the computation of these probabilities N-best lists are used in [17] and [21] instead. An example for the use of additional models is the normalization of the word scores produced by a speech recognizer with the scores obtained with a phoneme recognizer [26] or filler models [21]. These models are often referred to as garbage models. These approaches are clearly probabilistic. In many of the cases presented above, the authors use methods from all of these categories. Very often, a large number of features, including normalized acoustic scores and heuristic features ....

[Article contains additional citation context not shown here]

M. Weintraub, F. Beaufays, Z. Rivlin, Y. Konig, and A. Stolcke, "Neural-network based measures of confidence for word recognition," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, 1997.


Knowing What You Don't Know: Roles for Confidence Measures in.. - Williams (1999)   (4 citations)  (Correct)

....also been identified in several relatively recent reviews of the state of the art in the field of ASR [37, 24]. The benefit of assigning a confidence estimate to a decoding is succinctly summarised by the phrase, "knowing what you don't know". A popular definition for the term confidence measure [73, 205, 187, 181, 103] is: The posterior probability of word correctness, given the values of some set of confidence indicators. However, it will be argued that a much more useful definition is: A function which quantifies how well a model matches some speech data; where the values of the function must be ....

....Figure 3.5: Left: Values of I(Z; A) over the range of possible values of H(Z) and UER of an hypothesis test. Right: Values of E(Z; A) plotted on similar axes. A related confidence measure evaluation metric, termed reduction in cross entropy, is described in [34, 181, 103, 205]. The metric, which is similar to that described in [73], has been adopted in recent years by the DARPA NIST CSR evaluation community [217, 181, 103]. A clear disadvantage of either the reduction in cross entropy, or the metric described in [73], is that the confidence measure must take the form ....

[Article contains additional citation context not shown here]

M. Weintraub, F. Beaufays, Z. Rivlin, Y. Konig, and A. Stolcke. Neural-network based measures of confidence for word recognition. In Proceedings of the International Conference on Acoustics, Speech and Signal Processing, pages 887--890. IEEE, 1997.


Minimum Word Error Rate Decoding - Evermann (1999)   (Correct)

....was introduced in [Siu et al. 1997]. The hypothesised word sequence is aligned with the reference transcription (e.g. using the Levenshtein distance mentioned in section 2.6). Based on this alignment each hypothesised word $w_i$ is labeled as either correct ($c_i = 1$) or incorrect ($c_i = 0$). In [Weintraub et al. 1997] a minimum overlap of 50% between the hypothesis and the reference word is used as an additional constraint before labelling words as correct. The quality of a set of confidence scores $q_i$ is now captured in the following information theoretic measure (normalised cross entropy, see [Siu et al. ....

....purpose of confidence scoring. Following [Siu et al. 1997] this could be defined as $p(w_i, c_i = 1 \mid X, W)$ where $c_i$ is the tag assigned to word $w_i$ by the scoring procedure (i.e. DP alignment with Levenshtein metric and/or minimum time overlap between hypothesis and reference as suggested in [Weintraub et al. 1997]). 3.3 Calculating Posterior Probabilities in Lattices. In this section an approach to estimating posterior probabilities in a speech recogniser will be discussed. To make this estimation feasible it will be based on the information present in a lattice produced by a conventional Viterbi MAP ....

Weintraub, M., Beaufays, F., Rivlin, Z., Konig, Y., and Stolcke, A. (1997). Neural-network based measures of confidence for word recognition. In Proc. ICASSP97, pages 887--890.


Dragon Systems' Automatic Transcription Of New Tdt Corpus - Larry Gillick Yoshiko (1998)   (1 citation)  (Correct)

....are a measure of the relative value of the predictors. For comparison, the table also shows an analogous model trained for the Switchboard corpus of conversational telephone speech [3], a corpus which has been the subject of numerous studies in confidence estimation (for example, see [4] [6] in addition to [2]). The Switchboard model was trained from 20 Switchboard conversation halves (totaling only about 9000 words). Note the differences in the relative value of predictors for the two corpora. In particular, while Broadcast News places greatest importance on the normalized acoustic ....

M. Weintraub et al., "Neural-Network Based Measures of Confidence for Word Recognition," Proc. ICASSP97,


Confidence Scoring For Speech Understanding Systems - Pao, Schmid, Glass (1998)   (8 citations)  (Correct)

....of N-best hypotheses. Additionally, recent work in confidence measures suggests that features based on an analysis of the structure of the N-best recognition hypotheses can produce powerful rejection features, such as the A stabil feature [7] or the posterior log probabilities of an N-best list [9]. For our experiments, we defined a word score feature which was based on the fraction of N-best sentences in which a word occurred. 4.2. Linguistic and Application Specific Features. Linguistic features are based on parsing a hypothesis into a syntactic and/or semantic structure, such as a ....

M. Weintraub, F. Beaufays, Z. Rivlin, Y. Konig, and A. Stolcke, "Neural-Network based Measures of Confidence for Word Recognition," in Proc. ICASSP, pp. 887--890, Munich, Germany, 1997.
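
The word score feature mentioned in the Pao, Schmid, Glass context (the fraction of N-best sentences in which a word occurs) can be illustrated with a minimal sketch. The function name, the whitespace tokenization, and the example hypotheses are assumptions made for illustration; the excerpt does not say how the cited system tokenizes or matches words.

```python
def nbest_word_score(word, nbest):
    """Return the fraction of N-best hypotheses that contain `word`.

    `nbest` is a list of hypothesis strings. Splitting on whitespace is an
    assumption of this sketch, not a detail taken from the cited paper.
    """
    if not nbest:
        return 0.0
    hits = sum(1 for hyp in nbest if word in hyp.split())
    return hits / len(nbest)


# Hypothetical 4-best list for one utterance.
nbest = [
    "show me flights to boston",
    "show me flights to austin",
    "show me lights to boston",
    "show flights to boston",
]

score_boston = nbest_word_score("boston", nbest)  # appears in 3 of 4 hypotheses
score_show = nbest_word_score("show", nbest)      # appears in all 4 hypotheses
```

A word that survives in most competing hypotheses gets a score near 1, which is why such a feature can serve as a rejection cue.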


Word and Phone Level Acoustic Confidence Scoring - Kamppari, Hazen (2000)   (8 citations)  (Correct)

....focuses on various features that can be extracted from the output of a phonetic classifier, i.e. features that can be derived from acoustic observations only. This means that features based on language model outputs are not utilized, even though their use has proven to be effective in past work [1, 7]. However, our goal is to develop an accurate acoustic confidence measure which could be combined with features from a language understanding component at a later stage in the processing. 2. IMPLEMENTATION 2.1. Overview In this paper the derivation of a word level acoustic confidence metric is ....

....scores. 2.2. Phone Level Scoring. The acoustic features are primarily based on two common phonetic classification scoring approaches: normalized log likelihood (NLL) scoring and maximum a posteriori probability (MAP) scoring. This work builds on previous work which has dealt with these techniques [7]. The MAP score for a boundary model, $c_i$, given a landmark observation, $x$, is expressed as: $C_{map}(c_i \mid x) = P(c_i \mid x) = \frac{p(x \mid c_i) P(c_i)}{p(x)}$ (1). Similarly the equivalent NLL score is expressed as: $C_{nll}(c_i \mid x) = \log \frac{p(x \mid c_i)}{p(x)}$ (2). To appear in ICASSP 2000, June 5-9, ....

M. Weintraub, F. Beaufays, Z. Rivlin, Y. Konig, and A. Stolcke, "Neural-network based measures of confidence for word recognition," in Proc. ICASSP, Munich, 1997.
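
The MAP and NLL phone-level scores quoted in the Kamppari, Hazen context can be sketched numerically as follows. The excerpt does not say how p(x) is obtained; this sketch assumes it is the sum of p(x | c_j) P(c_j) over all models, and the function name and example numbers are hypothetical.

```python
import math


def map_and_nll(likelihoods, priors, i):
    """Return (C_map, C_nll) for model index `i`.

    likelihoods[j] stands for p(x | c_j) and priors[j] for P(c_j).
    Assumption of this sketch: p(x) = sum_j p(x | c_j) * P(c_j).
    """
    evidence = sum(l * p for l, p in zip(likelihoods, priors))
    c_map = likelihoods[i] * priors[i] / evidence
    c_nll = math.log(likelihoods[i] / evidence)
    return c_map, c_nll


# Two hypothetical models with equal priors.
c_map, c_nll = map_and_nll([0.2, 0.1], [0.5, 0.5], 0)
```

Under this assumption the MAP scores of all models sum to one, while the NLL score is positive whenever a model explains the observation better than the average model does.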


Fuzzy Reasoning in Confidence Evaluation of Speech.. -..   (Correct)

....that the compilation step of the information included in the recognition features is performed by means of a uniting tool based on the development of conditional probabilities. In such a way, Bayesian classifiers [1], linear discriminative analysis [11], decision trees [9] and neural networks [12] have been used as reasoning schemes to compile the involved features. We have built some classifiers based on some of the schemes mentioned to compare their performance against fuzzy systems. 4.1. Bayesian classifier (BC). This is a rather simple classifier that maps recognition features into a ....

....from the covariance matrix of the features derived from some training data. Throughout this paper, this procedure may also be called BC. 4.2. Neural Networks (MLP). Neural networks have been broadly used to combine recognition features into CMs. High performing results have been reported [12] and their advantages over other combination systems have been largely discussed [11]. Network topology is always a delicate issue. Remarkable results have been achieved with multi-layer perceptrons (MLPs) when trained under a back propagation framework. Simpler configurations have been preferred ....

M. Weintraub and F. Beaufays, et al. Neural-network based measure of confidence for word recognition. In Proceedings of 1997 ICASSP, volume II, pages 887--890, Munich, April 1997.


Confidence Measures From Local Posterior Probability Estimates - Williams, Renals (1999)   (1 citation)  (Correct)

....the confidence estimates as values of the test statistic. To determine whether the recognizer output is correct or incorrect, the output was aligned with the transcript. In addition to considering errors due to substitutions and insertions, poor time alignment was also considered to be an error (Weintraub et al. 1997). Specifically, for a segment of the recognition output to be considered well time aligned, an identical reference segment was required with greater than 50% of its duration overlapping with that of the recognition segment and vice versa. An example of good and poor time alignment is schematically ....

....of the recognition segment and vice versa. An example of good and poor time alignment is schematically illustrated in figure 4. Figure 4 (panel labels: REF, HYP; incorrect, correct): A schematic illustration of the 50% overlap criterion used to assess the time alignment of the recognition output (after Weintraub et al. 1997). The results of applying a hypothesis test to the recognition output was recorded in a 2 x 2 confusion matrix, such as that illustrated in figure 1. H 0 was defined to be the hypothesis that a given segment of the recognition output is correct. From such a matrix, the unconditional error ....

Weintraub, M., Beaufays, F., Rivlin, Z., Konig, Y., and Stolcke, A. (1997). Neural-network based measures of confidence for word recognition. In Proc. Int. Conf. Acoustics, Speech and Signal Processing, pages 887--890, Munich.
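
The time-alignment rule described in the Williams, Renals context (an identical reference segment overlapping by more than half of its own duration, and vice versa) can be written as a short check. The tuple layout `(label, start, end)` and the function names are assumptions of this sketch, not details from either cited paper.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length of the time overlap between two intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def well_aligned(hyp, ref, min_frac=0.5):
    """Apply the overlap criterion to one hypothesis/reference pair.

    `hyp` and `ref` are hypothetical (label, start, end) tuples in seconds.
    The pair counts as well aligned only if the labels are identical and the
    overlap exceeds `min_frac` of BOTH segment durations.
    """
    if hyp[0] != ref[0]:
        return False
    hyp_dur = hyp[2] - hyp[1]
    ref_dur = ref[2] - ref[1]
    if hyp_dur <= 0 or ref_dur <= 0:
        return False
    shared = overlap(hyp[1], hyp[2], ref[1], ref[2])
    return shared > min_frac * hyp_dur and shared > min_frac * ref_dur


good = well_aligned(("A", 0.0, 1.0), ("A", 0.2, 1.0))     # 0.8 s shared
poor = well_aligned(("A", 0.0, 1.0), ("A", 0.8, 2.0))     # 0.2 s shared
mismatch = well_aligned(("A", 0.0, 1.0), ("B", 0.0, 1.0))  # different labels
```

Requiring the threshold in both directions prevents a very short hypothesis from counting as correct merely because it falls inside a long reference segment.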


Dynamic Classifier Combination In Hybrid Speech Recognition.. - Kirchhoff, Bilmes (1999)   (3 citations)  (Correct)

....method. Since the use of utterance level confidence values yielded only a modest improvement, we also investigated word level confidence values. In order to estimate word correct/incorrect tags from the recognition output, we utilized several features that are commonly mentioned in the literature [11, 2, 10]:
- the duration of the word ....

[Table residue from the cited paper]
feature set   WER   INS   DEL   SUB   weights
both          5.7   0.8   1.5   3.4   0.9 0.1
both          5.8   0.8   1.6   3.4   0.8 0.2
both          5.8   0.9   1.5   3.4   0.7 0.3
both          5.4   0.7   1.4   3.2   0.6 0.4
MFCC only     5.5   0.9   1.5   3.2   0.9 0.1
MFCC only     5.5   0.9   1.5   3.1   0.8 0.2
MFCC only     5.3   0.9   1.3   3.1   0.7 0.3
MFCC only     5.2   0.9   ....

M. Weintraub, F. Beaufays, Z. Rivlin, Y. Konig, and A. Stolcke. Neural-network based measures of confidence for word recognition. ICASSP, pages 887--890, 1997.


Using Word Probabilities As Confidence Measures - Wessel, Macherey, Schlüter (1998)   (18 citations)  (Correct)

....and n best lists, e.g. [6]. Gillick et al. [3] have estimated and evaluated their confidence measure in the framework of a probabilistic approach, making use of generalized linear models for relating a confidence feature vector directly to the probability of a word to be correct. Weintraub et al. [9] have used artificial neural networks to model the relation between the different features and this probability. The computation of posterior word probabilities in this paper can be seen as an extension of [3], i.e. interpreting confidence as .... This work was partly funded by the European Commission ....

M. Weintraub, F. Beaufays, Z. Rivlin, Y. Konig, A. Stolcke: `Neural-Network Based Measures of Confidence for Word Recognition', in Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing 1997, Munich, Germany, pp. 887-890, April 1997.


Finding Consensus in Speech Recognition: Word Error.. - Mangu, Brill, Stolcke (2000)   (19 citations)  Self-citation (Stolcke)   (Correct)

....given optimal models, the MAP decoder does not necessarily minimize the commonly used performance criterion for recognition, the word error rate (WER). Intuitively, one should maximize word posterior probabilities instead of whole sentence posteriors to minimize WER. Prior work (Stolcke, Konig & Weintraub, 1997) has shown how WER can be explicitly minimized in an N-best rescoring algorithm. That approach is suboptimal because it restricts hypothesis choice to a rather small set compared to the search space of the recognizer. In this paper we present a new word error minimization algorithm that is ....

....hypothesis space. A common technique uses an additional estimator that converts N-best or lattice-based word posterior probabilities to unbiased word correctness probabilities. Logistic regression (Siu, Gish & Richardson, 1997), decision trees (Evermann & Woodland, 2000b) and neural networks (Weintraub, Beaufays, Rivlin, Konig & Stolcke, 1997) have been used for this purpose. These techniques also allow miscellaneous other features and knowledge sources to be incorporated into word confidence estimation. However, it has been found that the recognizer-based word posteriors are usually among the most informative features predicting word ....

[Article contains additional citation context not shown here]

Weintraub, M., Beaufays, F., Rivlin, Z., Konig, Y. & Stolcke, A. (1997). Neural-Network Based Measures of Confidence for Word Recognition. In Proceedings of the IEEE Conference on Acoustics, Speech, and Signal Processing, volume 2, pp. 887--890, Munich.


The SRI March 2000 Hub-5 Conversational Speech.. - Stolcke, Bratt.. (2000)   (2 citations)  Self-citation (Stolcke)   (Correct)

....solution of DMC does not apply; instead, we optimize by gradient descent on a smoothed word error function in the style of GPD [12]. 6. Confidence Estimation. As in previous years, we used a neural network to estimate word correctness probabilities (confidences) from word level features [25]. However, because of time constraints, we limited the number of input features severely. Only the combined word log posteriors from the N-best ROVER system were used, since this measure already constitutes a confidence measure that includes all knowledge sources used by the recognizer. The network ....

M. Weintraub, F. Beaufays, Z. Rivlin, Y. Konig, and A. Stolcke. Neural-network based measures of confidence for word recognition. In Proc. ICASSP, vol. 2, pp. 887--890, Munich, 1997.


Explicit Word Error Minimization In N-Best List Rescoring - Stolcke, König, Weintraub (1997)   (25 citations)  Self-citation (Weintraub Konig Stolcke)   (Correct)

....sophisticated posterior probability estimators, with the potential for larger improvements. Our experiments so far have been based on the commonly used acoustic and language model scores, but we are already experimenting with more complex posterior estimator methods based on neural network models [6]. ....

M. Weintraub, F. Beaufays, Z. Rivlin, Y. Konig, and A. Stolcke. Neural-network based measures of confidence for word recognition. In Proceedings of the IEEE Conference on Acoustics, Speech and Signal Processing, vol. II, pp. 887--890, Munich, 1997.


Finding Consensus Among Words: Lattice-Based Word Error.. - Mangu, Brill, Stolcke   (34 citations)  Self-citation (Stolcke)   (Correct)

....improved wordspotting results when using the lattice based posteriors obtained as described here. A closely related problem is the estimation of word confidence measures for large vocabulary recognizers. The N best based posterior is one of the most informative features for confidence estimation [8]; consequently, we can expect improved results with lattice based posteriors. Conversely, work on confidence measures suggests that other recognizer features can be combined with acoustic and language model scores to yield improved posterior estimates, and therefore fewer word errors. Finally, we ....

M. Weintraub, F. Beaufays, Z. Rivlin, Y. Konig, and A. Stolcke. Neural-network based measures of confidence for word recognition. In Proc. ICASSP, vol. 2, pp. 887--890, Munich, 1997.


Word Level Confidence Annotation using Combinations of Features - Rong Zhang And (2001)   (4 citations)  (Correct)

No context found.

Mitch Weintraub, Francoise Beaufays, et al., "Neural Network Based Measures of Confidence for Word Recognition". Proc. ICASSP-97, Vol. 2, pp. 887-890, 1997.


Confidence Estimation for Machine Translation - Blatz, Fitzgerald, Foster.. (2004)   (3 citations)  (Correct)

No context found.

M. Weintraub, F. Beaufays, Z. Rivlin, Y. Konig, and A. Stolcke. Neural-network based measures of confidence for word recognition. In ICASSP 1997.


Confidence-Scoring Post-Processing for Off-Line.. - Pitrelli, Perrone (2003)   (Correct)

No context found.

Weintraub, M., F. Beaufays, Z. Rivlin, Y. Konig, and A. Stolcke, "Neural-Network Based Measures of Confidence for Word Recognition", Proceedings of ICASSP 1997.


Features for Tree Based Dialogue Course Management - Klaus Macherey And (2003)   (Correct)

No context found.

M. Weintraub, F. Beaufays, Z. Rivlin, Y. Konig, and A. Stolcke, "Neural-Network Based Measures of Confidence for Word Recognition," in Proceedings of ICASSP, vol. 2, pp. 887--890, April 1997.


CiteSeer.IST - Copyright Penn State and NEC