Near-Miss Modeling: A Segment-Based Approach to Speech Recognition (1998)

by J Chang
Results 1 - 10 of 12

A Probabilistic Framework For Segment-Based Speech Recognition

by James R. Glass , 2003
Cited by 108 (33 self)
Most current speech recognizers use an observation space based on a temporal sequence of measurements extracted from fixed-length "frames" (e.g., Mel-cepstra). Given a hypothetical word or sub-word sequence, the acoustic likelihood computation always involves all observation frames, though the mapping between individual frames and internal recognizer states will depend on the hypothesized segmentation. There is another type of recognizer whose observation space is better represented as a network, or graph, where each arc in the graph corresponds to a hypothesized variable-length segment that is represented by a fixed-dimensional "feature". In such feature-based recognizers, each hypothesized segmentation will correspond to a segment sequence, or path, through the overall segment graph that is associated with a subset of all possible feature vectors in the total observation space. In this work we examine a maximum a posteriori decoding strategy for feature-based recognizers and develop a normalization criterion useful for a segment-based Viterbi or A* search. Experiments are reported for both phonetic and word recognition tasks.

Support vector machines for speech recognition

by Aravind Ganapathiraju, Jonathan Hamaker, Joseph Picone - Proceedings of the International Conference on Spoken Language Processing , 1998
Cited by 47 (2 self)
Statistical techniques based on hidden Markov models (HMMs) with Gaussian emission densities have dominated signal processing and pattern recognition literature for the past 20 years. However, HMMs trained using maximum likelihood techniques suffer from an inability to learn discriminative information and are prone to overfitting and over-parameterization. Recent work in machine learning has focused on models, such as the support vector machine (SVM), that automatically control generalization and parameterization as part of the overall optimization process. In this paper, we show that SVMs provide a significant improvement in performance on a static pattern classification task based on the Deterding vowel data. We also describe an application of SVMs to large vocabulary speech recognition, and demonstrate an improvement in error rate on a continuous alphadigit task (OGI Alphadigits) and a large vocabulary conversational speech task (Switchboard). Issues related to the development and optimization of an SVM/HMM hybrid system are discussed.

Modeling Out-Of-Vocabulary Words For Robust Speech Recognition

by Issam Bazzi, James Glass, Arthur C. Smith , 2000
Cited by 43 (5 self)
This thesis concerns the problem of unknown or out-of-vocabulary (OOV) words in continuous speech recognition. Most of today's state-of-the-art speech recognition systems can recognize only words that belong to some predefined finite word vocabulary. When encountering an OOV word, a speech recognizer erroneously substitutes the OOV word with a similarly sounding word from its vocabulary. Furthermore, a recognition error due to an OOV word tends to spread errors into neighboring words, dramatically degrading overall recognition performance.

Real-time telephone-based speech recognition in the jupiter domain

by James R. Glass, Timothy J. Hazen, I. Lee Hetherington , 1999
Cited by 40 (21 self)
This paper describes our experiences with developing a real-time telephone-based speech recognizer as part of a conversational system in the weather information domain. This system has been used to collect spontaneous speech data which has proven to be extremely valuable for research in a number of different areas. After describing the corpus we have collected, we describe the development of the recognizer vocabulary, pronunciations, language and acoustic models for this system, the new weighted finite-state transducer-based lexical access component, and report on the current performance of the recognizer under several different conditions. We also analyze recognition latency to verify that the system performs in real time.

Lexical Modeling Of Non-Native Speech For Automatic Speech Recognition

by Karen Livescu, James Glass , 2000
Cited by 24 (1 self)
This paper examines the recognition of non-native speech in jupiter, a speaker-independent, spontaneous-speech conversational system. Because the non-native speech in this domain is limited and varied, speaker- and accent-specific methods are impractical. We therefore chose to model all of the non-native data with a single model. In particular, this paper describes an attempt to better model non-native lexical patterns. These patterns are incorporated by applying context-independent phonetic confusion rules, whose probabilities are estimated from training data. Using this approach, the word error rate on a non-native test set is reduced from 20.9% to 18.8%. 1. INTRODUCTION Speech recognition accuracy has been observed to be drastically lower for non-native speakers of the target language than for native speakers [3, 13, 14]. Research on both non-native accent modeling and dialect-specific modeling shows that large gains in performance can be achieved when the acoustics [1, 9, 14] and ...

Probabilistic Segmentation for Segment-Based Speech Recognition

by Steven C. Lee , 1998
Cited by 19 (3 self)
Segment-based speech recognition systems must explicitly hypothesize segment start and end times. The purpose of a segmentation algorithm is to hypothesize those times and to compose a graph of segments from them. During recognition, this graph is an input to a search that finds the optimal sequence of sound units through the graph. The goal of this thesis is to create a high-quality, real-time phonetic segmentation algorithm for segment-based speech recognition. A high-quality segmentation algorithm produces a sparse network of segments that contains most of the actual segments in the speech utterance. A real-time algorithm implies that it is fast, and that it is able to produce an output in a pipelined manner. The approach taken in this thesis is to adopt the framework of a state-of-the-art algorithm that does not operate in real-time, and to make the modifications necessary to enable it to run in real-time. The algorithm adopted as the starting point for this work makes use of a for...

Telephone-Based Conversational Speech Recognition in the Jupiter Domain

by James R. Glass, Timothy J. Hazen , 1998
Cited by 19 (6 self)
This paper describes our experiences with developing a telephone-based speech recognizer as part of a conversational system in the weather information domain. This system has been used to collect spontaneous speech data which has proven to be extremely valuable for research in a number of different areas. After describing the corpus we have collected, we describe the development of the recognizer vocabulary, pronunciations, language and acoustic models for this system, and report on the current performance of the recognizer under several different conditions.

Speech Recognition Using Acoustic Landmarks and Binary Phonetic Feature Classifiers

by Amit Juneja , 2003
Cited by 3 (0 self)
In spite of decades of research, Automatic Speech Recognition (ASR) is far from reaching the goal of performance close to Human Speech Recognition (HSR). One of the reasons for the unsatisfactory performance of state-of-the-art ASR systems, which are based largely on Hidden Markov Models (HMMs), is the inferior acoustic modeling of low-level or phonetic-level linguistic information in the speech signal. An acoustic-phonetic approach to ASR, on the other hand, explicitly targets linguistic information in the speech signal. But an acoustic-phonetic system that carries out large ASR speech recognition tasks, for example, connected word or continuous speech recognition, does not exist. We propose a probabilistic and statistical framework for ASR based on the knowledge of acoustic phonetics for connected word ASR. The proposed system is based on the idea of representation of speech sounds by bundles of binary-valued articulatory phonetic features. The probabilistic framework requires only binary classifiers of phonetic features and the knowledge-based acoustic correlates of the features for the purpose of connected word speech recognition. We explore the use of Support Vector Machines (SVMs) for binary phonetic feature classification because of the favorable properties well suited to our recognition task that SVMs offer. In the proposed method, probabilistic segmentation of speech is obtained using SVM-based classifiers of manner phonetic features. The linguistically motivated landmarks obtained in each segmentation are used for classification of source and place phonetic features. Probabilistic segmentation paths are constrained using Finite State Automata (FSA) for isolated or connected word recognition. The proposed method could overcome the disadvantages ...

Discriminative training of Acoustic Models in a Segment-Based Speech Recognizer

by I. Lee Hetherington, Eric D. Sandness , 2000
Cited by 1 (0 self)
This thesis explores the use of discriminative training to improve acoustic modeling in a segment-based speech recognizer. In contrast with the more commonly used Maximum Likelihood training, discriminative training considers the likelihoods of competing classes when determining the parameters for a given class's model. Thus, discriminative training works directly to minimize the number of errors made in the recognition of the training data.

Heterogeneous Measurements And Multiple Classifiers For Speech Recognition

by Andrew K. Halberstadt, James R. Glass , 1998
This paper addresses the problem of acoustic-phonetic modeling. First, heterogeneous acoustic measurements are chosen in order to maximize the acoustic-phonetic information extracted from the speech signal in preprocessing. Second, classifier systems are presented for successfully utilizing high-dimensional acoustic measurement spaces. The techniques used for achieving these two goals can be broadly categorized as hierarchical, committee-based, or a hybrid of these two. This paper presents committee-based and hybrid approaches. In context-independent classification and context-dependent recognition on the TIMIT core test set using 39 classes, the system achieved error rates of 18.3% and 24.4%, respectively. These error rates are the lowest we have seen reported on these tasks. In addition, experiments with a telephone-based weather information word recognition task led to word error rate reductions of 10-16%.