Glass, J., Chang, J., and McCandless, M. A Probabilistic Framework for Feature-based Speech Recognition. In Proceedings of ICSLP '96 (Philadelphia, PA, October 1996), vol. 4, pp. 2277-2280.
....finite-state transducers, each of which models different variations resulting from different underlying causes. 2. OVERVIEW 2.1. Segment-Based Recognition The experiments presented in this paper use the SUMMIT speech recognition system. SUMMIT uses a segment-based approach for acoustic modeling [3]. This approach differs from the standard hidden Markov modeling (HMM) approach in that the acoustic-phonetic models are compared against pre-hypothesized variable-length segments instead of fixed-length frames. While HMM systems allow multiple frames to be absorbed by a single phoneme model via ....
J. Glass, J. Chang, and M. McCandless, "A probabilistic framework for feature-based speech recognition," in Proc. ICSLP, Philadelphia, PA, October 1996.
....Inc. TellMe Networks Inc. and Philips Electronics NV. In addition to the above commercial products, several academic institutions also have developed speech recognition systems. Among them are the Sphinx system [4] developed at Carnegie Mellon University and the SUMMIT system developed at MIT [5]. 1.3.2 Speech interface Speech interface is an active research area. State-of-the-art speech recognition and understanding systems have made speech-aware applications practical. However, designing a good voice interface for a given application remains a challenge. A number of approaches for ....
J. Glass, J. Chang, and M. McCandless, "A Probabilistic Framework for Feature-Based Speech Recognition", Proc. ICSLP 96, pp. 2277-2280, Philadelphia, PA, October 1996.
....the worth of the new vocabulary entries. The following sections detail how to eliminate vocabulary items the recognizer finds little use for, and how to detect and resolve competition between similar items. Extracting OOV phone sequences The recognizer is the one developed by the SLS group at MIT [8]. The recognizer used the OOV model developed by Bazzi in [3]. This model can match an arbitrary sequence of phones, and has a phone bigram to capture phonotactic constraints. The OOV model is placed in parallel with the models for the words in the vocabulary. A cost parameter can control how much ....
J. Glass, J. Chang, and M. McCandless. A probabilistic framework for feature-based speech recognition. In Proc. International Conference on Spoken Language Processing, pages 2277--2280, 1996.
....further improvements harder to achieve. Pause model: The pause model (Pau) was a standard backoff trigram model predicting only three levels of pause duration: This is a loose approximation since the features are computed using the segmentation information associated with W. The work in [10] proposes a solution to this problem, but for this study we take the independence assumption. The algorithm is implemented in the SRILM Toolkit, available from http://www.speech.sri.com/projects/srilm. Table 1. Word error rates of the MAP hypothesis using rescoring of N-best hypotheses ....
....of the independence assumptions made in our models, and will try to make use of more prosodic features. Currently, we ignore the fact that the prosodic features computed are dependent on the current hypothesis, which makes the likelihood scores of different hypotheses not entirely comparable. In [10] a normalization anti-phone model is proposed to account for this error in probability estimation. Also we plan to explore more of the interaction between prosodic features like pitch and energy with the word sequence. In the current work such features were modeled only indirectly through their ....
J. Glass, J. Chang, and M. McCandless, "A probabilistic framework for feature-based speech recognition," in Proc. ICSLP, H. T. Bunnell and W. Idsardi, Eds., Philadelphia, Oct. 1996, vol. 4, pp. 2277--2280.
....a frame rate of 10 ms. From the MFCCs, 112-dimension input feature vectors were created by concatenating averages of MFCCs from eight different segments surrounding the current frame. Principal components analysis was then used to reduce the dimensionality of these feature vectors to 50 dimensions [4]. Global GMM models were then trained for each speaker using all non-silence frames in their enrollment data. In Proceedings of the 7th International Conference on Spoken Language Processing, Sep. 16-20, 2002, Denver, Colorado, pp. 1337-1340. Phonetically Structured GMM Speech Frames Phone ....
....is limited to approximately 50 variable-length utterances per speaker. The total amount of training data per speaker ranges from 30 seconds to 90 seconds of actual speech. 3.2. Experimental Conditions For both corpora we used domain-dependent implementations of the MIT SUMMIT speech recognizer [4]. On the YOHO data set, the vocabulary was limited to allow only the set of possible numerical combination-lock phrases. On the MERCURY data set, the recognizer was limited to a 2200-word vocabulary for conversational queries regarding airline travel. Empirically determined parameters such as ....
J. Glass, J. Chang, and M. McCandless, "A probabilistic framework for feature-based speech recognition," in Proc. ICSLP, Philadelphia, Oct. 1996, pp. 2277--2280.
....finite-state transducers, each of which models different variations resulting from different underlying causes. 2. OVERVIEW 2.1. Segment-Based Recognition The experiments presented in this paper use the SUMMIT speech recognition system. SUMMIT uses a segment-based approach for acoustic modeling [3]. This approach differs from the standard hidden Markov modeling (HMM) approach in that the acoustic-phonetic models are compared against pre-hypothesized variable-length segments instead of fixed-length frames. While HMM systems allow multiple frames to be absorbed by a single phoneme model via ....
J. Glass, J. Chang, and M. McCandless, "A probabilistic framework for feature-based speech recognition," in Proc. ICSLP, Philadelphia, PA, October 1996.
....perhaps acquired from cooperative speech, and bootstrap from it to identify further candidate vocabulary items drawn from arbitrary speech in an unsupervised manner. We show how to cast this process in a form that can be largely implemented using a conventional speech recognition system [8], even though such systems are designed with very different applications in mind. This is advantageous since, after decades of research, such systems are expert at making acoustic judgments in a probabilistically sound way from acoustic, phonological, and language models. Keywords: speech ....
....the new vocabulary entries. The following sections detail how to eliminate vocabulary items the recognizer finds little use for, and how to detect and resolve competition between similar items. Extracting OOV phone sequences We use the speech recognition system developed by the SLS group at MIT [8]. The recognizer is augmented with the OOV model developed by Bazzi in [2]. This model can match an arbitrary sequence of phones, and has a phone bigram to capture phonotactic constraints. The OOV model is placed in parallel with the models for the words in the vocabulary. A cost parameter can ....
J. Glass, J. Chang, and M. McCandless. A probabilistic framework for feature-based speech recognition. In Proc. International Conference on Spoken Language Processing, pages 2277--2280, 1996.
....the framework. Figure 1-1 shows a typical GALAXY configuration consisting of a central programmable hub, which handles the communications among various human language technology services via a frame data structure. The audio server captures a user's speech and sends it to the speech recognizer [16], which creates a word graph of hypotheses. This information is dispatched to the language understanding service, via the hub, where the word graph is parsed by TINA [40]. The best hypothesis is encoded in a semantic frame representation and 2 Conversational system and spoken dialogue system will ....
J. Glass, J. Chang, and M. McCandless. A probabilistic framework for feature-based speech recognition. In Proc. Intl. Conf. on Spoken Language Processing, pages 2277-2280, Philadelphia, Pennsylvania, 1996.
No context found.
J. Glass, J. Chang, and M. McCandless, "A Probabilistic Framework for Feature-Based Speech Recognition," in Proc. ICSLP, Philadelphia, Pennsylvania, 1996.
....In the case of substitutions, measurements are extracted from the outermost portions of a speech segment. For concatenations, measurements are extracted about a boundary up to an arbitrary distance away, because the location of the next boundary on either side is unknown. As in the SUMMIT system [50], a segment is divided into roughly thirds (in a 3:4:3 ratio) with the outermost portions being the first and last 30% of a segment. In the lower part of Figure 4-3 these 30% portions are bounded by dotted lines and form regions over which prototype measurements are averaged. Notice that the ....
J. Glass, J. Chang, and M. McCandless, "A probabilistic framework for feature-based speech recognition," in Proc. ICSLP '96, Philadelphia, PA, Oct. 1996, pp. 2277--2280.
....The notation #[#] refers to a unit, #, with # on its right side. Similarly, the notation #]# refers to a unit, #, with # on its left side. As in the SUMMIT system, we form the observation vector from a telescoping average of mel-frequency cepstral coefficients on either side of the boundary [7]. Fig. 2. Graphical representation of boundary measurements. As we will see in the next section, we collect second-order statistics (i.e. mean and covariance) on measurements observed across boundaries. Note that with the above definitions, joint statistics (x # # ) can be ....
J. Glass, J. Chang, and M. McCandless, "A probabilistic framework for feature-based speech recognition," Proc. ICSLP, IV:2277-2280, Philadelphia, 1996.
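The 3:4:3 division described in the snippet above can be sketched in Python. This is a minimal sketch, assuming each segment is a list of per-frame measurement vectors (for example MFCC frames) and that region boundaries are rounded to whole frames. The function names are hypothetical; the sketch also averages the middle 40% region (the snippet only mentions the outer 30% regions), and it does not attempt to reproduce the full set of measurements actually used in SUMMIT.

```python
from typing import List, Sequence


def region_mean(frames: List[Sequence[float]]) -> List[float]:
    """Element-wise mean of a non-empty list of equal-length frame vectors."""
    n = len(frames)
    return [sum(frame[d] for frame in frames) / n for d in range(len(frames[0]))]


def three_region_averages(frames: List[Sequence[float]]) -> List[float]:
    """Hypothetical helper: average a segment over 30% / 40% / 30% regions.

    Each outer region covers round(0.3 * n) frames (at least one), the middle
    region covers whatever remains, and the three region means are
    concatenated into one vector whose length does not depend on n.
    """
    n = len(frames)
    if n < 3:
        raise ValueError("a segment needs at least 3 frames for three regions")
    k = max(1, round(0.3 * n))  # frames in each outer (30%) region
    first, middle, last = frames[:k], frames[k:n - k], frames[n - k:]
    return region_mean(first) + region_mean(middle) + region_mean(last)


if __name__ == "__main__":
    # A 10-frame segment of 1-dimensional measurements 0.0 .. 9.0.
    segment = [[float(i)] for i in range(10)]
    print(three_region_averages(segment))  # [1.0, 4.5, 8.0]
```

Concatenating region means gives every segment the same dimensionality regardless of its duration, which is one reason this kind of averaging suits classifiers that score variable-length segments with fixed-size feature vectors.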
....output language of the current query. This means that a user can carry on a dialogue in mixed languages, with the system producing the appropriate responses to each query. 2.1. Speech Recognition Speech recognition for the MOKUSEI system is performed using the SUMMIT speech recognition system [7]. Currently the recognizer uses a vocabulary of 1,151 words relevant to the weather domain. A majority of these words are names of geographic locations and words describing various weather conditions. A phonetic pronunciation for each word has been created using a standard set of Japanese phonetic ....
J. Glass, J. Chang, and M. McCandless, "A probabilistic framework for feature-based speech recognition," in Proc. ICSLP, 1996, pp. 2277--2280.
....are off a path. Recently, we have recognized the necessity of accounting for the entire graph of segments. To this end, we have developed a segment-based framework called anti-phone modeling based on the idea that an off-path segment is not a phone and therefore can be modeled as an anti-phone [22]. Anti-phone modeling maintains the probabilistic framework by normalizing all paths to implicitly account for all segments. However, anti-phone modeling requires all off-path segments to be modeled by a single anti-phone model even though off-path segments can vary greatly with context. For ....
....models each segment on the segment sequence using a lexical unit, u_i, and models all remaining off-path segments using a single 0-state unit, u. Function The use of the 0-state unit in near-miss modeling is functionally equivalent to the use of the anti-phone unit in anti-phone modeling [22]. Since near-miss modeling and anti-phone modeling maintain the same probabilistic framework and use the same modeling strategy, they will find the same best path, as long as they search the same space. Further, the two strategies find the same best path, even with pruning, as long as they use the ....
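The normalization that the snippets above attribute to anti-phone modeling can be written out as a short derivation. The notation is assumed here rather than copied from [22]: $x_i$ is the measurement vector of segment $i$ in a graph of $N$ segments, $S$ is the set of segments on a candidate path with labels $\alpha_i$, $\bar{\alpha}$ is the single anti-phone model, and segments are treated as conditionally independent.

```latex
\begin{align}
p(X \mid S, A)
  &= \prod_{i \in S} p(x_i \mid \alpha_i) \prod_{j \notin S} p(x_j \mid \bar{\alpha}) \\
  &= \underbrace{\prod_{j=1}^{N} p(x_j \mid \bar{\alpha})}_{\text{same for every path}}
     \; \prod_{i \in S} \frac{p(x_i \mid \alpha_i)}{p(x_i \mid \bar{\alpha})}
\end{align}
```

Because the first product does not depend on the path, competing paths can be ranked by the on-path likelihood ratios alone, so every segment in the graph is accounted for without scoring the off-path segments explicitly.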
J. Glass, J. Chang, and M. McCandless. A probabilistic framework for feature-based speech recognition. In Proceedings of the International Conference on Spoken Language Processing, pages 2277--2280, 1996.
No context found.
Glass, J., Chang, J., and McCandless, M. A Probabilistic Framework for Feature-based Speech Recognition. In Proceedings of ICSLP '96 (Philadelphia, PA, October 1996), vol. 4, pp. 2277-2280.
No context found.
Glass J., Chang J., and McCandless M. (1996): "A probabilistic framework for feature-based speech recognition," Proc. ICSLP '96, pp. 2277-2280.
No context found.
Glass, J., J. Chang, and M. McCandless (1996). A probabilistic framework for feature-based speech recognition. In ICSLP'96, Philadelphia, USA, pp. 2277--2280.
No context found.
J. Glass, J. Chang, and M. McCandless. A probabilistic framework for feature-based speech recognition. In Proc. Intl. Conf. on Spoken Language Processing, pages 2277--2280, Philadelphia, October 1996.
No context found.
J. Glass, J. Chang, and M. McCandless. 1996. A probabilistic framework for feature-based speech recognition. In Proc. ICSLP, pages 2277--2280, Philadelphia, PA.
No context found.
J. Glass, J. Chang and M. McCandless, "A probabilistic framework for feature-based speech recognition," in Proc. Inter. Conf. on Spoken Language Proc., pp. 2277-2280, 1996.
No context found.
J. Glass, J. Chang, and M. McCandless, "A Probabilistic Framework for Feature-Based Speech Recognition," Proc. ICSLP 96, pp. 2277-2280, Philadelphia, PA, October 1996.
No context found.
J. Glass, J. Chang, and M. McCandless, "A probabilistic framework for feature-based speech recognition," Proc. of ICSLP, Philadelphia, 1996.
No context found.
James Glass, Jane Chang, and Michael McCandless, "A Probabilistic Framework For Feature-Based Speech Recognition," Proc. of Int. Conf. of Spoken Language Processing, Philadelphia, PA, pp. 2277-2280, 1996.
No context found.
J. Glass, J. Chang, M. McCandless, "A probabilistic framework for feature-based speech recognition," Proc. ICSLP '96, Philadelphia, PA, pp. 2277--2280, October, 1996.
No context found.
J. Glass, J. Chang, and M. McCandless, "A probabilistic framework for feature-based speech recognition," in Proc. ICSLP, H. T. Bunnell and W. Idsardi, Eds., Philadelphia, Oct. 1996, vol. 4, pp. 2277--2280.
No context found.
J. Glass, J. Chang, and M. McCandless. A probabilistic framework for feature-based speech recognition. In Proc. ICSLP, pages 2277--2280, 1996.
CiteSeer.IST - Copyright Penn State and NEC