M. J. Hunt, M. Lennig, and P. Mermelstein, "Experiments in syllable-based recognition of continuous speech," in IEEE Int. Conf. on Acoustics, Speech and Signal Processing, Denver, CO, April 1980, pp. 880-883.

This paper is cited in the following contexts:
Context-Dependent Acoustic Modeling Using Graphemes For Large.. - Kanthak, Ney (2002)   (8 citations)

....lexicon. 1. INTRODUCTION In large vocabulary speech recognition, to satisfy the need for scalable vocabularies and to overcome the sparse training data problem, words are most commonly built from acoustic sub-word units. Widely used sub-word units are phonemes [1], polyphones [2] and syllables [3]. All these approaches use pronunciation lexica which provide a mapping from words to sequences of sub-word units. In general, best recognition results are obtained with pronunciation lexica that are manually designed and tuned, which is a time-consuming task. Additionally, context-dependent ....

M. J. Hunt, M. Lennig, and P. Mermelstein, "Experiments in syllable-based recognition of continuous speech," in IEEE Int. Conf. on Acoustics, Speech and Signal Processing, Denver, CO, April 1980, pp. 880-883.


3D Graphics Tools For Sound Collections - George Tzanetakis

.... a correlate of brightness), Spectral Flux (a measure of the time variation of the signal), Spectral Rolloff (a measure of the shape of the spectrum) and RMS energy (a correlate of loudness). Other features are Linear Prediction Coefficients (LPC) and Mel Frequency Cepstral Coefficients (MFCC) [2], which are perceptually motivated features used in speech processing and recognition. In addition, features directly computed from MPEG compressed audio can be used [3, 4]. Information about the dynamic range of the features is obtained by statistically analysing the dataset of interest. ....

M. Hunt, M. Lennig, and P. Mermelstein, "Experiments in syllable-based recognition of continuous speech," in Proc. 1996.


Multifeature Audio Segmentation For Browsing And Annotation - George Tzanetakis (1999)

....speech analysis algorithms to be based on features computed on a frame basis. This is necessary to reduce the amount of data to be processed as well as the variability. These features can be thought of as a short term description of the sound for that particular moment in time. For example MFCC [8] (Mel Frequency Cepstral Coefficients) characterize the vocal tract resonances and are commonly used in speech recognition. Since our methodology is based on frame based features it is easy to use existing front ends of other applications and extend them with segmentation. 2.5. A specific scheme ....

M. Hunt, M. Lennig, and P. Mermelstein, "Experiments in syllable-based recognition of continuous speech," in Proc. 1996.


A Framework for Audio Analysis Based on Classification and.. - Tzanetakis, Cook (1999)   (1 citation)

....Harmonicity is a measure of how strong the pitch perception for a sound is [23]. It can also be used for voiced/unvoiced detection. 5 Mel Frequency Cepstral Coefficients (MFCC) are commonly used in speech recognition [10, 18]. They are a perceptually motivated compact representation of the spectrum [9]. Linear prediction (LPC) reflection coefficients are used in speech research as an estimate of the speech vocal tract filter [11]. Other features supported include Zero Crossings, RMS, Spectral Rolloff and others. For all these features means, variances and higher order statistics over larger time ....

M Hunt, M Lennig, and P Mermelstein. Experiments in syllable-based recognition of continuous speech. In Proc. 1996.


Automatic Musical Genre Classification Of Audio Signals - Tzanetakis, Essl (2001)   (25 citations)

....seconds = 22500 seconds (i.e., 6.25 hours of audio). For the Musical Genres (Classical, Country, ...) the combined feature set described in this paper was used. For the Classical Genres (Orchestra, Piano, ...) and for the Speech Genres (MaleVoice, FemaleVoice, ...) mel frequency cepstral coefficients [21] (MFCC) were used. MFCC are perceptually motivated features commonly used in speech recognition research. In a similar fashion to the Music Surface features, the means and standard deviations of the first five MFCC over a larger texture window (1 second long) were calculated. MFCCs can also be ....

Hunt, M., Lennig, M., and Mermelstein, P. Experiments in syllable-based recognition of continuous speech. In Proceedings of International Conference on Acoustics, Speech and Signal Processing, 1996, 880-883.
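
The texture-window statistics mentioned in the excerpt above (means and standard deviations of the first five MFCCs over a longer window) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name, the non-overlapping windows, the `frames_per_window` parameter, and the use of the population standard deviation are all hypothetical choices, not details taken from the cited paper.

```python
from statistics import fmean, pstdev

def texture_window_stats(mfcc_frames, frames_per_window, n_coeffs=5):
    """Mean and standard deviation of the first n_coeffs MFCCs per texture window.

    mfcc_frames: list of per-frame MFCC vectors (lists of floats).
    frames_per_window: hypothetical number of analysis frames in one window.
    Returns one vector per full window: n_coeffs means, then n_coeffs std devs.
    """
    features = []
    last_start = len(mfcc_frames) - frames_per_window
    # Assumption: windows do not overlap; a trailing partial window is dropped.
    for start in range(0, last_start + 1, frames_per_window):
        window = mfcc_frames[start:start + frames_per_window]
        # Gather each of the first n_coeffs coefficients across the window.
        columns = [[frame[k] for frame in window] for k in range(n_coeffs)]
        means = [fmean(col) for col in columns]
        stds = [pstdev(col) for col in columns]  # population std (assumption)
        features.append(means + stds)
    return features
```

With a hypothetical frame rate of 100 frames per second, a 1-second texture window would correspond to `frames_per_window=100`.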


Automatic Musical Genre Classification Of Audio Signals - Computer (2001)   (25 citations)

....of audio. For the Musical Genres (Classical, Country, ...) the combined feature set described in this paper was used. For the Classical Genres (Orchestra, Piano, ...) only the Music Surface features were used and for the Speech Genres (MaleVoice, FemaleVoice, ...) mel frequency cepstral coefficients [16] (MFCC) were used. MFCC are features commonly used in speech recognition research. [Fig. II. Genre Classification Hierarchy. Music: Classical, Country, Disco, HipHop, Jazz, Rock; Classical: Orchestra, Piano, Choir, StringQuartet; Speech: MaleVoice, FemaleVoice, SportsAnnouncing] Table 1. Classification accuracy ....

....most of the time a rap song will trigger Male Voice, Sports Announcing and HipHop. This exact case is shown in Figure IV. GenreSpace is a tool for visualizing large sound collections for browsing. Each audio file is represented as a single point in a 3D space. Principal Component Analysis (PCA) [16] is used to reduce the dimensionality of the feature vector representing the file to the 3-dimensional feature vector corresponding to the point coordinates. Coloring of the points is based on the automatic genre classification. The user can zoom, rotate and scale the space to interact with the ....

Hunt, M., Lennig, M., and Mermelstein, P. Experiments in syllable-based recognition of continuous speech. In Proceedings of International Conference on Acoustics, Speech and Signal Processing, 1996, 880-883.


Recognition of Consonant-Vowel (CV) Utterances Using Modular Neural .. - Rao (2000)

....accent, word pronunciation etc. and give a high recognition accuracy. The phonemes are context sensitive because a phoneme is significantly affected by its adjacent phonemes [7]. However, there are only a small number of phonemes. Larger units such as diphones, demisyllables [12], syllables [13] have been used for better characterization of coarticulation between adjacent sounds [3,14]. However, they have long duration and are large in number. In the following sections, we review the approaches to continuous speech recognition using the subword unit models. 2.2 Segmentation based ....

M. J. Hunt, M. Lennig and P. Mermelstein, "Experiments in syllable-based recognition of continuous speech," in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, Denver, 1980.


Audio Information Retrieval (AIR) Tools - Tzanetakis, Cook (2000)   (4 citations)

....spectral content of the chunk. Many different features have been proposed in the literature. Our system supports features based on: FFT (Fast Fourier Transform) analysis, MPEG filterbank analysis [15, 8], LPC (Linear Predictive Coding) [6], MFCC (Mel Frequency cepstrum coefficients) [3]. In the majority of the literature in audio analysis a combination of these types of features is used. Another frequently used technique is to compute the means and variances of these features over a larger time window to obtain more smoothly changing features. In addition derivatives of the ....

M. Hunt, M. Lennig, and P. Mermelstein. Experiments in syllable-based recognition of continuous speech. In Proc. ICASSP, 1980.


Integrating Syllable Boundary Information Into Speech.. - Wu, Shire, Greenberg.. (1997)   (12 citations)

....as well as a model by one of the authors [6] suggests that the syllable is a basic perceptual unit for speech processing in humans. The syllable was proposed as a basic unit of automatic (computer) speech recognition as early as 1975 [7, 8] and this idea has been periodically reexamined (e.g. in [9, 10, 11, 12, 13]). The syllabic level confers several potential benefits; for one, syllabic boundaries are more precisely defined than phonetic segment boundaries in both the speech waveform and in spectrographic displays. Additionally, the syllable may serve as a natural organizational unit useful for reducing ....

....(CV), vowel-consonant (VC), V, or CVC varieties. These structural regularities can, in principle, be exploited to reliably estimate syllabic boundaries. Previous research on detecting syllable boundaries and using this information to improve recognition accuracy is reported for English [8, 9, 10] and for German [12, 13]. In this communication we describe a perceptually oriented method for the automatic delineation of syllabic onsets. Artificial neural networks (NNs) are used to classify both phonetic segments and potential syllabic onsets. In a departure from previous research, we focus ....

M.J. Hunt, M. Lennig, and P. Mermelstein. Experiments in syllable-based recognition of continuous speech. In ICASSP, volume 3, pages 880--883, Denver, Colorado, April 1980. IEEE.


Baby Ears: A Recognition System for Affective Vocalization - Slaney, McRoberts (1998)   (4 citations)

....and the mean of the absolute delta pitch. The mean delta pitch is similar to the slope measurement. When either frame's pitch is undefined, because it is unvoiced, the delta pitch measures are undefined and do not enter into the calculation. We used mel frequency cepstral coefficients (MFCC) [5] to measure the formant information in the speech. MFCC parameters are often used in speech recognition as a simple measure of what is being said. We wanted to investigate whether the speed with which these parameters changed would be a useful feature. Thus, we measured the mean frame-by-frame ....

M. J. Hunt, M. Lennig, P. Mermelstein. "Experiments in syllable-based recognition of continuous speech." Proceedings of 1980 ICASSP, Denver, CO, pp. 880-883, 1980.
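
The delta-pitch rule quoted in the excerpt above (a delta is skipped whenever either neighbouring frame is unvoiced) might be sketched as below. This is a minimal illustration, not the authors' code: the function name and the convention of marking unvoiced frames with `None` are assumptions made here.

```python
def delta_pitch_stats(pitch):
    """Mean delta pitch and mean absolute delta pitch over voiced frame pairs.

    pitch: per-frame pitch values; None marks an unvoiced frame (assumed
    convention). Returns (mean_delta, mean_abs_delta), or (None, None) when
    no pair of adjacent voiced frames exists.
    """
    # A delta is defined only when both neighbouring frames are voiced.
    deltas = [
        cur - prev
        for prev, cur in zip(pitch, pitch[1:])
        if prev is not None and cur is not None
    ]
    if not deltas:
        return None, None
    mean_delta = sum(deltas) / len(deltas)
    mean_abs_delta = sum(abs(d) for d in deltas) / len(deltas)
    return mean_delta, mean_abs_delta
```

For example, `delta_pitch_stats([100.0, 110.0, None, 120.0, 115.0])` uses only the two pairs (100, 110) and (120, 115); the pairs that touch the unvoiced frame do not enter the averages.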


The 1997 Abbot System For The Transcription Of Broadcast News - Cook Robinson

....Syllable Boundary Information This section reports experiments aimed at improving recognition accuracy by incorporating syllable boundary information during search. Previous research on detecting syllable boundaries and using this information to improve recognition accuracy has been reported [7, 8]. In this work we use the method of Wu et al. [7]. 3.1. Detecting Syllable Boundaries The broadcast news training data does not include syllable boundary or phonetic alignment information. An automatic procedure for determining syllable boundaries is therefore required. The method used in this ....

M.J. Hunt, M. Lennig, and P. Mermelstein. Experiments in Syllable-based Recognition of Continuous Speech. International Conference on Acoustics, Speech, and Signal Processing, 3:880--883, April 1980. Denver, Colorado.


Transcribing Broadcast News With The 1997 Abbot System - Gary Cook

....SYLLABLE BOUNDARY INFORMATION This section reports experiments aimed at improving recognition accuracy by incorporating syllable boundary information during search. Previous research on detecting syllable boundaries and using this information to improve recognition accuracy has been reported [18, 6]. In this work we use the method of Wu et al. [18]. 5.1. Detecting Syllable Boundaries The broadcast news training data does not include syllable boundary or phonetic alignment information. An automatic procedure for determining syllable boundaries is therefore required. The method used in this ....

M.J. Hunt, M. Lennig, and P. Mermelstein. Experiments in Syllable-based Recognition of Continuous Speech. International Conference on Acoustics, Speech, and Signal Processing, 3:880--883, April 1980. Denver, Colorado.


In This Chapter, - Discuss The (1997)

....will learn what has made the perceptual system successful. We wish to discover which attributes of the perceptual representation are important and should be incorporated into the top-down systems. Clearly the mel frequency cepstral coefficient (MFCC) representation in the speech-recognition world (Hunt et al. 1980) is one such win for perception science. Likewise, those of us who design pure audition systems need to acknowledge all the top-down information that we are ignoring in the pursuit of our sound understanding systems. Much information is processed without regard to high level representations. We ....

Hunt, M. J., Lennig, M., & Mermelstein, P. (1980). Experiments in syllable-based recognition of continuous speech. Proceedings of the 1980 ICASSP, Denver, CO, pp. 880--883.


Construction And Evaluation Of A Robust Multifeature.. - Scheirer, Slaney (1997)   (59 citations)

....the variance of the derivative, the third central moment, the thresholded value, and a skewness measure. The features used in this system are: 4 Hz modulation energy: Speech has a characteristic energy modulation peak around the 4 Hz syllabic rate [3]. We use a portion of the MFCC algorithm [4] to convert the audio signal into 40 perceptual channels. We extract the energy in each band, bandpass filter each channel with a second order filter with a center frequency of 4 Hz, then calculate the short term energy by squaring and smoothing the result. We normalize each channel's 4 Hz energy ....

M. J. Hunt, M. Lennig, and P. Mermelstein. Experiments in syllable-based recognition of continuous speech. In Proc. 1980 ICASSP, pages 880--883, 1980.
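
The per-channel steps named in the excerpt above (band-pass filter the band-energy track around 4 Hz with a second-order filter, then square and smooth) could look roughly like the sketch below. Everything specific here is an assumption rather than a detail from the cited paper: the function name, the standard biquad band-pass design with unit peak gain, the `q` value, and the one-pole smoother with coefficient `smooth`. The normalization step is truncated in the excerpt and is not reproduced.

```python
import math

def modulation_energy_4hz(band_energy, frame_rate, center_hz=4.0, q=1.0, smooth=0.9):
    """Short-term modulation energy near center_hz for one channel.

    band_energy: per-frame energy of one perceptual channel.
    frame_rate: frames per second of that energy track.
    q, smooth: hypothetical filter sharpness and smoothing coefficient.
    """
    # Second-order band-pass (biquad) coefficients, unit gain at center_hz.
    w0 = 2.0 * math.pi * center_hz / frame_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b0, b2 = alpha / a0, -alpha / a0          # b1 is zero for this design
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0

    x1 = x2 = y1 = y2 = 0.0
    smoothed = 0.0
    out = []
    for x in band_energy:
        # Direct-form difference equation of the band-pass filter.
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        # Square, then smooth with a one-pole low-pass.
        smoothed = smooth * smoothed + (1.0 - smooth) * y * y
        out.append(smoothed)
    return out
```

Under these assumptions, an energy track that fluctuates at about 4 Hz yields a much larger output than one fluctuating at 20 Hz, which is the property the feature relies on.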


Analyzing And Improving Statistical Language Models For Speech.. - Ueberla (1994)   (2 citations)

....based language models ([105]) and phone based language models ([96], [130]) or to models like [47] where the language model is made dependent on the state of a LR parser. In general, speech recognition systems have been based on phonemes ([142]), diphones ([103], [22], [1], [133]), syllables ([55], [154], [44]), demi syllables ([127], [124]) and disyllables ([137]). Language models can be built on all of these levels and the idea proposed here is applicable to all of them. As an example, we will show how the idea of identifying weaknesses can be applied to a syllable based language model. ....

M. Hunt, M. Lennig, and P. Mermelstein. Experiments in syllable-based recognition of continuous speech. In Proceedings of International Conference on Acoustics, Speech and Signal Processing, pages 880--883. Denver, CO, 1980.


Incorporating Information From Syllable-length Time Scales Into.. - Wu (1998)   (20 citations)

....1990s. Hunt et al. performed a pilot experiment in which they incorporated syllables into the recognition of a small vocabulary American English task by first attempting to segment the input speech signal into syllabic intervals using what the authors called the loudness contour of the waveform [92, 91]. This syllable-based system attempted to estimate syllable boundaries, then formed recognized syllable sequences into words and sentences. In this system, Mermelstein's automatic segmentation system assessed syllable boundaries from a loudness function computed over the entire power spectrum ....

....most closely related projects are discussed in further detail below. In the mid 1970s, Mermelstein described a method for the automatic segmentation of speech into syllabic units using a loudness criteria [129]. Hunt, Lennig and Mermelstein incorporated this method into a speech recognition system [92, 91]. As mentioned in Chapter 2, Mermelstein calculated their loudness function over the entire power spectrum. In their recognition experiments they used a single speaker for both training and testing the system. The test set comprised the same word sequences as the training set, re-recorded by the ....

M.J. Hunt, M. Lennig, and P. Mermelstein. Experiments in syllable-based recognition of continuous speech. In ICASSP, volume 3, pages 880--883, Denver, Colorado, April 1980. IEEE.
