Lynn D. Wilcox and Marcia A. Bush, "HMM based wordspotting for voice editing and indexing," in Proceedings of EUROSPEECH 91, 1991, pp. 25--28, Genova, Italy.
....spotting consists in detecting a more or less important set of keywords from the speech stream. This process gives the exact time position of a keyword. Word spotting systems based on hidden Markov models are considered more efficient at modeling arbitrary speech than template based systems [20, 21]. Two main approaches are found in the literature. The most obvious is to use a large vocabulary continuous speech recognition system (LVCSR) to produce a word string. Then, search algorithms are applied for keyword detection in that string [22]. This approach is considered as giving the best ....
Lynn D. Wilcox and Marcia A. Bush, "HMM based wordspotting for voice editing and indexing," in Proceedings of EUROSPEECH 91, 1991, pp. 25--28, Genova, Italy.
....by Robertson [17]. For the case where the image Y(Q, D) consists of sequences of different lengths, the T-th root in (18) provides a normalization such that short sequences do not have preference over long sequences. The same normalization is successfully used in keyword spotting systems [18] [24]. This completes the presentation of a document retrieval method based on HMMs and its relationship to probabilistic retrieval. We conclude this section by showing that our document retrieval method based on HMMs encompasses a conventional retrieval method that is based on a well known weighting ....
L. D. Wilcox and M. A. Bush. HMM-Based Wordspotting for Voice Editing and Indexing. In European Conference on Speech Communication and Technology (EUROSPEECH), pages 25-28, 1991.
....word or phrase, for example find EUROSPEECH PROCEEDINGS, but in general may also be a non linguistic sound. Word spotting systems based on Hidden Markov Models (HMMs) are considered in this report. These have proved more successful at modelling arbitrary speech than template based systems [5] [9]. For each keyword, an HMM is trained using statistical estimation techniques. Non keyword speech is modelled by one or more HMMs. Word spotting is performed by running a continuous speech recogniser with the keyword and non keyword models, and the recogniser outputs the sequence of HMMs which ....
....In general, performance improves as the number of syllables in a keyword is increased. The improvement is most noticeable at the 1st false alarm level. Wilcox and Bush also noted that increasing the number of syllables in a keyword improved the performance of their sub word keyword system [9]. They suggested that users could optimise voice editing and indexing systems by using phrases rather than single words.

                   No. of False Alarms
No. of syllables   1 FA   2 FA   3 FA   10 FA
1                  47.9   67.3   69.1   86.2
2                  63.1   65.2   72.1   75.1
3                  76.4   80.6   80.6   80.6

Table 13: Hits per false alarm for 1, 2, ....
Wilcox, L.D. and Bush, M.A. HMM-Based Wordspotting for Voice Editing and Indexing. Proc Eurospeech, Vol. 1, pp25-28, Genoa, Sept, 1991
.... energy or magnitude measures, zero crossing rate (ZCR), the autocorrelation coefficient using a one sample delay, LPC coefficients and the LPC prediction error energy [5]. While it is useful for component labelling, speaker dependent word spotting is too constrained for general unsupervised use [6]. In the case of non speech audio such as music it is also possible to segment and label the components based on the inherent organisation contained in the music itself. Aigrain et al. [7] propose a representation of music based on a hierarchy of objects which are automatically delimited. This ....
L.D.Wilcox, M.A.Bush, "HMM-Based Word spotting for Voice Editing and Indexing", Proc. of the Second European Conference on Speech Communication & Technology, Italy, Sept. 1991.
....algorithm, which is also the basis of the Baum Welch training algorithm described in chapter 2. Wilcox and Bush describe a single keyword wordspotter based on the forward backward algorithm and claim that it performs more accurately and more quickly than a comparable Viterbi wordspotter [62, 63]. Using the notation introduced in chapter 2, the probability of occupation of the end state e of the keyword model, given the speech data observed so far, is computed for each time t using the formula $\Pr(x_t = e \mid o_1 : o_t) = \alpha_e(t) / \sum_j \alpha_j(t)$. Peaks in this probability are detected ....
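The end-state posterior quoted in this snippet can be illustrated with a small forward recursion. The following is only a sketch under stated assumptions, not the wordspotter of Wilcox and Bush: the function name `end_state_posterior`, the argument layout, and the toy model in the usage note are all hypothetical, and the per-frame emission likelihoods are assumed to be supplied already evaluated (with at least one non-zero value per frame).

```python
def end_state_posterior(init, trans, emit_probs, end_state):
    """Return Pr(x_t = end_state | o_1 : o_t) for every frame t.

    init[j]          -- probability of starting in state j (assumed input)
    trans[i][j]      -- transition probability from state i to state j
    emit_probs[t][j] -- likelihood of observation o_t under state j
    end_state        -- index of the keyword model's end state e
    """
    n = len(init)
    posteriors = []
    alpha = []
    for t, frame in enumerate(emit_probs):
        if t == 0:
            # Forward initialisation: alpha_j(1) = pi_j * b_j(o_1)
            alpha = [init[j] * frame[j] for j in range(n)]
        else:
            # Forward recursion: alpha_j(t) = sum_i alpha_i(t-1) a_ij * b_j(o_t)
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * frame[j]
                for j in range(n)
            ]
        total = sum(alpha)
        # Dividing by the sum yields alpha_j(t) / sum_k alpha_k(t). Rescaling
        # every frame leaves this ratio unchanged and avoids numeric underflow.
        alpha = [a / total for a in alpha]
        posteriors.append(alpha[end_state])
    return posteriors
```

For a hypothetical two-state model that starts in state 0, moves to the end state 1 with probability 0.5 per frame, and has uninformative emissions, `end_state_posterior([1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]], [[1.0, 1.0]] * 3, 1)` gives posteriors that rise frame by frame as probability mass accumulates in the end state.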
....Glavitsch and Schäuble claim that once a set of VCV features is estimated for a particular message domain, such as items of radio news, it is useful in indexing any set of messages from that domain. The acoustic component of their proposed retrieval system is a wordspotter based on Wilcox's [62]. Since Wilcox's forward backward wordspotter can only detect occurrences of a single keyword, it would have to be run over a speech message collection once for each single VCV feature. Whilst all the message indexing is query ....
L. D. Wilcox and M. A. Bush. HMM-based Wordspotting for Voice Editing and Indexing. In Proc. Eurospeech, pages 25--28, Genoa, 1991. ESCA.
....number of utterances available. 3.2.1 Single utterance A single utterance of any word will not necessarily be representative of its general pronunciation. However, if it is assumed to be representative then the simplest method of obtaining the KPS is to take the optimal Viterbi path recognised [2, 1, 14]. Errors in the Viterbi KPS caused by recognition errors and non standard pronunciation, led Kupiec et al. [10] to propose the application of a pronunciation dictionary and confusion matrices to produce an N best phone string list from the original Viterbi string. This provided multiple ....
....by partitioning the acoustic space into regions based on human linguistic knowledge. Alternatively, the acoustic space can be partitioned using automatic clustering techniques, such as Vector Quantisation (VQ). Both model types have been used for open keyword set word spotting. Wilcox and Bush [14] used sub word HMMs based on general acoustic units. An arbitrary segment of the user's speech was used to learn the statistics for a pool of Gaussian distributions, using fuzzy k means clustering. Each frame of the keyword utterance was quantised using these clusters. The KPS was built up by ....
Wilcox, L.D. and Bush, M.A. HMM-Based Wordspotting for Voice Editing and Indexing. Proc Eurospeech, Vol. 1, pp 25-28, Genoa, Sept, 1991
....of speech (background speech) and to score keywords. In [RRRG89] the non keyword part of speech is modeled by a background model consisting of segments of the keyword models. In addition, a causal posterior probability scoring method is used. The wordspotting systems presented in [RP90] and in [WB91] use a concatenation of phoneme models to represent the non keyword part of speech. In [RP90] the score reported for each keyword is a duration normalized likelihood whereas in [WB91] a posterior probability is used to compute keyword end points and two sets of backward probabilities are ....
....a causal posterior probability scoring method is used. The wordspotting systems presented in [RP90] and in [WB91] use a concatenation of phoneme models to represent the non keyword part of speech. In [RP90] the score reported for each keyword is a duration normalized likelihood whereas in [WB91] a posterior probability is used to compute keyword end points and two sets of backward probabilities are computed to detect keyword starting points. In both [RJN 93] and [Wei93] a large vocabulary recognizer is used to transcribe the speech documents and a language model is incorporated to ....
L. D. Wilcox and M. A. Bush. HMM-Based Wordspotting for Voice Editing and Indexing. In European Conference on Speech Communication and Technology (EUROSPEECH), pages 25--28, 1991.
....Randomly selected negative training data may cause a neural net to split speakers if the negative data contains too many samples of the speaker being segmented. We plan to replace the random selection process with an agglomerative clustering method which has been successfully used in other systems (Wilcox, et al. 1994; Gish, et al. 1991). We are also looking at ways to make the system more robust for running on non BBC audio by: Reducing the minimum speaker turn duration. Reducing the minimum pause required between speakers. Acknowledgments. The author would like to thank Chris Schmandt for useful discussions about the SI ....
Wilcox, L., Bush, A. (1991). "HMM-Based Wordspotting for Voice Editing and Indexing". Proc.
....information of the parts in the partitioned stream. In case of a text based environment, full text indexing can be applied. If a perfect speech recognizer were available, the same would hold for speech data. Algorithms for content analysis of audio are speaker identification [RS78], word spotting [WB91] and speech recognition [RHL94]. Most image analysis algorithms are not real time. For example, it is almost impossible to recognize a car in arbitrary images. For video, this implies that the only content information we will have available are closed captions. Speech recognition of the audio ....
L.D. Wilcox and M.A. Bush. HMM-based wordspotting for voice editing and indexing. In Proceedings of the Second European Conference on Speech Communication and Technology, Genova, Italy, September 1991.
.... pie menus [Callahan], hierarchical pie menus [Hopkins], pop up menus, pop up property sheets [Johnson], pop up handwriting recognition pads [GO], and gesture based interfaces [Kurtenbach, GO, Rubine]. Vocal user interface elements have been demonstrated by Schmandt [Schmandt], Bush and Wilcox [Wilcox], and are available in commercial speech based user interface builders [SimonSays]. [Photo 2: Actual Xerox Liveboard. Photo 3: 2X Liveboard mock up.] This paper addresses issues raised by location independence on a large display surface as exemplified by the Xerox Liveboard. We summarize related ....
Wilcox, L.D., and Bush, M. A. HMM-based Wordspotting for Voice Editing and Indexing. In Proceedings of Eurospeech '91 (Genova, Italy, Sep. 24-26). ESCA, 1991, pp. 25-28.
....by Robertson [17]. For the case where the image Y (Q; D) consists of sequences of different lengths, the T-th root in (18) provides a normalization such that short sequences do not have preference over long sequences. The same normalization is successfully used in keyword spotting systems [18] [24]. This completes the presentation of a document retrieval method based on HMMs and its relationship to probabilistic retrieval. We conclude this section by showing that our document retrieval method based on HMMs encompasses a conventional retrieval method that is based on a well known weighting ....
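The effect of the root normalization described in this snippet can be shown with a short worked example. Equation (18) itself is not reproduced in the snippet, so the sketch below does not implement it; it only illustrates the general idea of taking the T-th root of a sequence likelihood (a per-frame geometric mean), computed in the log domain. The function name `length_normalised_score` and the toy numbers are assumptions for illustration.

```python
import math


def length_normalised_score(log_likelihood, num_frames):
    """Return the T-th root of a sequence likelihood, with T = num_frames.

    log_likelihood -- natural log of the likelihood of the whole sequence
    num_frames     -- sequence length T (must be positive)
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    # P ** (1 / T) == exp(log(P) / T); the log domain avoids underflow
    # for long sequences whose raw likelihood is extremely small.
    return math.exp(log_likelihood / num_frames)


# Hypothetical sequences in which every frame has likelihood 0.5.
short_raw = 0.5 ** 2   # two frames
long_raw = 0.5 ** 4    # four frames

short_score = length_normalised_score(math.log(short_raw), 2)
long_score = length_normalised_score(math.log(long_raw), 4)
```

The raw likelihoods favour the shorter sequence simply because fewer probabilities are multiplied together, whereas after normalization the two sequences score the same, which is the sense in which short sequences no longer have preference over long ones.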
L. D. Wilcox and M. A. Bush. HMM-Based Wordspotting for Voice Editing and Indexing. In European Conference on Speech Communication and Technology (EUROSPEECH), pages 25--28, 1991.
....to efficiently search, browse, and retrieve audio over the Internet. Recently there have been several efforts to build audio retrieval and indexing systems. The most popular approach has been to index audio based on content words using either large vocabulary speech recognition or keyword spotting [1, 2, 3, 4, 5, 6]. Other cues including pitch contour, pause locations and speaker changes have also been used [7, 8, 9]. In one system the closed caption text of television news broadcasts was aligned to the audio track based on pause locations enabling users to perform searches on text and then access ....
L. Wilcox and M. Bush. HMM-based Wordspotting for Voice Editing and Indexing. In Eurospeech '91, 1991, pp. 25-28.
.... interface elements for the stylus include pie menus [4], hierarchical pie menus [9], pop up menus, pop up property sheets [11], pop up handwriting recognition pads [5], and gesture based interfaces [5, 12, 23]. Vocal user interface elements have been demonstrated by Schmandt [25], Bush and Wilcox [30], and are available in commercial speech based user interface builders [10]. This paper addresses issues raised by proximate user interfaces on a large display surface as exemplified by the Xerox Liveboard. We summarize related work both early and contemporary, enumerate some of the challenges ....
Wilcox, L.D., and Bush, M. A. HMM-based Wordspotting for Voice Editing and Indexing. In Proceedings of Eurospeech '91 (Genova, Italy, Sep. 24-26). ESCA, 1991, pp. 25-28.