K. M. Knill and S. J. Young, "Speaker Dependent Keyword Spotting for Accessing Stored Speech," Cambridge University Engineering Dept., Tech. Report, No. CUED/F-INFENG/TR 193, 1994.
....the user is able to interact with the system without keeping start and end of gestures in mind. HMM-based pattern spotting is done by placing keywords modeled by HMMs and filler models in parallel in a loop. This way, the keyword can be recognized while rejecting non-keywords through filler models [5]. 3.3 Recognition using Data Gloves We used right and left data gloves (CyberGlove) and 6DOF position sensors (Polhemus) as input devices to perform HMM-based gesture spotting. Part of the system is based on the Hidden Markov Model Toolkit (HTK) [6]. As observable features of the HMMs, we are ....
K. M. Knill and S. J. Young, "Speaker Dependent Keyword Spotting for Accessing Stored Speech," Cambridge University Engineering Dept., Tech. Report, No. CUED/F-INFENG/TR 193, 1994.
....approach is considered as giving the best results [23, 24]. Another common approach is based on the use of keyword and filler models. The latter represent the non-keyword intervals of the utterance [23]. Models can be sub-word keyword models like phonetic models or can be whole-word models. [25] gives a very interesting overview of re-scoring methods applied to these two kinds of models: sub-word models are shown to yield a higher hit rate. An original approach based on an HMM-based acoustic decoder combines a multikeyword spotter and an RNN prosodic model [26]. Prosodic information ....
K. M. Knill and Steve J. Young, "Speaker dependent keyword spotting for accessing stored speech," Tech. Rep., CUED/F-INFENG/TR-193, Cambridge University Engineering Department, 1994.
....user is able to interact with the system without keeping the start and end of gestures in mind. HMM-based pattern spotting is done by placing keywords modeled by HMMs and filler models in parallel in a loop. This way, the keyword can be recognized while rejecting non-keywords through filler models [17]. Recognition using Data Gloves We use right and left data gloves (CyberGlove) and 6 DOF position sensors (Polhemus) as input devices to perform HMM-based gesture spotting. Part of the system is based on the Hidden Markov Model Toolkit (HTK) [18]. As observable features of the HMMs, we are using ....
K. M. Knill and S. J. Young: "Speaker Dependent Keyword Spotting for Accessing Stored Speech," Cambridge University Engineering Dept., Tech. Report, No. CUED/F-INFENG/TR 193, 1994.
....the user is able to interact with the system without keeping start and end of gestures in mind. HMM-based pattern spotting is done by placing keywords modeled by HMMs and filler models in parallel in a loop. This way, the keyword can be recognized while rejecting non-keywords through filler models [6]. 4.3 Recognition using Data Gloves We used right and left data gloves (CyberGlove) and 6DOF position sensors (Polhemus) as input devices to perform HMM-based gesture spotting. Part of the system is based on the Hidden Markov Model Toolkit (HTK) [7]. As observable features of the HMMs, we are ....
K. M. Knill and S. J. Young, "Speaker Dependent Keyword Spotting for Accessing Stored Speech," Cambridge University Engineering Dept., Tech. Report, No. CUED/F-INFENG/TR 193, 1994.
....into the system consisting of a total number of topics and a set of well chosen characteristic words describing the topics of interest. Tools required for matching the spoken words with the audio stream of the news program already exist and can be referred to under the name wordspotting algorithms [3, 6, 9, 10, 13, 14, 17]. Generally speaking, these algorithms are capable of recognizing the appearance of user-specified key words in continuous speech. Increasing quality of wordspotting algorithms can be noticed when investigating the research in this area over the past years. This quality is especially related to ....
....the same segments in the decoded audio stream. These audio segments are stored separately and used in the subsequent indexing procedure, based on wordspotting. 4.2 News segment identification and report forming by wordspotting Although being aware of more complex and robust approaches (e.g. [6, 9, 10, 13, 14]), we used a simple algorithm [18] to demonstrate the applicability of the word spotting procedure for news segment identification. The main reasons for such a choice were the availability of the software, and the simplicity of implementation and usage. The algorithm is based on template matching ....
Knill K.M., Young S.J.: Speaker Dependent Keyword Spotting for Accessing Stored Speech, Technical Report TR-193, Cambridge University, Engineering Department, 1994
....number of confidence calculation techniques uses a posteriori recognition probability scores. However, depending on the setup of a recogniser such a posteriori scores can often only be approximated yielding slightly different actual implementations of confidence measures. The algorithms proposed in [56, 76, 22] differ only in the approximations of the calculation of the a posteriori based confidence scores for keyword spotting. In all these approaches the confidence score of having spotted a keyword is given by an approximation of the a posteriori probability of the keyword. Additionally, in [22] these ....
....Firstly, the numerator of equation 9.3 is computed using a forced alignment in which the sequence of phone models is fixed by the known transcription. Secondly, the denominator is determined using an unconstrained phone loop. This is the same arrangement as commonly used in word spotting, see also [56]. One difficulty in equation 9.3 is that if a mispronunciation has occurred, the alignments of the phone loop will differ from the alignment in the forced alignment. Hence, the denominator score is determined by simply summing the log likelihood per frame over the duration of the segment O. In ....
K.M. Knill and S.J. Young. Speaker Dependent Keyword Spotting for Accessing Stored Speech. Technical Report CUED/F-INFENG/TR 193, Cambridge University Engineering Department, Cambridge, U.K., Oct 1994.
....if we can detect important keywords such as touchdown or fumble in the audio stream, then we can use it as a coarse filter to locate candidates for important events. Keyword spotting is an important application of speech recognition, and it has attracted a growing research interest lately [7, 8, 11]. Currently we use a simple, template matching based approach to spotting keywords. We are aware of the more sophisticated and robust algorithms to the problem [7, 8, 11] but for the preliminary implementation, we chose a simpler approach, mainly because our current application is different from ....
....Keyword spotting is an important application of speech recognition, and it has attracted a growing research interest lately [7, 8, 11]. Currently we use a simple, template matching based approach to spotting keywords. We are aware of the more sophisticated and robust algorithms to the problem [7, 8, 11], but for the preliminary implementation, we chose a simpler approach, mainly because our current application is different from traditional keyword spotting in the following aspects. • In our system, audio processing is used as a pre-processing for video analysis and, consequently, false ....
K. M. Knill and S. J. Young. Speaker dependent keyword spotting for accessing stored speech. Technical Report TR-193, Cambridge University Engineering Department, Oct. 1994.
.... are equally likely ($P(p) = P(q)$) and also that the sum in the denominator can be approximated by its maximum, the above equation rewrites as: $\mathrm{GOP}(p) = \log \frac{P(O \mid p)}{\max_{q \in Q} P(O \mid q)}$ (2). Such a score is based on keyword spotting and confidence measure techniques, see for instance [Knill and Young 1994] and [Young 1994]. This GOP score can therefore be found using two recognition passes of a sentence, the first uses forced alignment to transcriptions determined from a pronunciation dictionary, thus calculating $P(O \mid p)$. The second consists of a monophone loop permitting recognition of all ....
Knill, K.M., and S.J. Young. 1994. Speaker Dependent Keyword Spotting for Accessing Stored Speech. Technical Report CUED/F-INFENG/TR 193. Cambridge, U.K.: Cambridge University, Oct.
....filler models and the sub-word units in the keyword models. Putative keyword hits were re-scored by dividing the maximum log likelihood keyword score by the average filler model score over the same time frames. This has been shown to yield a better keyword ranking than other proposed schemes [8]. Performance is assessed in terms of the percentage of false alarms per ith false alarm, i.e. $\mathrm{hits}_{i\text{th FA}} = \frac{\sum_{j=1}^{K} (H_{i,j} / K_j)}{K} \times 100$ (5) where $K$ is the total number of true hits of all keywords, $K_j$ the number of true hits of the jth keyword, and $H_{i,j}$ is the number of true hits ....
Knill, K. M. and Young, S.J. Speaker Dependent Keyword Spotting for Accessing Stored Speech, Cambridge University Engineering Dept., Tech. Report No. CUED/F-INFENG/TR 193, 1994. Available by anonymous ftp from svr-ftp.eng.cam.ac.uk.
....SYSTEM Two recognition passes are run in the standard wordspotting system. In the first, the keyword and filler models are run together to determine putative keyword hits. The filler models are also applied separately to allow the filler scores to be used to normalise the keyword scores [6]. Figure 1 shows the basic word spotting system structure. A concatenated string of phone HMMs is used to represent the keyword, the keyword phone string. The full set of phone HMMs is used in parallel as filler models to represent non-keyword speech. The filler-only recognition is effectively a ....
....were approximated by a constant log likelihood value of 500.0. Putative keyword hits were re-scored by dividing the maximum log likelihood keyword score by the average filler model score over the same time frames. This has been shown to yield a better keyword ranking than other proposed schemes [6]. Results are averaged over the 15 speaker set. The results are presented for acoustic hits, i.e. where the phone sequence matches that of the keyword. All training and testing used version 1.5 of the HTK HMM toolkit [12] with suitable extensions to perform word spotting and implement the various ....
Knill, K. M. and Young, S.J. Speaker Dependent Keyword Spotting for Accessing Stored Speech, Cambridge University Engineering Dept., Tech. Report No. CUED/F-INFENG/TR 193, 1994.
....on Speech, Science and Technology, Perth, Dec 1994 3 keyword hits are re-scored in this work by dividing the maximum log likelihood keyword score by the average filler model score over the same time frames. This has been shown to improve performance more than other proposed re-scoring schemes (Knill & Young, 1994). In the baseline system, the keyword model is built by linking the monophone models according to the keyword's pronunciation rules given by a phonetic dictionary (based on the Advanced Oxford Learner's Dictionary (1)). Table 1 shows the performance of this baseline system, where for the ith ....
Knill, K.M. & Young, S.J. (1994) Speaker Dependent Keyword Spotting for Accessing Stored Speech, Cambridge University Engineering Dept., Tech. Report No. CUED/F-INFENG/TR 193.
No context found.
K. M. Knill and S. J. Young: "Speaker Dependent Keyword Spotting for Accessing Stored Speech," Cambridge University Engineering Dept., Tech. Report, No. CUED/F-INFENG/TR 193, 1994.