Results 1 - 10 of 50
Graphical models and automatic speech recognition
- Mathematical Foundations of Speech and Language Processing, 2003
"... Graphical models provide a promising paradigm to study both existing and novel techniques for automatic speech recognition. This paper first provides a brief overview of graphical models and their uses as statistical models. It is then shown that the statistical assumptions behind many pattern recog ..."
Abstract - Cited by 78 (15 self)
Graphical models provide a promising paradigm to study both existing and novel techniques for automatic speech recognition. This paper first provides a brief overview of graphical models and their uses as statistical models. It is then shown that the statistical assumptions behind many pattern recognition techniques commonly used as part of a speech recognition system can be described by a graph – this includes Gaussian distributions, mixture models, decision trees, factor analysis, principal component analysis, linear discriminant analysis, and hidden Markov models. Moreover, this paper shows that many advanced models for speech recognition and language processing can also be simply described by a graph, including many at the acoustic-, pronunciation-, and language-modeling levels. A number of speech recognition techniques born directly out of the graphical-models paradigm are also surveyed. Additionally, this paper includes a novel graphical analysis regarding why derivative (or delta) features improve hidden Markov model-based speech recognition by improving structural discriminability. It also includes an example where a graph can be used to represent language model smoothing constraints. As will be seen, the space of models describable by a graph is quite large. A thorough exploration of this space should yield techniques that ultimately will supersede the hidden Markov model.
Mining Reference Tables for Automatic Text Segmentation
- In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2004
"... Automatically segmenting unstructured text strings into structured records is necessary for importing the information contained in legacy sources and text collections into a data warehouse for subsequent querying, analysis, mining and integration. In this paper, we mine tables present in data wareh ..."
Abstract - Cited by 65 (2 self)
Automatically segmenting unstructured text strings into structured records is necessary for importing the information contained in legacy sources and text collections into a data warehouse for subsequent querying, analysis, mining, and integration. In this paper, we mine tables present in data warehouses and relational databases to develop an automatic segmentation system. Thus, we overcome limitations of existing supervised text segmentation approaches, which require comprehensive manually labeled training data. Our segmentation system is robust, accurate, and efficient, and requires no additional manual effort. Thorough evaluation on real datasets demonstrates the robustness and accuracy of our system, with segmentation accuracy exceeding state-of-the-art supervised approaches.
MVA processing of speech features, 2003
"... In this paper, we investigate a technique consisting of mean subtraction, variance normalization and time sequence filtering. Unlike other techniques, it applies auto-regression moving-average (ARMA) ltering to the time sequence in the cepstral domain. We called this technique the MVA post-processin ..."
Abstract - Cited by 45 (2 self)
In this paper, we investigate a technique consisting of mean subtraction, variance normalization, and time sequence filtering. Unlike other techniques, it applies auto-regression moving-average (ARMA) filtering to the time sequence in the cepstral domain. We call this technique MVA post-processing, and the resulting speech features MVA features. Overall, compared to raw features without MVA post-processing, MVA features achieve improvements of 45% on matched tasks and 65% on mismatched tasks on the Aurora 2.0 noisy speech database, and well above a 50% improvement on the Aurora 3.0 database. These improvements are comparable to systems with much more complicated techniques, even though MVA is relatively simple and requires practically no additional computational cost. In addition to describing MVA processing, we also present a novel analysis of the distortion of mel-frequency cepstral coefficients and the log energy in the presence of different types of noise. The effectiveness of MVA is extensively investigated with respect to several variations: the configurations used to extract raw features, the domains where MVA is applied, the filters that are used, and the orders of the ARMA filters. Specifically, it is argued and demonstrated that MVA works better when applied to the zeroth cepstral coefficient than to the log energy, that MVA works better in the cepstral domain, that an ARMA filter is better than either designed FIR filters or data-driven filters, and that a five-tap ARMA filter is sufficient to achieve good performance in a variety of conditions.
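The pipeline this abstract describes (per-utterance mean subtraction, variance normalization, then ARMA smoothing of the cepstral time sequence) can be sketched as follows. This is a minimal illustration, not the paper's implementation; the filter order and the exact ARMA form used here are assumptions.

```python
import numpy as np

def mva(features, order=2):
    """Sketch of MVA post-processing: per-utterance mean subtraction,
    variance normalization, then ARMA smoothing along time.
    `features` is a (T, D) array of cepstral vectors."""
    # Mean subtraction and variance normalization, per dimension
    x = (features - features.mean(axis=0)) / (features.std(axis=0) + 1e-8)
    # ARMA filter (assumed form):
    # y[t] = (y[t-M] + ... + y[t-1] + x[t] + ... + x[t+M]) / (2M + 1)
    T, M = len(x), order
    y = np.copy(x)  # edge frames are left unfiltered in this sketch
    for t in range(M, T - M):
        y[t] = (y[t - M:t].sum(axis=0) + x[t:t + M + 1].sum(axis=0)) / (2 * M + 1)
    return y
```

Applied to a (T, D) matrix of MFCCs, the output has zero mean and unit variance per dimension before smoothing, and the ARMA pass attenuates fast frame-to-frame fluctuations.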
Speech Recognition Using Augmented Conditional Random Fields
"... Abstract—Acoustic modeling based on hidden Markov models (HMMs) is employed by state-of-the-art stochastic speech recognition systems. Although HMMs are a natural choice to warp the time axis and model the temporal phenomena in the speech signal, their conditional independence properties limit their ..."
Abstract - Cited by 29 (2 self)
Abstract—Acoustic modeling based on hidden Markov models (HMMs) is employed by state-of-the-art stochastic speech recognition systems. Although HMMs are a natural choice to warp the time axis and model the temporal phenomena in the speech signal, their conditional independence properties limit their ability to model spectral phenomena well. In this paper, a new acoustic modeling paradigm based on augmented conditional random fields (ACRFs) is investigated and developed. This paradigm addresses some limitations of HMMs while maintaining many of the aspects which have made them successful. In particular, the acoustic modeling problem is reformulated in a data-driven, sparse, augmented space to increase discrimination. Acoustic context modeling is explicitly integrated to handle the sequential phenomena of the speech signal. We present an efficient framework for estimating these models that ensures scalability and generality. In the TIMIT
Hidden Markov models as a support for diagnosis: Formalization of the problem and synthesis of the solution
- Proceedings of the 25th IEEE Symposium on Reliable Distributed Systems, 2006
"... In modern information infrastructures, diagnosis must be able to assess the status or the extent of the damage of individual components. Traditional one-shot diagnosis is not adequate, but streams of data on component behavior need to be collected and filtered over time as done by some existing heur ..."
Abstract - Cited by 21 (4 self)
In modern information infrastructures, diagnosis must be able to assess the status, or the extent of the damage, of individual components. Traditional one-shot diagnosis is not adequate; instead, streams of data on component behavior need to be collected and filtered over time, as done by some existing heuristics. This paper proposes a general framework and a formalism to model such over-time diagnosis scenarios and to find appropriate solutions, making it useful to system designers in supporting design choices. Taking advantage of the characteristics of the hidden Markov model formalism, widely used in pattern recognition, the paper proposes a formalization of the diagnosis process, addressing the complete chain constituted by the monitored component, deviation detection, and state diagnosis. Hidden Markov models are well suited to problems where the internal state of an entity is not known and can only be inferred from external observations of what that entity emits; over-time diagnosis is a first-class representative of this category of problems. The accuracy of diagnosis carried out through the proposed formalization is then discussed, as well as how to use it concretely to perform state diagnosis and allow direct comparison of alternative solutions.
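The core idea of the abstract, inferring a hidden component state from a stream of detector outputs, is the standard HMM forward computation. The sketch below illustrates it with made-up states and probabilities; none of the numbers or names come from the paper.

```python
import numpy as np

# Illustrative three-state diagnosis model: the component's health
# (healthy / degraded / failed) is hidden; only detector outputs
# (0 = "ok", 1 = "alarm") are observed. All values are assumptions.
A = np.array([[0.95, 0.04, 0.01],   # state transition probabilities
              [0.00, 0.90, 0.10],
              [0.00, 0.00, 1.00]])
B = np.array([[0.9, 0.1],           # P(observation | state),
              [0.5, 0.5],           # columns: "ok", "alarm"
              [0.1, 0.9]])
pi = np.array([1.0, 0.0, 0.0])      # assume we start healthy

def diagnose(observations):
    """Forward algorithm: posterior over hidden states after a
    stream of detector observations."""
    alpha = pi * B[:, observations[0]]
    for o in observations[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha / alpha.sum()
```

A run of alarms shifts the posterior mass toward the failed state, which is exactly the over-time filtering behavior the paper formalizes in place of one-shot diagnosis.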
Machine Learning Paradigms for Speech Recognition: An Overview, 2013
"... Automatic Speech Recognition (ASR) has historically been a driving force behind many machine learning (ML) techniques, including the ubiquitously used hidden Markov model, discriminative learning, structured sequence learning, Bayesian learning, and adaptive learning. Moreover, ML can and occasional ..."
Abstract - Cited by 11 (1 self)
Automatic Speech Recognition (ASR) has historically been a driving force behind many machine learning (ML) techniques, including the ubiquitously used hidden Markov model, discriminative learning, structured sequence learning, Bayesian learning, and adaptive learning. Moreover, ML can and occasionally does use ASR as a large-scale, realistic application to rigorously test the effectiveness of a given technique, and to inspire new problems arising from the inherently sequential and dynamic nature of speech. On the other hand, even though ASR is available commercially for some applications, it is largely an unsolved problem—for almost all applications, the performance of ASR is not on par with human performance. New insight from modern ML methodology shows great promise to advance the state of the art in ASR technology. This article provides readers with an overview of modern ML techniques as utilized in current, and as relevant to future, ASR research and systems. The intent is to foster more cross-pollination between the ML and ASR communities than has occurred in the past. The article is organized according to the major ML paradigms that are either already popular or have potential for making significant contributions to ASR technology. The paradigms presented and elaborated in this overview include: generative and discriminative learning; supervised, unsupervised, semi-supervised, and active learning; adaptive and multi-task learning; and Bayesian learning. These learning paradigms are motivated and discussed in the context of ASR technology and applications. We finally present and analyze recent developments in deep learning and learning with sparse representations, focusing on their direct relevance to advancing ASR technology.
Recognizing activities in multiple contexts using transfer learning
- In AAAI AI in Eldercare Symposium, 2008
"... Activities of daily living are good indicators of the health status of elderly. Therefore, automating the monitoring of these activities is a crucial step in future care giving. However, many models for activity recognition rely on labeled examples of activities for learning the model parameters. Du ..."
Abstract - Cited by 11 (0 self)
Activities of daily living are good indicators of the health status of the elderly. Therefore, automating the monitoring of these activities is a crucial step in future caregiving. However, many models for activity recognition rely on labeled examples of activities for learning the model parameters. Due to the high variability of different contexts, parameters learned for one context cannot automatically be used in another. In this paper, we present a method that allows us to transfer knowledge of activity recognition from one context to another, a task called transfer learning. We show the effectiveness of our method using real-world datasets.
Applications of classifying bidding strategies for the CAT Tournament
- Proceedings of the International Trading Agent Design and Analysis Workshop (TADA 2008), 2008
"... In the CAT Tournament, specialists facilitate transactions between buyers and sellers with the intention of maximizing profit from commission and other fees. Each specialist must find a well-balanced strategy that allows it to entice buyers and sellers to trade in its market while also retaining the ..."
Abstract - Cited by 7 (0 self)
In the CAT Tournament, specialists facilitate transactions between buyers and sellers with the intention of maximizing profit from commission and other fees. Each specialist must find a well-balanced strategy that allows it to entice buyers and sellers to trade in its market while also retaining the buyers and sellers that are currently subscribed to it. Classification techniques can be used to determine the distribution of bidding strategies used by all traders subscribed to a particular specialist. Our experiments showed that hidden Markov model classification yielded the best results. The distribution of strategies, along with other competition-related factors, can be used to determine the optimal action in any given game state. Experimental data shows that the GD and ZIP bidding strategies are more volatile than the RE and ZIC strategies, although no traders ever readily switch specialists. An MDP framework for determining optimal actions given an accurate distribution of bidding strategies is proposed as a motivator for future work.
Automatic hand trajectory segmentation and phoneme transcription for sign language
- In Proceedings of FGR
"... This paper presents an automatic approach to segment 3-D hand trajectories and transcribe phonemes based on them, as a step towards recognizing American sign lan-guage (ASL). We first apply a segmentation algorithm which detects minimal velocity and maximal change of directional angle to segment the ..."
Abstract - Cited by 6 (0 self)
This paper presents an automatic approach to segment 3-D hand trajectories and transcribe phonemes based on them, as a step towards recognizing American Sign Language (ASL). We first apply a segmentation algorithm which detects minimal velocity and maximal change of directional angle to segment the hand motion trajectory of naturally signed sentences. This yields oversegmented trajectories, which are further processed by a trained naïve Bayesian detector to identify true segmentation points and eliminate false alarms. The above segmentation algorithm yielded 88.5% true segmentation points and 11.8% false alarms on unseen ASL sentence samples. These segmentation results were refined by a simple majority voting scheme, and the final segments obtained were used to transcribe phonemes for ASL. This was based on clustering PCA-based features extracted from training sentences. We then trained hidden Markov models (HMMs) to recognize the sequence of phonemes in the sentences. On the 25 test sentences containing 157 segments, the average number of errors obtained was 15.6.
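The first stage described here, flagging candidate segment boundaries at velocity minima and sharp changes of directional angle, can be sketched as below. The thresholds and the exact boundary criteria are illustrative assumptions, not the paper's parameters.

```python
import numpy as np

def segment_points(traj, vel_thresh=0.2, angle_thresh=np.pi / 3):
    """Sketch of trajectory over-segmentation: mark candidate boundaries
    where hand speed is a local minimum or where the direction of motion
    turns sharply. `traj` is a (T, 3) array of 3-D hand positions."""
    v = np.diff(traj, axis=0)                  # frame-to-frame displacement
    speed = np.linalg.norm(v, axis=1)
    # Angle between consecutive motion vectors
    cos = (v[:-1] * v[1:]).sum(axis=1) / (speed[:-1] * speed[1:] + 1e-8)
    angle = np.arccos(np.clip(cos, -1.0, 1.0))
    boundaries = []
    for t in range(1, len(speed) - 1):
        local_min = speed[t] < speed[t - 1] and speed[t] < speed[t + 1]
        if (local_min and speed[t] < vel_thresh) or angle[t - 1] > angle_thresh:
            boundaries.append(t + 1)           # index into `traj`
    return boundaries
```

In the paper's pipeline, such candidate points deliberately over-segment the trajectory; a trained naïve Bayesian detector then keeps the true boundaries and discards false alarms.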
Gaze-Contingent Automatic Speech Recognition, 2006
"... This study investigated recognition systems that combine loosely coupled modalities, integrating eye movements in an Automatic Speech Recognition (ASR) system as an exemplar. A probabilistic framework for combining modalities was formalised and applied to the specific case of integrating eye movemen ..."
Abstract - Cited by 6 (1 self)
This study investigated recognition systems that combine loosely coupled modalities, integrating eye movements into an Automatic Speech Recognition (ASR) system as an exemplar. A probabilistic framework for combining modalities was formalised and applied to the specific case of integrating eye movement and speech. A corpus of matched eye movement and related spontaneous conversational British English speech for a visual-based, goal-driven task was collected. This corpus enabled the relationship between the modalities to be verified. Robust extraction of visual attention from eye movement data was investigated using hidden Markov models and hidden semi-Markov models. Gaze-contingent ASR systems were developed from a research-grade baseline ASR system by redistributing language model probability mass according to the visual attention. The best performing systems maintained the Word Error Rate but showed an increase in the Figure of Merit, a measure of keyword-spotting accuracy and integration success. The core elements of this work may be useful for developing robust multimodal decoding systems.
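The central operation, redistributing language model probability mass toward words associated with the current visual attention, can be sketched as a simple rescale-and-renormalize step. The function name and the boost factor are illustrative assumptions; the thesis's actual redistribution scheme may differ.

```python
def redistribute(lm_probs, attended_words, boost=5.0):
    """Sketch of gaze-contingent rescoring: scale up the language-model
    probability of words linked to the current gaze target, then
    renormalize so the distribution still sums to one.
    `lm_probs` maps word -> probability."""
    scaled = {w: p * (boost if w in attended_words else 1.0)
              for w, p in lm_probs.items()}
    total = sum(scaled.values())
    return {w: p / total for w, p in scaled.items()}
```

Because the mass is only redistributed, not added, the language model remains a proper distribution; attended words simply become easier for the decoder to hypothesize.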