Results 1 - 10 of 75
The Power of Amnesia: Learning Probabilistic Automata with Variable Memory Length
Machine Learning, 1996
Cited by 226 (17 self)
We propose and analyze a distribution learning algorithm for variable memory length Markov processes. These processes can be described by a subclass of probabilistic finite automata which we name Probabilistic Suffix Automata (PSA). Though hardness results are known for learning distributions generated by general probabilistic automata, we prove that the algorithm we present can efficiently learn distributions generated by PSAs. In particular, we show that for any target PSA, the KL-divergence between the distribution generated by the target and the distribution generated by the hypothesis the learning algorithm outputs can be made small with high confidence in polynomial time and sample complexity. The learning algorithm is motivated by applications in human-machine interaction. Here we present two applications of the algorithm. In the first one we apply the algorithm in order to construct a model of the English language, and use this model to correct corrupted text. In the second ...
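To make the variable-memory idea concrete, here is a minimal sketch, assuming a toy character alphabet: next-symbol counts are kept for every suffix up to a fixed depth, and prediction backs off to the longest suffix observed in training. This illustrates only the modeling idea; it is not the paper's learning algorithm, which adds statistical tests for growing the suffix tree and probability smoothing.

```python
from collections import defaultdict

class SuffixPredictor:
    """Toy variable-order Markov predictor (illustrative, not Learn-PSA):
    condition next-symbol counts on suffixes up to max_depth and back
    off to the longest suffix that was seen in training."""

    def __init__(self, max_depth=3):
        self.max_depth = max_depth
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, text):
        for i in range(len(text)):
            for d in range(self.max_depth + 1):
                if i - d < 0:
                    break
                self.counts[text[i - d:i]][text[i]] += 1

    def next_prob(self, context, symbol):
        # Back off from the longest usable suffix to the empty context.
        for d in range(min(self.max_depth, len(context)), -1, -1):
            ctx = context[len(context) - d:]
            if ctx in self.counts:
                total = sum(self.counts[ctx].values())
                return self.counts[ctx][symbol] / total
        return 0.0

model = SuffixPredictor(max_depth=2)
model.train("abracadabra")
print(model.next_prob("ra", "c"))  # P(next = 'c' | suffix 'ra') = 1.0
```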
Markovian Models for Sequential Data
1996
Cited by 119 (2 self)
Hidden Markov Models (HMMs) are statistical models of sequential data that have been used successfully in many machine learning applications, especially for speech recognition. Furthermore, in the last few years, many new and promising probabilistic models related to HMMs have been proposed. We first summarize the basics of HMMs, and then review several recent related learning algorithms and extensions of HMMs, including in particular hybrids of HMMs with artificial neural networks, Input-Output HMMs (which are conditional HMMs using neural networks to compute probabilities), weighted transducers, variable-length Markov models and Markov switching state-space models. Finally, we discuss some of the challenges of future research in this very active area.
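As a reference point for the models reviewed here, the following is a minimal sketch of the standard forward recursion that computes a sequence likelihood under an HMM; the two-state parameters are made-up toy values, not taken from the paper.

```python
import numpy as np

def forward_likelihood(pi, A, B, obs):
    """Standard HMM forward recursion: returns P(obs) given initial
    distribution pi, transition matrix A (rows sum to 1), and
    emission matrix B[state, symbol]."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return float(alpha.sum())

# Toy two-state, two-symbol model (illustrative values).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.2, 0.8]])
B = np.array([[0.9, 0.1],
              [0.3, 0.7]])
print(forward_likelihood(pi, A, B, [0, 1, 1]))
```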
A memory-based approach to learning shallow natural language patterns
1998
Cited by 53 (4 self)
Recognizing shallow linguistic patterns, such as basic syntactic relationships between words, is a common task in applied natural language and text processing. The common practice for approaching this task is by tedious manual definition of possible pattern structures, often in the form of regular expressions or finite automata. This paper presents a novel memory-based learning method that recognizes shallow patterns in new text based on a bracketed training corpus. The training data are stored as-is, in efficient suffix-tree data structures. Generalization is performed on-line at recognition time by comparing subsequences of the new text to positive and negative evidence in the corpus. This way, no information in the training is lost, as can happen in other learning systems that construct a single generalized model at the time of training. The paper presents experimental results for recognizing noun phrase, subject-verb and verb-object patterns in English. Since the learning approach enables easy porting to new domains, we plan to apply it to syntactic patterns in other languages and to sub-language patterns for information extraction.
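A toy sketch of the memory-based idea, using a plain dictionary of token spans where the paper uses efficient suffix-tree structures: store every contiguous span of labeled training patterns as-is, then score a candidate at recognition time by its positive minus negative evidence. The POS tags and the scoring rule are illustrative assumptions, not the paper's exact statistics.

```python
from collections import defaultdict

class MemoryMatcher:
    """Illustrative memory-based matcher: raw training spans are kept
    verbatim and generalization happens only at recognition time."""

    def __init__(self):
        self.evidence = defaultdict(lambda: [0, 0])  # span -> [pos, neg]

    def add(self, tokens, positive):
        # Record every contiguous sub-span of the labeled pattern.
        for i in range(len(tokens)):
            for j in range(i + 1, len(tokens) + 1):
                self.evidence[tuple(tokens[i:j])][0 if positive else 1] += 1

    def score(self, tokens):
        # Positive minus negative evidence summed over all sub-spans.
        pos = neg = 0
        for i in range(len(tokens)):
            for j in range(i + 1, len(tokens) + 1):
                p, n = self.evidence.get(tuple(tokens[i:j]), (0, 0))
                pos, neg = pos + p, neg + n
        return pos - neg

m = MemoryMatcher()
m.add(["DT", "JJ", "NN"], positive=True)   # a noun-phrase tag pattern
m.add(["VB", "DT"], positive=False)
print(m.score(["DT", "NN"]))  # positive score: looks like a noun phrase
```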
Learning Deterministic Regular Grammars from Stochastic Samples in Polynomial Time
1999
Cited by 53 (12 self)
In this paper, the identification of stochastic regular languages is addressed. For this purpose, we propose a class of algorithms which allow for the identification of the structure of the minimal stochastic automaton generating the language. It is shown that the time needed grows only linearly with the size of the sample set, and a measure of the complexity of the task is provided. Experimentally, our implementation proves very fast for application purposes.
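Algorithms in this family typically build a prefix-tree automaton from the sample and merge states whose empirical next-symbol frequencies are statistically indistinguishable. Below is a hedged sketch of such a compatibility test based on a Hoeffding bound, in the style of ALERGIA; the exact criterion and the default alpha are illustrative, not necessarily the ones used in the paper.

```python
import math

def hoeffding_compatible(f1, n1, f2, n2, alpha=0.05):
    """Accept a merge when two empirical frequencies f/n differ by
    less than a Hoeffding confidence bound at level alpha."""
    if n1 == 0 or n2 == 0:
        return True  # no evidence against merging
    bound = math.sqrt(0.5 * math.log(2.0 / alpha)) * (
        1.0 / math.sqrt(n1) + 1.0 / math.sqrt(n2))
    return abs(f1 / n1 - f2 / n2) < bound

# e.g. 40/100 vs 45/110 outgoing-'a' counts at two prefix-tree states
print(hoeffding_compatible(40, 100, 45, 110))  # True: compatible
```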
Efficient Learning of Typical Finite Automata from Random Walks
1997
Cited by 49 (9 self)
This paper describes new and efficient algorithms for learning deterministic finite automata. Our approach is primarily distinguished by two features: (1) the adoption of an average-case setting to model the "typical" labeling of a finite automaton, while retaining a worst-case model for the underlying graph of the automaton, along with (2) a learning model in which the learner is not provided with the means to experiment with the machine, but rather must learn solely by observing the automaton's output behavior on a random input sequence. The main contribution of this paper is in presenting the first efficient algorithms for learning nontrivial classes of automata in an entirely passive learning model. We adopt an on-line learning model in which the learner is asked to predict the output of the next state, given the next symbol of the random input sequence; the goal of the learner is to make as few prediction mistakes as possible. Assuming the learner has a means of resetting the target machine to a fixed start state, we first present an efficient algorithm that
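To fix the setting, here is a toy simulation of the passive on-line model with resets: the learner watches a uniform random input walk on a hidden two-state machine, predicts each output bit, and memoizes the output observed after each input prefix. This naive prefix-memoization baseline is purely illustrative; the point of the paper is that one can do much better for typical (randomly labeled) automata.

```python
import random

# Hidden target machine (unknown to the learner).
delta = {(0, 'a'): 1, (0, 'b'): 0, (1, 'a'): 0, (1, 'b'): 1}
label = {0: 0, 1: 1}  # output bit of each state

memo, mistakes = {}, 0
for trial in range(200):             # each trial starts after a reset
    state, prefix = 0, ""
    for _ in range(5):
        sym = random.choice("ab")    # random input walk
        prefix += sym
        state = delta[(state, sym)]
        guess = memo.get(prefix, 0)  # predict next output; default 0
        if guess != label[state]:
            mistakes += 1
        memo[prefix] = label[state]  # remember the true output
print("prediction mistakes:", mistakes)
```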
Learning Nonsingular Phylogenies and Hidden Markov Models
Proceedings of the Thirty-Seventh Annual ACM Symposium on Theory of Computing (STOC 2005), Baltimore, 2005
Cited by 45 (7 self)
In this paper, we study the problem of learning phylogenies and hidden Markov models. We call the Markov model nonsingular if all transition matrices have determinants bounded away from 0 (and 1). We highlight the role of the nonsingularity condition for the learning problem. Learning hidden Markov models without the nonsingularity condition is at least as hard as learning parity with noise. On the other hand, we give a polynomial-time algorithm for learning nonsingular phylogenies and hidden Markov models.
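A quick illustration of the nonsingularity condition: every transition matrix should have a determinant bounded away from 0 (and from 1). The threshold eps below is an arbitrary stand-in for "bounded away", not a constant taken from the paper.

```python
import numpy as np

def is_nonsingular(matrices, eps=0.1):
    """Check eps < |det(M)| < 1 - eps for every transition matrix M.
    eps is an illustrative threshold, not the paper's constant."""
    return all(eps < abs(np.linalg.det(M)) < 1 - eps for M in matrices)

A = np.array([[0.7, 0.3],
              [0.2, 0.8]])
print(np.linalg.det(A), is_nonsingular([A]))  # det = 0.5 -> True
```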
PAC-learnability of Probabilistic Deterministic Finite State Automata
Journal of Machine Learning Research, 2004
Cited by 35 (8 self)
We study the learnability of Probabilistic Deterministic Finite State Automata under a modified PAC-learning criterion. We argue that it is necessary to add parameters to the sample complexity polynomial, namely a bound on the expected length of strings generated from any state, and a bound on the distinguishability between states. With this, we demonstrate that the class of PDFAs is PAC-learnable using a variant of a standard state-merging algorithm and the Kullback-Leibler divergence as error function.
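The distinguishability parameter can be pictured as follows: two candidate states are kept apart whenever some string's probability under their respective suffix distributions differs by at least mu. The dict-based distributions and the helper below are illustrative assumptions, not the paper's implementation.

```python
def distinguishable(dist1, dist2, mu):
    """True if some string's probability differs by at least mu
    between the two state distributions (given as dicts)."""
    strings = set(dist1) | set(dist2)
    return max(abs(dist1.get(s, 0.0) - dist2.get(s, 0.0))
               for s in strings) >= mu

q1 = {"a": 0.5, "ab": 0.3, "b": 0.2}  # toy suffix distributions
q2 = {"a": 0.5, "ab": 0.1, "b": 0.4}
print(distinguishable(q1, q2, mu=0.15))  # True: 'ab' differs by 0.2
```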
Identification in the limit of substitutable context-free languages
ALT, 2005
Cited by 28 (12 self)
This paper formalises the idea of substitutability introduced by Zellig Harris in the 1950s and makes it the basis for a learning algorithm from positive data only for a subclass of context-free grammars. We show that there is a polynomial characteristic set, and thus prove polynomial identification in the limit of this class. We discuss the relationship of this class of languages to other common classes discussed in grammatical inference. We also discuss modifications to the algorithm that produce a reduction system, rather than a context-free grammar, which will be much more compact. We discuss the relationship to Angluin’s notion of reversibility for regular languages.
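A toy rendering of Harris-style substitutability as the algorithm exploits it: two substrings are judged substitutable when they occur in at least one common context (u, v) in the positive sample, and the learner's congruence classes are built from this relation. The sample strings below are invented for illustration.

```python
def contexts(sample, sub):
    """All contexts (u, v) such that u + sub + v is in the sample."""
    ctxs = set()
    for s in sample:
        start = s.find(sub)
        while start != -1:
            ctxs.add((s[:start], s[start + len(sub):]))
            start = s.find(sub, start + 1)
    return ctxs

sample = ["the cat sat", "the dog sat", "the cat ran"]
# 'cat' and 'dog' share the context ('the ', ' sat'), so a
# substitutable-CFG learner would place them in one congruence class.
print(bool(contexts(sample, "cat") & contexts(sample, "dog")))  # True
```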
Probabilistic Finite-State Machines - Part I
Cited by 27 (1 self)
Probabilistic finite-state machines are used today in a variety of areas in pattern recognition, or in fields to which pattern recognition is linked: computational linguistics, machine learning, time series analysis, circuit testing, computational biology, speech recognition and machine translation are some of them. In Part I of this paper we survey these generative objects and study their definitions and properties. In Part II, we will study the relation of probabilistic finite-state automata to other well-known devices that generate strings, such as hidden Markov models and n-grams, and provide theorems, algorithms and properties that represent the current state of the art for these objects.
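As a concrete instance of such a generative object, the sketch below computes the probability a PFA assigns to a string by chaining per-symbol transition matrices and finishing with final-state probabilities. The two-state machine is a made-up example, arranged so that each state's outgoing transition probabilities plus its final probability sum to 1.

```python
import numpy as np

def string_prob(init, T, final, w):
    """P(w) under a PFA: init is the initial-state distribution,
    T[sym] the matrix of probabilities of emitting sym while moving
    between states, and final the per-state stopping probabilities."""
    v = init
    for sym in w:
        v = v @ T[sym]
    return float(v @ final)

init = np.array([1.0, 0.0])
T = {"a": np.array([[0.0, 0.5], [0.0, 0.0]]),
     "b": np.array([[0.2, 0.0], [0.4, 0.0]])}
final = np.array([0.3, 0.6])
print(string_prob(init, T, final, "ab"))  # 1.0 * 0.5 * 0.4 * 0.3 = 0.06
```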
Learning Models for Robot Navigation
1998
Cited by 26 (2 self)
Hidden Markov models (HMMs) and partially observable Markov decision processes (POMDPs) provide a useful tool for modeling dynamical systems. They are particularly useful for representing environments such as road networks and office buildings, which are typical for robot navigation and planning. The work presented here describes a formal framework for incorporating readily available odometric information into both the models and the algorithm that learns them. By taking advantage of such information, learning HMMs/POMDPs can be made better and require fewer iterations, while being robust in the face of data reduction. That is, the performance of our algorithm does not significantly deteriorate as the training sequences provided to it become significantly shorter. Formal proofs for the convergence of the algorithm to a local maximum of the likelihood function are provided. Experimental results, obtained from both simulated and real robot data, demonstrate the effectiveness of the approach....