The Power of Amnesia: Learning Probabilistic Automata with Variable Memory Length
Machine Learning, 1996
Abstract

Cited by 226 (17 self)
We propose and analyze a distribution learning algorithm for variable memory length Markov processes. These processes can be described by a subclass of probabilistic finite automata which we name Probabilistic Suffix Automata (PSA). Though hardness results are known for learning distributions generated by general probabilistic automata, we prove that the algorithm we present can efficiently learn distributions generated by PSAs. In particular, we show that for any target PSA, the KL-divergence between the distribution generated by the target and the distribution generated by the hypothesis the learning algorithm outputs can be made small with high confidence in polynomial time and sample complexity. The learning algorithm is motivated by applications in human-machine interaction. Here we present two applications of the algorithm. In the first one we apply the algorithm in order to construct a model of the English language, and use this model to correct corrupted text. In the second ...
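The variable-memory idea behind PSAs can be illustrated with a toy next-symbol predictor that stores counts for contexts of several lengths and, at prediction time, backs off to the longest context seen in training. This is only a sketch of the general idea, not the paper's algorithm (which constructs a probabilistic suffix automaton with smoothed probabilities and PAC guarantees); all names here are illustrative.

```python
from collections import defaultdict

def train_vlmm(text, max_order=3):
    """Count next-symbol frequencies for every context up to max_order."""
    counts = defaultdict(lambda: defaultdict(int))
    for order in range(max_order + 1):
        for i in range(order, len(text)):
            context = text[i - order:i]
            counts[context][text[i]] += 1
    return counts

def predict(counts, history, max_order=3):
    """Next-symbol distribution for the longest context seen in training."""
    for order in range(min(max_order, len(history)), -1, -1):
        context = history[len(history) - order:]
        if context in counts:
            total = sum(counts[context].values())
            return {s: c / total for s, c in counts[context].items()}
    return {}

model = train_vlmm("abracadabra")
dist = predict(model, "abra")  # backs off to the order-3 context "bra"
```

The back-off loop is what "variable memory" buys: short, frequent contexts cover unseen histories, while longer contexts capture sharper regularities where the data supports them.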
Markovian Models for Sequential Data
1996
Abstract

Cited by 119 (2 self)
Hidden Markov Models (HMMs) are statistical models of sequential data that have been used successfully in many machine learning applications, especially for speech recognition. Furthermore, in the last few years, many new and promising probabilistic models related to HMMs have been proposed. We first summarize the basics of HMMs, and then review several recent related learning algorithms and extensions of HMMs, including in particular hybrids of HMMs with artificial neural networks, Input-Output HMMs (which are conditional HMMs using neural networks to compute probabilities), weighted transducers, variable-length Markov models and Markov switching state-space models. Finally, we discuss some of the challenges of future research in this very active area.
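The basic HMM computation this survey builds on is evaluating the probability of an observation sequence, done with the standard forward algorithm. A textbook sketch (not code from the paper), for a discrete HMM given as plain lists:

```python
def forward(pi, A, B, obs):
    """Forward algorithm: P(observation sequence) under a discrete HMM.

    pi[i]  : initial probability of state i
    A[i][j]: transition probability from state i to state j
    B[i][o]: probability that state i emits symbol o
    """
    n = len(pi)
    # alpha[i] = P(observations so far, current state = i)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)

# Two-state example: state 0 always emits symbol 0, state 1 always emits 1.
p = forward(pi=[1.0, 0.0],
            A=[[0.5, 0.5], [0.0, 1.0]],
            B=[[1.0, 0.0], [0.0, 1.0]],
            obs=[0, 1])  # 0.5: the chain must take the 0 -> 1 transition
```

Real implementations scale or log-transform `alpha` to avoid underflow on long sequences; this sketch omits that for clarity.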
A memory-based approach to learning shallow natural language patterns
1998
Abstract

Cited by 53 (4 self)
Recognizing shallow linguistic patterns, such as basic syntactic relationships between words, is a common task in applied natural language and text processing. The common practice for approaching this task is by tedious manual definition of possible pattern structures, often in the form of regular expressions or finite automata. This paper presents a novel memory-based learning method that recognizes shallow patterns in new text based on a bracketed training corpus. The training data are stored as-is, in efficient suffix-tree data structures. Generalization is performed on-line at recognition time by comparing subsequences of the new text to positive and negative evidence in the corpus. This way, no information in the training is lost, as can happen in other learning systems that construct a single generalized model at the time of training. The paper presents experimental results for recognizing noun phrase, subject-verb and verb-object patterns in English. Since the learning approach enables easy porting to new domains, we plan to apply it to syntactic patterns in other languages and to sublanguage patterns for information extraction.
LEARNING DETERMINISTIC REGULAR GRAMMARS FROM STOCHASTIC SAMPLES IN POLYNOMIAL TIME
1999
Abstract

Cited by 53 (12 self)
In this paper, the identification of stochastic regular languages is addressed. For this purpose, we propose a class of algorithms which allow for the identification of the structure of the minimal stochastic automaton generating the language. It is shown that the time needed grows only linearly with the size of the sample set and a measure of the complexity of the task is provided. Experimentally, our implementation proves very fast for application purposes.
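Algorithms in this family typically identify the structure of the automaton by merging states of a prefix tree whenever their empirical next-symbol frequencies are statistically indistinguishable. A minimal sketch of such a compatibility test, in the style of a Hoeffding bound as used by ALERGIA-like learners (illustrative, not the authors' exact algorithm):

```python
import math

def compatible(c1, n1, c2, n2, alpha=0.05):
    """Decide whether two states may be merged.

    c1/n1 and c2/n2 are the empirical frequencies of some event
    (e.g. emitting a given symbol) at the two states: c successes
    out of n visits. The states are merge-compatible if the observed
    difference is within a Hoeffding-style confidence bound at level alpha.
    """
    bound = (math.sqrt(1.0 / n1) + math.sqrt(1.0 / n2)) * \
            math.sqrt(0.5 * math.log(2.0 / alpha))
    return abs(c1 / n1 - c2 / n2) < bound

compatible(50, 100, 52, 100)  # similar frequencies: merge
compatible(10, 100, 90, 100)  # very different frequencies: keep separate
```

The linear-time behavior reported in the abstract comes from the fact that each sample string is scanned a bounded number of times during this merge phase.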
Efficient Learning of Typical Finite Automata from Random Walks
1997
Abstract

Cited by 49 (9 self)
This paper describes new and efficient algorithms for learning deterministic finite automata. Our approach is primarily distinguished by two features: (1) the adoption of an average-case setting to model the "typical" labeling of a finite automaton, while retaining a worst-case model for the underlying graph of the automaton, along with (2) a learning model in which the learner is not provided with the means to experiment with the machine, but rather must learn solely by observing the automaton's output behavior on a random input sequence. The main contribution of this paper is in presenting the first efficient algorithms for learning nontrivial classes of automata in an entirely passive learning model. We adopt an on-line learning model in which the learner is asked to predict the output of the next state, given the next symbol of the random input sequence; the goal of the learner is to make as few prediction mistakes as possible. Assuming the learner has a means of resetting the target machine to a fixed start state, we first present an efficient algorithm that ...
Learning Nonsingular Phylogenies and Hidden Markov Models
Proceedings of the Thirty-Seventh Annual ACM Symposium on Theory of Computing (STOC'05), Baltimore, 2005
Abstract

Cited by 45 (7 self)
In this paper, we study the problem of learning phylogenies and hidden Markov models. We call the Markov model nonsingular if all transition matrices have determinants bounded away from 0 (and 1). We highlight the role of the nonsingularity condition for the learning problem. Learning hidden Markov models without the nonsingularity condition is at least as hard as learning parity with noise. On the other hand, we give a polynomial-time algorithm for learning nonsingular phylogenies and hidden Markov models.
PAC-learnability of Probabilistic Deterministic Finite State Automata
Journal of Machine Learning Research, 2004
Abstract

Cited by 35 (8 self)
We study the learnability of Probabilistic Deterministic Finite State Automata under a modified PAC-learning criterion. We argue that it is necessary to add additional parameters to the sample complexity polynomial, namely a bound on the expected length of strings generated from any state, and a bound on the distinguishability between states. With this, we demonstrate that the class of PDFAs is PAC-learnable using a variant of a standard state-merging algorithm and the Kullback-Leibler divergence as error function.
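The error function named here is the Kullback-Leibler divergence between the target and hypothesis distributions. For concreteness, its standard definition over a shared finite support (an illustrative helper, not the paper's code):

```python
import math

def kl_divergence(p, q):
    """D(p || q) = sum_x p(x) * log(p(x) / q(x)) over a finite support.

    p and q are dicts mapping outcomes to probabilities; q must assign
    positive probability to every outcome that p does, else D is infinite.
    Terms with p(x) = 0 contribute nothing (0 * log 0 := 0).
    """
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)

kl_divergence({'a': 0.5, 'b': 0.5}, {'a': 0.5, 'b': 0.5})  # identical: 0
kl_divergence({'a': 1.0}, {'a': 0.5, 'b': 0.5})            # log 2
```

KL divergence is a natural choice of error for distribution learning because a small divergence bounds the excess log-loss of the hypothesis when used for prediction.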
Identification in the limit of substitutable context-free languages
ALT, 2005
Abstract

Cited by 28 (12 self)
This paper formalises the idea of substitutability introduced by Zellig Harris in the 1950s and makes it the basis for a learning algorithm from positive data only for a subclass of context-free grammars. We show that there is a polynomial characteristic set, and thus prove polynomial identification in the limit of this class. We discuss the relationship of this class of languages to other common classes discussed in grammatical inference. We also discuss modifications to the algorithm that produce a reduction system rather than a context-free grammar, which will be much more compact. We discuss the relationship to Angluin's notion of reversibility for regular languages.
Probabilistic Finite-State Machines, Part I
Abstract

Cited by 27 (1 self)
Probabilistic finite-state machines are used today in a variety of areas in pattern recognition, or in fields to which pattern recognition is linked: computational linguistics, machine learning, time series analysis, circuit testing, computational biology, speech recognition and machine translation are some of them. In part I of this paper we survey these generative objects and study their definitions and properties. In part II, we will study the relation of probabilistic finite-state automata with other well-known devices that generate strings, such as hidden Markov models and n-grams, and provide theorems, algorithms and properties that represent the current state of the art of these objects.
Learning Models for Robot Navigation
1998
Abstract

Cited by 26 (2 self)
Hidden Markov models (HMMs) and partially observable Markov decision processes (POMDPs) provide a useful tool for modeling dynamical systems. They are particularly useful for representing environments such as road networks and office buildings, which are typical for robot navigation and planning. The work presented here describes a formal framework for incorporating readily available odometric information into both the models and the algorithm that learns them. By taking advantage of such information, learning HMMs/POMDPs can be made better and require fewer iterations, while being robust in the face of data reduction. That is, the performance of our algorithm does not significantly deteriorate as the training sequences provided to it become significantly shorter. Formal proofs for the convergence of the algorithm to a local maximum of the likelihood function are provided. Experimental results, obtained from both simulated and real robot data, demonstrate the effectiveness of the approach. ...