Results 1–10 of 141
SRILM: An extensible language modeling toolkit
In Proceedings of the 7th International Conference on Spoken Language Processing (ICSLP 2002), 2002
Cited by 1218 (21 self)
SRILM is a collection of C++ libraries, executable programs, and helper scripts designed to allow both production of and experimentation with statistical language models for speech recognition and other applications. SRILM is freely available for noncommercial purposes. The toolkit supports creation and evaluation of a variety of language model types based on N-gram statistics, as well as several related tasks, such as statistical tagging and manipulation of N-best lists and word lattices. This paper summarizes the functionality of the toolkit and discusses its design and implementation, highlighting ease of rapid prototyping, reusability, and combinability of tools.
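SRILM itself is implemented in C++; as a rough illustration of the N-gram statistics the toolkit is built around, the following minimal Python sketch estimates bigram probabilities from a toy corpus and scores a sentence. The corpus and the add-k smoothing are invented for illustration (SRILM supports far more elaborate discounting schemes):

```python
import math
from collections import Counter

def train_bigram(corpus):
    """Count bigrams and unigram contexts over sentences padded with <s>/</s>."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        toks = ["<s>"] + sent + ["</s>"]
        unigrams.update(toks[:-1])                 # contexts only
        bigrams.update(zip(toks[:-1], toks[1:]))
    return unigrams, bigrams

def bigram_prob(unigrams, bigrams, prev, word, k=1.0):
    """Add-k smoothed conditional probability P(word | prev)."""
    vocab = len(unigrams) + 1                      # +1 for </s>
    return (bigrams[(prev, word)] + k) / (unigrams[prev] + k * vocab)

def sentence_logprob(unigrams, bigrams, sent):
    """Log probability of a sentence under the smoothed bigram model."""
    toks = ["<s>"] + sent + ["</s>"]
    return sum(math.log(bigram_prob(unigrams, bigrams, p, w))
               for p, w in zip(toks[:-1], toks[1:]))

# Hypothetical two-sentence training corpus.
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
uni, bi = train_bigram(corpus)
```

An observed bigram such as ("the", "cat") then scores higher than an unobserved one such as ("the", "sat"), which is the basic effect all the smoothing machinery in a real toolkit refines.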
Dynamic Bayesian Networks: Representation, Inference and Learning
2002
Cited by 770 (3 self)
Modelling sequential data is important in many areas of science and engineering. Hidden Markov models (HMMs) and Kalman filter models (KFMs) are popular for this because they are simple and flexible. For example, HMMs have been used for speech recognition and biosequence analysis, and KFMs have been used for problems ranging from tracking planes and missiles to predicting the economy. However, HMMs and KFMs are limited in their “expressive power”. Dynamic Bayesian Networks (DBNs) generalize HMMs by allowing the state space to be represented in factored form, instead of as a single discrete random variable. DBNs generalize KFMs by allowing arbitrary probability distributions, not just (unimodal) linear-Gaussian. In this thesis, I will discuss how to represent many different kinds of models as DBNs, how to perform exact and approximate inference in DBNs, and how to learn DBN models from sequential data.

In particular, the main novel technical contributions of this thesis are as follows: a way of representing Hierarchical HMMs as DBNs, which enables inference to be done in O(T) time instead of O(T³), where T is the length of the sequence; an exact smoothing algorithm that takes O(log T) space instead of O(T); a simple way of using the junction tree algorithm for online inference in DBNs; new complexity bounds on exact online inference in DBNs; a new deterministic approximate inference algorithm called factored frontier; an analysis of the relationship between the BK algorithm and loopy belief propagation; a way of applying Rao-Blackwellised particle filtering to DBNs in general, and the SLAM (simultaneous localization and mapping) problem in particular; a way of extending the structural EM algorithm to DBNs; and a variety of different applications of DBNs. However, perhaps the main value of the thesis is its catholic presentation of the field of sequential data modelling.
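For orientation, the HMM that DBNs generalize admits exact inference by the classic forward recursion, which computes the probability of an observation sequence in O(T·K²) time for K states. A self-contained Python sketch, with made-up two-state parameters (not drawn from the thesis):

```python
def forward(pi, A, B, obs):
    """HMM forward algorithm: P(obs) summed over all hidden state paths.
    pi[i]: initial prob of state i; A[i][j]: transition i->j;
    B[i][o]: prob of emitting observation o from state i."""
    K = len(pi)
    # Initialize with the first observation.
    alpha = [pi[i] * B[i][obs[0]] for i in range(K)]
    # Propagate: alpha_t(j) = sum_i alpha_{t-1}(i) A[i][j] B[j][o_t].
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(K)) * B[j][o]
                 for j in range(K)]
    return sum(alpha)

# Hypothetical two-state model (states 0/1, observations 0/1).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
```

A quick sanity check on such a model is that the probabilities of all possible observation sequences of a given length sum to one.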
A Probabilistic Model of Lexical and Syntactic Access and Disambiguation
Cognitive Science, 1995
Cited by 207 (12 self)
The problems of access (retrieving linguistic structure from some mental grammar) and disambiguation (choosing among these structures to correctly parse ambiguous linguistic input) are fundamental to language understanding. The literature abounds with psychological results on lexical access, the access of idioms, syntactic rule access, parsing preferences, syntactic disambiguation, and the processing of garden-path sentences. Unfortunately, it has been difficult to combine models which account for these results to build a general, uniform model of access and disambiguation at the lexical, idiomatic, and syntactic levels. For example, psycholinguistic theories of lexical access and idiom access and parsing theories of syntactic rule access have almost no commonality in methodology or coverage of psycholinguistic data. This paper presents a single probabilistic algorithm which models both the access and disambiguation of linguistic knowledge. The algorithm is based on a parallel parser which ranks constructions for access, and interpretations for disambiguation, by their conditional probability. Low-ranked constructions and interpretations are pruned through beam search; this pruning accounts, among other things, for the garden-path effect. I show that this motivated probabilistic treatment accounts for a wide variety of psycholinguistic results, arguing for a more uniform representation of linguistic knowledge and for the use of probabilistically-enriched grammars and interpreters as models of human knowledge of and processing of language.
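The paper's parser is not reproduced here, but the pruning mechanism the abstract describes (rank interpretations by conditional probability and discard those outside a beam) can be sketched in a few lines of Python; the candidate labels and probabilities below are invented for illustration:

```python
def beam_prune(candidates, beam_width):
    """Keep hypotheses whose probability is within a multiplicative beam of
    the best one; prune the rest. A garden-path effect arises when the
    ultimately correct reading is pruned early because it falls outside
    the beam."""
    best = max(p for _, p in candidates)
    return [(c, p) for c, p in candidates if p >= best * beam_width]

# Hypothetical parse interpretations with conditional probabilities.
cands = [("main-verb reading", 0.70),
         ("reduced-relative reading", 0.02),
         ("noun-compound reading", 0.28)]
kept = beam_prune(cands, beam_width=0.1)
```

With a beam of 0.1, anything less than a tenth as probable as the best reading is discarded, so only the two high-probability interpretations survive.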
Learning Hidden Markov Model Structure for Information Extraction
In Proc. AAAI’99 Workshop on Machine Learning for Information Extraction, 1999
Cited by 201 (10 self)
Statistical machine learning techniques, while well proven in fields such as speech recognition, are just beginning to be applied to the information extraction domain. We explore the use of hidden Markov models for information extraction tasks, specifically focusing on how to learn model structure from data and how to make the best use of labeled and unlabeled data. We show that a manually-constructed model that contains multiple states per extraction field outperforms a model with one state per field, and discuss strategies for learning the model structure automatically from data. We also demonstrate that the use of distantly-labeled data to set model parameters provides a significant improvement in extraction accuracy. Our models are applied to the task of extracting important fields from the headers of computer science research papers, and achieve an extraction accuracy of 92.9%.
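Decoding in such an extraction HMM assigns each header token its most likely field label via the Viterbi algorithm. A toy sketch follows; the states ("title", "author"), vocabulary, and parameters are invented for illustration and are not the paper's learned model:

```python
def viterbi(states, pi, A, B, obs):
    """Most likely state sequence for obs under a dict-parameterized HMM.
    Unseen words get a tiny floor probability instead of zero."""
    V = [{s: (pi[s] * B[s].get(obs[0], 1e-9), [s]) for s in states}]
    for o in obs[1:]:
        layer = {}
        for s in states:
            prob, path = max(
                (V[-1][r][0] * A[r][s] * B[s].get(o, 1e-9), V[-1][r][1] + [s])
                for r in states)
            layer[s] = (prob, path)
        V.append(layer)
    return max(V[-1].values())[1]   # path of the highest-probability final state

# Hypothetical two-field header model.
states = ["title", "author"]
pi = {"title": 0.9, "author": 0.1}
A = {"title": {"title": 0.7, "author": 0.3},
     "author": {"title": 0.1, "author": 0.9}}
B = {"title": {"hidden": 0.5, "markov": 0.5},
     "author": {"freitag": 0.5, "mccallum": 0.5}}
path = viterbi(states, pi, A, B, ["hidden", "markov", "freitag"])
```

On the toy input the decoder labels the first two tokens as title words and the last as an author word, which is the per-token field assignment the extraction task needs.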
An efficient, probabilistically sound algorithm for segmentation and word discovery
Machine Learning, 1999
Cited by 201 (2 self)
This paper presents a model-based, unsupervised algorithm for recovering word boundaries in a natural-language text from which they have been deleted. The algorithm is derived from a probability model of the source that generated the text. The fundamental structure of the model is specified abstractly so that the detailed component models of phonology, word order, and word frequency can be replaced in a modular fashion. The model yields a language-independent, prior probability distribution on all possible sequences of all possible words over a given alphabet, based on the assumption that the input was generated by concatenating words from a fixed but unknown lexicon. The model is unusual in that it treats the generation of a complete corpus, regardless of length, as a single event in the probability space. Accordingly, the algorithm does not estimate a probability distribution on words; instead, it attempts to calculate the prior probabilities of various word sequences that could underlie the observed text. Experiments on phonemic transcripts of spontaneous speech by parents to young children suggest that our algorithm is more effective than other proposed algorithms, at least when utterance boundaries are given and the text includes a substantial number of short utterances.
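The paper's model scores an entire corpus as a single event and learns the lexicon; a far simpler illustrative contrast is Viterbi segmentation under a fixed, known unigram lexicon, recovering the most probable word boundaries by dynamic programming over character positions. The lexicon and probabilities below are made up:

```python
import math

def segment(text, lexicon):
    """Most probable segmentation of `text` under independent word
    probabilities, via dynamic programming over character positions.
    `lexicon` maps word -> probability."""
    n = len(text)
    best = [(-math.inf, 0)] * (n + 1)   # (log prob, backpointer)
    best[0] = (0.0, 0)
    for end in range(1, n + 1):
        for start in range(end):
            word = text[start:end]
            if word in lexicon and best[start][0] > -math.inf:
                score = best[start][0] + math.log(lexicon[word])
                if score > best[end][0]:
                    best[end] = (score, start)
    # Follow backpointers to read off the words.
    words, pos = [], n
    while pos > 0:
        start = best[pos][1]
        words.append(text[start:pos])
        pos = start
    return words[::-1]

lex = {"the": 0.4, "dog": 0.3, "cat": 0.3}
```

Given the unspaced string "thedogthecat", the recursion recovers the boundaries that maximize the product of word probabilities; the hard part the paper addresses is doing this without knowing the lexicon in advance.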
Inducing probabilistic grammars by Bayesian model merging
In Int. Conf. Grammatical Inference, 1994. URL: citeseer.nj.nec.com/stolcke94inducing.html
Cited by 156 (0 self)
We describe a framework for inducing probabilistic grammars from corpora of positive samples. First, samples are incorporated by adding ad hoc rules to a working grammar; subsequently, elements of the model (such as states or nonterminals) are merged to achieve generalization and a more compact representation. The choice of what to merge and when to stop is governed by the Bayesian posterior probability of the grammar given the data, which formalizes a tradeoff between a close fit to the data and a default preference for simpler models (‘Occam’s Razor’). The general scheme is illustrated using three types of probabilistic grammars: hidden Markov models, class-based n-grams, and stochastic context-free grammars.
Syntax without Natural Selection: How compositionality emerges from vocabulary in a population of learners
1998
Cited by 119 (13 self)
In this paper I put forward a new approach to understanding the origins of some of the key ingredients in a syntactic system. I show, using a computational model, that compositional syntax is an inevitable outcome of the dynamics of observationally learned communication systems. In a simulated population of individuals, language develops from a simple idiosyncratic vocabulary with little expressive power to a compositional system with high expressivity, nouns and verbs, and word order expressing meaning distinctions.
Action Recognition using Probabilistic Parsing
In IEEE CVPR’98, 1998
Cited by 104 (6 self)
A new approach to the recognition of temporal behaviors and activities is presented. The fundamental idea, inspired by work in speech recognition, is to divide the inference problem into two levels. The lower level is performed using standard independent probabilistic temporal event detectors such as hidden Markov models (HMMs) to propose candidate detections of low level temporal features. The outputs of these detectors provide the input stream for a stochastic context-free grammar parsing mechanism. The grammar and parser provide longer range temporal constraints, disambiguate uncertain low level detections, and allow the inclusion of a priori knowledge about the structure of temporal events in a given domain. To achieve such a system we provide techniques for generating a discrete symbol stream from continuous low level detectors, for enforcing temporal exclusion constraints during parsing, and for generating a control method for low level feature application based upon the current parsing state. We demonstrate the approach in several experiments using both visual and other sensing data.
Unsupervised Language Acquisition
1996
Cited by 102 (0 self)
Children are exposed to speech and other environmental evidence, from which they learn language. How do they do this? More specifically, how do children map from complex, physical signals to grammars that enable them to generate and interpret new utterances from their language? This thesis presents a computational theory of unsupervised language acquisition. By computational we mean that the theory precisely defines procedures for learning language, procedures that have been implemented and tested in the form of computer programs. By unsupervised we mean that the theory explains how language learning can take place with no explicit help from a teacher, but only exposure to ordinary spoken or written utterances. The theory requires very little of the learning environment. For example, it predicts that much knowledge of language can be acquired even in situations where the learner has no access to the meaning of utterances. In this way the theory is extremely conservative, making few or no assumptions that are not obviously true of the situation children learn in. The theory is based heavily on concepts borrowed from machine learning and statistical estimation. In particular, learning takes place by fitting a stochastic, generative model of language to the evidence.