57 citations found.
A. Stolcke. Bayesian Learning of Probabilistic Language Models. Ph. D. dissertation, University of California, 1994.

This paper is cited in the following contexts:

Vision-Based Recognition of Actions using Context - Moore (2000)   (2 citations)

....context-free grammars (SCFG) are generally less popular than other grammars in computational linguistics, they are very appropriate for describing multitasked activities. The SCFG formalism has been likened to the HMM, as have non-probabilistic context-free grammar to finite-state grammars [132]. Standard algorithms for implementing probabilistic finite-state models, like HMMs, also have generalized versions for SCFGs. Unfortunately, they are computationally more demanding than simpler language models that employ finite-state and n-gram approaches. More specifically, parameter estimation ....

....finite-state models, like HMMs, also have generalized versions for SCFGs. Unfortunately, they are computationally more demanding than simpler language models that employ finite-state and n-gram approaches. More specifically, parameter estimation for SCFG is often intractable in practice [132]. Due to the conditional independence assumptions used by SCFGs, finite-state models may perform better when quantifying first-order Markovian contingencies between words. However, Jurafsky and others argue that SCFGs can be applied successfully when using very specific, semantically oriented ....

[Article contains additional citation context not shown here]

A. Stolcke, "Bayesian Learning of Probabilistic Language Models," Ph.D. Dissertation, University of California at Berkeley, 1994.


Recognizing Multitasked Activities using Stochastic.. - Moore, Essa (2001)   (4 citations)

....(Section 2; a more detailed review appears in [8]), parsing (Sections 3 and 4), error detection and recovery (Section 5). A brief description of the SCFG representation and parsing is primarily included for completeness of our exposition and to explain our work. Full details are available from [15, 14]. We conclude with a discussion of our experiments in the domain of playing Blackjack (Section 6). 2. Representation using SCFG To characterize multitasked activities, we need models that can identify the regularities associated with complex tasks while also tolerating the dynamics associated ....

....In a SCFG, the probability of the complete derivation of a string is determined by the product of the rule probabilities in the derivation. The notion of context-freeness is extended to include probabilistic conditional independence of the expansion of a nonterminal from its surrounding context [14]. Our motivation for using stochastic context-free grammar is to aggregate low-level events detected so that we can construct higher-level models of interaction. See [14] for more on SCFG. Symbol Generation (Event Detection) In our earlier work, we introduced a vision system called VARS that ....

[Article contains additional citation context not shown here]

A. Stolcke. Bayesian Learning of Probabilistic Language Models. Ph.D. thesis, University of California at Berkeley, 1994.
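
The Moore and Essa excerpt above says that the probability of a complete derivation in an SCFG is the product of the probabilities of the rules it uses. A minimal sketch of that calculation follows. The toy grammar, its rule probabilities, and the function name are hypothetical and chosen only for illustration; they do not come from any of the cited papers.

```python
from math import prod

# Hypothetical toy SCFG: each (left-hand side, right-hand side) rule
# maps to a probability. Rules sharing a left-hand side sum to 1.
RULES = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("you",)): 0.4,
    ("NP", ("a", "ball")): 0.6,
    ("VP", ("throw", "NP")): 1.0,
}


def derivation_probability(derivation, rules=RULES):
    """Multiply the probabilities of the rules used in one derivation."""
    return prod(rules[rule] for rule in derivation)


# One derivation of "you throw a ball", listed rule by rule.
DERIVATION = [
    ("S", ("NP", "VP")),
    ("NP", ("you",)),
    ("VP", ("throw", "NP")),
    ("NP", ("a", "ball")),
]

p = derivation_probability(DERIVATION)  # 1.0 * 0.4 * 1.0 * 0.6
```

Because each rule contributes an independent factor, the expansion of a nonterminal does not depend on its surrounding context, which is the conditional independence assumption the excerpt mentions.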


Inferring Attribute Grammars with Structured Data for Natural.. - Starkie

....some methods that have been applied to the inference of other types of grammars are candidate approaches for the inference of attribute grammars. We will classify these techniques into four broad categories as follows. a) Model Merging, where a grammar with small grammar coverage is generalised [3][4], b) Model Splitting, where an overly general grammar is made more specific [5][6], c) Explanation-based learning, where a new phrase is explained (partially parsed) using existing rules [7][8], and d) Parameter Estimation, where the structure of a grammar is fixed but probabilities are ....

....where the structure of a grammar is fixed but probabilities are modified. In parameter estimation algorithms, zero-probability rules can be deleted to make the grammar more specific [9]. In practice, grammatical inference implementations often combine several of these techniques together [3][5]. Model merging algorithms involve the creation of a starting grammar that generates the training data explicitly. Hierarchical structure is then added to the grammar through a process known as chunking. Chunking involves creating new nonterminals that represent commonly occurring sequences ....

[Article contains additional citation context not shown here]

Stolcke, Andreas, 1994, Bayesian Learning of Probabilistic Language Models. Berkeley, CA: University of California dissertation.


Programming Spoken Dialogs using Grammatical Inference - Starkie

....toolkit starts from a simple description of the goals of the application and then learns from examples to improve the interface. We will also describe the grammatical inference algorithm in more detail. The performance of the algorithm is then compared to the Bayesian Model Merging Algorithm [2] [3]. Finally the results of a comparison between the Lyrebird algorithm and an experienced human application developer are presented. The results suggest that the use of the grammatical inference can significantly reduce development effort time without sacrificing grammar quality. 1.2 Attribute ....

Stolcke, Andreas, 1994, Bayesian Learning of Probabilistic Language Models. Berkeley, CA: University of California dissertation.


The Acquisition of a Unification-Based Generalised Categorial.. - Villavicencio (2002)

....with the conceptually simpler problem of computing the lengths of the hypotheses. MDL is a more practical paradigm than either Identification in the Limit and PAC Learning, and it has been successfully applied in a number of learning systems, like those in [Chen 1996], [Osborne and Briscoe 1997], [Stolcke 1994], [Briscoe 1999]. In this chapter an overview of some relevant work on human language acquisition and on computational learning systems was presented. The UG is a useful concept that provides the core knowledge that the learner needs in order to acquire a particular language. By doing this, the ....

Stolcke, A. Bayesian Learning of Probabilistic Language Models Ph.D. Thesis, University of California, Berkeley, 1994.


Shallow Parsing using Probabilistic Grammatical Inference - Thollard, Clark (2002)

....Many models exist to estimate the probability of a symbol given a history. An example is the n-gram model and all the improvements applied to it: smoothing techniques (see [6] for a recent survey) and variable memory models [20, 16]. Another approach would be to infer Hidden Markov Models [21] or probabilistic automata [11, 5, 17, 22]. Among these techniques only four infer models in which the size of the dependency is not bound [21, 5, 22]. The first one is too computationally expensive to be feasible with our data. The others have been used in the context of natural language ....

....to it: smoothing techniques (see [6] for a recent survey) and variable memory models [20, 16]. Another approach would be to infer Hidden Markov Models [21] or probabilistic automata [11, 5, 17, 22]. Among these techniques only four infer models in which the size of the dependency is not bound [21, 5, 22]. The first one is too computationally expensive to be feasible with our data. The others have been used in the context of natural language modeling. We have applied these algorithms to this task but only the best performing one (namely DDSM [22]) will be presented here. Reassuringly, the ....

A. Stolcke. Bayesian Learning of Probabilistic Language Models. Ph.D. dissertation, University of California, 1994.


Probabilistic Estimation Of Stochastic Context-Free Grammars.. - Sánchez, Benedí (1999)

....VS algorithm. Keywords: Stochastic Context-Free Grammars, Probabilistic Estimation, N-Best Solutions. 1 Introduction Stochastic Context-Free Grammars (SCFGs) are an appealing alternative for Language Modeling in real tasks of Syntactic Pattern Recognition [2, 11] and Natural Language Processing [17, 5]. The reason for this can be found in the capability of SCFGs to express long-distance dependencies and to be potentially more compact than the n-gram models. However, the learning of SCFGs is still an important obstacle to their application to language modeling. The most widely used method for ....

A. Stolcke. Bayesian Learning of Probabilistic Language Models. PhD thesis, University of California, Berkeley, CA., 1994.


Consistency of Stochastic Context-Free Grammars from.. - Sánchez, Benedí (1997)

....Stochastic Context-Free Grammars (SCFGs) are an important specification formalism and are frequently used in closed domains in Syntactic Pattern Recognition [7, 11]. In particular, SCFGs have been widely used to characterize the probabilistic modeling of language in Computational Linguistics [13, 6] and Speech Recognition and Understanding [1, 11]. An important aspect for all of these applications is the learning of the models. The goal is to estimate an adequate probabilistic description in terms of a set of training samples. There exist different methods for solving the problem of ....

A. Stolcke. Bayesian Learning of Probabilistic Language Models. PhD thesis, University of California, Berkeley, CA., 1994.


Comparison between the Inside-Outside Algorithm and the.. - Benedí, Casacuberta (1996)

....cases is necessary. 1 Introduction In Syntactic Pattern Recognition, Stochastic Context-Free Grammars (SCFG) are an adequate alternative to Stochastic Regular Grammars for representing syntactic-semantic constraints. Recently, some applications of SCFG have been proposed for Language Modeling [8, 5, 18], Acoustic-Phonetic Decoding [11], DNA Sequence Modeling [17], etc. One of the reasons for using these models is their ability to establish long-term statistical dependencies among primitives. One problem related to SCFG is the learning of the rules and/or the probability distributions ....

A. Stolcke. Bayesian Learning of Probabilistic Language Models. PhD thesis, University of California, Berkeley, CA., 1994.


DCG Induction using MDL and Parsed Corpora - Osborne (1999)

....techniques are arguably ad hoc and difficult to understand [MacKay, 1994]. By contrast, learning using the Minimum Description Length principle (MDL) is less prone to overfitting, less reliant upon smoothing, and as such usually leads to better results than those yielded by MLE (for example, see [Stolcke, 1994, Chen, 1996, de Marcken, 1996, Osborne, 1997]). MDL is a model selection technique, and balances the complexity of the model against its degree of fit. MLE, on the other hand, simply fits the model to the data, irrespective of model complexity. Unlike parsed corpora, MDL as a learning bias is ....

A. Stolcke. Bayesian Learning of Probabilistic Language Models. PhD thesis, University of California, Berkeley, 1994.
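
The Osborne excerpt above contrasts MDL, which balances model complexity against fit, with MLE, which considers fit alone. The sketch below illustrates that contrast with a generic two-part code length (bits for the model plus bits for the data given the model). The candidate names, model sizes, and sentence probabilities are made-up numbers, and the scoring is a textbook simplification, not the encoding used in any of the cited works.

```python
import math


def data_bits(probabilities):
    """Bits needed to encode the data given the model: -sum(log2 p)."""
    return -sum(math.log2(p) for p in probabilities)


def description_length(model_bits, probabilities):
    """Two-part score: bits for the model plus bits for the data."""
    return model_bits + data_bits(probabilities)


# Hypothetical candidates: a large grammar that fits four training
# sentences well, and a small grammar that fits them less well.
CANDIDATES = {
    "large": {"model_bits": 120.0, "probs": [0.5, 0.5, 0.25, 0.25]},
    "small": {"model_bits": 40.0, "probs": [0.25, 0.25, 0.125, 0.125]},
}


def pick_mle(candidates):
    """MLE ignores model size and keeps the best-fitting candidate."""
    return min(candidates, key=lambda n: data_bits(candidates[n]["probs"]))


def pick_mdl(candidates):
    """MDL keeps the candidate with the shortest total description."""
    return min(
        candidates,
        key=lambda n: description_length(
            candidates[n]["model_bits"], candidates[n]["probs"]
        ),
    )
```

With these numbers the large grammar needs 6 bits for the data and the small one needs 10, so MLE prefers the large grammar, while the totals of 126 and 50 bits lead MDL to prefer the small one.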


Learning Dialogue Structures From A Corpus - Alexandersson, Reithinger (1997)   (1 citation)

....hierarchy is compiled into a grammar which is, during runtime, processed using a simple top-down, left-to-right parsing technique, not consuming words or part-of-speech tags, but dialogue acts. The research in the field of grammar induction has recently produced a lot of interesting results (cf. [9, 5]). For the turn level we derive (stochastic) context-free grammars for each turn class using the Boogie [10] system. It is a workbench for deriving structures enriched with statistical information (e.g. Hidden Markov Models and Stochastic Context-Free Grammars) based on Bayesian model merging. It ....

Andreas Stolcke. Bayesian Learning of Probabilistic Language Models. PhD thesis, University of California at Berkeley, 1994.


Incrementally Learning Parameters of Stochastic.. - Brent Heeringa And

....in a model whereas negative training examples illustrate what we don't want in our model. While we have interest in learning the structure of stochastic context-free grammars, our main focus here is on learning parameters. For a thorough overview of techniques for learning structure, see [10] and [3]. 3.1 Learning Parameters The Inside-Outside algorithm [6, 7] is the standard method for estimating parameters in SCFGs. The algorithm uses the general-purpose expectation maximization (EM) procedure to iterate between calculating expectations of the parameters and then maximizing the ....

....there were any regions of parameter space for which one algorithm was significantly better. We did this by stochastically sampling from this space. Note that a new corpus was generated for each new set of parameters as they influence which sentences are generated. The grammar shown in table 2 [10] was used in this manner with 50 different target parameter settings and 500 sentences in O T for each setting. The mean and standard deviation of the log likelihoods for the online algorithm with h = s = 100 (histogram size and learning corpus size respectively) were μ = 962.58 and σ = 241.25. ....

Stolcke, A. (1994). Bayesian learning of probabilistic language models. Doctoral dissertation, Division of Computer Science, University of California, Berkeley.


Action Recognition using Probabilistic Parsing - Bobick, Ivanov (1998)   (19 citations)

....events, common to monitoring applications, based on blob interaction primitives. Using combined probabilistic-syntactic approaches to problems of vision is shown in [16] and [7]. Examples of probabilistic parsing in speech processing can be found in literature (e.g. [9, 6], etc.). In particular, [15] developing probabilistic extensions to Earley context-free parser is of interest. 3 Probabilistic input formation To recognize the components of the model vocabulary, we train one HMM per atomic gesture. At run time each of these HMMs performs a Viterbi parse ([10]) of the incoming signal and ....

....which is a) consistent with the overall expected structure, b) forms a temporally consistent sequence and c) has an overall maximum probability. To perform probabilistic parsing of such a sequence we use an efficient Earley-based context-free parsing algorithm extended for probabilistic input ([15]). The structure of the sequence is described by a Stochastic Context-Free Grammar (SCFG) which is formed by associating a probability with each production. Analogously to HMMs, forward, backward and inner probabilities are introduced. A forward probability, for instance, is computed as a sum of ....

[Article contains additional citation context not shown here]

A. Stolcke. Bayesian Learning of Probabilistic Language Models. Ph.D. thesis, University of California at Berkeley, 1994.


A Minimum Description Length Approach to Grammar Inference - Grünwald (1994)   (5 citations)

....production rules if this can help reduce the description length. This opens up more ways to travel through the hypothesis space. The workings of Wolff's model are much less efficient however. This seems to be one of the reasons it has not been tried on real natural language texts. Stolcke [12] developed a Bayesian approach to Grammar Inference using an MDL prior. However, he does not take the description length of the parameter sets into account; his notion of description length is only about the structure of the grammar that is to be inferred. On the other hand, he uses (variants of) ....

....grammars are ambiguous at some point. In fact, the first point made above can be seen as a special case of the second point, namely the case when we use the restriction of our algorithm that performs only merges. We are currently studying the EM algorithm and other techniques used by Stolcke [12] in order to try to resolve this problem. 8 Acknowledgements The author wishes to thank his supervisor Prof. Dr. Ir. P.M.B. Vitányi for many useful remarks on his Master's Thesis. Further thanks are due to Mark Steijvers and Jeroen van Maanen for many useful discussions. ....

A. Stolcke. Bayesian Learning of Probabilistic Language Models. PhD thesis, ICSI, Berkeley, 1994.


Probabilistic Parsing in Action Recognition - Ivanov, Bobick (1998)

....CFGs, demonstrating the techniques of computing the sequence probability characteristics, familiar from HMMs, such as forward and backward probabilities in the chart parsing framework. An efficient probabilistic version of Earley parsing algorithm was developed by Stolcke in his dissertation ([20]). The author develops techniques of embedding the probability computation and maximization into the Earley algorithm. He also describes grammar structure learning strategies and the rule probability learning technique, justifying usage of Stochastic Context-Free Grammars for natural language ....

....P (S # # L #) 1. 4 Earley-Stolcke Parsing Algorithm The method most generally and conveniently used in stochastic parsing is based on an Earley parser ([11]) extended in such a way as to accept probabilities. In parsing stochastic sentences we adopt a slightly modified notation of [20]. The notion of a state is an important part of the Earley parsing algorithm. A state is denoted as: i : X_k → λ.Yμ where . is the marker of the current position in the input stream, i is the index of the marker, and k is the starting index of the substring denoted by nonterminal X. ....

A. Stolcke. Bayesian Learning of Probabilistic Language Models. Ph.D. thesis, University of California at Berkeley, 1994.


Refining the Structure of a Stochastic Context-Free Grammar - Bockhorst, Craven (2001)

....in places. The most obvious way of addressing this problem is by introducing a generalization operator. For example, a MERGE operator could be used to combine two nonterminals with the same RHS domain if their probability distributions are sufficiently similar. Note however, as mentioned by Stolcke (1994), that this operator will not introduce any new embedding structure into the grammar. This may not be a problem if such structure is present in the initial grammar. 6 Related Work The approach we present here is related to the grammar induction algorithms of Stolcke (1994) and Chen (1996). These ....

....as mentioned by Stolcke (1994), that this operator will not introduce any new embedding structure into the grammar. This may not be a problem if such structure is present in the initial grammar. 6 Related Work The approach we present here is related to the grammar induction algorithms of Stolcke (1994) and Chen (1996). These works both address the induction of SCFGs from text corpora for use as language models. Both methods incorporate a prior probability distribution over model structures and perform a search through posterior model probability space where Stolcke proceeds specific-to-general ....

Stolcke, A. 1994. Bayesian Learning of Probabilistic Language Models. Ph.D. Dissertation, University of California, Berkeley.


Grounded Learning of Grammatical Constructions - Chang, Maia (2001)

....also apply to constructions. Generalizations driven in a bottom-up fashion by similar or co-occurring constructions lead to the reorganization of the set of known constructions (or constructicon). We extend previous work using Bayesian model merging as the basis for both types of generalization (Stolcke 1994) to handle relational structures. Details of each procedure are best illustrated by example. Consider the utterance U_1 = "you throw a ball" spoken to a child throwing a ball. The situation S consists of entities S_e and relations S_r; the latter includes role bindings between pairs of entities, ....

....constructicon on the basis of similarities among and co-occurrences of multiple constructions. Figure 5 gives a high-level description of this process; we refrain from delving into too much detail here, since these processes are closely related to those described for other generalization problems (Stolcke 1994; Bailey 1997). Reorganize constructicon. Incorporate a new construction Cn into an existing set of constructions C, reorganizing C to consolidate similar and co-occurring constructions if necessary: 1. Find potential construction pairs to consolidate. # Merge constructions involving ....

Stolcke, A. 1994. Bayesian Learning of Probabilistic Language Models. Ph.D. Dissertation, University of California, Berkeley.


SRILM - An Extensible Language Modeling Toolkit - Stolcke (2002)   (20 citations)  Self-citation (Stolcke)

....Lattice Toolkit [5] to which SRILM has an interface) provided many good ideas for a viable and efficient API for language models. The decision to explore object-oriented design was based on a prior project, an implementation of various types of statistical grammars in the Common Lisp Object System [6]. The software build system was borrowed from SRI's Decipher(TM) speech recognition system [7]. A first implementation with minimal functionality for standard N-gram models was created prior to the 1995 Johns Hopkins Language Modeling Summer Workshop [8]. By the end of the workshop, support for ....

A. Stolcke, Bayesian Learning of Probabilistic Language Models, PhD thesis, University of California, Berkeley, CA, July 1994.


Incremental Regular Inference - Pierre Dupont Carnegie   (16 citations)

No context found.

A. Stolcke. Bayesian Learning of Probabilistic Language Models. Ph. D. dissertation, University of California, 1994.


A Markovian Approach to the Induction of Regular String.. - Callut, Dupont (2004)

No context found.

A. Stolcke. Bayesian Learning of Probabilistic Language Models. Ph. D. dissertation, University of California, 1994.


Learning Hidden Markov Models to Fit Long-Term Dependencies - Callut, Dupont (2005)

No context found.

Stolcke, A. (1994). Bayesian Learning of Probabilistic Language Models. Ph. D. dissertation, University of California.


How the Poverty of the Stimulus Solves the Poverty of the Stimulus - Zuidema

No context found.

A. Stolcke. Bayesian Learning of Probabilistic Language Models. PhD thesis, Dept. of Electrical Engineering and Computer Science, University of California at Berkeley, 1994.


Grammar Model-based Program Evolution - Shan, McKay, Baxter, al.

No context found.

Andreas Stolcke. Bayesian Learning of Probabilistic Language Models. PhD thesis, University of California, Berkeley, CA, 1994.


On-Line Cumulative Learning of Hierarchical Sparse n-Grams - Pfleger (2004)

No context found.

A. Stolcke. Bayesian Learning of Probabilistic Language Models. PhD thesis, Berkeley, 1994.

CiteSeer.IST - Copyright Penn State and NEC