T.L. Booth and R.A. Thompson. Applying probability measures to abstract languages. IEEE Transactions on Computers, C-22:442--450, 1973.
....and unit production matrices #L and #U , respectively, are required to provide the corrections in the prediction and completion stages. Booth and Thompson provide a rigorous proof that guarantees the existence of these matrices if the grammar is well behaved according to axioms provided by [27]. For further clarity, Stolcke also provides a simple example in Table D.3. 6.4.4 Viterbi Parse Motivated by the use of the Viterbi parsing in the HMM, we can also apply a generalization of the Viterbi method for parsing a string # to retrieve the most likely probability among For more ....
T.L. Booth and R.A. Thompson, "Applying probability measures to abstract languages," IEEE Transactions on Computers, Vol. 22, pp. 442-450, 1973.
....necessarily generate a stochastic language. This is illustrated in the following example. Example 1.1 Consider the stochastic grammar G with nonterminal set $V_N = \{S\}$, terminal set $V_T = \{a\}$. The productions with their probabilities are given by: $S \to S\,S$ with probability $q$ and $S \to a$ with probability $1 - q$. Following the technique presented in [2] we find that the production generating function is given by $g_1(s_1) = q s_1^2 + 1 - q$, and that the first moment matrix E is given by $[2q]$. We can conclude that the grammar is consistent if and only if $q \le 1/2$. For details we refer to [5]. Notice that all the different trees of string a have the ....
....defined analogous to the distribution language and stochastic language of an unrestricted grammar. 3 Consistency In this section consistency of weakly restricted stochastic grammars will be considered. The theory of multitype branching processes will be used to come to a similar theorem as is given in [2] for unrestricted stochastic grammars. Definition 3.1 For the j-th occurrence of nonterminal $A_i \in V_N$ the production generating function for weakly restricted stochastic grammars is defined ... where r(k) is 1 if nonterminal occurrence A appears in the right-hand side of the k-th production rule with ....
[Article contains additional citation context not shown here]
T.L. Booth, R.A. Thompson. Applying Probability Measures to Abstract Languages. In: IEEE Transactions on Computers Vol. C-22, No. 5, May 1973.
....Valencia Camino de Vera, s/n. 46071 Valencia (Spain) e-mail: {jandreu,jbenedi}@dsic.upv.es Abstract An important problem related to the probabilistic estimation of Stochastic Context-Free Grammars (SCFGs) is guaranteeing the consistency of the estimated model. This problem was considered in [3, 14] and studied in [10, 4] for unambiguous SCFGs only, when the probabilistic distributions were estimated by the relative frequencies in a training sample. In this work, we extend this result by proving that the property of consistency is guaranteed for all SCFGs without restrictions, when the ....
....A fundamental question which is related to these PE algorithms is to guarantee whether or not the learned SCFG generates a probabilistic language, that is to say if this SCFG is consistent. Moreover, the consistency of a SCFG determines the validity of various interesting probabilistic properties [3, 14]. For unambiguous SCFGs, it was proven in [10, 4] that when the probabilistic distributions are estimated by the relative frequencies in the sample, the obtained SCFG is consistent. In this work, this result is generalized by proving that the property of consistency is satisfied for SCFGs without ....
[Article contains additional citation context not shown here]
T.L. Booth and R.A. Thompson. Applying probability measures to abstract languages. IEEE Transactions on Computers, C-22(5):442--450, May 1973.
....study, two languages were chosen: the palindrome language with three terminal symbols (PAL3) and the arithmetic expression language with 5 terminal symbols (EXP). A SCFG was created for each language and was used only for generating a training sample. These grammars were consistent according to [3]. Each training sample had 5000 strings, but only 630 of them were different for PAL3, and 896 were different for EXP. For the training process, an initial characteristic grammar was created for each language. The number of non-terminal symbols (n) was chosen heuristically as is described in ....
....EXP). Each grammar had the maximum number of rules that could be created with the chosen number of non-terminal symbols and the given number of terminal symbols (v), that is, $n^3 + n \cdot v$. The probabilities of the rules were attached randomly, but guaranteeing that the grammar was proper [3]. In order to avoid the problem of a bad initialization, ten different initializations were used for each task and for each algorithm. With this initial grammar, a reestimation process was carried out with the IO algorithm on the one hand and with the Viterbi algorithm on the other hand. At each ....
T.L. Booth and R.A. Thompson. Applying probability measures to abstract languages. IEEE Transactions on Computers, C-22(5):442--450, May 1973.
.... number of times rule r is seen in a tree T, then the probability of a tree T can be written as $P(T \mid \Theta) = \prod_{r \in R} p(r)^{c(T,r)}$ or equivalently $\log P(T \mid \Theta) = \sum_{r} c(T,r) \log p(r) = \Phi(T) \cdot \Theta$, where we define $\Phi(T)$ to be an n-dimensional vector whose i-th component is $c(T, r_i)$. [Booth and Thompson 1973] give conditions on the weights which ensure that $P(T \mid \Theta)$ is a valid probability distribution over the set $\mathcal{T}$, in other words that $\sum_{T \in \mathcal{T}} P(T \mid \Theta) = 1$ and $\forall T \in \mathcal{T}$, $P(T \mid \Theta) \ge 0$. The main condition is that the parameters define conditional distributions over the alternative ways of rewriting each ....
....trees $\{T_1, T_2, \ldots, T_m\}$. The log likelihood of the training set given parameters $\Theta$ is $L(\Theta) = \sum_j \log P(T_j \mid \Theta)$. The maximum likelihood estimates are to take $\Theta = \arg\max_{\Theta \in \Omega} L(\Theta)$, where $\Omega$ is the set of allowable parameter settings (i.e. the parameter settings which obey the constraints in [Booth and Thompson 1973]). It can be proved using constrained optimization techniques (i.e. using Lagrange multipliers) that the maximum likelihood estimate for the weight of a rule $r = \alpha \to \beta$ is $p(\alpha \to \beta) = \sum_j c(T_j, \alpha \to \beta) / \sum_j c(T_j, \alpha)$; here we overload the notation c so that $c(\alpha)$ is the number of times ....
[Article contains additional citation context not shown here]
Booth, T. L., and Thompson, R. A. 1973. Applying Probability Measures to Abstract Languages. IEEE Transactions on Computers, C-22(5), 442--450.
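The relative-frequency estimate quoted in the context above (rule count divided by the count of its left-hand side) can be sketched in Python as follows. This is only an illustration under stated assumptions: the nested-tuple tree encoding and the names `rules`, `relative_frequency_estimate`, and `treebank` are hypothetical and do not come from any of the cited papers.

```python
from collections import Counter


def rules(tree):
    """Yield (lhs, rhs) for each internal node of a tree.

    Assumed encoding: a tree is a nested tuple (label, child, ...),
    and a terminal symbol is a plain string.
    """
    label, *children = tree
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    yield label, rhs
    for child in children:
        if isinstance(child, tuple):
            yield from rules(child)


def relative_frequency_estimate(trees):
    """Return {(lhs, rhs): count(lhs -> rhs) / count(lhs)} over all trees."""
    rule_counts = Counter()
    lhs_counts = Counter()
    for tree in trees:
        for lhs, rhs in rules(tree):
            rule_counts[lhs, rhs] += 1
            lhs_counts[lhs] += 1
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}


# Toy treebank (made up for illustration).
treebank = [
    ("S", ("S", "a"), ("S", "a")),  # uses S -> S S once and S -> a twice
    ("S", "a"),                     # uses S -> a once
]
p = relative_frequency_estimate(treebank)
```

By construction the estimates for each left-hand side sum to one, which is the "conditional distribution" constraint the snippet attributes to Booth and Thompson.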
....to be fruitful, e.g. for the problem of ambiguity resolution. The simple but useful approximation adopted here is to assume the most plausible analysis of a string to be the most probable analysis of that string. An attempt to transfer the techniques of probabilistic context-free grammars (see [3]) to CLGs was presented in [7]. In this approach the derivation process of CLGs is defined as a stochastic process by the following stochastic model: Each program clause gets assigned an application probability and the probabilities of all clauses defining one predicate have to sum to 1. The ....
....should easily be given a formal basis in terms of our quantitative CLP scheme. 5 This calculation scheme also could easily be captured by our quantitative CLP scheme by replacing min by a product accordingly in the relevant definitions of the declarative and procedural semantics of our scheme. 6 [3] discuss further conditions on consistency of probabilistic grammars which would have to be satisfied also by a probabilistic CLG model. be incorrect, in the sense that it makes an independency assumption for clause applications which is violated by the languages generated from such probabilistic ....
Taylor L. Booth and Richard A. Thompson. Applying probability measures to abstract languages. IEEE Transactions on Computers, C-22(5):442--450, 1973.
....frequency information with the components making up a grammar formalism. For example, just two of the options in the case of CFG are: (1) associating a single probability with each production that determines the probability of its use wherever it is applicable (i.e. Stochastic CFG; SCFG (Booth and Thompson, 1973)) or (2) associating different probabilities with a production depending on the particular nonterminal occurrence (on the RHS of a production) that is being rewritten (Chitrao and Grishman, 1990). In the latter case probabilities depend on the context (within a production) of the nonterminal ....
Booth, T. and Thompson, R. (1973). Applying probability measures to abstract languages. IEEE Transactions on Computers, C-22(5):442--450.
.... [of his son] man] a man] proud] of his son] a [so tall] man] so tall] a man] a [six feet] tall man] six feet] tall] a six foot tall man] was [every three weeks] fixing] his bike [was frequently fixing] his bike ffl More precisely, F C selection must be in same chunk 131 General [2, 3, 4, 35, 36, 50, 61, 62, 81, 82, 84, 116, 117, 118, 129, 143, 144, 148, 200] Tagging [10, 19, 28, 56, 57, 66, 90, 91, 124, 125, 126, 131, 138, 153, 163, 168, 188] HMMs [21, 22, 23, 24, 25, 49, 64, 67, 78, 115, 119, 155, 157, 160, 161] Search [156] The Inside Outside Algorithm [85, 86, 136, 137] Regression [20, 30, 29, 38, 41, 42, 45, 46, 154, 162] Partial Parsing [6, 7, ....
T.L. Booth and R.A. Thompson. Applying probability measures to abstract languages. IEEE Trans. Comput., C-22:442--450, 1973.
....the general case, infinite trees can be included in the sample space: infinite labeled trees are labeled trees with an infinite node set X. This requires an extension in the definition of the measure (not all subsets of the sample space are measurable) but does not affect the probabilities of finite trees. Booth and Thompson (1973) analyze the conditions under which a probability measure over finite trees is defined. .... yields of the ordered daughters of the node. A sentence is a finite sequence of words, i.e. an element of W . We have already defined the event monomial e( of a tree licensed by a ....
Booth, T. L. and R. A. Thompson (1973). Applying probability measures to abstract languages. IEEE Transactions on Computers C-22 (5), 442--450.
.... left corner and right corner probabilities, $P(X \Rightarrow_L w_1)$ and $P(Y \Rightarrow_R w_2)$, which can each be obtained from a single matrix inversion (Jelinek & Lafferty 1991). It should be mentioned that there are some technical conditions that have to be met for a SCFG to be well defined and consistent (Booth & Thompson 1973). These conditions are also sufficient to guarantee that the linear equations given by (3) have positive probabilities as solutions. The details of this are discussed in the Appendix. Finally, it is interesting to compare the relative ease with which one can solve the substring expectation problem ....
Booth, Taylor L., & Richard A. Thompson. 1973. Applying probability measures to abstract languages. IEEE Transactions on Computers C-22.442--450.
....language $(L, \phi)$ that cannot be generated by an unrestricted stochastic grammar $(G_c, D)$ such that $(L, \phi) = (L(G_c), p_u)$ ($p_u$ designates here the probability function as defined earlier). We will not give a full proof of this theorem. We only mention the counterexample that is used in [2] to prove it: let $L = \{a^i b^i \mid i \in \{0, 1, \ldots\}\}$. We know that L can be generated by a context-free grammar. The following probability function is associated with L: $\phi(a^i b^i) = e^{-a} a^i / i!$, where a is any (nonzero) real number. The theorem can now be proved by proving ....
....E will satisfy this condition if the magnitudes of all the characteristic roots of E are less than one: the grammar is consistent. Similarly, if one or more of these roots has a magnitude greater than 1, the limit diverges: the grammar is inconsistent. The proof of the theorem is taken from [2]. Note that the theorem does not decide consistency if the largest eigenvalue of the first moment matrix is equal to 1. We will discuss that special case later on. An unrestricted stochastic grammar is called strongly consistent if all the eigenvalues of the E matrix have magnitude less than 1. ....
T.L. Booth, R.A. Thompson. Applying Probability Measures to Abstract Languages. In: IEEE Transactions on Computers Vol. C-22, No. 5, May 1973.
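The eigenvalue test quoted in the context above (consistent if every characteristic root of the first moment matrix E has magnitude below 1, inconsistent if one exceeds 1, undecided at exactly 1) can be sketched as below. This is a minimal standard-library sketch, not the method of the cited papers: it estimates the spectral radius with Gelfand's formula, $\rho(E) = \lim_k \|E^k\|^{1/k}$, via repeated squaring rather than with an eigenvalue solver. The function names and the tolerance `tol` are assumptions made for illustration.

```python
import math


def matmul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def row_sum_norm(A):
    """Maximum absolute row sum (the induced infinity norm)."""
    return max(sum(abs(x) for x in row) for row in A)


def spectral_radius(E, squarings=30):
    """Estimate rho(E) as ||E^k||^(1/k) with k = 2**squarings."""
    A = [list(row) for row in E]
    log_scale = 0.0  # invariant: E^(2^j) == exp(log_scale) * A
    for _ in range(squarings):
        norm = row_sum_norm(A)
        if norm == 0.0:
            return 0.0  # a power of E vanished, so E is nilpotent
        A = [[x / norm for x in row] for row in A]  # rescale to avoid overflow
        log_scale += math.log(norm)
        A = matmul(A, A)
        log_scale *= 2.0
    norm = row_sum_norm(A)
    if norm == 0.0:
        return 0.0
    return math.exp((log_scale + math.log(norm)) / 2.0 ** squarings)


def classify(E, tol=1e-6):
    """Apply the eigenvalue test; values within tol of 1 are left undecided."""
    rho = spectral_radius(E)
    if rho < 1.0 - tol:
        return "consistent"
    if rho > 1.0 + tol:
        return "inconsistent"
    return "undecided"
```

For instance, `classify([[0.6]])` falls in the consistent case and `classify([[1.4]])` in the inconsistent one, while `classify([[1.0]])` is reported as undecided, mirroring the special case the snippet sets aside.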
....a stochastic language. This is illustrated in the following example. Example 1.1 Consider the stochastic grammar G with nonterminal set $V_N = \{S\}$, terminal set $V_T = \{a\}$. The productions with their probabilities are given by: $S \to S\,S$ with probability $q$ and $S \to a$ with probability $1 - q$. Following the technique presented in [2] we find that the production generating function is given by $g_1(s_1) = q s_1^2 + 1 - q$, and that the first moment matrix E is given by $[2q]$. We can conclude that the grammar is consistent if and only if $q \le 1/2$. For details we refer to [5]. Notice that all the different trees of string a ....
....analogous to the distribution language and stochastic language of an unrestricted grammar. 3 Consistency In this section consistency of weakly restricted stochastic grammars will be considered. The theory of multitype branching processes will be used to come to a similar theorem as is given in [2] for unrestricted stochastic grammars. Definition 3.1 For the j-th occurrence of nonterminal $A_i \in V_N$ the production generating function for weakly restricted stochastic grammars is defined as: $g_{ij}(s_{1,1}, \ldots, s_{k,R(A_k)}) = \sum_{u=1}^{|C_{A_i}|} p_{iju} \prod_{m=1}^{k} \prod_{n=1}^{R(A_m)} s_{m,n}^{r_{mn}(u)}$ ....
[Article contains additional citation context not shown here]
T.L. Booth, R.A. Thompson. Applying Probability Measures to Abstract Languages. In: IEEE Transactions on Computers Vol. C-22, No. 5, May 1973.
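Example 1.1 in the context above can be checked numerically. The sketch below iterates the production generating function g(s) = q s^2 + (1 - q) from s = 0; after k steps the value is the probability that a derivation from S yields a tree of height at most k, so the limit is the total probability mass on finite trees. The function name and the fixed iteration count are assumptions for illustration, and the iteration converges slowly near the critical value q = 1/2.

```python
def termination_probability(q, iterations=200):
    """Approximate the smallest non-negative fixed point of g(s) = q*s**2 + (1 - q).

    Grammar assumed: S -> S S with probability q, S -> a with probability 1 - q.
    """
    s = 0.0
    for _ in range(iterations):
        s = q * s * s + (1.0 - q)  # one more level of derivation depth
    return s


def first_moment(q):
    """Expected number of S symbols produced by rewriting one S (the 1x1 matrix E)."""
    return 2.0 * q
```

With q = 0.25 the first moment is 0.5 and the iteration approaches 1, so the grammar is consistent. With q = 0.75 the first moment is 1.5 and the iteration approaches (1 - q)/q = 1/3, so two thirds of the probability mass is lost to non-terminating derivations.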
....to non-terminal X and the column corresponding to non-terminal Y is the expected number of times X will be replaced by Y in exactly one production rule. As the spectral radius $\rho(M)$, which is the modulus of the largest eigenvalue, is always less than 1, the probabilistic grammar is consistent [4]. That is, the sum over all the sentences generated from this grammar is 1. M = [5 x 5 matrix with rows and columns indexed by LP, C, B, L, T; entries not recoverable from the extraction] ....
T. L. Booth and R. A. Thompson. Applying probability measures to abstract languages. IEEE Trans. Comput., C-22:442--450, 1973.
....that the score of a derivation is determined by the scores of the subderivations into which the derivation is factored by tabulation. For probabilistic functions, this amounts to a strengthened Markovian condition for derivations, which for instance is satisfied by stochastic context-free grammars (Booth & Thompson, 1973; Baker, 1979), certain kinds of parsers for constraint-based grammars (Briscoe & Carroll, 1993) and stochastic tree-adjoining grammars (Schabes, 1992). In such cases, the tabular search algorithms can be converted into dynamic programming ....
Booth, T. L. and Thompson, R. A. (1973). Applying probability measures to abstract languages. IEEE Transactions on Computers, C-22(5):442--450.
....multiplication as the inverse of an exponential function cannot be matched by any TAG because of the constant growth property of TAGs (see (Vijay-Shanker, 1987), p. 104). An example of such a function is a simple Poisson distribution (2), which in fact was also used as the counterexample in (Booth and Thompson, 1973) for CFGs, since CFGs also have the constant growth property. $a^n b^n c^n d^n \Rightarrow \frac{1}{e \cdot n!}$ (2) This shows that probabilistic TAGs, like CFGs, are constrained in the probabilistic languages that they can recognize or learn. As shown above, a probabilistic language can fail to have ....
....the values of M in terms of the grammar probabilities indicates that $M_{ij}$ contains the values we wanted, i.e. the expectation of obtaining node $A_j$ when node $A_i$ is rewritten by adjunction at each level of the TAG derivation process. By construction we have ensured that the following theorem from (Booth and Thompson, 1973) applies to probabilistic TAGs. A formal justification for this claim is given in the next section by showing a reduction of the TAG derivation process to a multitype Galton-Watson branching process (Harris, 1963). Theorem 4.1 A probabilistic grammar is consistent if the spectral radius $\rho(M) < 1$, ....
[Article contains additional citation context not shown here]
T. L. Booth and R. A. Thompson. 1973. Applying probability measures to abstract languages. IEEE Transactions on Computers, C-22(5):442--450, May.
....algorithm proposed in his dissertation ([16]). Earley developed a combined top-down/bottom-up approach which is shown to perform at worst at $O(N^3)$ for an arbitrary CFG formulation. A simple introduction of probabilistic measures into grammars and parsing was shown by Booth and Thompson ([7]), Thomason ([33]), and others. Aho and Peterson addressed the problem of ill-formedness of the input stream. In [2] they described a modified Earley's parsing algorithm where substitution, insertion and deletion errors are corrected. The basic idea is to augment the original grammar by error ....
T. L. Booth and R. A. Thompson. Applying probability measures to abstract languages. IEEE Transactions on Computers, C-22(5):442--450, 1973.
....a state, and transition probabilities correspond to class n-gram probabilities. 3.3 Stochastic Context-Free Grammars Based on the model merging approach to HMM induction, we have extended the algorithm to apply to stochastic context-free grammars (SCFGs), the probabilistic generalization of CFGs (Booth & Thompson 1973; Jelinek et al. 1992). A more detailed description of SCFG model merging can be found in Stolcke (1994). Data incorporation. To incorporate a new sample string into a SCFG we can simply add a top-level production (for the start nonterminal S) that covers the sample precisely. For example, the ....
BOOTH, TAYLOR L., & RICHARD A. THOMPSON. 1973. Applying probability measures to abstract languages. IEEE Transactions on Computers C-22.442--450.
.... $P(S \Rightarrow_L x)$ of x given G) is the sum of the probabilities of all sentence strings having x as a prefix, $P(S \Rightarrow_L x) = \sum_{y \in \Sigma^*} P(S \Rightarrow xy)$. (In particular, $P(S \Rightarrow_L \epsilon) = 1$.) In the following, we assume that the probabilities in a SCFG are proper and consistent as defined in Booth & Thompson (1973), and that the grammar contains no useless nonterminals (ones that can never appear in a derivation). These restrictions ensure that all nonterminals define probability measures over strings, i.e. $P(X \Rightarrow x)$ is a proper distribution over x for all X. Formal definitions of these conditions are ....
Booth, Taylor L., & Richard A. Thompson. 1973. Applying probability measures to abstract languages. IEEE Transactions on Computers C-22.442--450.
....to the treatment of compound forms. 3 Dealing with probabilities The purpose of this section is to present the extension of the CYK algorithm to the class of SCFG 7 . It is not in the scope of this paper to argue on the status of SCFGs which have been thoroughly investigated for many years [1, 2, 3, 6, 8, 9, 12]. We will use the SCFG paradigm for itself, as, even if it is not a perfect model of natural language, SCFGs are still superior to non probabilistic CFGs [11] The computation of the maximum probability for each of the parse trees is rather straightforward since in the SCFG context the probability ....
T. L. Booth and R. A. Thompson. Applying probability measures to abstract languages. IEEE Transactions on Computers, C-22(5):442--450, May 1973.
.... $P(S \Rightarrow_L x)$ of x given G) is the sum of the probabilities of all sentence strings having x as a prefix, $P(S \Rightarrow_L x) = \sum_{y \in \Sigma^*} P(S \Rightarrow xy)$. (In particular, $P(S \Rightarrow_L \epsilon) = 1$.) In the following, we assume that the probabilities in a SCFG are proper and consistent as defined in Booth & Thompson (1973), and that the grammar contains no useless nonterminals (ones that can never appear in a derivation). These restrictions ensure that all nonterminals define probability measures over strings, i.e. $P(X \Rightarrow x)$ is a proper distribution over x for all X. Formal definitions of these conditions are ....
....and completion steps. It was shown how these matrices could be obtained as the result of matrix inversions. In this appendix we give a proof that the existence of these inverses is assured if the grammar is well defined in the following three senses. The terminology used here is taken from Booth & Thompson (1973). Definition 6.7 For a SCFG G over an alphabet $\Sigma$, with start symbol S, we say that a) G is proper iff for all nonterminals X the rule probabilities sum to unity, i.e. $\sum_{\lambda : (X \to \lambda) \in G} P(X \to \lambda) = 1$; b) G is consistent iff it defines a probability distribution over finite strings, i.e. ....
[Article contains additional citation context not shown here]
BOOTH, TAYLOR L., & RICHARD A. THOMPSON. 1973. Applying probability measures to abstract languages.
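Part (a) of the definition quoted in the context above (a grammar is proper iff the rule probabilities of every nonterminal sum to one) is straightforward to check mechanically. The sketch below assumes a hypothetical encoding in which a grammar is a dict mapping each nonterminal to a list of (right-hand side, probability) pairs; the name `is_proper`, the tolerance, and the two toy grammars are illustrative only.

```python
import math


def is_proper(grammar, tol=1e-9):
    """Return True iff the rule probabilities of every nonterminal sum to one.

    Assumed encoding: grammar maps a nonterminal to a list of
    (rhs, probability) pairs, with rhs a tuple of symbols.
    """
    return all(
        math.isclose(sum(prob for _, prob in productions), 1.0, abs_tol=tol)
        for productions in grammar.values()
    )


# Toy grammars (made up for illustration).
g_ok = {"S": [(("S", "S"), 0.4), (("a",), 0.6)]}
g_bad = {"S": [(("S", "S"), 0.4), (("a",), 0.5)]}
```

Properness alone does not give consistency: the one-nonterminal grammar S -> S S (q), S -> a (1 - q) cited elsewhere on this page is proper for every q, yet it is consistent only for q up to 1/2.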
.... with rational transductions [9, 23] The notion of weighted rational transduction arises from the combination of two ideas in automata theory: rational transductions, used in many aspects of formal language theory [2] and weighted languages and automata, developed in pattern recognition [4, 15] and algebraic automata theory [3, 5, 8] Ordinary (unweighted) rational transductions have been successfully applied by researchers at Xerox PARC [7] and at the University of Paris 7 [13, 14, 20, 21] among others, to several problems in language processing, including morphological analysis, ....
....a closed semiring is one in which collecting over infinite sets is well defined. Finally, some particular cases arising in the discussion below can be shown to be well defined for the plus-times semiring under certain mild conditions on the weights assigned to strings or automata transitions [4, 8]. 2.2 Weighted Transductions and Languages In the transduction cascade (1) each stage corresponds to a mapping from input-output pairs $(r, s)$ to probabilities $P(s \mid r)$. More formally, stages in the cascade will be weighted transductions $T : \Sigma^* \times \Gamma^* \to K$ where $\Sigma$ and ....
Taylor L. Booth and Richard A. Thompson. Applying probability measures to abstract languages. IEEE Transactions on Computers, C-22(5):442--450, May 1973.
.... $P(S \Rightarrow_L x)$ of x given G) is the sum of the probabilities of all sentence strings having x as a prefix, $P(S \Rightarrow_L x) = \sum_{y \in \Sigma^*} P(S \Rightarrow xy)$. (In particular, $P(S \Rightarrow_L \epsilon) = 1$.) In the following, we assume that the probabilities in a SCFG are proper and consistent as defined in Booth and Thompson (1973), and that the grammar contains no useless nonterminals (ones that can never appear in a derivation). These restrictions ensure that all nonterminals define probability measures over strings, i.e. $P(X \Rightarrow x)$ is a proper distribution over x for all X. Formal definitions of these conditions are ....
....and completion steps. It was shown how these matrices could be obtained as the result of matrix inversions. In this appendix we give a proof that the existence of these inverses is assured if the grammar is well defined in the following three senses. The terminology used here is taken from Booth and Thompson (1973). Definition 9 For a SCFG G over an alphabet $\Sigma$, with start symbol S, we say that a) G is proper iff for all nonterminals X the rule probabilities sum to unity, i.e. $\sum_{\lambda : (X \to \lambda) \in G} P(X \to \lambda) = 1$; b) G is consistent iff it defines a probability distribution over finite strings, i.e. ....
[Article contains additional citation context not shown here]
Booth, Taylor L., and Thompson, Richard A. (1973). "Applying probability measures to abstract languages". IEEE Transactions on Computers, C-22(5):442--450.
No context found.
T.L. Booth and R.A. Thompson. Applying probability measures to abstract languages. IEEE Transactions on Computers, C-22:442--450, 1973.
No context found.
T. L. Booth and R. A. Thompson. Applying probability measures to abstract languages. IEEE Transactions on Computers, 22(5):442-450, 1973.
No context found.
Taylor L. Booth and Richard A. Thompson, "Applying probability measures to abstract languages," IEEE Transactions on Computers, vol. 22, pp. 442-450, 1973.
No context found.
T. Booth and R. Thompson. 1973. "Applying Probability Measures to Abstract Languages". In IEEE Transactions on Computers, 22(5).
No context found.
T. Booth and R. Thompson. 1973. "Applying Probability Measures to Abstract Languages". In IEEE Transactions on Computers, 22(5).
No context found.
BOOTH, TAYLOR L., & RICHARD A. THOMPSON. 1973. Applying probability measures to abstract languages. IEEE Transactions on Computers C-22.442--450.
No context found.
Booth, T. L. and R. A. Thompson (1973). Applying probability measures to abstract languages. IEEE Transactions on Computers C-22 (5), 442--450.
No context found.
Taylor Booth and Richard Thompson. 1973. Applying probability measures to abstract languages.
No context found.
Taylor L. Booth & Richard A. Thompson[May 1973], "Applying Probability Measures to Abstract Languages," IEEE Transactions on Computers C-22, 442--450.
No context found.
T. Booth and R. Thompson. Applying probability measures to abstract languages. IEEE Transactions on Computers, C-22:442--450, 1973.