L. Ramshaw and M. Marcus. Exploring the statistical derivation of transformational rule sequences for part-of-speech tagging. In ACL Balancing Act Workshop, pages 86-95, 1994.
....of tagging by providing a reliable anchor or seed around which to tag. 1 Introduction Part of speech tagging is a prerequisite task for many natural language processing applications, among them parsing, word sense disambiguation, machine translation, etc. The Brill Tagger (cf. [1] [2] [3] [5]) is one of the most widely used tools for assigning parts of speech to words. It is a hybrid of machine learning and statistical methods that is based on transformation based learning. The Brill Tagger has several virtues that we feel recommend it above other taggers. First, the source code is ....
L. Ramshaw and M. Marcus. Exploring the statistical derivation of transformational rule sequences for part-of-speech tagging. In ACL Balancing Act Workshop, pages 86-95, 1994.
....tagger (379 KB vs. 2158 KB). It is also the fastest tagger I have seen reported in the literature (10,800 wps vs. 1200 wps for an HMM tagger). 1.3 Generative Processes vs. Classification Regression The Brill rule acquisition technique can be seen as a kind of regression or classification model [68], related to classification and regression trees (CART) [14] and decision lists [70, 89]. Regression techniques can be contrasted, at least heuristically, with generative process models like HMMs. In both cases the goal is to assign a structure to an observed sentence. In a generative process ....
Lance A. Ramshaw and Mitchell P. Marcus. Exploring the statistical derivation of transformational rule sequences for part-of-speech tagging. In Proceedings of the ACL Balancing Act Workshop, 1994.
.... 36, 50, 61, 62, 81, 82, 84, 116, 117, 118, 129, 143, 144, 148, 200]; Tagging [10, 19, 28, 56, 57, 66, 90, 91, 124, 125, 126, 131, 138, 153, 163, 168, 188]; HMMs [21, 22, 23, 24, 25, 49, 64, 67, 78, 115, 119, 155, 157, 160, 161]; Search [156]; The Inside Outside Algorithm [85, 86, 136, 137]; Regression [20, 30, 29, 38, 41, 42, 45, 46, 154, 162]; Partial Parsing [6, 7, 8, 9, 11, 37, 43, 47, 48, 51, 52, 53, 57, 58, 112, 65, 69, 70, 71, 72, 73, 74, 75, 76, 88, 100, 101, 102, 103, 104, 107, 110, 113, 114, 120, 121, 127, 132, 133, 134, 140, 142, 145, 147, 149, 152, 163, 164, 165, 166, 169, 178, 182, 186, 190, 191, 192, 194, 195, 196, 197] ....
Lance A. Ramshaw. Exploring the statistical derivation of transformational rule sequences for part-of-speech tagging. In Proceedings of the ACL Balancing Act Workshop, 1994.
....tagger (379 KB vs. 2158 KB). It is also the fastest tagger I have seen reported in the literature (10,800 wps vs. 1200 wps for an HMM tagger). 1.3 Generative Processes vs. Classification Regression The Brill rule acquisition technique can be seen as a kind of regression or classification model [68], related to classification and regression trees (CART) [14] and decision lists [70, 89]. Regression techniques can be contrasted, at least heuristically, with generative process models like HMMs. In both cases the goal is to assign a structure to an observed sentence. In a generative process ....
Lance A. Ramshaw and Mitchell P. Marcus. Exploring the statistical derivation of transformational rule sequences for part-of-speech tagging. In Proceedings of the ACL Balancing Act Workshop, 1994.
....simple rules instead of thousands of probabilities. Furthermore, the learned rules can be converted into a deterministic finite state transducer. Basically, this method learns a sequence of symbolic rules that characterize important contextual factors and uses them to predict a most likely value [83]. First, unannotated text is passed through the initial state annotator. The initial state annotator can be as simple as assigning random structure or as complex as assigning the output of a manually created annotator. Once text has been passed through the initial state annotator, it is then ....
L. Ramshaw and M. Marcus. Exploring the statistical derivation of transformational rule sequences for part-of-speech tagging. Proceedings of the ACL Balancing Act Workshop.
....transformation based error-driven learning. Transformation based error-driven learning has been applied to a number of natural language problems, including part of speech tagging, prepositional phrase attachment disambiguation, speech generation and syntactic parsing [Brill, 1992; Brill, 1994; Ramshaw and Marcus, 1994; Roche and Schabes, 1995; Brill and Resnik, 1994; Huang et al., 1994; Brill, 1993a; Brill, 1993b]. Figure 1 illustrates the learning process. First, unannotated text is passed through an initial state annotator. The initial state annotator can range in complexity from assigning random structure ....
Ramshaw, L. and Marcus, M. 1994. Exploring the statistical derivation of transformational rule sequences for part-of-speech tagging. In The Balancing Act: Proceedings of the ACL Workshop on Combining Symbolic and Statistical Approaches to Language, New Mexico State University.
....far, one can be certain of having found one of the globally best rules when one reaches candidate rules in the sorted list whose positive score is not greater than the net score of the best rule so far. 5.2. INDEXING STATIC RULE ELEMENTS In earlier work on transformational part of speech tagging (Ramshaw and Marcus, 1994), we noted that it is possible to greatly speed up the learning process by constructing a full, bidirectional index linking each candidate rule to those locations in the corpus at which it applies and each location in the corpus to those candidate rules that apply there. Such an index allows the ....
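The bidirectional index mentioned in the excerpt above can be illustrated with a small sketch. This is not Ramshaw and Marcus's implementation: the rule shape (from_tag, to_tag, prev_tag), meaning "change from_tag to to_tag when the previous tag is prev_tag", and the function names rule_applies and build_bidirectional_index are assumptions made for illustration only.

```python
from collections import defaultdict


def rule_applies(rule, tags, i):
    """Return True if the (hypothetical) rule matches at corpus position i."""
    from_tag, _to_tag, prev_tag = rule
    return i > 0 and tags[i] == from_tag and tags[i - 1] == prev_tag


def build_bidirectional_index(tags, candidate_rules):
    """Link each rule to the positions where it applies, and each position
    to the rules that apply there (both directions, as in the excerpt)."""
    rule_to_sites = defaultdict(set)
    site_to_rules = defaultdict(set)
    for i in range(len(tags)):
        for rule in candidate_rules:
            if rule_applies(rule, tags, i):
                rule_to_sites[rule].add(i)
                site_to_rules[i].add(rule)
    return rule_to_sites, site_to_rules


# Toy example: a five-token corpus of current tags and two candidate rules.
tags = ["DT", "VB", "NN", "DT", "VB"]
rules = [("VB", "NN", "DT"), ("NN", "VB", "VB")]
rule_to_sites, site_to_rules = build_bidirectional_index(tags, rules)
```

With both directions available, a learner that changes the tag at one position can look up only the rules indexed at that position and its neighbors, rather than rescoring every candidate rule against the whole corpus.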
Ramshaw, Lance A. and Mitchell P. Marcus. 1994. Exploring the statistical derivation of transformational rule sequences for part-of-speech tagging. In Proceedings of the Balancing Act Workshop on Combining Symbolic and Statistical Approaches to Language, Association for Computational Linguistics, pages 86-95.