L. Karttunen, J.-P. Chanod, G. Grefenstette, A. Schiller, Regular Expressions for Language Engineering, Journal of Natural Language Engineering vol 2 no 4 (1997) pp 307-330, Cambridge University Press, 1997.
.... and a parallel implementation of a parsing algorithm is proposed in [HJZH93]. Another example of changed architecture is given by a parsing technique called finite-state cascade, where shallow parsers (finite-state transducers) are composed in a pipeline or, more generally, a network fashion (see [KCGS96]). These types of linguistic components are very efficient since they are essentially finite-state automata, though their single generative power is very poor (i.e. regular languages). A reductionist parser for French which addresses robustness issues is proposed in [CT96]. Changed ....
Lauri Karttunen, Jean-Pierre Chanod, Gregory Grefenstette, and Anne Schiller. Regular expressions for language engineering. CUP Journals: Natural Language Engineering, 2(4):305--328, 1996.
....1 Motivation Most recent work on finite-state transducers (FSTs) falls into two camps according to how the transducers are constructed. The algebraic camp employs experts who write (possibly weighted) regular expressions by hand, using an ever-growing language of powerful algebraic operators [10, 7]. The statistical camp, which prefers to extract expertise automatically from data, builds transducers with much simpler topology so that their arc probabilities can be easily trained (e.g. [17, 12, 11]). This paper offers a clean way to combine the two traditions: an Expectation-Maximization ....
Lauri Karttunen, Jean-Pierre Chanod, Gregory Grefenstette, and Anne Schiller. Regular expressions for language engineering. Journal of Natural Language Engineering, 2(4):305-328, 1996.
....corpus that sufficiently covers the generator's vocabulary and provides recordings of frequently used words and phrases in the appropriate prosodic context. 2.8 WFSTs for Speech Processing Many areas of language and speech processing have adopted the weighted finite-state transducer (WFST) formalism [46, 77, 89, 59], because it supports a complete representation of regular relations and provides efficient mechanisms for performing various operations on them. A finite-state transducer encodes a regular relation, i.e. a set of pairs of strings. The elements in each pair correspond to the transducer's input and ....
L. Karttunen et al. Regular expressions for language engineering. CUP Journals: Natural Language Engineering, 4:305-328, 1996.
....1 Motivation Most recent work on finite-state transducers (FSTs) falls into two camps according to how the transducers are constructed. The algebraic camp employs experts who write (possibly weighted) regular expressions by hand, using an ever-growing language of powerful algebraic operators [10, 7]. The statistical camp, which prefers to extract expertise automatically from data, builds transducers with much simpler topology so that their arc probabilities can be easily trained (e.g. [17, 12, 11]). This paper offers a clean way to combine the two traditions: an Expectation-Maximization ....
Lauri Karttunen, Jean-Pierre Chanod, Gregory Grefenstette, and Anne Schiller. Regular expressions for language engineering. Journal of Natural Language Engineering, 2(4):305-328, 1996.
....1 Motivation Most recent work on finite-state transducers (FSTs) falls into two camps according to how the transducers are constructed. The algebraic camp employs experts who write (possibly weighted) regular expressions by hand, using an ever-growing language of powerful algebraic operators [10, 7]. The statistical camp, which prefers to extract expertise automatically from data, builds transducers with much simpler topology so that their arc probabilities can be easily trained (e.g. [17, 12, 11]). This paper offers a clean way to combine the two traditions: an Expectation-Maximization ....
Lauri Karttunen, Jean-Pierre Chanod, Gregory Grefenstette, and Anne Schiller. Regular expressions for language engineering. Journal of Natural Language Engineering, 2(4):305--328, 1996.
....to a weighted transducer. A weighted transducer makes it possible to increase the efficiency of parsing by adding heavy weights on the most common paths. Our aim here is not to increase the efficiency of parsing, but to be able to say that, in a certain context, an expression is no longer valid. [Footnote 8: See [8], [13] for problems caused by multiword expressions.] The analysis functions by success/failure: the analyser looks first for the core expression, then for its negative contexts. When an expression is found with the positive feature being active, it has been validly recognised. The analysis fails ....
L. Karttunen, J.-P. Chanod, G. Grefenstette and A. Schiller, Regular expressions for language engineering, Natural Language Engineering, 2(4) (1996) 305-328.
....singular, definite) target(say) Figure 2. Example of an input sentence and its corresponding output. To implement the finite-state grammar we have applied several operations on regular expressions and relations, among them composition and replacement, using the Xerox Finite State Tool (XFST) [Karttunen et al. 1997]. We use both ordinary composition and the lenient composition operator [Karttunen 1998]. This operator allows the application of different eliminating constraints to a sentence, always with the certainty that when some constraint eliminates all the interpretations, then the constraint is not ....
Karttunen L., Chanod J-P., Grefenstette G., Schiller A. 1997. Regular Expressions For Language Engineering. Natural Language Engineering.
....many practical applications. It can accelerate the processing of input because no time is spent on failing paths, and allows analyzing and manipulating separately the different parts of an FST (or of the described relation). In Natural Language Processing, where FSTs are used for many basic steps (Karttunen et al. 1996; Mohri, 1997), the latter advantage concerns, e.g., finite-state-based shallow parsers (Koskenniemi, Tapanainen, and Voutilainen, 1992; Aït-Mokhtar and Chanod, 1997). Their ambiguity could be analyzed more easily in factorized form. 1.1 Conventions Every FST has one initial state, labeled with ....
Karttunen, Lauri, Jean-Pierre Chanod, Gregory Grefenstette, and Anne Schiller. 1996. Regular expressions for language engineering. Natural Language Engineering, 2(4):305--328.
....No additional information or special algorithm that could decelerate the processing of input is required at runtime. Intermediate alphabet reduction can be beneficial for many practical applications that use FST cascades. In Natural Language Processing, FSTs are used for many basic steps (Karttunen et al. 1996; Mohri, 1997), such as phonological (Kaplan & Kay, 1994) and morphological analysis (Koskenniemi, 1983), part-of-speech disambiguation (Roche & Schabes, 1995; Kempe, 1997; Kempe, 1998), spelling correction (Oflazer, 1996), and shallow parsing (Koskenniemi et al. 1992; Aït-Mokhtar & Chanod, 1997) ....
KARTTUNEN L., CHANOD J.-P., GREFENSTETTE G. & SCHILLER A. (1996). Regular expressions for language engineering. Natural Language Engineering, 2(4), 305--328.
....pattern database (dictionary problem) is blindingly fast, linear with respect to the length of the searched word, as with other finite-state approaches. 1 Introduction There is a need to store empirical language data in almost all areas of natural language engineering (LE). Finite-state methods [21,13,16,17,10] have found their revival in the last decade. The theory of finite-state automata (FSA) and transducers (FST) is a well-developed part of theoretical computer science (for an overview, see e.g. [6,2]). As the finite-state machines (FSM) needed tend to grow with increased demand for quality of ....
Lauri Karttunen, Jean-Pierre Chanod, Gregory Grefenstette, and Anne Schiller. Regular Expressions for Language Engineering. Natural Language Engineering, 2(4):305--328, 1996.
....are used by the following transducers on the cascade. This strategy makes it possible to use intermediate annotations which help at a certain stage of the analysis but can be later removed for the final result. The transducers identifying the different kinds of clusters are compiled from regular expressions [Karttunen et al. 1997]. These regular expressions have a simple syntax but can create very complex transducers covering complex syntactic phenomena. This is one of the advantages of the finite-state approach for shallow parsing. 3 The Spanish Parser As mentioned above, the IFSP approach leads to a partial ....
Karttunen, L., Chanod, J., Grefenstette, G., and Schiller, A. (1997). Regular Expressions for Language Engineering. Natural Language Engineering, 1:1--23.
....The main interest in these tasks lies in detecting the constituent structures and sometimes their syntactic functions in a robust and fast way. In this study, our aim is to develop a parser for Swedish part of speech tagged texts, based on finite state techniques using the Xerox Finite State Tool (Karttunen et al., 1997). Finite state techniques have been shown to be very useful for parsing unrestricted texts for several languages, such as English, Finnish, French, German, Swedish, etc. Under certain circumstances, these parsers are robust, fast and accurate. There are mainly three approaches that have been ....
....the different modules. Acknowledgements We would like to thank the Department of Linguistics, Uppsala University, for giving us the opportunity to participate in the course "Automata theory", and especially Torbjörn Lager, who first introduced us to the XFST during this course. Footnotes 1 See Karttunen et al. (1997) for a good description of the XFST operators. 2 Ord is defined as a string of accepted characters in the natural language that forms a word. 3 Note that neither the maximal projection of the NPs (vattenledningar och avloppsrör) nor the PP consisting of a preposition and infinitive verb ....
Karttunen, L., Chanod, J. P., Grefenstette, G., & Schiller, A. 1997. Regular Expressions for Language Engineering. Natural Language Engineering 2, 305--328, Cambridge University Press.
....rules provide a modular, declarative and flexible workbench to deal with the resulting chart. Currently we use the Xerox Finite State Tool (XFST, http://www.rxrc.xerox.com/research/mltt/fst/home.html), which has as its main characteristic a rich set of operations, like the replacement operator [Karttunen et al. 1997], defined in terms of simpler regular expressions (or relations), so that the combined expressions always belong to the finite-state calculus and can, therefore, be implemented using a finite-state automaton (transducer). Among the finite-state operators used we apply composition, intersection and ....
Karttunen L., Chanod J-P., Grefenstette G., Schiller A. 1997. Regular Expressions For Language Engineering. Natural Language Engineering.
....A tool is needed that will allow the definition of complex linguistic error patterns over the chart. For that reason, we view the chart as an automaton to which finite-state constraints can be applied, encoded in the form of automata and transducers (we use the Xerox Finite State Tool, XFST (Karttunen et al. 1997)). Finite-state rules provide a modular, declarative and flexible workbench to deal with the resulting chart. Among the finite-state operators used, we apply composition, intersection and union of regular expressions and relations. chart (automaton) No Error Error Type(s) Finite state parser ....
....error is defined (an NP consisting of a month in ergative or absolutive case followed by an inflected number) and named Error Type5. Second, a transducer (Mark Error Type 5) is defined which surrounds the incorrect pattern (represented by ...) by two error tags (BEGINERRORTYPE5 and ENDERRORTYPE5). [Footnote 1: For more information on XFST regular expressions, see (Karttunen et al. 1997).] To further restrict the application of the rule, left and right contexts for the error can be defined (in a notation reminiscent of two-level morphology), mostly to assure that the rule is only applied to dates, thus preventing the ....
Karttunen L., Chanod J-P., Grefenstette G., Schiller A. (1997) Regular Expressions For Language Engineering. Journal of Natural Language Engineering.
.... a parallel implementation of a parsing algorithm is proposed in [HJZH93]. Another example of changed architecture is given by a parsing technique called finite-state cascade, where shallow parsers (finite-state transducers) are composed in a pipeline or, more generally, a network fashion (see [KCGS96]). These types of linguistic components are very efficient since they are essentially finite-state automata, though their single generative power is very poor (i.e. regular languages). A reductionist parser for French which addresses robustness issues is proposed in [CT96]. Changed ....
Lauri Karttunen, Jean-Pierre Chanod, Gregory Grefenstette, and Anne Schiller. Regular expressions for language engineering. CUP Journals: Natural Language Engineering, 2(4):305--328, 1996.
.... linguistics in general or in recent approaches to building morphological analyzers (e.g. [Koskenniemi, 1983], [Antworth, 1990], [Karttunen et al. 1992], [Karttunen, 1994]) and the operation of state-of-the-art finite-state tools (e.g. [Karttunen, 1993], [Karttunen and Beesley, 1992], [Karttunen et al. 1996]) in particular, the generation of the morphological analyzer component has to be accomplished almost semi-automatically. The user must be guided through a knowledge elicitation procedure for the knowledge required for the morphological analyzer. This is accomplished using the elicitation ....
....a series of regular expressions for describing the morphological lexicon and morphographemic rules. The morphographemic rules, describing changes in spelling as a result of affixation operations, are induced from the examples provided, by using transformation-based learning [Brill, 1995, Satta and Henderson, 1997]. [Footnote 3: We currently use XRCE finite-state tools as our target environment [Karttunen et al. 1996]. Footnote 4: Also independently elicited from either the human informant or compiled from any on-line resources for the language in question.] The result is an ordered set of contextual replace or rewrite ....
[Article contains additional citation context not shown here]
Lauri Karttunen, Jean-Pierre Chanod, Gregory Grefenstette, and Anne Schiller. Regular expressions for language engineering. Natural Language Engineering, 2(4):305--328, 1996.
....However, when solving real problems most researchers use software supporting high-level descriptions of automata, automatic compilation and optimisation, and debugging facilities. Packages for two-level morphology, such as PCKIMMO (Antworth 1990), are well-known examples. As demonstrated in Karttunen, Chanod, Grefenstette and Schiller (1997), an even more flexible use of finite-state technology can be obtained by using a calculus of regular expressions. A high-level description language suited for language .... [Footnote 1: See www.let.rug.nl/~gosse/tt for a preliminary version of the text and links to the exercises described here.]
Karttunen, L., Chanod, J., Grefenstette, G. and Schiller, A. (1997), Regular expressions for language engineering, Natural Language Engineering pp. 1--24.
....previous word is a determiner. Weighted automata or transducers are automata or transducers in which each transition has a weight as well as input/output labels. There already exist a variety of well-known finite-state packages, e.g. FSA6 [van Noord, 1998], the Finite State Tool from Xerox PARC [Karttunen et al. 1996] or the AT&T FSM Tools [Mohri et al. 1996]. However, only few of them provide algorithms for WFSAs and even fewer support operations for WFSTs. Algorithms on WFSAs have strong similarities with their better known unweighted counterparts, but .... [Figure 1: A simple FST]
L. Karttunen, J-P. Chanod, G. Grefenstette, and A. Schiller. Regular expressions for language engineering. Natural Language Engineering, 2:2:305--328, 1996.
....(nucleus) and X[ ] (unparsed). In addition, GEN overparses, that is, it freely inserts empty onset O[ ], nucleus N[ ], and coda D[ ] brackets. For the sake of concreteness, we give here an explicit definition of GEN using the notation of the Xerox regular expression calculus (Karttunen et al. [15]). We define GEN as the composition of four simple components, Input, Parse, OverParse and SyllableStructure. The definitions of the first three components are shown in Figure 4. For more discussion of these issues, see Karttunen [17]. The Yokuts case is problematic for Optimality theory (Cole ....
Lauri Karttunen, Jean-Pierre Chanod, Gregory Grefenstette, and Anne Schiller. 1996. Regular expressions for language engineering. Journal of Natural Language Engineering, 2(4):305-328.
.... February 29, 1969, which would qualify if we were allowed to take into account only the first three digits, 196 being a leap year in the Gregorian calendar. The construction of the WeekDayDates constraint is not trivial but not as difficult as it might initially seem. See [13] for details. Having constructed the auxiliary constraint languages we can define the language of valid dates as ValidDates = AllDates & MaxDaysInMonth & LeapDays & WeekDayDates. The network contains 805 states, 6472 arcs, and about 7 million date expressions. We could now construct a parser that ....
Lauri Karttunen, Jean-Pierre Chanod, Gregory Grefenstette, and Anne Schiller. Regular expressions for language engineering. Journal of Natural Language Engineering, 2(4):305--328, 1996.
No context found.
L. Karttunen, J.-P. Chanod, G. Grefenstette, A. Schiller, Regular Expressions for Language Engineering, Journal of Natural Language Engineering vol 2 no 4 (1997) pp 307-330, Cambridge University Press, 1997.
No context found.
L. Karttunen, J-P. Chanod, G. Grefenstette, and A. Schiller. Regular expressions for language engineering. http://www.xrce.xerox.com/research/mltt/fst/articles/jnle-97/rele.html, Fri Aug 13 13:07:42 GMT DST 1999, January 1997.
No context found.
Karttunen, L., Chanod, J.P., Grefenstette, G., Schiller, A., Regular expressions for language engineering, Natural Language Engineering 2 (4) 305-328, Cambridge University Press, 1996.
No context found.
Karttunen, L., Chanod, J.-P., Grefenstette, G. and Schiller, A. (1997), Regular expressions for language engineering.
No context found.
Lauri Karttunen, Jean-Pierre Chanod, Gregory Grefenstette, and Anne Schiller. 1996. Regular expressions for language engineering. Natural Language Engineering, 2(4):305-329.