Silberztein M. Dictionnaires électroniques et analyse automatique de textes: le système INTEX. Masson, Paris, 1993.
....form, strings are normalized to remove case sensitivity and diacritics, and decompounded (N c form). Decompounding is the most effective normalization [3]. The importance of inflected form substitutions is shown by the last two entries in Table 6. Root forms were obtained using the INTEX system [12]. A relative error reduction of over 20% is obtained by replacing inflected forms by their root forms. Normalization / err: N b form (baseline) 13.6; N c (N b no comp. ci, no diac.) 12.7; N b root forms 10.3; N c root forms 9.6. Table 5: Word error rates as a function of different text ....
M. Silberztein, Dictionnaires électroniques et analyse automatique de textes : le système INTEX, Masson, 1993.
....(1994) in HPSG, Van der Linden (1993) in CG. These proposals discuss ways in which MWLs should be treated in so-called high-level grammar formalisms. Within the finite-state approach to NL parsing, it has been proposed to represent MWLs by local grammars (e.g. Roche (1993), Maurel (1993) and Silberztein (1993)). Local grammars as discussed in these contributions are finite-state networks that are matched against (parts of) a NL string. Informally speaking, if the string corresponds to one traversal of the network, it is recognised by the local grammar represented by that network. In the proposal made ....
....:sous POSS :bonnet ; prendre qch sous son bonnet, to make something one's concern) the inserted NP and adverbial constituents are external to the MWL and as such should be handled by the general syntax. The general syntax of an NP can then again be a local grammar, as for instance proposed in Silberztein (1993), where all linguistic phenomena are treated as local grammars. But it need not be used in this way. In fact, contrary to the Parisian philosophy, we intend to investigate the integration of local FS grammars with other, more powerful grammar formalisms. The local grammar rules we write cover at ....
Max Silberztein. 1993. Dictionnaires électroniques et analyse automatique de textes - Le système INTEX. Masson, Paris, France.
....this paper, we will present a number of applications of finite-state technology to other language engineering problems, and describe the part of the regular expression calculus we have developed that makes these applications possible. Although large efforts have been made to build local grammars (Silberztein 1993) without the help of a finite-state calculus, the expressive power of a well-designed calculus makes it possible to create modular rule sets and lexical descriptions that are easy to update and maintain, accelerating the production of diverse engineering applications. Our paper is structured in ....
Silberztein, Max. 1993. Dictionnaires électroniques et analyse automatique de textes. Le système INTEX. Masson, Paris, 1993.
....can be combined to produce different versions of normalized texts. Eleven such combinations are given in Table 1 using the normalizations listed above. Only the baseline normalizations are used to produce the reference text V 0. We use two large French dictionaries, BDLEX [12] and DELAF [13], to produce V 1 and V 2 texts. A more detailed description of the normalizations can be found in [1]. While any normalization results in a reduction of information, the amount of information loss varies for the different types of normalizations. It is straightforward to recover a V 0 text (or an ....
M. Silberztein, Dictionnaires électroniques et analyse automatique de textes : le système INTEX, Masson, 1993.
....semirings, object-oriented programming, lazy evaluation and memoization. Key words: Weighted automata; finite-state transducers; rational power series; speech recognition. 1 Introduction. Finite-state techniques have proven valuable in a variety of natural language processing applications [5 11,14,16,18,19,29,33,34,37,39,40]. However, speech processing imposes requirements that were not met by any existing finite-state library. In particular, speech recognition requires a general means for managing uncertainty: all levels of representation, and all mappings between levels, involve alternatives with different ....
M. Silberztein. Dictionnaires électroniques et analyse automatique de textes: le système INTEX. Masson: Paris, France, 1993.
....or the multiple forms of a verb according to person, tense, etc. Inflectional morphology is well handled in many languages, and lemmatizers, which reduce inflected forms to their canonical forms, are available for a large set of languages (for instance for French, Silberztein's INTEX system (Silberztein, 1993), based on LADL's DELAF dictionaries, or Namer's FLEMM lemmatizer (Toussaint et al. 1998)). Derivation is used to obtain, e.g., the adjectival form of a noun (noun vesicule yields adjective vesicular). Compounding combines several radicals to obtain complex word forms (e.g. vesicule virus ....
Silberztein, M. (1993). Dictionnaires électroniques et analyse automatique de textes : le système INTEX. Paris: Masson.
....(2) for the human expert in charge of the tagging of reference corpora. It was indeed initially assumed that the reference tagging would be performed on the basis of tags extracted from the reference lexicon. We therefore studied the availability of existing French lexica, such as: INTEX (LADL) (Silberztein 1993), BREFLEX and BDLEX, which have both been built in the framework of the GDR PRC Communication Homme-Machine, lexica resulting from in-house efforts of the organizers (the electronic thesaurus extracted from the Trésor de la Langue Française (INaLF) and the lexicon of the École Nationale ....
M. Silberztein, Dictionnaires électroniques et analyse automatique de textes - Le système INTEX, Masson, Paris, 1993.
....process, but closely related to the morphological analysis. The tokens are not described in detail here. What kind of tokens are needed depends on the finite-state network-based syntactic analyser for French which has been developed during the last few years [3, 4]. For related work, see [1, 8, 10, 12]. 2 Non-deterministic tokenisation. As the first step in the analysis, a tokeniser segments the input sentence into tokens. In many applications, it is assumed that at this level of processing there is no ambiguity. Karttunen [6] describes the compilation of unambiguous tokenisers from direct ....
Max Silberztein, Dictionnaires électroniques et analyse automatique de textes. Le système INTEX, Masson, Paris, 1993.
....fewer than 4 parses, including the correct one. A test on very long sentences from newspaper corpora and a discussion of errors provide more insight into the parser. 1 Introduction. We introduce a parser that uses finite-state networks from the tokenisation of the text to syntactic analysis. See [1, 12, 14, 16, 18] for related work. Descriptions of other approaches to robust parsing, especially the TOSCA, ANLT, FIDDITCH or PLNLP systems, can be found in [13, 8, 10, 9]. (Presented at the Workshop on Robust Parsing, pages 16-25, Prague, Czech Republic, 1996.) 1.1 Tokenisation. The tokenisation uses a tokenising ....
Max Silberztein, Dictionnaires électroniques et analyse automatique de textes. Le système INTEX, Masson, Paris, 1993.
No context found.
Silberztein M. Dictionnaires électroniques et analyse automatique de textes: le système INTEX. Masson, Paris, 1993.
No context found.
M. Silberztein, Dictionnaires électroniques et analyse automatique de textes: le système INTEX, Paris, France, 1993.
No context found.
SILBERZTEIN M. (1993). Dictionnaire électronique et analyse automatique des textes. Le système INTEX. Masson, Paris.
No context found.
Silberztein Max, Dictionnaires électroniques et analyse automatique de textes - Le système INTEX, Masson, 1993.
No context found.
Max Silberztein. 1993. Dictionnaires électroniques et analyse automatique de textes: le système INTEX. Masson, Paris, France.
No context found.
Max Silberztein. 1993. Dictionnaires électroniques et analyse automatique de textes: le système INTEX.