Regular expressions for language engineering
- Natural Language Engineering
, 1996
Cited by 68 (2 self)
Many of the processing steps in natural language engineering can be performed using finite state transducers. An optimal way to create such transducers is to compile them from regular expressions. This paper is an introduction to the regular expression calculus, extended with certain operators that have proved very useful in natural language applications ranging from tokenization to light parsing. The examples in the paper illustrate in concrete detail some of these applications.
Incremental Finite-State Parsing
- In Proceedings of the Fifth Conference on Applied Natural Language Processing
, 1997
Cited by 35 (1 self)
This paper describes a new finite-state shallow parser. It merges constructive and reductionist approaches within a highly modular architecture. Syntactic information is added at the sentence level in an incremental way, depending on the contextual information available at a given stage. This approach overcomes the inefficiency of previous fully reductionist constraint-based systems, while maintaining broad coverage and linguistic granularity. The implementation relies on a sequence of networks built with the replace operator. Given the high level of modularity, the core grammar is easily augmented with corpus-specific sub-grammars. The current system is implemented for French and is being expanded to new languages.
1 Background
Previous work in finite-state parsing at sentence level falls into two categories: the constructive approach or the reductionist approach. The origins of the constructive approach go back to the parser developed by Joshi (Joshi, 1996). It is based on a lexical ...
Arabic Morphology Using Only Finite-State Operations
, 1998
Cited by 28 (2 self)
Finite-state morphology has been successful in the description and computational implementation of a wide variety of natural languages. However, the particular challenges of Arabic, and the limitations of some implementations of finite-state morphology, have led many researchers to believe that finite-state power was not sufficient to handle Arabic and other Semitic morphology. This paper illustrates how the morphotactics and the variation rules of Arabic have been described using only finite-state operations and how this approach has been implemented in a significant morphological analyzer/generator.
Finite-State Non-Concatenative Morphotactics
, 2000
Cited by 21 (3 self)
Finite-state morphology in the general tradition of the Two-Level and Xerox implementations has proved very successful in the production of robust morphological analyzer-generators, including many large-scale commercial systems. However, it has long been recognized that these implementations have serious limitations in handling non-concatenative phenomena. We describe a new technique for constructing finite-state transducers that involves reapplying the regular-expression compiler to its own output. Implemented in an algorithm called compile-replace, this technique has proved useful for handling non-concatenative phenomena; and we demonstrate it on Malay full-stem reduplication and Arabic stem interdigitation.
An Efficient Implementation of Phonological Rules using Finite-State Transducers
, 2001
Cited by 21 (3 self)
Context-dependent phonological rules are used to model the mapping from phonemes to their varied phonetic surface realizations. Others, most notably Kaplan and Kay, have described how to compile general context-dependent phonological rewrite rules into finite-state transducers. Such rules are very powerful, but their compilation is complex and can result in very large nondeterministic automata. In this paper we present a simplified rewrite rule system and a technique to efficiently compile such a system into finite-state transducers.
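As a rough illustration of what a single context-dependent rewrite rule of the form "target → replacement / left _ right" does to a string, the Python sketch below applies one such rule with regular-expression lookarounds. It is not the compilation technique of the paper: the function name `apply_rule`, its parameters, and the nasal-assimilation example are assumptions made here for illustration, and Python's `re` module is a backtracking matcher that does not build a transducer.

```python
import re


def apply_rule(text, target, replacement, left="", right=""):
    """Rewrite `target` as `replacement` wherever it occurs between the
    literal contexts `left` and `right` (either context may be empty).

    Contexts are checked against the input string, not against the
    partially rewritten output, because lookarounds consume nothing.
    """
    pattern = ""
    if left:
        # Lookbehind: the left context must immediately precede the target.
        pattern += "(?<=" + re.escape(left) + ")"
    pattern += re.escape(target)
    if right:
        # Lookahead: the right context must immediately follow the target.
        pattern += "(?=" + re.escape(right) + ")"
    # A function replacement keeps `replacement` literal (no escape handling).
    return re.sub(pattern, lambda match: replacement, text)


# Hypothetical rule: n -> m / _ p
print(apply_rule("inpossible", "n", "m", right="p"))  # impossible
print(apply_rule("intact", "n", "m", right="p"))      # intact
```

Because the contexts here are literal strings, this sketch covers only a small fragment of what general rewrite-rule formalisms allow.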
Transducers from Rewrite Rules with Backreferences
, 1999
Cited by 19 (4 self)
Context sensitive rewrite rules have been widely used in several areas of natural language processing, including syntax, morphology, phonology and speech processing.
Finite State Transducers with Predicates and Identities
- Grammars
, 2001
Cited by 18 (0 self)
An extension to finite state transducers is presented, in which atomic symbols are replaced by arbitrary predicates over symbols. The extension is motivated by applications in natural language processing (but may be more widely applicable) as well as by the observation that transducers with predicates generally have fewer states and fewer transitions. Although the extension is fairly trivial for finite state acceptors, the introduction of predicates is more interesting for transducers. It is shown how various operations on transducers (e.g. composition) can be implemented, as well as how the transducer determinization algorithm can be generalized for predicate-augmented finite state transducers.
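The idea of labelling transitions with predicates rather than atomic symbols can be sketched for the simple acceptor case, which the abstract describes as fairly trivial. The Python below is a minimal sketch under assumptions made here: the class name `PredicateAcceptor`, the state numbering, and the signed-integer example are all hypothetical, and none of the transducer operations discussed in the paper (composition, determinization) are attempted.

```python
class PredicateAcceptor:
    """Nondeterministic acceptor whose transitions carry predicates
    (functions from a symbol to bool) instead of single symbols."""

    def __init__(self, start, finals, transitions):
        self.start = start
        self.finals = set(finals)
        # Map each source state to its list of (predicate, target) pairs.
        self.transitions = {}
        for source, predicate, target in transitions:
            self.transitions.setdefault(source, []).append((predicate, target))

    def accepts(self, text):
        current = {self.start}
        for symbol in text:
            # Follow every transition whose predicate holds for this symbol.
            current = {
                target
                for state in current
                for predicate, target in self.transitions.get(state, [])
                if predicate(symbol)
            }
            if not current:
                return False
        return bool(current & self.finals)


# Optional sign followed by one or more digits. A single str.isdigit
# transition stands in for ten separate symbol-labelled transitions.
signed_integer = PredicateAcceptor(
    start=0,
    finals=[2],
    transitions=[
        (0, lambda symbol: symbol in "+-", 1),
        (0, str.isdigit, 2),
        (1, str.isdigit, 2),
        (2, str.isdigit, 2),
    ],
)

print(signed_integer.accepts("-42"))  # True
print(signed_integer.accepts("4a"))   # False
```

With symbol-labelled transitions the same language would need one arc per digit at each of the three digit transitions, which illustrates the reduction in transition count the abstract mentions.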
Multilingual Finite-State Noun Phrase Extraction
- In Proceedings of the ECAI'96
, 1996
Cited by 12 (1 self)
The paper describes a tool for noun phrase mark-up based on finite-state techniques and statistical part-of-speech disambiguation. We illustrate the proceeding by examples from realizations for seven languages (Dutch, English, French, German, Italian, Portuguese, and Spanish).
1 Introduction
For the purpose of terminology extraction from technical documents we designed a tool which applies finite-state techniques to mark potential terms, especially noun phrases corresponding to given regular patterns. The paper describes the general architecture of the tool and shows how finite-state transducers representing noun phrase patterns are used for noun phrase mark-up for a range of languages. The noun phrase extraction is the continuation of a chain of finite-state tools which include tokenizing, lexicon and guesser construction, and a statistical part-of-speech disambiguator (tagger) which uses a finite-state lexicon and guesser.
2 Architecture
The noun-phrase extraction tool consists...
Detecting Emerging Concepts in Textual Data Mining
- In Computational Information Retrieval
, 2001
Cited by 9 (4 self)
This article summarizes our research to date in the automatic identification of emerging trends in textual data. Applications are numerous: the detection of trends in warranty repair claims, for example, is of genuine interest to NCSA industrial partners Caterpillar and Boeing. Technology forecasting is another example with numerous applications of both academic and practical interest. In general, trending analysis of textual data can be performed in any domain that involves written records of human endeavors whether scientific or artistic in nature.
Light Parsing as Finite State Filtering
- In A. Kornai (Ed.), Extended Finite State Models of Language
, 1996
Cited by 5 (0 self)
For a number of language processing tasks, such as information retrieval and information extraction tasks, pertinent information can be extracted from text without doing a full parse of the individual sentences. The most common restriction of the parser is to adopt a non-recursive model of the language treated, which allows an implementation of the parser using efficient finite-state tools at the cost of missing some coverage. These light parsers allow the successive introduction of symbols into the input string wherever specified regular expressions of words and/or part-of-speech tags match. Recent advances in finite-state expression compilation make writing mark-up transducers simpler, leading to quicker implementations of layered finite-state parsers. The resulting parsers are easier to create and maintain. In this article, we describe a light parsing method using recently created finite-state operators. Two applications of this parser are described: grouping adjacent syntactically-related units, and extracting non-adjacent n-ary grammatical relations. A system for evaluating the parser over a large corpus is described.
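The mark-up step described above, inserting symbols into the input wherever a regular expression over words and part-of-speech tags matches, can be approximated in a few lines of Python. This is only a sketch under assumptions made here: the `word/TAG` input format, the tag names `DET`, `ADJ` and `NOUN`, the noun-phrase pattern, and the function name `mark_noun_phrases` are all hypothetical, and Python's `re` module stands in for the finite-state operators the article uses.

```python
import re

# Hypothetical noun-phrase pattern over "word/TAG" tokens separated by
# single spaces: optional determiner, any adjectives, one or more nouns.
NOUN_PHRASE = re.compile(
    r"(?<!\S)"            # match must start at a token boundary
    r"(?:\S+/DET\s)?"     # optional determiner
    r"(?:\S+/ADJ\s)*"     # zero or more adjectives
    r"\S+/NOUN"           # head noun
    r"(?:\s\S+/NOUN)*"    # further nouns (compounds)
    r"(?!\S)"             # match must end at a token boundary
)


def mark_noun_phrases(tagged_text):
    """Wrap each leftmost pattern match in [NP ... ] brackets,
    leaving all other tokens untouched."""
    return NOUN_PHRASE.sub(lambda match: "[NP " + match.group(0) + "]", tagged_text)


sentence = "the/DET big/ADJ dog/NOUN chased/VERB a/DET cat/NOUN"
print(mark_noun_phrases(sentence))
# [NP the/DET big/ADJ dog/NOUN] chased/VERB [NP a/DET cat/NOUN]
```

A layered parser in the sense of the abstract would apply several such mark-up passes in sequence, each one able to refer to the brackets introduced by earlier passes.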

