Regular models of phonological rule systems
- Oxford University
, 1988
Cited by 290 (4 self)
This paper presents a set of mathematical and computational tools for manipulating and reasoning about regular languages and regular relations and argues that they provide a solid basis for computational phonology. It shows in detail how this framework applies to ordered sets of context-sensitive rewriting rules and also to grammars in Koskenniemi's two-level formalism. This analysis provides a common representation of phonological constraints that supports efficient generation and recognition by a single simple interpreter.
TextTiling: Segmenting text into multi-paragraph subtopic passages
- Computational Linguistics
, 1997
Cited by 275 (1 self)
TextTiling is a technique for subdividing texts into multi-paragraph units that represent passages, or subtopics. The discourse cues for identifying major subtopic shifts are patterns of lexical co-occurrence and distribution. The algorithm is fully implemented and is shown to produce segmentation that corresponds well to human judgments of the subtopic boundaries of 12 texts. Multi-paragraph subtopic segmentation should be useful for many text analysis tasks, including information retrieval and summarization.
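The abstract names lexical co-occurrence as the cue for subtopic shifts. A minimal sketch of that idea follows, assuming whitespace tokenization, a fixed block of two sentences on each side of a gap, and cosine similarity as the comparison measure; the function names `gap_scores` and `weakest_gap` are hypothetical, and none of these choices is taken from the listing itself.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def gap_scores(sentences, block=2):
    """Score each gap between sentences by the lexical similarity
    of the `block` sentences before it and the `block` after it."""
    tokens = [s.lower().split() for s in sentences]
    scores = []
    for gap in range(1, len(tokens)):
        left = Counter(w for s in tokens[max(0, gap - block):gap] for w in s)
        right = Counter(w for s in tokens[gap:gap + block] for w in s)
        scores.append(cosine(left, right))
    return scores


def weakest_gap(sentences, block=2):
    """Return i such that the least cohesive gap falls before sentence i."""
    scores = gap_scores(sentences, block)
    return scores.index(min(scores)) + 1


sentences = [
    "the cat sat",
    "the cat slept",
    "stocks fell sharply",
    "stocks rose again",
]
boundary = weakest_gap(sentences)
```

In this toy input the two halves share no vocabulary, so the lowest score lands at the gap between the second and third sentences. A fuller segmenter would pick several boundaries from the valleys in the score sequence rather than a single minimum.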
Noun Homograph Disambiguation Using Local Context in Large Text Corpora
- University of Waterloo
, 1991
Cited by 71 (1 self)
This paper describes an accurate, relatively inexpensive method for the disambiguation of noun homographs using large text corpora. The algorithm checks the context surrounding the target noun against that of previously observed instances and chooses the sense for which the most evidence is found, where evidence consists of a set of orthographic, syntactic, and lexical features. Because the sense distinctions made are coarse, the disambiguation can be accomplished without the expense of knowledge bases or inference mechanisms. An implementation of the algorithm is described which, starting with a small set of hand-labeled instances, improves its results automatically via unsupervised training. The approach is compared to other attempts at homograph disambiguation using both machine-readable dictionaries and unrestricted text, and the use of training instances is determined to be a crucial difference.
1 Introduction
Large text corpora and the computational resources to handle them have ...
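The evidence-counting step described in this abstract can be illustrated with a small sketch. It assumes the only features are the lowercase words in the surrounding context, which is a simplification of the orthographic, syntactic, and lexical features the abstract mentions; the names `train` and `disambiguate` and the sample sentences are hypothetical.

```python
from collections import Counter, defaultdict


def train(labeled):
    """Collect, per sense, how often each context word was seen
    in the hand-labeled instances."""
    evidence = defaultdict(Counter)
    for context, sense in labeled:
        evidence[sense].update(set(context.lower().split()))
    return evidence


def disambiguate(context, evidence):
    """Choose the sense whose stored evidence overlaps most
    with the words of the new context."""
    words = set(context.lower().split())
    scores = {
        sense: sum(counts[w] for w in words)
        for sense, counts in evidence.items()
    }
    return max(scores, key=scores.get)


labeled = [
    ("deposit money at the bank", "finance"),
    ("the river bank was muddy", "river"),
]
evidence = train(labeled)
sense = disambiguate("she sat on the muddy river bank", evidence)
```

The unsupervised improvement mentioned in the abstract could be approximated by feeding confidently labeled new contexts back into `train`, though that loop is not shown here.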
Two-Level Morphology with Composition
- In Proceedings of the 14th International Conference on Computational Linguistics (COLING'92)
, 1992
Cited by 68 (7 self)
this paper are the following: (1) Lexical representations tend to be arbitrary. Because it is difficult to write and test two-level systems that map between pairs of radically dissimilar forms, lexical representations in existing two-level analyzers tend to stay close to the surface forms. This is not a problem for morphologically simple languages like English because, for most words, inflected forms are very similar to the canonical dictionary entry. Except for a small number of irregular verbs and nouns, it is not difficult to create a two-level description for English in which lexical forms coincide with the canonical citation forms found in a dictionary. However, current analyzers for morphologically more complex languages (Finnish and Russian, for example) are not as satisfying in this respect. In these systems, lexical forms typically contain diacritic markers and special symbols; they are not real words in the language. For example, in Finnish the lexical counterpart of otin `I took' might be rendered as
The Proper Treatment of Optimality in Computational Phonology
- Bilkent University
, 1998
Cited by 44 (5 self)
This paper presents a novel formalization of optimality theory. Unlike previous treatments of optimality in computational linguistics, starting with Ellison (1994), the new approach does not require any explicit marking and counting of constraint violations. It is based on the notion of "lenient composition", defined as the combination of ordinary composition and priority union. If an underlying form has outputs that can meet a given constraint, lenient composition enforces the constraint; if none of the output candidates meets the constraint, lenient composition allows all of them. For the sake of greater efficiency, we may "leniently compose" the GEN relation and all the constraints into a single finite-state transducer that maps each underlying form directly into its optimal surface realizations, and vice versa. Seen from this perspective, optimality theory is surprisingly similar to the two older strains of finite-state phonology: classical rewrite systems and two-level models. In particular, the ranking of optimality constraints corresponds to the ordering of rewrite rules.
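The definition of lenient composition given in this abstract can be tried out on small finite relations. The sketch below is not a finite-state transducer implementation: it assumes a relation is a plain dictionary from each underlying form to its set of candidate outputs and a constraint is a Boolean predicate on outputs. The candidate sets and the `ends_in_vowel` constraint are invented for illustration.

```python
from typing import Callable, Dict, Set

Relation = Dict[str, Set[str]]  # underlying form -> candidate outputs


def compose(rel: Relation, constraint: Callable[[str], bool]) -> Relation:
    """Ordinary composition with a constraint: keep only the outputs
    that satisfy it, dropping inputs left with no output."""
    result = {}
    for form, outputs in rel.items():
        kept = {o for o in outputs if constraint(o)}
        if kept:
            result[form] = kept
    return result


def priority_union(q: Relation, r: Relation) -> Relation:
    """Take every mapping in q, plus the mappings of r for inputs
    that q does not cover."""
    result = {form: set(outputs) for form, outputs in q.items()}
    for form, outputs in r.items():
        if form not in result:
            result[form] = set(outputs)
    return result


def lenient_compose(rel: Relation, constraint: Callable[[str], bool]) -> Relation:
    """Enforce the constraint where some candidate can meet it;
    otherwise let all candidates through."""
    return priority_union(compose(rel, constraint), rel)


def ends_in_vowel(form: str) -> bool:
    return form[-1] in "aeiou"


# Hypothetical candidate sets for two underlying forms.
candidates = {"bat": {"bat", "ba"}, "kt": {"kt", "k"}}
result = lenient_compose(candidates, ends_in_vowel)
```

For "bat" one candidate meets the constraint, so only that candidate survives; for "kt" none does, so both candidates are passed on unchanged. Applying `lenient_compose` repeatedly, highest-ranked constraint first, mirrors the ranking behaviour the abstract describes.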
Finite-state Constraints
, 1993
Cited by 38 (8 self)
This paper is a report on the application of finite-state methods to phonological and morphological analysis that has brought about spectacular progress in computational morphology over the last several years. We will review the fundamental theoretical work that underlies this progress and discuss its relevance for linguistics. The two central problems in morphology are word formation and morphological alternations.
An Object-Oriented Architecture for Text Retrieval
- In Conference Proceedings of RIAO'91, Intelligent Text and Image Handling
, 1991
Cited by 35 (10 self)
For almost all aspects of information access systems, the optimal composition and functionality are still hotly debated. Moreover, different application scenarios put different demands on individual components. It is therefore essential to be able to quickly build systems that permit exploration of different designs and implementation strategies. This paper presents a software implementation architecture for text retrieval systems that facilitates (a) functional modularization, (b) mix-and-match combination of module implementations, and (c) definition of inter-module protocols. We show how an object-oriented approach easily accommodates this type of architecture. The design principles are exemplified by code examples in Common Lisp. Taken together, these code examples constitute an operational retrieval system. The design principles and protocols implemented have also been instantiated in a large-scale retrieval prototype in our research laboratory.
Directed Replacement
, 1996
Cited by 35 (9 self)
This paper introduces to the finite-state calculus a family of directed replace operators. In contrast
Arabic Morphological Analysis on the Internet
- Proceedings of the International Conference on Multi-Lingual Computing (Arabic
, 1998
Cited by 18 (3 self)
[Arabic, Morphology, Finite State]
This paper describes a finite-state morphological analyzer of written Modern Standard Arabic words that is available for testing on the Internet at http://www.xrce.xerox.com/research/mltt/arabic. The system consists of the analyzer proper, running on a network server, and Java applets that run on the user's machine and render words in standard Arabic orthography both for input and output. An overview of the system is provided, including the history, finite-state technology, dictionary coverage and status.
1 Introduction
In 1996, the Xerox Research Centre Europe produced a large morphological analyzer for Modern Standard Arabic, henceforth Arabic (Beesley, 1996). In 1997, the rules were rewritten to more reliably support generation, and a Java user interface was added to allow users to interact with the system via the Internet in standard Arabic orthography. The analyzer-generator is based on dictionaries from an earlier project at ALPNET (Beesley,...

