Unsupervised Learning of Derivational Morphology From Inflectional Lexicons
- UNIVERSITY OF MARYLAND, 1999
Cited by 39 (0 self)
We present in this paper an unsupervised method to learn suffixes and suffixation operations from an inflectional lexicon of a language. The elements acquired with our method are used to build stemming procedures and can assist lexicographers in the development of new lexical resources.
Guessing Morphology from Terms and Corpora
- Proceedings of SIGIR 97
Cited by 38 (2 self)
This study proposes an algorithm for automatically acquiring morphological links between words. This algorithm relies on the concurrent use of a corpus and a list of multi-word terms, and does not require any prior linguistic knowledge. The four steps of the algorithm are (1) single-word truncation, (2) conflation of multi-word terms, (3) classification and filtering, and (4) clustering of conflation classes. At each step a precise evaluation is performed in order to choose the optimal parameters. The final results indicate a clustering of 45% of the classes with a precision of 87%. The derivational knowledge acquired through this method can be used for conceiving a domain-oriented stemmer for scientific and technical corpora. In Proceedings, 20th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'97), Philadelphia, PA. 27-31 July 1997.
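As a rough illustration of steps (1) and (2) only, the sketch below truncates each word to a fixed-length prefix and groups multi-word terms whose truncated words match position by position. The function names, the fixed cutoff `length`, and the position-wise matching are assumptions made for this example; the abstract does not specify how truncation or conflation is actually carried out, and steps (3) and (4) are not covered.

```python
from collections import defaultdict


def truncate(word, length=4):
    """Keep only the first `length` characters (hypothetical fixed cutoff)."""
    return word.lower()[:length]


def conflate_terms(terms, length=4):
    """Group multi-word terms whose words share the same truncated forms.

    Terms are compared word by word, in order (an assumption of this sketch).
    Only classes containing at least two distinct terms are returned.
    """
    classes = defaultdict(list)
    for term in terms:
        key = tuple(truncate(w, length) for w in term.split())
        classes[key].append(term)
    return {k: v for k, v in classes.items() if len(set(v)) > 1}


terms = ["gene expression", "genes expressed", "cell wall"]
result = conflate_terms(terms, length=4)
```

Here "gene expression" and "genes expressed" fall into one candidate class because both reduce to the pair of prefixes ("gene", "expr"), while "cell wall" stays unconflated. A short cutoff conflates more aggressively and produces more false links, which is presumably why the abstract stresses evaluating the parameters at each step.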
NLP for Term Variant Extraction: Synergy between Morphology, Lexicon, and Syntax
- Christian Jacquemin and Evelyne Tzoukermann, 1999
Cited by 22 (1 self)
We present a natural language processing (NLP) approach to automatic indexing over controlled vocabulary which accounts for term variation. The approach combines a part of speech tagger, a generator of morphologically related forms, and a shallow transformational parser. The system is applied to the French language; it is trained on newspaper articles and tested on scientific literature. Precision rate of indexing on term and variants is 97.2%. It is only slightly lower than indexing without accounting for term variation (99.7%). Recall rate of indexing on term and variants (93.4%) is much higher than recall of indexing on term occurrences only (72.4%). Conflation of term variants increases indexing coverage up to 30%. The system is a convincing example of the potential synergy between full-fledged morphological analysis and local syntactic analysis. Many details are provided on the implementation of the system. Illustrative examples of syntactic transformations for the French language are given together with the theoretical and empirical methods for their formulation.
Effective Use of Natural Language Processing Techniques for Automatic Conflation of Multi-Word Terms: The Role of Derivational Morphology, Part of Speech Tagging, and Shallow Parsing
- In Research and Development in Information Retrieval
Cited by 20 (3 self)
We present a corpus-based system to expand multi-word index terms using a part-of-speech tagger and a full-fledged derivational morphological system, combined with a shallow parser. The system has been applied to French. The unique contribution of the research is in using these linguistically based tools with safety filters in order to avoid the problems of degradation typically associated with derivational analysis and generation. The successful expansion, and thus conflation, of terms increases indexing coverage up to 30% with precision of nearly 90% for correct identification of related terms. The fully implemented system is described with particular attention to the role of derivational morphology and phrasal relations. Results and evaluation are presented in terms of precision and recall, with an analysis and discussion of errors. This paper illustrates how natural language processing tools, when combined effectively for tasks to which they are especially suited, indicate the pote...
Automatic language-specific stemming in information retrieval
- In Cross-language information retrieval and evaluation: Proceedings of the CLEF 2000 workshop, 2001
Cited by 7 (0 self)
We employ Automorphology, an MDL-based algorithm that determines the suffixes present in a language-sample with no prior knowledge of the language in question, and describe our experiments on the usefulness of this approach for Information Retrieval, employing this stemmer in a SMART-based IR engine.
STRUCTURES AND DISTRIBUTIONS IN MORPHOLOGY LEARNING
, 2008
Cited by 4 (3 self)
One of the great challenges in linguistics and cognitive science is to understand the nature of the mental representation of language. The precise mechanisms of the mind are unknown, but can be modeled through observation and experimentation. By viewing the mind as a computational device that receives input (primary linguistic data) and produces output (the development of grammatical speech) during language acquisition, one can reason about what representations and algorithms must be internal to the learner. In this thesis, I investigate the acquisition of morphology. The principal challenges are how to learn a theory in the presence of sparse data, and in a manner that can provide explanations for the developmental processes in child language acquisition. The main idea underlying this work is that a consideration of the different aspects of language acquisition places strong constraints on cognitively plausible representations and algorithms that are internal to the learner. To develop a model of morphology acquisition, I pursue three lines of work: First, I formulate a cognitively-oriented computational framework for studying language acquisition that consists of four components: the linguistic representation, the
Evaluation of a Dutch stemming algorithm
- The New Review of Document and Text Management, 1995
Cited by 3 (1 self)
This paper describes the development and evaluation of a suffix stripper for Dutch. We have chosen to modify the stemming algorithm developed by Porter (1980) because it is well known and is frequently used in experimental IR systems.
2 Suffix stripping
The core of every suffix stripper is a set of rules which first test whether a word ends with a certain character sequence and subsequently delete this sequence. However, some strippers are a bit more sophisticated than others. Instead of deleting a suffix, they might replace it by another (shorter) suffix or modify the stem itself. Harman (1991) compared three well-known stemming algorithms for English:
- S-stemmer: a simple stemmer removing the plural s
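The rule mechanism described in this excerpt (test whether a word ends with a character sequence, then delete or replace that sequence) can be sketched as follows. This is a minimal illustration and not the Dutch stemmer from the paper, Porter's algorithm, or Harman's S-stemmer: the rule list `RULES`, the `min_stem` guard, and the function name are all hypothetical choices made for the example, using English endings for readability.

```python
# Each rule is (suffix, replacement); rules are tried in order and the first
# matching suffix wins. These rules are illustrative only (hypothetical).
RULES = [
    ("sses", "ss"),  # replace a suffix by a shorter one
    ("ies", "y"),    # replace a suffix by a different ending
    ("ss", "ss"),    # protect words ending in "ss" from the plain "s" rule
    ("s", ""),       # delete the suffix
]


def strip_suffix(word, rules=RULES, min_stem=2):
    """Apply the first matching rule, leaving very short stems untouched."""
    for suffix, replacement in rules:
        if word.endswith(suffix):
            stem = word[: len(word) - len(suffix)]
            if len(stem) >= min_stem:
                return stem + replacement
            return word  # stem too short: keep the word as it is
    return word


examples = {w: strip_suffix(w) for w in ["classes", "ponies", "glass", "cats", "is"]}
```

Ordering the rules from longest to shortest suffix keeps the general "s" rule from firing on words that a more specific rule should handle, and the minimum stem length stops short function words such as "is" from being reduced to a single letter.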
Is 1 Noun Worth 2 Adjectives? Measuring Relative Feature Utility
, 2006
Cited by 3 (3 self)
Are two adjectives worth the same as a single noun when documents are ordered based on decreasing topicality? We propose an easy-to-interpret, single-number Relative Feature Utility (RFU) measure of the relative worth of using specific linguistic or non-linguistic features or sets of features in computational systems that order or filter media, such as information retrieval and classification systems. This measure allows one to make easily interpreted claims about the relative utility of features such as parts-of-speech, term suffixes, phrases vs. single terms, annotations, hyperlinks, citations, index terms, and metadata when ordering natural language text or other media. Data is provided for the RFU for stemming characteristics, part-of-speech tags, and phrase lengths, as well as retrieval characteristics and procedures. Using this linear measure of the relative utility of features makes available a wide range of cost-benefit analyses and decision-theoretic techniques, allowing the study of whether or not to use many different kinds of representational information or tagging systems, and supporting the design of indexing and metadata systems. Some characteristics of natural languages used in the spectrum from softer to harder sciences, as well as medical terminology, are studied.
A Study of Stemming Effects on Information Retrieval in Bahasa Indonesia
, 2003
Stemming is a process which provides a mapping of different morphological variants of words into their...

