Allomorfessor: towards unsupervised morpheme analysis. In Evaluating systems for multilingual and multimodal information access, (2009)

by Oskar Kohonen, Sami Virpioja, Mikaela Klami
Venue: 9th Workshop of the CLEF

Results 1 - 4 of 4

Unsupervised Morpheme Discovery with Allomorfessor

by Sami Virpioja, Oskar Kohonen - In Cross Language Evaluation Forum (CLEF), 2009
Cited by 5 (0 self)
We describe Allomorfessor, which extends the unsupervised morpheme segmentation method Morfessor to account for the linguistic phenomenon of allomorphy, where one morpheme has several different surface forms. The method discovers common base forms for allomorphs from an unannotated corpus by finding small modifications, called mutations, for them. Using Maximum a Posteriori estimation, the model is able to decide the amount and types of the mutations needed for the particular language. The method is evaluated in Morpho Challenge 2009.

Citation Context

...an (habit’s). A segmentation based approach models changed stems as distinct morphemes. In Morpho Challenge 2008, we introduced an unsupervised model for morpheme segmentation and allomorphy learning [10]. In [11], some modifications to the model (now referred to as Allomorfessor Alpha) were suggested. In this paper we describe and evaluate the modified Allomorfessor model (referred to as Allomorfesso...

Enriching Morphological Lexica through Unsupervised Derivational Rule Acquisition

by Géraldine Walther, Lionel Nicolas - In WoLeR 2011 at ESSLLI (International Workshop on Lexical Resources), 2011
Cited by 2 (0 self)
In a morphological lexicon, each entry combines a lemma with a specific inflection class, often defined by a set of inflection rules. Therefore, such lexica usually give a satisfying account of inflectional operations. Derivational information, however, is usually badly covered. In this paper we introduce a novel approach for enriching morphological lexica with derivational links between entries and with new entries derived from existing ones and attested in large-scale corpora, without relying on prior knowledge of possible derivational processes. To achieve this goal, we adapt the unsupervised morphological rule acquisition tool MorphAcq (Nicolas et al., 2010) in a way allowing it to take into account an existing morphological lexicon developed in the Alexina framework (Sagot, 2010), such as the Lefff for French and the Leffe for Spanish. We apply this tool on large corpora, thus uncovering morphological rules that model derivational operations in these two lexica. We use these rules for generating derivation links between existing entries, as well as for deriving new entries from existing ones and adding those which are best attested in a large corpus. In addition to lexicon development and NLP applications that benefit from rich lexical data, such derivational information will be particularly valuable to linguists who rely on vast amounts of data to describe and analyse these specific morphological phenomena.

Citation Context

...oldsmith, 2001; Goldsmith, 2006) and Morfessor (Creutz and Lagus, 2005). Linguistica constitutes the first real attempt to use the concept of MDL (Minimum Description Length) for encoding a complete corpus w.r.t. morphemes using as few bits as possible, thus trying to achieve the best possible affix and stem recognition. In (Creutz and Lagus, 2005), the authors also use the MDL approach without restricting the analysis of a word into only one facultative prefix, only one stem and only one suffix as is the case in (Goldsmith, 2001). Morfessor has later been extended for treating allomorphisms (Kohonen et al., 2009). Later, in (Golenia et al., 2009), MDL is used to pre-select possible stems for given forms; the stems are separated from the rest and the remaining strings considered possible affixes. These possible affixes are then first broken into substrings and then re-assembled according to a metric relying on the number of these substrings’ occurrences. Spiegler et al. (2010), Bernhard (2008) and Keshava (2006) describe methods inspired by the work of Harris (1955) and extensions thereof (Hafer and Weiss, 1974; Dejean, 1998). These approaches focus on transition probabilities and letter successor var...

MORPHEME SEGMENTATION BY OPTIMIZING TWO-PART MDL CODES

by Krista Lagus, Mathias Creutz, Sami Virpioja, Oskar Kohonen
In many real-world NLP applications, a compact yet representative vocabulary is a necessary ingredient. Words are often thought of as basic units of representation. In highly-inflecting and compounding languages, words can

Citation Context

...over allomorphic variation (e.g. ’lla’ and ’llä’ are different, context-dependent surface forms of a morpheme with roughly the meaning ’on’). In an extension of Morfessor Baseline called Allomorfessor [10] this problem is considered. It is assumed that a linguistic (latent) morpheme can have orthographical variants. The variants are coded using operations (insertions, deletions and replacements) applie...
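
The excerpt above says that variants of a morpheme are coded with insertions, deletions and replacements. As a rough illustration only, the sketch below applies such operations to a base string. The tuple encoding and the function name apply_operations are assumptions made for this example; they are not the representation used by Allomorfessor.

```python
from typing import List, Tuple

# Hypothetical operation encoding (an assumption, not Allomorfessor's format):
#   ("ins", i, ch)  insert ch before index i
#   ("del", i, "")  delete the character at index i
#   ("rep", i, ch)  replace the character at index i with ch
Operation = Tuple[str, int, str]


def apply_operations(base: str, ops: List[Operation]) -> str:
    """Apply the operations in order.

    Each index refers to the string as it stands after the
    previous operations have been applied.
    """
    chars = list(base)
    for kind, i, ch in ops:
        if kind == "ins":
            chars.insert(i, ch)
        elif kind == "del":
            del chars[i]
        elif kind == "rep":
            chars[i] = ch
        else:
            raise ValueError(f"unknown operation: {kind}")
    return "".join(chars)


# The 'lla' / 'llä' pair from the excerpt differs by a single replacement.
surface = apply_operations("lla", [("rep", 2, "ä")])
```

Under this encoding, one variant is stored as a base form plus a short operation list instead of as a separate morpheme.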

DATE OF APPROVAL: 29/04/2011 Acknowledgements

by Koray Ak, 2011
I am honored to present my special thanks and deepest gratitude to my supervisor Assist. Prof. Olcay Taner YILDIZ for his guidance in this thesis. Without his endless patience and support I would not finish this work. I am feeling lucky to share his vision and knowledge throughout this thesis. I am also grateful to my family for their support. They have always been by my side whenever I needed.

Citation Context

...hip of the affixes are updated one more time by considering quality of shared root distributions. Once affix list is acquired, stemming of word is started from the longest common affix. Allomorfessor [17] is a modified version of Morfessor [2] to solve problems caused by allomorphy. MAP is done by modeling a lexicon of the words in the corpus instead of modeling the original corpus. A new notion muta...
