Improving Successor Variety for Morphological Segmentation (2010)
Venue: Computational Linguistics in the Netherlands 2010: Selected Papers from the Twentieth CLIN Meeting
BibTeX
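@INPROCEEDINGS{Çöltekin10improvingsuccessor,
author = {Çağrı Çöltekin},
title = {Improving Successor Variety for Morphological Segmentation},
booktitle = {Computational Linguistics in the Netherlands 2010: Selected Papers from the Twentieth CLIN Meeting},
year = {2010}
}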
Abstract
Successor variety is a commonly used measure for segmentation in language processing. It is based on the simple idea that a large variety of letters (or phonemes) following an initial segment of a word (or utterance) indicates a possible boundary. The idea dates back to Harris (1955), and several methods based on successor variety have been used in the literature, particularly for segmenting words into morphemes. However, few studies have analyzed the measure itself. Even though the idea is simple and effective, current uses in the literature do not exploit the measure to its full extent because of a number of problems with the successor variety scores. This paper addresses these problems by introducing a normalization method and, through segmentation experiments on two typologically different languages, demonstrates the effectiveness of this improvement on the morphological segmentation task.
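The basic (unnormalized) measure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's method: the end-of-word marker `#`, the toy lexicon, and the helper names `successor_variety` and `segment` are hypothetical, and the cutting rule (cut wherever the score is a strict local peak) is only one simple rule in the spirit of Harris (1955). The normalization proposed in the paper is not described in this abstract and is not reproduced here.

```python
from collections import defaultdict

END = "#"  # hypothetical end-of-word marker; counted as a possible successor


def successor_variety(lexicon):
    """Map each prefix occurring in `lexicon` to its successor variety:
    the number of distinct symbols that follow that prefix in some word."""
    successors = defaultdict(set)
    for word in lexicon:
        marked = word + END
        # Prefixes of length 0..len(word); marked[i] is the symbol after prefix i.
        for i in range(len(word) + 1):
            successors[marked[:i]].add(marked[i])
    return {prefix: len(symbols) for prefix, symbols in successors.items()}


def segment(word, sv):
    """Cut `word` after every proper prefix whose successor variety is a strict
    local peak (assumed rule; the raw scores are used without normalization)."""
    scores = [sv.get(word[:i], 0) for i in range(len(word) + 1)]
    pieces, start = [], 0
    for i in range(1, len(word)):
        if scores[i] > scores[i - 1] and scores[i] > scores[i + 1]:
            pieces.append(word[start:i])
            start = i
    pieces.append(word[start:])
    return pieces


if __name__ == "__main__":
    lexicon = ["walk", "walks", "walked", "walking", "talk", "talks", "talked"]
    sv = successor_variety(lexicon)
    for w in ["walked", "talks", "walking"]:
        print(w, segment(w, sv))
    # walked ['walk', 'ed']
    # talks ['talk', 's']
    # walking ['walk', 'ing']
```

Even in this toy lexicon the raw counts depend on how many words share a prefix: `walk` scores 4 and `talk` scores 3 only because more forms of *walk* are listed, although both stems are followed by the same kind of boundary.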