| Sojka P.: Notes on Compound Word Hyphenation in TeX. In: TUGboat, Vol. 16, No. 3 (1995) 290--296 |
....problem. Problems with hyphenation are often one of the most difficult. As most TeX users are perfectionists, fixing and tuning hyphenation for every presentation is a tedious, time-consuming task. We have already dealt with several issues related to hyphenation in TeX (Sojka and Ševeček, 1995; Sojka, 1995). On the basis of our being involved in typesetting tens of thousands of TeX pages of multilingual documents (mostly dictionaries) we want to point out several methods suitable for the development of hyphenation patterns. 2 Pattern generation There is no place in the world that is ....
....Figure 2: Using [...] to typeset paragraphs in which words from languages with more than 256 different characters may appear and be hyphenated in parallel. Some suggestions on handling multiple hyphenation classes were made in Sojka (1995). A prototype implementation of TeX and PATGEN has recently been done (Classen, 1998). For wider adoption of such improvements, availability of large word lists and development of new patterns is crucial. Many of the methods mentioned above could be used to develop such ....
Sojka, Petr. "Notes on Compound Word Hyphenation in TeX". TUGboat 16(3), 290-297, 1995.
....1987), Lout, QuarkXpress, 3B2 and many others today. Liang's algorithm performs well for nonflexive languages with a small number of compounds, like English, but there is still a lack of good methods for other languages, especially for flexive languages (all Slavonic languages, Dutch, German etc.). Sojka and Ševeček (Sojka 1995, Sojka and Ševeček 1995) state that in Czech, on average, 20–30 different word forms (inflexions) can be derived from one word stem. This number can be almost doubled if negatives are formed from many words (adjectives, verbs, adverbs, some nouns) by adding the prefix ne. Thus, from a 170,000 ....
....and many others today. Liang's algorithm performs well for nonflexive languages with a small number of compounds, like English, but there is still a lack of good methods for other languages, especially for flexive languages (all Slavonic languages, Dutch, German etc.). Sojka and Ševeček (Sojka 1995, Sojka and Ševeček 1995) state that in Czech, on average, 20–30 different word forms (inflexions) can be derived from one word stem. This number can be almost doubled if negatives are formed from many words (adjectives, verbs, adverbs, some nouns) by adding the prefix ne. Thus, from a 170,000 stem wordlist about ....
Sojka, P. and Ševeček, P.: 1995, Notes on Compound Word Hyphenation in TeX, TUGboat 16(3), 290--297.
....task. However, there are several pattern generation strategies that allow the choice between size-optimal or coverage-optimal patterns [27] with the patgen program [15]. A generation process can be parametrised by several parameters whose tuning strategies are beyond the scope of this paper; see [27,24] for details. Parameters could be tuned so that virtually all hyphenation points are covered, leading to about 99.9% efficiency, and size is not far from optimum. Further investigation and research is necessary to find sufficient conditions for finding optimal results. Stratification Technique ....
....of pattern generation, we were able to create patterns for German compound words with 8825 patterns (70.2 KB) with 95.28% coverage. Higher coverage is at the expense of pattern size growth. For details of hyphenation pattern generation for compound words in Czech and German using patgen, see [27,24,25]. 6 Outline of an Application to Part of Speech Tagging Two mainstream approaches are being used for the POS task: linguistic, based on hand-coded linguistic rules (constraint grammars) [9,20], and machine learning (statistical, transformation-based) approaches, based on learning the language ....
Petr Sojka. Notes on Compound Word Hyphenation in TeX. TUGboat, 16(3):290--297, 1995.
....DTP systems like troff (Emerson and Paulsell 1987), Lout, QuarkXpress, 3B2 and many others today. Liang's algorithm performs well for non-flexive languages with a small number of compounds, but there is still a lack of good methods for other languages, especially for flexive Slavonic languages. Sojka and Ševeček (Sojka 1995, Sojka and Ševeček 1995) state that on average 20–30 inflexions can be derived from one word stem by changing the suffix added, and this number can be almost doubled if negatives are formed from many words (adjectives, verbs, adverbs, some nouns) by adding the prefix ne. Thus, from a 170,000 stem ....
....and Paulsell 1987), Lout, QuarkXpress, 3B2 and many others today. Liang's algorithm performs well for non-flexive languages with a small number of compounds, but there is still a lack of good methods for other languages, especially for flexive Slavonic languages. Sojka and Ševeček (Sojka 1995, Sojka and Ševeček 1995) state that on average 20–30 inflexions can be derived from one word stem by changing the suffix added, and this number can be almost doubled if negatives are formed from many words (adjectives, verbs, adverbs, some nouns) by adding the prefix ne. Thus, from a 170,000 stem word list about 5,000,000 ....
Sojka, P. and Ševeček, P.: 1995, Notes on Compound Word Hyphenation in TeX, TUGboat 16(3), 290--297.