Antal van den Bosch and Walter Daelemans. 1993. Data-oriented methods for grapheme-to-phoneme conversion. In Sixth Conference of the European Chapter of the Association for Computational Linguistics (EACL'93), pages 45--53.
.... of the pronunciation of the morphemes, because the pronunciation of morphemes can be modified in certain contexts, the text-to-speech system also has to be provided with phonological rules which adjust the pronunciation of morphemes according to their context [Allen et al., 1987; Nunn and van Heuven, 1993]. Dutch phonological rules depend in several ways on morphemic segmentation and word class assignment. As shown in (2a), for example, the grapheme d is pronounced voiceless when it occurs stem-finally, but voiced when it occurs stem-initially. Final devoicing, the phonological rule ....
....onrecht: on+recht, N. So, to produce high-quality speech from unrestricted text, the text-to-speech system SPRAAKMAKER contains the morpheme-lexicon-based morphological parser MORPA, which recovers the morphemic segmentation and word class of the input word. The module MORPHON [Nunn and van Heuven, 1993] applies phonological rules which derive the pronunciation of the word from the morphological information. The word class provided by MORPA also feeds the sentence-analysis module that serves sentence prosody [Dirksen and Quené, 1993]. Our method of morphological analysis ....
A. van den Bosch and W. Daelemans. Data-oriented methods for grapheme-to-phoneme conversion. In Proceedings of the Sixth Conference of the European Chapter of the Association for Computational Linguistics, 1993.
.... (Daelemans et al., submitted; Cardie, 1994; Cardie, 1993a), for semantic interpretation (e.g., concept extraction (Cardie, 1994; Cardie, 1993a)), and for a number of low-level language acquisition tasks, including stress acquisition (Daelemans et al., 1994) and grapheme-to-phoneme conversion (Bosch and Daelemans, 1993). In the sections below, we first present empirical evidence that the success of case-based learning methods for natural language processing tasks depends to a large degree on the feature set used to describe the training instances. Next, we present a technique for automating feature set selection ....
A. van den Bosch and W. Daelemans. 1993. Data-oriented methods for grapheme-to-phoneme conversion. In Proceedings of European Chapter of ACL, pages 45-53, Utrecht. Also available as ITK Research Report 42.
....and compared with NETtalk and other commercial systems (from Bellcore [77], Bell Labs [16], and DEC [17]). ANAPRON performed significantly better than NETtalk in this task, yielding a word accuracy of 86%, which is very close to the performance of the commercial systems. Van den Bosch et al. [88] experimented with two data-oriented methods for grapheme-to-phoneme conversion in Dutch. The first variant, known as instance-based learning (IBL), is a form of case-based reasoning. During training it constructs records of letters surrounded by different graphemic windows, the corresponding ....
van den Bosch, A. and W. Daelemans, "Data-Oriented Methods for Grapheme-to-Phoneme Conversion," Proc. Sixth European ACL, pp. 45-53, 1993.
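The survey snippet above describes instance-based learning that stores records of letters surrounded by graphemic windows, each paired with a phonemic class. The following is a minimal sketch of that windowing step under stated assumptions, not the authors' code:

- spelling and transcription are pre-aligned one letter to one phoneme symbol;
- `-` marks a letter that maps to no phoneme;
- `_` pads the word edges;
- the window of three letters on each side and the function name `windows` are hypothetical choices.

```python
# Minimal sketch of windowing for letter-based grapheme-to-phoneme conversion.
# ASSUMPTIONS (illustrative, not from the cited paper): spelling and
# transcription are pre-aligned 1:1, "-" marks a letter with no phoneme,
# "_" pads the word edges, and the window has 3 letters on each side.

PAD = "_"

def windows(word: str, phonemes: list, context: int = 3) -> list:
    """Turn one aligned word into (feature tuple, class) training instances."""
    if len(word) != len(phonemes):
        raise ValueError("spelling and transcription must be aligned 1:1")
    padded = PAD * context + word + PAD * context
    instances = []
    for i, target in enumerate(phonemes):
        # The focus letter sits at padded[i + context]; take `context`
        # letters on each side of it.
        features = tuple(padded[i : i + 2 * context + 1])
        instances.append((features, target))
    return instances

# Hypothetical aligned example: Dutch "boek" (book), where "oe" -> /u/ plus a silent slot.
for feats, cls in windows("boek", ["b", "u", "-", "k"]):
    print("".join(feats), "->", cls)
```

Each word thus yields one fixed-length instance per letter. This turns conversion into a classification problem that any nearest-neighbour or decision-tree learner can handle.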
....P(c_j | v_j)| (4). When two values share more classes, they are more similar, as δ decreases. Memory-based learning has fruitfully been applied to natural language processing, yielding state-of-the-art performance on all levels of linguistic analysis, including grapheme-to-phoneme conversion (van den Bosch and Daelemans, 1993), PoS tagging (Daelemans et al., 1996), and shallow parsing (Cardie et al., 1999). In this study, the following memory-based models are used, all available from the TiMBL package (Daelemans et al., 1999). IB1-IG is a k-nearest distance classifier which employs a weighted overlap metric: Δ(I_i ....
A. van den Bosch and W. Daelemans. 1993. Data-oriented methods for grapheme-to-phoneme conversion. Proceedings of the 6th Conference of the EACL.
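The snippet above refers to a value-difference metric (its equation (4)), under which two feature values are more similar the more their class distributions overlap. The sketch below assumes this is the standard Modified Value Difference Metric, δ(v1, v2) = Σ_j |P(C_j | v1) - P(C_j | v2)|, with probabilities estimated as raw relative frequencies and no smoothing. The function names are hypothetical, and the code is an illustration rather than TiMBL's implementation.

```python
# Minimal sketch of the Modified Value Difference Metric (MVDM) between two
# values of one feature:  delta(v1, v2) = sum_j |P(C_j | v1) - P(C_j | v2)|.
# ASSUMPTION: probabilities are plain relative frequencies from the training
# data, with no smoothing (an illustrative choice).
from collections import Counter, defaultdict

def class_distributions(values, labels):
    """Map each feature value v to its conditional class distribution P(C | v)."""
    counts = defaultdict(Counter)
    for v, c in zip(values, labels):
        counts[v][c] += 1
    return {v: {c: n / sum(cnt.values()) for c, n in cnt.items()}
            for v, cnt in counts.items()}

def mvdm(v1, v2, dists, classes):
    """Smaller delta means the two values co-occur with more similar classes."""
    p1, p2 = dists[v1], dists[v2]
    return sum(abs(p1.get(c, 0.0) - p2.get(c, 0.0)) for c in classes)

values = ["a", "a", "b", "b", "c"]
labels = ["x", "y", "x", "y", "z"]
d = class_distributions(values, labels)
print(mvdm("a", "b", d, set(labels)))  # 0.0: identical class distributions
print(mvdm("a", "c", d, set(labels)))  # 2.0: no classes shared
```

Unlike plain overlap, which treats every mismatch as equally distant, this metric lets two different letters count as close when they behave alike with respect to the output classes.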
....(approximately 200-word) test set, Stanfill found a best accuracy of 88% of frames correct with a 132,072-frame (about 25,000-word) training dictionary. In subsequent work, Stanfill (1988) presents a rule induction methodology within the framework of the MBR paradigm. More recently, van den Bosch and Daelemans (1993) have described a very similar approach to memory-based reasoning, which they describe as a link between straightforward lexical lookup and similarity-based reasoning. The method takes a pronouncing dictionary as the training set but solves the problem of lacking generalisation power and ....
....to a score of about 93% of symbols correct, comparable to PbA. Hence, we feel justified in claiming that pronunciation by analogy works at least as well as the best letter-to-sound rules, with a tiny fraction of the development effort. This is consistent with the earlier finding of van den Bosch and Daelemans (1993) that previous knowledge-based approaches were overkill (at least for Dutch). It seems that rule-based systems only achieve acceptable performance because they are used in conjunction with a pronouncing dictionary (which does most of the work). Of course, PbA could be deployed together with ....
Bosch, A. van den and Daelemans, W. (1993). Data-oriented methods for grapheme-to-phoneme conversion. Proceedings of 6th Conference of the European Chapter of the Association for Computational Linguistics, Utrecht, Netherlands, pp. 45--53.
....has focused on learning with skewed class distributions (Fawcett, 1996). This paper addresses the problem of learning with skewed class distributions within the case-based learning (CBL) framework. We first present, as a baseline, an information-gain-weighted case-based learning algorithm, IG-CBL (Bosch and Daelemans, 1993; Cardie, 1993b). IG-CBL is a weighted k-nearest-neighbor (k-NN) algorithm that derives feature weights from the information gain of the associated feature across the training data. We then apply the algorithm to three problems from NLP with skewed class distributions: part-of-speech tagging, ....
....described in Cardie (1993b) and briefly below. The goal in this step is to prune features from the representation so that the CBL algorithm can ignore them entirely. Feature weighting. IG-CBL then assigns each remaining feature a weight according to its information gain across the training cases (Bosch and Daelemans, 1993; Daelemans et al., 1997). The intent here is to weight each feature relative to its overall importance in the data set. Training phase. There are three steps to the IG-CBL training phase: 1. Create the case base. For this, we simply store all of the training instances in a flat case base. 2. Use ....
Bosch, A. van den and Daelemans, W. 1993. Data-oriented methods for grapheme-to-phoneme conversion. In Proceedings of the Sixth Conference of the EACL, Utrecht. 45--53. Also available as ITK Research Report 42.
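The snippets above describe IG-CBL and IB1-IG as nearest-neighbour classifiers whose features are weighted by their information gain over the training data. A minimal sketch of that idea follows, under these assumptions:

- features are purely symbolic;
- distance is a weighted overlap, i.e. the sum of the weights of the mismatching features;
- the k nearest cases decide by plain majority vote.

The function names are hypothetical, and the feature pruning step mentioned for IG-CBL is not shown.

```python
# Minimal sketch of information-gain feature weighting with a weighted-overlap
# k-nearest-neighbour classifier, in the spirit of IB1-IG / IG-CBL.
# ASSUMPTIONS: symbolic features only, no feature pruning, and a vote tie goes
# to the class of the nearer neighbour (Counter keeps first-seen order).
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(instances, labels, f):
    """H(C) minus the expected entropy of C after splitting on feature f."""
    by_value = {}
    for x, c in zip(instances, labels):
        by_value.setdefault(x[f], []).append(c)
    remainder = sum(len(cs) / len(labels) * entropy(cs) for cs in by_value.values())
    return entropy(labels) - remainder

def knn_classify(query, instances, labels, weights, k=1):
    """Distance = sum of the weights of the features on which the cases differ."""
    def distance(x):
        return sum(w for w, a, b in zip(weights, x, query) if a != b)
    ranked = sorted(zip(instances, labels), key=lambda pair: distance(pair[0]))
    votes = Counter(c for _, c in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy data: feature 0 determines the class, feature 1 is noise.
X = [("a", "p"), ("a", "q"), ("b", "p"), ("b", "q")]
y = ["x", "x", "y", "y"]
weights = [information_gain(X, y, f) for f in range(2)]
print(weights)                                   # [1.0, 0.0]
print(knn_classify(("b", "z"), X, y, weights))   # y
```

With uniform weights, the query above would be equally distant from every stored case. The information-gain weights make the mismatch on the noisy feature cost nothing.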
.... weights, classic approximations make use of conditional probabilities (Creecy et al. [37]), class projections (Stanfill and Waltz [38]; Howe and Cardie [39]), mutual information (Wettschereck and Dietterich [40], in a nearest-hyperrectangle approach), or information gain (van den Bosch and Daelemans [41]; Cardie and Howe [42] first build a decision tree to select features and then weight each feature according to its information gain). VI. Summary and future work. Once the feature weighting problem for nearest-neighbor classifiers has been stated as a search problem, GAs, due to their attractive ....
A. van den Bosch, W. Daelemans, Data-oriented methods for grapheme-to-phoneme conversion, Technical Report no. 42, Tilburg University, Institute for Language Technology and Artificial Intelligence, Tilburg, The Netherlands, 1993.
.... conversion (identify the pronunciation of words), stress assignment (identify the stress pattern of words), morphology (both synthesis and analysis), and tagging (identify for each word in a text its morphosyntactic category). See Daelemans (1995) for a discussion of the general approach, and van den Bosch & Daelemans, 1992, 1993; Daelemans & van den Bosch, 1992ab, 1993, 1994; van den Bosch et al., 1996; Daelemans et al., 1994, 1995, 1996abc for the details. There are some general trends which become clear when analysing the results of all these experiments. First, the most striking result is that the accuracy of the induced systems is always comparable and often ....
Van den Bosch, A. and Daelemans, W. (1993). `Data-oriented methods for grapheme-to-phoneme conversion.' Proceedings of the Sixth conference of the European chapter of the ACL, ACL, 45-53.
.... demonstrated the application of the memory-based (lazy) learning approach to several linguistic problems, e.g., segmentation as in hyphenation and syllabification (Daelemans & Van den Bosch, 1992; Van den Bosch et al., 1995), identification as in grapheme-phoneme conversion (Weijters, 1991; Van den Bosch & Daelemans, 1993; Daelemans & Van den Bosch, 1994), and stress assignment (Daelemans et al., 1994). In most cases, the memory-based (lazy) approach outdid the more eager inductive algorithms. We believe that in a noisy domain such as natural language, abstracting from the training instances is a bad idea because ....
....material. The idea is that it is not necessary to fully store an instance as a path when only a few feature values of the instance make the instance's classification unique. In applications to linguistic tasks, igtree is shown to obtain compression factors of 90% or more as compared to ib1 and ib1-ig (Van den Bosch & Daelemans, 1993; Daelemans & Van den Bosch, 1994). igtree also stores with each non-terminal node information concerning the most probable or default classification given the path thus far, according to the classification bookkeeping information maintained by the trie construction algorithm. This extra ....
Van den Bosch, A., Daelemans, W. (1993). Data-oriented methods for grapheme-to-phoneme conversion. In Proceedings of the 6th Conference of the EACL, 45--53. Utrecht: OTS.
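The IGTree snippets describe compressing the instance base into a trie in which a path stops as soon as its feature values make the classification unambiguous, with a default (most probable) class stored at every node. The following is a simplified sketch of that idea, not the published algorithm. It assumes the caller supplies the feature order (by decreasing information gain), it omits any further pruning the real implementation may perform, and all names are hypothetical.

```python
# Simplified sketch of IGTree-style compression of an instance base into a trie.
# ASSUMPTIONS: `order` lists feature indices by decreasing information gain and
# is supplied by the caller; ties for the default class go to the class seen first.
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Node:
    default: str                                  # most frequent class at this node
    children: dict = field(default_factory=dict)  # feature value -> Node

def build(instances, labels, order, depth=0):
    node = Node(default=Counter(labels).most_common(1)[0][0])
    # Stop expanding once the path is unambiguous (or features run out):
    # the remaining feature values of these instances need not be stored.
    if len(set(labels)) == 1 or depth == len(order):
        return node
    f = order[depth]
    groups = {}
    for x, c in zip(instances, labels):
        xs, cs = groups.setdefault(x[f], ([], []))
        xs.append(x)
        cs.append(c)
    for value, (xs, cs) in groups.items():
        node.children[value] = build(xs, cs, order, depth + 1)
    return node

def classify(node, query, order, depth=0):
    # Follow matching feature values; fall back on the node's default class
    # when the value was never seen there or the node is a leaf.
    while depth < len(order):
        child = node.children.get(query[order[depth]])
        if child is None:
            return node.default
        node, depth = child, depth + 1
    return node.default

def count_nodes(node):
    return 1 + sum(count_nodes(c) for c in node.children.values())

X = [("a", "p"), ("a", "q"), ("b", "p"), ("b", "q")]
y = ["x", "y", "z", "z"]
order = [0, 1]  # feature 0 has the higher gain on this toy data (1.0 vs 0.5 bits)
tree = build(X, y, order)
print(count_nodes(tree))                  # 5 nodes for 4 instances x 2 features
print(classify(tree, ("b", "r"), order))  # z: the path stops at an unambiguous node
print(classify(tree, ("a", "q"), order))  # y
```

The node count shows the compression at work on a small scale: instances whose first, most informative feature already settles the class are stored as a single short path.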
....it can be substantially faster to turn it on. .... make the difference between hours and seconds of computation. 4.4 IGTree Using Information Gain rather than unweighted Overlap distance to define similarity in ib1 improves its performance on several nlp tasks [17, 27, 26]. The positive effect of Information Gain on performance prompted us to develop an alternative approach in which the instance memory is restructured in such a way that it contains the same information as before, but in a compressed decision tree structure. We call this algorithm igtree [14] see ....
....a historical overview of our own work with the application of mbl-type algorithms to NLP tasks. The ib1-ig algorithm was first introduced in [17] in the context of a comparison of memory-based approaches with backprop learning for a hyphenation task. Predecessor versions of igtree can be found in [12, 27], where they are applied to grapheme-to-phoneme conversion. See [14] for a detailed description and review of the algorithms. A recent development, now implemented in the TiMBL package, is tribl [15]. The memory-based algorithms implemented in the TiMBL package have been successfully applied to a ....
A. Van den Bosch and W. Daelemans. Data-oriented methods for grapheme-to-phoneme conversion. In Proceedings of the 6th Conference of the EACL, pages 45--53, 1993.
.... the pronunciation of words), stress assignment (identify the stress pattern of words), morphology (both synthesis and analysis), and morphosyntactic disambiguation (identify for each word in a text its morphosyntactic category). See Daelemans (1995) for a discussion of the general approach, and van den Bosch & Daelemans, 1992, 1993; Daelemans & van den Bosch, 1992ab, 1993, 1994; van den Bosch et al., 1996; Daelemans et al., 1994, 1995, 1996ab for the details. There are some general trends which become clear when analysing the results of all these experiments. First, the most striking result is that the generalization accuracy of the induced systems is always ....
Van den Bosch, A. and Daelemans, W. (1993). `Data-oriented methods for grapheme-to-phoneme conversion.' Proceedings of the Sixth conference of the European chapter of the ACL, ACL, 45--53.
....not constrain the choice enough beforehand, or if we wish to measure the importance of various information sources experimentally. 2.2. Optimized MBL in TiMBL: IGTREE Using information gain rather than Overlap distance to define similarity in IB1 improves its performance on several NLP tasks [13, 26, 24]. The positive effect of information gain on performance prompted us to develop an alternative approach in which the instance memory is restructured in such a way that it contains the same information as before, but in a compressed decision tree structure. In this structure, instances are stored ....
....classification subtasks. In this section we illustrate some examples of recent work on MBLE on three light NLP tasks: (i) text-to-speech conversion in TREETALK, (ii) part-of-speech tagging in MBT, and (iii) phrase chunking in MBC. 3.1. TREETALK: Text-to-speech conversion The TREETALK system [26, 25, 11, 24] was originally designed for isolated word pronunciation, i.e., converting a written word to its phonemic representation as found in a pronunciation dictionary, and efforts are underway to extend it to modeling speech phenomena in texts, such as sentence accents and prosody. In this ....
A. Van den Bosch and W. Daelemans. Data-oriented methods for grapheme-to-phoneme conversion. In Proceedings of the 6th Conference of the EACL, pages 45--53, 1993.
....the amount of memory. [Footnote 5: mvdm and numeric features cannot make use of the inverted index optimization, because ....] 4.4 IGTree Using Information Gain rather than unweighted Overlap distance to define similarity in ib1 improves its performance on several nlp tasks [15, 25, 24]. The positive effect of Information Gain on performance prompted us to develop an alternative approach in which the instance memory is restructured in such a way that it contains the same information as before, but in a compressed decision tree structure. We call this algorithm igtree [12] see ....
....a historical overview of our own work with the application of mbl-type algorithms to NLP tasks. The ib1-ig algorithm was first introduced in [15] in the context of a comparison of memory-based approaches with backprop learning for a hyphenation task. Predecessor versions of igtree can be found in [10, 25], where they are applied to grapheme-to-phoneme conversion. See [12] for a detailed description and review of the algorithms. A recent development, now implemented in the TiMBL package, is tribl [13]. The memory-based algorithms implemented in the TiMBL package have been successfully applied to a ....
A. Van den Bosch and W. Daelemans. Data-oriented methods for grapheme-to-phoneme conversion. In Proceedings of the 6th Conference of the EACL, pages 45--53, 1993.
A. van den Bosch & W. Daelemans, Data-oriented methods for grapheme-to-phoneme conversion. Proceedings of the Sixth conference of the European chapter of the ACL, ACL, 45-53, 1993.
.... conversion (identify the pronunciation of words), stress assignment (identify the stress pattern of words), morphology (both synthesis and analysis), and tagging (identify for each word in a text its morphosyntactic category). See Daelemans (1995) for a discussion of the general approach, and van den Bosch & Daelemans, 1992, 1993; Daelemans & van den Bosch, 1992ab, 1993, 1994; van den Bosch et al., 1996; Daelemans et al., 1994, 1995, 1996abc for the details. There are some general trends which become clear when analysing the results of all these experiments. First, the most striking result is that the accuracy of the induced systems is always comparable and often ....
Van den Bosch, A. and Daelemans, W. (1993). `Data-oriented methods for grapheme-to-phoneme conversion.' Proceedings of the Sixth conference of the European chapter of the ACL, ACL, 45--53.
....the same relevance, which is undesirable for our linguistic problems. We noticed that ib1, when extended with a simple feature-weighting similarity function, outperforms plain ib1, and sometimes also outperforms both connectionist approaches and knowledge-based linguistic engineering approaches [VD93]. The similarity function we introduced in lazy learning [DV92] consisted simply of multiplying, when comparing two feature vectors, the similarity between the values for each feature by the corresponding information gain or, in the case of features with different numbers of values, the gain ratio, ....
....machines. Based on these findings with ib1-ig, we designed a variant of ib1 in which the case base is compressed into a tree-based data structure in such a way that access to relevant cases is faster and no relevant information about the cases is lost. This simple algorithm, igtree [VD93, DVW97], uses a feature relevance metric such as information gain to restructure the case base into a decision tree. In Section 2, we describe the igtree model and its properties. Section 3 describes comparative experiments with igtree, ib1, and ib1-ig on learning one of our linguistic tasks and some of ....
Van den Bosch, A. and Daelemans, W. (1993). Data-oriented methods for grapheme-to-phoneme conversion. In Proceedings of the 6th Conference of the EACL, 45--53. Utrecht: OTS.
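The snippets above mention weighting each feature by its information gain or, for features with different numbers of values, by its gain ratio. Gain ratio divides the information gain by the split info, i.e. the entropy of the feature's own value distribution. This penalizes features that have many values. A minimal sketch, assuming symbolic features and base-2 logarithms; the helper names are hypothetical:

```python
# Minimal sketch of gain ratio = information gain / split info.
# ASSUMPTIONS: symbolic feature values, base-2 logarithms, and a gain ratio of
# 0.0 for a feature with a single value (split info 0).
import math
from collections import Counter

def _entropy(items):
    total = len(items)
    return -sum((n / total) * math.log2(n / total) for n in Counter(items).values())

def gain_ratio(column, labels):
    """Information gain of one feature column, normalised by its split info."""
    groups = {}
    for v, c in zip(column, labels):
        groups.setdefault(v, []).append(c)
    remainder = sum(len(cs) / len(labels) * _entropy(cs) for cs in groups.values())
    ig = _entropy(labels) - remainder
    split_info = _entropy(column)  # entropy of the feature's value distribution
    return ig / split_info if split_info > 0 else 0.0

labels = ["x", "x", "y", "y"]
print(gain_ratio(["a", "a", "b", "b"], labels))  # 1.0
print(gain_ratio([1, 2, 3, 4], labels))          # 0.5: same gain, larger split info
```

Both columns have an information gain of 1 bit here. Only gain ratio prefers the two-valued feature over the identifier-like one whose every value is unique.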
....or if we wish to measure the importance of various information sources experimentally. 2.2. Optimized MBL in TiMBL: IGTREE Using information gain rather than Overlap distance to define similarity in IB1 improves its performance on several NLP tasks (Daelemans and van den Bosch, 1992; Van den Bosch and Daelemans, 1993; Van den Bosch, 1997). The positive effect of information gain on performance prompted us to develop an alternative approach in which the instance memory is restructured in such a way that it contains the same information as before, but in a compressed decision tree structure. In this structure, ....
....overlap among various types of assignment criteria, the expectation was that, pace (Deutsch and Wijnen, 1985), phonological information should make at least some headway in supplying gender information for unknown lemmata. 1. Data was extracted from the CELEX (Baayen, Piepenbrock, and Van Rijn, 1993) lexical database. Two series of 3 experiments were carried out, one for each relevant gender distinction. Experiments 1-3 involved 6090 noun lemmas; target categories were M(asculine), F(eminine), and N(euter). Experiments 4-6 involved 7651 noun lemmas; here, target categories were DE and HET, ....
Van den Bosch, A. and W. Daelemans. 1993. Data-oriented methods for grapheme-to-phoneme conversion.
.... mark of 80-90% correct words, which is needed in high-quality text-to-speech synthesis (Yvon, 1996). However, Bakiri and Dietterich (1993) have shown that their approach based on ID3 (Quinlan, 1986) decision trees outperforms the sophisticated DECTalk rule set for English (Allen et al., 1987), and Van den Bosch and Daelemans (1993) and Daelemans and Van den Bosch (1997) report similar results for Dutch. In both cases, the training corpora contained around 18,000 words and the test corpora around 2,000 words. With the exception of Dietterich and Bakiri (1995), most researchers have relied on large machine-readable pronunciation ....
....Shortening as in divine/divinity, and stress shifts as in photograph/photography. Three types of phoneme-based approaches have yielded good results for large corpora: neural networks (Sejnowski and Rosenberg, 1987), decision trees (Dietterich et al., 1995), and instance-based learning (Van den Bosch and Daelemans, 1993). 2.1 Neural networks Artificial neural networks (ANNs) consist of simple processing units with weighted connections. The units are usually grouped into an input layer, an output layer, and one or more hidden layers. The best results on APT so far have been achieved using a simple feed-forward ....
Van den Bosch, A. and Daelemans, W. (1993). Data-oriented methods for grapheme-to-phoneme conversion. In Proceedings of the 6th Conference of the EACL, pages 45--53.
.... to some distance metric, and extrapolating the category of this item to the new input pattern (applying the consistency heuristic). Instances of this form of nearest-neighbour method include instance-based learning (Aha et al. [1991]), exemplar-based learning (Salzberg [1990]; Cost and Salzberg [1993]), memory-based reasoning (Stanfill and Waltz [1986]), and case-based reasoning (Kolodner [1993]). Advantages of the approach include an often surprisingly high classification accuracy, the capacity to learn polymorphous concepts, high speed of learning, and perspicuity of algorithm and classification (see e.g. Cost and Salzberg [1993]). Learning speed is extremely fast (it consists ....
Van den Bosch, A. and Daelemans, W.: `Data-oriented methods for grapheme-to-phoneme conversion.' Proceedings of the Sixth conference of the European chapter of the ACL, ACL, (1993) 45--53.
....algorithm. Automatic, data-oriented learning algorithms seem to provide an appropriate means for extracting statistical facts from language data related to orthographic depth, without incorporating any linguistic bias in the form of language-specific constraints or heuristics. Daelemans and Van den Bosch (1993) demonstrate that the application of data-oriented techniques to morpho-phonological domains, such as grapheme-to-phoneme conversion, is language-independent and can be done for any language for which a computer-readable corpus exists. The data-oriented approach furthermore presents an interesting ....
....(the training material). The second algorithm, Decision Tree Search, is used to retrieve information from the decision tree in order to find the appropriate phonemic mapping for (possibly unseen) graphemic input strings. Detailed descriptions of both algorithms can be found in Daelemans and Van den Bosch (1993) and Van den Bosch and Daelemans (to appear). The Decision Tree model converts words to their phonemic transcription in a letter-oriented way. For each letter in a spelling word, the model attempts to find the most appropriate phonemic mapping, given the current letter context. To this purpose, ....
Van den Bosch, A., Daelemans, W. (1993). "Data-oriented methods for grapheme-to-phoneme conversion." In Proceedings of the 6th Conference of the EACL (pp. 45--53).
.... heuristics for disambiguating between discourse use and sentential use of cue phrases in text was investigated by Litman (1996). In the morpho-phonological domain, the decision tree learning algorithm IGTree (Daelemans et al., to appear) has been applied successfully to grapheme-phoneme conversion (Van den Bosch and Daelemans, 1993) and morphological analysis (Van den Bosch et al., 1996). An example application of rule induction to a morpho-phonological task is the application of C4.5rules to Dutch diminutive formation (Daelemans et al., 1996). 4 Artificial Neural Networks During the last decade, the study of connectionist ....
.... Riloff, and Scheler (1996). In the morpho-phonological domain, successes are claimed for ANNs as good generalisation models for classification tasks, e.g., grapheme-phoneme conversion (Sejnowski and Rosenberg, 1986; Dietterich et al., 1995; Wolters, 1995). However, work by Weijters (1991), Van den Bosch and Daelemans (1993), and Van den Bosch (forthcoming) consistently shows significantly lower performance by BP on a range of morpho-phonological subtasks, as compared to decision tree learning and lazy learning. Apparently, the amount of abstraction in a BP-trained ANN accounts for a similar harmful effect on ....
Van den Bosch, A., and Daelemans, W. (1993). Data-oriented methods for grapheme-to-phoneme conversion. In Proceedings of the 6th Conference of the EACL, pp. 45--53.
....approach, the lexical requirements for such a system are extensive. In a typical knowledge-based system solving the problem, morphological analysis (with lexicon), phonotactic knowledge, and syllable structure determination modules are designed and implemented. In a lazy learning approach ([9], [3]), again a windowing approach was used to formulate the task as a classification problem (identification this time: given a set of possible phonemes, determine which phoneme should be used to translate a target spelling symbol, taking into account its context). Results were highly similar to the ....
Van den Bosch, A. and Daelemans, W.: `Data-oriented methods for grapheme-to-phoneme conversion.' Proceedings of the Sixth conference of the European chapter of the ACL, ACL, (1993) 45--53.
....path re-use, and inverted index has proven in practice (for our NLP datasets) to make the difference between hours and seconds of computation. 3.4 IGTree Using Information Gain rather than unweighted Overlap distance to define similarity in ib1 improves its performance on several nlp tasks [8, 24, 23]. The positive effect of Information Gain on performance prompted us to develop an alternative approach in which the instance memory is restructured in such a way that it contains the same information as before, but in a compressed decision tree structure. We call this algorithm igtree [13] see ....
....a historical overview of our own work with the application of mbl-type algorithms to NLP tasks. The ib1-ig algorithm was first introduced in [8] in the context of a comparison of memory-based approaches with backprop learning for a hyphenation task. Predecessor versions of igtree can be found in [10, 24], where they are applied to grapheme-to-phoneme conversion. See [13] for a detailed description and review of the algorithms. A recent development, not yet implemented in the TiMBL package, is tribl [14], an algorithm which constitutes a hybrid between the ib1-ig and igtree algorithms. The ....
A. Van den Bosch and W. Daelemans. Data-oriented methods for grapheme-to-phoneme conversion. In Proceedings of the 6th Conference of the EACL, pages 45--53, 1993.
....from the learning material during learning. In previous research we have demonstrated the application of the memory-based (lazy) learning approach to several linguistic problems, e.g., segmentation as in hyphenation and syllabification [6, 17], identification as in grapheme-phoneme conversion [18, 16, 7], and stress assignment [8]. In most cases, the memory-based (lazy) approach outdid the more eager inductive algorithms. We believe that in a noisy domain such as natural language, abstracting from the training instances is a bad idea because any one instance (however exceptional from the ....
....material. The idea is that it is not necessary to fully store an instance as a path when only a few feature values of the instance make the instance's classification unique. In applications to linguistic tasks, igtree is shown to obtain compression factors of 90% or more as compared to ib1 and ib1-ig [16, 7]. igtree also stores with each non-terminal node information concerning the most probable or default classification given the path thus far, according to the classification bookkeeping information maintained by the trie construction algorithm. This extra information is essential when processing ....
Van den Bosch, A., Daelemans, W. (1993). Data-oriented methods for grapheme-to-phoneme conversion. In Proceedings of the 6th Conference of the EACL, 45--53. Utrecht: OTS.
Antal van den Bosch and Walter Daelemans. 1993. Data-oriented methods for grapheme-to-phoneme conversion. In Sixth Conference of the European Chapter of the Association for Computational Linguistics (EACL'93), pages 45--53.
CiteSeer.IST - Copyright Penn State and NEC