31 citations found.
F. Smadja, K. R. McKeown, and V. Hatzivassiloglou, `Translating collocations for bilingual lexicons: A statistical approach', Computational Linguistics, 22(1), 1--38, (1996).


This paper is cited in the following contexts:
NTCIR-3 Patent Retrieval Experiments at ULIS - Fujii, Ishikawa   (Correct)

....transliteration method used in the query translation module is one solution for this problem (see Section 2.2). On the other hand, it is also effective to update the translation dictionary. For this purpose, a number of methods to extract translations from bilingual (parallel/comparable) corpora [22, 23] are applicable. However, it is considerably expensive to obtain bilingual corpora with sufficient volume of alignment information. To resolve this problem, we use patent families, which are patent sets filed for the same related contents in multiple countries, as comparable corpora. Thus, patents ....

F. Smadja, K. R. McKeown, and V. Hatzivassiloglou. Translating collocations for bilingual lexicons: A statistical approach. Computational Linguistics, 22(1):1--38, 1996.


Translation Knowledge Acquisition from Cross-Lingually Relevant.. - Utsuro   (Correct)

....table of co-occurrence frequencies of tx and ty below, bilingual term correspondences can be estimated according to the statistical measures such as the mutual information, the b 2 statistic, the dice coefficient, the log likelihood ratio, and also certain types of their extensions (e.g., [Gale91, Kumano94, Haruno96a, Smadja96, Kitamura96, Melamed00]). [2x2 co-occurrence frequency table for tx and ty; cell labels garbled in extraction] [Matsumoto97] also proposed a method for acquiring translation rules of machine translation systems from the results of syntactic structure level alignment [Matsumoto93] of parallel sentences. Detailed introductory descriptions regarding ....
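
The measures named in the context above can all be computed from a 2x2 co-occurrence table. The following is a minimal sketch under stated assumptions, not the citing paper's implementation: the function name `association_scores` and the cell names `n11`, `n10`, `n01`, `n00` are hypothetical, the textbook forms of the Dice coefficient, pointwise mutual information and chi-square statistic are used, and all four marginal totals are assumed to be non-zero.

```python
import math


def association_scores(n11, n10, n01, n00):
    """Score a candidate term pair (tx, ty) from a 2x2 co-occurrence table.

    Hypothetical cell names:
      n11: segments containing both tx and ty
      n10: segments containing tx but not ty
      n01: segments containing ty but not tx
      n00: segments containing neither
    Assumes every row and column total is non-zero.
    """
    n = n11 + n10 + n01 + n00
    fx = n11 + n10  # marginal frequency of tx
    fy = n11 + n01  # marginal frequency of ty

    # Dice coefficient: 2 * joint / (sum of marginals)
    dice = 2 * n11 / (fx + fy)

    # Pointwise mutual information (base 2); undefined when n11 == 0
    pmi = math.log2(n11 * n / (fx * fy)) if n11 > 0 else float("-inf")

    # Pearson chi-square statistic for a 2x2 table
    chi2 = n * (n11 * n00 - n10 * n01) ** 2 / (
        fx * fy * (n10 + n00) * (n01 + n00)
    )
    return {"dice": dice, "pmi": pmi, "chi2": chi2}
```

Ranking candidate pairs by any one of these scores gives a simple baseline; the log likelihood ratio mentioned in the context is omitted here to keep the sketch short.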

Smadja, F., McKeown, K. R. and Hatzivassiloglou, V.: Translating Collocations for Bilingual Lexicons: A Statistical Approach, Computational Linguistics, Vol. 22, No. 1, pp. 1-38 (1996).


Collocation Mining: Exploiting Corpora for Collocation.. - Krenn (2000)   (Correct)

....approach. Selected verbs are employed as lexical keys to identify a particular class of PP verb collocations. Both models and model combinations are compared to a selection of commonly used statistical methods for collocation identification such as mutual information [2], Dice coefficient [9], log likelihood [4] or relative entropy [3]. 2.2.1 Phrase Entropy. Entropy H is a suitable means for modeling the (in)variation of PP instances related to a particular PN combination: $H = -\sum_{i=1}^{n} p(X = x_i) \log p(X = x_i)$. Entropy measures the informativity of a ....
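
The entropy described in the context above can be estimated from observed instances. This is a rough sketch, not the citing paper's code: it assumes relative frequencies as probability estimates and a natural logarithm (the context does not state the base), and the function name `phrase_entropy` is hypothetical.

```python
import math
from collections import Counter


def phrase_entropy(instances):
    """Entropy H = -sum_i p(x_i) * log p(x_i) over observed instances.

    `instances` is any iterable of hashable items, for example the PP
    surface forms seen with one preposition-noun combination.
    Probabilities are estimated as relative frequencies (an assumption).
    """
    counts = Counter(instances)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())
```

A combination that always appears in the same form yields an entropy of zero, and the value grows as the observed forms become more varied.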

Frank Smadja, Kathleen R. McKeown, and Vasileios Hatzivassiloglou. Translating collocations for bilingual lexicons: A statistical approach. Computational Linguistics, 22(1):3 -- 38, 1996.


A Specific Least General Generalization of Strings and Its.. - Cicekli (2001)   (Correct)

....for the sentence in the source language, parts of the corresponding target language sentence are constructed using structural equivalences and deviances in those matches. Following Nagao's original proposal, several machine translation methods that utilize bilingual corpora have been studied [5, 9, 20, 21, 22, 23]. Some researchers [3, 24] only utilize bilingual corpora to create a bilingual dictionary and use it during the translation process. In other words, they aligned bilingual corpora at word level to figure out corresponding words in languages. Bilingual corpora is also aligned at phrase level by ....

Smadja, F., McKeown, K. R., and Hatzivassiloglou, V., Translating Collocation for Bilingual Lexicons: A Statistical Approach in: Computational Linguistics, Vol 22(1), The MIT Press, 1996, pp:1-38.


A Decision Tree of Bigrams is an Accurate Predictor of Word Sense - Pedersen (2001)   (Correct)

....1 for all values n 11 . When the value of n 11 is less than either of the marginal totals (the more typical case) the rankings produced by the Dice Coefficient are similar to those of Mutual Information. The relationship between pointwise Mutual Information and the Dice Coefficient is also discussed in (Smadja et al. 1996). We have developed the Bigram Statistics Package to produce ranked lists of bigrams using a range of tests. This software is written in Perl and is freely available from www.d.umn.edu/~tpederse. 3 Learning Decision Trees. Decision trees are among the most widely used machine learning ....

F. Smadja, K. McKeown, and V. Hatzivassiloglou. 1996. Translating collocations for bilingual lexicons: A statistical approach. Computational Linguistics, 22(1):1-38.


Normalising the IJS-ELAN Slovene-English Parallel.. - Dias, Vintar.. (1999)   (1 citation)  (Correct)

....cohesiveness existing between words, various mathematical models have been proposed in the literature. However, most of them only evaluate the degree of cohesiveness between two words and do not generalise for the case of n individual words (Church, 1990; Gale, 1991; Dunning, 1993; Smadja, 1993, Smadja, 1996; Shimohata, 1997). As a consequence, these mathematical models only allow the acquisition of binary associations [footnote 6: In Table 1, p12 is the signed distance between w1 and w2. The sign + (-) is used for words on the right (left) of w1.] and enticement techniques have to be applied to ....

....a sequence of words is, the higher its association measure value will be). Using this property, we performed many experiments with different association measures. In particular, we tested the following normalised mathematical models: the Association Ratio (Church, 1990), the Dice coefficient (Smadja, 1996), the φ² (Gale, 1990) and the Log Likelihood Ratio (Dunning, 1993) [11]. On the other hand, the LocalMaxs allows extracting multiword terms obtained by composition. Indeed, as the algorithm retrieves pertinent units by analysing their immediate context, it may identify multiword terms that are ....

Smadja F. (1996) Translating Collocations for Bilingual Lexicons: A Statistical Approach, Association for Computational Linguistics, 22/1.


Combining Linguistics with Statistics for.. - Dias.. (2000)   (1 citation)  (Correct)

....textual units, various mathematical models have been proposed in the literature. However, most of them only evaluate the degree of cohesiveness between two textual units and do not generalise for the case of n individual textual units (Church & Hanks, 1990; Gale, 1991; Dunning, 1993; Smadja, 1993, Smadja, 1996; Shimohata, 1997). As a consequence, these mathematical models only allow the acquisition of binary associations and enticement techniques have to be applied to acquire associations with more than two textual units. Unfortunately, such techniques have shown their limitations as their retrieval ....

....higher its association measure value will be). Using this property, we performed many experiences with different association measures. In particular, we tested the following normalised mathematical models in (Dias et al., 1999d): the Association Ratio (Church & Hanks, 1990), the Dice coefficient (Smadja, 1996), the φ² (Gale, 1990) and the Log Likelihood Ratio (Dunning, 1993) [17]. In all cases the Mutual Expectation has proved to lead to better results than the other models. Figure 2: Election by Composition. On the other hand, the LocalMaxs allows extracting multiword terms obtained by ....

Smadja, F. (1996). Translating Collocations for Bilingual Lexicons: A Statistical Approach. In Association for Computational Linguistics, 22 (1).


Similarities and Differences - Cicekli (2000)   (1 citation)  (Correct)

....for the sentence in the source language, parts of the corresponding target language sentence are constructed using structural equivalences and deviances in those matches. Following Nagao's original proposal, several machine translation methods that utilize bilingual corpora have been studied [5, 9, 16, 17, 18, 19]. Some researchers [3, 20] only utilized bilingual corpora to create a bilingual dictionary and use it during the translation process. In other words, they aligned bilingual corpora at word level to figure out corresponding words in languages. Bilingual corpora is also aligned at phrase level by ....

Smadja, F., McKeown, K. R., and Hatzivassiloglou, V., Translating Collocation for Bilingual Lexicons: A Statistical Approach in: Computational Linguistics, Vol 22(1), The MIT Press, 1996, pp:1-38.


Learning Translation Templates from Bilingual Translation.. - Cicekli, Guvenir (2001)   (1 citation)  (Correct)

....for the sentence in the source language, parts of the corresponding target language sentence are constructed using structural equivalences and deviances in those matches. Following Nagao's original proposal, several machine translation methods that utilize bilingual corpora have been studied [5, 12, 22, 23, 24, 25]. Some researchers [3, 26] only utilized bilingual corpora to create a bilingual dictionary and use it during the translation process. In other words, they aligned ....

Smadja, F., McKeown, K. R., and Hatzivassiloglou, V., Translating Collocation for Bilingual Lexicons: A Statistical Approach in: Computational Linguistics, Vol 22(1), The MIT Press, 1996, pp:1-38.


A Simple Hybrid Aligner for Generating Lexical.. - Ahrenberg, ANDERSSON, .. (1998)   (3 citations)  (Correct)

....to find linguistic units of the proper size. Kitamura and Matsumoto (1996) present results from aligning multi word and single word expressions with a recall of 80 per cent if partially correct translations were included. Their method is iterative and is based on the use of the Dice coefficient. Smadja et al. (1996) also use the Dice coefficient as their basis for aligning collocations between English and French. Their evaluation show results of 73 per cent accuracy (precision) on average. [Footnote 1: Model 3 5 includes multi word units in one direction.] 3. Underlying assumptions. As Fung and Church (1994) we wish ....

Smadja F., K. McKeown, & V. Hatzivassiloglou, (1996) "Translating Collocations for Bilingual Lexicons: A Statistical Approach." In Computational Linguistics, Vol. 22 No. 1.


Cross-Language Information Retrieval using Compound Word.. - Fujii, Ishikawa (1999)   (Correct)

....an interaction strategy, which facilitates user feedback. Our compound word translation method can be applied to other existing CLIR systems as an additional translation module. Future work will include the application of automatic word alignment methods (for example, one proposed by Smadja et al. [24]) to enhance the dictionary. ....

Frank Smadja, Kathleen R. McKeown, and Vasileios Hatzivassiloglou. Translating collocations for bilingual lexicons: A statistical approach. Computational Linguistics, Vol. 22, No. 1, pp. 1--38, 1996.


Resolving Ambiguity for Cross-language Retrieval - Ballesteros, Croft (1998)   (34 citations)  (Correct)

....disambiguate phrase translations. Given the possible target equivalents for two source terms, we infer the most likely translations by looking at the pattern of co occurrence for each possible pair of definitions. Co occurrence statistics have been used with some success for phrasal translations [SMH96, Kup93] These techniques rely on parallel corpora and our interest is in ascertaining whether unlinked corpora can be used effectively for phrasal translation. Kraaij and Hiemstra [KH97] used co occurrence frequency for phrase translation with some success during the TREC 6 [Har97] evaluations. ....

Frank Smadja, Kathleen R. McKeown, and Vasileios Hatzivassiloglou. Translating collocations for bilingual lexicons: A statistical approach. Computational Linguistics, 22(1):1--38, 1996.


Computational Tools and Resources for Linguistic Studies - Hsu, Chang, Su   (Correct)

....in information retrieval tasks [Salton 1993] for identifying closely related binary relations. It can, therefore, be used as a measure of the word association for two words, x and y, or the association between x and y in two languages. The dice metric for a pair of words x, y is defined as follows [Smadja 1996]: $D(x, y) = \frac{P(x = 1, y = 1)}{\frac{1}{2}\,[P(x = 1) + P(y = 1)]}$, where x=1 and y=1 correspond to events where x appears in first place and y appears in second place, respectively, in a word pair or in an aligned sentence pair. It is, therefore, another indication of word ....

....) where x=1 and y=1 correspond to events where x appears in first place and y appears in second place, respectively, in a word pair or in an aligned sentence pair. It is, therefore, another indication of word co occurrence which is similar to the mutual information metric. For instance, [Smadja 1996] described a program called Champollion, which, given a pair of parallel corpora in two different languages and a list of collocations in one of them, can automatically produce their translations. In that program, the dice metric is used as the measure of the correlation between the source ....
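
As an illustration of the dice metric described in the contexts above, here is a minimal sketch over aligned sentence pairs. It is not Champollion's implementation: the function name `dice`, the representation of each sentence as a set of tokens, and the toy English-French data in the usage note are all assumptions made for the example.

```python
def dice(aligned_pairs, x, y):
    """Dice score between source word x and target word y.

    `aligned_pairs` is a list of (source_tokens, target_tokens) tuples,
    each element a set of words (a hypothetical representation).
    Computes 2 * f(x, y) / (f(x) + f(y)), which equals
    P(x=1, y=1) / (0.5 * (P(x=1) + P(y=1))) when counts are divided
    by the number of aligned pairs.
    """
    both = sum(1 for src, tgt in aligned_pairs if x in src and y in tgt)
    fx = sum(1 for src, _ in aligned_pairs if x in src)
    fy = sum(1 for _, tgt in aligned_pairs if y in tgt)
    if fx + fy == 0:
        return 0.0  # neither word occurs; treat as no association
    return 2 * both / (fx + fy)


# Toy aligned corpus, invented purely for illustration.
pairs = [
    ({"the", "house"}, {"la", "maison"}),
    ({"the", "car"}, {"la", "voiture"}),
    ({"a", "house"}, {"une", "maison"}),
]
best = max(["la", "maison", "voiture", "une"], key=lambda w: dice(pairs, "house", w))
```

On the toy data, `best` is the target word with the highest Dice score for "house"; a real system would also need frequency thresholds, since scores on rare words are unreliable.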

[Article contains additional citation context not shown here]

F. Smadja, K. R. McKeown, and V. Hatzivassiloglou, "Translating Collocations for Bilingual Lexicons: A Statistical Approach," Computational Linguistics, Vol 22, No. 1, 1996.


Multilingual Domain Modeling in Twenty-One - Automatic Creation.. - Hiemstra (1998)   (2 citations)  (Correct)

....dangerous waste; in the corpus it is hazardous wastes. [Figure 1: An example entry. gevaarlijke: hazardous 0.74, toxic 0.20, dangerous 0.05] ....ing approach. The disadvantage of the hypothesis testing approach (W.A. Gale and K.W. Church 1991, P. van der Eijk 1993, F. Smadja, K.R. McKeown, and V. Hatzivassiloglou 1996) is that a valid hypothesis can only be made if a certain minimum number of observations is available. Therefore only a limited amount of translation examples can be found with high accuracy. Following the estimating approach, it is possible to find the most probable translations for each example ....

....depends on the information available and the information we are interested in. It is for example possible to identify words (P.F. Brown, S.A. Della Pietra, V.J. Della Pietra, and R.L. Mercer 1993, W.A. Gale and K.W. Church 1991), noun phrases (P. van der Eijk 1993, J. Kupiec 1993) or collocations (F. Smadja, K.R. McKeown, and V. Hatzivassiloglou 1996). If some form of morphological analysis is performed we might want to assign different words to the same equivalence class. For example the English words is and am of the corpus of Figure 2 share the same base form to be. One of the things we will evaluate in this paper is the influence of ....

F. Smadja, K.R. McKeown, and V. Hatzivassiloglou (1996), Translating collocations for bilingual lexicons: A statistical approach, Computational Linguistics, 22(1):1--38.


A Statistical View on Bilingual Lexicon Extraction: From Parallel.. - Fung (1998)   (5 citations)  (Correct)

....Others such as [6, 7] use an EM based model to align words in sentence pairs in order to obtain a technical lexicon. Some other algorithms use sentence aligned parallel texts to further compile a bilingual lexicon of technical words or terms using similarity measures on bilingual lexical pairs [21, 25, 29]. Yet others focus on translating phrases or terms which consist of multiple words [6, 25, 29] The main inspiration for our work [10, 14] to be described in the following section, comes from [21] who propose using word occurrences patterns and average mutual information and t scores to find word ....

....a technical lexicon. Some other algorithms use sentence aligned parallel texts to further compile a bilingual lexicon of technical words or terms using similarity measures on bilingual lexical pairs [21, 25, 29] Yet others focus on translating phrases or terms which consist of multiple words [6, 25, 29]. The main inspiration for our work [10, 14] to be described in the following section, comes from [21] who propose using word occurrences patterns and average mutual information and t scores to find word correspondences as an alternative to the IBM word alignment model. Given any pair of ....

Frank Smadja, Kathleen McKeown, and Vasileios Hatzivassiloglou. Translating collocations for bilingual lexicons: A statistical approach. Computational Linguistics, 22(1):1--38, 1996.


Automatic Discovery of Non-Compositional Compounds in Parallel Data - Melamed (1997)   (10 citations)  (Correct)

....the interesting features in that featureless desert. Furthermore, translational equivalence relations involving explicit representations of target-language NCCs are more useful than fertility distributions for applications that do translation by table lookup. Many authors (e.g. Daille et al. 1994; Smadja et al. 1996) define collocations in terms of monolingual frequency and part of speech patterns. Markedly high frequency is a necessary property of NCCs, because otherwise they would fall out of use. However, at least for translation-related applications, it is not a sufficient property. Non compositional ....

F. Smadja, K. R. McKeown & V. Hatzivassiloglou. (1996) "Translating Collocations for Bilingual Lexicons: A Statistical Approach," Computational Linguistics 22(1).


Learning Parse and Translation Decisions from Examples with.. - Hermjakob, Mooney (1997)   (15 citations)  (Correct)

....maps to the German compound Zinssatz . We believe that an extensive collection of complex translation pairs in the bilingual dictionary is critical for translation quality and we are confident that its acquisition can be at least partially automated by using techniques like those described in (Smadja et al. 1996). Complex translation entries are preprocessed using the same parser as for normal text. During the transfer process, the resulting parse tree pairs are then accessed using pattern matching. The generation module orders the components of phrases, adds appropriate punctuation, and propagates ....

F. Smadja, K. R. McKeown and V. Hatzivassiloglou. 1996. Translating Collocations for Bilingual Lexicons: A Statistical Approach. In Computational Linguistics 22 (1), pages 1--38.


An Overview of Corpus-Based Statistics-Oriented (CBSO).. - Su, Chiang, Chang (1996)   (Correct)

....to appear simultaneously with its neighbors, and hence is less likely to be a lexicon unit by itself. Dice: The dice metric is commonly used in information retrieval tasks [Salton 93] to identify closely related binary relations. It has been used for identifying bilingual collocation translation [Smadja 96]. The dice metric for a pair of words x, y is defined as follows [Smadja 96]: $D(x, y) = \frac{P(x = 1, y = 1)}{\frac{1}{2}\,[P(x = 1) + P(y = 1)]}$, where x=1 and y=1 correspond to the events where x appears in the first place and y appears in the second place, respectively. It is ....

....a lexicon unit by itself. Dice: The dice metric is commonly used in information retrieval tasks [Salton 93] to identify closely related binary relations. It has been used for identifying bilingual collocation translation [Smadja 96]. The dice metric for a pair of words x, y is defined as follows [Smadja 96]: $D(x, y) = \frac{P(x = 1, y = 1)}{\frac{1}{2}\,[P(x = 1) + P(y = 1)]}$, where x=1 and y=1 correspond to the events where x appears in the first place and y appears in the second place, respectively. It is another indication of word co occurrence which is similar to the mutual ....

[Article contains additional citation context not shown here]

F. Smadja, K. R. McKeown, and V. Hatzivassiloglou, "Translating Collocations for Bilingual Lexicons: A Statistical Approach," Computational Linguistics, Vol 22, No. 1, 1996.


Relational Learning Techniques for Natural Language Information.. - Califf (1998)   (19 citations)  (Correct)

....[Figure 2.3: Complete System Architecture] 2.2 Symbolic Relational Learning. Since much empirical work in natural language processing has employed statistical techniques (Charniak, 1993; Miller, Stallard, Bobrow, Schwartz, 1996; Smadja, McKeown, Hatzivassiloglou, 1996; Wermter et al. 1996), this section discusses the potential advantages of symbolic relational learning. In order to accurately estimate probabilities from limited data, most statistical techniques base their decisions on a very limited context, such as bigrams or trigrams (2 or 3 word contexts) ....

Smadja, F., McKeown, K. R., & Hatzivassiloglou, V. (1996). Translating collocations for bilingual lexicons: A statistical approach. Computational Linguistics, 22 (1), 1--38.


Learning Parse and Translation Decisions From Examples With Rich .. - Hermjakob (1997)   (15 citations)  (Correct)

....have been able to extract a word alignment table containing German English word pairs based solely on the internal evidence of the bilingual corpus. As for the monolingual lexicon, alignment programs could propose likely candidates to a supervisor for approval or rejection. The Champollion program (Smadja, McKeown, Hatzivassiloglou, 1996), for example, using aligned text from the Canadian Hansards and given a multi word English collocation, can identify the equivalent collocation in French with a precision of up to 78%. Champollion is limited to a statistical analysis of sets of words. The use of parsers and moderate background ....

Smadja, F., McKeown, K. R., & Hatzivassiloglou, V. (1996). Translating collocations for bilingual lexicons: A statistical approach. Computational Linguistics, 22 (1), 1--38.


Collocations - McKeown, Radev (2000)   Self-citation (Mckeown)   (Correct)

....for natural language researchers, there exist a large number of bilingual and multilingual aligned corpora (see Chapter XYZ). Such bodies of text are an invaluable resource in machine translation in general, and in the translation of collocations and technical terms in particular. Smadja et al. [40] have created a system called Champollion which is based on Smadja's collocation extractor, Xtract. Champollion uses a statistical method to translate both flexible and rigid collocations between French and English using the Canadian Hansard corpus. The Hansard corpus is pre-aligned but it ....

Frank Smadja, Kathleen R. McKeown, and Vasileios Hatzivassiloglou. Translating collocations for bilingual lexicons: A statistical approach. Computational Linguistics, 22(1):1--38, March 1996.


Learning Interestingness Measures in Terminology.. - Roche, Azé.. (2004)   (Correct)

No context found.

F. Smadja, K. R. McKeown, and V. Hatzivassiloglou, `Translating collocations for bilingual lexicons: A statistical approach', Computational Linguistics, 22(1), 1--38, (1996).


Automatic Detection of Collocation - Jiangsheng Yu Zhihui   (Correct)

No context found.

F. A. Smadja, K. R. McKeown and V. Hatzivassiloglou (1996), Translating Collocations for Bilingual Lexicons: A Statistical Approach. Computational Linguistics, 22 (1), pp. 1-38.


Can we do better than frequency? A case study on extracting.. - Krenn, Evert (2001)   (Correct)

No context found.

Frank Smadja, Kathleen R. McKeown, and Vasileios Hatzivassiloglou. 1996. Translating Collocations for Bilingual Lexicons: A Statistical Approach. In Computational Linguistics 22(1), 3 -- 38.


Phrase Identification in Cross-Language Information Retrieval - Adriani, van Rijsbergen (2000)   (1 citation)  (Correct)

No context found.

Smadja, Frank, McKeown, Kathleen R., and Hatzivassiloglou, Vasileios. (1996). Translating Collocations for Bilingual Lexicons: A Statistical Approach. Computational Linguistics, 22(1), 1-38.


Identifying Translations of Compound Nouns Using.. - Iram, Ohtake.. (1999)   (1 citation)  (Correct)

No context found.

Frank Smadja, Kathleen R. Mckeown, and Vasileios Hatzivassiloglou. 1996. Translating collocations for bilingual lexicons: A statistical approach. Computational Linguistics, 22(1):1--38.


Automatic Thesaurus Generation through Multiple Filtering - Kageura, Tsuji, AIZAWA (2000)   (Correct)

No context found.

Smadja, F., McKeown, K. R. and Hatzivassiloglou, V. (1996) "Translating collocations for bilingual lexicons: A statistical approach." Computational Linguistics. 22(1), p. 1-38.


A Multivariate Gaussian Mixture Model for Automatic Compound.. - Chang, Su   (Correct)

No context found.

Smadja, Frank, Kathleen R. McKeown and Vasileios Hatzivassiloglou, "Translating Collocations for Bilingual Lexicons: A Statistical Approach," Computational Linguistics, vol. 22, no. 1, pp. 1-38, 1996.


Evaluating Word Alignment Systems - Merkel, Ahrenberg   (Correct)

No context found.

Smadja, F., K. McKeown, & V. Hatzivassiloglou (1996) "Translating Collocations for Bilingual Lexicons: A Statistical Approach." In Computational Linguistics, Vol 22. No. 1.


An Unsupervised Iterative Method for Chinese New Lexicon.. - Jing-Shin Chang (1997)   (7 citations)  (Correct)

No context found.

Smadja, Frank, Kathleen R. McKeown and Vasileios Hatzivassiloglou, "Translating Collocations for Bilingual Lexicons: A Statistical Approach," Computational Linguistics, vol. 22, no. 1, pp. 1-38, 1996.


Semi-Automatic Acquisition of Domain-Specific Translation.. - Resnik, Melamed (1997)   (5 citations)  (Correct)

No context found.

Frank Smadja, Kathleen McKeown, and Vasileios Hatzivassiloglou. 1996. Translating collocations for bilingual lexicons: A statistical approach.


CiteSeer.IST - Copyright Penn State and NEC