Gale, W. A., & Church, K. W. (1991a). "Identifying word correspondences in parallel texts". Fourth DARPA Workshop on Speech and Natural Language, Asilomar, California.
....that can be expressed by symbols. The problem we need to address is how to identify symbol correspondence. Given any pair of symbols in which one represents a visual cluster and the other represents an audio cluster, we can measure the association between them by making use of mutual information [Gale and Church, 1991]. φ², a χ²-like statistic, seems to be a good measurement of correlation: $\phi^2 = \frac{(ad - bc)^2}{(a+b)(a+c)(b+d)(c+d)}$ (8), where a is the probability that the two symbols co-occur, b is the probability that the first occurs while the second is not in close temporal proximity, c is the probability that the second occurs while the first is not in close temporal proximity, and d is the probability that neither of the two ....
Gale, W. and Church, K. (1991). Identifying word correspondences in parallel texts. In Proceedings of the DARPA SNL Workshop.
....pairs involving S or T in the same sentence pair. An alignment filter is based on the relative positions of S and T in their respective texts [Dag93]. The decision procedure used to select lexicon entries from the multiset of candidate translation pairs is a variation of the method presented in [Gal91a]. [Dun93] found binomial log-likelihood ratios to be relatively accurate when dealing with rare tokens. This statistic was used to estimate dependencies between all co-occurring (source word, target word) pairs. For each source word S, target words were ranked by their dependence with S. The top N ....
W. Gale & K. W. Church, "Identifying Word Correspondences in Parallel Texts," Proceedings of the DARPA SNL Workshop, 1991.
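The ranking step in the [Gal91a]/[Dun93] snippet above (rank every co-occurring target word by its dependence with a source word S, then keep the top N) can be sketched as follows. This is a minimal illustration, not the citing author's code: `top_n_translations` is a hypothetical name, the association scores are assumed to be precomputed by some statistic such as a log-likelihood ratio, and the toy score values are invented for the example.

```python
from collections import defaultdict


def top_n_translations(scores, n):
    """Keep the n best-scoring target words for every source word.

    scores: dict mapping (source_word, target_word) -> association score,
            where a larger score is assumed to mean stronger dependence.
    Returns a dict mapping source_word -> [(target_word, score), ...],
    sorted by descending score and truncated to n entries.
    """
    by_source = defaultdict(list)
    for (src, tgt), score in scores.items():
        by_source[src].append((tgt, score))
    ranked = {}
    for src, candidates in by_source.items():
        # Highest score first; ties broken alphabetically so output is stable.
        candidates.sort(key=lambda pair: (-pair[1], pair[0]))
        ranked[src] = candidates[:n]
    return ranked


# Toy, made-up scores for illustration only.
scores = {
    ("house", "maison"): 12.4,
    ("house", "la"): 3.1,
    ("house", "chambre"): 5.0,
    ("the", "la"): 20.2,
}
print(top_n_translations(scores, 2))
# {'house': [('maison', 12.4), ('chambre', 5.0)], 'the': [('la', 20.2)]}
```

Keeping the scoring function outside the ranking step means any of the association measures discussed on this page can be swapped in without touching the selection logic.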
....which is not yet used by the program. 6 Related Research: In this section we compare our work with two other methods reported in the literature. In section 6.1 we compare our work to work discussed in [Gaussier et al. 1992], which is based on mutual information. Section 6.2 discusses [Gale and Church, 1991a], which is based on the φ² statistic. It is conceivable to partly automate the acquisition of the necessary lexical knowledge, viz. determining which nouns are likely to take PP complements, but our corpus is too small for this type of knowledge acquisition. In fact, it turned out to be better ....
....expense of recall. The position-sensitive result is comparable to the 90 row in table 7. Figure 9: Phrase-based methods using mutual information. Columns Position | Filter | Recall | Precision; rows: no | no | 66 (98) | 25; yes | no | 66 (98) | 58; no | yes | 55 (82) | 38; yes | yes | 40 (59) | 89. 6.2 The φ² method: In [Gale and Church, 1991a] another association measure is used, viz. φ², a χ²-like statistic. In the following formula, assume a is the co-occurrence frequency of a source language term sl and a target language term tl, b the frequency of sl minus a, c the frequency of tl minus a, and d the number of regions containing ....
[Article contains additional citation context not shown here]
W. Gale and K. Church. Identifying word correspondences in parallel texts. In 4th DARPA Workshop on Speech and Natural Language, pages 152-157, 1991.
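The φ² measure described in the section 6.2 snippet above can be computed directly from the four counts of a 2×2 table. This minimal sketch assumes that d counts the regions containing neither term, because the snippet is cut off before defining it. Returning 0 for a table with an empty margin is a convention chosen here, not one taken from the source.

```python
def phi_squared(a, b, c, d):
    """phi^2 association score for a 2x2 contingency table.

    a: regions containing both the source term and the target term
    b: regions containing the source term but not the target term
    c: regions containing the target term but not the source term
    d: regions containing neither term (assumed definition)
    """
    denom = (a + b) * (a + c) * (b + d) * (c + d)
    if denom == 0:
        # An empty row or column makes the statistic undefined; treat as no evidence.
        return 0.0
    return (a * d - b * c) ** 2 / denom


print(phi_squared(10, 0, 0, 90))  # 1.0: the two terms always appear together
print(phi_squared(1, 9, 9, 81))   # 0.0: counts exactly match independence
```

φ² ranges from 0 (independence) to 1 (perfect association), so, unlike raw co-occurrence counts, scores for frequent and rare term pairs are on the same scale.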
....The subscript i denotes the i-th alignment of sentences in both languages. A word sequence in Ei is defined here as the correspondence of another sequence in Fi if the words of one sequence are considered to represent the words in the other. Single-word correspondences have been investigated [Gale and Church, 1991a] using a statistic operating on contingency tables. An algorithm for producing collocational correspondences has also been described [Smadja, 1992]. The algorithm involves several steps. English collocations are first extracted from the English side of the corpus. Instances of the English ....
....operation. An arbitrarily large corpus can be accommodated by segmenting it appropriately. The algorithm described here is an instance of a general approach to statistical estimation, represented by the EM algorithm [Dempster et al., 1977]. In contrast to reservations that have been expressed [Gale and Church, 1991a] about using the EM algorithm to provide word correspondences, there have been no indications that prohibitive amounts of memory might be required, or that the approach lacks robustness. Unlike the other methods that have been mentioned, the approach has the capability to accommodate more ....
W. A. Gale and K. W. Church. Identifying word correspondences in parallel texts. In Proceedings of the Fourth DARPA Speech and Natural Language Workshop, pages 152-157, Pacific Grove, CA, February 1991. Morgan Kaufmann.
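The EM snippet above does not show the citing paper's own algorithm. To illustrate what "using the EM algorithm to provide word correspondences" involves, this sketch swaps in the familiar IBM Model 1 formulation, simplified with no NULL word and no positional parameters. It is an assumption-laden illustration, not the cited authors' method. `ibm_model1` is a hypothetical name, and the three-sentence toy corpus is invented.

```python
from collections import defaultdict


def ibm_model1(pairs, iterations=10):
    """EM estimation of t(target | source), IBM Model 1 style (simplified).

    pairs: list of (source_tokens, target_tokens) sentence pairs.
    Returns a dict mapping (target_word, source_word) -> probability.
    """
    target_vocab = {w for _, tgt in pairs for w in tgt}
    uniform = 1.0 / len(target_vocab)
    t = defaultdict(lambda: uniform)  # start from a uniform distribution
    for _ in range(iterations):
        count = defaultdict(float)  # expected (target, source) link counts
        total = defaultdict(float)  # expected counts per source word
        # E-step: share each target token among the source words of its pair,
        # in proportion to the current translation probabilities.
        for src, tgt in pairs:
            for e in tgt:
                norm = sum(t[(e, f)] for f in src)
                for f in src:
                    delta = t[(e, f)] / norm
                    count[(e, f)] += delta
                    total[f] += delta
        # M-step: renormalise the expected counts into probabilities.
        t = defaultdict(float, {(e, f): c / total[f] for (e, f), c in count.items()})
    return t


pairs = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]
t = ibm_model1(pairs, iterations=10)
for f in ["das", "buch", "haus", "ein"]:
    best = max(["the", "house", "book", "a"], key=lambda e: t[(e, f)])
    print(f, "->", best)
# das -> the, buch -> book, haus -> house, ein -> a
```

The memory concern mentioned in the snippet is visible here: `count` and `t` hold one entry per co-occurring word pair, so their size grows with vocabulary overlap across the corpus.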
....table of co-occurrence frequencies of $t_x$ and $t_y$ below, bilingual term correspondences can be estimated according to statistical measures such as mutual information, the φ² statistic, the Dice coefficient, the log-likelihood ratio, and certain types of their extensions (e.g., [Gale91, Kumano94, Haruno96a, Smadja96, Kitamura96, Melamed00]). Contingency table: row $t_x$: freq($t_x$, $t_y$), freq($t_x$, $\bar{t}_y$); row $\bar{t}_x$: freq($\bar{t}_x$, $t_y$), freq($\bar{t}_x$, $\bar{t}_y$). [Matsumoto97] also proposed a method for acquiring translation rules of machine translation systems from the results of syntactic-structure-level alignment [Matsumoto93] of parallel sentences. Detailed introductory descriptions regarding ....
Gale, W. and Church, K.: Identifying Word Correspondences in Parallel Texts, Proc. 4th DARPA Speech and Natural Language Workshop, pp. 152-157 (1991).
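The 2×2 table of co-occurrence frequencies for a term pair $t_x$, $t_y$, described in the snippet above, can be filled in from aligned segment pairs and then fed to any of the listed measures. This sketch uses the Dice coefficient as the example measure. It assumes each aligned pair counts once, however often a term repeats inside it. The function names and the four-pair toy corpus are hypothetical.

```python
def contingency_table(aligned_pairs, tx, ty):
    """Count a source term tx and a target term ty over aligned segment pairs.

    Returns (a, b, c, d) with
      a = freq(tx, ty),      b = freq(tx, not ty),
      c = freq(not tx, ty),  d = freq(not tx, not ty).
    """
    a = b = c = d = 0
    for src_tokens, tgt_tokens in aligned_pairs:
        has_x = tx in src_tokens
        has_y = ty in tgt_tokens
        if has_x and has_y:
            a += 1
        elif has_x:
            b += 1
        elif has_y:
            c += 1
        else:
            d += 1
    return a, b, c, d


def dice(a, b, c, d):
    """Dice coefficient 2*freq(tx, ty) / (freq(tx) + freq(ty)); d is unused."""
    denom = (a + b) + (a + c)
    return 2 * a / denom if denom else 0.0


pairs = [
    ("the house".split(), "la maison".split()),
    ("the car".split(), "la voiture".split()),
    ("a house".split(), "une maison".split()),
    ("a car".split(), "une voiture".split()),
]
table = contingency_table(pairs, "house", "maison")
print(table, dice(*table))  # (2, 0, 0, 2) 1.0
```

Dice ignores the "neither" cell d, whereas φ² and the log-likelihood ratio use all four cells. That difference is one reason these measures can rank the same candidate pairs differently.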
....system, a disambiguation of the English translation candidates is performed by selecting the best English term equivalent to each French query term, applying a statistical method based on co-occurrence frequency. For the purpose of this study, we decided to use mutual information [2], which is defined as follows: $MI(w_1, w_2) = \log_2 \frac{N \cdot f(w_1, w_2)}{f(w_1)\, f(w_2)}$, where N is the size of the corpus, f(w) is the number of times the word w occurs in the corpus, and $f(w_1, w_2)$ is the number of times both $w_1$ and $w_2$ occur together in a sentence bead. 3 Query Expansion in Cross-Language Information ....
....with that terminology, in Cross-Language Information Retrieval. Terms Extraction for a Feedback Loop: According to previous research [1, 4], query expansion before and after translation improves the effectiveness of information retrieval. In our case, we used mutual information [2] to select and add those terms which occur most often with the original query terms. Previous results showed that results based on mutual information are significantly worse than those based on the log-likelihood ratio, the chi-square test, or the modified Dice coefficient [3]. For an efficient use ....
Gale, W. A. and Church, K.: "Identifying word correspondences in parallel texts". Proceedings of the 4th DARPA Speech and Natural Language Workshop, pp. 152-157 (1991).
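The mutual information used in the query-translation snippet above relates how often two words co-occur in sentence beads to how often they occur overall. This sketch assumes the usual form $MI(w_1, w_2) = \log_2 \frac{N \cdot f(w_1, w_2)}{f(w_1) f(w_2)}$ with the counts N, f(w) and f(w1, w2) defined in that snippet. The function name, the convention of returning minus infinity for pairs that never co-occur, and the example counts are assumptions made for illustration.

```python
import math


def mutual_information(n, f_w1, f_w2, f_w1_w2):
    """Mutual information in bits: log2(N * f(w1, w2) / (f(w1) * f(w2))).

    n:        size of the corpus
    f_w1:     occurrences of w1
    f_w2:     occurrences of w2
    f_w1_w2:  sentence beads in which w1 and w2 occur together
    """
    if f_w1_w2 == 0:
        return float("-inf")  # the pair never co-occurs
    return math.log2(n * f_w1_w2 / (f_w1 * f_w2))


print(mutual_information(1000, 25, 20, 8))  # 4.0: co-occur 16x more than chance
```

Because f(w1) and f(w2) sit in the denominator, a pair of rare words that co-occur once can receive a very high score. This sensitivity to low counts is consistent with the snippet's report that mutual information did worse than the log-likelihood ratio or chi-square test.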
....is a translation of the Japanese word J, then one would expect that when the Japanese word J is present in a Japanese sentence, its English translation E would also appear in the paired English sentence. A number of statistical measures, such as mutual information and likelihood-ratio-test-based measures [9, 8], have been developed to compute the association significance between two events. We used the likelihood-ratio-test-based measure developed by Dunning [8] to compute the association strength between a pair of Japanese and English words. From the aligned sentences, we constructed a contingency table ....
W. A. Gale and K. W. Church. Identifying word correspondences in parallel texts. In Proceedings of the Fourth DARPA Speech and Natural Language Workshop, pages 152--157, Pacific Grove, CA, 1991.
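For a 2×2 contingency table built from aligned sentences, as described in the Japanese-English snippet above, Dunning's likelihood-ratio test reduces to the G² statistic: twice the sum of observed × ln(observed / expected), with expected counts taken under independence. This is a minimal sketch under that reading. The function name and the cell layout (a = both J and E, b = J without E, c = E without J, d = neither) are assumptions for illustration.

```python
import math


def log_likelihood_ratio(a, b, c, d):
    """G^2 log-likelihood ratio for a 2x2 table of word J vs. word E.

    a = pairs with both J and E, b = J without E,
    c = E without J,             d = neither.
    """
    n = a + b + c + d
    rows = (a + b, c + d)  # J present / J absent
    cols = (a + c, b + d)  # E present / E absent
    observed = ((a, b), (c, d))
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # the limit of o * ln(o / e) as o -> 0 is 0
                expected = rows[i] * cols[j] / n
                g2 += o * math.log(o / expected)
    return 2.0 * g2


print(log_likelihood_ratio(1, 9, 9, 81))   # 0.0: exactly the counts independence predicts
print(log_likelihood_ratio(10, 0, 0, 90))  # about 65.0: strong association
```

Unlike mutual information, G² weights each cell's surprise by its observed count. This is why it tends to behave more sensibly for rare words, the property the [Dun93] snippet earlier on this page also mentions.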
....We currently use two cost functions. Figure 5. Hierarchical alignment of "I want to make a collect call" with "quiero hacer una llamada de cobrar". The first, and primary, cost function is the φ correlation measure (cf. the use of φ² in Gale and Church, 1991), computed as follows: $\phi = \frac{bc - ad}{\sqrt{(a+b)(c+d)(a+c)(b+d)}}$, where $a = n_w - n_{w,v}$, $b = n_{w,v}$, $c = N - n_v - n_w + n_{w,v}$, and $d = n_v - n_{w,v}$. N is the total number of bitexts, $n_v$ the number of bitexts in which v appears in the target, $n_w$ the number of bitexts in which ....
Gale, W. and K. Church: 1991, `Identifying word correspondences in parallel texts'. In: Proceedings of the Fourth DARPA Speech and Natural Language Processing Workshop. Pacific Grove, California, pp. 152--157.
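Substituting the definitions of a, b, c and d from the cost-function snippet above into its φ formula shows what the measure computes in terms of raw bitext counts. This worked derivation assumes that the sign lost in the extracted definition of c is a plus, giving $c = N - n_v - n_w + n_{w,v}$. That is the only choice for which $a + b + c + d = N$.

```latex
\begin{aligned}
bc - ad &= n_{w,v}\,(N - n_v - n_w + n_{w,v}) - (n_w - n_{w,v})(n_v - n_{w,v}) \\
        &= N\,n_{w,v} - n_w\,n_v, \\
(a+b)(c+d)(a+c)(b+d) &= n_w\,(N - n_w)\,(N - n_v)\,n_v, \\
\phi &= \frac{N\,n_{w,v} - n_w\,n_v}{\sqrt{n_w\,(N - n_w)\,n_v\,(N - n_v)}}.
\end{aligned}
```

So φ is positive exactly when $N\,n_{w,v} > n_w\,n_v$, that is, when w and v share more bitexts than independence would predict. Its square is the φ² statistic of Gale and Church, which discards that sign.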
....statistical function needs to indicate the strength of co-occurrence correlation between source and target words, which we assume is indicative of carrying the same semantic content. Our preferred choice of statistical measure for assigning the costs is the so-called φ correlation measure (Gale and Church, 1991). We apply this statistic to co-occurrence of the source word with all its possible translations in the dataset examples. We have found that, at least for our data, this measure leads to better performance than the use of the log probabilities of target words given source words (cf. Brown et al. ....
Gale, W.A. and K.W. Church. 1991. Identifying word correspondences in parallel texts. In Proceedings of the Fourth DARPA Speech and Natural Language Processing Workshop, pages 152--157, Pacific Grove, California.
....be zero, one, or several target language words. The assignment of translation pairing costs (effectively a statistical bilingual dictionary) may be done using various statistical measures. Our preferred choice of statistical measure for assigning the costs is the so-called φ correlation measure ([6]). We apply this statistic to co-occurrence of the source word with all its possible translations in the dataset examples. We have found that, at least for our data, this measure leads to better performance than the use of the log probabilities of target subsequences given source words (cf. [4] ....
W.A. Gale and K.W. Church. Identifying word correspondences in parallel texts. In Proceedings of the Fourth DARPA Speech and Natural Language Processing Workshop, pages 152-157, Pacific Grove, California. 1991.
....we do not. Finally, his search for word meanings is most analogous to a version-space search, while ours is a greedy search. This work also has ties to the work on automatic construction of translation lexicons (Wu & Xia 1995; Melamed 1995; Kumano & Hirakawa 1994; Catizone, Russell & Warwick 1993; Gale & Church 1991). While most of these methods also compute association scores between pairs (in their case, word-word pairs) and use a greedy algorithm to choose the best translation(s) for each word, they do not take advantage of the constraints between pairs. One exception is Melamed (1996); however, his ....
Gale, W., and Church, K. 1991. Identifying word correspondences in parallel texts. In Proceedings of the Fourth DARPA Speech and Natural Language Workshop.
.... Assistance (UC DATA), University of California at Berkeley, CA 94720, gey@ucdata.berkeley.edu. Abstract: Recent years have seen active research in the statistical derivation of bilingual lexicons from bilingual corpora within the machine translation and computational linguistics communities [1, 4, 6, 7, 12, 13, 15, 17]. Bilingual lexicons have applications in machine translation, bilingual lexicography, and cross-language information retrieval. This paper describes the automatic construction of a Japanese-English lexicon from a Japanese-English collection of summaries of technical conference papers using ....
....of the Japanese word J, then one would expect that when the Japanese word J is present in a Japanese sentence, its English translation E would also appear in the paired English sentence. A number of statistical measures, such as mutual information, χ²-like measures, and likelihood-ratio-test-based measures [7, 5], have been developed to compute the association significance between two events. We used the likelihood-ratio-test-based measure developed by Dunning [5] to compute the association strength between a pair of Japanese and English words. From the aligned sentences, we constructed a contingency table for ....
William A. Gale and Kenneth W. Church. Identifying word correspondences in parallel texts. In Proceedings of the Fourth DARPA Speech and Natural Language Workshop, pages 152-157, Pacific Grove, CA, 1991.
....their corpus of sentence pairs (a portion of the Hansard data). They do this by means of a particular version of the EM algorithm (Dempster et al. [10]), which should allow them to obtain complete coverage. However, the authors do not discuss the level of precision of their results. Gale and Church [13] introduce a method for identifying some of the word correspondences in texts that have already been aligned at the sentence level. They first determine a set of word pairs that are strongly associated in the sentence pairs. This is done by applying a χ²-like statistic to two-by-two contingency ....
Gale W., Church K., Identifying Word Correspondences in Parallel Texts, Proceedings of DARPA SLS Workshop, 1991.
....measured the distance in words rather than in characters. General bitext mapping algorithms are a recent invention. So far, most researchers interested in co-occurrence of mutual translations have relied on bitexts where sentence boundaries (or other text unit boundaries) were easy to find (e.g., Gale & Church, 1991; Kumano & Hirakawa, 1994; Fung, 1995; Melamed, 1995). Aligned text segments suggest a boundary-based model of co-occurrence, illustrated in Figure 2. For bitexts involving languages with similar word order, a more accurate combined model of co-occurrence can be built using both segment boundary ....
W. Gale & K. W. Church. (1991) "Identifying Word Correspondences in Parallel Texts," Proceedings of the DARPA SNL Workshop. Asilomar, CA.
....specifying a priori probabilities or likelihood scores. Existing automatic methods for constructing N-best translation lexicons rely on the availability of large training corpora of parallel texts in the source and target languages. For some methods, the corpora must also be aligned by sentence [Bro93, Gal91a]. Unfortunately, such training corpora are available for only a handful of language pairs, and the cost to create enough training data manually for new language pairs is very high. This paper presents 1. a new automatic evaluation method for N-best translation lexicons, 2. a filter-based approach ....
....pairs involving S or T in the same sentence pair. An alignment filter is based on the relative positions of S and T in their respective texts [Dag93]. The decision procedure used to select lexicon entries from the multiset of candidate translation pairs is a variation of the method presented in [Gal91a]. [Dun93] found binomial log-likelihood ratios to be relatively accurate when dealing with rare tokens. This statistic was used to estimate dependencies between all co-occurring (source word, target word) pairs. For each source word S, target words were ranked by their dependence with S. The top N ....
W. Gale & K. W. Church, "Identifying Word Correspondences in Parallel Texts," Proceedings of the DARPA SNL Workshop, 1991.
No context found.
Gale, W. A., & Church, K. W. (1991a). "Identifying word correspondences in parallel texts". Fourth DARPA Workshop on Speech and Natural language, Asilomar, California.
No context found.
W. A. Gale and K. W. Church. Identifying word correspondences in parallel texts. In Proc. of the Speech and Natural Language Workshop, page 152, Pacific Grove, CA, 1991.
No context found.
W.A. Gale and K.W. Church. Identifying Word Correspondences in Parallel Texts. In Proceedings of the 4th Speech and Natural Language Workshop, pp. 152--157. DARPA, Morgan Kaufmann.
No context found.
W. Gale and K. Church. 1991. Identifying word correspondences in parallel texts. In Proceedings of the Fourth DARPA Speech and Natural Language Processing Workshop, pp. 152--157, Pacific Grove, CA.
No context found.
W. Gale and K. Church (1991). "Identifying word correspondences in parallel texts," In Fourth DARPA Workshop on Speech and Natural Language, Morgan Kaufmann Publishers, pp. 152--157.
No context found.
Gale, W. A. and K. W. Church, (1991b). Identifying Word Correspondences in Parallel Texts, In Proceedings of the Fourth DARPA Speech and Natural Language Workshop, 152-157, Pacific Grove, CA, USA.
No context found.
W. Gale & K. W. Church, "Identifying Word Correspondences in Parallel Texts," DARPA SNL Workshop, 1991.
No context found.
Gale, W. A. and Church, K. W. (1991) "Identifying word correspondences in parallel texts." Proc. of DARPA Speech and Natural Language Workshop, pp. 152-157.
No context found.
W. Gale & K. W. Church. (1991b) "Identifying Word Correspondences in Parallel Texts," Proceedings of the DARPA SNL Workshop. Asilomar, CA.
No context found.
W. Gale and K. Church (1991). "Identifying word correspondences in parallel texts," in Fourth DARPA Workshop on Speech and Natural Language, Morgan Kaufmann Publishers, pp. 152--157.
CiteSeer.IST - Copyright Penn State and NEC