23 citations found.
I.D. Melamed "Automatic Evaluation and Uniform Filter Cascades for Inducing N-best Translation Lexicons," Proceedings of the Third Workshop on Very Large Corpora, Boston, MA, 1995.

This paper is cited in the following contexts:
Detection of Translational Equivalence - Smith (2001)   (1 citation)

....et al. [SFI92] used cognates to improve sentence alignment in parallel corpora. Knight and Graehl [KG97] explored transliteration of English characters to Japanese katakana using a generative source-channel model; the performance their system attained was better than that of human judges. Melamed [Mel95] used a string similarity measure based on character identity and string length to identify cognates; this was extended by Tiedemann [Tie99]. Work by Smith and Jahr [WS99] showed that cognate classifiers like Tiedemann's could be learned from a bilingual dictionary, and that these classifiers ....

....My approach constructs such a function from a sentence-aligned parallel corpus, and it assumes very little linguistic similarity between the two languages [footnote 7]. 4.1.1 LCSR and HSCR. Tiedemann [Tie99] explored ways in which language independent versions of the Least Common Substring Ratio (LCSR, [Mel95]) could be derived from a set of known cognates in the language pair of interest. The LCSR is the ratio of the longest substring of the characters which are common to the two types in the pair (LCS; this subset need not be consecutive) to the length (in characters) of the longer word in the ....

Melamed, I. Dan (1995). Automatic Evaluation and Uniform Filter Cascades for Inducing N-best Translation Lexicons. Third Workshop on Very Large Corpora, Boston, Massachusetts.


Semantic Lexicon Acquisition for Learning Natural language.. - Thompson, Mooney (1998)   (6 citations)

....while ours is batch. While he argues for psychological plausibility, we do not. Finally, his search for word meanings is most analogous to a version space search, while ours is a greedy search. This work also has ties to the work on automatic construction of translation lexicons (Wu Xia 1995; Melamed 1995; Kumano Hirakawa 1994; Catizone, Russell, Warwick 1993; Gale Church 1991). While most of these methods also compute association scores between pairs (in their case, word-word pairs) and use a greedy algorithm to choose the best translation(s) for each word, they do not take advantage of the ....

Melamed, I. 1995. Automatic evaluation and uniform filter cascades for inducing n-best translation lexicons.


Pivot Alignment - Borin (1999)

....word alignment depends on the use of many sources of information in concert [footnote 1]. Distributional parallelism, co-occurrence, string similarity (both between and within languages) and part of speech are some such information sources used in previous research (see, e.g. Tiedemann 1998, 1999a, 1999b; Melamed 1995, 1998). In the ETAP project we have so far concentrated on linguistically rich information sources, such as word similarity. The research reported here was carried out within the ETAP project, supported by the Bank of Sweden Tercentenary Foundation as part of the research programme Translation and ....

Melamed, I. Dan 1995. Automatic evaluation and uniform filter cascades for inducing N-best translation lexicons. Proceedings of the Third Workshop on Very Large Corpora. Boston, Massachusetts.


Issues in Cross-Language Retrieval from Document Image.. - Douglas Oard

....languages share a common character set, one simple technique is to retain unrecognized terms in the hope that they might be names or some other strings that would have the same representation in the source and target languages. More sophisticated cognate matching techniques can be applied (cf. [7]) and techniques which account for character recognition errors and character set differences are also available (cf. [5]). Corpora (collections of documents that use terms in representative ways) provide another source of translation knowledge that can be used alone or in conjunction ....

I. Dan Melamed. Automatic evaluation and uniform filter cascades for inducing n-best translation lexicons. In Third Workshop on Very Large Corpora, 1995. http://www.cis.upenn.edu/~melamed/.


Semantic Lexicon Acquisition for Learning Natural Language.. - Thompson (1998)   (6 citations)

....limited to a single sentence of context, but an entire story. 7.2 Other Related Work. 7.2.1 Translation Lexicons. This work also has ties to the work on automatic construction of translation lexicons (Gale Church, 1991; Catizone, Russell, Warwick, 1993; Kumano Hirakawa, 1994; Wu Xia, 1995; Melamed, 1995). These systems use input in the form of aligned pairs of sentences in two different natural languages. While most of these methods also compute association scores between pairs (in their case, word-word pairs) and use a greedy algorithm to choose the best translation(s) for each word, they do not ....

Melamed, I. (1995). Automatic evaluation and uniform filter cascades for inducing n-best translation lexicons. In Proceedings of the Third Workshop on Very Large Corpora.


Automatic Construction of Semantic Lexicons for Learning.. - Thompson, Mooney (1999)   (2 citations)

....is part of the meaning of capital, then in the second stage learn that capital can have either one or two arguments. By using common substructures, we can combine these two stages in Wolfie. This work also has ties to the work on automatic construction of translation lexicons (Wu Xia 1995; Melamed 1995; Kumano Hirakawa 1994; Catizone, Russell, Warwick 1993; Gale Church 1991). While most of these methods also compute association scores between pairs (in their case, word-word pairs) and use a greedy algorithm to choose the best translation(s) for each word, they do not take advantage of the ....

Melamed, I. 1995. Automatic evaluation and uniform filter cascades for inducing n-best translation lexicons.


Experiments in Multilingual Sentence Boundary Recognition - Palmer

....sentence boundaries in German and French corpora. Keywords: sentence boundary, neural network, multilingual, corpus. 1 Introduction. An important step in many multilingual text processing tasks, including sentence alignment (Kay Roscheisen 93; Gale Church 93), automatic lexicon construction (Melamed 95), and knowledge base acquisition for machine translation (Mitamura et al. 93), is the segmentation of texts into individual sentences. The sentence boundary recognition system SATZ (Palmer Hearst 94; Palmer 94) provides an efficient, trainable algorithm for this task. The approach in the ....

I. Dan Melamed. Automatic evaluation and uniform filter cascades for inducing n-best translation lexicons. Proceedings of the Workshop on Very Large Corpora 1995, 1995.


A Statistical Translation Tool With Aligned Texts - Mattsson (1996)

....number of rejected sentences varies with the weather; when unusual or unexpected phenomena occur, the number of rejected sentences drastically increases. 1.2 Statistical systems. A more recent experiment, concerning translation with statistical methods similar to Dilemma's, is reported by Melamed [18]. The system consists of a comparison machinery connected to a variable set of cascaded filters. Each filter treats pairs of words uniformly, but from different perspectives and with different knowledge sources. The capacity approaches human performance and dramatically improves the quality, even ....

....limited: An increased number of words induce more equivalence classes, which in turn lead to more alternatives in the choice of words. The result with a very large corpus might be increased confusion and discriminating of potentially correct translations. Another example comes from Melamed's system [18] with cascaded uniform filters (see section 1.2). It appeared that the filter with the best translating performance actually got worse performance when the size of the database increased. When it comes to the content, the corpus used is definitely representative for the type of task that for which ....

[Article contains additional citation context not shown here]

I. Dan Melamed (1995) "Automatic evaluation and uniform filter cascades for inducing N-best translation lexicons". Proceedings of the Third Workshop on Very Large Corpora


Extraction of Translation Equivalents from Parallel Corpora - Tiedemann (1998)   (4 citations)

.... algorithm based on character comparison is the longest common subsequence ratio (LCSR) which is defined as follows: The LCSR score is calculated by the length of the longest common, not necessarily contiguous, subsequence of characters divided by the character length of the longer string ([Mel95]). Mostly, even more simple algorithms represent sufficient measures for string similarity. We used algorithms to search initial and final character sequences, and algorithms to compare fixed character sequences. More complex algorithms may raise the recall but not the precision of results when ....

I. Dan Melamed. Automatic Evaluation and Uniform Filter Cascades for Inducing N-best Translation Lexicons. In Proceedings of the 3rd Workshop on Very Large Corpora, Boston/Massachusetts, 1995.


Automatic Processing of Parallel Corpora: A Swedish.. - Ahrenberg, Merkel..

....the user with a concordance tool for manual filtering of the data. The filtering approach for rating candidate word pairs: An approach developed by I. Dan Melamed at the University of Pennsylvania in Philadelphia is directed towards the automatic evaluation of lexicon by applying several filters [Mel95]. The filters use external knowledge sources and heuristics. First, all the source language words and all the target language words of a sentence alignment pair are combined into word pairs. Then filters are applied in cascades to find the N best (e.g. 7 best) translations among the translation ....

.... bilingual dictionaries (MRBD). If a translation candidate appears in the MRBD, all pairs with the same source language word and a different target language word, and all pairs with the same target language word and a different source language word occurring in the same sentence pair are removed [Mel95]. In other words, the translation of the source language word from the MRBD is assumed to be correct and all the other target language words from the same sentence alignment pair are disregarded as translation equivalents. Cognate filters are based on the assumption that there are similarities ....

[Article contains additional citation context not shown here]

I. Dan Melamed. Automatic Evaluation and Uniform Filter Cascades for Inducing N-best Translation Lexicons. In Proceedings of the 3rd Workshop on Very Large Corpora, Boston/Massachusetts, 1995.


A Geometric Approach to Mapping Bitext Correspondence - Melamed (1996)   (11 citations)  Self-citation (Melamed)

....is a translation lexicon. Translation lexicons can be extracted from machine readable bilingual dictionaries (MRBDs) in the rare cases where MRBDs are available. In other cases, they can be induced automatically using any of several existing methods (Dagan et al. 1993, Fung Church 1991, Melamed 1995). Since the matching predicate does not require perfect accuracy, the induced lexicons need not be perfect. When a large translation lexicon is not available, a small hand constructed translation lexicon for the key terms in a given bitext may suffice to produce a rough map for that bitext. If ....

....orthography and/or pronunciation. Languages that are closely related will often share a large number of cognates. For example, in the non technical Canadian Hansards (parliamentary debate transcripts available in English and French) cognates can be found for roughly one quarter of all text tokens (Melamed 1995). A cognate based matching predicate will generate more points for more similar language pairs, and for text genres where more word borrowing occurs, such as technical texts. For English and French, such a matching predicate can generate enough points in the bitext space to obviate the need for a ....

[Article contains additional citation context not shown here]

I.D. Melamed "Automatic Evaluation and Uniform Filter Cascades for Inducing N-best Translation Lexicons," Proceedings of the Third Workshop on Very Large Corpora, Boston, MA, 1995.


Measuring Semantic Entropy - Melamed (1997)   (2 citations)  Self-citation (Melamed)

No context found.

I.D. Melamed "Automatic Evaluation and Uniform Filter Cascades for Inducing N-best Translation Lexicons," Third Workshop on Very Large Corpora, Boston, MA, 1995.


Statistical Machine Translation - Al-Onaizan, Curin, Jahr, Knight.. (1999)   Self-citation (Melamed)

....8.2 Finding Cognates. Tiedemann describes three methods of extracting a string similarity measure that is suitable for the recognition of cognates in two specific languages [Tiedemann, 1999]. These methods are based on the longest common subsequence ratio (LCSR) measure of string similarity [Melamed, 1995]. They consist of a set of weights for each mapping of a character in the first language to a character in the second language. The weights are learned from a set of known cognate pairs. We used the simplest of his methods, which maps single characters to single characters only, and within certain ....

....In Candide (GIZA) Model 1 training, the t function is based on co-occurrence data. A pair of tokens is said to co-occur once for every time they appear in parallel sentences. Melamed describes an oracle filter which uses a bilingual lexicon to limit events which can be considered co-occurrences [Melamed, 1995]. Essentially, co-occurrence is redefined as follows: Given a lexicon and parallel sentences $E = \{e_1, e_2, \ldots, e_n\}$ and $C = \{c_1, c_2, \ldots, c_m\}$, $e_i$ and $c_j$ co-occur if either $(e_i, c_j)$ is in the lexicon or $(e_i, c_k)$ is not in the lexicon $\forall k$ and $(e_p, c_j)$ is not in the lexicon ....
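
The redefined co-occurrence in the context above can be sketched in a few lines of Python. This is only an illustration, not code from the cited work: the function name `oracle_cooccurrences` and the sample tokens are hypothetical, and because the quoted definition is cut off, the sketch assumes the final condition ranges over every source token e_p in the sentence.

```python
from typing import Iterable, Set, Tuple

Pair = Tuple[str, str]


def oracle_cooccurrences(
    source: Iterable[str], target: Iterable[str], lexicon: Set[Pair]
) -> Set[Pair]:
    """Return the token pairs that count as co-occurring under a lexicon filter.

    A pair (e, c) is kept if it is in the lexicon, or if neither e nor c has
    any lexicon partner in this sentence pair. The "for every source token"
    reading of the second condition is an assumption (the quote is truncated).
    """
    src = list(source)
    tgt = list(target)

    # Tokens that already have a lexicon translation inside this sentence pair.
    src_matched = {e for e in src if any((e, c) in lexicon for c in tgt)}
    tgt_matched = {c for c in tgt if any((e, c) in lexicon for e in src)}

    kept: Set[Pair] = set()
    for e in src:
        for c in tgt:
            if (e, c) in lexicon:
                kept.add((e, c))
            elif e not in src_matched and c not in tgt_matched:
                kept.add((e, c))
    return kept


# Hypothetical sentence pair and one-entry lexicon, for illustration only.
pairs = oracle_cooccurrences(
    ["the", "house"], ["la", "maison"], {("house", "maison")}
)
```

With this input, the lexicon entry blocks "house" and "maison" from pairing with anything else, so only the lexicon pair and the pair of unmatched tokens remain.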

Melamed, I. Dan. 1995. Automatic evaluation and uniform filter cascades for inducing n-best translation lexicons. In Proceedings of the Third Workshop on Very Large Corpora.


A Geometric Approach to Mapping Bitext Correspondence - Melamed (1996)   (11 citations)  Self-citation (Melamed)

....orthography and/or pronunciation. Languages that are closely related will often share a large number of cognates. For example, in the non technical Canadian Hansards (parliamentary debate transcripts available in English and French) cognates can be found for roughly one quarter of all text tokens [Mel95]. A cognate based matching predicate will generate more points for more similar language pairs, and for text genres where more word borrowing occurs, such as technical texts. For English and French, such a matching predicate can generate enough points in the bitext space to obviate the need for a ....

....and a stop list of closed class words for both languages. SIMR judges the cognateness of each token pair by their Longest Common Subsequence Ratio (LCSR). The LCSR of a token pair is the number of characters that appear in the same order in both tokens divided by the length of the longer token [Mel95]. The common characters need not be contiguous. The matching predicate considers a token pair cognates if their LCSR exceeds a certain threshold. The LCSR threshold was optimized together with SIMR's other parameters, as described in Section 3.7. The stop list of closed class words made the ....
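
As a rough illustration of the LCSR definition quoted above, the following Python sketch computes the longest common subsequence with standard dynamic programming and divides by the length of the longer token. The function names and the 0.6 threshold are assumptions made for this example; the quoted text says only that the threshold was optimized and gives no value.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common (not necessarily contiguous) subsequence."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcsr(a: str, b: str) -> float:
    """Longest Common Subsequence Ratio: LCS length over the longer token's length."""
    longer = max(len(a), len(b))
    if longer == 0:
        return 0.0  # convention chosen for this sketch; not stated in the source
    return lcs_length(a, b) / longer


def is_cognate(a: str, b: str, threshold: float = 0.6) -> bool:
    """Matching predicate: the 0.6 default is a placeholder, not a published value."""
    return lcsr(a, b) > threshold
```

For the pair "government" and "gouvernement", all ten letters of the shorter token appear in order in the longer one, so the ratio is 10/12.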

[Article contains additional citation context not shown here]

I. D. Melamed "Automatic Evaluation and Uniform Filter Cascades for Inducing N-best Translation Lexicons," Proceedings of the Third Workshop on Very Large Corpora, Boston, MA, 1995.


A Portable Algorithm for Mapping Bitext Correspondence - Melamed (1997)   (4 citations)  Self-citation (Melamed)

....cognates and translation lexicons. Two tokens in a bitext are cognates if they have the same meaning and similar spellings. In the nontechnical Canadian Hansards (parliamentary debate transcripts available in English and in French) cognates can be found for roughly one quarter of all text tokens (Melamed, 1995). Even distantly related languages like English and Czech will share a large number of cognates in the form of proper nouns. Cognates are more common in bitexts from more similar language pairs, and from text genres where more word borrowing occurs, such as technical texts. When dealing with ....

....heuristics of sentence alignment algorithms can be exploited equally well at the word level. The cognate heuristic of the character based algorithms works better at the word level, because cognateness can be defined more precisely in terms of words, e.g. using the Longest Common Subsequence Ratio (Melamed, 1995). Several other matching heuristics can only be applied at the word level, including the localized noise filter in Section 3.3, lists of stop words and lists of faux amis (Macklovitch, 1995). Most importantly, translation lexicons can only be used at the word level. SIMR can employ a small ....

I. D. Melamed "Automatic Evaluation and Uniform Filter Cascades for Inducing N-best Translation Lexicons," Proceedings of the Third Workshop on Very Large Corpora, Boston, MA, 1995.


A Scalable Architecture for Bilingual Lexicography - Melamed (1997)   Self-citation (Melamed)

....though users can easily reconfigure the system to take advantage of such resources as language specific stemmers, part of speech taggers, and stop lists when they are available. A data flow diagram for SABLE is on the next page. The following is a brief description of SABLE's main components. See [Mel95, Mel96a, Mel97b, Mel97c] for more details. 2 Tokenization and Stemming. A tokenizer's job is to identify the smallest content bearing units in text. A stemmer's job is to replace all morphological variants of one lemma with a unique symbol, without assigning that symbol to other lemmas. Not all stemmers are lemmatizers, ....

....implementation includes a good tokenizer and lemmatizer for English, and fair tokenizers and stemmers for French, Spanish and Korean. [Figure: SABLE data flow diagram for languages L1 and L2. Recoverable labels: input L1 and L2 texts; L1 and L2 tokenizers; optional L1 and L2 stemmers; bitext maps, SIMR [Mel96a]; co-occurrence counter; translation lexicon extraction [Mel95, Mel96b]; thresholding; unstemming; output translation lexicon.] 3 Mapping Bitext Correspondence. After both halves of the input bitext(s) have been tokenized, SABLE invokes the Smooth Injective Map ....

[Article contains additional citation context not shown here]

I. D. Melamed "Automatic Evaluation and Uniform Filter Cascades for Inducing N-best Translation Lexicons," Proceedings of the Third Workshop on Very Large Corpora, Boston, MA, 1995.


Automatic Construction Of Clean Broad-Coverage Translation Lexicons - Melamed   (15 citations)  Self-citation (Melamed)

....the requisite stop lists, I followed the advice of Fung [Fun95] and deleted all function words from the corpus. Though I have yet to confirm the effect, I suspect that I traded a tiny loss in recall for a huge gain in precision. An initial translation lexicon was constructed using the method in [Mel95] with no linguistic filters. The algorithm at the end of Section 3 was run until the model converged. Six iterations were required to reach this point. Table 1 shows some interesting changes at the end of each iteration. First, as expected, right increases while wrong decreases. Second, the first ....

....out to be more difficult than one might think. The results presented here suggest several directions for future work. Previously, I have shown that the statistical construction of translation lexicons from parallel corpora can be gainfully assisted by incorporating various kinds of prior knowledge [Mel95]. I am optimistic that similar assistance can push the precision recall envelope of the present method even higher. I am also eager to investigate methods for relaxing the one to one assumption, so that compound words can receive proper treatment. Most of all, I hope that the new availability of ....

I. D. Melamed "Automatic Evaluation and Uniform Filter Cascades for Inducing N-best Translation Lexicons," Proceedings of the Third Workshop on Very Large Corpora, Boston, MA, 1995.


Porting SIMR to New Language Pairs - Melamed (1996)   Self-citation (Melamed)

....document explains each step in detail. It assumes that you have read and understood the paper [1]. It also assumes that you have SIMR and the porting tools properly installed. 1 Matching Predicate. SIMR's matching predicates can be based on any combination of predicate filters and oracle filters [3]. Your choice of matching predicate will depend on the languages that you are dealing with and the linguistic resources that you have at your disposal. If you are dealing with languages that share many cognates, or even phonetic cognates, then your matching predicate should test for cognateness. ....

....that you are dealing with and the linguistic resources that you have at your disposal. If you are dealing with languages that share many cognates, or even phonetic cognates, then your matching predicate should test for cognateness. Cognate predicates seem to work best with the LCSR metric [3]. If you have access to part of speech (POS) taggers for both languages, then your predicate should test for matching POS. If you have access to a pre existing translation lexicon between your two languages, even a small and noisy one, then by all means use it. The nlplib.pl library that comes ....

I. D. Melamed, "Automatic Evaluation and Uniform Filter Cascades for Inducing N-best Translation Lexicons," Proceedings of the Third Workshop on Very Large Corpora, Boston, MA, 1995.


Semi-Automatic Acquisition of Domain-Specific Translation.. - Resnik, Melamed (1997)   (5 citations)  Self-citation (Melamed)

....Acquisition of Domain Specific Translation Lexicons. Philip Resnik, Dept. of Linguistics and UMIACS, University of Maryland, College Park, MD 20742 USA, resnik@umiacs.umd.edu. I. Dan Melamed, Dept. of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104 USA, melamed@unagi.cis.upenn.edu. Abstract: We investigate the utility of an algorithm for translation lexicon acquisition (SABLE) used previously on a very large corpus to acquire general translation ....

I. Dan Melamed, 1995. Automatic evaluation and uniform filter cascades for inducing n-best translation lexicons. In Proceedings of the Third Workshop on Very Large Corpora, Cambridge, Massachusetts.


Measuring Semantic Entropy - Melamed   (2 citations)  Self-citation (Melamed)

No context found.

I. D. Melamed "Automatic Evaluation and Uniform Filter Cascades for Inducing N-best Translation Lexicons," Third Workshop on Very Large Corpora, Boston, MA, 1995.


A Word-to-Word Model of Translational Equivalence - Melamed (1997)   (27 citations)  Self-citation (Melamed)

....the bitext to improve the model's accuracy. 2 Co-occurrence. With the exception of (Fung, 1995b), previous methods for automatically constructing statistical translation models begin by looking at word co-occurrence frequencies in bitexts (Gale Church, 1991; Kumano Hirakawa, 1994; Fung, 1995a; Melamed, 1995). A bitext comprises a pair of texts in two languages, where each text is a translation of the other. Word co-occurrence can be defined in various ways. The most common way is to divide each half of the bitext into an equal number of segments and to align the segments so that each pair of segments ....

.... co-occurrence relation can also be based on distance in a bitext space, which is a more general representation of bitext correspondence (Dagan et al. 1993; Resnik Melamed, 1997), or it can be restricted to word pairs that satisfy some matching predicate, which can be extrinsic to the model (Melamed, 1995; Melamed, 1997). 3 The Basic Word-to-Word Model. Our translation model consists of the hidden parameters and Gamma , and likelihood ratios L(u, v). The two hidden parameters are the probabilities of the model generating true and false positives in the data. L(u, v) represents the ....

I. D. Melamed "Automatic Evaluation and Uniform Filter Cascades for Inducing N-best Translation Lexicons," Proceedings of the Third Workshop on Very Large Corpora, Boston, MA, 1995.


Automatic Construction of Weighted String Similarity Measures - Tiedemann (1999)   (6 citations)

No context found.

I. Dan Melamed. 1995. Automatic Evaluation and Uniform Filter Cascades for Inducing N-best Translation Lexicons. In Proceedings of the 3rd Workshop on Very Large Corpora, Boston/Massachusetts.


Finding Terminology Translations From Non-Parallel Corpora - Fung, McKeown (1997)   (3 citations)

No context found.

I. Dan Melamed. 1995. Automatic evaluation and uniform filter cascades for inducing N-best translation lexicons. In Proceedings of the 3rd Annual Workshop on Very Large Corpora, Boston, Massachusetts.

CiteSeer.IST - Copyright Penn State and NEC