20 citations found.
J.-Y. Nie, M. Simard, P. Isabelle, and R. Durand. Cross-language information retrieval based on parallel texts and automatic mining of parallel texts from the Web. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 74--81, 1999.

This paper is cited in the following contexts:
Evaluating Multi-lingual Information Retrieval and Clustering .. - Atsushi Fujii (2001)

....and documents need to be standardized into a common representation, so that monolingual retrieval techniques can be applied. From this point of view, we classify existing CLIR methods into the following three fundamental categories. The first method translates queries into the document language [1, 7, 14]. On the other hand, the second method translates documents into the query language [13, 15]. The third method projects both queries and documents into a language-independent space by way of thesaurus classes [9, 18] and latent semantic indexing [2, 11]. We shall call those methods, query ....

J.-Y. Nie, M. Simard, P. Isabelle, and R. Durand. Cross-language information retrieval based on parallel texts and automatic mining of parallel texts from the Web. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 74--81, 1999.


Applying Machine Translation to Two-Stage Cross-Language Information.. (2000)

....as can be predicted, CLIR needs to standardize queries and documents into a common representation, so that monolingual IR techniques can be applied. From this point of view, existing CLIR can be classified into three approaches. The first approach translates queries into the document language [2, 4, 5, 16], while the second approach translates documents into the query language [13, 17]. The third approach projects both queries and documents into a language-independent representation by way of thesaurus classes [6, 18] and latent semantic indexing [3, 11]. Although extensive comparative experiments ....

Jian-Yun Nie, Michel Simard, Pierre Isabelle, and Richard Durand. Cross-language information retrieval based on parallel texts and automatic mining of parallel texts from the Web. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 74--81, 1999.


Harvesting Translingual Vocabulary Mappings for Multilingual .. - Larson, Gey, Chen (2002)

....translated from the original language page. Algorithms and software must be developed to mine these web pages and to extract parallel text fragments (sentences, paragraphs, documents) which can serve the same purposes as the parallel reservoirs of literature published by socio-political entities [15, 12]. Web resources of this type are less likely to be developed with the same degree of attention to detail as translations done by professional translators for official government purposes. That is to say, from a statistical point of view, they are noisy channels. In addition, if the desire is to go ....

J.-Y. Nie, M. Simard, P. Isabelle, and R. Durand. Cross-language information retrieval based on parallel texts and automatic mining of parallel texts from the web. In SIGIR '99: Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, August 15-19, 1999.


NTCIR-3 Patent Retrieval Experiments at ULIS - Fujii, Ishikawa

....documents need to be standardized into a common representation, so that monolingual retrieval techniques can be applied. From this point of view, existing CLIR methods are classified into the following three fundamental categories. The first method translates queries into the document language [1, 8, 17], and the second method translates documents into the query language [16, 18]. The third method projects both queries and documents into a language-independent space by way of thesaurus classes [10, 21] and latent semantic indexing [3, 14]. Among those above methods, the first method (i.e. query ....

J.-Y. Nie, M. Simard, P. Isabelle, and R. Durand. Cross-language information retrieval based on parallel texts and automatic mining of parallel texts from the Web. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 74--81, 1999.


Translation Knowledge Acquisition from Cross-Lingually Relevant.. - Utsuro

.... against bilingual newspaper articles available on WWW news sites [Xu99, Hasan01]. Another type of previous work on finding parallel document pairs is an approach of collecting parallel document URLs from the WWW by examining clues in the URLs and the structures of the HTML source texts (e.g. [Resnik99, Nie99]). Previously studied techniques of estimating bilingual term correspondences from non-parallel corpora do not rely on the process of finding parallel document pairs. They are mostly based on the idea that semantically similar words appear in similar contexts [Kaji96, Tanaka96, Fung98, Rapp99, ....

....that the CLIR process benefits from the results of translation knowledge acquisition. Compared with our approach of employing bilingual news articles on WWW news sites as a source for translation knowledge acquisition, the techniques for finding parallel document pairs from general WWW texts [Resnik99, Nie99] and those for acquiring translation knowledge from partially bilingual texts on the WWW [Nagata01] or based on collocations in the monolingual texts on the WWW [Cao02] have one advantage, i.e. that they are applicable to various domains that infrequently become topics of news articles, although ....

Nie, J.-Y., et al.: Cross-Language Information Retrieval based on Parallel Texts and Automatic Mining of Parallel Texts from the Web, Proc. 22nd SIGIR, pp. 74-81 (1999).


Query Term Disambiguation for Web Cross-Language.. - Maeda, Sadat.. (2000)

....bigram statistics. For the disambiguation method, various approaches have been proposed [5, 6], such as using the first term listed in the dictionary, using relevance feedback, and using a parallel or a comparable corpus. However, such bilingual corpora are usually not readily available. Nie et al. [7] propose a method to automatically gather parallel texts from the Web and use them for query term selection. However, for language pairs other than English-French in their case, the amount of parallel documents on the Web might not always be enough. Therefore, query term disambiguation method ....

Nie, J., Simard, M., Isabelle, P. and Durand, R. Cross-language information retrieval based on parallel texts and automatic mining of parallel texts from the Web. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'99), 1999, pp. 74-81.


Internet Evolution And Progress In Full Automatic French.. - Vaufreydaz, Gery (2001)   (2 citations)

....need to correct errors, we can obtain a more reliable corpus than before. We are now investigating other properties of the Internet data in language modelling. For example, we are working on topic detection using such data in order to increase the performance of very large vocabulary systems. Besides, in [8], experiments on automatically aligned multilingual texts have already been done in information retrieval research. So, we are also working on multilingual language modelling for an international speech recognition engine and for a speech-to-speech translation system. 7. ....

Nie J.Y., Simard M., Isabelle P., Durand R., Cross-Language Information Retrieval Based on Parallel Texts and Automatic Mining of Parallel Texts from the Web, 22nd Annual International ACM SIGIR, pp. 74-81, Berkeley, CA, USA, August 1999.


TNO at CLEF-2001: Comparing Translation Resources - Kraaij (2001)   (1 citation)

....successful use of the Babelfish translation service based on Systran, the Powertranslator system from L&H and several other systems [10][1][3]. Some groups exploited parallel or comparable corpora for CLIR. Parallel corpora can be used to train probabilistic bilingual translation dictionaries [12]. Comparable corpora can be used to generate similarity thesauri [1]. In this paper we will compare these three types of resources in a quantitative and qualitative way. For the machine-readable dictionaries we used the VLIS lexical database of lexicon publisher Van Dale. For a description of the ....

....on the Babelfish MT service and word-by-word translation based on dictionaries derived from a collection of parallel web documents. For CLEF 2000 we had already developed three parallel corpora based on web pages in close cooperation with RALI, Université de Montréal. We used the PTMiner tool [12] to locate web pages which have a high probability of being translations of each other. The mining process consisted of the following steps: 1. Query a web search engine for web pages with a hyperlink anchor text "English version" and respective variants. 2. (For each web site) Query a web search ....
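Step 1 of the mining process above hinges on recognising links whose anchor text reads "English version" or a variant. A minimal sketch of that anchor filter, assuming the page HTML has already been fetched (the search-engine query itself is not shown) and using a hypothetical variant list, since the excerpt does not give PTMiner's actual list:

```python
from html.parser import HTMLParser

# Hypothetical anchor-text variants; PTMiner's real list is not given here.
ANCHOR_VARIANTS = {"english version", "english", "in english", "version anglaise"}


class AnchorCollector(HTMLParser):
    """Collects (href, anchor_text) pairs for every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Normalise internal whitespace in the anchor text.
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def language_switch_links(html):
    """Return hrefs whose anchor text matches a known 'English version' variant."""
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    return [href for href, text in parser.links if text.lower() in ANCHOR_VARIANTS]
```

For example, a French page containing `<a href="/en/index.html">English version</a>` yields `["/en/index.html"]`, which would then be a candidate for the English counterpart of that page.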

J.Y. Nie, M. Simard, P. Isabelle, and R. Durand. Cross-language information retrieval based on parallel texts and automatic mining of parallel texts in the web. In Proceedings of the 22nd ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR99), pages 74--81, 1999.


Translation Resources, Merging Strategies and.. - Hiemstra, Kraaij, ..

....web pages in close cooperation with RALI, Université de Montréal. RALI already had developed an English-French parallel corpus of web pages, so it seemed interesting to investigate the feasibility of a full multilingual system based on web-derived lexical resources only. We used the PTMiner tool [8] to find web pages which have a high probability of being translations of each other. The mining process consists of the following steps: 1. Query a web search engine for web pages with a hyperlink anchor text "English version" and respective variants. 2. (For each web site) Query a web search ....

....are discarded by the language identification step. These parallel corpora have been used in different ways: i) to refine the estimates of translation probabilities of a dictionary-based translation system (corpus-based probability estimation); ii) to construct simple statistical translation models [8]. The former application will be described in more detail in Section 5.2, the latter in Section 5.3. The translation models for English-Italian and English-German, complemented with an already existing model for English-French, formed also the basis for a full corpus-based translation multilingual ....

J.Y. Nie, M. Simard, P. Isabelle, and R. Durand. Cross-language information retrieval based on parallel texts and automatic mining of parallel texts in the web. In ACM-SIGIR'99, pages 74--81, 1999.


NTCIR-2 Experiments at Matsushita: Monolingual and.. - Sato Mitsuhiro

....Constructing a larger bilingual dictionary may be costly. Constructing a larger parallel corpus may be another way of improvement. An advantage of our corpus-based approach is that the parallel corpus utilized in our method needs only document-level alignments. Using a strategy similar to that reported in [8], we may construct a large parallel corpus from Web resources. We also submitted two formal runs for the J-JE task. We purely combined the results of our J-J run and J-E run, according to the proportion of Japanese and English documents in the target document set (about 2:1 in NTCIR-2). D J JE : Combined D CLS ....

Nie, J-Y. Cross-Language Information Retrieval based on Parallel Texts and Automatic Mining of Parallel Texts from the Web, Proc. of SIGIR99 (1999), 74-81.


Twenty-One at CLEF-2000: Translation resources.. - Hiemstra, Kraaij, ..

....web pages in close cooperation with RALI, Université de Montréal. RALI already had developed an English-French parallel corpus of web pages, so it seemed interesting to investigate the feasibility of a full multilingual system based on web-derived lexical resources only. We used the PTMiner tool [7] to find web pages which have a high probability of being translations of each other. The mining process consists of the following steps: 1. Query a web search engine for web pages with a hyperlink anchor text "English version" and respective variants. 2. (For each web site) Query a web search ....

.... sizes during corpus construction. These parallel corpora have been used in different ways: i) to refine the estimates of translation probabilities of a dictionary-based translation system (corpus-based probability estimation); ii) to construct simple statistical translation models (IBM model 1) [7]. The former application will be described in more detail in Section 5.2, the latter in Section 5.3. The translation models for English-Italian and English-German, complemented with an already existing model for English-French, formed also the basis for a full corpus-based translation multilingual ....
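IBM Model 1, named above as the simple statistical translation model, can be trained by expectation maximisation over sentence-aligned text. The sketch below is the textbook EM procedure on a toy corpus, not the code used by the authors; the NULL source token and the iteration count are standard but assumed choices:

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=10):
    """EM training of IBM Model 1 lexical probabilities t(f | e).

    pairs: list of (source_tokens, target_tokens). A NULL token is
    prepended to every source sentence, as in the usual formulation.
    Returns a dict-like table keyed by (f, e).
    """
    f_vocab = {f for _, fs in pairs for f in fs}
    uniform = 1.0 / len(f_vocab)
    t = defaultdict(lambda: uniform)  # uniform initialisation of t(f | e)

    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        for es, fs in pairs:
            es = ["NULL"] + es
            for f in fs:
                # E-step: share one unit of count for f among all candidate e.
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    delta = t[(f, e)] / norm
                    count[(f, e)] += delta
                    total[e] += delta
        # M-step: renormalise expected counts into conditional probabilities.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t
```

On a toy corpus such as ("the house", "das Haus"), ("the book", "das Buch"), ("a book", "ein Buch"), the table quickly prefers t(Haus | house) over t(das | house), because "das" is better explained by "the", which it co-occurs with in two sentences.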

J.Y. Nie, M. Simard, P. Isabelle, and R. Durand. Cross-language information retrieval based on parallel texts and automatic mining of parallel texts in the web. In ACM-SIGIR'99, pages 74--81, 1999.


Detection of Translational Equivalence - Smith (2001)   (1 citation)

....artificial meaning representations in favor of the most natural ones: the strings themselves. 1.1 Potential Applications. I suggest four practical applications of such a scoring function. The first is in parallel corpus construction using systems like STRAND [Res99] and Nie et al.'s system [NSID99]. STRAND is a tool which automatically discovers World Wide Web pages which may be mutual translations (in a given language pair), then filters the candidates based on structural similarity evidenced by the language-independent markup tags present in the documents. While the precision of STRAND ....
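The structural filter described above compares candidate pages by their language-independent markup. A simplified stand-in, assuming a plain sequence-similarity ratio over the linearised tag sequence is an acceptable proxy (STRAND's own alignment-based scoring is not reproduced here) and using an illustrative threshold:

```python
import re
from difflib import SequenceMatcher

# Matches the name of a start tag ("<p") or end tag ("</p"); text is ignored.
TAG_RE = re.compile(r"<\s*(/?)\s*([a-zA-Z][a-zA-Z0-9]*)")


def tag_sequence(html):
    """Linearise a page into its sequence of start/end tag names."""
    return [slash + name.lower() for slash, name in TAG_RE.findall(html)]


def markup_similarity(html_a, html_b):
    """Similarity ratio in [0, 1] between the two pages' tag sequences."""
    return SequenceMatcher(None, tag_sequence(html_a), tag_sequence(html_b)).ratio()


def is_structural_match(html_a, html_b, threshold=0.9):
    # The threshold is an illustrative value, not one taken from STRAND.
    return markup_similarity(html_a, html_b) >= threshold
```

Because only tags are compared, an English page and its French translation with the same layout score 1.0 even though none of their text matches.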

....from two lists of candidate pages. Given two sets of documents, STRAND attempts to generate an assignment between them, but because of the high computational cost of comparing the contents of each possible document pair, STRAND produces candidates based solely on the URL strings. Nie et al. [NSID99] and Chen and Nie [CN00] used a similar approach. The markup filter is then applied, but if the candidates are wrongly paired, translation pairs may be lost simply because they weren't paired based on URL string similarity. A second application considers text strings of shorter length; computing ....
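URL-based candidate generation can be sketched as swapping a language marker in a URL and checking whether the swapped URL exists on the same site. The marker pairs below are hypothetical English-French examples, not the lists used by STRAND, PTMiner or Chen and Nie:

```python
# Hypothetical English/French URL markers; real systems use larger lists.
MARKER_PAIRS = [("/en/", "/fr/"), ("_e.", "_f."), ("-en.", "-fr."), ("lang=en", "lang=fr")]


def candidate_pairs(urls):
    """Pair URLs that become identical after swapping an English marker for a French one."""
    url_set = set(urls)
    pairs = []
    for url in sorted(url_set):
        for en, fr in MARKER_PAIRS:
            if en in url:
                partner = url.replace(en, fr)
                if partner in url_set:
                    pairs.append((url, partner))
    return pairs
```

A page whose URL follows no such convention never becomes a candidate, so a true translation pair can be lost before the markup filter ever sees it, which is the weakness the passage points out.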

[Article contains additional citation context not shown here]

Nie, Jianyun, Michel Simard, Pierre Isabelle, and Richard Durand (1999). Cross-Language Information Retrieval Based on Parallel Texts and Automatic Mining Parallel Texts from the Web. ACM-SIGIR Conference, Berkeley, California.


Parallel Web Text Mining for Cross-Language IR - Chen, Nie (2000)   (1 citation)

....pair of parallel texts is two texts that are translations of each other. In our work, the need for parallel corpora originates from research on cross-language information retrieval (CLIR) using a statistical translation model. There are several approaches to query translation in CLIR (Nie et al. 1999): using a machine translation (MT) system, using a bilingual dictionary or terminology base, and using a statistical translation model. For dictionary-based translation, because one word can have different meanings, it is very difficult to choose the correct translation among many choices. MT systems ....
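Query translation with a statistical translation model, the third approach listed above, can be sketched as replacing each query term by its most probable translations, weighted by their probabilities. This is a minimal illustration, assuming a translation table of the form term to {translation: probability} (such as one estimated from parallel Web texts) and a hypothetical cutoff `top_k`; terms missing from the table are simply dropped here:

```python
from collections import defaultdict


def translate_query(query_terms, trans_table, top_k=2):
    """Turn a source-language query into a weighted target-language query.

    trans_table maps a source term to {target_term: probability}.
    Each source term contributes its top_k translations weighted by
    probability; weights of a target term reached from several source
    terms are summed. Unknown source terms contribute nothing.
    """
    weighted = defaultdict(float)
    for term in query_terms:
        candidates = trans_table.get(term, {})
        best = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
        for target, prob in best:
            weighted[target] += prob
    return dict(weighted)
```

Keeping several weighted translations, rather than a single choice, is one way to soften the ambiguity problem the passage raises for dictionary-based translation: the retrieval model sees "banque" and "rive" for "bank", with the more likely sense weighted higher.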

....Experiments. The final goal of this work is to test the performance in CLIR of the translation models trained on the Web parallel corpora. We give some preliminary results below. 4.3.1 English-French CLIR. The experimental results of English-French CLIR have been given in the recent paper (Nie et al. 1999). Here we briefly show some results. The experiments were actually conducted on the English AP and French SDA corpora in TREC6 and TREC7, using the Smart information retrieval system. Two translation models have been used, one trained with the parallel texts from the Web, and the other trained ....

Nie, Jianyun, Michel Simard, Pierre Isabelle, and Richard Durand. 1999. Cross-language information retrieval based on parallel texts and automatic mining parallel texts from the Web. In ACM SIGIR'99, pages 74-81, August.


Using Statistical Translation Models for Bilingual IR - Jian-Yun Nie Michel   Self-citation (Nie Simard)

No context found.

J.Y. Nie, P. Isabelle, M. Simard, R. Durand, Cross-language information retrieval based on parallel texts and automatic mining of parallel texts from the Web, ACM-SIGIR conference, Berkeley, CA, pp. 74-81 (1999).


CLIR using a Probabilistic Translation Model based on Web Documents - Nie (2000)   (1 citation)  Self-citation (Nie)

....For other languages (e.g. Chinese and English), such a corpus is less (or not at all) available. In order to solve this problem, we conducted a text mining project on the Web in order to find parallel texts automatically. The first experiments with the mined documents have been described in [Nie99]. The experiments were done with a subset (5000) of the mined documents. However, they showed that the approach is feasible. In TREC8, we intend to evaluate the performance of a probabilistic model trained with all the parallel documents we found (about 20 000 pairs). The mining process is ....

....model; RaliHanF2EF: Using French queries and the Hansard model; RaliWebE2EF: Using English queries and the Web model; RaliWebF2EF: Using French queries and the Web model. What we can also observe in the above table is that the Web model generally performs slightly better than the Hansard model. In [Nie99], with the limited web model trained with 5000 pairs of parallel texts, the performance was not as good as that of the Hansard model. The above tables show that with enough parallel texts from the Web (actually about the same volume of text as in the Hansard) we can do as well as with a well ....

[Article contains additional citation context not shown here]

J.Y. Nie, P. Isabelle, M. Simard, R. Durand, Cross-language information retrieval based on parallel texts and automatic mining of parallel texts from the Web, ACM-SIGIR conference, Berkeley, CA, pp. 74-81 (1999).


Report on CLEF-2003 experiments: two ways of.. - Cancedda, Dejean, .. (2003)

No context found.

J.-Y. Nie, M. Simard, P. Isabelle, and R. Durand. Cross-language information retrieval based on parallel texts and automatic mining of parallel texts from the web. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 1999.


Building Bilingual Dictionaries From Parallel Web Documents - Craig McEwan

No context found.

Nie, J-Y., Simard, M., Isabelle, P. and Durand, R.: Cross-Language Information Retrieval based on Parallel Texts and Automatic Mining of Parallel Texts from the Web. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (ACM SIGIR'99), Berkeley, pp. 74-81, 1999.


NTCIR-3 Cross-Language IR Experiments at ULIS - Atsushi Fujii Tetsuya

No context found.

J.-Y. Nie, M. Simard, P. Isabelle, and R. Durand. Cross-language information retrieval based on parallel texts and automatic mining of parallel texts from the Web. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 74--81, 1999.


Improved Cross-Language Retrieval Using Backoff Translation

No context found.

J.-Y. Nie, M. Simard, P. Isabelle, and R. Durand. Cross-language information retrieval based on parallel texts and automatic mining of parallel texts from the web. In M. Hearst, F. Gey, and R. Tong, editors, Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 74--81, Aug. 1999.

CiteSeer.IST - Copyright Penn State and NEC