Results 1 - 3 of 3
Using Uplug and SiteSeeker to construct a cross language search engine for Scandinavian
Cited by 4 (0 self)
This paper presents how we adapted a website search engine for cross language information retrieval, using the Uplug word alignment tool for parallel corpora. We first studied the monolingual search queries posed by visitors to the website of the Nordic Council, which contains five different languages. To compare how well different types of bilingual dictionaries covered the most common queries and terms on the website, we tried a collection of ordinary bilingual dictionaries, a small manually constructed trilingual dictionary, and an automatically constructed trilingual dictionary built from the website's news corpus using Uplug. The precision and recall of the automatically constructed Swedish-English dictionary using Uplug were 71 and 93 percent, respectively. We found that precision and recall increase significantly in samples with high word frequency, but we could not confirm that POS-tags improve precision. The collection of ordinary dictionaries, consisting of about 200 000 words, covers only 41 of the top 100 search queries at the website. The automatically built trilingual dictionary combined with the small manually built trilingual dictionary consists of about 2 300 words and covers 36 of the top search queries.
1 Using Uplug and SiteSeeker to construct a cross language search engine for Scandinavian languages
This paper presents how we adapted a website search engine for cross language information retrieval, using the Uplug word alignment tool for parallel corpora. We first studied the monolingual search queries posed by visitors to the website of the Nordic Council, which contains six different languages. To compare how well different types of bilingual dictionaries covered the most common queries and terms on the website, we tried a collection of ordinary bilingual dictionaries, a small manually constructed trilingual dictionary, and an automatically constructed trilingual dictionary built from the website's news corpus using Uplug. The precision and recall of the automatically constructed Swedish-English dictionary using Uplug were 71 and 93 percent, respectively. We found that precision and recall increase significantly in samples with high word frequency, but we could not confirm that POS-tags improve precision. The collection of ordinary dictionaries, consisting of about 200 000 words, covers only half of the top 100 search queries at the website. The automatically built trilingual dictionary combined with the small manually built trilingual dictionary consists of about 2000 words and covers 27 of the top 100 search queries.