Results 1 - 3 of 3
Automatic Semantic Subject Indexing of Web Documents in Highly Inflected Languages
Cited by 1 (0 self)
Abstract. Structured semantic metadata about unstructured web documents can be created using automatic subject indexing methods, avoiding laborious manual indexing. A successful automatic subject indexing tool for the web should work with texts in multiple languages and be independent of the domain of discourse of the documents and controlled vocabularies. However, analyzing text written in a highly inflected language requires word form normalization that goes beyond rule-based stemming algorithms. We have tested the state-of-the-art automatic indexing tool Maui on Finnish texts using three stemming and lemmatization algorithms and tested it with documents and vocabularies of different domains. Both of the lemmatization algorithms we tested performed significantly better than a rule-based stemmer, and the subject indexing quality was found to be comparable to that of human indexers.
General Terms
In real-world use of test collection methods, it is essential that the query test set be representative of the workload expected in the actual application. Using a random sample of queries from a media company’s query log as a ‘gold standard’ test set, we demonstrate that biases in sitemap-derived and top n query sets can lead to significant perturbations in engine rankings and big differences in estimated performance levels.
HealthFinland —a National Semantic Publishing Network and Portal for Health Information
Providing citizens with reliable, up-to-date and individually relevant health information on the web is done by governmental, non-governmental, business and other organizations. Currently the information is published with little co-ordination and co-operation between the publishers. For publishers, this means duplicated work and costs due to publishing the same information repeatedly on many websites. Maintaining links between websites also requires work. From the citizens' point of view, finding content is difficult due to, for example, differences between laymen's vocabularies and medical terminology, and difficulties in aggregating information from several sites. To solve these problems, we present a national-scale semantic publishing system, HealthFinland, which consists of 1) a centralized content infrastructure of health ontologies and services with tools, 2) a distributed semantic content creation channel based on several health organizations, and 3) an intelligent semantic portal aggregating and presenting the contents from intuitive and health-promoting end-user perspectives for human users as well as for other websites and portals.

