SemTag and Seeker: Bootstrapping the semantic web via automated semantic annotation
- Proceedings of the 12th International Conference on World Wide Web (WWW’03)
, 2003
Cited by 120 (4 self)
This paper describes Seeker, a platform for large-scale text analytics, and SemTag, an application written on the platform to perform automated semantic tagging of large corpora. We apply SemTag to a collection of approximately 264 million web pages, and generate approximately 434 million automatically disambiguated semantic tags, published to the web as a label bureau providing metadata regarding the 434 million annotations. The final version of this paper will reflect new data labeling one billion pages, rather than the 264 million pages reported on herein. To our knowledge, this is the largest-scale semantic tagging effort to date. We describe the Seeker platform, discuss the architecture of the SemTag application, describe a new disambiguation algorithm specialized to support ontological disambiguation of large-scale data, evaluate the algorithm, and present our final results with information about acquiring and making use of the semantic tags. We argue that automated large-scale semantic tagging of ambiguous content can bootstrap and accelerate the creation of the semantic web.
On Deep Annotation
, 2003
Cited by 62 (11 self)
The success of the Semantic Web crucially depends on the easy creation, integration and use of semantic data. For this purpose, we consider an integration scenario that defies core assumptions of current metadata construction methods. We describe a framework for metadata creation when web pages are generated from a database and the database owner is cooperatively participating in the Semantic Web. This leads us to the definition of ontology mapping rules by manual semantic annotation and the usage of the mapping rules and of web services for semantic queries. In order to create metadata, the framework combines the presentation layer with the data description layer, in contrast to "conventional" annotation, which remains at the presentation layer. Therefore, we refer to the framework as deep annotation. We consider deep annotation particularly valid because (i) web pages generated from databases outnumber static web pages, (ii) annotation of web pages may be a very intuitive way to create semantic data from a database, and (iii) data from databases should not be materialized as RDF files; it should remain where it can be handled most efficiently, in its databases.
Designing Adaptive Information Extraction for the Semantic Web in Amilcare
- Annotation for the Semantic Web, Frontiers in Artificial Intelligence and Applications. IOS Press
, 2003
Survey of semantic annotation platforms
- Proceedings of the 2005 ACM Symposium on Applied Computing
, 2005
Cited by 34 (2 self)
The realization of the Semantic Web requires the widespread availability of semantic annotations for existing and new documents on the Web. Semantic annotations tag ontology class instance data and map it into ontology classes. The fully automatic creation of semantic annotations is an unsolved problem. Instead, current systems focus on the semi-automatic creation of annotations. The Semantic Web also requires facilities for the storage of annotations and ontologies, user interfaces, access APIs, and other features to fully support annotation usage. This paper examines current Semantic Web annotation platforms that provide annotation and related services, and reviews their architecture, approaches and performance.
Automatic Semantic Annotation using Unsupervised Information Extraction and Integration
, 2000
Cited by 32 (2 self)
In this paper we propose a methodology to learn to automatically annotate domain-specific information from large repositories (e.g. Web sites) with minimum user intervention. The methodology is based on a combination of information extraction, information integration and machine learning techniques. Learning is seeded by extracting information from structured sources (e.g. databases and digital libraries). Retrieved information is then used to partially annotate documents. These annotated documents are used to bootstrap learning for simple Information Extraction (IE) methodologies, which in turn produce more annotations used to annotate more documents. These annotations are then used to train more complex IE engines, and the cycle repeats until the required information is obtained. User intervention is limited to providing an initial URL and, where necessary, correcting information once the computation has finished. The revised annotations can then be reused for further training, yielding more information and/or higher precision.
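The seed, annotate and learn cycle in this abstract can be illustrated as a generic bootstrapping loop. This is a minimal sketch under stated assumptions, not the authors' system. The seed set stands in for values extracted from a structured source. The "learner" is a hypothetical toy that treats the word preceding a known value as an extraction pattern. It collapses the abstract's separate simple and complex IE stages into one step. The functions `annotate`, `learn_patterns`, `apply_patterns` and `bootstrap`, and the example documents, are all illustrative.

```python
import re


def annotate(docs: list[str], known: set[str]) -> set[tuple[int, str]]:
    """Partially annotate documents: record (doc index, value) for each known value found."""
    return {(i, v) for i, doc in enumerate(docs) for v in known if v in doc}


def learn_patterns(docs: list[str], hits: set[tuple[int, str]]) -> set[str]:
    """Hypothetical toy learner: the word immediately before an annotated value becomes a pattern."""
    patterns = set()
    for i, value in hits:
        m = re.search(r"\b(\w+)\s+" + re.escape(value) + r"\b", docs[i])
        if m:
            patterns.add(m.group(1))
    return patterns


def apply_patterns(docs: list[str], patterns: set[str]) -> set[str]:
    """Extract the capitalized word following any learned context word."""
    found = set()
    for doc in docs:
        for p in patterns:
            for m in re.finditer(r"\b" + re.escape(p) + r"\s+([A-Z]\w+)", doc):
                found.add(m.group(1))
    return found


def bootstrap(docs: list[str], seeds: set[str], max_rounds: int = 5) -> set[str]:
    """Repeat annotate -> learn -> extract until no new values appear (or max_rounds is hit)."""
    known = set(seeds)
    for _ in range(max_rounds):
        hits = annotate(docs, known)
        patterns = learn_patterns(docs, hits)
        new = apply_patterns(docs, patterns) - known
        if not new:  # fixed point: the cycle has stopped producing information
            break
        known |= new
    return known


if __name__ == "__main__":
    docs = [
        "Professor Smith teaches AI.",
        "Professor Jones leads the lab.",
        "Dr Jones and Professor Lee met.",
    ]
    print(sorted(bootstrap(docs, {"Smith"})))
```

Starting from the single seed `Smith`, the first round learns the context word `Professor` and picks up `Jones` and `Lee`. The second round adds `Dr` as a pattern but finds nothing new, so the loop stops. Stopping at a fixed point is one possible reading of "until the required information is obtained". The user-correction step is not modeled.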
ESpotter: Adaptive named entity recognition for web browsing
- 3rd Conf. on Professional Knowledge Management
, 2005
Cited by 24 (15 self)
{j.zhu, v.s.uren, e.motta}@open.ac.uk
Web users are facing information overload, i.e., it is hard for them to find desired information on the web. Hence the growing interest in named entity recognition (NER) for discovering relevant information on users’ behalf. We present a browser plug-in called ESpotter, which adapts lexicons and patterns to a domain hierarchy consisting of domains on the web and user preferences for accurate and efficient NER. Mappings are created from domain-independent types to domain-specific types. Entities are highlighted according to their types, and users are assisted by navigational functionality for moving between these highlighted entities.
CREAM -- CREAting Metadata for the Semantic Web
, 2003
Cited by 22 (3 self)
Richly interlinked, machine-understandable data constitute the basis for the Semantic Web. We provide a framework, CREAM, that allows for the creation of metadata. While the annotation mode of CREAM allows metadata to be created for existing web pages, the authoring mode lets authors create metadata -- almost for free -- while putting together the content of a page. As a
Integrating Information to Bootstrap Information Extraction from Web Sites
- IJCAI’03 Workshop on Intelligent Information Integration
, 2003
Cited by 20 (3 self)
In this paper we propose a methodology to learn to extract domain-specific information from large repositories (e.g. the Web) with minimum user intervention.
A Case for Automated Large Scale Semantic Annotations
- Journal of Web Semantics
, 2003
Cited by 16 (0 self)
This paper describes Seeker, a platform for large-scale text analytics, and SemTag, an application written on the platform to perform automated semantic tagging of large corpora. We apply SemTag to a collection of approximately 264 million web pages, and generate approximately 434 million automatically disambiguated semantic tags, published to the web as a label bureau providing metadata regarding the 434 million annotations. To our knowledge, this is the largest-scale semantic tagging effort to date. We describe the Seeker platform, discuss the architecture of the SemTag application, describe a new disambiguation algorithm specialized to support ontological disambiguation of large-scale data, evaluate the algorithm, and present our final results with information about acquiring and making use of the semantic tags. We argue that automated large-scale semantic tagging of ambiguous content can bootstrap and accelerate the creation of the semantic web.
Semantic annotation of unstructured and ungrammatical text
, 2005
Cited by 13 (4 self)
There are vast amounts of free text on the internet that are neither grammatical nor formally structured, such as item descriptions on eBay or internet classifieds like Craigslist. These sources of data, called "posts," are full of useful information for agents scouring the Semantic Web, but they lack the semantic annotation to make them searchable. Annotating these posts is difficult since the text generally exhibits little formal grammar and the structure of the posts varies. However, by leveraging collections of known entities and their common attributes, called "reference sets," we can annotate these posts despite their lack of grammar and structure. To use

