DBpedia Spotlight: Shedding light on the web of documents
- In Proceedings of the 7th International Conference on Semantic Systems (I-Semantics), 2011
Cited by 167 (5 self)
Interlinking text documents with Linked Open Data enables the Web of Data to be used as background knowledge within document-oriented applications such as search and faceted browsing. As a step towards interconnecting the Web of Documents with the Web of Data, we developed DBpedia Spotlight, a system for automatically annotating text documents with DBpedia URIs. DBpedia Spotlight allows users to configure the annotations to their specific needs through the DBpedia Ontology and quality measures such as prominence, topical pertinence, contextual ambiguity and disambiguation confidence. We compare our approach with the state of the art in disambiguation, and evaluate our results in light of three baselines and six publicly available annotation systems, demonstrating the competitiveness of our system. DBpedia Spotlight is shared as open source and deployed as a Web Service freely available for public use.
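The configurability described in this abstract can be illustrated with a small post-filter over candidate annotations. The following is a minimal Python sketch under stated assumptions, not DBpedia Spotlight's implementation or its Web Service API: the `Annotation` record, its field names, the `filter_annotations` helper, and the example values are all hypothetical. Two of the quality measures named above are modelled (prominence, approximated by a support count, and disambiguation confidence), alongside a type restriction standing in for DBpedia Ontology classes.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional, Set


@dataclass(frozen=True)
class Annotation:
    """Hypothetical record for one candidate annotation of a text span."""
    surface_form: str
    uri: str
    types: FrozenSet[str]  # ontology classes, e.g. "DBpedia:Place" (illustrative)
    support: int           # stand-in for prominence, e.g. number of inlinks
    confidence: float      # disambiguation confidence in [0, 1]


def filter_annotations(
    candidates: Iterable[Annotation],
    min_confidence: float = 0.5,
    min_support: int = 0,
    allowed_types: Optional[Set[str]] = None,
) -> List[Annotation]:
    """Keep annotations that pass both thresholds and, if given, match an allowed type."""
    kept = []
    for ann in candidates:
        if ann.confidence < min_confidence or ann.support < min_support:
            continue
        if allowed_types is not None and not (ann.types & allowed_types):
            continue
        kept.append(ann)
    return kept


# Usage with made-up values: one confident place, one low-confidence person.
candidates = [
    Annotation("Berlin", "http://dbpedia.org/resource/Berlin",
               frozenset({"DBpedia:Place"}), 9000, 0.92),
    Annotation("Paris", "http://dbpedia.org/resource/Paris_Hilton",
               frozenset({"DBpedia:Person"}), 800, 0.31),
]
places = filter_annotations(candidates, allowed_types={"DBpedia:Place"})
print([a.uri for a in places])  # ['http://dbpedia.org/resource/Berlin']
```

Raising `min_confidence` or `min_support` trades recall for precision; the default thresholds here are arbitrary choices for the sketch.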
Domain Adaptation of Rule-based Annotators for Named-Entity Recognition Tasks
- In EMNLP (to appear), 2010
Cited by 17 (7 self)
Named-entity recognition (NER) is an important task required in a wide variety of applications. While rule-based systems are appealing due to their well-known “explainability,” most, if not all, state-of-the-art results for NER tasks are based on machine learning techniques. Motivated by these results, we explore the following natural question in this paper: Are rule-based systems still a viable approach to named-entity recognition? Specifically, we have designed and implemented a high-level language NERL on top of SystemT, a general-purpose algebraic information extraction system. NERL is tuned to the needs of NER tasks and simplifies the process of building, understanding, and customizing complex rule-based named-entity annotators. We show that these customized annotators match or outperform the best published results achieved with machine learning techniques. These results confirm that we can reap the benefits of rule-based extractors’ explainability without sacrificing accuracy. We conclude by discussing lessons learned while building and customizing complex rule-based annotators and outlining several research directions towards facilitating rule development.
Rijke. Entity search: building bridges between two worlds
- In SEMSEARCH, 2010
Cited by 7 (1 self)
We consider the task of entity search and examine to what extent state-of-the-art information retrieval (IR) and semantic web (SW) technologies are capable of answering information needs that focus on entities. We also explore the potential of combining IR with SW technologies to improve the end-to-end performance on a specific entity search task. We arrive at and motivate a proposal to combine text-based entity models with semantic information from the Linked Open Data cloud.
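One simple way to picture combining a text-based entity model with Linked Open Data is a weighted sum of a text-similarity score and a type-match score. The Python sketch below is an illustrative assumption, not the authors' model: `text_score`, `rank_entities`, the weight of 0.7, and the `ex:` identifiers are hypothetical, and bag-of-words cosine similarity stands in for whatever text-based entity model is actually used.

```python
import math
from collections import Counter
from typing import Dict, List, Set, Tuple


def text_score(query: str, description: str) -> float:
    """Cosine similarity between bag-of-words vectors of two strings."""
    q = Counter(query.lower().split())
    d = Counter(description.lower().split())
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in d.values()))
    return dot / norm if norm else 0.0


def rank_entities(
    query: str,
    target_types: Set[str],
    entities: Dict[str, Tuple[str, Set[str]]],
    weight: float = 0.7,
) -> List[str]:
    """Rank entity URIs by weight * text score + (1 - weight) * type match.

    entities maps a URI to (description text, set of LOD types); hypothetical format.
    """
    scored = []
    for uri, (description, types) in entities.items():
        type_match = 1.0 if types & target_types else 0.0
        score = weight * text_score(query, description) + (1 - weight) * type_match
        scored.append((score, uri))
    # Highest score first; ties broken alphabetically by URI.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [uri for _, uri in scored]


# Usage with made-up entity descriptions and types.
entities = {
    "ex:Amsterdam": ("capital city of the netherlands", {"City"}),
    "ex:Ajax": ("amsterdam football club", {"SportsTeam"}),
}
print(rank_entities("football club in amsterdam", {"SportsTeam"}, entities))
# ['ex:Ajax', 'ex:Amsterdam']
```

The type term lets structured data reorder candidates even when the text evidence is weak or absent, which is the intuition behind mixing the two sources.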
A.: Twitris 2.0: Semantically empowered system for understanding perceptions from social data
- In: Proc. of the Int. Semantic Web Challenge (2010)
Cited by 7 (0 self)
We present Twitris 2.0, a Semantic Web application that facilitates understanding of social perceptions through semantics-based processing of massive amounts of event-centric data. Twitris 2.0 addresses challenges in large-scale processing of social data while preserving its spatiotemporal-thematic properties. Twitris 2.0 also covers context-based semantic integration of multiple Web resources and exposes semantically enriched social data to the public domain. Semantic Web technologies enable the system’s integration and analysis abilities.
Ten years of hyperlinks in online conversations
2010
Cited by 2 (0 self)
For as long as social media sites have existed, a significant feature of the conversations that take place on them has been links to external websites. Users share videos or photos they have seen, point to products or movies they are interested in, and use external articles as a reference in discussions. Understanding linking behaviour is an important part of understanding the dynamics of online conversations and identifying the changing interests of participants in these discussions. In this paper, we present a study of the hyperlinks posted on a large-scale message board dataset over a ten-year period. We analyse the occurrences of hyperlinks and find that the practice of posting links has become more frequent over the lifetime of the message board. We examine the domains linked to by users and see that they change greatly over the years. We focus in particular on the increase of links to resources with associated structured data, and discuss the potential for using this data for enhanced analysis of online conversation. Keywords: hyperlinks, message boards, online communities, online conversation, social media
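The two measurements in this abstract, how often links are posted and which domains they point to, can be sketched in a few lines of Python. This is a minimal sketch under assumptions, not the study's method: posts are taken to be hypothetical `(year, text)` pairs, the URL regular expression is a simplification, and the helper names are illustrative.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple
from urllib.parse import urlparse

# Simplified URL pattern: http(s) URLs up to whitespace, quotes, or closing brackets.
URL_RE = re.compile(r"https?://[^\s<>\"')\]]+")


def extract_urls(text: str) -> List[str]:
    """Find URLs in a post, trimming sentence punctuation left at the end."""
    return [url.rstrip(".,;:!?") for url in URL_RE.findall(text)]


def domains_by_year(posts: Iterable[Tuple[int, str]]) -> Dict[int, Counter]:
    """Count linked domains per year, ignoring a leading 'www.'."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for year, text in posts:
        for url in extract_urls(text):
            host = urlparse(url).hostname or ""
            if host.startswith("www."):
                host = host[4:]
            if host:
                counts[year][host] += 1
    return counts


def links_per_post(posts: Iterable[Tuple[int, str]]) -> Dict[int, float]:
    """Average number of hyperlinks per post for each year."""
    links: Counter = Counter()
    totals: Counter = Counter()
    for year, text in posts:
        totals[year] += 1
        links[year] += len(extract_urls(text))
    return {year: links[year] / totals[year] for year in sorted(totals)}


# Usage with made-up posts.
posts = [
    (2001, "Have you seen http://www.example.com/video?"),
    (2001, "No links here."),
    (2010, "Photos: https://photos.example.org/a and http://www.example.com/b."),
]
print(dict(domains_by_year(posts)[2010]))  # {'photos.example.org': 1, 'example.com': 1}
print(links_per_post(posts))               # {2001: 0.5, 2010: 2.0}
```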
J.M.: Scenario-Driven Selection and Exploitation of Semantic Data for Optimal Named Entity Disambiguation
- In Proceedings of the Semantic Web and Information Extraction Workshop (SWAIE 2012)
Cited by 1 (1 self)
The rapidly increasing use of large-scale data on the Web has made named entity disambiguation a key research challenge in Information Extraction (IE) and in the development of the Semantic Web. In this paper we propose a novel disambiguation framework that utilizes background semantic information, typically in the form of Linked Data, to accurately determine the intended meaning of detected semantic entity references within texts. The novelty of our approach lies in the definition of a structured semi-automatic process that enables the custom selection and use of the semantic data that is optimal for the disambiguation scenario at hand. This process allows our framework to adapt to the particular characteristics of different domains and scenarios and, as experiments show, to be more effective than approaches primarily designed to work in open-domain and unconstrained situations.
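The core idea, using the Linked Data around each candidate as evidence and letting the scenario decide which parts of that data count, can be sketched as neighbour overlap. This Python sketch is an illustration under assumptions, not the authors' framework: the in-memory triple list, the `ex:` identifiers, the relation names, and the `neighbours`/`disambiguate` helpers are hypothetical, and choosing `relations` by hand stands in for the paper's semi-automatic selection process.

```python
from typing import Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


def neighbours(entity: str, graph: Iterable[Triple], relations: Set[str]) -> Set[str]:
    """Entities linked to `entity` in either direction via one of the selected relations."""
    found = set()
    for s, r, o in graph:
        if r not in relations:
            continue
        if s == entity:
            found.add(o)
        elif o == entity:
            found.add(s)
    return found


def disambiguate(
    candidates: List[str],
    context_entities: Set[str],
    graph: List[Triple],
    relations: Set[str],
) -> Optional[str]:
    """Pick the candidate whose selected neighbours overlap most with the context.

    Returns None when no candidate shares any neighbour with the context;
    ties keep the earlier candidate.
    """
    best, best_score = None, 0
    for cand in candidates:
        score = len(neighbours(cand, graph, relations) & context_entities)
        if score > best_score:
            best, best_score = cand, score
    return best


# Usage with a made-up graph: one ambiguous mention, two candidate entities.
graph = [
    ("ex:Arsenal_FC", "league", "ex:Premier_League"),
    ("ex:Arsenal_FC", "ground", "ex:Emirates_Stadium"),
    ("ex:Royal_Arsenal", "locatedIn", "ex:Woolwich"),
]
context = {"ex:Premier_League", "ex:Woolwich"}
mention = ["ex:Arsenal_FC", "ex:Royal_Arsenal"]

# A sports scenario selects sports relations; a history scenario selects locations.
print(disambiguate(mention, context, graph, {"league", "ground"}))  # ex:Arsenal_FC
print(disambiguate(mention, context, graph, {"locatedIn"}))         # ex:Royal_Arsenal
```

The same text and graph yield different answers depending on which relations the scenario admits, which is the behaviour the abstract attributes to scenario-driven data selection.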
A Semantic Annotation Approach for Professional Literature
This paper presents an automatic annotation method for professional literature. By comparing professional literature with other storage formats and literary styles, we summarize two of its features and then propose three assumptions. To improve annotation efficiency, the method first partitions the domain ontology into self-consistent segments based on its topological structure, then locates the most related segment(s) using keywords extracted from the document, and finally annotates the document with the located segment(s), expanding the annotation scope according to the correspondence between grammatical structure and semantic structure. Experimental results show that the method can improve both annotation efficiency and annotation accuracy. Keywords: professional literature, semantic annotation, ontology partition, semantic context, syntax context
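The locate-then-annotate steps described in this abstract can be sketched with a simple keyword-overlap score. This Python sketch rests on assumptions and is not the paper's method: the ontology is taken to be already partitioned into segments, each represented as a hypothetical set of concept labels; the helper names and example data are illustrative; and the final scope-expansion step based on grammatical and semantic structure is not modelled.

```python
from typing import Dict, Iterable, List, Set


def locate_segments(segments: Dict[str, Set[str]], keywords: Iterable[str], top_k: int = 1) -> List[str]:
    """Return up to top_k segment names sharing the most labels with the keywords.

    Matching is case-insensitive; segments with no overlap are never returned.
    """
    kw = {k.lower() for k in keywords}
    scored = []
    for name, labels in segments.items():
        overlap = len(kw & {label.lower() for label in labels})
        if overlap > 0:
            scored.append((overlap, name))
    # Highest overlap first; ties broken alphabetically by segment name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_k]]


def annotate(keywords: Iterable[str], segments: Dict[str, Set[str]], located: List[str]) -> Dict[str, str]:
    """Tag each keyword with the first located segment whose labels contain it."""
    tags = {}
    for k in keywords:
        for name in located:
            if k.lower() in {label.lower() for label in segments[name]}:
                tags[k] = name
                break
    return tags


# Usage with a made-up, already-partitioned ontology.
segments = {
    "hydrology": {"river", "rainfall", "runoff"},
    "geology": {"rock", "sediment", "erosion"},
    "climate": {"rainfall", "temperature"},
}
keywords = ["Rainfall", "runoff", "sediment"]
located = locate_segments(segments, keywords, top_k=2)
print(located)                                # ['hydrology', 'climate']
print(annotate(keywords, segments, located))  # {'Rainfall': 'hydrology', 'runoff': 'hydrology'}
```

Restricting annotation to the located segments is what saves work: keywords are only matched against a small part of the ontology, at the cost of missing terms (here "sediment") that fall outside it.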
EnviLOD WP5: Quantitative Evaluation of LOD-based Semantic Enrichment on Environmental Science Literature
This EnviLOD deliverable reports on the quantitative evaluation of the LODIE automatic semantic enrichment algorithm. First, we outline the LODIE approach for LOD-based semantic enrichment of metadata and full-text scientific articles. Second, the methods are evaluated both quantitatively and qualitatively, the latter by our British Library users. For the qualitative evaluation, semantic search queries are compared against the full-text search capabilities of the Envia British Library information discovery tool, and tentative benefits of LOD-based semantic enrichment are identified. The resulting richer content underpins the EnviLOD semantic search interface, which was then evaluated much more extensively with users at two different workshops. The results from that evaluation appear in the EnviLOD WP2 User Feedback report.
Edited by:
2012
There is a vast wealth of information available in textual format that the Semantic Web cannot yet tap into: 80% of data on the Web and on internal corporate intranets is unstructured, hence analysing and structuring the data (social analytics and next-generation analytics) is a large and growing endeavour. The goal of the 1st workshop on Semantic Web and Information Extraction was to bring researchers from the fields of Information Extraction and the Semantic Web together to foster inter-domain collaboration. To make sense of the large amounts of textual data now available, we need help from both the Information Extraction and Semantic Web communities. The Information Extraction community specialises in mining nuggets of information from text; such techniques could, however, be enhanced by annotated data or domain-specific resources. The Semantic Web community has already taken great strides in making these resources available through the Linked Open Data cloud, and they are now ready for uptake by the Information Extraction community. The workshop invited contributions around three particular topics: 1) Semantic Web-driven Information
The Exponential Growth of Text Capture
2011
Approved for public release; distribution is unlimited.