Results 1 - 4 of 4
2008a. Ontology-Based XQuery’ing of XML-Encoded Language Resources on Multiple Annotation Layers
Cited by 9 (5 self)
We present an approach for querying collections of heterogeneous linguistic corpora that are annotated on multiple layers using arbitrary XML-based markup languages. An OWL ontology provides a homogenising view on the conceptually different markup languages so that a common querying framework can be established using the method of ontology-based query expansion. In addition, we present a highly flexible web-based graphical interface that can be used to query corpora with regard to several different linguistic properties such as syntactic tree fragments. This interface can also be used for ontology-based querying of multiple corpora simultaneously.
Requirements of a user-friendly, general-purpose corpus query interface
- Sustainability of Language Resources and Tools for Natural Language Processing, 2008
Cited by 4 (2 self)
This article reports on a survey that was conducted among 16 projects of a collaborative research centre to learn about the requirements of a web-based corpus query interface. This interface is to be created for a collection of corpora that are heterogeneous with respect to their languages, levels of annotation, and their users' research interests. Based on the survey and a comparison of three existing corpus query interfaces, we compiled a set of requirements. In the context of sustainable strategies of corpus storage and accessibility, we point out how to design an interface that is general enough to cover multiple corpora and at the same time suitable for a wide range of users.
Choosing an XML database for linguistically annotated corpora
- Sprache und Datenverarbeitung, 2008
Cited by 3 (0 self)
XML has become the de facto standard for representing linguistically annotated corpora. It seems safe to assume that storing and querying an XML-encoded, annotated corpus in an XML database is a straightforward procedure. In reality, however, it is not. This article aims to provide guidelines for deciding whether to use an XML database and how to choose a suitable product. To this end we examine the following questions: Which aspects should be considered before choosing to store an XML-encoded annotated corpus in an XML database? Which facilities does a database need to provide in order to be suitable for storing and querying annotated corpora? Do current XML databases offer these facilities, and, if not, can they be added?
A generic formalism to represent linguistic corpora in RDF and OWL/DL
- In Proceedings of the 8th International Conference on Language Resources and Evaluation, 2012
Cited by 1 (0 self)
This paper describes POWLA, a generic formalism to represent linguistic corpora by means of RDF and OWL/DL. Unlike earlier approaches in this direction, POWLA is not tied to a specific selection of annotation layers, but rather, it is designed to support any kind of text-oriented annotation. POWLA inherits its generic character from the underlying data model PAULA (Dipper, 2005; Chiarcos et al., 2009) that is based on early sketches of the ISO TC37/SC4 Linguistic Annotation Framework (Ide and Romary, 2004). As opposed to existing standoff XML linearizations for such generic data models, it uses RDF as representation formalism and OWL/DL for validation. The paper discusses advantages of this approach, in particular with respect to interoperability and queriability, which are illustrated for the MASC corpus, an open multi-layer corpus of American English (Ide et al., 2008).