Learning and Verifying Quantified Boolean Queries by Example
Cited by 7 (0 self)
To help a user specify and verify quantified queries, a class of database queries known to be very challenging for all but the most expert users, one can question the user on whether certain data objects are answers or non-answers to her intended query. In this paper, we analyze the number of questions needed to learn or verify qhorn queries, a special class of Boolean quantified queries whose underlying form is conjunctions of quantified Horn expressions. We provide optimal polynomial-question and polynomial-time learning and verification algorithms for two subclasses of the class qhorn with upper constant limits on a query's causal density.
Learning Schemas for Unordered XML
Cited by 4 (4 self)
We consider unordered XML, where the relative order among siblings is ignored, and we investigate the problem of learning schemas from examples given by the user. We focus on the schema formalisms proposed in [10]: disjunctive multiplicity schemas (DMS) and their restriction, disjunction-free multiplicity schemas (MS). A learning algorithm takes as input a set of XML documents which must satisfy the schema (i.e., positive examples) and a set of XML documents which must not satisfy the schema (i.e., negative examples), and returns a schema consistent with the examples. We investigate a learning framework inspired by Gold [18], where a learning algorithm should be sound, i.e., always return a schema consistent with the examples given by the user, and complete, i.e., able to produce every schema given a sufficiently rich set of examples. Additionally, the algorithm should be efficient, i.e., polynomial in the size of the input. We prove that the DMS are learnable from positive examples only, but they are not learnable when we also allow negative examples. Moreover, we show that the MS are learnable in the presence of positive examples only, and also in the presence of both positive and negative examples. Furthermore, for the learnable cases, the proposed learning algorithms return minimal schemas consistent with the examples.
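As a rough illustration of learning a disjunction-free schema from positive examples only, the sketch below records, for every parent label, the smallest multiplicity among 0, 1, ?, + and * that covers the observed child counts. It is not the algorithm from the paper: the tree encoding `(label, [children])`, the function names `multiplicity` and `learn_ms`, and the dictionary output format are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def multiplicity(lo, hi):
    """Return the tightest symbol among 0, 1, ?, +, * covering counts lo..hi."""
    if hi == 0:
        return "0"
    if lo >= 1:
        return "1" if hi == 1 else "+"
    return "?" if hi == 1 else "*"


def learn_ms(documents):
    """Infer child multiplicities per label from positive example trees.

    Each tree is a pair (label, [child trees]); sibling order is ignored.
    Child labels never observed under a parent are implicitly forbidden.
    """
    bags = defaultdict(list)  # parent label -> bags (Counters) of child labels

    def visit(node):
        label, children = node
        bags[label].append(Counter(child[0] for child in children))
        for child in children:
            visit(child)

    for doc in documents:
        visit(doc)

    schema = {}
    for label, observed in bags.items():
        child_labels = set().union(*observed)
        schema[label] = {
            c: multiplicity(min(b[c] for b in observed),
                            max(b[c] for b in observed))
            for c in sorted(child_labels)
        }
    return schema


# Two hypothetical positive examples.
docs = [
    ("book", [("title", []), ("author", []), ("author", [])]),
    ("book", [("title", []), ("author", []), ("year", [])]),
]
schema = learn_ms(docs)
print(schema["book"])
```

On these two documents, `author` occurs once or twice, `title` exactly once, and `year` at most once, so the inferred rule for `book` is the tightest one covering both examples.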
Validating RDF with shape expressions
 CoRR
Cited by 2 (2 self)
We propose shape expression schema (ShEx), a novel schema formalism for describing the topology of an RDF graph that uses regular bag expressions (RBEs) to define constraints on the admissible neighborhood for the nodes of a given type. We provide two alternative semantics, multi-type and single-type, depending on whether or not a node may have more than one type. We study the expressive power of ShEx and the complexity of the validation problem. We show that the single-type semantics is strictly more expressive than the multi-type semantics, that single-type validation is generally intractable, and that multi-type validation is feasible for a small class of RBEs. To further curb the high computational complexity of validation, we propose a natural notion of determinism and show that multi-type validation for the class of deterministic schemas using single-occurrence regular bag expressions (SORBEs) is tractable. Finally, we consider the problem of validating only a fragment of a graph with preassigned types for some of its nodes, and argue that for deterministic ShEx using SORBEs, multi-type validation can be performed efficiently and single-type validation can be performed with a single pass over the graph.
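To make the idea of constraining a node's neighborhood with a bag expression concrete, here is a minimal sketch for a very restricted case: each edge label appears once in the expression and carries an occurrence interval, with no disjunction and no type checking of the target nodes. The function name `satisfies`, the `(lo, hi)` interval encoding, and the example shape are assumptions for illustration, not part of ShEx itself.

```python
from collections import Counter
import math


def satisfies(edge_labels, shape):
    """Check a bag of outgoing edge labels against per-label intervals.

    shape maps each allowed label to an inclusive (lo, hi) occurrence range.
    Order of edges is irrelevant; labels outside the shape are rejected.
    """
    bag = Counter(edge_labels)
    if any(label not in shape for label in bag):
        return False
    return all(lo <= bag[label] <= hi for label, (lo, hi) in shape.items())


# Hypothetical shape: exactly one name, an optional email, any number of knows.
person = {"name": (1, 1), "email": (0, 1), "knows": (0, math.inf)}

print(satisfies(["knows", "name", "knows"], person))
print(satisfies(["knows"], person))
```

Because each label occurs only once in the expression, the check reduces to counting, which hints at why single-occurrence expressions are easier to validate than general ones.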
Query Induction with Schema-Guided Pruning Strategies
, 2013
Cited by 1 (0 self)
Inference algorithms for tree automata that define node-selecting queries in unranked trees rely on tree pruning strategies. These impose additional assumptions on node selection that are needed to compensate for small numbers of annotated examples. Pruning-based heuristics in query learning algorithms for Web information extraction often boost the learning quality and speed up the learning process. We distinguish the class of regular queries that are stable under a given schema-guided pruning strategy, and show that this class is learnable with polynomial time and data. Our learning algorithm is obtained by adding pruning heuristics to the traditional learning algorithm for tree automata from positive and negative examples. While justified by a formal learning model, our learning algorithm for stable queries also performs very well in practice for XML information extraction.
Complexity and Expressiveness of ShEx for RDF
Cited by 1 (0 self)
We study the expressiveness and complexity of Shape Expression Schema (ShEx), a novel schema formalism for RDF currently under development by W3C. ShEx assigns types to the nodes of an RDF graph and allows constraining the admissible neighborhoods of nodes of a given type with regular bag expressions (RBEs). We formalize and investigate two alternative semantics, multi-type and single-type, depending on whether or not a node may have more than one type. We study the expressive power of ShEx and the complexity of the validation problem. We show that the single-type semantics is strictly more expressive than the multi-type semantics, that single-type validation is generally intractable, and that multi-type validation is feasible for a small (yet practical) subclass of RBEs. To curb the high computational complexity of validation, we propose a natural notion of determinism and show that multi-type validation for the class of deterministic schemas using single-occurrence regular bag expressions (SORBEs) is tractable.
Algorithms, Theory
Cited by 1 (0 self)
Stochastic context-free grammars (SCFGs) have long been recognized as useful for a large variety of tasks including natural language processing, morphological parsing, speech recognition, information extraction, Web-page wrapping and even analysis of RNA. A string and an SCFG jointly represent a probabilistic interpretation of the meaning of the string, in the form of a (possibly infinite) probability space of parse trees. The problem of evaluating a query over this probability space is considered under the conventional semantics of querying a probabilistic database. For general SCFGs, extremely simple queries may have results that include irrational probabilities. But, for a large subclass of SCFGs (that includes all the standard studied subclasses of SCFGs) and the language of tree-pattern queries with projection (and child/descendant edges), it is shown that query results have rational probabilities with a polynomial-size bit representation and, more importantly, an efficient query-evaluation algorithm is presented.
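The notion of a probability space of parse trees can be illustrated on a toy case with finitely many parses. The sketch below is not the evaluation algorithm of the paper: it simply enumerates the parse trees by hand, multiplies rule probabilities, and, as an assumption of this example, conditions on the given string by normalizing over its parses. The grammar `RULES`, the tree encoding, and the names `tree_prob` and `query_prob` are all hypothetical.

```python
from fractions import Fraction

# Hypothetical SCFG: S -> S S with probability 1/3, S -> a with probability 2/3.
RULES = {
    ("S", ("S", "S")): Fraction(1, 3),
    ("S", ("a",)): Fraction(2, 3),
}


def symbol(node):
    """Grammar symbol at the root of a node (terminals are plain strings)."""
    return node if isinstance(node, str) else node[0]


def tree_prob(tree):
    """Probability of a parse tree: product of the rules applied in it."""
    lhs, children = tree
    p = RULES[(lhs, tuple(symbol(c) for c in children))]
    for c in children:
        if not isinstance(c, str):
            p *= tree_prob(c)
    return p


def query_prob(trees, predicate):
    """Probability that a Boolean query holds, normalized over the given parses."""
    total = sum(tree_prob(t) for t in trees)
    return sum(tree_prob(t) for t in trees if predicate(t)) / total


# The string "aaa" has exactly two parse trees under this grammar.
leaf = ("S", ["a"])
left_nested = ("S", [("S", [leaf, leaf]), leaf])
right_nested = ("S", [leaf, ("S", [leaf, leaf])])
parses = [left_nested, right_nested]

# Boolean query: does the root's first child derive "a" directly?
first_child_is_leaf = lambda t: t[1][0] == leaf

print(tree_prob(left_nested))
print(query_prob(parses, first_child_is_leaf))
```

Exact `Fraction` arithmetic keeps the results rational. Enumeration like this breaks down once the set of parse trees is large or infinite, which is the situation an efficient evaluation algorithm has to handle.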
THEME Knowledge and Data Representation
, 2012
Université des sciences et technologies de Lille (Lille 1)
General Terms
Web applications store their data within various database models, such as relational, semistructured, and graph data models, to name a few. We study learning algorithms for queries for the above-mentioned models. As a further goal, we aim to apply the results to learning cross-model database mappings, which can also be seen as queries across different schemas.
Characterizing XML Twig Queries with Examples
Typically, a (Boolean) query is a finite formula that defines a possibly infinite set of database instances that satisfy it (positive examples), and implicitly, the set of instances that do not satisfy the query (negative examples). We investigate the following natural question: for a given class of queries, is it possible to characterize every query with a finite set of positive and negative examples that no other query is consistent with? We study this question for twig queries and XML databases. We show that while twig queries are characterizable, they generally require exponential sets of examples. Consequently, we focus on a practical subclass of anchored twig queries and show that not only are they characterizable, but they are characterizable with polynomially-sized sets of examples. This result is obtained with the use of generalization operations on twig queries, whose application to an anchored twig query yields a properly contained and minimally different query. Our results illustrate further interesting and strong connections between the structure and the semantics of anchored twig queries that the class of arbitrary twig queries does not enjoy. Finally, we show that the class of unions of twig queries is not characterizable.