Efficient Query Evaluation on Probabilistic Databases
, 2004
Cited by 275 (36 self)
We describe a system that supports arbitrarily complex SQL queries with “uncertain” predicates. The query semantics is based on a probabilistic model and the results are ranked, much like in Information Retrieval. Our main focus is efficient query evaluation, a problem that has not received attention in the past. We describe an optimization algorithm that can efficiently compute most queries. We show, however, that the data complexity of some queries is #P-complete, which implies that these queries do not admit any efficient evaluation methods. For these queries we describe both an approximation algorithm and a Monte-Carlo simulation algorithm.
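The Monte-Carlo idea mentioned in this abstract can be illustrated with a small sketch. It is not the paper's algorithm: it assumes a tuple-independent database, represents a Boolean query answer by a hypothetical DNF lineage over tuple names (`LINEAGE`), and uses naive world sampling. The tuple names and probabilities in `PROBS` are invented for illustration.

```python
import random
from itertools import product

# Hypothetical tuple-independent database: each tuple exists with its own probability.
PROBS = {"r1": 0.5, "s1": 0.4, "s2": 0.8}
# Hypothetical lineage of one answer in DNF: (r1 AND s1) OR (r1 AND s2).
LINEAGE = [("r1", "s1"), ("r1", "s2")]


def holds(lineage, world):
    """True if at least one clause has all of its tuples present in the world."""
    return any(all(world[t] for t in clause) for clause in lineage)


def exact_probability(lineage, probs):
    """Enumerate all possible worlds (exponential in the number of tuples)."""
    names = sorted(probs)
    total = 0.0
    for bits in product([False, True], repeat=len(names)):
        world = dict(zip(names, bits))
        if holds(lineage, world):
            weight = 1.0
            for t in names:
                weight *= probs[t] if world[t] else 1.0 - probs[t]
            total += weight
    return total


def monte_carlo_probability(lineage, probs, samples, seed=0):
    """Estimate the same probability by sampling worlds instead of enumerating them."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        world = {t: rng.random() < p for t, p in probs.items()}
        if holds(lineage, world):
            hits += 1
    return hits / samples


if __name__ == "__main__":
    print(exact_probability(LINEAGE, PROBS))
    print(monte_carlo_probability(LINEAGE, PROBS, 20000))
```

For this lineage the exact value is 0.5 * (1 - 0.6 * 0.2) = 0.44. Enumeration doubles in cost with every additional tuple, whereas the sampling estimate costs the same per sample regardless of how many worlds exist, which is why sampling is attractive for queries that are hard to evaluate exactly.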
P-SHOQ(D): A Probabilistic Extension of SHOQ(D) for Probabilistic Ontologies in the Semantic Web
, 2002
Cited by 68 (13 self)
Ontologies play a central role in the development of the semantic web, as they provide precise definitions of shared terms in web resources. One important web ontology language is DAML+OIL; it has a formal semantics and reasoning support through a mapping to the expressive description logic SHOQ(D) with the addition of inverse roles. In this paper, we present a probabilistic extension of SHOQ(D), called P-SHOQ(D), for handling probabilistic ontologies in the semantic web. The description logic P-SHOQ(D) is based on the notion of probabilistic lexicographic entailment from probabilistic default reasoning. It can express rich probabilistic knowledge about concepts and instances, as well as default knowledge about concepts. We also present sound and complete reasoning techniques for P-SHOQ(D), which are based on reductions to classical reasoning in SHOQ(D) and to linear programming, and which show in particular that reasoning in P-SHOQ(D) is decidable.
Models for Incomplete and Probabilistic Information
- IEEE Data Engineering Bulletin
, 2006
Cited by 50 (6 self)
We discuss, compare and relate some old and some new models for incomplete and probabilistic databases. We characterize the expressive power of c-tables over infinite domains and we introduce a new kind of result, algebraic completion, for studying less expressive models. By viewing probabilistic models as incompleteness models with additional probability information, we define completeness and closure under query languages of general probabilistic database models and we introduce a new such model, probabilistic c-tables, that is shown to be complete and closed under the relational algebra.
A Data Model and Algebra for Probabilistic Complex Values
- Annals of Mathematics and Artificial Intelligence
, 2000
Cited by 15 (4 self)
We present a probabilistic data model for complex values. More precisely, we introduce probabilistic complex value relations, which combine the concept of probabilistic relations with the idea of complex values in a uniform framework. We elaborate a model-theoretic definition of probabilistic combination strategies, which has a rigorous foundation in probability theory. We then define an algebra for querying database instances, which comprises the operations of selection, projection, renaming, join, Cartesian product, union, intersection, and difference. We prove that our data model and algebra for probabilistic complex values generalize the classical relational data model and algebra. Moreover, we show that under certain assumptions, all our algebraic operations are tractable. We finally show that most of the query equivalences of classical relational algebra carry over to our algebra on probabilistic complex value relations. Hence, query optimization techniques for classical relational algebra can easily be applied to optimize queries on probabilistic complex value relations. Keywords: Complex value databases, probabilistic databases, data model, relational algebra, query languages. AMS Subject classification: Primary 68P15, 68P20; Secondary 68T30, 68T37.
Probabilistic ontologies and relational databases
- In Proceedings CoopIS/DOA/ODBASE-2005, volume 3760 of LNCS
, 2005
Cited by 4 (0 self)
The relational algebra and calculus do not take the semantics of terms into account when answering queries. As a consequence, not all tuples that should be returned in response to a query are always returned, leading to low recall. In this paper, we propose the novel notion of a constrained probabilistic ontology (CPO). We develop the concept of a CPO-enhanced relation in which each attribute of a relation has an associated CPO. These CPOs describe relationships between terms occurring in the domain of that attribute. We show that the relational algebra can be extended to handle CPO-enhanced relations. This allows queries to yield sets of tuples, each of which has a probability of being correct.
An Intelligent Search Agent System for Semantic Information Retrieval on the Internet
, 2003
Cited by 3 (0 self)
In this paper we describe a prototype system for information retrieval on the Internet. Our idea is that the Web has to be searched both semantically and syntactically. In order to automatically categorize web pages on the fly, we propose a novel approach based on ontology and semantic networks, and we describe a prototype system based on the Intelligent Agent Paradigm. Preliminary experiments are presented and discussed, together with open problems and ongoing research.
An efficient distance calculation method for uncertain objects
- In: Proceedings of 2007 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)
, 2007
Cited by 3 (2 self)
Recently the academic communities have paid more attention to queries and mining on uncertain data. In tasks such as clustering or nearest-neighbor queries, expected distance is often used as a distance measurement among uncertain data objects. Traditional database systems store uncertain objects using their expected (average) location in the data space. Distances can be calculated easily from the expected locations, but they poorly approximate the real expected distance values. Recent research work calculates the expected distance as the weighted average of the pair-wise distances among samples of two uncertain objects. However, the pair-wise distance calculations take much longer than the former method. In this paper, we propose an efficient method, Approximation by Single Gaussian (ASG), to calculate the expected distance as a function of the means and variances of samples of uncertain objects. Theoretical and experimental studies show that ASG combines the latter method's high accuracy with the former method's fast execution time. We suggest that ASG plays an important role in reducing computational costs significantly in query processing and various data mining tasks such as clustering and outlier detection.
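The three ways of measuring distance contrasted in this abstract can be compared in a small sketch. The abstract does not give the ASG formula, so the code below is not ASG: it swaps in the squared Euclidean distance, for which the uniformly weighted pair-wise average equals a closed form in the sample means and population variances. All function names and sample points are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_and_variance(points):
    """Per-dimension mean and population variance (divide by n) of the samples."""
    n = len(points)
    dims = len(points[0])
    mean = [sum(p[d] for p in points) / n for d in range(dims)]
    var = [sum((p[d] - mean[d]) ** 2 for p in points) / n for d in range(dims)]
    return mean, var


def sq_distance_of_means(xs, ys):
    """Distance between expected locations only; ignores the spread of each object."""
    mx, _ = mean_and_variance(xs)
    my, _ = mean_and_variance(ys)
    return sq_dist(mx, my)


def pairwise_expected_sq_distance(xs, ys):
    """Uniformly weighted average over all sample pairs: O(len(xs) * len(ys))."""
    total = sum(sq_dist(x, y) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def moment_expected_sq_distance(xs, ys):
    """Same quantity from means and variances only: O(len(xs) + len(ys))."""
    _, vx = mean_and_variance(xs)
    _, vy = mean_and_variance(ys)
    return sq_distance_of_means(xs, ys) + sum(vx) + sum(vy)


# Hypothetical samples of two uncertain 2-D objects.
OBJECT_A = [(0.0, 0.0), (2.0, 0.0), (1.0, 3.0)]
OBJECT_B = [(4.0, 1.0), (6.0, 5.0)]

if __name__ == "__main__":
    print(sq_distance_of_means(OBJECT_A, OBJECT_B))
    print(pairwise_expected_sq_distance(OBJECT_A, OBJECT_B))
    print(moment_expected_sq_distance(OBJECT_A, OBJECT_B))
```

The moment-based value matches the pair-wise average because the mean of ||x - y||^2 over all pairs expands to the squared gap between the means plus the summed per-dimension variances of both objects. For the plain (unsquared) Euclidean distance no such exact identity holds, which is presumably where an approximation such as ASG comes in.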
A Framework for Management of Semistructured Probabilistic Data
- Journal of Intelligent Information Systems
, 2004
Cited by 3 (1 self)
This paper describes the theoretical framework and implementation of a database management system for storing and manipulating diverse probability distributions and associated information. A formal Semistructured Probabilistic Object (SPO) data model and a Semistructured Probabilistic Query Algebra (SP-algebra) are proposed. The SP-algebra supports standard database queries as well as some specific to probabilities, such as conditionalization and marginalization. Thus, the Semistructured Probabilistic Database may be used as a backend to any application that involves the management of large quantities of probabilistic information, such as building stochastic models. The implementation uses XML encoding of SPOs to facilitate communication with diverse applications. The database management system has been implemented on top of a relational DBMS. The translation of SP-algebra queries into relational queries is discussed here, and the results of initial experiments evaluating the system are reported.
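Marginalization and conditionalization, the two probability-specific operations named in this abstract, can be sketched on a plain joint distribution. This uses the textbook definitions on a Python dictionary and is not the SP-algebra's own definition or the SPO data model; the variable names and probabilities are invented.

```python
# Hypothetical joint distribution over two discrete random variables.
VARIABLES = ("weather", "traffic")
JOINT = {
    ("rain", "heavy"): 0.3,
    ("rain", "light"): 0.1,
    ("sun", "heavy"): 0.2,
    ("sun", "light"): 0.4,
}


def marginalize(joint, variables, keep):
    """Sum out every variable not listed in `keep`."""
    idx = [variables.index(v) for v in keep]
    out = {}
    for assignment, p in joint.items():
        key = tuple(assignment[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out


def conditionalize(joint, variables, var, value):
    """Distribution of the remaining variables given var == value."""
    i = variables.index(var)
    selected = {a: p for a, p in joint.items() if a[i] == value}
    norm = sum(selected.values())
    if norm == 0:
        raise ValueError("conditioning event has probability zero")
    # Drop the conditioned variable from each key and renormalize.
    return {a[:i] + a[i + 1:]: p / norm for a, p in selected.items()}


if __name__ == "__main__":
    print(marginalize(JOINT, VARIABLES, ["weather"]))
    print(conditionalize(JOINT, VARIABLES, "weather", "rain"))
```

Both operations map a distribution to another distribution over fewer variables, which is what lets them sit alongside selection and projection in a query algebra.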
Query algebra operations for interval probabilities
- In Proceedings of the International Conference on Database and Expert Systems Applications (DEXA). Prague, Czech Republic
Cited by 2 (0 self)
The groundswell for the '00s is imprecise probabilities. Whether the numbers represent the probable location of a GPS device at its next sounding, the inherent uncertainty of an individual expert's probability prediction, or the range of values derived from the fusion of sensor data, probability intervals have become an important way of representing uncertainty. However, until recently, there has been no robust support for storage and management of imprecise probabilities. In this paper, we define the semantics of the traditional query algebra operations of selection, projection, Cartesian product and join, as well as an operation of conditionalization, specific to probabilistic databases. We provide efficient methods for computing the results of these operations and show how they conform to probability theory.
Taming Data Explosion in Probabilistic Information Integration
- In Proceedings of the International Workshop on Inconsistency and Incompleteness in Databases (IIDB), March 26, 2006
, 2006
Cited by 1 (0 self)
Data integration has been a challenging problem for decades. In an ambient environment, where many autonomous devices have their own information sources and network connectivity is ad hoc and peer-to-peer, it even becomes a serious bottleneck. To enable devices to exchange information without the need for interaction with a user at data integration time and without the need for extensive semantic annotations, a probabilistic approach seems rather promising. It simply teaches the device how to cope with the uncertainty occurring during data integration. Unfortunately, without any kind of world knowledge, almost everything becomes uncertain, hence maintaining all possibilities produces huge integrated information sources. In this paper, we claim that only very simple and generic rules are enough world knowledge to drastically reduce the amount of uncertainty, hence to tame the data explosion to a manageable size.

