Results 1 - 10 of 23
A Probabilistic Relational Algebra for the Integration of Information Retrieval and Database Systems
- ACM Transactions on Information Systems
, 1994
Abstract - Cited by 212 (34 self)
We present a probabilistic relational algebra (PRA) which is a generalization of standard relational algebra. Here tuples are assigned probabilistic weights giving the probability that a tuple belongs to a relation. Based on intensional semantics, the tuple weights of the result of a PRA expression always conform to the underlying probabilistic model. We also show for which expressions extensional semantics yields the same results. Furthermore, we discuss complexity issues and indicate possibilities for optimization. With regard to databases, the approach allows for representing imprecise attribute values, whereas for information retrieval, probabilistic document indexing and probabilistic search term weighting can be modelled. As an important extension, we introduce the concept of vague predicates which yields a probabilistic weight instead of a Boolean value, thus allowing for queries with vague selection conditions. So PRA implements uncertainty and vagueness in combination with the...
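The core idea of weighted tuples can be sketched in a few lines, assuming independent tuple events (the case where the abstract's extensional and intensional semantics coincide); the relation contents and helper names below are illustrative, not from the paper:

```python
# Minimal sketch of probabilistic relational operators over weighted
# tuples, assuming independence of tuple events (illustrative only).

def pjoin(r, s):
    """Natural join on shared attributes; weights multiply under independence."""
    out = []
    for (tr, wr) in r:
        for (ts, ws) in s:
            shared = set(tr) & set(ts)
            if all(tr[k] == ts[k] for k in shared):
                out.append(({**tr, **ts}, wr * ws))
    return out

def pproject(r, attrs):
    """Project onto attrs; duplicates merged by inclusion-exclusion."""
    merged = {}
    for (t, w) in r:
        key = tuple((a, t[a]) for a in attrs)
        old = merged.get(key, 0.0)
        merged[key] = old + w - old * w
    return [(dict(k), w) for k, w in merged.items()]

# Probabilistic document indexing: weight = P(term describes document).
docs = [({"doc": "d1", "term": "ir"}, 0.8), ({"doc": "d2", "term": "ir"}, 0.5)]
query = [({"term": "ir"}, 1.0)]
print(pproject(pjoin(docs, query), ["doc"]))
```

A vague predicate would slot in as a selection that multiplies each tuple's weight by a value in [0, 1] instead of keeping or discarding it.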
Automated ranking of database query results
- In CIDR
, 2003
Abstract - Cited by 118 (11 self)
We investigate the problem of ranking answers to a database query when many tuples are returned. We adapt and apply principles of probabilistic models from Information Retrieval for structured data. Our proposed solution is domain independent. It leverages data and workload statistics and correlations. Our ranking functions can be further customized for different applications. We present results of preliminary experiments which demonstrate the efficiency as well as the quality of our ranking system.
Selectivity Estimation using Probabilistic Models
, 2001
Abstract - Cited by 98 (3 self)
Estimating the result size of complex queries that involve selection on multiple attributes and the join of several relations is a difficult but fundamental task in database query processing. It arises in cost-based query optimization, query profiling, and approximate query answering. In this paper, we show how probabilistic graphical models can be effectively used for this task as an accurate and compact approximation of the joint frequency distribution of multiple attributes across multiple relations. Probabilistic Relational Models (PRMs) are a recent development that extends graphical statistical models such as Bayesian Networks to relational domains. They represent the statistical dependencies between attributes within a table, and between attributes across foreign-key joins. We provide an efficient algorithm for constructing a PRM from a database, and show how a PRM can be used to compute selectivity estimates for a broad class of queries. One of the major contributions of this work is a unified framework for the estimation of queries involving both select and foreign-key join operations. Furthermore, our approach is not limited to answering a small set of predetermined queries; a single model can be used to effectively estimate the sizes of a wide collection of potential queries across multiple tables. We present results for our approach on several real-world databases. For both single-table multi-attribute queries and a general class of select-join queries, our approach produces more accurate estimates than standard approaches to selectivity estimation, using comparable space and time.
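The gap this abstract targets, between the usual attribute-value-independence estimate and the true joint frequency that a graphical model approximates, shows up even in a toy relation (the data and names below are ours, not the paper's):

```python
from collections import Counter

# Toy relation with correlated attributes (color, size); illustrative only.
rows = [("red", "S"), ("red", "S"), ("red", "L"), ("blue", "S"),
        ("blue", "L"), ("blue", "L"), ("blue", "L"), ("red", "S")]
n = len(rows)

# Attribute-value-independence estimate, as in standard optimizers:
# |color=red AND size=S| ~= sel(red) * sel(S) * n
color = Counter(c for c, _ in rows)
size = Counter(s for _, s in rows)
indep = (color["red"] / n) * (size["S"] / n) * n

# The true joint frequency, which a PRM approximates compactly rather
# than storing outright:
joint = Counter(rows)
exact = joint[("red", "S")]

print(indep, exact)  # independence underestimates the correlated pair
```

The point of a PRM is to capture such correlations (including across foreign-key joins) without materializing the full joint table.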
"Is This Document Relevant? ...Probably": A Survey of Probabilistic Models in Information Retrieval
, 2001
Abstract - Cited by 71 (15 self)
This article surveys probabilistic approaches to modeling information retrieval. The basic concepts of probabilistic approaches to information retrieval are outlined and the principles and assumptions upon which the approaches are based are presented. The various models proposed in the development of IR are described, classified, and compared using a common formalism. New approaches that constitute the basis of future research are described.
Probabilistic information retrieval approach for ranking of database query results
- ACM Transactions on Database Systems (TODS)
, 2006
Abstract - Cited by 45 (8 self)
We investigate the problem of ranking the answers to a database query when many tuples are returned. In particular, we present methodologies to tackle the problem for conjunctive and range queries, by adapting and applying principles of probabilistic models from Information Retrieval for structured data. Our solution is domain independent and leverages data and workload statistics and correlations. We evaluate the quality of our approach with a user survey on a real database. Furthermore, we present and experimentally evaluate algorithms to efficiently retrieve the top ranked results, which demonstrate the feasibility of our ranking system.
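One simple instance of "adapting IR principles for structured data" is to rank the many answers of a broad query by how distinctive their unspecified attribute values are, an IDF-like surrogate for the workload statistics the paper actually uses; the table, attributes, and scoring function below are our illustration, not the paper's model:

```python
import math

# Toy table of cars; a query "type = suv" returns many tuples.
rows = [
    {"type": "suv", "color": "black", "awd": "yes"},
    {"type": "suv", "color": "black", "awd": "no"},
    {"type": "suv", "color": "green", "awd": "no"},
    {"type": "sedan", "color": "black", "awd": "no"},
]

def global_score(row, data):
    """Score an answer by the rarity of its attribute values in the
    whole table (log n/freq, an IDF-like weight)."""
    n = len(data)
    score = 0.0
    for attr, val in row.items():
        freq = sum(1 for r in data if r[attr] == val)
        score += math.log(n / freq)
    return score

answers = [r for r in rows if r["type"] == "suv"]
ranked = sorted(answers, key=lambda r: global_score(r, rows), reverse=True)
```

Answers with common values everywhere sink to the bottom; answers with rare values (AWD, an unusual color) rise to the top.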
Towards automatic association of relevant unstructured content with structured query results
- In CIKM
, 2005
Abstract - Cited by 13 (3 self)
Faced with growing knowledge management needs, enterprises are increasingly realizing the importance of seamlessly integrating critical business information distributed across both structured and unstructured data sources. In existing information integration solutions, the application needs to formulate the SQL logic to retrieve the needed structured data on one hand, and identify a set of keywords to retrieve the related unstructured data on the other. This paper proposes a novel approach wherein the application specifies its information needs using only a SQL query on the structured data, and this query is automatically “translated” into a set of keywords that can be used to retrieve relevant unstructured data. We describe the techniques used for obtaining these keywords from (i) the query result, and (ii) additional related information in the underlying database. We further show that these techniques achieve high accuracy with very reasonable overheads.
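A minimal sketch of extracting keywords from a query result, under the simple assumption (ours, not necessarily the paper's scoring) that good keywords are values frequent in the result but rare in the table overall:

```python
from collections import Counter

# Hypothetical structured table and the result of some SQL query over it.
table = [("Acme", "turbine"), ("Acme", "pump"),
         ("Beta", "pump"), ("Acme", "valve")]
result = [("Acme", "turbine"), ("Acme", "valve")]

table_freq = Counter(v for row in table for v in row)
result_freq = Counter(v for row in result for v in row)

# Rank candidate keywords by how concentrated they are in the result.
keywords = sorted(result_freq,
                  key=lambda v: result_freq[v] / table_freq[v],
                  reverse=True)
```

Values like "turbine" and "valve", which occur only inside the result, outrank a value like "Acme" that is common table-wide and would retrieve mostly unrelated documents.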
Document Retrieval Facilities for Repository-Based System Development Environments
, 1996
Abstract - Cited by 10 (7 self)
Modern system development environments usually deploy the object management facilities of a so-called repository to store the documents created and maintained during system development. PCTE is the ISO and ECMA standard for a public tool interface for an open repository [23]. In this paper we present document retrieval extensions for an OQL-oriented query language for PCTE. The extensions proposed cover (1) pattern matching, (2) term based document retrieval with automatically generated document description vectors, (3) the flexible definition of what is addressed as a "document" in a given query, and (4) the integration of these facilities into a CASE tool. Whereas the integration of pattern matching facilities into query languages has been addressed by other authors before, the main contribution of our approach is the homogeneous integration of term based document retrieval and the flexible definition of documents. 1 Introduction Repository-based applications are in wide-spread use i...
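The "automatically generated document description vectors" of item (2) are, in the standard construction, term-frequency vectors compared by cosine similarity; the following sketch illustrates that construction only, with document text and names of our own choosing:

```python
import math
from collections import Counter

def tf_vector(text):
    """Term-frequency description vector for a document."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

doc = tf_vector("design document for the repository query tool")
query = tf_vector("repository query")
score = cosine(doc, query)
```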
Information retrieval on empty fields
- In HLT 2007
, 2007
Abstract - Cited by 8 (3 self)
We explore the problem of retrieving semi-structured documents from a real-world collection using a structured query. We formally develop Structured Relevance Models (SRM), a retrieval model that is based on the idea that plausible values for a given field could be inferred from the context provided by the other fields in the record. We then carry out a set of experiments using a snapshot of the National Science Digital Library (NSDL) repository, and queries that only mention fields missing from the test data. For such queries, typical field matching would retrieve no documents at all. In contrast, the SRM approach achieves a mean average precision of over twenty percent.
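The inference idea, plausible values for an empty field estimated from the other fields, can be illustrated with plain co-occurrence counting; this is a toy stand-in for SRM, with records and field names invented for the example:

```python
from collections import Counter

# Training records where both fields are populated; in the test data the
# "audience" field is empty, so we infer it from the "subject" context
# (and vice versa).
train = [
    {"subject": "physics", "audience": "graduate"},
    {"subject": "physics", "audience": "graduate"},
    {"subject": "poetry",  "audience": "graduate"},
    {"subject": "poetry",  "audience": "k-12"},
]

def plausible(field, given_field, given_value, records):
    """Empirical P(field = v | given_field = given_value)."""
    matching = [r[field] for r in records if r[given_field] == given_value]
    counts = Counter(matching)
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()}

dist = plausible("subject", "audience", "graduate", train)
print(dist)
```

A record whose empty field has high inferred probability for the queried value can then be ranked highly even though exact field matching would miss it entirely.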
Efficient Transaction Support for Dynamic Information Retrieval Systems
- In Proc. of ACM SIGIR
, 1996
Abstract - Cited by 5 (0 self)
To properly handle concurrent accesses to documents by updates and queries in information retrieval (IR) systems, efforts are underway to integrate IR features with database management system (DBMS) features. However, initial research has revealed that DBMS features optimized for traditional databases display degraded performance while handling text databases. Since efficiency is critical in IR systems, infrastructural extensions are necessary for several DBMS features, transaction support being one of them. This paper focuses on developing efficient transaction support for IR systems where updates and queries arrive dynamically, by exploiting the data characteristics of the indexes as well as of the queries and updates that access the indexes. Results of performance tests on a prototype system demonstrate the superior performance of our algorithms. Keywords: Concurrency Control, Recovery, Transaction Management, Index Management, Information Retrieval, Optimization, Performance, Digital L...
Query Evaluation with Soft-Key Constraints
Abstract - Cited by 5 (0 self)
Key violations often occur in real-life datasets, especially in those integrated from different sources. Enforcing constraints strictly on these datasets is not feasible. In this paper we formalize the notion of soft-key constraints on probabilistic databases, which allow for violations of key constraints by penalizing every violating world by a quantity proportional to the violation. To represent our probabilistic database with constraints, we define a class of Markov networks in which query evaluation can be done in PTIME. We also study the evaluation of conjunctive queries on relations with soft keys and present a dichotomy that separates this set into those in PTIME and the rest, which are #P-hard.
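One concrete reading of "penalizing every violating world" is a possible-worlds semantics where each world's weight shrinks by a factor exp(-lam) per key-violating pair; the tuples, the penalty parameter, and this particular factorization are our illustration, not the paper's Markov-network encoding:

```python
import math
from itertools import chain, combinations

# Candidate tuples (key attribute first); "1" appears twice, a soft-key
# violation that is penalized rather than forbidden.
tuples = [("1", "Alice"), ("1", "Alicia"), ("2", "Bob")]
lam = 2.0  # assumed penalty strength

def worlds(ts):
    """All possible worlds: every subset of the candidate tuples."""
    return chain.from_iterable(combinations(ts, k) for k in range(len(ts) + 1))

def weight(world):
    """Unnormalized weight: exp(-lam) per pair sharing a key."""
    viol = sum(1 for i in range(len(world)) for j in range(i + 1, len(world))
               if world[i][0] == world[j][0])
    return math.exp(-lam * viol)

z = sum(weight(w) for w in worlds(tuples))

# Marginal probability that some key-violating pair co-exists:
p_viol = sum(weight(w) for w in worlds(tuples)
             if any(w[i][0] == w[j][0]
                    for i in range(len(w)) for j in range(i + 1, len(w)))) / z
```

With lam = 0 this degenerates to a uniform distribution over worlds; as lam grows, violating worlds become vanishingly likely, recovering a hard key constraint in the limit.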