The state of the art in distributed query processing
- ACM Computing Surveys, 2000
Cited by 320 (3 self)
Distributed data processing is fast becoming a reality. Businesses want to have it for many reasons, and they often must have it in order to stay competitive. While much of the infrastructure for distributed data processing is already in place (e.g., modern network technology), there are a number of issues which still make distributed data processing a complex undertaking: (1) distributed systems can become very large, involving thousands of heterogeneous sites including PCs and mainframe server machines; (2) the state of a distributed system changes rapidly because the load of sites varies over time and new sites are added to the system; (3) legacy systems need to be integrated; such legacy systems usually have not been designed for distributed data processing and now need to interact with other (modern) systems in a distributed environment. This paper presents the state of the art of query processing for distributed database and information systems. The paper presents the "textbook" architecture for distributed query processing and a series of techniques that are particularly useful for distributed database systems. These techniques include special join techniques, techniques to exploit intra-query parallelism, techniques to reduce communication costs, and techniques to exploit caching and replication of data. Furthermore, the paper discusses different kinds of distributed systems such as client-server, middleware (multi-tier), and heterogeneous database systems and shows how query processing works in these systems.
Categories and subject descriptors: E.5 [Data]: Files; H.2.4 [Database Management Systems]: distributed databases, query processing; H.2.5 [Heterogeneous Databases]: data translation
General terms: algorithms; performance
Additional key words and phrases: query optimization; query execution; client-server databases; middleware; multi-tier architectures; database application systems; wrappers; replication; caching; economic models for query processing; dissemination-based information systems
Efficient IR-Style Keyword Search over Relational Databases
- In VLDB, 2003
Cited by 211 (10 self)
Applications in which plain text coexists with structured data are pervasive. Commercial relational database management systems (RDBMSs) generally provide querying capabilities for text attributes that incorporate state-of-the-art information retrieval (IR) relevance ranking strategies, but this search functionality requires that queries specify the exact column or columns against which a given list of keywords is to be matched.
XSEarch: A Semantic Search Engine for XML
- In VLDB, 2003
Cited by 178 (6 self)
XSEarch, a semantic search engine for XML, is presented. XSEarch has a simple query language, suitable for a naive user. It returns semantically related document fragments that satisfy the user's query. Query answers are ranked using extended information-retrieval techniques and are generated in an order similar to the ranking. Advanced indexing techniques were developed to facilitate efficient implementation of XSEarch. The performance of the different techniques as well as the recall and the precision were measured experimentally.
Efficient Keyword Search for Smallest LCAs in XML Databases
- In SIGMOD Conference, 2005
Cited by 164 (7 self)
Keyword search is a proven, user-friendly way to query HTML documents in the World Wide Web. We propose keyword search in XML documents, modeled as labeled trees, and describe corresponding efficient algorithms. The proposed keyword search re-
Blinks: Ranked keyword searches on graphs
- 2007
Cited by 139 (9 self)
Query processing over graph-structured data is enjoying a growing number of applications. A top-k keyword search query on a graph finds the top k answers according to some ranking criteria, where each answer is a substructure of the graph containing all query keywords. Current techniques for supporting such queries on general graphs suffer from several drawbacks, e.g., poor worst-case performance, not taking full advantage of indexes, and high memory requirements. To address these problems, we propose BLINKS, a bi-level indexing and query processing scheme for top-k keyword search on graphs. BLINKS follows a search strategy with provable performance bounds, while additionally exploiting a bi-level index for pruning and accelerating the search. To reduce the index space, BLINKS partitions a data graph into blocks: The bi-level index stores summary information at the block level to initiate and guide search among blocks, and more detailed information for each block to accelerate search within blocks. Our experiments show that BLINKS offers orders-of-magnitude performance improvement over existing approaches.
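To make the problem statement above concrete, here is a minimal, index-free baseline for top-k keyword search on a directed graph. It is not the BLINKS algorithm and uses no bi-level index or block partitioning. It assumes, purely for illustration, that an answer is identified by a root node that can reach at least one node for every keyword, and that answers are ranked by the sum of hop distances from the root to the nearest match of each keyword. The names `keyword_distances`, `top_k_roots`, `graph`, and `node_text` are hypothetical.

```python
import heapq
from collections import deque


def keyword_distances(graph, sources):
    """Multi-source BFS over reversed edges.

    Returns {node: hops from node to its nearest source}, containing only
    the nodes that can reach some source by following forward edges.
    """
    # Build the reversed adjacency list once per call (fine for a sketch).
    reverse = {}
    for u, targets in graph.items():
        reverse.setdefault(u, [])
        for v in targets:
            reverse.setdefault(v, []).append(u)

    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        v = queue.popleft()
        for u in reverse.get(v, []):
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def top_k_roots(graph, node_text, keywords, k):
    """Return up to k (score, root) pairs, lowest score first.

    A root qualifies if it reaches a match for every keyword; its score is
    the sum of its distances to the nearest match of each keyword
    (an assumed ranking function, chosen only for this example).
    """
    if not keywords:
        return []
    per_keyword = []
    for kw in keywords:
        sources = [n for n, words in node_text.items() if kw in words]
        per_keyword.append(keyword_distances(graph, sources))

    # Only nodes reached by every backward search can be answer roots.
    candidates = set(per_keyword[0])
    for dist in per_keyword[1:]:
        candidates &= set(dist)

    scored = [(sum(d[n] for d in per_keyword), n) for n in candidates]
    return heapq.nsmallest(k, scored)
```

This baseline runs one full backward search per keyword over the whole graph, which is the kind of cost that indexing schemes such as the one described above aim to avoid.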
Effective keyword search in relational databases
- In SIGMOD, 2006
Cited by 93 (0 self)
With the amount of available text data in relational databases growing rapidly, the need for ordinary users to search such information is dramatically increasing. Even though the major RDBMSs have provided full-text search capabilities, they still require users to have knowledge of the database schemas and use a structured query language to search information. This search model is complicated for most ordinary users. Inspired by the big success of information retrieval (IR) style keyword search on the web, keyword search in relational databases has recently emerged as a new research topic. The differences between text databases and relational databases result in three new challenges: (1) Answers needed by users are not limited to individual tuples, but results assembled from joining tuples from multiple tables are used to form answers in the form of tuple trees. (2) A single score for each answer (i.e. a tuple tree) is needed to estimate its relevance to a given query. These scores are used to rank the most relevant answers as high as possible. (3) Relational databases have much richer structures than text databases. Existing IR strategies are inadequate in ranking relational outputs. In this paper, we propose a novel IR ranking strategy for effective keyword search. We are the first to conduct comprehensive experiments on search effectiveness using a real world database and a set of keyword queries collected by a major search company. Experimental results show that our strategy is significantly better than existing strategies. Our approach can be used both at the application level and be incorporated into an RDBMS to support keyword-based search in relational databases.
An XML Query Engine for Network-Bound Data
- 2001
Cited by 76 (9 self)
XML has become the lingua franca for data exchange and integration across administrative and enterprise boundaries. Nearly all data providers are adding XML import or export capabilities, and standard XML Schemas and DTDs are being promoted for all types of data sharing. The ubiquity of XML has removed one of the major obstacles to integrating data from widely disparate sources -- namely, the heterogeneity of data formats. However, general-purpose integration of data across the wide area also requires a query processor that can query data sources on demand, receive streamed XML data from them, and combine and restructure the data into new XML output --- while providing good performance for both batch-oriented and ad-hoc, interactive queries. This is the goal of the Tukwila data integration system, the first system that focuses on network-bound, dynamic XML data sources. In contrast to previous approaches, which must read, parse, and often store entire XML objects before querying them, Tukwila can return query results even as the data is streaming into the system. Tukwila is built with a new system architecture that extends adaptive query processing and relational-engine techniques into the XML realm, as facilitated by a pair of operators that incrementally evaluate a query's input path expressions as data is read. In this paper, we describe the Tukwila architecture and its novel aspects, and we experimentally demonstrate that Tukwila provides better overall query performance and faster initial answers than existing systems, and has excellent scalability.
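The idea of evaluating a path expression incrementally while XML is still arriving can be illustrated with a small sketch built on Python's standard `xml.etree.ElementTree.XMLPullParser`. This is not Tukwila's operator design; it only assumes a simple root-to-element tag path (no predicates or wildcards), and the names `stream_matches`, `chunks`, and `path` are hypothetical. How early a match surfaces depends on the underlying parser's buffering.

```python
import xml.etree.ElementTree as ET


def stream_matches(chunks, path):
    """Yield the text of every element whose root-to-element tag path equals
    `path` (a list of tag names), without waiting for the whole document."""
    parser = ET.XMLPullParser(events=("start", "end"))
    stack = []  # tags of the currently open elements, root first

    def drain():
        # Process whatever parse events are available so far.
        for event, elem in parser.read_events():
            if event == "start":
                stack.append(elem.tag)
            else:  # "end": the element and its text are complete
                if stack == path:
                    yield elem.text
                stack.pop()
                elem.clear()  # release the finished element's text and children

    for chunk in chunks:
        parser.feed(chunk)  # hand over data as it arrives from the source
        yield from drain()

    parser.close()  # flush anything the parser was still buffering
    yield from drain()
```

Because `stream_matches` is a generator, a consumer can start working on early matches while later chunks are still being fetched, for example `for title in stream_matches(chunks, ["books", "book", "title"]): ...`.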
Top-k Exploration of Query Candidates for Efficient Keyword Search on Graph-Shaped (RDF) Data
- IEEE International Conference on Data Engineering, 2009
Cited by 62 (6 self)
Keyword queries enjoy widespread usage as they represent an intuitive way of specifying information needs. Recently, answering keyword queries on graph-structured data has emerged as an important research topic. The prevalent approaches build on dedicated indexing techniques as well as search algorithms aiming at finding substructures that connect
Integrating DB and IR technologies: What is the sound of one hand clapping
- In CIDR, 2005
Cited by 42 (0 self)
Databases (DB) and information retrieval (IR) have evolved as separate fields. However, modern applications such as customer support, health care, and digital libraries require capabilities for both data and text management. In such settings, traditional DB queries, in SQL or XQuery, are not flexible enough to handle application-specific scoring and ranking. IR systems, on the other hand, lack efficient support for handling structured parts of the data and metadata, and do not give the application developer adequate control over the ranking function. This paper analyzes the requirements of advanced text- and data-rich applications for an integrated platform. The core functionality must be manageable, and the API should be easy to program against. A particularly important issue that we highlight is how to reconcile flexibility in scoring and ranking models with optimizability, in order to accommodate a wide variety of target applications efficiently. We discuss whether such a system needs to be designed from scratch, or can be incrementally built on top of existing architectures. The results of our analyses are cast into a series of challenges to the DB and IR communities.
Agora: Living with XML and Relational
- In Proceedings of International Conference on Very Large Databases (VLDB), 2000
Cited by 31 (2 self)
There has been a significant body of research in the last fifteen years dedicated to integration of data from various repositories, exhibiting heterogeneous formats, and sometimes access restrictions; for a survey of such systems see, for example, [12]. The main technical issues to be addressed in a mediation system are: how to semantically unify heterogeneous data formats and schemas, and how to use query processing capabilities of participant data sites and that of the mediator in order to answer a particular query. Systems like the Information Manifold, and Garlic from IBM have chosen the relational and respectively the object-oriented model as the integration model. Given the popularity of XML as a data description format, more and more DBMS manufacturers have added to their systems the capability to export relational or object-oriented data to an XML format; other data formats (flat data files, regular HTML, PowerPoint presentations, annotated text) are also