Vbi-tree: A peer-to-peer framework for supporting multi-dimensional indexing schemes
- In ICDE
Efficient skyline query processing on peer-to-peer networks
- In IEEE International Conference on Data Engineering (ICDE), 2007
Cited by 41 (6 self)
Skyline queries have been gaining much interest in database research communities in recent years. Most existing studies focus mainly on centralized systems, and resolving the problem in a distributed environment such as a peer-to-peer (P2P) network is still an emerging topic. The desiderata of efficient skyline querying in a P2P environment include: 1) progressive returning of answers, 2) low processing cost in terms of number of peers accessed and search messages, 3) balanced query loads among the peers. In this paper, we propose a solution that satisfies the three desiderata. Our solution is based on a balanced tree-structured P2P network. By partitioning the skyline search space adaptively based on query accessing patterns, we are able to alleviate the problem of “hot” spots in skyline query processing. By being able to estimate the peer nodes within the query subspaces, we are able to control the amount of query forwarding, limiting the number of peers involved and the amount of messages transmitted in the network. Load balancing is achieved through query-load-conscious data space splitting/merging during the joining/departure of nodes and through dynamic load migration. Experiments on real and synthetic datasets confirm the effectiveness and scalability of our algorithm on P2P networks.
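For readers unfamiliar with the term, the following is a minimal sketch of what a skyline query returns. It is a naive centralized computation, not the P2P algorithm proposed in the paper. It assumes that smaller values are better in every dimension; the names `dominates`, `skyline`, and the `hotels` data are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """Return True if a is no worse than b in every dimension and
    strictly better in at least one (assumption: smaller is better)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def skyline(points: List[Point]) -> List[Point]:
    """Naive O(n^2) skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (price, distance) pairs; lower is better for both.
hotels = [(50, 8), (80, 2), (60, 5), (90, 9), (70, 5)]
print(skyline(hotels))  # [(50, 8), (80, 2), (60, 5)]
```

Here (90, 9) is dominated by (50, 8) and (70, 5) is dominated by (60, 5), so neither appears in the result.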
Speeding up search in peer-to-peer networks with a multi-way tree structure
- In Proc. of the 2006 SIGMOD Conf., 2006
Cited by 38 (4 self)
Peer-to-Peer systems have recently become a popular means to share resources. Effective search is a critical requirement in such systems, and a number of distributed search structures have been proposed in the literature. Most of these structures provide “log time search” capability, where the logarithm is taken base 2. That is, in a system with N nodes, the cost of the search is O(log_2 N). In database systems, the importance of large fanout index structures has been well recognized. In P2P search too, the cost could be reduced considerably if this logarithm were taken to a larger base. In this paper, we propose a multi-way tree search structure, which reduces the cost of search to O(log_m N), where m is the fanout. The penalty paid is a larger update cost, but we show how to keep this penalty to be no worse than linear in m. We experimentally explore this trade-off between search and update cost as a function of m, and suggest how to find a good trade-off point. The multi-way tree structure we propose, BATON*, is derived from the BATON structure that has recently been suggested. In addition to multi-way fanout, BATON* also adds support for multi-attribute queries to BATON.
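To make the effect of the logarithm base concrete, the sketch below computes the number of levels needed to cover N nodes at fanout m, that is, the smallest h with m^h >= N. This is only an order-of-magnitude illustration of the asymptotic bound stated in the abstract; the function name `search_hops` is hypothetical, and the constants of the actual BATON* structure are not modeled.

```python
def search_hops(n: int, m: int) -> int:
    """Smallest h such that m**h >= n, i.e. ceil(log_m n).

    Integer arithmetic is used to avoid floating-point rounding at
    exact powers of m. Assumes n >= 1 and m >= 2.
    """
    hops, reach = 0, 1
    while reach < n:
        reach *= m
        hops += 1
    return hops


# One million nodes at a few fanouts.
for m in (2, 4, 10):
    print(m, search_hops(1_000_000, m))  # 2 -> 20, 4 -> 10, 10 -> 6
```

Raising the fanout from 2 to 10 cuts the level count from 20 to 6 in this example, which is the saving that the abstract weighs against the larger update cost.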
P-Ring: An efficient and robust P2P range index structure
- In SIGMOD, 2007
Cited by 33 (1 self)
Peer-to-peer systems have emerged as a robust, scalable and decentralized way to share and publish data. In this paper, we propose P-Ring, a new P2P index structure that supports both equality and range queries. P-Ring is fault-tolerant, provides logarithmic search performance even for highly skewed data distributions and efficiently supports large sets of data items per peer. We experimentally evaluate P-Ring using both simulations and a real distributed deployment on PlanetLab, and we compare its performance with
HotRoD: Load Balancing and Efficient Range Query Processing in Peer-to-Peer Data Networks
Cited by 32 (2 self)
We consider the conflicting problems of ensuring data-access load balancing and efficiently processing range queries on peer-to-peer data networks maintained over Distributed Hash Tables (DHTs). Placing consecutive data values in neighboring peers is frequently used in DHTs since it accelerates range query processing. However, such a placement is highly susceptible to load imbalances, which are preferably handled by replicating data (since replication also introduces fault tolerance benefits). In this paper, we present HotRoD, a DHT-based architecture that deals effectively with this combined problem through the use of a novel locality-preserving hash function, and a tunable data replication mechanism which allows trading off replication costs for fair load distribution. Our detailed experimentation study shows strong gains in both range query processing efficiency and data-access load balancing, with low replication overhead. To our knowledge, this is the first work that concurrently addresses the two conflicting problems using data replication.
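The idea of placing consecutive values on neighboring peers can be illustrated with a generic order-preserving mapping from an attribute domain onto an identifier space. The sketch below is a plain linear mapping and is an assumption made for illustration; it is not the hash function defined in the HotRoD paper, and the name `order_preserving_id` and its parameters are hypothetical.

```python
def order_preserving_id(value: float, lo: float, hi: float, bits: int = 16) -> int:
    """Map a value in [lo, hi] linearly onto the ID space [0, 2**bits - 1].

    The mapping is monotone: v1 <= v2 implies id(v1) <= id(v2), so a value
    range corresponds to one contiguous arc of identifiers.
    Values outside [lo, hi] are clamped. Assumes hi > lo.
    """
    span = (1 << bits) - 1
    clamped = min(max(value, lo), hi)
    return int((clamped - lo) / (hi - lo) * span)


# A range query [25, 50] over a domain [0, 100] touches one contiguous ID arc.
start = order_preserving_id(25, 0, 100, bits=8)
end = order_preserving_id(50, 0, 100, bits=8)
print(start, end)  # 63 127
```

Because the mapping keeps the value order, a skewed value distribution translates directly into a skewed ID distribution, which is the load imbalance the abstract describes.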
PastryStrings: A comprehensive content-based publish/subscribe DHT network
- In Proceedings of the 26th IEEE International Conference on Distributed Computing Systems (ICDCS’06), 2006
Cited by 29 (3 self)
In this work we propose and develop a comprehensive infrastructure, coined PastryStrings, for supporting rich queries on both numerical attributes (with range and comparison predicates) and string attributes (accommodating equality, prefix, suffix, and containment predicates) over DHT networks utilising prefix-based routing. As event-based, publish/subscribe information systems are a champion application class, we formulate our solution in terms of this environment.
Delay aware querying with Seaweed
- In VLDB, 2006
Cited by 28 (2 self)
Large highly distributed data sets are poorly supported by current query technologies. Applications such as endsystem-based network management are characterized by data stored on large numbers of endsystems, with frequent local updates and relatively infrequent global one-shot queries. The challenges are scale (10^3 to 10^9 endsystems) and endsystem unavailability. In such large systems, a significant fraction of endsystems, and their data, will be unavailable at any given time. Existing methods to provide high data availability despite endsystem unavailability involve centralizing, redistributing or replicating the data. At large scale these methods are not scalable. We advocate a design that trades query delay for completeness, incrementally returning results as endsystems become available. We also introduce the idea of completeness prediction, which provides the user with explicit feedback about this delay/completeness trade-off. Completeness prediction is based on replication of compact data summaries and availability models. This metadata is orders of magnitude smaller than the data. Seaweed is a scalable query infrastructure supporting online aggregation and completeness prediction. Seaweed is built on a distributed hash table (DHT) but, unlike previous DHT-based approaches, it does not redistribute data across the network. It exploits the DHT infrastructure for failure-resilient metadata replication, query dissemination, and result aggregation. We analytically compare Seaweed’s scalability against other approaches and present an evaluation of the Seaweed prototype running on a large-scale network simulator driven by real-world traces.
Angle-based space partitioning for efficient parallel skyline computation
- In SIGMOD Conference, 2008
Cited by 28 (6 self)
Recently, skyline queries have attracted much attention in the database research community. Space partitioning techniques, such as recursive division of the data space, have been used for skyline query processing in centralized, parallel and distributed settings. Unfortunately, such grid-based partitioning is not suitable in the case of a parallel skyline query, where all partitions are examined at the same time, since many data partitions do not contribute to the overall skyline set, resulting in a lot of redundant processing. In this paper we propose a novel angle-based space partitioning scheme using the hyperspherical coordinates of the data points. We demonstrate both formally as well as through an exhaustive set of experiments that this new scheme is very suitable for skyline query processing in a parallel share-nothing architecture. The intuition of our partitioning technique is that the skyline points are equally spread to all partitions. We also show that partitioning the data according to the hyperspherical coordinates manages to increase the average pruning power of points within a partition. Our novel partitioning scheme alleviates most of the problems of traditional grid partitioning techniques, thus managing to reduce the response time and share the computational workload more fairly. As demonstrated by our experimental study, our technique outperforms grid partitioning in all cases, thus becoming an efficient and scalable solution for skyline query processing in parallel environments.
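A minimal two-dimensional sketch of the angle-based idea follows. In 2-D the hyperspherical coordinates of a point reduce to a radius and a single angle, so partitioning by angle means assigning each point to an angular sector of the first quadrant. The equal-width sectors and the name `angle_partition` are assumptions made for illustration; the paper's own choice of partition boundaries is not reproduced here.

```python
import math
from typing import List, Tuple

Point2D = Tuple[float, float]


def angle_partition(points: List[Point2D], k: int) -> List[List[Point2D]]:
    """Assign each 2-D point with non-negative coordinates to one of k
    equal-width angular sectors of [0, pi/2] (assumed boundaries)."""
    parts: List[List[Point2D]] = [[] for _ in range(k)]
    for x, y in points:
        phi = math.atan2(y, x)  # angle from the x-axis, in [0, pi/2] here
        # Scale the angle to a sector index; clamp so phi == pi/2 lands in the last sector.
        idx = min(int(phi / (math.pi / 2) * k), k - 1)
        parts[idx].append((x, y))
    return parts


# Hypothetical points near the x-axis, the diagonal, and the y-axis.
pts = [(1.0, 0.1), (1.0, 1.0), (0.1, 1.0)]
print(angle_partition(pts, 3))
# [[(1.0, 0.1)], [(1.0, 1.0)], [(0.1, 1.0)]]
```

Each sector spans the data space from the origin outward, so every partition contains points close to the origin as well as points far from it. That is the intuition behind the abstract's claim that skyline points spread across all partitions, in contrast to a grid, where cells far from the origin may contribute nothing.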
Indexing Multi-dimensional Data in a Cloud System
Cited by 28 (6 self)
Providing scalable database services is an essential requirement for extending many existing applications of the Cloud platform. Due to the diversity of applications, database services on the Cloud must support large-scale data analytical jobs and highly concurrent OLTP queries. Most existing work focuses on some specific type of applications. To provide an integrated framework, we are designing a new system, epiC, as our solution to next-generation database systems. In epiC, indexes play an important role in improving overall performance. Different types of indexes are built to provide efficient query processing for different applications. In this paper, we propose RT-CAN, a multi-dimensional indexing scheme in epiC. RT-CAN integrates a CAN [23]-based routing protocol and an R-tree-based indexing scheme to support efficient multi-dimensional query processing in a Cloud system. RT-CAN organizes storage and compute nodes into an overlay structure based on an extended CAN protocol. In our proposal, we make a simple assumption that each compute node uses an R-tree-like indexing structure to index the data that are locally stored. We propose a query-conscious cost model that selects beneficial local R-tree nodes for publishing. By keeping the number of persistently connected nodes small and maintaining a global multi-dimensional search index, we can locate the compute nodes that may contain the answer with a few hops, making the scheme scalable in terms of data volume and number of compute nodes. Experiments on Amazon’s EC2 show that our proposed routing protocol and indexing scheme are robust, efficient and scalable.
Effective keyword-based selection of relational databases
- In Proceedings of SIGMOD, 2007
Cited by 23 (7 self)
over World Wide Web has fueled the demand for incorporating keyword-based search over structured databases. However, most of the current research work focuses on keyword-based searching over a single structured data source. With the growing interest in distributed databases and service-oriented architecture over the Internet, it is important to extend such a capability over multiple structured data sources. One of the most important problems for enabling such a query facility is to be able to select the most useful data sources relevant to the keyword query. Traditional database summary techniques used for selecting unstructured data sources developed in IR literature are inadequate for our problem, as they do not capture the structure of the data sources. In this paper, we study the database selection problem for relational data sources, and propose a method that effectively summarizes the relationships between keywords in a relational database based on its structure. We develop effective ranking methods based on the keyword relationship summaries in order to select the most useful databases for a given keyword query. We have implemented our system on PlanetLab. In that environment, we use extensive experiments with real datasets to demonstrate the effectiveness of our proposed summarization method.