P-Ring: An efficient and robust P2P range index structure
In SIGMOD, 2007
Cited by 33 (1 self)
Peer-to-peer systems have emerged as a robust, scalable and decentralized way to share and publish data. In this paper, we propose P-Ring, a new P2P index structure that supports both equality and range queries. P-Ring is fault-tolerant, provides logarithmic search performance even for highly skewed data distributions and efficiently supports large sets of data items per peer. We experimentally evaluate P-Ring using both simulations and a real distributed deployment on PlanetLab, and we compare its performance with
A scalable P2P platform for the knowledge Grid
IEEE Transactions on Knowledge and Data Engineering, 2005
Cited by 33 (13 self)
The Knowledge Grid needs to operate with a scalable platform to provide large-scale intelligent services. A key function of such a platform is to efficiently support various complex queries in a dynamic large-scale network environment. This paper proposes a platform to support index-based path queries by incorporating a semantic overlay with an underlying structured P2P network that provides object location and management services. Various distributed indexing structures can be dynamically formed by publishing semantic objects as indexing nodes. Queries are forwarded along the chains of semantic object pointers to search for objects. We investigate the deployment of a scalable distributed trie index for broadcast queries on key strings, propose a decentralized load balancing method for solving the problem of uneven load distribution incurred by heterogeneity of loads and node capacities and by the distributed trie index, and give an approach for improving the availability of the semantic overlay and its trie index. Experiments demonstrate the scalability of the proposed platform.
Delay aware querying with Seaweed
In VLDB, 2006
Cited by 28 (2 self)
Large, highly distributed data sets are poorly supported by current query technologies. Applications such as endsystem-based network management are characterized by data stored on large numbers of endsystems, with frequent local updates and relatively infrequent global one-shot queries. The challenges are scale (10^3 to 10^9 endsystems) and endsystem unavailability. In such large systems, a significant fraction of endsystems, and their data, will be unavailable at any given time. Existing methods to provide high data availability despite endsystem unavailability involve centralizing, redistributing or replicating the data. At large scale these methods are not scalable. We advocate a design that trades query delay for completeness, incrementally returning results as endsystems become available. We also introduce the idea of completeness prediction, which provides the user with explicit feedback about this delay/completeness trade-off. Completeness prediction is based on replication of compact data summaries and availability models. This metadata is orders of magnitude smaller than the data. Seaweed is a scalable query infrastructure supporting online aggregation and completeness prediction. Seaweed is built on a distributed hash table (DHT) but, unlike previous DHT-based approaches, it does not redistribute data across the network. It exploits the DHT infrastructure for failure-resilient metadata replication, query dissemination, and result aggregation. We analytically compare Seaweed’s scalability against other approaches and present an evaluation of the Seaweed prototype running on a large-scale network simulator driven by real-world traces.
The Quest for Balancing Peer Load in Structured Peer-To-Peer Systems
2003
Cited by 27 (9 self)
Structured peer-to-peer (P2P) systems are considered as the next generation application backbone on the Internet. An important problem of these systems is load balancing in the presence of non-uniform data distributions. In this paper we propose a completely decentralized mechanism that in parallel addresses a local and a global load balancing problem: (1) balancing the storage load uniformly among peers participating in the network and (2) uniformly replicating different data items in the network while optimally exploiting existing storage capacity. Our approach is based on the P-Grid P2P system which is our variant of a structured P2P network. Problem (1) is solved by directly adapting the search structure to the data distribution. This may result in an unbalanced search structure, but we will show that the expected search cost in P-Grid in number of messages remains logarithmic under all circumstances.
A Practical Scalable Distributed B-Tree
Cited by 24 (3 self)
Internet applications increasingly rely on scalable data structures that must support high throughput and store huge amounts of data. These data structures can be hard to implement efficiently. Recent proposals have overcome this problem by giving up on generality and implementing specialized interfaces and functionality (e.g., Dynamo [4]). We present the design of a more general and flexible solution: a fault-tolerant and scalable distributed B-tree. In addition to the usual B-tree operations, our B-tree provides some important practical features: transactions for atomically executing several operations in one or more B-trees, online migration of B-tree nodes between servers for load-balancing, and dynamic addition and removal of servers for supporting incremental growth of the system. Our design is conceptually simple. Rather than using complex concurrency and locking protocols, we use distributed transactions to make changes to B-tree nodes. We show how to extend the B-tree and keep additional information so that these transactions execute quickly and efficiently. Our design relies on an underlying distributed data sharing service, Sinfonia [1], which provides fault tolerance and a light-weight distributed atomic primitive. We use this primitive to commit our transactions. We implemented our B-tree and show that it performs comparably to an existing open-source B-tree and that it scales to hundreds of machines. We believe that our approach is general and can be used to implement other distributed data structures easily.
LH*lh: A Scalable High Performance Data Structure for Switched Multicomputers
1995
Cited by 21 (6 self)
LH*lh is a new data structure for scalable high-performance hash files on the increasingly popular switched multicomputers, i.e., MIMD multiprocessor machines with distributed RAM memory and without shared memory. An LH*lh file scales up gracefully over available processors and the distributed memory, easily reaching Gbytes. Address calculus does not require any centralized component that could lead to a hot-spot. Access times to the file can be under a millisecond and the file can be used in parallel by several client processors. We show the LH*lh design, and report on the performance analysis. This includes experiments on the Parsytec GC/PowerPlus multicomputer with up to 128 Power PCs and 32 MB of distributed RAM per node. We prove the efficiency of the method and justify various algorithmic choices that were made. LH*lh opens a new perspective for high-performance applications, especially for the database management of new types of data and in real-time environments.
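As a rough illustration of the kind of address calculus the abstract above refers to, the following sketch shows the classic linear-hashing address computation that the LH* family builds on. It is not taken from the LH*lh paper: the function name `lh_address` and its parameters are hypothetical, keys are assumed to be non-negative integers, and the sketch omits how LH* clients keep a possibly outdated local view of the file state and how servers forward misdirected requests.

```python
def lh_address(key: int, level: int, split_pointer: int, initial_buckets: int = 1) -> int:
    """Classic linear-hashing address for a non-negative integer key.

    Hypothetical helper for illustration only, not the LH*lh implementation.
    level         -- current file level i
    split_pointer -- index n of the next bucket to be split
    """
    # First guess: h_i(key) = key mod (N * 2^i)
    address = key % (initial_buckets * 2 ** level)
    # Buckets below the split pointer have already been split in this round,
    # so their keys are placed with the next-level function h_{i+1}.
    if address < split_pointer:
        address = key % (initial_buckets * 2 ** (level + 1))
    return address


if __name__ == "__main__":
    # File at level 2 with one bucket already split: buckets 0..4 exist.
    for key in (4, 7, 8):
        print(key, "->", lh_address(key, level=2, split_pointer=1))
```

Because the result depends only on the key and on two small integers (level and split pointer), any participant holding those two values can compute an address locally, which is the property that makes a central directory unnecessary.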
High-Availability LH* Schemes with Mirroring
1996
Cited by 19 (3 self)
Mirroring is a popular technique for enhancing file availability. We incorporate this technique into the LH* algorithms for scalable distributed linear hash files. Several schemes for mirroring LH* files are presented in this paper. The schemes increase the availability of LH* files in the presence of node failures. Every record remains accessible in the presence of a single node failure, and usually in the presence of multiple-node failures. The price is, as usual, twice as much storage for data, and an increase in the number of messages. The different schemes are characterized by different trade-offs, and they accommodate diverse application requirements. The additional messaging cost per insert is about the same for all the schemes, and is roughly only one message. The cost of a bucket recovery may in contrast vary greatly, from one message for one type of scheme, to a few for another, and many for yet another.
Balanced Distributed Search Trees Do Not Exist
1995
Cited by 19 (1 self)
This paper is a first step towards an understanding of the inherent limitations of distributed data structures. We propose a model of distributed search trees that is based on few natural assumptions. We prove that any class of trees within our model satisfies a lower bound of Ω(√m) on the worst case height of distributed search trees for m keys. That is, unlike in the single site case, balance in the sense that the tree height satisfies a logarithmic upper bound cannot be achieved. This is true although each node is allowed to have arbitrary degree (note that in this case, the height of a single site search tree is trivially bounded by one). By proposing a method that generates trees of height O(√m), we show the bound to be tight.
1 Introduction
Distributed data structures have attracted considerable attention in the past few years. From a practical viewpoint, this is due to the increasing availability of networks of workstations. These networks offer an enormous c...
P-ring: An index structure for peer-to-peer systems
In Cornell Technical Report, 2004
Cited by 16 (6 self)
Current peer-to-peer (P2P) index structures only support a subset of the desired functionality for P2P database systems. For instance, some P2P index structures support equality queries but not range queries, while others support range queries, but do not support multiple data items per peer or provide guaranteed search performance. In this paper, we devise a novel index structure called P-Ring that supports both equality and range queries, is fault-tolerant, provides guaranteed search performance, and efficiently supports large sets of data items per peer. We are not aware of any other existing index structure that supports all of the above functionality in a dynamic P2P environment. In a thorough experimental study we evaluate the performance of P-Ring and quantify the performance trade-offs of the different system components. We also compare P-Ring with two other P2P index structures, Skip Graphs and Chord.
Multifaceted Simultaneous Load Balancing in DHT-based P2P systems: A new game with old balls and bins
Self-* Properties in Complex Information Systems, “Hot Topics” series, LNCS, 2004
Cited by 15 (3 self)
In this paper we present and evaluate uncoordinated on-line algorithms for simultaneous storage and replication load-balancing in DHT-based peer-to-peer systems. We compare our approach with the classical balls into bins model, and point out the similarities but also the differences which call for new load-balancing mechanisms specifically targeted at P2P systems. Some of the peculiarities of P2P systems that make our problem even more challenging are that both the network membership and the data indexed in the network are dynamic, that there is neither global coordination nor global information to rely on, and that the load-balancing mechanism ideally should not compromise the structural properties and thus the search efficiency of the DHT, while preserving the semantic information of the data (e.g., lexicographic ordering to enable range searches).
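For readers unfamiliar with the classical balls into bins model that the abstract above uses as its point of comparison, here is a minimal simulation sketch. It is not the authors' algorithm: the function name `max_load`, its parameters, and the fixed seed are assumptions chosen for illustration. Each ball is thrown into a uniformly random bin, or, when `choices` is greater than 1, into the least loaded of several randomly sampled bins.

```python
import random


def max_load(num_balls: int, num_bins: int, choices: int = 1, seed: int = 0) -> int:
    """Throw balls into bins and return the load of the fullest bin.

    Hypothetical helper illustrating the textbook model only.
    choices=1 is the plain model; choices>1 samples several bins per ball
    and places the ball in the least loaded of them.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    load = [0] * num_bins
    for _ in range(num_balls):
        candidates = [rng.randrange(num_bins) for _ in range(choices)]
        target = min(candidates, key=lambda b: load[b])
        load[target] += 1
    return max(load)


if __name__ == "__main__":
    print("one choice: ", max_load(10_000, 1_000))
    print("two choices:", max_load(10_000, 1_000, choices=2))
```

The model assumes a fixed set of bins and balls of equal size. The abstract's point is that P2P systems break these assumptions, since membership and data change over time and key ordering may need to be preserved.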