Results 1 - 10 of 95
P-Grid: A self-organizing access structure for P2P information systems
- In CoopIS, 2001
"... Peer-To-Peer systems are driving a major paradigm shift in the era of genuinely distributed computing. Gnutella is a good example of a Peer-To-Peer success story: a rather simple software enables Internet users to freely exchange files, such as MP3 music files. But it shows up also some of the limit ..."
Abstract
-
Cited by 304 (49 self)
- Add to MetaCart
(Show Context)
Abstract: Peer-To-Peer systems are driving a major paradigm shift in the era of genuinely distributed computing. Gnutella is a good example of a Peer-To-Peer success story: a rather simple piece of software enables Internet users to freely exchange files, such as MP3 music files. But it also exposes some of the limitations of current P2P information systems with respect to their ability to manage data efficiently. In this paper we introduce P-Grid, a scalable access structure that is specifically designed for Peer-To-Peer information systems. P-Grids are constructed and maintained by using randomized algorithms strictly based on local interactions, provide reliable data access even with unreliable peers, and scale gracefully both in storage and communication cost.
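The abstract's routing idea (each peer covers a binary key prefix and keeps, for every bit of that prefix, references to peers responsible for the complementary side) can be illustrated with a minimal single-process sketch. The class layout, field names, and random choice of reference are assumptions for illustration, not the paper's exact protocol.

```python
import random

class Peer:
    """One peer in a P-Grid-style binary trie overlay (simplified)."""
    def __init__(self, path):
        self.path = path    # binary prefix this peer is responsible for, e.g. "010"
        self.refs = {}      # level -> peers whose prefix has the complementary bit at that level
        self.data = {}      # keys (bit strings) that fall under this peer's prefix

    def lookup(self, key):
        """Route a query toward the peer whose prefix matches the key."""
        for level, bit in enumerate(self.path):
            if key[level] != bit:
                # The key diverges from our prefix here: forward to a randomly
                # chosen reference responsible for the other side of this bit.
                return random.choice(self.refs[level]).lookup(key)
        return self.data.get(key)   # the key lies within this peer's prefix
```

Each forwarding step resolves at least one more bit of the key, which is where the graceful scaling of communication cost claimed in the abstract comes from.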
Scalable, Distributed Data Structures for Internet Service Construction
- 2000
"... This paper presents a new persistent data management layer designed to simplify cluster-based Internet service construction. This self-managing layer, called a distributed data structure (DDS), presents a conventional single-site data structure interface to service authors, but partitions and replic ..."
Abstract
-
Cited by 156 (8 self)
- Add to MetaCart
Abstract: This paper presents a new persistent data management layer designed to simplify cluster-based Internet service construction. This self-managing layer, called a distributed data structure (DDS), presents a conventional single-site data structure interface to service authors, but partitions and replicates the data across a cluster. We have designed and implemented a distributed hash table DDS that has properties necessary for Internet services (incremental scaling of throughput and data capacity, fault tolerance and high availability, high concurrency, consistency, and durability). The hash table uses two-phase commits to present a coherent view of its data across all cluster nodes, allowing any node to service any task. We show that the distributed hash table simplifies Internet service construction by decoupling service-specific logic from the complexities of persistent, consistent state management, and by allowing services to inherit the necessary service properties from the DDS rather ...
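The abstract names two-phase commit as the mechanism that keeps the hash table coherent across cluster nodes. The sketch below shows that coordination pattern on a replicated partition; the class and method names are illustrative assumptions, not the DDS API.

```python
class Replica:
    """One replica of a hash-table partition (in-memory stand-in)."""
    def __init__(self):
        self.store = {}
        self.pending = {}

    def prepare(self, txid, key, value):
        self.pending[txid] = (key, value)   # stage the write (a real replica would log it durably)
        return True                         # vote to commit

    def commit(self, txid):
        key, value = self.pending.pop(txid)
        self.store[key] = value

    def abort(self, txid):
        self.pending.pop(txid, None)

def two_phase_put(replicas, txid, key, value):
    """Coordinator: apply the write only if every replica votes to commit."""
    if all(r.prepare(txid, key, value) for r in replicas):
        for r in replicas:
            r.commit(txid)
        return True
    for r in replicas:
        r.abort(txid)
    return False
```

Because every replica applies the same committed writes, any node can serve any request, which is what lets the layer present a single-site interface to service authors.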
Boxwood: Abstractions as the Foundation for Storage Infrastructure
- 2004
"... Writers of complex storage applications such as distributed file systems and databases are faced with the challenges of building complex abstractions over simple storage devices like disks. These challenges are exacerbated due to the additional requirements for faulttolerance and scaling. This paper ..."
Abstract
-
Cited by 132 (8 self)
- Add to MetaCart
Abstract: Writers of complex storage applications such as distributed file systems and databases are faced with the challenges of building complex abstractions over simple storage devices like disks. These challenges are exacerbated by the additional requirements for fault-tolerance and scaling. This paper explores the premise that high-level, fault-tolerant abstractions supported directly by the storage infrastructure can ameliorate these problems. We have built a system called Boxwood to explore the feasibility and utility of providing high-level abstractions or data structures as the fundamental storage infrastructure. Boxwood currently runs on a small cluster of eight machines. The Boxwood abstractions perform very close to the limits imposed by the processor, disk, and the native networking subsystem. Using these abstractions directly, we have implemented an NFSv2 file service that demonstrates the promise of our approach.
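As a rough illustration of the premise, the sketch below shows application code written against a high-level, fault-tolerant sorted-map abstraction supplied by the storage infrastructure rather than against raw disks. The interface and names are assumptions for illustration, not Boxwood's actual API.

```python
from abc import ABC, abstractmethod
from typing import Iterator, Optional, Tuple

class FaultTolerantSortedMap(ABC):
    """Provided by the storage infrastructure: replicated, recoverable, ordered."""

    @abstractmethod
    def insert(self, key: bytes, value: bytes) -> None: ...

    @abstractmethod
    def lookup(self, key: bytes) -> Optional[bytes]: ...

    @abstractmethod
    def scan(self, low: bytes, high: bytes) -> Iterator[Tuple[bytes, bytes]]:
        """Yield (key, value) pairs with low <= key < high, in key order."""

def create_entry(directory: FaultTolerantSortedMap, parent: bytes, name: bytes, inode: bytes) -> None:
    # A file service maps directory entries onto the abstraction and inherits
    # replication and recovery from the infrastructure instead of rebuilding
    # them over a block device, roughly the division of labor the abstract argues for.
    directory.insert(parent + b"/" + name, inode)
```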
BATON: A Balanced Tree Structure for Peer-to-Peer Networks
- In VLDB, 2005
"... We propose a balanced tree structure overlay on a peer-to-peer network capable of supporting both exact queries and range queries efficiently. In spite of the tree structure causing distinctions to be made between nodes at different levels in the tree, we show that the load at each node is approxima ..."
Abstract
-
Cited by 132 (16 self)
- Add to MetaCart
Abstract: We propose a balanced tree structure overlay on a peer-to-peer network capable of supporting both exact queries and range queries efficiently. In spite of the tree structure causing distinctions to be made between nodes at different levels in the tree, we show that the load at each node is approximately equal. In spite of the tree structure providing precisely one path between any pair of nodes, we show that sideways routing tables maintained at each node provide sufficient fault tolerance to permit efficient repair. Specifically, in a network with N nodes, we guarantee that both exact queries and range queries can be answered in O(logN) steps and also that update operations (to both data and network) have an amortized cost of O(logN). An experimental assessment validates the practicality of our proposal.
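A simplified, single-process sketch of the kind of search the sideways routing tables enable: each node covers a key range and keeps same-level links at exponentially increasing distances, plus parent/child and adjacent links. Field names and the exact forwarding rule are simplified assumptions, not BATON's full protocol.

```python
class Node:
    """One node in a BATON-like balanced tree overlay (simplified)."""
    def __init__(self, low, high):
        self.low, self.high = low, high   # key range this node is responsible for
        self.left_routes = []             # same-level nodes at distances 1, 2, 4, ... to the left
        self.right_routes = []            # same-level nodes to the right
        self.left_child = None
        self.right_child = None
        self.left_adjacent = None         # in-order predecessor
        self.right_adjacent = None        # in-order successor

    def search(self, key):
        if self.low <= key < self.high:
            return self                   # this node is responsible for the key
        if key >= self.high:
            # Jump as far right as possible without overshooting the key.
            for hop in reversed(self.right_routes):
                if hop is not None and hop.low <= key:
                    return hop.search(key)
            nxt = self.right_child or self.right_adjacent
        else:
            for hop in reversed(self.left_routes):
                if hop is not None and key < hop.high:
                    return hop.search(key)
            nxt = self.left_child or self.left_adjacent
        return nxt.search(key) if nxt else None
```

Each sideways jump roughly halves the remaining same-level distance, consistent with the O(logN) bound for exact queries; a range query would continue along adjacent links from the first matching node.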
Survey of Research towards Robust Peer-to-Peer Networks: Search Methods
- COMPUTER NETWORKS, 2004
"... ..."
(Show Context)
Online Balancing of Range-Partitioned Data with Applications to Peer-to-Peer Systems
- In VLDB, 2004
"... We consider the problem of horizontally partitioning a dynamic relation across a large number of disks/nodes by the use of range partitioning. Such partitioning is often desirable in large-scale parallel databases, as well as in peer-to-peer (P2P) systems. As tuples are inserted and deleted... ..."
Abstract
-
Cited by 127 (4 self)
- Add to MetaCart
(Show Context)
Abstract: We consider the problem of horizontally partitioning a dynamic relation across a large number of disks/nodes by the use of range partitioning. Such partitioning is often desirable in large-scale parallel databases, as well as in peer-to-peer (P2P) systems. As tuples are inserted and deleted...
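The abstract is truncated, but the load-balancing setting it describes can be illustrated with one simplified step: an overloaded node sheds its highest keys to the neighbor holding the adjacent range. The threshold rule and in-memory key lists below are assumptions for illustration, not the paper's actual algorithms.

```python
def rebalance_with_neighbor(node_keys, neighbor_keys, threshold=2.0):
    """node_keys: sorted keys of a node whose range immediately precedes the
    neighbor's range. If the node is overloaded relative to the neighbor,
    shift the range boundary so the two end up with roughly equal load."""
    if len(node_keys) <= threshold * max(len(neighbor_keys), 1):
        return node_keys, neighbor_keys            # balanced enough, do nothing
    target = (len(node_keys) + len(neighbor_keys)) // 2
    moved = node_keys[target:]                     # highest keys migrate to the successor node
    return node_keys[:target], sorted(moved + neighbor_keys)
```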
Explicit Control in a Batch-Aware Distributed File System
"... We present the design, implementation, and evaluation of the Batch-Aware Distributed File System (BAD-FS), a system designed to orchestrate large, I/O-intensive batch workloads on remote computing clusters distributed across the wide area. BAD-FS consists of two novel components: a storage layer whi ..."
Abstract
-
Cited by 62 (3 self)
- Add to MetaCart
(Show Context)
Abstract: We present the design, implementation, and evaluation of the Batch-Aware Distributed File System (BAD-FS), a system designed to orchestrate large, I/O-intensive batch workloads on remote computing clusters distributed across the wide area. BAD-FS consists of two novel components: a storage layer that exposes control of traditionally fixed policies such as caching, consistency, and replication; and a scheduler that exploits this control as needed for different users and workloads. By extracting these controls from the storage layer and placing them in an external scheduler, BAD-FS manages both storage and computation in a coordinated way while gracefully dealing with cache consistency, fault-tolerance, and space management issues in an application-specific manner. Using both microbenchmarks and real applications, we demonstrate the performance benefits of explicit control, delivering excellent end-to-end performance across the wide-area.
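As an illustration of what exposing traditionally fixed policies to an external scheduler might look like, the sketch below has a scheduler choose caching, consistency, and replication settings per volume from the workload's structure. The policy names and API are assumptions, not BAD-FS's actual interface.

```python
from dataclasses import dataclass

@dataclass
class VolumePolicy:
    cache: str          # e.g. pin shared input data near the compute nodes
    consistency: str    # e.g. defer write-back until the batch job completes
    replication: int    # number of copies of the data to keep

def plan_policies(workload):
    """Scheduler side: derive a storage policy per volume from the workload."""
    policies = {}
    for volume in workload["volumes"]:
        if volume["role"] == "input":
            # Shared, read-only input: cache it once at the remote cluster.
            policies[volume["name"]] = VolumePolicy("pin", "read-only", 1)
        else:
            # Cheap-to-regenerate intermediate data: skip extra copies and
            # let the scheduler re-run the producing job if the data is lost.
            policies[volume["name"]] = VolumePolicy("write-back", "deferred", 1)
    return policies
```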
Replication under Scalable Hashing: A Family of Algorithms for Scalable Decentralized Data Distribution
- In Proceedings of the 18th International Parallel & Distributed Processing Symposium (IPDPS 2004), Santa Fe, NM, 2004
"... Typical algorithms for decentralized data distribution work best in a system that is fully built before it first used; adding or removing components results in either extensive reorganization of data or load imbalance in the system. We have developed a family of decentralized algorithms, RUSH (Repl ..."
Abstract
-
Cited by 58 (14 self)
- Add to MetaCart
Abstract: Typical algorithms for decentralized data distribution work best in a system that is fully built before it is first used; adding or removing components results in either extensive reorganization of data or load imbalance in the system. We have developed a family of decentralized algorithms, RUSH (Replication Under Scalable Hashing), that maps replicated objects to a scalable collection of storage servers or disks. RUSH algorithms distribute objects to servers according to user-specified server weighting. While all RUSH variants support addition of servers to the system, different variants have different characteristics with respect to lookup time in petabyte-scale systems, performance with mirroring (as opposed to redundancy codes), and storage server removal. All RUSH variants redistribute as few objects as possible when new servers are added or existing servers are removed, and all variants guarantee that no two replicas of a particular object are ever placed on the same server. Because there is no central directory, clients can compute data locations in parallel, allowing thousands of clients to access objects on thousands of servers simultaneously.
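The sketch below is not the RUSH algorithms themselves but a simplified stand-in (weighted rendezvous hashing) with the same externally visible properties the abstract describes: clients compute placements locally without a directory, servers carry user-specified weights, and the k replicas of an object land on k distinct servers. Names and parameters are assumptions.

```python
import hashlib
import math

def _score(object_id, server_id, weight):
    """Pseudo-random, weight-scaled score of a server for one object."""
    digest = hashlib.sha256(f"{object_id}:{server_id}".encode()).digest()
    u = (int.from_bytes(digest[:8], "big") + 0.5) / 2**64    # uniform in (0, 1)
    return -weight / math.log(u)

def place(object_id, servers, k):
    """servers: {server_id: weight}. Return the k distinct servers for object_id."""
    ranked = sorted(servers, key=lambda s: _score(object_id, s, servers[s]), reverse=True)
    return ranked[:k]
```

When a server is added, only objects whose top-k set changes have to move; the RUSH variants refine this idea with different trade-offs in lookup time, mirroring, and server removal, which this stand-in does not reproduce.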
Tsar: A two tier sensor storage architecture using interval skip graphs
- In SenSys ’05: Proceedings of the 3rd international conference on Embedded networked sensor systems, 2005
"... Archival storage of sensor data is necessary for applications that query, mine, and analyze such data for interesting features and trends. We argue that existing storage systems are designed primarily for flat hierarchies of homogeneous sensor nodes and do not fully exploit the multi-tier nature of ..."
Abstract
-
Cited by 47 (1 self)
- Add to MetaCart
Abstract: Archival storage of sensor data is necessary for applications that query, mine, and analyze such data for interesting features and trends. We argue that existing storage systems are designed primarily for flat hierarchies of homogeneous sensor nodes and do not fully exploit the multi-tier nature of emerging sensor networks, where an application can comprise tens of tethered proxies, each managing tens to hundreds of untethered sensors. We present TSAR, a fundamentally different storage architecture that envisions separation of data from metadata by employing local archiving at the sensors and distributed indexing at the proxies. At the proxy tier, TSAR employs a novel multi-resolution ordered distributed index structure, the Interval Skip Graph, for efficiently supporting spatio-temporal and value queries. At the sensor tier, TSAR supports energy-aware adaptive summarization that can trade off the cost of transmitting metadata to the proxies against the overhead of false hits resulting from querying a coarse-grain index. We implement TSAR in a two-tier sensor testbed comprising Stargate-based proxies and Mote-based sensors. Our experiments demonstrate the benefits and feasibility of using our energy-efficient storage architecture in multi-tier sensor networks.
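A single-machine sketch of the indexing idea behind an interval index of this kind: summaries are kept ordered by their low endpoint, and each position also records the maximum high endpoint seen so far, so a stabbing query can stop scanning early. The distributed skip-graph routing across proxies is omitted, and the names below are assumptions.

```python
import bisect

class IntervalIndex:
    """Intervals (e.g. [start_time, end_time] of sensor summaries), queryable by point."""
    def __init__(self, intervals):
        self.intervals = sorted(intervals)         # (low, high) pairs, ordered by low endpoint
        self.lows = [lo for lo, _ in self.intervals]
        self.running_max = []
        m = float("-inf")
        for _, high in self.intervals:
            m = max(m, high)
            self.running_max.append(m)             # max high endpoint up to this position

    def stabbing(self, t):
        """Return every stored interval that contains the point t."""
        results = []
        i = bisect.bisect_right(self.lows, t) - 1  # only intervals with low <= t can match
        while i >= 0 and self.running_max[i] >= t:
            # Once running_max drops below t, no earlier interval can contain t.
            if self.intervals[i][1] >= t:
                results.append(self.intervals[i])
            i -= 1
        return results
```

In TSAR the same ordering is maintained across the proxy tier, and matching index entries then direct the query to the sensors' local archives.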