Results 11 - 20 of 105
FreeLoader: Scavenging Desktop Storage Resources for Scientific Data
- In Proceedings of Supercomputing, 2005
"... High-end computing is suffering a data deluge from experiments, simulations, and apparatus that creates overwhelming application dataset sizes. End-user workstations—despite more processing power than ever before—are ill-equipped to cope with such data demands due to insufficient secondary storage s ..."
Cited by 32 (11 self)
Abstract:
High-end computing is suffering a data deluge from experiments, simulations, and apparatus that creates overwhelming application dataset sizes. End-user workstations—despite more processing power than ever before—are ill-equipped to cope with such data demands due to insufficient secondary storage space and I/O rates. Meanwhile, a large portion of desktop storage is unused. We present the FreeLoader framework, which aggregates unused desktop storage space and I/O bandwidth into a shared cache/scratch space, for hosting large, immutable datasets and exploiting data access locality. Our experiments show that FreeLoader is an appealing low-cost solution to storing massive datasets, by delivering higher data access rates than traditional storage facilities. In particular, we present novel data striping techniques that allow FreeLoader to efficiently aggregate a workstation’s network communication bandwidth and local I/O bandwidth. In addition, the performance impact on the native workload of donor machines is small and can be effectively controlled.
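The striping idea can be pictured with a short sketch: chunks are assigned to donor workstations round-robin, so a later retrieval can draw on every donor's NIC and disk in parallel. This is a minimal Python illustration under assumed names; the chunk size, donor model, and in-memory fetch are stand-ins, not FreeLoader's actual protocol.

```python
# Minimal sketch of round-robin striping across storage donors.
# Donor names and chunk size are illustrative, not FreeLoader's values.
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 1 << 20  # 1 MiB stripe unit (hypothetical)

def stripe(dataset: bytes, donors: list) -> dict:
    """Assign fixed-size chunks to donors round-robin."""
    placement = {d: [] for d in donors}
    for i in range(0, len(dataset), CHUNK_SIZE):
        donor = donors[(i // CHUNK_SIZE) % len(donors)]
        placement[donor].append(dataset[i:i + CHUNK_SIZE])
    return placement

def retrieve(placement: dict) -> bytes:
    """Fetch from all donors concurrently, then reassemble in file order."""
    def fetch(donor):  # stand-in for a network read from the donor
        return placement[donor]
    with ThreadPoolExecutor() as pool:
        per_donor = list(pool.map(fetch, placement))
    # Chunk j of the file lives at per_donor[j % n][j // n].
    n = len(per_donor)
    total = sum(len(chunks) for chunks in per_donor)
    return b"".join(per_donor[j % n][j // n] for j in range(total))
```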
Experiences with CoralCDN: A Five-Year Operational View
- In Proc. NSDI, 2010
"... CoralCDN is a self-organizing web content distribution network (CDN). Publishing through CoralCDN is as simple as making a small change to a URL’s hostname; a decentralized DNS layer transparently directs browsers to nearby participating cache nodes, which in turn cooperate to minimize load on the o ..."
Cited by 27 (3 self)
Abstract:
CoralCDN is a self-organizing web content distribution network (CDN). Publishing through CoralCDN is as simple as making a small change to a URL’s hostname; a decentralized DNS layer transparently directs browsers to nearby participating cache nodes, which in turn cooperate to minimize load on the origin webserver. CoralCDN has been publicly available on PlanetLab since March 2004, accounting for the majority of its bandwidth and serving requests for several million users (client IPs) per day. This paper describes CoralCDN’s usage scenarios and a number of experiences drawn from its multi-year deployment. These lessons range from the specific to the general, touching on the Web (APIs, naming, and security), CDNs (robustness and resource management), and virtualized hosting (visibility and control). We identify design aspects and changes that helped CoralCDN succeed, as well as those that proved wrong for its current environment.
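The "small change to a URL's hostname" is the Coralized URL. A sketch of that rewrite follows; the .nyud.net suffix matches the deployment the paper describes, while the port number is as recalled and should be treated as illustrative.

```python
# Sketch of "Coralizing" a URL: append the CDN's DNS suffix to the
# hostname so Coral's DNS layer can intercept resolution. The port is
# as recalled from the deployment; treat it as an example value.
from urllib.parse import urlsplit, urlunsplit

def coralize(url: str, suffix: str = "nyud.net", port: int = 8090) -> str:
    parts = urlsplit(url)
    return urlunsplit(("http", f"{parts.hostname}.{suffix}:{port}",
                       parts.path, parts.query, parts.fragment))

print(coralize("http://example.com/papers/coral.pdf"))
# -> http://example.com.nyud.net:8090/papers/coral.pdf
```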
Flexible, Wide-Area Storage for Distributed Systems with WheelFS
"... WheelFS is a wide-area distributed storage system intended to help multi-site applications share data and gain fault tolerance. WheelFS takes the form of a distributed file system with a familiar POSIX interface. Its design allows applications to adjust the tradeoff between prompt visibility of upda ..."
Cited by 25 (3 self)
Abstract:
WheelFS is a wide-area distributed storage system intended to help multi-site applications share data and gain fault tolerance. WheelFS takes the form of a distributed file system with a familiar POSIX interface. Its design allows applications to adjust the tradeoff between prompt visibility of updates from other sites and the ability for sites to operate independently despite failures and long delays. WheelFS allows these adjustments via semantic cues, which provide application control over consistency, failure handling, and file and replica placement. WheelFS is implemented as a user-level file system and is deployed on PlanetLab and Emulab. Three applications (a distributed Web cache, an email service, and large file distribution) demonstrate that WheelFS’s file system interface simplifies construction of distributed applications by allowing reuse of existing software. These applications would perform poorly with the strict semantics implied by a traditional file system interface, but by providing cues to WheelFS they are able to achieve good performance. Measurements show that applications built on WheelFS deliver comparable performance to services such as CoralCDN and BitTorrent that use specialized wide-area storage systems.
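Semantic cues ride inside pathnames, so unmodified software picks them up through the ordinary POSIX interface. A minimal sketch; the cue names (.EventualConsistency, .MaxTime) follow the paper as best recalled, and the helper function and mount point are hypothetical.

```python
# Sketch of WheelFS-style semantic cues embedded in pathnames.
# Cue names are as recalled from the paper; the helper is illustrative.
import posixpath

def with_cues(path: str, *cues: str) -> str:
    """Insert cue components just ahead of the final path element."""
    head, tail = posixpath.split(path)
    return posixpath.join(head, *cues, tail)

# A cooperative web cache tolerating data up to one second stale:
p = with_cues("/wfs/cache/index.html", ".EventualConsistency", ".MaxTime=1000")
print(p)  # /wfs/cache/.EventualConsistency/.MaxTime=1000/index.html
```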
Wide-area Network Acceleration for the Developing World
"... Wide-area network (WAN) accelerators operate by compressing redundant network traffic from point-to-point communications, enabling higher effective bandwidth. Unfortunately, while network bandwidth is scarce and expensive in the developing world, current WAN accelerators are designed for enterprise ..."
Cited by 24 (5 self)
Abstract:
Wide-area network (WAN) accelerators operate by compressing redundant network traffic from point-to-point communications, enabling higher effective bandwidth. Unfortunately, while network bandwidth is scarce and expensive in the developing world, current WAN accelerators are designed for enterprise use, and are a poor fit in these environments. We present Wanax, a WAN accelerator designed for developing-world deployments. It uses a novel multiresolution chunking (MRC) scheme that provides high compression rates and high disk performance for a variety of content, while using much less memory than existing approaches. Wanax exploits the design of MRC to perform intelligent load shedding to maximize throughput when running on resource-limited shared platforms. Finally, Wanax exploits the mesh network environments being deployed in the developing world, instead of just the star topologies common in enterprise branch offices.
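The key property of multiresolution chunking can be shown in a toy form: testing progressively more low bits of the same rolling fingerprint yields coarser chunkings whose boundaries always line up with the finer ones, so one pass serves every resolution. The hash below is a cheap stand-in, not Wanax's actual fingerprint.

```python
# Toy multiresolution chunking: boundaries at several average chunk
# sizes from one scan. A boundary under a large bitmask is always a
# boundary under a smaller one, so resolutions share chunk edges.
def boundaries(data: bytes, mask_bits=(11, 13, 15)):
    """Return {bits: [offsets]}; average chunk size is about 2**bits bytes."""
    out = {b: [] for b in mask_bits}
    h = 0
    for i, byte in enumerate(data):
        h = ((h << 5) + h + byte) & 0xFFFFFFFF  # djb2-style stand-in hash
        for b in mask_bits:
            if h & ((1 << b) - 1) == 0:         # low b bits all zero
                out[b].append(i + 1)
    return out
```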
Secure Data Deduplication
- StorageSS '08, 2008
"... As the world moves to digital storage for archival purposes, there is an increasing demand for systems that can provide secure data storage in a cost-effective manner. By identifying common chunks of data both within and between files and storing them only once, deduplication can yield cost savings ..."
Cited by 23 (3 self)
Abstract:
As the world moves to digital storage for archival purposes, there is an increasing demand for systems that can provide secure data storage in a cost-effective manner. By identifying common chunks of data both within and between files and storing them only once, deduplication can yield cost savings by increasing the utility of a given amount of storage. Unfortunately, deduplication exploits identical content, while encryption attempts to make all content appear random; the same content encrypted with two different keys results in very different ciphertext. Thus, combining the space efficiency of deduplication with the secrecy aspects of encryption is problematic. We have developed a solution that provides both data security and space efficiency in single-server storage and distributed storage systems. Encryption keys are generated in a consistent manner from the chunk data; thus, identical chunks will always encrypt to the same ciphertext. Furthermore, the keys cannot be deduced from the encrypted chunk data. Since the information each user needs to access and decrypt the chunks that make up a file is encrypted using a key known only to the user, even a full compromise of the system cannot reveal which chunks are used by which users.
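The key-generation scheme described here, deriving each chunk's key from the chunk's own content so identical chunks encrypt identically, is compact enough to sketch. This version uses the third-party cryptography package; the fixed nonce is tolerable only because each content-derived key ever encrypts a single distinct plaintext. A sketch, not vetted cryptographic code.

```python
# Sketch of convergent encryption: the key is a hash of the chunk, so
# equal chunks yield equal ciphertext and still deduplicate.
import hashlib
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

NONCE = b"\x00" * 12  # fixed nonce is safe here only because each
                      # content-derived key encrypts exactly one plaintext

def convergent_encrypt(chunk: bytes):
    key = hashlib.sha256(chunk).digest()       # key depends only on content
    ct = AESGCM(key).encrypt(NONCE, chunk, None)
    chunk_id = hashlib.sha256(ct).hexdigest()  # dedup index over ciphertext
    return chunk_id, key, ct

def convergent_decrypt(key: bytes, ct: bytes) -> bytes:
    return AESGCM(key).decrypt(NONCE, ct, None)

a = convergent_encrypt(b"same chunk")
b = convergent_encrypt(b"same chunk")
assert a == b  # identical chunks -> identical ciphertext -> stored once
```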
ChunkCast: An Anycast Service for Large Content Distribution
- 2006
"... Fast and efficient large content distribution is a challenge in the Internet due to its high traffic volume. In this paper, we propose ChunkCast, an anycast service that optimizes large content distribution. We present a distributed locality-aware directory that supports an efficient query for large ..."
Cited by 23 (1 self)
Abstract:
Fast and efficient large content distribution is a challenge in the Internet due to its high traffic volume. In this paper, we propose ChunkCast, an anycast service that optimizes large content distribution. We present a distributed locality-aware directory that supports an efficient query for large content. Our system improves the median downloading time by at least 32% compared to previous approaches and emulates multicast trees without any explicit coordination of peers.
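The directory's job can be pictured with a toy anycast lookup: publishers register a chunk ID together with network coordinates, and a lookup returns the nearest publisher. A flat dict stands in for the distributed, locality-aware directory the paper actually builds; all names are hypothetical.

```python
# Toy anycast directory in the spirit of ChunkCast.
import math
from collections import defaultdict

directory = defaultdict(list)  # chunk_id -> [(node, (x, y)), ...]

def publish(chunk_id: str, node: str, coords: tuple):
    directory[chunk_id].append((node, coords))

def anycast_lookup(chunk_id: str, me: tuple):
    """Return the publisher closest to `me` in coordinate space."""
    return min(directory[chunk_id],
               key=lambda entry: math.dist(me, entry[1]),
               default=None)

publish("chunk42", "nodeA", (0.0, 0.0))
publish("chunk42", "nodeB", (5.0, 5.0))
print(anycast_lookup("chunk42", (4.0, 4.0)))  # ('nodeB', (5.0, 5.0))
```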
EndRE: An End-System Redundancy Elimination Service for Enterprises
"... In many enterprises today, middleboxes called WAN optimizers are being deployed across WAN access links in order to eliminate redundancy in network traffic and reduce WAN access costs. In this paper, we present the design and implementation of EndRE, an alternate approach where redundancy eliminatio ..."
Cited by 22 (4 self)
Abstract:
In many enterprises today, middleboxes called WAN optimizers are being deployed across WAN access links in order to eliminate redundancy in network traffic and reduce WAN access costs. In this paper, we present the design and implementation of EndRE, an alternate approach where redundancy elimination is provided as an end-system service. Unlike middleboxes, such an approach benefits both end-to-end encrypted traffic and traffic on last-hop wireless links to mobile devices. EndRE needs to be fast, adaptive, and parsimonious in memory usage in order to opportunistically leverage resources on end hosts. Thus, we design a new fingerprinting scheme called SampleByte that is much faster than Rabin fingerprinting while delivering similar compression gains. Unlike Rabin, SampleByte can also adapt its CPU usage depending on server load. Further, we introduce optimizations to reduce server memory footprint by 33-75% compared to prior approaches. Using several terabytes of network traffic traces from 11 enterprise sites, testbed experiments, and a pilot deployment, we show that EndRE delivers 26% bandwidth savings on average, processes payloads at speeds of 1.5-4 Gbps, reduces end-to-end latencies by up to 30%, and translates bandwidth savings into equivalent energy savings on mobile smartphones.
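SampleByte's speed comes from triggering fingerprinting off a single-byte table lookup rather than a rolling hash, then skipping ahead to bound CPU cost. A sketch following the paper's description as recalled; the sampled byte set, window size, and sampling period below are illustrative, and SHA-1 stands in for the paper's cheaper hash.

```python
# Sketch of SampleByte-style fingerprinting: a 256-entry table marks
# "sampled" byte values; on a hit, fingerprint the next W bytes and
# skip P//2 bytes to cap overhead. All parameters are illustrative.
import hashlib

SAMPLED = frozenset({0x20, 0x2F, 0x3C, 0x65, 0x74})  # hypothetical table
W = 32   # fingerprinted window
P = 64   # target sampling period; skip P//2 after each hit

def samplebyte_fingerprints(payload: bytes):
    marks = []
    i = 0
    while i <= len(payload) - W:
        if payload[i] in SAMPLED:
            fp = hashlib.sha1(payload[i:i + W]).digest()[:8]
            marks.append((i, fp))
            i += P // 2  # skipping bounds CPU cost and marker density
        else:
            i += 1
    return marks
```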
Constructing collaborative desktop storage caches for large scientific datasets
- ACM Transactions on Storage (TOS), 2006
"... or classroom use provided that the copies are not made or distributed for profit or commercial advantage, the ACM copyright/server notice, the title of the publication, and its date appear, and ..."
Cited by 16 (6 self)
Timely Offloading of Result-Data in HPC Centers
- In the International Conference on Supercomputing, 2008
"... High performance computing is facing an exponential growth in job output dataset sizes. This implies a significant commitment of supercomputing center resources—most notably, precious scratch space—in handling data staging and offloading. However, the scratch area is typically managed using simple “ ..."
Cited by 16 (12 self)
Abstract:
High performance computing is facing an exponential growth in job output dataset sizes. This implies a significant commitment of supercomputing center resources, most notably precious scratch space, in handling data staging and offloading. However, the scratch area is typically managed using simple “purge policies”, without the sophisticated “end-user data services” required to balance the center's resource consumption against user serviceability. End-user data services such as offloading are performed using point-to-point transfers that are unable to reconcile the center's purge deadlines with users' delivery deadlines, unable to adapt to changing dynamics in the end-to-end data path, and are not fault-tolerant. We propose a robust framework for the timely, decentralized offload of result data, addressing these significant gaps in extant direct-transfer-based offloading. The decentralized offload is achieved using an overlay of user-specified intermediate nodes and well-known landmark nodes. These nodes serve both to provide multiple data-flow paths, thereby maximizing bandwidth, and to provide fail-over capabilities for the offload. We have implemented our techniques within a production job scheduler (PBS) and data transfer tool (BitTorrent), and our evaluation shows that offloading times can be significantly reduced (90.2% for a 2.1 GB file) while also meeting center-user deadlines.
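The offload logic reduces to choosing among overlay routes under a deadline, with failover to the next route on failure. A toy sketch under assumed bandwidth estimates; the node names, estimates, and transfer stub are all hypothetical.

```python
# Toy sketch of deadline-aware offload with failover across overlay
# paths, in the spirit of the framework above.
paths = [  # candidate routes through intermediate/landmark nodes
    (["hpc-center", "landmark1", "user-site"], 80.0),       # est. MB/s
    (["hpc-center", "intermediate2", "user-site"], 55.0),
]

def transfer(route, size_mb) -> bool:
    """Stub standing in for a hop-by-hop (e.g., BitTorrent) transfer."""
    print("offloading", size_mb, "MB via", " -> ".join(route))
    return True

def offload(size_mb: float, deadline_s: float) -> bool:
    """Try the fastest route first; fall back while the deadline allows."""
    for route, est_mbps in sorted(paths, key=lambda p: -p[1]):
        if size_mb / est_mbps > deadline_s:
            continue            # this route cannot beat the purge deadline
        if transfer(route, size_mb):
            return True         # success; otherwise fail over to next route
    return False

offload(2100.0, 60.0)  # the paper's 2.1 GB example, with a made-up deadline
```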
Supporting Practical Content-Addressable Caching with CZIP Compression
"... Content-based naming (CBN) enables content sharing across similar files by breaking files into positionindependent chunks and naming these chunks using hashes of their contents. While a number of research systems have recently used custom CBN approaches internally to good effect, there has not yet b ..."
Cited by 15 (2 self)
Abstract:
Content-based naming (CBN) enables content sharing across similar files by breaking files into position-independent chunks and naming these chunks using hashes of their contents. While a number of research systems have recently used custom CBN approaches internally to good effect, there has not yet been any mechanism to use CBN in a general-purpose way. In this paper, we demonstrate a practical approach to applying CBN without requiring disruptive changes to end systems. We develop CZIP, a CBN compression scheme which reduces data sizes by eliminating redundant chunks, compresses chunks using existing schemes, and facilitates sharing within files, across files, and across machines by explicitly exposing CBN chunk hashes. CZIP-aware caching systems can exploit the CBN information to reduce storage space, reduce bandwidth consumption, and increase performance, while content providers and middleboxes can selectively encode their most suitable content. We show that CZIP compares well to standalone compression schemes, that a CBN cache for CZIP is easily implemented, and that a CZIP-aware CDN produces significant benefits.
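Content-based naming itself is compact enough to sketch: content-defined chunk boundaries plus a hash of each chunk as its name, so equal content gets equal names regardless of where it sits in a file. The boundary test and parameters below are toys, not CZIP's actual on-disk format.

```python
# Toy content-based naming: content-defined boundaries via a simple
# rolling condition, chunks named by their SHA-1 digests.
import hashlib

MASK = (1 << 12) - 1   # ~4 KiB average chunks (illustrative)

def cbn_chunks(data: bytes):
    """Yield (name, chunk) pairs; the hash resets at each boundary so
    chunking depends on content, not absolute file position."""
    h, start = 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + byte) & 0xFFFFFFFF
        if h & MASK == 0 or i == len(data) - 1:
            chunk = data[start:i + 1]
            yield hashlib.sha1(chunk).hexdigest(), chunk
            start, h = i + 1, 0

for name, chunk in cbn_chunks(b"example payload " * 512):
    print(name, len(chunk))
```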