Results 1 -
9 of
9
An Analysis of Linux Scalability to Many Cores
"... This paper analyzes the scalability of seven system applications (Exim, memcached, Apache, PostgreSQL, gmake, Psearchy, and MapReduce) running on Linux on a 48core computer. Except for gmake, all applications trigger scalability bottlenecks inside a recent Linux kernel. Using mostly standard paralle ..."
Abstract
-
Cited by 22 (3 self)
- Add to MetaCart
This paper analyzes the scalability of seven system applications (Exim, memcached, Apache, PostgreSQL, gmake, Psearchy, and MapReduce) running on Linux on a 48core computer. Except for gmake, all applications trigger scalability bottlenecks inside a recent Linux kernel. Using mostly standard parallel programming techniques— this paper introduces one new technique, sloppy counters—these bottlenecks can be removed from the kernel or avoided by changing the applications slightly. Modifying the kernel required in total 3002 lines of code changes. A speculative conclusion from this analysis is that there is no scalability reason to give up on traditional operating system organizations just yet. 1
Scalable Address Spaces Using RCU Balanced Trees
"... Software developers commonly exploit multicore processors by building multithreaded software in which all threads of an application share a single address space. This shared address space has a cost: kernel virtual memory operations such as handling soft page faults, growing the address space, mappi ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
Software developers commonly exploit multicore processors by building multithreaded software in which all threads of an application share a single address space. This shared address space has a cost: kernel virtual memory operations such as handling soft page faults, growing the address space, mapping files, etc. can limit the scalability of these applications. In widely-used operating systems, all of these operations are synchronized by a single per-process lock. This paper contributes a new design for increasing the concurrency of kernel operations on a shared address space by exploiting read-copy-update (RCU) so that soft page faults can both run in parallel with operations that mutate the same address space and avoid contending with other page faults on shared cache lines. To enable such parallelism, this paper also introduces an RCU-based binary balanced tree for storing memory mappings. An experimental evaluation using three multithreaded applications shows performance improvements on 80 cores ranging from 1.7 × to 3.4 × for an implementation of this design in the Linux 2.6.37 kernel. The RCU-based binary tree enables soft page faults to run at a constant cost with an increasing number of cores, suggesting that the design will scale well beyond 80 cores.
Taming hot-spots in dht inverted indexes
- IN PROC. OF LSDS-IR’07
, 2007
"... DHT systems are structured overlay networks capable of using P2P resources as a scalable platform for very large data storage applications. However, their efficiency expects a level of uniformity in the association of data to index keys that is often not present in inverted indexes. Index data tends ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
DHT systems are structured overlay networks capable of using P2P resources as a scalable platform for very large data storage applications. However, their efficiency expects a level of uniformity in the association of data to index keys that is often not present in inverted indexes. Index data tends to follow nonuniform distributions, often power law distributions, creating intense local storage hotspots and network bottlenecks on specific hosts. Current techniques like caching cannot, alone, cope with this issue. We propose a new distributed data structure based on a decentralized balanced tree to balance storage data and network load more uniformly across all hosts. The approach is stackable with standard DHTs and ensures that the DHT storage subsystem receives an uniform load by assigning fixed sized, or low variance, blocks.
Search Optimizations in Structured Peer–to–peer Systems
"... Abstract—DHT systems are structured overlay networks capable of using P2P resources as a scalable platform for very large data storage applications. However, their efficiency expects a level of uniformity in the association of data to index keys that is often not present in inverted indexes. Index d ..."
Abstract
- Add to MetaCart
Abstract—DHT systems are structured overlay networks capable of using P2P resources as a scalable platform for very large data storage applications. However, their efficiency expects a level of uniformity in the association of data to index keys that is often not present in inverted indexes. Index data tends to follow non-uniform distributions, often power law distributions, creating intense local storage hotspots and network bottlenecks on specific hosts. Current techniques like caching cannot, alone, cope with this issue. We propose a distributed data structure based on a decentralized balanced tree to balance storage data and network load more uniformly across hosts. The results show that the data structure is capable of balancing resources, in particular when performing
Peer-to-Peer Systems
"... doi:10.1145/1831407.1831427 Within a decade, P2P has proven to be a technology that enables innovative new services and is used by millions of people every day. ..."
Abstract
- Add to MetaCart
doi:10.1145/1831407.1831427 Within a decade, P2P has proven to be a technology that enables innovative new services and is used by millions of people every day.
Peer-to-Peer Systems PEER-TO-PEER (P2P) COMPUTING has attracted
"... Within a decade, P2P has proven to be a technology that enables innovative new services and is used by millions of people every day. ..."
Abstract
- Add to MetaCart
Within a decade, P2P has proven to be a technology that enables innovative new services and is used by millions of people every day.
REGULAR PAPER STAIRS: Towards efficient full-text filtering and dissemination in DHT environments
"... Abstract Nowadays “live ” content, such as weblog, wikipedia, and news, is ubiquitous in the Internet. Providing users with relevant content in a timely manner becomes a challenging problem. Differing from Web search technologies and RSS feeds/reader applications, this paper envisions a personalized ..."
Abstract
- Add to MetaCart
Abstract Nowadays “live ” content, such as weblog, wikipedia, and news, is ubiquitous in the Internet. Providing users with relevant content in a timely manner becomes a challenging problem. Differing from Web search technologies and RSS feeds/reader applications, this paper envisions a personalized full-text content filtering and dissemination system in a highly distributed environment such as a Distributed Hash Table (DHT) based Peer-to-Peer (P2P) Network. Users subscribe to their interested content by specifying input keywords and thresholds as filters. Then, content is disseminated to those users having interest in it. In the literature, full-text document publishing in DHTs has suffered for a long time from the high cost of forwarding a document to home nodes of all distinct terms. It is aggravated by the fact that a document contains a large number of distinct terms (typically tens or thousands of terms per document). In this paper, we propose a set of novel techniques to overcome such a high forwarding cost by carefully selecting a very small number of meaningful terms (or key features) among candidate terms inside each document. Next, to reduce the average hop count per forwarding, we further prune irrelevant documents during the forwarding path. Experiments based on two real query logs and two real data sets demonstrate the effectiveness of our solution.
Arpeggio: Metadata Indexing in a . . .
, 2007
"... Peer-to-peer networks require an efficient means for performing searches for files by metadata keywords. Unfortunately, current methods usually sacrifice either scalability or recall. Arpeggio is a peer-to-peer file-sharing network that uses the Chord lookup primitive as a basis for constructing a d ..."
Abstract
- Add to MetaCart
Peer-to-peer networks require an efficient means for performing searches for files by metadata keywords. Unfortunately, current methods usually sacrifice either scalability or recall. Arpeggio is a peer-to-peer file-sharing network that uses the Chord lookup primitive as a basis for constructing a distributed keyword-set index, augmented with index-side filtering, to address this problem. We introduce index gateways, a technique for minimizing index maintenance overhead. Arpeggio also includes a content distribution system for finding source peers for a file; we present a novel system that uses Chord subrings to track live source peers without the cost of inserting the data itself into the network, and supports postfetching: using information in the index to improve the availability of rare files. The result is a system that provides efficient query operations with the scalability and reliability advantages of full decentralization. We use analysis and simulation results to show that our indexing system has reasonable storage and bandwidth costs, and improves load distribution.

