Results 1 - 10
of
49
An Analysis of Linux Scalability to Many Cores
"... This paper analyzes the scalability of seven system applications (Exim, memcached, Apache, PostgreSQL, gmake, Psearchy, and MapReduce) running on Linux on a 48core computer. Except for gmake, all applications trigger scalability bottlenecks inside a recent Linux kernel. Using mostly standard paralle ..."
Abstract
-
Cited by 22 (3 self)
- Add to MetaCart
This paper analyzes the scalability of seven system applications (Exim, memcached, Apache, PostgreSQL, gmake, Psearchy, and MapReduce) running on Linux on a 48core computer. Except for gmake, all applications trigger scalability bottlenecks inside a recent Linux kernel. Using mostly standard parallel programming techniques— this paper introduces one new technique, sloppy counters—these bottlenecks can be removed from the kernel or avoided by changing the applications slightly. Modifying the kernel required in total 3002 lines of code changes. A speculative conclusion from this analysis is that there is no scalability reason to give up on traditional operating system organizations just yet. 1
PacketShader: a GPU-Accelerated Software Router
"... We present PacketShader, a high-performance software router framework for general packet processing with Graphics Processing Unit (GPU) acceleration. PacketShader exploits the massively-parallel processing power of GPU to address the CPU bottleneck in current software routers. Combined with our high ..."
Abstract
-
Cited by 21 (3 self)
- Add to MetaCart
We present PacketShader, a high-performance software router framework for general packet processing with Graphics Processing Unit (GPU) acceleration. PacketShader exploits the massively-parallel processing power of GPU to address the CPU bottleneck in current software routers. Combined with our high-performance packet I/O engine, PacketShader outperforms existing software routers by more than a factor of four, forwarding 64B IPv4 packets at 39 Gbps on a single commodity PC. We have implemented IPv4 and IPv6 forwarding, OpenFlow switching, and IPsec tunneling to demonstrate the flexibility and performance advantage of PacketShader. The evaluation results show that GPU brings significantly higher throughput over the CPU-only implementation, confirming the effectiveness of GPU for computation and memory-intensive operations in packet processing.
Onix: A Distributed Control Platform for Large-scale Production Networks
- In Proc. OSDI
, 2010
"... Computer networks lack a general control paradigm, as traditional networks do not provide any networkwide management abstractions. As a result, each new function (such as routing) must provide its own state distribution, element discovery, and failure recovery mechanisms. We believe this lack of a c ..."
Abstract
-
Cited by 14 (2 self)
- Add to MetaCart
Computer networks lack a general control paradigm, as traditional networks do not provide any networkwide management abstractions. As a result, each new function (such as routing) must provide its own state distribution, element discovery, and failure recovery mechanisms. We believe this lack of a common control platform has significantly hindered the development of flexible, reliable and feature-rich network control planes. To address this, we present Onix, a platform on top of which a network control plane can be implemented as a distributed system. Control planes written within Onix operate on a global view of the network, and use basic state distribution primitives provided by the platform. Thus Onix provides a general API for control plane implementations, while allowing them to make their own trade-offs among consistency, durability, and scalability. 1
Verifiable Network-Performance Measurements
"... In the current Internet, there is no clean way for affected parties to react to poor forwarding performance: to detect and assess Service Level Agreement (SLA) violations by a contractual partner, a domain must resort to ad-hoc monitoring using probes. Instead, we propose Network Confessional, a new ..."
Abstract
-
Cited by 8 (1 self)
- Add to MetaCart
In the current Internet, there is no clean way for affected parties to react to poor forwarding performance: to detect and assess Service Level Agreement (SLA) violations by a contractual partner, a domain must resort to ad-hoc monitoring using probes. Instead, we propose Network Confessional, a new, systematic approach to the problem of forwardingperformance verification. Our system relies on voluntary reporting, allowing each network domain to disclose its loss and delay performance to its customers and peers and, potentially, a regulator. Most importantly, it enables verifiable performance measurements, i.e., domains cannot abuse it to significantly exaggerate their performance. Finally, our system is tunable, allowing each participating domain to determine how many resources to devote to it independently (i.e., without any inter-domain coordination), exposing a controllable trade-off between performance-verification quality and resource consumption. Our system comes at the cost of deploying modest functionality at the participating domains ’ border routers; we show that it requires reasonable resources, well within modern network capabilities. 1.
Frenetic: A Network Programming Language
"... Modern networks provide a variety of interrelated services including routing, traffic monitoring, load balancing, and access control. Unfortunately, the languages used to program today’s networks lack modern features—they are usually defined at the low level of abstraction supplied by the underlying ..."
Abstract
-
Cited by 7 (2 self)
- Add to MetaCart
Modern networks provide a variety of interrelated services including routing, traffic monitoring, load balancing, and access control. Unfortunately, the languages used to program today’s networks lack modern features—they are usually defined at the low level of abstraction supplied by the underlying hardware and they fail to provide even rudimentary support for modular programming. As a result, network programs tend to be complicated, error-prone, and difficult to maintain. This paper presents Frenetic, a high-level language for programming distributed collections of network switches. Frenetic provides a declarative query language for classifying and aggregating network traffic as well as a functional reactive combinator library for describing high-level packet-forwarding policies. Unlike prior work in this domain, these constructs are—by design—fully compositional, which facilitates modular reasoning and enables code reuse. This important property is enabled by Frenetic’s novel runtime system which manages all of the details related to installing, uninstalling, and querying low-level packet-processing rules on physical switches. Overall, this paper makes three main contributions: (1) We analyze the state-of-the art in languages for programming networks and identify the key limitations; (2) We present a language design that addresses these limitations, using a series of examples to motivate and validate our choices; (3) We describe an implementation of the language and evaluate its performance on several benchmarks.
ServerSwitch: A Programmable and High Performance Platform for Data Center Networks
, 2011
"... As one of the fundamental infrastructures for cloud computing, data center networks (DCN) have recently been studied extensively. We currently use pure software-based systems, FPGA based platforms, e.g., NetFPGA, or OpenFlow switches, to implement and evaluate various DCN designs including topology ..."
Abstract
-
Cited by 7 (1 self)
- Add to MetaCart
As one of the fundamental infrastructures for cloud computing, data center networks (DCN) have recently been studied extensively. We currently use pure software-based systems, FPGA based platforms, e.g., NetFPGA, or OpenFlow switches, to implement and evaluate various DCN designs including topology design, control plane and routing, and congestion control. However, software-based approaches suffer from high CPU overhead and processing latency; FPGA based platforms are difficult to program and incur high cost; and OpenFlow focuses on control plane functions at present. In this paper, we design a ServerSwitch to address the above problems. ServerSwitch is motivated by the observation that commodity Ethernet switching chips are becoming programmable and that the PCI-E interface provides high throughput and low latency between the server CPU and I/O subsystem. ServerSwitch uses a commodity switching chip for various customized packet forwarding, and leverages the server CPU for control and data plane packet processing, due to the low latency and high throughput between the switching chip and server CPU. We have built our ServerSwitch at low cost. Our experiments demonstrate that ServerSwitch is fully programmable and achieves high performance. Specifically, we have implemented various forwarding schemes including source routing in hardware. Our in-network caching experiment showed high throughput and flexible data processing. Our QCN (Quantized Congestion Notification) implementation further demonstrated that ServerSwitch can react to network congestions in 23us. ∗ This work was performed when Zhiqiang Zhou was a visiting student at Microsoft Research Asia. 1
Building extensible networks with rule-based forwarding
- In OSDI
, 2010
"... We present a network design that provides flexible and policy-compliant forwarding. Our proposal centers around a new architectural concept: that of packet rules. A rule is a simple if-then-else construct that describes the manner in which the network should – or should not – forward packets. A pack ..."
Abstract
-
Cited by 4 (1 self)
- Add to MetaCart
We present a network design that provides flexible and policy-compliant forwarding. Our proposal centers around a new architectural concept: that of packet rules. A rule is a simple if-then-else construct that describes the manner in which the network should – or should not – forward packets. A packet identifies the rule by which it is to be forwarded and routers forward each packet in accordance with its associated rule. Each packet rule is certified, guaranteeing that all parties involved in forwarding a packet agree with the packet’s rule. Packets containing uncertified rules are simply dropped in the network. We present the design, implementation and evaluation of a Rule-Based Forwarding (RBF) architecture. We demonstrate flexibility by illustrating how RBF supports a variety of use cases including content caching, middlebox selection and DDoS protection. Using our prototype router implementation we show that the overhead RBF imposes is within the capabilities of modern network equipment. 1
Locating Cache Performance Bottlenecks Using Data Profiling
"... Effective use of CPU data caches is critical to good performance, but poor cache use patterns are often hard to spot using existing execution profiling tools. Typical profilers attribute costs to specific code locations. The costs due to frequent cache misses on a given piece of data, however, may b ..."
Abstract
-
Cited by 4 (1 self)
- Add to MetaCart
Effective use of CPU data caches is critical to good performance, but poor cache use patterns are often hard to spot using existing execution profiling tools. Typical profilers attribute costs to specific code locations. The costs due to frequent cache misses on a given piece of data, however, may be spread over instructions throughout the application. The resulting individually small costs at a large number of instructions can easily appear insignificant in a code profiler’s output. DProf helps programmers understand cache miss costs by attributing misses to data types instead of code. Associating cache misses with data helps programmers locate data structures that experience misses in many places in the application’s code. DProf introduces a number of new views of cache miss data, including a data profile, which reports the data types with the most cache misses, and a data flow graph, which summarizes how objects of a given type are accessed throughout their lifetime, and which accesses incur expensive cross-CPU cache loads. We present two case studies of using DProf to find and fix cache performance bottlenecks in Linux. The improvements provide a 16–57 % throughput improvement on a range of memcached and Apache workloads.
Devoflow: Scaling flow management for high-performance networks
- In ACM SIGCOMM
, 2011
"... OpenFlow is a great concept, but its original design imposes excessive overheads. It can simplify network and traffic management in enterprise and data center environments, because it enables flow-level control over Ethernet switching and provides global visibility of the flows in the network. Howev ..."
Abstract
-
Cited by 4 (0 self)
- Add to MetaCart
OpenFlow is a great concept, but its original design imposes excessive overheads. It can simplify network and traffic management in enterprise and data center environments, because it enables flow-level control over Ethernet switching and provides global visibility of the flows in the network. However, such fine-grained control and visibility comes with costs: the switch-implementation costs of involving the switch’s control-plane too often and the distributed-system costs of involving the OpenFlow controller too frequently, both on flow setups and especially for statistics-gathering. In this paper, we analyze these overheads, and show that OpenFlow’s current design cannot meet the needs of highperformance networks. We design and evaluate DevoFlow, a modification of the OpenFlow model which gently breaks the coupling between control and global visibility, in a way that maintains a useful amount of visibility without imposing unnecessary costs. We evaluate DevoFlow through simulations, and find that it can load-balance data center traffic as well as fine-grained solutions, without as much overhead: DevoFlow uses 10–53 times fewer flow table entries at an average switch, and uses 10–42 times fewer control messages.
Building a single-box 100 gbps software router
- In IEEE Workshop on Local and Metropolitan Area Networks
, 2010
"... great leaps in terms of CPU, memory, and I/O bus speeds. Benefiting from the hardware innovation, recent software routers on commodity PC now report about 10 Gbps in packet routing. In this paper we map out expected hurdles and projected speed-ups to reach 100 Gbps in packet routing on a single comm ..."
Abstract
-
Cited by 3 (2 self)
- Add to MetaCart
great leaps in terms of CPU, memory, and I/O bus speeds. Benefiting from the hardware innovation, recent software routers on commodity PC now report about 10 Gbps in packet routing. In this paper we map out expected hurdles and projected speed-ups to reach 100 Gbps in packet routing on a single commodity PC. With careful measurements, we identify two notable bottlenecks for our goal: CPU cycles and I/O bandwidth. For the former, we propose reducing per-packet processing overhead with softwarelevel optimizations and buying extra computing power with GPUs. To improve the I/O bandwidth, we suggest scaling the performance of I/O hubs that limits packet routing speed to well before 50 Gbps. I.

