Results 1 - 10
of
107
Scalability and Accuracy in a Large-Scale Network Emulator
- IN PROCEEDINGS OF THE 5TH SYMPOSIUM ON OPERATING SYSTEMS DESIGN AND IMPLEMENTATION (OSDI
, 2002
"... This paper presents ModelNet, a scalable Internet emulation environment that enables researchers to deploy unmodified software prototypes in a configurable Internet-like environment and subject them to faults and varying network conditions. Edge nodes running user-specified OS and application softwa ..."
Abstract
-
Cited by 297 (47 self)
- Add to MetaCart
(Show Context)
This paper presents ModelNet, a scalable Internet emulation environment that enables researchers to deploy unmodified software prototypes in a configurable Internet-like environment and subject them to faults and varying network conditions. Edge nodes running user-specified OS and application software are configured to route their packets through a set of ModelNet core nodes, which cooperate to subject the traffic to the bandwidth, congestion constraints, latency, and loss profile of a target network topology.
This paper describes and evaluates the ModelNet architecture and its implementation, including novel techniques to balance emulation accuracy against scalability. The current ModelNet prototype is able to accurately subject thousands of instances of a distributed application to Internet-like conditions with gigabits of bisection bandwidth. Experiments with several large-scale distributed services demonstrate the generality and effectiveness of the infrastructure.
EMP: Zero-copy OS-bypass NIC-driven Gigabit Ethernet Message Passing
"... Modern interconnects like Myrinet and Gigabit Ethernet offer Gb/s speeds which has put the onus of reducing the communication latency on messaging software. This has led to the development of OS bypass protocols which removed the kernel from the critical path and hence reduced the endto -end latency ..."
Abstract
-
Cited by 103 (10 self)
- Add to MetaCart
Modern interconnects like Myrinet and Gigabit Ethernet offer Gb/s speeds which has put the onus of reducing the communication latency on messaging software. This has led to the development of OS bypass protocols which removed the kernel from the critical path and hence reduced the endto -end latency. With the advent of programmable NICs, many aspects of protocol processing can be o#oaded from user space to the NIC leaving the host processor to dedicate more cycles to the application. Many host-o#oad messaging systems exist for Myrinet; however, nothing similar exits for Gigabit Ethernet. In this paper we propose Ethernet Message Passing (EMP), a completely new zero-copy, OS-bypass messaging layer for Gigabit Ethernet on Alteon NICs where the entire protocol processing is done at the NIC. This messaging system delivers very good performance (latency of 23 us, and throughput of 880 Mb/s). To the best of our knowledge, this is the first NIC-level implementation of a zero-copy message passing layer for Gigabit Ethernet. Categories and Subject Descriptors C.2.2 [Computer-Communication Networks]: Network Protocols---Protocol Architecture General Terms Design, Performance Keywords Gigabit ethernet, message passing, OS bypass, user level protocol 1.
Parallelizing the Murφ verifier
- Computer Aided Verification. 9th International Conference
, 1997
"... With the use of state and memory reduction techniques in verification by explicit state enumeration, runtime becomes a major limiting factor. We describe a parallel version of the explicit state enumeration verifier Murφ for distributed memory multiprocessors and networks of workstations that is ba ..."
Abstract
-
Cited by 72 (0 self)
- Add to MetaCart
With the use of state and memory reduction techniques in verification by explicit state enumeration, runtime becomes a major limiting factor. We describe a parallel version of the explicit state enumeration verifier Murφ for distributed memory multiprocessors and networks of workstations that is based on the message passing paradigm. In experiments with three complex cache coherence protocols, parallel Murφ shows close to linear speedups, which are largely insensitive to communication latency and bandwidth. There is some slowdown with increasing communication overhead, for which a simple yet relatively accurate approximation formula is given. Techniques to reduce overhead and required bandwidth and to allow heterogeneity and dynamically changing load in the parallel machine are discussed, which we expect will allow good speedups when using conventional networks of workstations.
LoGPC: Modeling Network Contention in Message-Passing Programs
- IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS
, 1998
"... In many real applications, for example those with frequent and irregular communication patterns or those using large messages, network contention and contention for message processing resources can be a significant part of the total execution time. This paper presents a new cost model, called LoGPC, ..."
Abstract
-
Cited by 59 (4 self)
- Add to MetaCart
In many real applications, for example those with frequent and irregular communication patterns or those using large messages, network contention and contention for message processing resources can be a significant part of the total execution time. This paper presents a new cost model, called LoGPC, that extends the LogP [9] and LogGP [4] models to account for the impact of network contention and network interface DMA behavior on the performance of message-passing programs. We validate LoGPC by analyzing three applications implemented with Active Messages [11, 18] on the MIT Alewife multiprocessor. Our analysis shows that network contention accounts for up to 50% of the total execution time. In addition, we show that the impact of communication locality on the communication costs is at most a factor of two on Alewife. Finally, we use the model to identify tradeoffs between synchronous and asynchronous message passing styles.
Portals 3.0: Protocol Building Blocks for Low Overhead Communication
- in Proceedings of the 2002 Workshop on Communication Architecture for Clusters
, 2002
"... This paper describes the evolution of the Portals message passing architecture and programming interface from its initial development on tightly-coupled massively parallel platforms to the current implementation running on a 1792-node commodity PC Linux cluster. Portals provides the basic building b ..."
Abstract
-
Cited by 54 (20 self)
- Add to MetaCart
(Show Context)
This paper describes the evolution of the Portals message passing architecture and programming interface from its initial development on tightly-coupled massively parallel platforms to the current implementation running on a 1792-node commodity PC Linux cluster. Portals provides the basic building blocks needed for higher-level protocols to implement scalable, low-overhead communication. Portals has several unique characteristics that differentiate it from other high-performance system-area data movement layers. This paper discusses several of these features and illustrates how they can impact the scalability and performance of higher-level message passing protocols.
Implicit Coscheduling: Coordinated Scheduling with Implicit Information in Distributed Systems
- ACM TRANSACTIONS ON COMPUTER SYSTEMS
, 1998
"... In this thesis, we formalize the concept of an implicitly-controlled system, also referred to as an implicit system. In an implicit system, cooperating components do not explicitly contact other components for control or state information; instead, components infer remote state by observing natural ..."
Abstract
-
Cited by 53 (2 self)
- Add to MetaCart
(Show Context)
In this thesis, we formalize the concept of an implicitly-controlled system, also referred to as an implicit system. In an implicit system, cooperating components do not explicitly contact other components for control or state information; instead, components infer remote state by observing naturally-occurring local events and their corresponding implicit information, i.e., information available outside of a defined interface. Many systems, particularly in distributed and networked environments, have leveraged implicit control to simplify the implementation of services with autonomous components. To concretely demonstrate the advantages of implicit control, we propose and implement implicit coscheduling, an algorithm for dynamically coordinating the time...
LoPC: Modeling Contention in Parallel Algorithms
, 1997
"... Parallel algorithm designers need computational models that take first order system costs into account, but are also simple enough to use in practice. This paper introduces the LoPC model, which is inspired by the LogP model but accounts for contention for message processing resources in parallel al ..."
Abstract
-
Cited by 52 (8 self)
- Add to MetaCart
Parallel algorithm designers need computational models that take first order system costs into account, but are also simple enough to use in practice. This paper introduces the LoPC model, which is inspired by the LogP model but accounts for contention for message processing resources in parallel algorithms on a multiprocessor or network of workstations. LoPC takes the , and parameters directly from the LogP model and uses them to predict the cost of contention, .
Efficiency vs. Portability in Cluster-Based Network Servers
"... Efficiency and portability are usually conflicting objectives for cluster-based network servers that distribute the clients ’ requests across the cluster based on the actual content requested. Our work is based on the observation that this efficiency vs. portability tradeoff has not been discussed b ..."
Abstract
-
Cited by 52 (22 self)
- Add to MetaCart
(Show Context)
Efficiency and portability are usually conflicting objectives for cluster-based network servers that distribute the clients ’ requests across the cluster based on the actual content requested. Our work is based on the observation that this efficiency vs. portability tradeoff has not been discussed before in the literature. To fill this gap, in this paper we study this tradeoff in the context of an interesting class of content-based network servers, the locality-conscious servers, using modeling and experimentation. Our analytical model gauges the potential performance benefits of portable and non-portable localityconscious request distribution with respect to a traditional, locality-oblivious server, as a function of multiple parameters. Based on our experience with the model, we design and evaluate a portable, locality-conscious server. Experiments with our server, a nonportable server, and a traditional server validate and confirm our modeling results under several real workloads. Based on our modeling and experimental results, our main conclusion is that portability should be promoted in cluster-based network servers with low processor overhead communication, given its relatively low cost 15%) in terms of efficiency. For clusters with high processor overhead communication, efficiency should be the overriding concern, as the cost of portability can be very high (as high as 98 % on 32 nodes). We also conclude that user-level communication can be useful even for non-scientific applications such as network servers.
Supporting Parallel Applications on Clusters of Workstations: The Intelligent Network Interface Approach
- Cluster Computing, Special Issue on High Performance Distributed Computing
, 1997
"... This paper presents a novel networking architecture designed for communication intensive parallel applications running on clusters of workstations (COWs) connected by high speed networks. This architecture permits (1) the transfer of selected communication-related functionality from the host machine ..."
Abstract
-
Cited by 41 (19 self)
- Add to MetaCart
This paper presents a novel networking architecture designed for communication intensive parallel applications running on clusters of workstations (COWs) connected by high speed networks. This architecture permits (1) the transfer of selected communication-related functionality from the host machine to the network interface coprocessor, and (2) the exposure of this functionality directly to applications as instructions of aVirtual Communication Machine (VCM) implemented by the coprocessor. The user-level code interacts directly with the network coprocessor as the host kernel only 'connects' the application to the VCM and does not participate in the data transfers. The distinctive feature of our design is its flexibility: the integration of the network withthe applicationcan be varied to maximize performance. The resulting communication architecture is characterized by a very low overhead on the host processor, by latency and bandwidth close to the hardware limits, and by an applicatio...
Can a Shared-Memory Model Serve as a Bridging Model for Parallel Computation?
, 1999
"... There has been a great deal of interest recently in the development of general-purpose bridging models for parallel computation. Models such as the BSP and LogP have been proposed as more realistic alternatives to the widely used PRAM model. The BSP and LogP models imply a rather different style fo ..."
Abstract
-
Cited by 40 (12 self)
- Add to MetaCart
There has been a great deal of interest recently in the development of general-purpose bridging models for parallel computation. Models such as the BSP and LogP have been proposed as more realistic alternatives to the widely used PRAM model. The BSP and LogP models imply a rather different style for designing algorithms when compared with the PRAM model. Indeed, while many consider data parallelism as a convenient style, and the shared-memory abstraction as an easyto-use platform, the bandwidth limitations of current machines have diverted much attention to message-passing and distributed-memory models (such as the BSP and LogP) that account more properly for these limitations. In this paper we consider the question of whether a shared-memory model can serve as an effective bridging model for parallel computation. In particular, can a shared-memory model be as effective as, say, the BSP? As a candidate for a bridging model, we introduce the Queuing Shared-Memory (QSM) model, which accounts for limited communication bandwidth while still providing a simple shared-memory abstraction. We substantiate the ability of the QSM to serve as a bridging model by providing a simple work-preserving emulation of the QSM on both the BSP, and on a related model, the (d, x)-BSP. We present evidence that the features of the QSM are essential to its effectiveness as a bridging model. In addition, we describe scenarios