by Yoav Etsion, Dror G. Feitelson
In Proceedings of the International Parallel and Distributed Processing Symposium 2001, IPDPS2001
http://www.cs.huji.ac.il/~feit/gang_comm.ps.gz
Abstract:
One of the main limitations on multiprogramming of parallel jobs on clusters is the allocation of communication bandwidth. When using an off-the-shelf network card there are memory limitations: the size of the memory resident on the network card is fixed. Because of this, any use of this memory as a fast send or receive buffer prohibits us from using a multiprogramming scheme, because of the obvious need to divide the buffer among the different jobs, which reduces the achievable bandwidth substantially. In this paper we propose a different scheme, which combines gang scheduling (coordinated context switching of the processes on different nodes) with a switch of the communication buffers. In this scheme, we associate a communication buffer with each context. Whenever the scheduling algorithm initiates a context switch, we also replace the communication buffers: the send queue and the receive queue. This allows each job to utilize the buffers as if it were the only job in the system, resulting in improved bandwidth.
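The buffer-switching idea in the abstract can be illustrated with a toy model. This is only a minimal sketch under stated assumptions, not the authors' implementation: the names `CommContext`, `Nic`, `static_share`, `switch_to`, and `gang_switch` are hypothetical, buffer capacity is counted in abstract "slots", and only the send queue is exercised. It contrasts dividing a fixed buffer among jobs with installing one job's queues at a time on every node.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class CommContext:
    """Send and receive queues of one job on one node (illustrative only)."""
    send_queue: deque = field(default_factory=deque)
    recv_queue: deque = field(default_factory=deque)


class Nic:
    """Toy network card with a fixed number of buffer slots (assumed model)."""

    def __init__(self, slots):
        self.slots = slots
        self.saved = {}      # job name -> CommContext kept across switches
        self.active = None   # job whose buffers are currently installed

    def static_share(self, num_jobs):
        """Slots per job if the buffer were divided among all jobs instead."""
        return self.slots // num_jobs

    def switch_to(self, job):
        """Install the buffers of `job`; other jobs' queues stay saved."""
        self.saved.setdefault(job, CommContext())
        self.active = job

    def send(self, packet):
        """Queue a packet for the active job; return False if its buffer is full."""
        if self.active is None:
            raise RuntimeError("no job context installed")
        ctx = self.saved[self.active]
        if len(ctx.send_queue) >= self.slots:
            return False
        ctx.send_queue.append(packet)
        return True


def gang_switch(nics, job):
    """Coordinated switch: every node installs the same job's buffers."""
    for nic in nics:
        nic.switch_to(job)
```

In this model, a card with 64 slots shared statically by 4 jobs leaves each job 16 slots, whereas after `gang_switch(nics, "A")` job A can fill all 64 slots on each node, and its queued packets are still there when it is switched back in.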