Franck Cappello and Olivier Richard. Performance characteristics of a network of commodity multiprocessors for the NAS benchmarks using a hybrid memory model. Technical Report 1197, LRI, Université Paris-Sud, 91405 Orsay Cedex, France, 1999.
....of SMPs, BIP cannot be adapted because only one process per node is able to access the Myrinet board. This makes it impossible to support several MPI processes on SMPs (except by using special compilation techniques to regroup several MPI processes into a unique multithreaded UNIX process [4]). We chose to keep the message-passing point of view to avoid this restriction. The challenge is to provide the support of several processes per node with the same performance as multithreaded programming. We need to manage the concurrent access to the Myrinet board and to provide local ....
....gain access to the Myrinet board while the other processors must remain idle for communication. A first solution is to use the other processors with BIP threads. Communications between threads use simple memory copies and communications between nodes are performed with the classical BIP strategy [3]. However, this solution forces the use of the multithreaded paradigm for communications, so we chose to keep the message-passing semantics. Providing the support of several processes per node with the same performance as multithreaded programming presents the following difficulties: manage the ....
....intra-node measured on the NAS benchmarks are quite far from the speedups measured on SPLASH-2. Several programs (LU, SP, BT) gain practically no benefit from the use of dual-processor nodes. The local speedup varies with the number of nodes in the CLUMP. Article [18] presents analyses that help explain the reasons for the modest speedup on the LU, SP, and BT programs and for the evolution of the local speedup for FT. The conclusions are as follows: a CLUMP of dual-processor PCs does not reach speedups close to 2 on the programs ....
....the other for all programs of the NAS benchmark. The speedup in SMM mode is highest when the contribution of the non-parallelized loops (in the code of the MPI programs) is significant. This is notably the case for LU. The HMM method is more favorable in the case of CG. Article [18] shows that communications account for a non-negligible share of the total computation time of CG, which is not the case for LU or EP, for example. Conflicts in accessing the network card (2 processors for 1 card) and the many local bus accesses caused by local communications ....