X. Martorell, E. Ayguadé, J.I. Navarro, J. Corbalán, M. González, and J. Labarta. Thread fork/join techniques for multi-level parallelism exploitation in NUMA multiprocessors. In Proceedings of the 13th Int. Conference on Supercomputing ICS'99, June 1999.
....mechanisms to spawn parallelism, depending on the hierarchy level in which the application is running. When spawning the deepest (fine grain) level of parallelism, a mechanism based on work descriptors is available to supply the work to all the threads participating in the parallel region [5]. The mechanism is implemented as efficiently as the ones available in current thread packages. Although efficient, this mechanism does not allow the exploitation of further levels of parallelism. This requires the generation of work using a more costly interface that provides work descriptors ....
X. Martorell, E. Ayguadé, J.I. Navarro, J. Corbalán, M. González and J. Labarta. Thread Fork/join Techniques for Multi-level Parallelism Exploitation in NUMA Multiprocessors. In 13th Int. Conference on Supercomputing ICS'99, Rhodes (Greece), June 1999.
....spawn parallelism, depending on the hierarchy level in which the application is running. When spawning the deepest (fine grain) level of parallelism, a mechanism based on work descriptors is available to supply the work to all (or just a subset of) the participant threads in the parallel region [6]. The mechanism is implemented as efficiently as the one currently available in most of the current available thread packages. Although efficient, this mechanism does not allow the exploitation of further levels of parallelism. This requires the generation of work using a more costly interface ....
X. Martorell, E. Ayguadé, J.I. Navarro, J. Corbalán, M. González and J. Labarta. Thread Fork/join Techniques for Multi-level Parallelism Exploitation in NUMA Multiprocessors. In 13th Int. Conference on Supercomputing ICS'99, Rhodes (Greece), June 1999.
....runtime system is based on POSIX spin locks. If they are not supported, their primitives are implemented with mutexes (pthread_mutex_trylock). While threads allow us to exploit multilevel parallelism, single level parallelism can be expressed efficiently using lightweight work and loop descriptors [15]. Actually, a work descriptor is a minimal nanothread descriptor, i.e. containing only a pointer to the function to be executed and its arguments. An array of pointers to work descriptors corresponds with each virtual processor. The virtual processor checks also its descriptor's array, executes ....
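The work-descriptor idea in the excerpt above (a descriptor holding only a function pointer and its arguments, with an array of descriptor pointers per virtual processor) could be sketched in C roughly as follows. This is only an illustration under stated assumptions: the names `work_desc_t`, `vproc_t`, `vproc_supply`, `vproc_run_pending`, `add_one` and the capacity `MAX_PENDING` are hypothetical and are not taken from NthLib, and the sketch is single-threaded, so it omits the spin-lock or mutex synchronization a real runtime would need.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PENDING 8   /* hypothetical per-processor capacity */

/* Minimal work descriptor: only a function pointer and its argument. */
typedef struct {
    void (*func)(void *);
    void *arg;
} work_desc_t;

/* One array of pointers to work descriptors per virtual processor. */
typedef struct {
    work_desc_t *slots[MAX_PENDING];
} vproc_t;

/* Supply a descriptor to a virtual processor.
 * Returns 0 on success, -1 if every slot is occupied. */
int vproc_supply(vproc_t *vp, work_desc_t *wd)
{
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (vp->slots[i] == NULL) {
            vp->slots[i] = wd;
            return 0;
        }
    }
    return -1;
}

/* Check the descriptor array, execute every pending descriptor,
 * clear its slot, and return how many were executed. */
int vproc_run_pending(vproc_t *vp)
{
    int executed = 0;
    for (size_t i = 0; i < MAX_PENDING; i++) {
        work_desc_t *wd = vp->slots[i];
        if (wd != NULL) {
            vp->slots[i] = NULL;
            wd->func(wd->arg);
            executed++;
        }
    }
    return executed;
}

/* Example work function: increments the int its argument points to. */
void add_one(void *arg)
{
    *(int *)arg += 1;
}
```

In this sketch a supplier would fill a `work_desc_t` with `add_one` and a pointer to a counter, hand it to `vproc_supply`, and the virtual processor would later pick it up in `vproc_run_pending`. Keeping the descriptor this small is what makes it cheaper than a full thread descriptor, at the cost of not being able to express further nested parallelism.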
X. Martorell, E. Ayguadé, N. Navarro, J. Corbalán, M. González, and J. Labarta. Thread Fork/Join Techniques for Multi-level Parallelism Exploitation in NUMA Multiprocessors. In Proceedings of the 13th Int. Conference on Supercomputing ICS'99, Rhodes, Greece, June 1999.
No context found.
X. Martorell, E. Ayguadé, J.I. Navarro, J. Corbalán, M. González and J. Labarta. Thread Fork/join Techniques for Multi-level Parallelism Exploitation in NUMA Multiprocessors. In 13th Int. Conference on Supercomputing ICS'99, Rhodes (Greece), June 1999.
....proposal in this paper requires explicit point to point synchronization mechanisms. The description of the runtime support for multiple levels of parallelism and thread groups is not included in this paper and can be found elsewhere [1]. All these functionalities are included in the NthLib library [5] supporting the code generated by the NanosCompiler [3]. The following subsections describe the most important implementation aspects of the precedences module in the NthLib library. The runtime description is divided in two sections. The first one covers the aspects that have to be considered ....
X. Martorell, E. Ayguadé, J.I. Navarro, J. Corbalán, M. González and J. Labarta. Thread Fork/join Techniques for Multi-level Parallelism Exploitation in NUMA Multiprocessors. In 13th Int. Conference on Supercomputing ICS'99, Rhodes (Greece), June 1999.
....efficient mechanisms to manage the parallelism, at different levels of granularity, present in parallel applications. On one hand, the library offers a simple interface for the management of fine grain loop level parallelism (through work descriptors) designed to minimize fork join overheads [15]. On the other hand, the library provides mechanisms for the management of coarser levels of parallelism (through work queues) where the specification of precedences among threads is necessary, thus supporting the unstructured parallelism found in arbitrary task graphs [16]. These general ....
X. Martorell, E. Ayguadé, N. Navarro, J. Corbalán, M. González and J. Labarta. Thread Fork/Join Techniques for Multi-level Parallelism Exploitation in NUMA Multiprocessors. 13th Int. Conference on Supercomputing ICS'99, Rhodes (Greece), June 1999.
No context found.
X. Martorell, E. Ayguadé, J.I. Navarro, J. Corbalán, M. González and J. Labarta. Thread Fork/join Techniques for Multi-level Parallelism Exploitation in NUMA Multiprocessors. In 13th Int. Conference on Supercomputing ICS'99, Rhodes (Greece), June 1999.
....different mechanisms to spawn parallelism, depending on the hierarchy level in which the application is running. When spawning the deepest (fine grain) level of parallelism, a mechanism based on work descriptors is available to supply the work to all the threads participating in the parallel region [5]. The mechanism is implemented as efficiently as the ones available in current thread packages. Although efficient, this mechanism does not allow the exploitation of further levels of parallelism. This requires the generation of work using a more costly interface that provides work descriptors with a ....
X. Martorell, E. Ayguadé, J.I. Navarro, J. Corbalán, M. González and J. Labarta. Thread Fork/join Techniques for Multi-level Parallelism Exploitation in NUMA Multiprocessors. In 13th Int. Conference on Supercomputing ICS'99, Rhodes (Greece), June 1999.
....of groups of threads within the scope of PARALLEL constructs; the second one deals with the specification of precedence relations among different sections in a SECTIONS work sharing construct. The code generated by the compiler contains calls to a highly tuned user level threads library (NthLib [9,10]). The paper outlines the requirements needed at this level to efficiently support multiple levels of parallelism and thread groups, and the execution of parallel tasks expressed by means of a generic hierarchical task graph. 2. THE NanosCompiler INTERNAL REPRESENTATION The OpenMP NanosCompiler ....
....descriptor and supplies it to the participating threads. The mechanism is implemented as efficiently as the one currently available in most of the current available thread packages. However, extra functionality has been included to allow work supply from several (simultaneously executing) threads [9]. On the other hand, when the application knows that it is spawning coarse grain parallelism, not at the deepest level, it can pay the cost of supporting nested parallelism. Higher levels of parallelism, containing other parallel regions, are generated using a more costly interface that provides ....
Martorell X, Ayguade E, Navarro JI, Corbalan J, Gonzalez M, Labarta J. Thread fork/join techniques for multi-level parallelism exploitation in NUMA multiprocessors. 13th International Conference on Supercomputing ICS'99, Rhodes, Greece, June 1999.
No context found.
X. Martorell, E. Ayguadé, J.I. Navarro, J. Corbalán, M. González, and J. Labarta. Thread fork/join techniques for multi-level parallelism exploitation in NUMA multiprocessors. In Proceedings of the 13th Int. Conference on Supercomputing ICS'99, June 1999.
No context found.
X. Martorell, E. Ayguadé, N. Navarro, J. Corbalán, M. González, and J. Labarta. Thread fork/join techniques for multi-level parallelism exploitation in NUMA multiprocessors. Proc. of the 1999 International Conference on Supercomputing, June 1999.