Integrating MPI and the Nanothreads Programming Model
In Proceedings of the 10th Euromicro Workshop on Parallel, Distributed and Network-Based Processing (PDP 2002), Las Palmas, 2002
Abstract

Cited by 3 (2 self)
This paper presents a prototype runtime system that integrates MPI, used on distributed memory systems, and the Nanothreads Programming Model (NPM), a programming model for shared memory multiprocessors. This integration does not alter the independence of the two models, since the runtime system is based on a multilevel design that supports each of them individually but offers the capability to combine their advantages. Existing MPI codes can be executed without any changes, codes for shared memory machines can be used directly, and the concurrent use of both models is easy. A major feature of the runtime system is portability, as it is based exclusively on calls to MPI and Nthlib, a user-level threads library that has been ported to several operating systems. The runtime system supports the hybrid programming model (MPI+OpenMP) and also provides a solution for better load balancing in MPI applications. Moreover, it extends the API and the multiprogramming functionality of the NPM on clusters of multiprocessors and can support an extension of the OpenMP standard on distributed memory multiprocessors.
Parallel Euler tour and Post Ordering for Parallel Tree Accumulations
2003
Abstract
Tree accumulation is the process of aggregating data placed in tree nodes according to their tree structure. The aggregation can be a simple arithmetic operation or a more complex function. This is similar to arithmetic expression evaluation except there is one operation to be
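As a rough illustration of the definition above, the following is a minimal sequential sketch of an upward tree accumulation, in which each node receives the aggregate of the values in its subtree. It is not the parallel Euler tour method the paper describes; the function name `upward_accumulate`, the dictionary-based tree representation, and the `combine` parameter are all assumptions made for this example.

```python
import operator
from typing import Callable, Dict, Hashable, List, TypeVar

V = TypeVar("V")


def upward_accumulate(
    children: Dict[Hashable, List[Hashable]],
    values: Dict[Hashable, V],
    root: Hashable,
    combine: Callable[[V, V], V],
) -> Dict[Hashable, V]:
    """Hypothetical helper: map each node to the combination of all
    values in its subtree (sequential, for illustration only)."""
    # Visit nodes so that every parent is recorded before its descendants.
    order = []
    stack = [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(children.get(node, []))

    # Walking that order backwards processes all descendants of a node
    # before the node itself, which is what a post-order pass requires.
    result: Dict[Hashable, V] = {}
    for node in reversed(order):
        acc = values[node]
        for child in children.get(node, []):
            acc = combine(acc, result[child])
        result[node] = acc
    return result


# Example tree: node 0 has children 1 and 2; node 1 has children 3 and 4.
example_children = {0: [1, 2], 1: [3, 4]}
example_values = {0: 1, 1: 2, 2: 3, 3: 4, 4: 5}
subtree_sums = upward_accumulate(example_children, example_values, 0, operator.add)
subtree_maxima = upward_accumulate(example_children, example_values, 0, max)
```

Passing a different `combine` function (here `operator.add` or `max`) swaps the simple arithmetic operation for another aggregation without changing the traversal.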
A Parallel State Assignment Algorithm for Finite State Machines
Abstract
This paper summarizes the design and implementation of a parallel algorithm for state assignment of large Finite State Machines (FSMs). High performance CAD tools are necessary to overcome the computational complexity involved in the optimization of large sequential circuits. FSMs constitute an important class of logic circuits, and state assignment is one of the key steps in combinational logic optimization. The SMP-based parallel algorithm, based on the sequential program JEDI targeting multilevel logic implementation, scales nearly linearly with the number of processors for FSMs of varying problem sizes chosen from standard benchmark suites while attaining quality of results comparable to the best sequential algorithms.