Feedback-directed thread scheduling with memory considerations
In HPDC ’07: Proceedings of the 16th International Symposium on High Performance Distributed Computing, 2007
Cited by 2 (1 self)
This paper describes a novel approach to generating an optimized schedule for running threads on distributed shared memory (DSM) systems. The approach relies on a binary instrumentation tool to automatically acquire the memory-sharing relationships between user-level threads by analyzing their memory traces. We introduce the concept of an affinity graph to model these relationships. Expensive I/O for large trace files is eliminated entirely by building the graph online. We apply hierarchical graph partitioning and thread reordering to the affinity graph to determine an optimal thread schedule. We performed experiments on an SGI Altix system. The results show that our approach reduces total execution time by 10% to 38% for a variety of applications by maximizing data reuse within a single processor, minimizing data sharing between processors, and maintaining good load balance.
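The online graph creation described in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `build_affinity_graph`, the `(thread_id, address)` record format, the 64-byte line size, and the choice of edge weight (number of distinct cache lines touched by both threads) are all assumptions made for illustration.

```python
from collections import defaultdict


def build_affinity_graph(trace, line_size=64):
    """Build a thread affinity graph from a stream of memory accesses.

    trace: iterable of (thread_id, address) records (hypothetical format).
    Returns {(a, b): weight} with a < b, where weight is the number of
    distinct cache lines that both threads touched (an assumed metric).
    """
    touched = defaultdict(set)  # cache-line index -> threads seen on that line
    graph = defaultdict(int)    # (thread_a, thread_b) -> shared-line count

    # Records are consumed one at a time, so the trace is never stored.
    for tid, addr in trace:
        sharers = touched[addr // line_size]
        if tid in sharers:
            continue  # this thread was already counted for this line
        for other in sharers:
            graph[(min(tid, other), max(tid, other))] += 1
        sharers.add(tid)
    return dict(graph)


if __name__ == "__main__":
    sample = [(0, 0), (1, 8), (2, 128), (0, 130), (1, 64), (0, 0)]
    print(build_affinity_graph(sample))
```

Because each record only updates two dictionaries, the trace can come from a generator fed by an instrumentation tool, which is the property that avoids writing large trace files to disk.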
Analytical Modeling for Affinity-Based Thread Scheduling on Multicore Platforms
Cited by 1 (0 self)
This paper proposes an analytical model to estimate the cost of running an affinity-based thread schedule on multicore shared-memory systems. The model treats a memory architecture as a generic tree structure and allows for a portable, architecture-aware optimization framework to find an optimized schedule for multi-threaded programs. It consists of three submodels that together measure the cost of executing a thread schedule: an affinity graph model, a memory hierarchy model, and a cost model, which characterize programs, machines, and costs respectively. With the aid of the model, we formalize the problem of finding the best thread schedule as an optimization problem. Because the problem is NP-hard, we designed a hierarchical graph partitioning algorithm to compute an approximate solution. We then extended the algorithm to support threads with data dependencies (i.e., DAGs). The algorithm has been implemented in a feedback-directed optimization framework and applied to two real-world scientific applications: a Computational Fluid Dynamics (CFD) kernel and Cholesky factorization. We conducted our experiments on both SMP and DSM machines. The results show that our analytical model is accurate enough, and that using the optimized thread schedule improves program performance by 25% to 4 times, demonstrating that our method is efficient and practical.
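As a rough illustration of how an affinity graph and a tree-shaped memory hierarchy might combine into a single cost, the sketch below charges each pair of sharing threads according to the highest tree level at which their cores diverge. The functions `sharing_cost` and `schedule_cost`, the path encoding of cores, and the per-level cost values are hypothetical; the paper's actual cost model is not given in the abstract and presumably includes further terms, such as load balance.

```python
def sharing_cost(core_a, core_b, level_costs):
    """Cost of sharing data between two cores in a tree hierarchy.

    Each core is a path from the root, e.g. (socket, core). The cost is
    taken from the first level where the paths differ (assumed model).
    """
    for depth, (a, b) in enumerate(zip(core_a, core_b)):
        if a != b:
            return level_costs[depth]
    return 0  # same core: shared data can stay in the private cache


def schedule_cost(affinity, placement, level_costs):
    """Sum of edge weight times sharing cost over all affinity edges."""
    return sum(
        weight * sharing_cost(placement[a], placement[b], level_costs)
        for (a, b), weight in affinity.items()
    )


if __name__ == "__main__":
    affinity = {(0, 1): 10, (2, 3): 10, (1, 2): 1}
    level_costs = [100, 10]  # [cross-socket, same socket but different core]
    grouped = {0: (0, 0), 1: (0, 1), 2: (1, 0), 3: (1, 1)}
    scattered = {0: (0, 0), 1: (1, 0), 2: (0, 1), 3: (1, 1)}
    print(schedule_cost(affinity, grouped, level_costs))
    print(schedule_cost(affinity, scattered, level_costs))
```

Under these assumed costs, the placement that keeps heavily sharing threads on the same socket scores lower, which is the kind of comparison such a model is meant to support.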
DSM System Cabinet
This paper proposes an analytical model to estimate the cost of running an affinity-based thread schedule on multicore systems. The model consists of three submodels to evaluate the cost of executing a thread schedule: an affinity-graph submodel, a memory hierarchy submodel, and a cost submodel, which characterize programs, machines, and costs respectively. We applied the analytical model to both synthetic and real-world applications. The estimated cost accurately predicts which schedule will provide better performance. Because the scheduling problem is NP-hard, we designed an approximation algorithm to compute near-optimal solutions. We have extended the algorithm to support threads with data dependences. We conducted experiments with a computational fluid dynamics (CFD) kernel and Cholesky factorization on both UMA SMP and NUMA DSM machines. The results show that using the optimized thread schedule can improve program performance by 25% to 400%, demonstrating that our method for determining an optimized thread schedule for multicore systems is efficient and practical.
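To give a sense of what an approximate solution to this kind of scheduling problem can look like, here is a simple greedy stand-in, not the algorithm from the paper. It merges thread groups along the heaviest affinity edges first while capping group size at the number of threads a processor can hold. The name `greedy_group` and the `capacity` parameter are illustrative assumptions, and the heuristic carries no optimality or load-balance guarantee.

```python
def greedy_group(affinity, threads, capacity):
    """Group threads by merging along the heaviest affinity edges first.

    affinity: {(a, b): weight}; threads: iterable of thread ids;
    capacity: maximum threads per group (assumed per-processor limit).
    Returns a sorted list of sorted thread-id lists.
    """
    group_of = {t: frozenset([t]) for t in threads}

    # Visit edges from heaviest to lightest; ties keep insertion order.
    for (a, b), _weight in sorted(affinity.items(), key=lambda kv: -kv[1]):
        ga, gb = group_of[a], group_of[b]
        if ga != gb and len(ga) + len(gb) <= capacity:
            merged = ga | gb
            for t in merged:
                group_of[t] = merged

    return sorted(sorted(g) for g in set(group_of.values()))


if __name__ == "__main__":
    affinity = {(0, 1): 10, (2, 3): 10, (1, 2): 1}
    print(greedy_group(affinity, range(4), capacity=2))
```

Applying the same step again to the resulting groups, with edges summed between groups, would give one possible hierarchical variant for a multi-level memory tree.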

