Results 11 - 20 of 21
Adaptive Prefetching Technique for Shared Virtual Memory
Cited by 2 (0 self)
Though shared virtual memory (SVM) systems promise low-cost solutions for high-performance computing, they suffer from long memory latencies. These latencies are usually caused by repetitive invalidations on shared data. Since shared data are accessed through synchronizations and the patterns by which threads synchronize are repetitive, a prefetching scheme based on such repetitiveness would reduce memory latencies. Based on this observation, we propose a prefetching technique which predicts future access behavior by analyzing access history per synchronization variable. Our technique was evaluated on an 8-node SVM system using the SPLASH-2 benchmark. The results show that our technique could achieve a 34%–45% reduction in memory access latencies.
Programming Distributed Memory Systems Using OpenMP
Cited by 2 (0 self)
OpenMP has emerged as an important model and language extension for shared-memory parallel programming. On shared-memory platforms, OpenMP offers an intuitive, incremental approach to parallel programming. In this paper, we present techniques that extend the ease of shared-memory parallel programming in OpenMP to distributed-memory platforms as well. First, we describe a combined compile-time/runtime system that uses an underlying Software Distributed Shared Memory System and exploits repetitive data access behavior in both regular and irregular program sections. We present a compiler algorithm to detect such repetitive data references and an API to an underlying software distributed shared memory system to orchestrate the learning and proactive reuse of communication patterns. Second, we introduce a direct translation of standard OpenMP into MPI message-passing programs for execution on distributed memory systems. We present key concepts and describe techniques to analyze and efficiently handle both regular and irregular accesses to shared data. Finally, we evaluate the performance achieved by our approaches on representative OpenMP applications.
Exploiting Lock-Related Primitives in Distributed Shared-Memory Systems
, 1999
Cited by 1 (0 self)
In this paper we investigate how lock-related primitives can be exploited to improve the performance of software distributed shared-memory systems (software DSMs). In particular, we study three novel Entry Consistency-based software DSMs that take advantage of lock-related primitives by using a new overhead-tolerance technique, called Lock Acquirer Prediction (LAP). LAP predicts the pattern of lock accesses, so that the acquirer of a lock can be updated with the shared data it is going to require, sometimes even before requesting access to the lock. The extent to which lock-related primitives are used in applications has a direct impact on whether the advantages of Entry Consistency and LAP can be exploited to the fullest. Thus, our three systems impose variations of the shared-memory programming model that apply lock-related primitives with increasing degrees of intensity. The first system imposes a plain shared-memory model based on barriers and mutual exclusion locks, where ...
Comparing Latency-Tolerance Techniques for Software DSM Systems
, 2003
Cited by 1 (0 self)
This paper studies the isolated and combined effects of several latency-tolerance techniques for software-based distributed shared-memory systems (software DSMs). More specifically, we focus on data prefetching, update-based coherence, and single-writer optimizations for page-based software DSMs. Our experimental results with six parallel applications show that, when these techniques are carefully combined, they can provide running time and speedup improvements of up to 54 percent and 110 percent, respectively, on a cluster of eight PCs. Index Terms: Distributed systems, performance.
Comparative Evaluation of Latency-Tolerating and -Reducing Techniques for Hardware-Only and Software-Only Directory Protocols
, 2000
Cited by 1 (0 self)
this paper how effective latency-tolerating and -reducing
Tolerating Latency in Software Distributed Shared Memory Systems Through Non-Binding Prefetching
Cited by 1 (1 self)
A key obstacle to achieving high performance on software distributed shared memory (DSM) systems is their high memory latencies. Software-controlled prefetching tolerates memory latency by overlapping computation with communication. This thesis proposes and evaluates an implementation of software-controlled non-binding prefetching on a software DSM called TreadMarks. With programmer-inserted prefetching, all of our applications achieve better performance. The overall speedup ranges from 4% to 29%. In addition, we observe that the performance of compiler-inserted prefetching matches that of programmer-inserted prefetching in a few cases. We also investigate prefetching with runtime information. Although using dynamic information to issue prefetches can overcome some of the limitations of statically inserted prefetches, the overheads of this approach often more than offset any gain in memory performance. Finally, we evaluate the combined effects of prefetching and multithreading on application performance. In several cases, the combined approach outperforms either technique alone, but the overall results are mixed.
Where Does the Time Go in Software DSM Systems: Experiences with JIAJIA?
, 1999
The performance gap between software DSM systems and message-passing platforms greatly hinders the adoption of software DSM systems, despite the great efforts devoted to this area in the past decade. In this paper, we take up the challenge of finding where future design efforts should focus. The components of the total system overhead of software DSM systems are first analyzed in detail. Based on a state-of-the-art software DSM system, JIAJIA, we measure these components on the Dawning parallel system and draw five important conclusions that differ from some traditional viewpoints. (1) The performance of the JIAJIA software DSM system is acceptable. For four of eight applications, the parallel efficiency achieved by JIAJIA is about 80%, while for two others, 70% efficiency can be obtained. (2) 40.94% of interrupt service time is overlapped with waiting time. (3) Encoding and decoding diffs do not cost much time (<1%), so using hardware support to encode/dec...
Heterogeneous Distributed Shared Memory on Wide Area Network
In this paper, we analyze the applicability of state-of-the-art software DSM techniques for supporting a single shared address space in a large, heterogeneous wide area network. The main contributions of this paper include the following three aspects. First, based on detailed analysis, ten challenges related to implementing a single shared address space in a heterogeneous, dynamic scenario are listed. Furthermore, for every challenge, we discuss the applicability of new techniques that are widely used in homogeneous software DSM systems. Second, for two kinds of typical applications, two hierarchical schemes, ###### for scientific applications and #### for information service applications, are proposed to implement a heterogeneous distributed shared memory system on a wide area system. Finally, four key problems inherent in both the ###### and #### schemes (coherent information maintenance, fault tolerance, resource discovery and join, and application adaptation) are analyzed, and partial solutions are proposed.
Efficient Categorization of Memory Sharing Patterns in Software DSM Systems
, 2001
This work introduces a new technique that enables SDSMs to dynamically and accurately categorize memory sharing patterns in both regular and irregular applications. The categorization is carried out automatically at run-time on a per-page basis, requiring no user or compiler assistance.
Overcoming performance bottlenecks in using OpenMP on SMP clusters
Parallel Computing (journal homepage: www.elsevier.com/locate/parco)

