M. C. Carlisle, A. Rogers, J. H. Reppy, and L. J. Hendren. Early Experiences with Olden. In Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing, pages 1--20, August 1993.
....to the schedulers in some of these other systems, though Cilk's algorithm uses randomness and is provably efficient. Many multithreaded programming languages and runtime systems are based on heuristic scheduling techniques. Though systems such as Charm [91], COOL [27, 28], Id [3, 37, 80], Olden [22], and others [29, 31, 38, 44, 54, 55, 63, 88, 98] are based on sound heuristics that seem to perform well in practice and generally have wider applicability than Cilk, none are able to provide any sort of performance guarantee or accurate machine-independent performance model. These systems ....
....somewhat onerous for the programmer, the decision to break procedures into separate nonsuspending threads with heap-allocated closures simplifies the Cilk runtime system. Each Cilk thread runs to completion without suspending and leaves the C runtime stack empty when it dies. A common alternative [22, 47, 52, 63, 77, 81] is to directly support spawn return threads (or procedures) in the runtime system, possibly with stack-allocated activation frames. In such a system, threads can suspend waiting for synchronization and leave temporary values on the calling stack. Consequently, this alternative strategy requires ....
Martin C. Carlisle, Anne Rogers, John H. Reppy, and Laurie J. Hendren. Early experiences with Olden. In Proceedings of the Sixth Annual Workshop on Languages and Compilers for Parallel Computing, Portland, Oregon, August 1993.
....processors) before performing the modification. An alternate approach involves moving computation to the data it references. Systems organized along these lines avoid the overhead of frequent remote communication by migrating computation to the node upon which frequently referenced data resides [10, 65]. Implementations utilizing both computation and data migration techniques are also possible [4, 11, 32]. As with static software DSMs, the high fixed overheads of message based communication in many current generation systems drive dynamic software DSM implementors toward optimizations that ....
Martin C. Carlisle, Anne Rogers, John H. Reppy, and Laurie J. Hendren. Early Experiences with Olden. In Conference Record of the Sixth Workshop on Languages and Compilers for Parallel Computing, August 1993.
....run time techniques to improve locality for linked lists and trees [9]. In this paper, we propose further extensions. Calder et al. use profiling to guide layout of global and stack variables to avoid conflicts [3]. Carlisle et al. investigate parallel performance of pointer based codes in Olden [5]. 8 Conclusion. Several conclusions can be drawn from our work. First, the relative effectiveness of software prefetching and locality optimizations depends on available memory bandwidth. For our array based benchmarks, software prefetching outperforms locality optimizations at high memory ....
M. Carlisle, A. Rogers, J. Reppy, and L. Hendren. Early experiences with Olden. In Proceedings of the Sixth Workshop on Languages and Compilers for Parallel Computing, Portland, OR, August 1993.
....The approach also gives a natural way to restrict algorithms so they have no concurrent memory accesses. The futures construct was developed in the late 70s for expressing parallelism in programming languages and has been included in several programming languages [24, 25, 15, 17, 16]. Conceptually, the future construct forks a new thread t1 to calculate a value (evaluate an expression) and immediately returns a pointer to where the result of t1 will be written. This pointer can then be passed to other threads. When a thread t2 needs the result ....
M. C. Carlisle, A. Rogers, J. H. Reppy, and L. J. Hendren. Early experiences with OLDEN (parallel programming). In Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing, pages 1--20. Springer-Verlag, New York, Aug. 1993.
....1024 nodes 9 53 treeadd 1.6M nodes, 16 levels, 100 iter. 4 2 compress train.in 4 4 perl prime.pl 1 0 gcc 166.i o 166.s 11 22 mcf inp.in 24 49 ammp ammp.in 14 74 art reference input 46 40 equake inp.in 6 49 mgrid mgrid.in 4 25 swim swim.in 5 30 TABLE 2. Benchmarks and input parameters. [2], and four integer and five floating point applications from the SPEC2K (i.e. gcc, mcf, ammp, art, and equake) and SPEC95 (i.e. compress, perl, mgrid, and swim). All tested SPEC benchmarks use reference inputs except for compress. In the interest of reduced simulation time, we simulated the ....
Martin C. Carlisle, Anne Rogers, John H. Reppy, and Laurie J. Hendren. Early experiences with Olden. In Proceedings of the Sixth Languages and Compilers for Parallel Computing, pages 1--20. Springer-Verlag, 1994.
....and run time techniques to improve locality for linked lists and trees [9]. We propose extensions to their work. Calder et al. use profiling to guide layout of global and stack variables to avoid conflicts [3]. Carlisle et al. investigate parallel performance for pointer based codes in Olden [5]. 8 Conclusion. Several conclusions can be drawn from our work. First, the relative effectiveness of software prefetching and locality optimizations depends on available memory bandwidth. For our array based benchmarks, software prefetching outperforms locality optimizations at high memory ....
M. Carlisle, A. Rogers, J. Reppy, and L. Hendren. Early experiences with Olden. In Proceedings of the Sixth Workshop on Languages and Compilers for Parallel Computing, Portland, OR, August 1993.
....in the original DAG to elements in the new DAG. After reorganizing the original DAG, the algorithm makes a separate pass to fix the pointers in the new DAG. Evaluation. To evaluate different hints for CCMALLOC, we used the Olden benchmark MST, which computes the minimum spanning tree of a graph [10]. The graph is represented by a linked list of nodes, where each node contains a linked list representing the list of edges for that node (edge list). This code was used to evaluate giving different hints to the memory allocator. In the first scheme hints are given so that nodes in the list are ....
....locality. By using a specialized memory allocator which keeps track of memory size separately, nodes can be allocated much more closely, resulting in improved spatial locality. In a preliminary experiment, we optimized the Olden benchmark HEALTH, which simulates the Colombian health care system [10]. A quad tree (tree with 4 children) is used, with a linked list at each node in the tree. The original code uses malloc() to allocate the nodes of the linked list. Because of inefficiencies with the standard memory allocator, performance was poor even when nodes of a linked list are allocated ....
[Article contains additional citation context not shown here]
M. Carlisle, A. Rogers, J. Reppy, and L. Hendren. Early experiences with Olden. In Proceedings of the Sixth Workshop on Languages and Compilers for Parallel Computing, Portland, OR, August 1993.
....in Cilk [8] and all computations that can be expressed in Nesl [5]. Our results show that nested parallel computations have much better locality characteristics under work stealing than do general computations. We also briefly consider another class of computations, computations with futures [12, 13, 14, 20, 25], and show that they can be as bad as general computations. The second part of our results are on further improving the data locality of multithreaded computations with work stealing. In work stealing, a processor steals a thread from a randomly (with uniform distribution) chosen processor when ....
....dags G1(V1, E1) and G2(V2, E2) with disjoint edge sets such that s and t are the source and the sink of both G1 and G2. Moreover, V1 ∩ V2 = {s, t}. A nested parallel computation is a race free series parallel computation [6]. We also consider multithreaded computations that use futures [12, 13, 14, 20, 25]. The dag structures of computations with futures are defined elsewhere [4]. This is a superclass of nested parallel computations, but still much more restrictive than general computations. The work stealing algorithm for futures is a restricted form of work stealing algorithm, where a process ....
M. C. Carlisle, A. Rogers, J. H. Reppy, and L. J. Hendren. Early experiences with OLDEN (parallel programming). In Proceedings 6th International Workshop on Languages and Compilers for Parallel Computing, pages 1--20. Springer-Verlag, August 1993.
....and run time techniques to improve locality for linked lists and trees [10]. We propose extensions to their work. Calder et al. use profiling to guide layout of global and stack variables to avoid conflicts [4]. Carlisle et al. investigate parallel performance for pointer based codes in Olden [6]. 8 Conclusions and Future Work. We believe software and architecture support is needed to reduce the memory bottleneck for advanced microprocessors. In this paper, we demonstrated how prefetching and locality optimizations can be used to improve locality and performance for several types of ....
M. Carlisle, A. Rogers, J. Reppy, and L. Hendren. Early experiences with Olden. In Proceedings of the Sixth Workshop on Languages and Compilers for Parallel Computing, Portland, OR, August 1993.
....scheduled scheduled pcbroute 119262 66692 44 logicsim 36877 15460 58 multigrid 53972 13816 74 Table 8: Scheduling statistics in CA programs. 5 Related work. Our work is related to many efforts focusing on runtime support for efficiently executing irregular computations on stock hardware [19, 20]. It differs from runtime systems for coarse grained object oriented languages such as COOL [21] and Mentat [22] by focusing on fine grained object level concurrency. The ABCL on AP1000 implementation [18] is most similar with our work but adopts a traditional design, emphasizing techniques for ....
M. C. Carlisle, A. Rogers, J. H. Reppy, and L. J. Hendren, "Early experiences with Olden," in Proceedings of the Sixth Workshop on Languages and Compilers for Parallel Machines, 1993.
....computations are nowadays quite common for parallel computers. Multithreaded models of parallel computation have typically been proposed as a general approach to model dynamic, unstructured parallelism (see e.g. [17, 19, 26]) and have been employed by several parallel programming languages [12, 13, 14, 22]. To specify parallelism, a thread can spawn child threads. Additionally, a thread may synchronize with some or all of its (direct or indirect) descendants by suspending its execution until a descendant reaches a specific point of computation. A multithreaded computation is identified with a ....
M. C. Carlisle, A. Rogers, J. H. Reppy and L. J. Hendren, "Early experiences with Olden," Proceedings of the 6th Annual Workshop on Languages and Compilers for Parallel Computing, Portland, Oregon, August 1993.
....a data shipping model in which threads of computation are relatively immobile and data items (or copies of data items) are brought to the threads that reference them. Other types of DSM systems (e.g. those in which threads of computation are migrated to the data they reference) are also possible [4, 7, 13, 19]; a suitably generalized version of the classification scheme presented here could likely be applied to these systems as well. We classify systems by three basic mechanisms required to implement DSM and whether or not those mechanisms are implemented in hardware or software. These basic mechanisms ....
Martin C. Carlisle, Anne Rogers, John H. Reppy, and Laurie J. Hendren. Early Experiences with Olden. In Conference Record of the Sixth Workshop on Languages and Compilers for Parallel Computing, August 1993.
....gave experimental results showing the effectiveness of the technique. All this work, however, has been limited to computations in which threads can only synchronize with their sibling or ancestor threads. Although this is a reasonably general class, it does not include languages based on futures [23, 27, 12, 14, 13], languages based on lenient or speculative evaluation [1, 32], or languages with general user specified synchronization constraints [33]. In this paper we show how to extend the results to support synchronization based on write once synchronization variables. A write once synchronization variable ....
....to such synchronization variables can be passed around among threads and synchronization can take place between two threads that have pointers to the variable. Such synchronization variables can be used to implement futures in such languages as Multilisp [23], Mul-T [27], Cool [14], and Olden [13]; I-structures in Id [1]; events in PCF [38]; streams in Sisal [18]; and are likely to be helpful in implementing the user specified synchronization constraints in Jade [33]. We model computations that use synchronization variables as directed acyclic graphs (dags) in which each node is a unit ....
M. C. Carlisle, A. Rogers, J. H. Reppy, and L. J. Hendren. Early experiences with OLDEN (parallel programming) . In Proc. International Workshop on Languages and Compilers for Parallel Computing, pages 1--20. Springer-Verlag, August 1993.
.... 61, 63] or by supporting fine grained parallel execution directly in hardware [3, 34, 50]. These approaches, among others, have been used in implementing parallel programming languages such as ABCL [65], CC [13], Charm [35], Cid [48], Cilk [7], Concert [36], Id90 [16, 49], Mul-T [39], and Olden [12]. In some cases, the cost of the fork is reduced by severely restricting what can be done in a thread. Lazy Task Creation [44], implemented in Mul-T, is the most successful in reducing the cost of a fork. However, in all of these approaches, a fork remains substantially more expensive than a ....
....a newly allocated stack (see Figure 3.40). In either case, since we copy the local data to a new location, we must either forbid pointers to data in a frame or scan memory and update any pointers to the region being copied. Eager disconnect is currently employed by most lazy thread systems [12, 44, 57]. (Footnote 17: We can also copy only some of the parent's ancestors and leave behind frames which forward data to the new location.) The suspend operation in Figure 3.41 is an example of child copying that splits the stack. In addition to changing the state of ....
[Article contains additional citation context not shown here]
M.C. Carlisle, A. Rogers, J.H. Reppy, and L.J. Hendren. Early experiences with Olden (parallel programming). In Languages and Compilers for Parallel Computing. 6th International Workshop Proceedings, pages 1--20. Springer-Verlag, 1994.
....cases, the cost of the fork is reduced by severely restricting what can be done in a thread. These approaches, among others, have been used in implementing parallel programming languages such as ABCL [40], CC [6], Charm [20], Cid [29], Cilk [4], Concert [21], Id90 [9, 30], Mul-T [23], and Olden [5]. Still, a fork remains substantially more expensive than a simple sequential call. Our goal is to support an unrestricted parallel thread model and yet bring the cost of thread creation, termination, and switching down to essentially the cost of a sequential call. We observe that fine grained ....
.... techniques and clever run time representations [9, 30, 26, 39, 37, 33, 17] and direct hardware support for fine grained parallel execution [19, 3]. These approaches have been used to implement many parallel languages, e.g. Mul-T [23], Id90 [9, 30], CC [6], Charm [20], Opus [25], Cilk [4], Olden [5], and Cid [29]. The common goal is to reduce the overhead associated with managing the logical parallelism. While much of this work overlaps ours, none has combined the techniques described in this paper into an integrated whole. More importantly, none has started from the premise that all calls, ....
[Article contains additional citation context not shown here]
M.C. Carlisle, A. Rogers, J.H. Reppy, and L.J. Hendren. Early experiences with Olden (parallel programming). In Languages and Compilers for Parallel Computing. 6th International Workshop Proceedings, pages 1--20. Springer-Verlag, 1994.
.... [7, 19, 23, 16, 25, 20, 18] and by supporting fine grained parallel execution directly in hardware [13, 2]. These approaches, among others, have been used in implementing the parallel programming languages Mul-T [15], Id90 [7, 19], CC [5], Charm [14], Cilk [3], Cid [18], and Olden [4]. In many cases, the cost of the parallel call is reduced by severely restricting what can be done in a thread. In earlier approaches, the full cost of parallelism is borne for all potentially parallel calls, although the parallelism is neither needed nor exploited in most instances. For example, ....
.... compiler techniques and clever run time representations [7, 19, 16, 25, 23, 20, 10] and direct hardware support for fine grained parallel execution [13, 2]. These approaches have been used to implement many parallel languages, e.g. Mul-T [15], Id90 [7, 19], CC [5], Charm [14], Cilk [3], Olden [4], and Cid [18]. The common goal is to reduce the overhead associated with managing the logical parallelism. While much of this work overlaps ours, none has combined the techniques described in this paper into an integrated whole. More importantly, none has started from the premise that all calls, ....
[Article contains additional citation context not shown here]
M.C. Carlisle, A. Rogers, J.H. Reppy, and L.J. Hendren. Early experiences with Olden (parallel programming). In Languages and Compilers for Parallel Computing. 6th International Workshop Proceedings, pages 1--20. Springer-Verlag, 1994.
....processors) before performing the modification. An alternate approach involves moving computation to the data it references. Systems organized along these lines avoid the overhead of frequent remote communication by migrating computation to the node upon which frequently referenced data resides [10, 65]. Implementations utilizing both computation and data migration techniques are also possible [4, 11, 32]. As with static software DSMs, the high fixed overheads of message based communication in many current generation systems drive dynamic software DSM implementors toward optimizations that ....
Martin C. Carlisle, Anne Rogers, John H. Reppy, and Laurie J. Hendren. Early Experiences with Olden. In Conference Record of the Sixth Workshop on Languages and Compilers for Parallel Computing, August 1993.
....parallelization techniques that we intend to use to transform a sequential program automatically into a program that can take advantage of our execution model. Finally, we discuss related work in Section 8 and conclusions in Section 9. This paper expands upon our earlier work [Rogers et al. 1992; Carlisle et al. 1993]. 2. THE SPMD MODEL Before we explain the details of our approach, let us review the basic SPMD and programming models that we are using. In our SPMD model, each processor has an identical copy of the program, as well as a local stack that is used to store procedure arguments, local variables, ....
Carlisle, M., Rogers, A., Reppy, J., and Hendren, L. 1993. Early experiences with Olden. In Conference Record of the Sixth Workshop on Languages and Compilers for Parallel Computing. Also appears in Springer Verlag LNCS 768 (pp. 1-20).
No context found.
M. C. Carlisle, A. Rogers, J. H. Reppy, and L. J. Hendren. Early Experiences with Olden. In Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing, pages 1--20, August 1993.
No context found.
M. C. Carlisle, A. Rogers, J. Reppy, and L. Hendren. Early experiences with Olden. In Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing (LNCS), 1993.
No context found.
M. C. Carlisle, A. Rogers, J. Reppy, and L. Hendren. Early experiences with Olden. In Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing (LNCS), 1993.
No context found.
M. C. Carlisle, A. Rogers, J. Reppy, and L. Hendren. Early experiences with Olden. In Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing (LNCS), 1993.
No context found.
M. C. Carlisle, A. Rogers, J. H. Reppy, and L. J. Hendren. Early Experiences with Olden. In Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing, pages 1--20, August 1993.
No context found.
M. C. Carlisle, A. Rogers, J. H. Reppy, and L. J. Hendren. Early Experiences with Olden. In Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing, pages 1--20, August 1993.
No context found.
M. Carlisle, A. Rogers and L. Hendren, "Early Experiences with Olden", In Proc. of the ACM Conference on Principles and Practice of Parallel Processin