M. Frigo, C. Leiserson, and K. Randall. The implementation of the Cilk-5 multithreaded language. In Proceedings of the Conference on Programming Language Design and Implementation, pp. 212--223, 1998.
....clients. For parallel applications, operating systems provide system calls for the creation and synchronization of multiple threads, and they provide high-level multithreaded programming support with parallelizing compilers and threads libraries. In addition, programming languages such as Cilk [7, 21] and Java [3] support multithreading with linguistic abstractions. A major factor in the performance of such multithreaded parallel applications is the operation of the thread scheduler. Prior work on thread scheduling [4, 5, 8, 13, 14] has dealt exclusively with non-multiprogrammed environments ....
....it pushes the child thread on the bottom of its deque and continues executing the root thread at x_3, or it pushes the root thread on the bottom of its deque and starts executing the child thread at x_4. The bounds proven in this paper hold for either choice. The latter choice is often used [21, 22, 31], because it follows the natural depth-first single-processor execution order. It is possible that a thread may enable another thread and die simultaneously. An example is the join between the root thread and the child thread in Figure 1. If the root thread is blocked at x_10, then when a ....
Matteo Frigo, Charles E. Leiserson, and Keith H. Randall. The implementation of the Cilk-5 multithreaded language. In Proceedings of the 1998.
....2 shows how tasks are fused in the program. Captions for the subfigures in Fig. 2 are given below. Although each thread has its own stack in Fig. 2, this is not essential. Our implementation scheme can also be applied to systems in which different threads reside in the same stack (e.g. Cilk [6], StackThreads/MP [19] and Schematic [14, 15, 17]). [Fig. 2. How two method invocations are fused.] 1. Thread Z is executing ....
Matteo Frigo, Charles E. Leiserson, and Keith H. Randall. The Implementation of the Cilk-5 Multithreaded Language. In Proceedings of the ACM SIGPLAN 1998 Conference on Programming Language Design and Implementation (PLDI '98), pages 212--223, 1998.
....the code, that certain function calls can run concurrently with the caller. That is, the parallel implementation is based entirely on recursive calls that can be performed in parallel, and not on loop partitioning, explicit multithreading, or message passing. The parallel implementation uses Cilk [21, 39], a programming environment that supports a fairly minimal parallel extension of the C programming language and a specialized run time system. One of the most important aspects of using Cilk is the fact that it performs dynamic scheduling that leads to both load balancing and locality of ....
M. Frigo, C. E. Leiserson, and K. H. Randall. The implementation of the Cilk-5 multithreaded language. ACM SIGPLAN Notices, 33(5):212--223, 1998.
....and the result with an Instruction and achieves almost the same speedup. 9.4.5 Checkers Checkers is an implementation of the two-player game of checkers played using alpha-beta search. This implementation is a straightforward Java conversion of a C version from MIT's CILK distribution (see [16]) with CILK's spawn, sync and abort statements replaced by a multithreaded scheme with a job queue of spawned explicit invocationrecord objects. Object combining opportunities stem from combining the board's data (a one-dimensional integer array marking the positions of the pieces) into the board ....
M. Frigo, C.E. Leiserson, and K.H. Randall. The implementation of the Cilk-5 multithreaded language. In Proceedings of the ACM SIGPLAN '98 Conference on Programming Language Design and Implementation (PLDI), pages 212--223, Montreal, Quebec, Canada, 1998. ACM Press.
.... between instructions (the computation) from the way that instructions are mapped to the processors (the schedule) [8]. One example of a computation-centric memory model is the Dag Consistency or Location Consistency model [8, 4, 3] that was developed for Cilk-like multithreaded computations [6, 7, 10, 14]. Cilk uses a randomized work-stealing scheduler and achieves performance close to the lower bounds of fully strict multithreaded algorithms without using user-level shared memory. In a fully strict multithreaded algorithm, a thread may only synchronize with its children. In Cilk, a multithreaded ....
M. Frigo, K. H. Randall, and C. E. Leiserson. The implementation of the Cilk-5 multithreaded language. In Proceedings of the ACM SIGPLAN '98 Conference on Programming Language Design and Implementation (PLDI), Montreal, Canada, June 1998.
....evaluation of all cases on several platforms. Finally, in Section 7, we summarize our conclusions and present our future work. 2 RELATED WORK Only a few runtime systems implement an efficient and portable two-level threads model for multiprocessors. The latest version of the Cilk runtime system [6] is intended to run on Unix-like systems that support POSIX threads. As a programming language, Cilk does not explicitly use user-level threads but Cilk frames, which are generated by its cilk2c compiler. IBM's State threads [7] is a very portable user-level threads package based on the ....
....of memory allocation is avoided. This is an optimization possibly used internally by the two-level threads library. The minimal overhead for creating and executing a single nanothread cached descriptor is just a few cycles. 6.2 Applications Table 6 presents the execution time of Fibonacci [6]. This application corresponds to the divide-and-conquer programming paradigm and requires the creation of a large number of nanothreads that remain active (blocked). There is no single-level parallelism and consequently work descriptors cannot be used. Since this program generates a tree of ....
M. Frigo, K. H. Randall, and C. E. Leiserson. The implementation of the Cilk-5 multithreaded language, In Proceedings of the ACM SIGPLAN '98 Conference on Programming Language Design and Implementation (PLDI), Montreal, Canada, June 1998.
....system (the subject of our research). Many other natural language parsers will not be able to benefit from this approach either. [6] Our approach provides a solution in case fine-grained parallelism is required. On the scheduling side, our approach shows close resemblance to the Cilk-5 system [16]. It implements work stealing using similar techniques and also minimizes overhead according to the principle of moving as much of the overhead as possible from the workers to the thieves. An important difference, though, is that our scheduler was designed with tabulation algorithms in mind (cf. ....
....incurs no extra overhead. The stealing stack also functions to let thieves perform the cheapest stealing operations possible. Synchronization between worker and thief has been optimized to move as much of the overhead as possible to the thief by using a Dijkstra-like mutual exclusion protocol [18, 16]. The protocol is illustrated in figure 3. As long as no stealing is taking place, workers will not have to resort to an expensive lock. In addition, a worker will only need to lock if a thief is stealing at the same or higher stealing level (as defined by the stack). This prevents, for example, a ....
Matteo Frigo, Charles E. Leiserson, and Keith H. Randall. The implementation of the Cilk-5 multithreaded language. ACM SIGPLAN Notices, 33(5):212--223, May 1998.
....implementations for Awari and Amazons. This does not include the time to tune the performance, for example, setting the best granularity for the target environment. There are several ways that games programmers may parallelize their sequential algorithms with a minimal amount of effort. Cilk [17] is an extension of the C language, with some primitives to support parallelism. Given a sequential algorithm, Cilk is a choice for implementing synchronous parallel algorithms, such as YBWC. However, we note that the runtime system of Cilk is based on the work-stealing framework, which ....
M. Frigo, C. E. Leiserson, and K. H. Randall. The Implementation of the Cilk-5 Multithreaded Language. In ACM SIGPLAN Conferences on Programming Language Design and Implementation (PLDI'98), pages 212--223, 1998.
....have been proposed for commodity architectures that introduce a compromise between stack and heap allocation techniques. For example, Lazy Threads [8] are based on stacklets, and the Illinois Concert system [9] employs a hybrid stack-heap execution mechanism. The multithreaded Cilk language [7] exhibits a striking balance between versatility of threads, portability, and efficiency. Cilk's implementation is based on the cactus stack semantics proposed by Moses in 1970 [12]. As operating systems and compilers are turning into commodity components, it is desirable to implement ....
....work is sketched in Section 6, and conclusions for future work are drawn in Section 7. 2 Cilk and the Work First Principle The idea of indolent closures has been motivated by the work first principle that underlies the implementation of the Cilk language. Cilk extends the C programming language [7] into an algorithmic multithreaded language. In Cilk, parallelism is exposed explicitly by means of the spawn and sync keywords. If the spawn keyword precedes the call of a procedure, the calling parent can continue execution in parallel with the called child. If a Cilk program is executed on one ....
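The spawn and sync behaviour described in this excerpt can be imitated in a few lines. The following is only a rough sketch in Python, not Cilk and not the Cilk runtime: `spawn`, `sync`, and `fib` are hypothetical helper names, each spawn creates a full OS thread (far heavier than a Cilk spawn), and `sync` here waits for one specific child, whereas Cilk's `sync` statement waits for all outstanding children of the procedure.

```python
import threading

def spawn(fn, *args):
    """Start fn(*args) in a new thread and return a handle to it.

    Hypothetical helper: stands in for putting `spawn` before a call.
    """
    box = {}

    def run():
        box["value"] = fn(*args)

    worker = threading.Thread(target=run)
    worker.start()
    return worker, box

def sync(handle):
    """Wait for one spawned child and return its result (simplified sync)."""
    worker, box = handle
    worker.join()
    return box["value"]

def fib(n):
    if n < 2:
        return n
    child = spawn(fib, n - 1)  # the child may run in parallel with the parent
    y = fib(n - 2)             # the calling parent continues executing
    x = sync(child)            # the parent needs the child's result from here on
    return x + y
```

Removing the `spawn` and `sync` wrappers leaves an ordinary recursive `fib`, which gives the same result when run serially.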
Matteo Frigo, Charles E. Leiserson, and Keith H. Randall. The Implementation of the Cilk-5 Multithreaded Language. In Conference on Programming Language Design and Implementation, pages 212--223, Montreal, Canada, June 1998. ACM SIGPLAN.
....executions, then merging the results. efficient fixed point algorithm that runs in polynomial time and solves these dataflow equations. We have implemented this algorithm in the SUIF compiler infrastructure and used the system to analyze programs written in Cilk, a multithreaded extension of C [10]. The implemented algorithm handles a full range of constructs in multithreaded programs, including function pointers, recursive functions, pointers to structures and arrays, pointer arithmetic, casts between pointer variables of different types, heap allocated memory, stack allocated memory, ....
M. Frigo, C. Leiserson, and K. Randall. The implementation of the Cilk-5 multithreaded language. In Proceedings of the SIGPLAN '98 Conference on Program Language Design and Implementation, Montreal, Canada, June 1998.
....design, Cilk targeted exclusively shared-memory machines. Cilk uses a provably good work-stealing scheduling algorithm and follows a work-first principle. Cilk concentrates on minimizing overheads that contribute to work, even at the expense of overheads that contribute to the critical path [8]. CilkNOW is an implementation of Cilk for networks of workstations [1, 3]. It transparently manages resources, provides transparent fault tolerance, and implements adaptive parallelism which allows a Cilk application to run on a set of workstations that may grow and shrink throughout program ....
M. Frigo, C.E. Leiserson, and K.H. Randall. The implementation of the Cilk-5 multithreaded language. In Proceedings of the ACM SIGPLAN '98 Conference on Programming Language Design and Implementation, pages 212-223, Montreal, Quebec, June 17-19, 1998. SIGPLAN Notices, 33(6), June 1998.
....that communicate through a virtually or physically shared memory address space. A multitude of multithreading interfaces for parallel programming is in use, including standardized interfaces like POSIX threads [4] and experimental systems that serve as the backend of specific algorithmic (e.g. Cilk [2]) or compilation (e.g. Nanothreads [11]) frameworks. This work has been carried out while the second author was with the High Performance Information Systems Laboratory, University of Patras, Greece. The metrics used so far as evaluation criteria of multithreading systems are primarily the ....
M. Frigo, C. Leiserson, and K. Randall. The Implementation of the Cilk-5 Multithreaded Language. In Proc. of the 1998 ACM SIGPLAN Conference on Programming Language Design and Implementation, Montreal, Canada, June 1998.
....chosen busy processor. This kind of scheduler is also called a greedy scheduler. The execution time of a multithreaded program running on P processors with the greedy scheduler is $T_P \le T_1/P + T_\infty$, where $T_1$ is the execution time on one processor and $T_\infty$ is the execution time on infinitely many processors [6]. The work-stealing strategy is also used by distributed Cilk for scheduling in a cluster of SMPs. As is required in the distributed-memory clustering environment, the distributed Cilk runtime system implements its own distributed shared memory, supporting a memory consistency model called ....
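The greedy-scheduler bound in this excerpt, read as T_P <= T_1/P + T_inf, can be checked on a toy computation. The sketch below is an illustration under stated assumptions, not code from any of the cited systems: tasks take unit time, the dependence graph `EXAMPLE_DAG` is invented for the example, and the function names are hypothetical.

```python
from collections import deque

def greedy_schedule_length(dag, num_procs):
    """Simulate a greedy scheduler on unit-time tasks.

    dag maps each task to the list of tasks that depend on it. At every
    step up to num_procs ready tasks run, so no processor idles while a
    ready task exists. Returns the number of steps taken (T_P).
    """
    indegree = {task: 0 for task in dag}
    for successors in dag.values():
        for succ in successors:
            indegree[succ] += 1
    ready = deque(task for task, deg in indegree.items() if deg == 0)
    steps = 0
    while ready:
        batch = [ready.popleft() for _ in range(min(num_procs, len(ready)))]
        for task in batch:
            for succ in dag[task]:
                indegree[succ] -= 1
                if indegree[succ] == 0:
                    ready.append(succ)
        steps += 1
    return steps

def critical_path_length(dag):
    """Number of tasks on the longest dependence chain (T_inf)."""
    memo = {}

    def longest_from(task):
        if task not in memo:
            memo[task] = 1 + max((longest_from(s) for s in dag[task]), default=0)
        return memo[task]

    return max(longest_from(task) for task in dag)

# Invented fork-join graph: "a" forks four tasks that all join at "f".
EXAMPLE_DAG = {"a": ["b", "c", "d", "e"], "b": ["f"], "c": ["f"],
               "d": ["f"], "e": ["f"], "f": []}
T1 = len(EXAMPLE_DAG)                      # total work: 6 unit tasks
T_INF = critical_path_length(EXAMPLE_DAG)  # critical path: 3 tasks
```

For this graph, two processors finish in 4 steps, within the bound 6/2 + 3 = 6.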
M. Frigo, K. H. Randall, and C. E. Leiserson. The implementation of the Cilk-5 multithreaded language. In Proceedings of the ACM SIGPLAN '98 Conference on Programming Language Design and Implementation (PLDI), Montreal, Canada, June 1998.
.... threads that persist across steps but periodically synchronize at a barrier, reconstructing the structure is a challenging analysis problem [2]. If the program uses structured control constructs such as parallel loops or recursively generates parallel computations in a divide-and-conquer fashion [42], the parallel phases are obvious from the syntactic structure of the program. Parallel computing programs use many of the same kinds of data as activity management programs. An additional complication is the fact that the parallel tasks often access disjoint parts of the same data structure. ....
M. Frigo, C. Leiserson, and K. Randall. The implementation of the Cilk-5 multithreaded language. In Proceedings of the SIGPLAN '98 Conference on Program Language Design and Implementation, Montreal, Canada, June 1998.
....work. We further limit the number of steals by allowing a thread to steal an amount of work proportional to the amount of work available at the victim. 1 The Deltra system comprises a medium-sized grammar and dictionary for Dutch. 2 The work-stealing approach is similar to the Cilk-5 system [2], but differs in its optimization for chart parsers. 3 There is one thread for each processor. Threads are automatically distributed amongst processors by the OS. As mentioned before, synchronization is also required to store results in the chart. Letting threads wait unconditionally for ....
M. Frigo, C.E. Leiserson, and K.H. Randall. The implementation of the Cilk-5 multithreaded language. ACM SIGPLAN Notices, 33(5):212-223, May 1998.
....verification algorithms presented in this paper. This compiler was implemented using the SUIF compiler infrastructure [1]. We implemented all of the analyses, including the pointer analysis, from scratch starting with the standard SUIF distribution. Our compiler generates parallel code in Cilk [7], a parallel dialect of C. We present experimental results for two recursive sorting programs (Quicksort and Mergesort), a divide-and-conquer blocked matrix multiply (BlockMul), a divide-and-conquer LU decomposition (LU), and a scientific computation (Heat). We would like to emphasize the ....
M. Frigo, C. Leiserson, and K. Randall. The implementation of the Cilk-5 multithreaded language. In Proceedings of the SIGPLAN '98 Conference on Program Language Design and Implementation, Montreal, Canada, June 1998.
....to determine that $[d, d + n/4 - 1]$ and $[d + n/4, d + n/2 - 1]$ denote nonoverlapping regions of the same block. It can use similar strategies to determine that all of the other pairs are independent, and that the calls can execute in parallel. Our compiler generates parallel code in Cilk [7], a parallel dialect of C. In Cilk, a spawn construct in front of a procedure call instructs the compiler that the designer expects the call to execute in parallel; the sync construct indicates that previous parallel calls may conflict with the computation after the sync statement, and that the ....
M. Frigo, C. Leiserson, and K. Randall. The implementation of the Cilk-5 multithreaded language. In Proceedings of the SIGPLAN '98 Conference on Program Language Design and Implementation, Montreal, Canada, June 1998.
....in seconds. Intel Linux, 333 MHz: Athapascan-1 4.8, VDS 1.1. Sparc Solaris, 60 MHz: Cilk-5 0.56, VDS 2.38. Intel Solaris, 450 MHz: DOTS 21.5, VDS 0.78. The only system that generates less overhead than VDS is Cilk-5. The reason for this is that Cilk-5 uses a heap for the allocation of activation frames [29]. This can be done very efficiently, since all frames of a Cilk-5 procedure have the same size. The local variables of the threads are kept on the runtime stack. VDS stores the data that belong to the same object in a continuous area of virtual memory. This requires a call to malloc for every ....
....time t is set to 0.1 seconds. Figure 11 shows the efficiencies that were obtained by the systems. Since Cilk-5 implements work stealing through shared memory, it is able to achieve higher efficiencies than VDS. Both victim and thief operate directly through shared memory on the victim's task deque [29]. On the other hand, VDS realizes work stealing with message passing, which involves more overhead. Compared to Athapascan-1 and DOTS, VDS is more efficient. One reason for this is the C overhead. Another reason is the fact that Athapascan-1 is designed for general dataflow applications and ....
M. Frigo, C. E. Leiserson, and K. H. Randall. The implementation of the Cilk-5 multithreaded language. In Proc. of the ACM SIGPLAN '98 Conference on Programming Language Design and Implementation (PLDI), volume 33 of SIGPLAN Notices, pages 212--223, 1998.
....tens or hundreds of thousands of threads are created. Because this is often the case in very fine-grain parallel programs (see below), in this paper we focus exclusively on user-level threads packages. The purpose of this paper is to evaluate four user-level threads packages: Cilk [BJK+95, FLR98], Filaments [FLA94, LFA96], Lazy Threads [GSC96], and StackThreads/MP [TTY96, TTY99]. Each claims to provide support for large numbers of efficient, fine-grain threads. We compare these packages in two ways. First, we examine how closely each package supports the thread model that is generally ....
....to optimize loops. When threads are created for individual loop iterations, as is often the case in parallel programs, these optimizations are lost. 4 Fine Grain Threads Packages A number of research groups have developed methods for the efficient support of fine-grain threads. Cilk [BJK+95, FLR98] uses a theoretically efficient scheduler and a restricted thread model to efficiently manage threads. Concert [PKZA95, CKP93] is a full compiler designed with heavy optimizations for efficiency of multithreaded programs. Filaments [FLA94, LFA96] also restricts the thread model to reduce the overhead ....
Matteo Frigo, Charles E. Leiserson, and Keith H. Randall. The implementation of the Cilk-5 multithreaded language. In 1998 ACM SIGPLAN Conference on Programming Language Design and Implementation, June 1998.
....3.8 to 8.6 times better than the unoptimized version. We also evaluate our transformations by comparing the performance of our automatically generated code with that of several versions of the programs with optimized, hand-coded base cases. We obtained these versions from the Cilk benchmark set [9]. The last column in Tables 1 and 2 presents the running times of the hand-coded versions. The best automatically unrolled version of Mul performs between 2.2 and 2.9 times worse than the hand-optimized version. The performance of the best automatically unrolled version of LU is basically comparable to ....
M. Frigo, C. Leiserson, and K. Randall. The implementation of the Cilk-5 multithreaded language. In Proceedings of the SIGPLAN '98 Conference on Program Language Design and Implementation, Montreal, Canada, June 1998.
....discusses related work. We conclude in Section 7. 2 Example Figure 4 presents an example that illustrates the kinds of programs that our analysis is designed to handle. The mul procedure implements a recursive, divide and conquer matrix multiply algorithm written in Cilk, a parallel dialect of C [16]. 2.1 Parallelism in the Example In the divide part of the algorithm, the mul procedure divides each array into four subarrays. It then calls itself recursively to multiply the appropriate pairs of subarrays as typedef double block[N N] void serialmul(double A, double B, double R) f int i, ....
M. Frigo, C. Leiserson, and K. Randall. The implementation of the Cilk-5 multithreaded language. In Proceedings of the SIGPLAN '98 Conference on Program Language Design and Implementation, Montreal, Canada, June 1998.
....expression of dynamic, lightweight threads. These include data-parallel languages like HPF [22] or Nesl [5] (where the sequence of instructions executed over individual data elements are the threads), dataflow languages like ID [16], control-parallel languages with fork-join constructs like Cilk [20], CC++ [13] and Proteus [29], languages with futures like Multilisp [39], and various user-level thread libraries [3, 17, 30, 43]. In the lightweight threads model, the programmer simply expresses all the parallelism in the program, while the language implementation performs the task of ....
....thread creation and scheduling are typically local operations, they incur low overhead and contention. Further, threads close together in the computation graph are often scheduled on the same processor, resulting in good locality. Several systems have used work stealing to provide high performance [11, 17, 18, 20, 26, 39, 42, 44]. When each processor treats its own ready queue as a LIFO stack (that is, adds or removes threads from the top of the stack) and steals from the bottom of another processor's stack, the scheduler successfully throttles the excess parallelism [8, 39, 41, 44]. For fully strict computations, such a ....
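The ready-queue discipline in this excerpt (the owner pushes and pops at the top, thieves take from the bottom) might be sketched as follows. This is a minimal illustration, not the data structure of any cited system: the class name `ReadyQueue` and its methods are hypothetical, and a single lock guards every operation for simplicity, which a tuned runtime would try to avoid on the owner's path.

```python
import threading
from collections import deque

class ReadyQueue:
    """Per-processor ready queue: LIFO for the owner, opposite end for thieves."""

    def __init__(self):
        self._items = deque()          # left end = bottom, right end = top
        self._lock = threading.Lock()  # assumption: one coarse lock for clarity

    def push(self, task):
        """Owner adds a newly created thread to the top."""
        with self._lock:
            self._items.append(task)

    def pop(self):
        """Owner removes the most recently pushed thread, or None if empty."""
        with self._lock:
            return self._items.pop() if self._items else None

    def steal(self):
        """Thief removes the oldest thread from the bottom, or None if empty."""
        with self._lock:
            return self._items.popleft() if self._items else None
```

Because the owner and the thieves work at opposite ends, a thief tends to take the oldest entry, which in a divide-and-conquer computation is usually the largest remaining piece of work.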
M. Frigo, C. E. Leiserson, and K. H. Randall. The implementation of the Cilk-5 multithreaded language. In Proc. ACM Conf. on Programming Language Design and Implementation, pages 212--223, 1998.
....2 Example Figure 1 presents a simple example that illustrates the kinds of programs that our analysis is designed to handle. The dcInc procedure implements a recursive, divide and conquer algorithm that increments each element of an array. The example is written in Cilk, a parallel dialect of C [14]. 2.1 Parallelism in the Example In the divide part of the algorithm, the dcInc procedure divides each array into two subarrays. It then calls itself recursively to increment the elements in each subarray. Because the two recursive calls are independent, they can execute concurrently. The ....
M. Frigo, C. Leiserson, and K. Randall. The implementation of the Cilk-5 multithreaded language. In Proceedings of the SIGPLAN '98 Conference on Program Language Design and Implementation, Montreal, Canada, June 1998.
....a sequential object, which the system then composes with a proxy and a so called body. The proxy turns local calls into messages, which are decoded by the body. Futures and continuations are provided by creating specialized subclasses of user objects which contain the appropriate code. Cilk [14] is an algorithmic, multithreaded language that compiles to ANSI C. The runtime system guarantees predictable performance. It features an architecture and language independent checkpointing facility based on source to source translations. Code is structured into procedures. Each procedure ....
Matteo Frigo, Charles E. Leiserson, and Keith H. Randall. The implementation of the Cilk-5 multithreaded language. In PLDI '98, pages 212--23, Montreal, Canada, 17--19 June 1998.
.... developed a life of its own as we developed such algorithms for LU decomposition, including undulant pivoting and exact arithmetic, and more recently for QR factorization [48, 20]. Others have implemented LU and Cholesky factorization and other classic algorithms over the quadtree representation [45, 27, 22]. Since we seek style and consistent representation before raw performance, translating internal blocks to and from column-major representation just to get impressive BLAS3 performance from DGEMM is not a choice [12, 13]. The idea is to explore a uniform representation, and to discover the ....
M. Frigo, C. E. Leiserson, & K. H. Randall. The implementation of the CILK-5 multithreaded language. Proc. ACM SIGPLAN '98 Conf. on Program. Language Design and Implementation, SIGPLAN Not. 33, 6 (May 1998), 212--223.
....parallelism, since many parallel programming languages allow the expression of dynamic threads. Such languages include data-parallel languages like High Performance Fortran [31] and NESL [5], dataflow languages like ID [21], control-parallel languages with fork-join constructs like CILK [8, 27], CC++ [19] and Proteus [40], languages with synchronization primitives like Multilisp [30], Mul-T [35], COOL [20], SISAL [25], Jade [49], and various other user-level thread libraries [3, 46, 53]. For the execution of a multithreaded computation on a parallel computer, one should specify which ....
....Designing a scheduler to achieve all of the above goals is not a trivial task. There are three main performance parameters for scheduling algorithms for multithreaded computations, namely their space complexity, their execution time and the communication cost incurred by them. Several systems [16, 23, 27, 30, 35] have used work stealing to achieve the scheduling goals described above and to provide high performance. Work stealing is a technique in which underutilized processors try to steal work from other, hopefully overutilized processors. Indeed, work stealing has been proved (see e.g. 1, 8, 12, ....
M. Frigo, C. E. Leiserson and K. H. Randall, "The Implementation of the Cilk-5 Multithreaded Language," Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 212-223, Montreal, Canada, June 1998.
....as well, it is impossible (or at least impractical) to remove overhead or tune scheduling of threads themselves just for the sake of supporting this style. While the ideas surely have a longer heritage, the first published framework to offer systematic solutions to these problems was Cilk [5]. Cilk and other lightweight executable frameworks layer special-purpose fork-join support on top of an operating system's basic thread or process mechanisms. This tactic applies equally well to Java, even though Java threads are in turn layered onto lower-level OS capabilities. The main advantage ....
....work and fails to steal any from others, it backs off (via yields, sleeps, and/or priority adjustment; see section 3) and tries again later unless all workers are known to be similarly idle, in which case they all block until another task is invoked from top level. As discussed in more detail in [5], the use of LIFO rules for each thread processing its own tasks, but FIFO rules for stealing other tasks, is optimal for a wide class of recursive fork-join designs. Less formally, the scheme offers two basic advantages: It reduces contention by having stealers operate on the opposite side of the ....
Frigo, Matteo, Charles Leiserson, and Keith Randall. The Implementation of the Cilk-5 Multithreaded Language. In Proceedings of 1998 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 1998.
....resembles Brent's theorem [CLR90, p. 709], which yields the upper bound of $T_P \le T_1/P + T_\infty$ [Blu95]. It has been verified empirically that the constant factor hidden by the order notation is small, so that $T_P \approx T_1/P + T_\infty$ is a good approximation for a wide range of applications [Blu95, BJK+95, FLR98, BL94]. 8.6 The Cilk Algorithms for PDES PDES by using Cilk is a relatively new research topic. Only two existing results based on conservative protocols have been presented in the literature. Both results and their empirical findings are presented below. 8.6.1 The Safe Time Approach Cai, ....
Matteo Frigo, Charles E. Leiserson, and Keith H. Randall. The implementation of the Cilk-5 multithreaded language. In 1998 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'98), Montreal, Canada, June 17--19, 1998.
M. Frigo, C. E. Leiserson, and K. H. Randall. The implementation of the Cilk-5 multithreaded language. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 212--223, 1998.
....the Cilk language, illustrating how Cilk supports the programming of parallel game tree search and other chess mechanisms. 1 Introduction The Supercomputing Technologies (Supertech) Research Group in the MIT Laboratory for Computer Science began developing the Cilk multithreaded language [5, 8, 22, 27] in 1994. Development of Cilk has been intertwined with the development of a series of computer chess programs: StarTech, Socrates, and Cilkchess. Although the development of Cilk itself has been funded by the U.S. Defense Advanced Research Projects Agency (DARPA) all of our chess programs have ....
Matteo Frigo, Charles E. Leiserson, and Keith H. Randall. The implementation of the Cilk-5 multithreaded language. In ACM SIGPLAN '98 Conference on Programming Language Design and Implementation (PLDI), pages 212-223, Montreal, Canada, June 1998.
....can be written to use many concurrent execution streams and not suffer when running on a machine with fewer processors. Cilk has the further advantage of guaranteeing efficient and predictable performance, despite running on such a variety of parallel platforms. Cilk is more fully described in [FLR98] and [Ran98]. Despite Cilk's goal of simplifying parallel software development, it lacks a parallel file I/O framework. Cilk programs must use the standard POSIX file I/O library, and they must explicitly arrange any necessary serialization or multiplexing. A programmer who wants to use ....
....serial append. A complete reference of Cheerio entry points follows, along with a list of caveats, and a discussion about porting applications to Cheerio. Chapter 3 presents the rationale for Cheerio's design decisions. It presents the crucial idea of a faithful extension, first introduced in [FLR98], along with four criteria for designing Cheerio: similar to POSIX when appropriate, high performance, hardware and platform independent, and minimal but complete. Chapter 3 also presents two other parallel APIs, compares them to Cheerio, and demonstrates why Cheerio is a better API in certain ....
Matteo Frigo, Charles E. Leiserson, and Keith H. Randall. The implementation of the Cilk-5 multithreaded language. In ACM SIGPLAN '98 Conference on Programming Language Design and Implementation (PLDI), pages 212--223, Montreal, Canada, June 1998.
....a parallel program running on one processor is so much slower and/or more complicated than the corresponding serial program that people prefer to use two separate codes. The Cilk-5 multithreaded language, which I have designed and implemented together with Charles Leiserson and Keith Randall [58], addresses this problem. In Cilk, one can write parallel multithreaded programs that run efficiently on any number of processors, including 1, and are in most cases not significantly more complicated than the corresponding serial codes. Cilk is a simple extension of the C language with fork/join ....
....allow the program to execute in parallel. If the Cilk keywords for parallel control are elided from a Cilk program, however, a syntactically and semantically correct C program results, which we call the C elision (or more generally, the serial elision) of the Cilk program. Cilk is a faithful extension of C, because the C elision of a Cilk program is a correct implementation .... [Footnotes: This chapter represents joint work with Charles Leiserson and Keith Randall. A preliminary version appears in [58]. Cilk is not a functional language, but the contest was open to entries in any programming language.]
[Article contains additional citation context not shown here]
M. FRIGO, K. H. RANDALL, AND C. E. LEISERSON, The implementation of the Cilk-5 multithreaded language, in Proceedings of the ACM SIGPLAN '98 Conference on Programming Language Design and Implementation (PLDI), Montreal, Canada, June 1998.
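The serial elision described in the citation context above can be illustrated with a minimal sketch. The comment holds a Cilk-5 procedure for the recursive Fibonacci computation; deleting the three keywords `cilk`, `spawn`, and `sync` leaves the plain C function below it. The choice of `fib` as the example is ours and is not taken from the citing text.

```c
/* Cilk-5 source (not valid C; shown only for comparison):
 *
 *   cilk int fib(int n)
 *   {
 *       if (n < 2) return n;
 *       else {
 *           int x, y;
 *           x = spawn fib(n - 1);
 *           y = spawn fib(n - 2);
 *           sync;
 *           return x + y;
 *       }
 *   }
 */

/* C elision: the same text with cilk, spawn, and sync deleted. */
int fib(int n)
{
    if (n < 2) return n;
    else {
        int x, y;
        x = fib(n - 1);   /* was: x = spawn fib(n - 1); */
        y = fib(n - 2);   /* was: y = spawn fib(n - 2); */
        ;                 /* was: sync;                 */
        return x + y;
    }
}
```

Calling `fib(10)` on the elided version returns 55, the same value the parallel program is meant to compute.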
....less work. There are $\binom{n}{2} + n + 1$ possible n-bit words with at most two 1s set. Is there a hash function that would allow us to index the 1s with only a single multiplication and table lookup on a table not much larger than $\binom{n}{2} + n + 1$? We have written a Cilk [1, 6] parallel backtracking program to search for a hashing constant that could be used in a multiplicative hash function to index two or fewer bits. The key observation used by the algorithm is that the hash value of an input word whose s low-order bits are all zero depends only on the low-order n ....
Matteo Frigo, Charles E. Leiserson, and Keith H. Randall. The implementation of the Cilk-5 multithreaded language. In ACM SIGPLAN '98 Conference on Programming Language Design and Implementation (PLDI), pages 212--223, Montreal, Canada, June 1998.
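The hashing question raised in the citation context above can be made concrete with a small checker. This is only a sketch under stated assumptions, not the backtracking search the citing authors wrote: the hash shape (multiply modulo 2^n, then keep the top `table_bits` bits) and the names `count_sparse_words`, `mul_hash`, and `is_perfect` are hypothetical. It tests whether one candidate constant sends every word with at most two 1 bits to a distinct table slot.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Number of n-bit words with at most two 1 bits: C(n,2) + n + 1. */
unsigned count_sparse_words(unsigned n)
{
    return n * (n - 1) / 2 + n + 1;
}

/* Assumed hash shape: multiply modulo 2^n, keep the top table_bits bits.
   Requires 1 <= table_bits <= n <= 16. */
static unsigned mul_hash(uint32_t w, uint32_t c, unsigned n, unsigned table_bits)
{
    uint32_t mask = (1u << n) - 1u;
    return ((w * c) & mask) >> (n - table_bits);
}

/* Returns true if constant c maps every word with at most two 1 bits
   to a distinct table slot (a collision-free, i.e. perfect, hash). */
bool is_perfect(uint32_t c, unsigned n, unsigned table_bits)
{
    static bool used[1u << 16];
    memset(used, 0, sizeof used);

    used[mul_hash(0, c, n, table_bits)] = true;   /* the all-zero word */
    for (unsigned i = 0; i < n; i++) {
        for (unsigned j = i; j < n; j++) {        /* j == i gives a one-bit word */
            uint32_t w = (1u << i) | (1u << j);
            unsigned slot = mul_hash(w, c, n, table_bits);
            if (used[slot])
                return false;                     /* two words collide */
            used[slot] = true;
        }
    }
    return true;
}
```

A search program would call `is_perfect` on many candidate constants; a table with fewer slots than `count_sparse_words(n)` can never succeed, which is why the question asks for a table "not much larger" than that count.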
No context found.
M. Frigo, C. Leiserson, and K. Randall. The implementation of the Cilk-5 multithreaded language. In Proceedings of the Conference on Programming Language Design and Implementation, pp. 212--223, 1998.
No context found.
Matteo Frigo, Charles E. Leiserson, and Keith H. Randall. The implementation of the Cilk-5 multithreaded language. In Proceedings of the ACM SIGPLAN '98 Conference on Programming Language Design and Implementation, pages 212-223, 1998.
No context found.
M. Frigo, C. E. Leiserson, and K. H. Randall. The implementation of the Cilk-5 multithreaded language. In ACM PLDI, 1998.
No context found.
Matteo Frigo, Charles E. Leiserson, and Keith H. Randall. The implementation of the Cilk-5 multithreaded language. In PLDI, pages 212--223, 1998.
No context found.
M. Frigo, C. Leiserson, and K. Randall. The implementation of the Cilk-5 multithreaded language. In Proceedings of the SIGPLAN '98 Conference on Programming Language Design and Implementation, Montreal, Canada, June 1998.
No context found.
M. Frigo, C. Leiserson, and K. Randall. The implementation of the Cilk-5 multithreaded language. In Proceedings of the SIGPLAN '98 Conference on Programming Language Design and Implementation, Montreal, Canada, June 1998.
No context found.
Matteo Frigo, Charles E. Leiserson, and Keith H. Randall. The implementation of the Cilk-5 multithreaded language. In Proceedings of the ACM SIGPLAN '98 Conference on Programming Language Design and Implementation, pages 212--223, 1998.
No context found.
Matteo Frigo, Charles Leiserson, and Keith Randall. The Implementation of the Cilk-5 Multithreaded Language. Proceedings of the 1998 ACM SIGPLAN Conference on Programming Language Design and Implementation, 1998.
No context found.
M. Frigo, C. Leiserson, and K. Randall. The implementation of the Cilk-5 multithreaded language. In Language Design and Implementation, Montreal, Canada, June 1998.
No context found.
M. Frigo, C.E. Leiserson, and K.H. Randall. The implementation of the Cilk-5 multithreaded language. ACM SIGPLAN Notices, 33(5):212--223, May 1998.
No context found.
M. Frigo, C. E. Leiserson, and K. H. Randall. The implementation of the Cilk-5 multithreaded language. In ACM Conference on Programming Language Design and Implementation (PLDI'98), volume 33, pages 212--223, Montreal, Canada, June 1998. ACM.
No context found.
M. Frigo, C. Leiserson, and K. Randall. The implementation of the Cilk-5 multithreaded language. In Proceedings of the SIGPLAN '98 Conference on Programming Language Design and Implementation, Montreal, Canada, June 1998.
No context found.
M. Frigo, C. E. Leiserson, and K. H. Randall. The implementation of the Cilk-5 multithreaded language. ACM SIGPLAN Notices, 33(5):212--223, 1998.
No context found.
Matteo Frigo, Charles E. Leiserson, and Keith H. Randall. The implementation of the Cilk-5 multithreaded language. ACM SIGPLAN Notices, 33(5):212--223,
No context found.
Matteo Frigo, Charles E. Leiserson, and Keith H. Randall. The implementation of the Cilk-5 multithreaded language. ACM SIGPLAN Notices, 33(5):212--223, May 1998.
No context found.
Matteo Frigo, Charles E. Leiserson, and Keith H. Randall. The implementation of the Cilk-5 multithreaded language. In 1998 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI'98), Montreal, Canada, June 17--19, 1998.