40 citations found.
Wilson C. Hsieh, Paul Wang, and William E. Weihl. Computation migration: Enhancing locality for distributed-memory parallel systems. In Proceedings of the Fourth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), pages 239--248, San Diego, California, May 1993.

This paper is cited in the following contexts:

Executing Multithreaded Programs Efficiently - Blumofe (1995)   (12 citations)

....in some of these other systems, though Cilk's algorithm uses randomness and is provably efficient. Many multithreaded programming languages and runtime systems are based on heuristic scheduling techniques. Though systems such as Charm [91], COOL [27, 28], Id [3, 37, 80], Olden [22], and others [29, 31, 38, 44, 54, 55, 63, 88, 98] are based on sound heuristics that seem to perform well in practice and generally have wider applicability than Cilk, none are able to provide any sort of performance guarantee or accurate machine independent performance model. These systems require that performance minded programmers become ....

Wilson C. Hsieh, Paul Wang, and William E. Weihl. Computation migration: Enhancing locality for distributed-memory parallel systems. In Proceedings of the Fourth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), pages 239--248, San Diego, California, May 1993.


ADAM: A Decentralized Parallel Computer Architecture Featuring.. - Huang (2002)

....(NOW) migration mechanisms tend to operate on coarse grained processes and objects. Migration on NOWs tend to be under dynamic run time control, and migration times are on the order of tens to hundreds of milliseconds. [RC96] On the other hand, computation migration on Alewife [HWW93] implements structured activation frame movement throughout the machine using statically compiled migration directives, yielding migration times on the order of several hundreds of processor cycles. At the least common denominator, every migration mechanism must do the following things: figure ....

....In addition, Active Threads uses simple user space messaging protocols for communication, to cut the overhead of copying messages and buffers in OS space. User space thread migration reduces thread migration latencies down to about 150 µs (about 16,000 processor cycles). Computation Migration [HWW93] also performs users space thread migration, but in a more restrictive fashion. In Computation Migration, static annotations in user code cause a thread to spawn new procedures on remote nodes; also, [HWW93] makes no indication that inter thread communication resources are migrated. A single ....

[Article contains additional citation context not shown here]

Wilson C. Hsieh, Paul Wang, and William E. Weihl. Computation migration: Enhancing locality for distributed-memory parallel systems. In Proceedings of the Fourth ACM PPOPP, pages 239--248, California, May 1993.


High-Performance All-Software Distributed Shared Memory - Johnson (1995)   (9 citations)

....the data it references. Systems organized along these lines avoid the overhead of frequent remote communication by migrating computation to the node upon which frequently referenced data resides [10, 65]. Implementations utilizing both computation and data migration techniques are also possible [4, 11, 32]. As with static software DSMs, the high fixed overheads of message based communication in many current generation systems drive dynamic software DSM implementors toward optimizations that reduce the total number of message sends and receives. In addition, because many dynamic software DSM ....

....personal communication, August 1995. and, if so, how frequently can be replaced with carefully selected, efficient message based primitives. The extent to which any migratory data problem can be addressed using other techniques such as function shipping [11] and computation migration [31, 32] probably also warrants further investigation. Related Work CRL builds on a large body of research into the construction of distributed shared memory systems. However, as discussed in Section 1.1, four key properties distinguish CRL from other DSM systems: simplicity, portability, ....

Wilson C. Hsieh, Paul Wang, and William E. Weihl. Computation Migration: Enhancing Locality for Distributed-Memory Parallel Systems. In Proceedings of the Fourth Symposium on Principles and Practice of Parallel Programming (PPoPP), pages 239--248, May 1993.


Runtime Optimizations for a Java DSM Implementation - Veldema, Hofman, Bhoedjang.. (2001)   (8 citations)

....[Figure 4: Lazy flushing.] nation with heap analysis is used to remove checks on objects that remain local to the creating thread. The compiler may generate code for computation migration [14]: part or all of a method invocation is moved to the machine where the data resides. This may be especially effective for synchronized blocks and thread object constructors. In Jackal, the home node of the lock object acts as the manager of the lock. Lock, unlock, wait and notify calls are ....

W.C. Hsieh, P. Wang, and W.E. Weihl. Computation Migration: Enhancing Locality for Distributed-Memory Parallel Systems. pages 239--248.


Linear Naming: Experimental Software for Optimizing.. - Bawden, Mairson (1998)

....a call message from A to B; then B would send the data to C in a second (tail recursive) call message that contained the same continuation; finally C would return to that continuation by sending a return message to A. People working on distributed systems are starting to work with this idea now [HWW93, CJK95]. In addition to performance problems, there is another shortcoming shared by RPC and all the improvements described above: They all require the caller to specify the network node where the next step of the computation is to take place. The caller must explicitly think about where data is ....

W. C. Hsieh, P. Wang, and W. E. Weihl. Computation migration: enhancing locality for distributed-memory parallel systems. In 4th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPOPP 93), pages 239--248, July 1993.


Runtime Optimizations for a Java DSM Implementation - Veldema, Hofman, Bhoedjang.. (2001)   (8 citations)

....thread. [Figure 4: Lazy flushing.] The compiler may generate code for computation migration [12]: part or all of a method invocation is moved to the machine where the data resides. This may be especially effective for synchronized blocks and thread object constructors. In Jackal, the home node of the lock object acts as the manager of the lock. Lock, unlock, wait and notify calls are ....

W.C. Hsieh, P. Wang, and W.E. Weihl. Computation Migration: Enhancing Locality for Distributed-Memory Parallel Systems. pages 239--248.


Load Balancing in Home-based Software DSMs - Shi, Tang

....resources dynamically. Within the context of dynamic load balancing scheme, the run time system needs an appropriate way to change the amount of work assigned to each processor. Dynamic task scheduling is an effective strategy to attack this problem, and it has been widely studied in the literature [20, 14, 19, 3, 10, 2, 22, 12, 23]. However, most of these previous work focus either on shared memory multiprocessors or on distributed systems. Furthermore, many evaluation environments are dedicated, which can not reflect the dynamic behaviors of real world. According to the granularity of a task, previous work can be ....

....[Figure 4: Comparison of execution time of different scheduling schemes in metacomputing environment: (a) SOR, (b) JI, (c) TC, (d) MM, and (e) AC. Axis labels: SS, BSS, GSS, FS, TSS, SSS, AFS, AAFS, ABS.] Hsieh et al. proposed computation migration to enhancing locality for distributed memory parallel systems [3]. They compared the performance of RPC and data migration. Since the goal of their computation migration is locality, it may result in load imbalance, which is different from our goal, which uses computation migration to balance the load, and try to maintain processor locality at the same time. ....

W. C. Hsieh, P. Wang, and W. E. Weihl. Computation migration: Enhancing locality for distributed-memory parallel systems. In Proc. of the Fourth ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming (PPOPP'93), May 1993.


Source-Level Global Optimizations for Fine-Grain.. - Veldema, Hofman.. (2001)

....With object graph aggregation the entire object graph will be marked dirty, causing the entire object graph to be flushed and re fetched. This extra communication must be traded against the gains in sequential code speed and bulk data transfer. 6. COMPUTATION MIGRATION Computation migration [13] is a mechanism that moves part or all of a computation and its state to the data used by that computation. This may be more efficient than moving multiple data objects to the computation or than repeatedly moving the same data object to the computation, which occurs typically with e.g. barrier ....

....has to work harder to combine data and synchronization traffic because no programmer assistance is available. Implementing computation migration requires compiler support, both to detect opportunities and implement computation migration. Compiler supported computation migration is used in Prelude [13] and Olden [24]. Like Jackal, both these systems use compiler support to find the live variables that must be shipped to the remote processor. Both Prelude and Olden use computation migration to enhance locality by eliminating multiple roundtrips to remote data objects (using programmer ....

W. Hsieh, P. Wang, and W. Weihl. Computation Migration: Enhancing Locality for Distributed-Memory Parallel Systems. In PPoPP'93, pages 239--248, May 1993.


High-Level Abstractions for Efficient Concurrent Systems - Jagannathan, Philbin

....to the processor on which the database resides; the database itself does need not to migrate to processors executing queries. The ability to send procedures to data rather than more traditional RPC style communication leads to a number of potentially significant performance and expressivity gains [14]. First class procedures and lightweight threads make active message passing an attractive high level communication abstraction. In systems that support active messages without the benefit of these abstraction, this functionality is typically realized in terms of low level support protocols. ....

Wilson Hsieh, Paul Wang, and William Weihl. Computation Migration: Enhancing Locality for Distributed-Memory Parallel Systems. In Fourth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 239--249, May 1993. Also appears as ACM SIGPLAN Notices, Vol. 28, number 7, July, 1993.


Affinity-based Self Scheduling: A More Practical Load.. - Shi, Hu, Tang (1999)

....the data to be accessed by this thread. Their results show that the looser the interconnection work, the more important the locality management. Our algorithm depends greatly on this conclusion. Hsieh et.al propose computation migration to enhance locality for distributed memory parallel systems[2], they compare the performance of RPC, data migration. Since the goal of their computation migration is locality, it may result in load imbalancing, which is different from our goal. Furthermore, the implementation methods used by us is different too. They implement computation migration in kernel ....

W. C. Hsieh, P. Wang, and W. E. Weihl. Computation migration: Enhancing locality for distributed-memory parallel systems. In Proc. of the Fourth ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming (PPOPP'93), May 1993.


Concert - Efficient Runtime Support for Concurrent.. - Karamcheti, Chien (1993)   (47 citations)

....[Table 8: Scheduling statistics in CA programs. Rows: pcbroute 119262 66692 44; logicsim 36877 15460 58; multigrid 53972 13816 74.] 5 Related work Our work is related to many efforts focusing on runtime support for efficiently executing irregular computations on stock hardware [19, 20]. It differs from runtime systems for coarse grained object oriented languages such as COOL [21] and Mentat [22] by focusing on fine grained object level concurrency. The ABCL/onAP1000 implementation [18] is most similar with our work but adopts a traditional design, emphasizing techniques for ....

W. C. Hsieh, P. Wang, and W. E. Weihl, "Computation migration: Enhancing locality for distributed-memory parallel systems," in Proceedings of the Fifth ACM SIGPLAN Symposium on the Principles and Practice of Parallel Programming, pp. 239--248, 1993.


Heterogeneous Process Migration: The Tui System - Smith, Hutchinson (1996)   (32 citations)

....best, data can only be transmitted at the speed of light, causing noticeable delays. If a program makes frequent use of remote data, its performance will suffer. Process migration can help alleviate this problem by moving the program closer to the data, rather than moving the data to the program [20]. Typically, a program would start executing on the user's local machine. If it later makes frequent accesses to remote data, the migration system will reduce the delay by moving the process to a machine that is physically closer to the data. This makes a lot of sense in the case where the program ....

Wilso C. Hsieh, Paul Wang, and William E. Weihl. Computation Migration: Enhancing Locality for Distributed Memory Parallel Systems. SIGPLAN Notices, page 239, July 1993.


Supporting Dynamic Data Structures on Distributed.. - Rogers, CARLISLE.. (1995)   (98 citations)

....has chosen a good data layout, then migrating the computation to the data allows the system to perform the computation on the processor that owns most of the required data. This can reduce the amount of interprocessor communication substantially and thereby reduce execution time. Recent work by Hsieh, Wang, and Weihl [1993] supports this claim. 3.2 Thread Splitting While the migration scheme provides a mechanism for operating on distributed data, it does not provide a mechanism for extracting parallelism from the computation. When a thread migrates from Processor P to Q, P is left idle. In this section, we ....

....These languages provide primitives for object location and mobility, and constructs to allow the programmer to indicate whether the thread or the object(s) should move to satisfy an invocation that references a remote object. Prelude, a language being developed at MIT [Weihl et al. 1991; Hsieh et al. 1993], includes a migration mechanism similar to ours. The goal of the Prelude project is to develop a language and support system for writing portable parallel programs. Prelude is an explicitly parallel language that provides a computation model based on threads and objects. Annotations are added to ....

Hsieh, W., Wang, P., and Weihl, W. 1993. Computation migration: Enhancing locality for distributed-memory parallel systems. In Proceedings of the Fourth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 239--248.


Continuation-based Transformations for Coordination Languages - Jagannathan (1999)

....these continuations, the greater the bandwidth requirements imposed. Moreover, if a continuation is closed over a large structure that never happens to be accessed on the address spaces to which the continuation migrates, efficiency is likely to be significantly impacted. Computation migration [HWW93] refers to an implementation technique that lazily transmits portions of a thread's state when the thread is involved in a migration event. The technique reduces bandwidth requirements while potentially increasing the number of remote communication events. Conceptually, this scheme suggests that a ....

....Standard ML of New Jersey [AM87, App92] or Chez Scheme [HDB90, BWD96]. By using techniques found in these implementations, we believe the overhead of moving continuations in a distributed environment can also be made tractable. Indeed, experimental results from early work on computation migration [HWW93] provides evidence that the cost of lazy migration of continuation frames can be efficient in practice. We intend to validate these conjectures in a realistic implementation and believe the careful use of freeze annotations imposes a small burden on the programmer, but can greatly reduce the ....

[Article contains additional citation context not shown here]

Wilson Hsieh, Paul Wang, and William Weihl. Computation Migration: Enhancing Locality for Distributed-Memory Parallel Systems. In The 4th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pages 239--249, New York, May 1993. ACM.


Heterogeneous Process Migration: The Tui System - Smith, Hutchinson (1997)   (32 citations)

....best, data can only be transmitted at the speed of light, causing noticeable delays. If a program makes frequent use of remote data, its performance will suffer. Process migration can help alleviate this problem by moving the program closer to the data, rather than moving the data to the program [4]. Typically, a program would start executing on the user's local machine. If it later makes frequent accesses to remote data, the migration system will reduce the delay by moving the process to a machine that is physically closer to the data. This makes the most sense in the case where the ....

Wilso C. Hsieh, Paul Wang, and William E. Weihl. Computation Migration: Enhancing Locality for Distributed Memory Parallel Systems. SIGPLAN Notices, 28(7):239--248, July 1993.


Dynamic Task Migration in Home-based Software DSM Systems - Shi, Hu, Tang (1999)   (1 citation)

....compared with static task allocation schemes. As a result, our new task migration scheme performs better than other computation only migration schemes for an average of 30 . 2 Dynamic Task Migration 2. 1 Principles Different from the computation migration scheme (CompMig) proposed in the past [5, 1], in our new migration scheme, not only the computation subtasks, but also the data subtasks are migrated to appropriate processors. This scheme is represented as TaskMig. The basic framework of TaskMig is listed in Figure 1. y The work of this paper is supported by the CLIMBING Program, and the ....

W. C. Hsieh, P. Wang, and W. E. Weihl. Computation migration: Enhancing locality for distributed-memory parallel systems. In Proc. of the Fourth ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming (PPOPP'93), May 1993.


Dynamic Computation Scheduling for Load Balancing in Home-based.. - Shi, Tang (1999)

....data the thread will access. Their results showed that the looser the interconnection work, the more important the locality management. Our algorithm is greatly inspired by this conclusion. Hsieh et al. proposed computation migration to enhancing locality for distributed memory parallel systems[2]. They compared the performance of RPC and data migration. Since the goal of their computation migration is locality, it may result in load imbalancing, which is different from our goal, which uses computation migration to balance the load, and try to maintain processor locality at the same time. ....

W. C. Hsieh, P. Wang, and W. E. Weihl. Computation migration: Enhancing locality for distributed-memory parallel systems. In Proc. of the Fourth ACM SIGPLAN Symp. on Principles and Practice of Parallel Programming (PPOPP'93), May 1993.


Communication-Passing Style for Coordination Languages - Suresh Jagannathan

....the base of a migrated continuation, the continuation below the base is resumed. Thus, falling off the end of a migrated continuation is tantamount to shifting control back to the address space where the suspended portion resides. This approach, sometimes referred to as computation migration [13] reduces bandwidth requirements while potentially increasing the number of remote communication events. To support computation migration, we extend the coordination language with one new operation. The expression, e) marks the top frame of the current continuation stack, and returns the result ....

....similar to the one described here. The language analyzed did not include continuations, however. The formal exact semantics was given in direct style, and thus is not well suited to specifying issues related to task migration as done here. Process migration [19] and computation migration [13] are two approaches to moving threads in distributed environments. We have presented the semantics of computation migration using a well specified operational semantics. A pleasant property of our formal description of computation migration is that only minor modifications to the machine state ....

W. Hsieh, P. Wang, and W. Weihl, Computation Migration: Enhancing Locality for Distributed-Memory Parallel Systems, in The 4th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, New York, May 1993, ACM, pp. 239--249.


CRL: High-Performance All-Software Distributed Shared Memory - Johnson, Kaashoek, Wallach (1995)   (135 citations)

....a data shipping model in which threads of computation are relatively immobile and data items (or copies of data items) are brought to the threads that reference them. Other types of DSM systems (e.g. those in which threads of computation are migrated to the data they reference) are also possible [4, 7, 13, 19]; a suitably generalized version of the classification scheme presented here could likely be applied to these systems as well. We classify systems by three basic mechanisms required to implement DSM and whether or not those mechanisms are implemented in hardware or software. These basic mechanisms ....

Wilson C. Hsieh, Paul Wang, and William E. Weihl. Computation Migration: Enhancing Locality for Distributed-Memory Parallel Systems. In Proceedings of the Fourth Symposium on Principles and Practice of Parallel Programming (PPoPP), pages 239--248, May 1993.


The Importance of Locality in Scheduling and Load Balancing for.. - Keckler (1994)   (1 citation)

....their data is likely to reside in the cache, and data reuse between siblings is likely. Lazy task creation already supports this with the double ended lazy task queue. Determining a task's affinity is not unlike the problem of selecting RPC, data migration, or computation migration as discussed in [10]. 4.3 Analytical Model Consider a simple model in which all tasks begin on a single processor in a p processor system. If executed on that processor, each task runs in time t. Tasks that have affinity for this processor incur a cost hm if migrated to another processor. Tasks without affinity ....

Hsieh, W. C., Wang, P., and Weihl, W. E. Computation migration: Enhancing locality for distributed-memory parallel systems. In Fourth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (May 1993), pp. 239--248.


Dynamic Load Distribution in Massively Parallel.. - Antonio Corradi (1994)

....applications onto the machine, unless (s)he wants to do it. These characteristics are requirements of the Parallel Objects programming environment [1]. In addition, in massively parallel architectures, load balancing and locality of communications are also goals to be met for efficiency sake [2]. Load balancing and locality can be obtained via a dynamic distribution of load during execution, that, in the PO environment, is achieved in two ways. On the one hand, PO uses remote creation of newly needed objects [3]. On the other hand, PO provides migration of already allocated objects [4] ....

W. C. Hsieh, P. Wang, W. E. Weihl, "Computation Migration: Enhancing Locality for Distributed Memory Parallel System", ACM SIGPLAN Notices, Vol. 28, No. 7, July 1993.


Data and Workload Distribution in a Multithreaded.. - Sohn, Sato, Yoo, Gaudiot (1996)   (3 citations)

....Data partitioning and alignment [9] is another typical method to reduce communication overhead. Analyzing the behavior of the program, data can be partitioned and allocated to processors such that runtime data movement (read write) can be minimized. The dynamic migration technique reported in [11] moves computation or data based on runtime determined statistics. The heuristic used in the approach consists in migrating the computation to where data is whenever the computation requires a remote write. Data migration takes place when there are more than two repeated remote reads. While data ....

W. C. Hsieh, P. Wang, and W.E. Wiehl, "Computation Migration: Enhancing Locality for Distributed-Memory Parallel Systems," in Proceedings of ACM Symposium on Principles and Practice of Parallel Programming, San Diego, California, May 1993, pp.239-248.


A Framework for Space and Time Efficient Scheduling of.. - Narlikar, Blelloch (1996)

....If the scheduling is done at runtime, then the performance of the high level code relies heavily on the scheduling algorithm, which should have low scheduling overheads and good load balancing. Several systems providing dynamic parallelism have been implemented with efficient runtime schedulers [7, 11, 13, 18, 19, 22, 23, 29, 30, 31], resulting in good parallel performance. However, in addition to good time performance, the memory requirements of the parallel computation must be taken into consideration. In an attempt to expose a sufficient degree of parallelism to keep all processors busy, schedulers often create many more ....

W. E. Hseih, P. Wang, and W. E. Weihl. Computation migration: enhancing locality for distributed memory parallel systems. In Proceedings of the Fourth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, San Francisco, California, May 1993.


Dynamic Computation Migration in Distributed Shared Memory Systems - Hsieh (1995)   (6 citations)  Self-citation (Weihl)

....and could be used together in one implementation. The first implementation described relies on the runtime system to implement migration, and has been implemented in the Olden system [81]. The second relies on the compiler to generate code to handle migration, and was implemented in Prelude [50]. MCRL provides an interface for a compiler based implementation, but does not provide compiler support for using it. 3.1.1 Runtime System The most direct implementation of computation migration is to have the runtime system move an activation record. In other words, the runtime system must ....

....benefits of computation migration, we implemented static computation migration in the Prelude system. This chapter describes the Prelude implementation, and summarizes some of our performance results. A more complete description of our measurements and results can be found in our conference paper [50]. Our Prelude results lead to several conclusions: Computation migration, like data migration with replication, outperforms RPC. The reason is that both data migration with replication and computation migration improve the locality of repeated accesses. The performance of a software ....

[Article contains additional citation context not shown here]

W.C. Hsieh, P. Wang, and W.E. Weihl. "Computation Migration: Enhancing Locality for Distributed-Memory Parallel Systems". In Proceedings of the 4th Symposium on Principles and Practice of Parallel Programming, pages 239--248, San Diego, CA, May 1993.


Dynamic Computation Migration in DSM Systems - Hsieh, Kaashoek, Weihl (1996)   (8 citations)  Self-citation (Hsieh Weihl)

....system. Performance measurements demonstrate that migrating computations for some accesses is better than always migrating data, and that choosing dynamically whether to migrate data or computation is better yet for dynamic data structures with unpredictable access patterns. Computation migration [11, 15] is the partial migration of active threads. Under computation migration, a currently executing thread has some of its state migrated to remote data that it accesses. Computation migration is distinct from RPC (which was explored in systems such as Emerald [13] Technical paper. Contact ....

....used to implement computation migration. In the first approach, the compiler generates calls into the runtime system, which manages migration. This approach was used in the Olden system [15]. In the second approach, the compiler manages migration more directly. We took this approach in Prelude [11]. MCRL is designed for the latter approach, but does not provide compiler support. Compiler support should not be difficult to add, although it would require some language restrictions similar to those in Olden. 3 Dynamic Computation Migration This section describes the mechanics of dynamic ....

[Article contains additional citation context not shown here]

W.C. Hsieh, P. Wang, and W.E. Weihl. Computation Migration: Enhancing Locality for Distributed-Memory Parallel Systems. In Proceedings of the 4th Symposium on Principles and Practice of Parallel Programming, pages 239--248, San Diego, CA, May 1993.
