31 citations found.
M. C. Carlisle. Olden: Parallelizing Programs with Dynamic Data Structures on Distributed-Memory Machines. PhD thesis, Department of Computer Science, Princeton University, June 1996.


This paper is cited in the following contexts:


Bug Isolation via Remote Program Sampling - Liblit, Aiken, Zheng, Jordan (2003)   (15 citations)

....assertion-dense code. The individual assertions are quite small and fast (array bounds checks, testing for null, etc.), but their performance impact can be significant. We wish to use random sampling to spread this cost among many users. We have applied sampling to CCured versions of several Olden [10] and SPECINT95 [23] benchmarks. All programs run to completion and we are simply measuring the overhead of performing the dynamic checks. 3.1.1 Whole-Program Sampling: Table 1 summarizes static aspects of the sampling transformation when applied to the entirety of each benchmark. For each ....

M. C. Carlisle. Olden: Parallelizing Programs with Dynamic Data Structures on Distributed-Memory Machines. PhD thesis, Department of Computer Science, Princeton University, June 1996.
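The excerpt above describes spreading the cost of dynamic checks across users by evaluating each check on only a random subset of executions. The following Python sketch shows one way such sampling could work. The class name `SampledChecker` and its fields are hypothetical, and drawing the gap to the next sampled check from a geometric distribution (so that an unsampled call costs only a counter decrement) is an assumption about implementation, not a detail stated in the excerpt.

```python
import math
import random
from typing import Callable


class SampledChecker:
    """Hypothetical helper that evaluates each check with probability `rate`."""

    def __init__(self, rate: float, seed: int = 0) -> None:
        if not 0.0 < rate <= 1.0:
            raise ValueError("rate must be in (0, 1]")
        self.rate = rate
        self.rng = random.Random(seed)
        self.checks_run = 0
        self.failures = 0
        self.countdown = self._next_gap()

    def _next_gap(self) -> int:
        """Calls until the next sampled check: geometric on {1, 2, ...} with p = rate."""
        if self.rate == 1.0:
            return 1
        u = 1.0 - self.rng.random()  # uniform in (0, 1], so log(u) is defined
        return int(math.log(u) / math.log(1.0 - self.rate)) + 1

    def check(self, condition: Callable[[], bool]) -> None:
        self.countdown -= 1
        if self.countdown > 0:
            return  # fast path: not sampled, so the condition is never evaluated
        self.countdown = self._next_gap()
        self.checks_run += 1
        if not condition():
            self.failures += 1  # a real system would record or report the failing site
```

For example, a bounds check written as `checker.check(lambda: 0 <= i < len(buf))` is evaluated on roughly a `rate` fraction of calls; passing a lambda defers the condition so that skipped calls never pay for it.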


Enhancing Memory Level Parallelism via Recovery-Free Value.. - Zhou, Conte (2003)

....hardware changes in our scheme also enable aggressive memory disambiguation to break alias (i.e., load-after-store) dependencies. Such disambiguation is used for prefetching and is also recovery-free. The experimental results, based on a set of SPEC2000 benchmarks [11] and Olden benchmarks [5] including both computation-intensive and memory-intensive benchmarks, show significant speedups resulting from breaking both true dependencies and alias dependencies between memory operations. Such speedups also scale well with the current trend in microprocessor design. The remainder of the ....

M. Carlisle, "Olden: parallelizing programs with dynamic data structures on distributed-memory machines", Ph.D. thesis, Princeton University Computer Science Department, 1996


Type Systems for Distributed Data Sharing - Liblit, Aiken, Yelick (2001)   (3 citations)

....underspecified semantics where optimization may change program behavior in unexpected ways. One group of languages guarantees safety but has no facility for declaring private heap data. In these languages the stack is private but the entire heap must be treated as potentially shared. Java, Olden [9], and Titanium (prior to this work) [20] are in this category. For these languages, our techniques provide a basis for automatically inferring private heap data. We also believe it is important for programmers to be able to declare private data explicitly, as knowledge of what data is private is ....

M. C. Carlisle. Olden: Parallelizing Programs with Dynamic Data Structures on Distributed-Memory Machines. PhD thesis, Department of Computer Science, Princeton University, June 1996.


Mondrian Memory Protection - Witchel, Cates, Asanovic (2002)   (21 citations)

....and their reference properties. Benchmark names prefixed with a j are Java programs. Benchmarks crafty, gcc, twolf and vpr are from SPEC 2000, and vortex is from SPEC 95. The tr suffix indicates the training input, and the test suffix indicates the test input. Names prefixed with o are from the Olden [6] benchmark suite. Names prefixed with m are from the Mediabench benchmark suite. Table 3 includes the number of memory references per table update. The permissions table is only updated on malloc, realloc, and free calls, and the results show a wide variation in how frequently objects are ....

M. Carlisle. Olden: Parallelizing Programs with Dynamic Data Structures on Distributed-Memory Machines. PhD thesis, Princeton University, 1996.


CCured: Type-Safe Retrofitting of Legacy Code - Necula, McPeak, Weimer (2002)   (8 citations)

....the program source are required to make the program run under the CCured restrictions. We used several test cases, some from SPECINT95 [26] (compress is LZW data compression; go plays the board game Go; ijpeg compresses image files; li is a Lisp interpreter) and some from the Olden benchmark suite [6], a collection of small, compute-intensive kernels: bh is an n-body simulator; bisort is a sorting algorithm; em3d solves an electromagnetism problem; health simulates Colombia's health care system; mst computes minimum spanning trees; perimeter computes perimeters of regions in images; power ....

M. C. Carlisle. Olden: Parallelizing Programs with Dynamic Data Structures on Distributed-Memory Machines. PhD thesis, Princeton University Department of Computer Science, June 1996.


Dynamic Speculative Precomputation - Collins, Tullsen, Wang, Shen (2001)   (18 citations)

....TLB misses handled by pipelined, on-chip TLB miss handler, 60-cycle latency; Multithreading: 8 total hardware thread contexts; Dynamic SP: 64-entry DLIT, 32-entry SIT, 8-entry SAT, 512-entry RIB. Table 1: Details of the modeled processor. [Benchmarks from] the SPEC suite are compiled for a base SPEC build, and Olden [3] benchmarks are compiled with gcc -O4. All simulations with a single non-speculative thread are executed for 300 million total committed instructions. Because it is essential that they execute as quickly as possible, fetch preference is given to speculative threads over non-speculative ones, and ....

M. Carlisle. Olden: Parallelizing programs with dynamic data structures on distributed-memory machines. PhD thesis, Princeton University Department of Computer Science, June 1996.


Speculative Precomputation: Long-range Prefetching .. - Collins, Wang.. (2001)   (34 citations)

....that has been enhanced to work with Itanium binaries. SMTSIM is a cycle-accurate, execution-driven simulator of SMT processors. Benchmarks for this study include both integer and floating-point benchmarks selected from the CPU2000 suite [16] and pointer-intensive benchmarks from the Olden suite [2]. Benchmarks are selected because their performance is limited by poor cache performance or because they experience high data cache miss rates. The benchmarks and simulation setup are summarized in Table 2. Unless otherwise noted, all benchmarks are simulated for 100 million retired instructions ....

M. Carlisle. Olden: Parallelizing programs with dynamic data structures on distributed-memory machines. PhD thesis, Princeton University Department of Computer Science, June 1996.


REAPAR User Manual and Reference: Automatic Parallelization of.. - Hänßgen (1998)

....For a closer discussion, please refer to the PhD thesis [Hän98]. Makefile: Runs reapar on all the benchmarks with sample problems in the 1-minute range. barnes.c: Galaxy simulation, based on the original source code by J. Barnes [BH86]. bitonic.c: Bitonic sorting, based on Olden code [Car96]. eigenvalue.c: Computation of eigenvalues of symmetrical tridiagonal matrices, based on code by S. Chakrabarti et al. [CRY94]. fractal.c: Fractal computation by recursive heuristics (own code). heat.c: Heat diffusion simulation, based on Cilk code [Sup97]. knapsack.c: Solution to the 0-1 knapsack ....

....results of the REAPAR research without going into details. A closer discussion can be found in the corresponding PhD thesis [Hän98]. 8.1 Speedups in comparison to related systems: REAPAR yields the following speedups on 4-processor benchmark runs, compared to the related Cilk [Sup97] and Olden [Car96] systems: Benchmark Barnes-Hut Bitonic-Sort Eigenvalue Fractal Heat Knapsack Magic Power Queens Olden 3,0 2,3 3,8 Cilk 3,1 3,9 3,6 3,9 REAPAR 3,4 3,4 4,0 3,8 4,2 3,1 3,7 3,6 3,9 Benchmarks marked as are part of the validation set which was only examined after the work on ....

Martin C. Carlisle. Olden: Parallelizing Programs with Dynamic Data Structures on Distributed-Memory Machines. PhD thesis, Princeton University Department of Computer Science, June 1996.


Automatically Partitioning Threads Based on Remote Paths - Tang, Gao (1998)

....we first introduce the benchmarks and the machine we used in our experiment. Then we report the timing comparison of two DDG building algorithms. Finally, the impact of the remote-height-based thread scheduling algorithm is discussed. 6.1 Benchmarks: The benchmarks we used are from the Olden [7, 25] benchmarks: treeadd, em3d, perimeter, power and mst. All are irregular applications and pointer-based data structures are pervasively used. We have rewritten them in EARTH-C and run the threaded code on the EARTH-MANNA emulator, which has a maximum of 20 processor nodes. Treeadd sums up all ....

M. C. Carlisle. Olden: Parallelizing Programs with Dynamic Data Structures on Distributed-Memory Machines. PhD thesis, Princeton University Department of Computer Science, June 1996.


Communication Optimizations for Parallel C Programs - Zhu, Hendren (1998)   (11 citations)

....paper we are also concerned about the placement of the pointer dereferences, pipelining remote operations, and blocking of associated remote operations. Another approach for compilation of dynamic-data-structure-based applications on distributed-memory machines was proposed by Carlisle and Rogers [5, 4]. They propose the Olden runtime system, which uses a trade-off between software caching of remote data and computation migration, depending on the data distribution and the amount of communication required. This decision is guided, in part, by programmer-specified path affinity hints for recursive ....

Martin C. Carlisle. Olden: Parallelizing Programs with Dynamic Data Structures on Distributed-Memory Machines. PhD thesis, Princeton University Department of Computer Science, June 1996.


Locality Analysis for Parallel C Programs - Zhu, Hendren (1997)   (3 citations)

....of the five benchmarks. In the last two cases the localized version does give an improvement, but does not compete with the hand-coded advanced version. We analyze these results in detail individually for each benchmark below. Power: This benchmark implements the power system optimization problem [9]. It uses a four-level tree structure with different branching widths at each level. Our locality analysis achieves 3 4 improvement over the simple version in execution time, which is quite close to the advanced version (5 ). However, it is able to achieve an 80% reduction in the number of ....

....achieves a 93% reduction in the number of remote calls over both the simple and localized version, by using basic functions. This factor enables it to achieve slightly better speedup than the localized version. Perimeter: This benchmark computes the perimeter of a quadtree-encoded raster image [9]. The unit square image is recursively divided into four quadrants until each one has only one point. The tree is then traversed bottom-up to compute the perimeter of each quadrant. The localized version achieves 15 27 speedup and comes very close to the advanced version for the 8 and 16 ....

[Article contains additional citation context not shown here]

M. C. Carlisle, Olden: Parallelizing Programs with Dynamic Data Structures on Distributed-Memory Machines. PhD thesis, Princeton University Department of Computer Science, June 1996.
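The perimeter benchmark described above (recursive subdivision into quadrants, then a bottom-up pass) can be sketched in Python as follows. This is an illustrative reconstruction under stated assumptions, not the Olden code: the image is assumed to be a square 0/1 grid whose side is a power of two, subdivision stops at uniform quadrants, and each internal node subtracts the unit edges hidden by black pixel pairs that straddle its two midlines. The helper names are hypothetical.

```python
from typing import List, Optional

Image = List[List[int]]  # assumed: 1 = black, 0 = white; square, side a power of two


def _uniform(img: Image, x: int, y: int, size: int) -> bool:
    first = img[y][x]
    return all(img[r][c] == first
               for r in range(y, y + size) for c in range(x, x + size))


def _midline_pairs(img: Image, x: int, y: int, size: int) -> int:
    """Count adjacent black pixel pairs that straddle the quadrant's two midlines."""
    half = size // 2
    pairs = 0
    for r in range(y, y + size):  # pairs across the vertical midline
        if img[r][x + half - 1] and img[r][x + half]:
            pairs += 1
    for c in range(x, x + size):  # pairs across the horizontal midline
        if img[y + half - 1][c] and img[y + half][c]:
            pairs += 1
    return pairs


def perimeter(img: Image, x: int = 0, y: int = 0, size: Optional[int] = None) -> int:
    if size is None:
        size = len(img)
    if _uniform(img, x, y, size):
        # Leaf quadrant: a solid black size-by-size block contributes 4 * size edges.
        return 4 * size if img[y][x] else 0
    half = size // 2
    total = (perimeter(img, x, y, half) + perimeter(img, x + half, y, half)
             + perimeter(img, x, y + half, half)
             + perimeter(img, x + half, y + half, half))
    # Each black pair across a midline hides one edge from each of its two pixels.
    return total - 2 * _midline_pairs(img, x, y, size)
```

The bottom-up step relies on the identity perimeter = 4 * (black pixels) - 2 * (adjacent black pairs): every adjacent pair lies either inside one child quadrant or across one of the parent's midlines, so it is subtracted exactly once. Boundaries of holes are counted as perimeter in this sketch.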


Type Systems for Distributed Data Structures - Liblit, Aiken (2000)   (10 citations)

....identified ten years ago [2] and many more have appeared since. We highlight some representative examples of approaches previously taken to the local/global pointer problem. Olden adds parallelism to C, focusing on dynamic structures augmented with compiler-directed software caching and migration [8, 9, 24]. All Olden pointers are global, so it is never possible to see an invalid local pointer from another processor's address space. However, pointer operations require four extra instructions to test the processor ID and decode the machine address. Data-flow analyses can eliminate some redundant ....

M. C. Carlisle. Olden: Parallelizing Programs with Dynamic Data Structures on Distributed-Memory Machines. PhD thesis, Department of Computer Science, Princeton University, June 1996.
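The excerpt notes that every Olden pointer is global, so each dereference must test the processor ID and decode the machine address. The Python sketch below illustrates that general idea with a pointer packed into one integer. The 8-bit processor field, the 56-bit address field, and the helper names `make_global`, `deref`, and `fetch_remote` are hypothetical choices for illustration; they do not reflect Olden's actual encoding or its instruction sequence.

```python
from typing import Callable, Dict

PROC_BITS = 8   # hypothetical: up to 256 processors
ADDR_BITS = 56  # hypothetical: low 56 bits hold the processor-local address
ADDR_MASK = (1 << ADDR_BITS) - 1


def make_global(proc_id: int, local_addr: int) -> int:
    """Pack a processor ID and a local address into one global pointer."""
    if not 0 <= proc_id < (1 << PROC_BITS):
        raise ValueError("processor id out of range")
    if not 0 <= local_addr <= ADDR_MASK:
        raise ValueError("local address out of range")
    return (proc_id << ADDR_BITS) | local_addr


def proc_of(gp: int) -> int:
    return gp >> ADDR_BITS


def addr_of(gp: int) -> int:
    return gp & ADDR_MASK


def deref(gp: int, my_proc: int, local_heap: Dict[int, object],
          fetch_remote: Callable[[int, int], object]) -> object:
    """Every dereference pays for the ownership test before touching memory."""
    owner = proc_of(gp)          # extract the processor ID
    addr = addr_of(gp)           # decode the local machine address
    if owner == my_proc:
        return local_heap[addr]  # local case: an ordinary load
    return fetch_remote(owner, addr)  # remote case: e.g. software cache or migration
```

The point of the sketch is the cost model: even when the data is local, `deref` performs the extract, compare, and mask steps, which is the overhead the excerpt says data-flow analyses try to eliminate.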


Hot Pages: Software Caching for Raw Microprocessors - Moritz, Frank, Lee.. (1999)   (10 citations)

....memory. Although Shasta leveraged compile-time information to some degree, its transformations were based on a binary modification tool called ATOM [16] on a program intermediate format where much of the high-level information already has been altered. Several other systems, such as Olden [5] and Split-C [6], use compiler-generated checks to support a global address space in the context of a parallel programming model. These systems solve a different problem, namely the sharing of the global address space in a parallel execution environment. The Hot Pages system, in contrast, does the ....

M. Carlisle. Olden: Parallelizing Programs with Dynamic Data Structures on Distributed-Memory Machines, June 1996.


Optimizing Parallel Programs With Dynamic Data Structures - Zhu (2000)

....voronoi and em3d. In the other three cases the localized version does give an improvement, but does not compete with the hand-coded advanced version. We analyze these results in detail individually for each benchmark below. Power: This benchmark implements the power system optimization problem [Car96]. It uses a four-level tree structure with different branching widths at each level. For the powerI version, our locality analysis achieves 4 improvement over the simple version in execution time, which is quite close to the advanced version (5.5 ). However, it is able to achieve an 89% reduction ....

....the remote accesses in the benchmark. Nevertheless, as power is a computation-intensive benchmark, we still achieve 1.5 improvement on the overall performance for powerII. Tsp: This benchmark solves the traveling salesman problem using a divide-and-conquer approach based on close point algorithm [Car96]. This algorithm first searches a suboptimal tour for each subtree (region) and then merges subtours into bigger ones. 53 Computebranchspec(branch local br) temp237 = thetaI ( br) X ) temp236 = temp237 ( br) R ) temp235 = thetaR temp236) temp233 = br) alpha ....

[Article contains additional citation context not shown here]

Martin C. Carlisle. Olden: Parallelizing Programs with Dynamic Data Structures on Distributed-Memory Machines. PhD thesis, Princeton University Department of Computer Science, June 1996.


Locality Analysis for Parallel C Programs - Zhu, Hendren (1997)   (3 citations)

....of the five benchmarks. In the last two cases the localized version does give an improvement, but does not compete with the hand-coded advanced version. We analyze these results in detail individually for each benchmark below. Power: This benchmark implements the power system optimization problem [2]. It uses a four-level tree structure with different branching widths at each level. Table columns: Benchmark; Simple EARTH-C (# remote); Localized EARTH-C (# remote); Advanced EARTH-C (# remote); Localized vs. Simple (impr.); Advanced vs. Simple (impr.). power 2294179 ....

....achieves a 93% reduction in the number of remote calls over both the simple and localized version, by using basic functions. This factor enables it to achieve slightly better speedup than the localized version. Perimeter: This benchmark computes the perimeter of a quadtree-encoded raster image [2]. The unit square image is recursively divided into four quadrants until each one has only one point. The tree is then traversed bottom-up to compute the perimeter of each quadrant. The localized version achieves 16 27 speedup and comes very close to the advanced version for the 8 and 16 ....

[Article contains additional citation context not shown here]

Martin C. Carlisle. Olden: Parallelizing Programs with Dynamic Data Structures on Distributed-Memory Machines. PhD thesis, Princeton University Department of Computer Science, June 1996.


Compiling For Multithreaded Architectures - Tang (1999)   (1 citation)

....efficiently in EM-C. Like other systems, remote access should explicitly be spelled out by the programmer and no automatic thread generation is done by the EM-C compiler. 10.2.5 Olden: Olden is a compiler and runtime system supporting parallel C programming on distributed-memory machines [101, 22, 21]. The design philosophy of Olden is very similar to that of EARTH-C. In both cases they were designed to minimize the burden on the programmer, and to have the compiler automatically handle communication and synchronization. In Olden, the programmer uses a future notation to expose parallelism, ....

Martin C. Carlisle. Olden: Parallelizing Programs with Dynamic Data Structures on Distributed-Memory Machines. PhD thesis, Princeton University Department of Computer Science, June 1996.
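The excerpt mentions that Olden programmers expose parallelism with a future notation. As a rough analogue only (Python's `concurrent.futures`, not Olden's C syntax), a treeadd-style recursive sum can spawn the left subtree as a future, compute the right subtree locally, and then wait on the future. The names `Node`, `build`, `tree_add`, `parallel_tree_add`, and `spawn_depth` are hypothetical, and under CPython's global interpreter lock this shows the program structure rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(depth: int) -> Optional[Node]:
    """Complete binary tree of the given depth with every value set to 1."""
    if depth == 0:
        return None
    return Node(1, build(depth - 1), build(depth - 1))


def tree_add(node: Optional[Node], pool: ThreadPoolExecutor, spawn_depth: int) -> int:
    if node is None:
        return 0
    if spawn_depth > 0:
        # Future-style call: start the left subtree asynchronously...
        left = pool.submit(tree_add, node.left, pool, spawn_depth - 1)
        right = tree_add(node.right, pool, spawn_depth - 1)
        # ...and "touch" the future only when its value is needed.
        return node.value + left.result() + right
    return node.value + tree_add(node.left, pool, 0) + tree_add(node.right, pool, 0)


def parallel_tree_add(root: Optional[Node], spawn_depth: int = 2) -> int:
    # At most 2**spawn_depth - 1 futures are created; one worker per future avoids deadlock.
    with ThreadPoolExecutor(max_workers=max(1, 2 ** spawn_depth - 1)) as pool:
        return tree_add(root, pool, spawn_depth)
```

Limiting spawning to the top `spawn_depth` levels bounds the number of futures, and sizing the pool to match means a parent blocked on `result()` never waits for a worker that cannot become free.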


Type Systems for Distributed Data Structures - Liblit (2000)   (10 citations)

No context found.

M. C. Carlisle. Olden: Parallelizing Programs with Dynamic Data Structures on Distributed-Memory Machines. PhD thesis, Department of Computer Science, Princeton University, June 1996.


Westley Weimer - University of California

No context found.

Martin C. Carlisle. Olden: Parallelizing Programs with Dynamic Data Structures on Distributed-Memory Machines. PhD thesis, Princeton University Department of Computer Science, June 1996.


Type Systems for Distributed Data Structures - Liblit (2000)   (10 citations)

No context found.

M. C. Carlisle. Olden: Parallelizing Programs with Dynamic Data Structures on Distributed-Memory Machines. PhD thesis, Department of Computer Science, Princeton University, June 1996.


Data Forwarding through In-Memory Precomputation Threads - Hassanein, Fortes, Eigenmann

No context found.

M. Carlisle. "Olden: Parallelizing Programs with Dynamic Data Structures on Distributed-Memory Machines", PhD thesis, Princeton University Department of Computer Science, June 1996.


Distributed Program Sampling - Liblit, Aiken, Zheng (2003)   (2 citations)

No context found.

M. C. Carlisle. Olden: Parallelizing Programs with Dynamic Data Structures on Distributed-Memory Machines. PhD thesis, Department of Computer Science, Princeton University, June 1996.


Taming C Pointers - Necula, McPeak, Weimer (2002)

No context found.

Martin C. Carlisle. Olden: Parallelizing Programs with Dynamic Data Structures on Distributed-Memory Machines. PhD thesis, Princeton University Department of Computer Science, June 1996.


CCured: Type-Safe Retrofitting of Legacy Code - Necula, McPeak, Weimer (2002)   (63 citations)

No context found.

M. C. Carlisle. Olden: Parallelizing Programs with Dynamic Data Structures on Distributed-Memory Machines. PhD thesis, Princeton University Department of Computer Science, June 1996.


CCured in the Real World - Condit, Harren (2003)   (13 citations)

No context found.

M. C. Carlisle. Olden: Parallelizing Programs with Dynamic Data Structures on Distributed-Memory Machines. PhD thesis, Princeton University Department of Computer Science, June 1996.


CCured in the Real World - Condit, Harren, McPeak, Necula.. (2003)   (13 citations)

No context found.

M. C. Carlisle. Olden: Parallelizing Programs with Dynamic Data Structures on Distributed-Memory Machines. PhD thesis, Princeton University Department of Computer Science, June 1996.



CiteSeer.IST - Copyright Penn State and NEC