
Data Locality Optimization of Shared Memory Programs on NUMA Architectures Using an Integrated Tool Environment
Jie Tao



View or download:
wwwbode.in.tum.de/~tao/pap...book.ps.gz

From:  tum.edu/~tao/papers/indexA

Abstract: Due to their excellent price-performance ratio, clusters built from commodity nodes have become broadly adopted and increasingly popular as platforms for parallel processing. Among them, the clusters of standard PCs interconnected with high-speed system area networks (SANs) are especially attractive and have been widely established. At the same time, the developments in interconnection technologies also formed the basis for the rise of Non-Uniform Memory Access (NUMA) architectures, i.e....

Active bibliography (related documents):
2.4:   ARS: an adaptive runtime system for locality optimization - Tao, Schulz, Karl (2003)
2.2:   Interactive Locality Optimization on NUMA Architectures - Mu, Tao, Schulz, McKee (2003)
1.9:   A Framework for Monitoring Shared Memory Applications - Jie Tao And

Similar documents based on text:
0.4:   Methodology, Tools & Case Studies for Ontology based Knowledge.. - Sure (2003)
0.4:   Social Relationship Management in Internet-based Communication and .. - Galla (2004)
0.3:   Implementation and Evaluation of Methods for Solving.. - Bemmerl, Kremenek..

BibTeX entry:

@misc{ tao-data,
  author = "Jie Tao",
  title = "Data Locality Optimization of Shared Memory Programs on NUMA Architectures
    Using an Integrated Tool Environment",
  url = "citeseer.ist.psu.edu/617485.html" }
Citations (may not include all citations):
912   MPI: A Message-Passing Interface Standard - Interface, MPIF - 1995
478   The Stanford DASH Multiprocessor - Lenoski, Laudon et al. - 1992
353   The SPLASH-2 Programs: Characterization and Methodological C.. - Woo, Ohara et al. - 1995
326   TreadMarks: Shared Memory Computing on Networks of Workstati.. - Amza, Cox et al. - 1995
267   Multi-Level Adaptive Solutions to Boundary-Value Problems - Brandt - 1977
262   Visualizing the Performance of Parallel Programs - Heath, Etheridge - 1991
237   Global Optimizations for Parallelism and Locality on Scalabl.. - Anderson, Lam - 1993
222   The SGI Origin: A ccNUMA Highly Scalable Server - Laudon, Lenoski - 1997
212   The MIT Alewife Machine: Architecture and Performance - Agarwal, Bianchini et al. - 1995
207   CORBA - Fundamentals and Programming - Siegel - 1996
199   The Paradyn Parallel Performance Measurement Tools - Miller, Cargille et al. - 1995
182   A Comparison of Sorting Algorithms for the Connection Machin.. - Blelloch, Leiserson et al. - 1991
180   PVM: Parallel Virtual Machine --- A User's Guide and Tutoria.. - Geist, Beguelin et al. - 1994
178   The Connection Machine CM-5 Technical Summary - Corporation - 1991
166   The Wisconsin Wind Tunnel: Virtual Prototyping of Parallel C.. - Reinhardt, Hill et al. - 1993
156   An Evaluation of Directory Schemes for Cache Coherence - Agarwal, Simoni et al. - 1988
150   PROTEUS: A High-performance Parallel-Architecture Simulator - Brewer, Dellarocas et al. - 1991
126   Scalable Performance Analysis: The Pablo Performance Analysi.. - Reed, Aydt et al. - 1993
110   Portable Programs for Parallel Processors - Boyle, Butler et al. - 1987
107   The DASH Prototype: Implementation and Performance - Lenoski, Laudon et al. - 1992
106   Microprocessor User's Manual - Inc - 1995
94   The DASH Prototype: Logic Overhead and Performance - Lenoski, Laudon et al. - 1993
82   A Low Overhead Coherence Solution for Multiprocessor with Pr.. - Papamarcos, Patel - 1984
79   Intel Architecture Software Developer's Manual for the Penti.. - Corporation - 1998
69   CoCheck: Checkpointing and Process Migration for MPI - Stellner - 1996
66   IEEE Standard for the Scalable Coherent Interface - Society - 1993
66   Implementing a Cache Consistency Protocol - Katz, Eggers et al. - 1985
64   Exploiting Process Lifetime Distribution for Dynamic Load Ba.. - Harchol-Balter, Downey - 1996
61   FFTs in External or Hierarchical Memory - Bailey - 1990
59   IPS-2: The Second Generation of a Parallel Program Measureme.. - Miller, Clark et al. - 1990
57   Lazy Release Consistency for Distributed Shared Memory - Keleher - 1995
55   Software Caching and Computation Migration in Olden - Carlisle, Rogers - 1995
53   An Integrated Compilation and Performance Analysis Environme.. - Adve, Mellor-Crummey et al. - 1995
45   mp Scalable Shared Memory Multiprocessor - Nowatzyk, Aybay et al. - 1994
43   Thread Migration and its Applications in Distributed Shared .. - Itzkovitz, Schuster et al. - 1998
41   VM--Based Shared Memory on Low--Latency - Kontothanassis, Hunt et al. - 1996
40   A Users' Guide to PICL: A Portable Instrumented Communicatio.. - Geist, Heath et al. - 1990
37   Run-Time Monitoring of Real-Time Systems - Chodrow, Jahanian et al. - 1991
37   Tolerating Memory Latency through Software-Controlled Pre-Ex.. - Luk - 2001
37   A Hardware-Driven Profiling Scheme for Identifying Program H.. - Merten, Trick et al. - 1999
35   Virtual Reality and Parallel Systems Performance Analysis - Reed, Shields et al. - 1995
33   Wisconsin Wind Tunnel II: A Fast and Portable Parallel Archi.. - Mukherjee, Reinhardt et al. - 2000
32   Using Complete Machine Simulation to Understand Computer Sys.. - Herrod - 1998
32   Managing Pages in Shared Virtual Memory Systems: Getting the.. - Granston, Wijshoff - 1993
32   PCI-SCI Cluster Adapter Specification - Solutions
32   RSIM: An Execution-Driven Simulator for ILP-Based Shared-Mem.. - Pai, Ranganathan et al. - 1997
31   Cray Research Massively Parallel Processor System CRAY T3D - Cray, Parallel et al. - 1993
31   A Compiler-Assisted Cache Coherence Solution for Multiproces.. - Veidenbaum - 1986
30   MINT Tutorial and User Manual - Veenstra, Fowler - 1993
29   The Augmint Multiprocessor Simulation Toolkit for Intel x86 .. - Nguyen, Michael et al. - 1996
27   Augmint: A Multiprocessor Simulation Environment for Intel x.. - Nguyen, Michael et al. - 1996
27   Efficient Memory Simulation in SimICS - Magnusson, Werner - 1995
27   Runtime and Language Support for Compiling Adaptive Irregula.. - Hwang, Moon et al. - 1995
25   Improving CC-NUMA Performance Using Instruction-Based Predic.. - Kaxiras, Goodman - 1999
25   The Hector Multiprocessor - Vranesic, Stumm et al. - 1991
25   The Dragon Computer System: An Early Overview - McCreight - 1984
23   Monitoring and Debugging Distributed Realtime Programs - Dodd, Ravishankar - 1992
23   SCI: Scalable Coherent Interface: Architecture and Software .. - Hellwagner, Reinefeld - 1999
23   TRAPPER: A Graphical Programming Environment for Industrial .. - Scheidler, Schaefers - 1993
22   Simulation Analysis of Data Sharing in Shared Memory Multiproce.. - Eggers - 1989
22   The Design of a Parallel Graphics Interface - Igehy, Stoll et al. - 1998
21   Message-Driven Relaxed Consistency in a Software Distributed.. - Koch, Fowler et al. - 1994
21   Integrating Performance Monitoring and Communication in Para.. - Martonosi, Ofelt et al. - 1996
21   HeNCE: A Heterogeneous Network Computing Environment - Beguelin, Dongarra et al. - 1993
20   Multiprocessor Cache Design Considerations - Lee, Yew et al. - 1987
20   An Adaptive Approach to Data Placement - Lowenthal, Andrews - 1996
19   Performance Visualization of Parallel Programs -- The PARvis.. - Nagel, Arnold - 1994
19   Run-Time Spatial Locality Detection and Optimization - Johnson, Merten et al. - 1997
19   Performance Prediction of Large Parallel Applications Using .. - Bagrodia, Deelman et al. - 1999
18   System Support for Automated Profiling and Optimization - Zhang, Wang et al. - 1997
17   Memory Forwarding: Enabling Aggressive Layout Optimizations .. - Luk, Mowry - 1999
17   Tuning Memory Performance in Sequential and Parallel Program.. - Martonosi, Gupta et al. - 1995
17   A PCI-SCI Bridge for Building a PC-cluster with Distributed .. - Acher, Hellwagner et al. - 1996
15   Thread Migration and Communication Minimization in DSM Syste.. - Thitikamol, Keleher - 1999
15   Chitra: Visual Analysis of Parallel and Distributed Programs.. - Abrams, Doraswamy et al. - 1992
14   Performance Monitoring in a Myrinet-Connected SHRIMP Cluster - Liao, Martonosi et al. - 1998
14   OS Support for Improving Data Locality on CC-NUMA Compute Se.. - Verghese, Devine et al. - 1996
14   Run-time Visualization of Program Data - Tuchman, Jablonowski et al. - 1991
14   DETOP - An Interactive Debugger for PowerPC Based Multicompu.. - Oberhuber, Wismüller - 1995
14   Supporting Shared Memory and Message Passing on Clusters of .. - Karl, Leberecht et al. - 1999
13   The Impact of Exploiting Instruction-Level Parallelism on Sh.. - Pai, Ranganathan et al. - 1999
13   IPS: An Interactive and Automatic Performance Measurement To.. - Miller, Yang - 1987
13   Visualizing the Memory Access Behavior of Shared Memory Appl.. - Tao, Karl et al. - 2001
12   The SHRIMP Performance Monitor: Design and Applications - Martonosi, Clark et al. - 1996
12   User-Level Dynamic Page Migration for Multiprogrammed Shared.. - Nikolopoulos, Papatheodorou et al. - 2000
11   Automatic Partitioning of Data and Computations on Scalable .. - Tandri, Abdelrahman - 1997
11   Non-intrusive Deep Tracing of SCI Interconnect Traffic - Manzke, Coghlan - 1999
11   Automatic Iteration/Data Distribution Method based on Access Desc.. - Zapata, Iteration et al. - 1999
10   Using Simulation to Understand the Data Layout of Programs - Tao, Karl et al. - 2001
9   Flexible Use of Memory for Replication/Migration in Cache-Coherent DSM M.. - Verghese, Gupta et al. - 1998
9   Improving Fine-Grained Irregular Shared-Memory Benchmarks by .. - Hu, Cox et al.
9   Load Balancing and Data Locality in Adaptive Hierarchical N-.. - Singh, Holt et al. - 1995
8   SCI-VM: A Flexible Base for Transparent Shared Memory Progra.. - Schulz - 1999
8   Dynamic Computation Migration in DSM System - Hsieh, Kaashoek et al. - 1996
8   Convex Exemplar Architecture - Corporation - 1994
8   OCM -- A Monitoring System for Interoperable Tools - Wismüller, Trinitis et al. - 1998
8   The Stanford FLASH Multiprocessor - Heinrich, Ofelt et al. - 1994
8   OMIS-compliant Monitoring System for MPI Applications - Bubak, Funika et al. - 1999
7   Design and Implementation of the NUMAchine Multiprocessor - Grbic, Brown et al. - 1998
7   A Universal Interface for Monitoring Systems - Ludwig, Wismüller et al. - 1997
7   A Tool for Optimizing Programs on Massively Parallel Compute.. - Hansen - 1994
7   Hardware-Based Profiling: An Effective Technique for Profile.. - Conte, Patel et al. - 1996
7   An Approach to Immersive Performance Visualization of Parall.. - DeRose, Pantano - 1999
7   Hybrid-DSM: An Efficient Alternative to Pure Software DSM Sy.. - Karl, Schulz - 2000
7   Optimizing Data Locality for SCI-based PC-clusters with the .. - Karl, Leberecht et al. - 1999
7   Microsoft Platform Software Development Kit - Cooperation - 1997
7   Testing and Debugging Parallel and Distributed Programs with.. - Frey, Oberhuber - 1997
6   True Shared Memory Programming on SCI-based Clusters - Schulz - 1999
6   Efficient Deployment of Shared Memory Models on Clusters of .. - Schulz - 2000
6   Using Remote Access Histories for Thread Scheduling in Distr.. - Schuster, Shalev - 1997
6   Tutorial and Reference Guide - Pase, Class et al. - 1998
6   User's Manual - Heinrich - 1993
6   Monitoring Shared Virtual Memory Performance on a Myrinet-ba.. - Liao, Jiang et al. - 1998
5   Performance Debugging Shared Memory Parallel Programs Using .. - Rajamony, Cox - 1997
5   Impact of CC-NUMA Memory Management Policies on the Applicat.. - Bhuyan, Iyer et al. - 2000
5   Fast Communication Mechanisms -- Coupling Hardware Distribut.. - Hellwagner, Karl et al. - 1997
5   Switch Cache: A Framework for Improving the Remote Memory Ac.. - Iyer, Bhuyan - 1999
5   PCI 9060 PCI Bus Master Interface Chip for Adapters and Embe.. - Incorporation, Avenue et al. - 1995
5   Locality Analysis for Parallel C Programs - Zhu, Hendren - 1999
4   Design and Implementation Aspects for the SMiLE Hardware Mon.. - Hockauf, Jeitner et al. - 2000
4   Considerations on Dynamically Allocated Data Structure Layou.. - Truong - 1998
4   Efficient Coherency and Synchronization Management in SCI-ba.. - Schulz - 2000
4   document number 004--2229--01 edition - Review, OpenMP et al. - 1998
4   A Simulation Study of Snoopy Cache Coherence Protocols - Tomasevic, Milutinovic - 1992
4   Multilayer Online-Monitoring for Hybrid DSM Systems on top o.. - Karl, Schulz et al. - 2000
3   A Tool Environment for Efficient Execution of Shared Memory .. - Tao, Karl - 2001
3   Implementing a directory-based Cache Consistency Protocol - Simoni - 1990
3   Active Threads: Enabling Fine-Grained Parallelism in Object-O.. - Weissman, Gomes et al. - 1998
3   Performance Analysis and Visualization of Parallel Systems U.. - Bosch, Stolte et al. - 2000
3   MICA: A Memory and Interconnect Simulation Environment for C.. - Hsiao, King - 2000
3   SCI Monitoring Hardware and Software: Supporting Performance.. - Karl, Leberecht et al. - 1999
3   Visualization of Parallel Program Execution - Braun, Wismüller - 1995
3   High Performance Fortran - HPCC, HPF - 1999
3   Multi-paradigm Software Infrastructure for SCI-based Cluster.. - Schulz, Tao et al. - 2002
3   Optimizing Explicitly Parallel Programs - Krishnamurthy - 1994
2   Understanding the Behavior of Shared Memory Applications Usi.. - Tao, Karl et al. - 2000
2   On-Line OCM-Based Tool Support for Parallel Applications - Bubak, Funika et al. - 2001
2   Thread Migration with Active Threads - Holtkamp - 1997
2   Design and Analysis of Static Memory Management Policies for.. - Iyer, Wang et al. - 1998
2   Stanford DASH Multiprocessor: the Hardware and Software - Gupta - 1992
2   An Interactive Graphical Modeling Tool for Performance and P.. - Mok, Funka-Lea - 1993
2   Visualization and Performance Prediction of Multithreaded So.. - Broberg, Lundberg et al. - 1999
2   Amendment 2: Threads Extension [C Language] - on, System et al. - 1995
2   Improving Data Locality Using Dynamic Page Migration based o.. - Tao, Schulz et al. - 2002
1   Surfboard -- A Hardware Performance Monitor for SHRIMP - Karlin, Clark et al. - 1999
1   Enforcing Deterministic Execution of Parallel Programs - Deb.. - Karl, Leberecht et al. - 1998
1   Visualization and Control of Gigabit Networks - Parulkar, Schmidt et al. - 1997
1   Performance Visualization in the GRADE Parallel Programming .. - Kacsuk - 2000
1   Distributed Simulation of a Distributed Shared Memory System - Pedersen - 1998
1   Experience with Development of Performance Monitoring Tools .. - Bubak, Funika et al. - 1999
1   Design Choices in the SHRIMP System: An Empirical Study - Blumrich, Alpert et al. - 1998
1   A Performance Monitoring Application for Distributed Interac.. - Cavitt, Overstreet et al. - 1997
1   Litchfield Park - Chase, Amador et al. - 1989
1   Myrinet: A Gigabit-per-Second Local Area Network - Boden, Cohen et al. - 1995
1   Shared Memory Programming on NUMA-based Clusters Using a Gen.. - Schulz - 2001
1   Run-time Monitoring of Concurrent Programs on the Cedar Mult.. - Sharma, Malony et al. - 1990
1   A Runtime Monitoring Framework for the TAU Profiling System - Sheehan, Malony et al. - 1999
1   Interoperable Run-Time Tools for Distributed Systems -- A Ca.. - Wismüller, Ludwig - 2000
1   Interoperability Support in the Distributed Monitoring Syste.. - Wismüller - 1999
1   PROMS: a PRO-active Monitoring System for SS7 Networks - Hwang, Yu - 2000
1   On-Line Monitoring Support in PVM and MPI - Wismüller - 1998
1   Monitoring PVM Programs Using the DAMS Approach - Cunha, Duarte - 1998
1   TUM PCI-SCI Adapter - Karl, TUM et al. - 1999
1   A Low-level Software Infrastructure for the SMiLE Monitoring.. - Tao - 2001
1   A Comparative Evaluation of Cache Coherence Schemes Based on.. - Petersen, Li - 1992
1   Analysis and Optimization for Shared Space Programs - Krishnamurthy, Yelick - 1996
http://galeb.etf.bg.ac.yu/davor/limes/
http://www.support.compaq.com/
http://www.xilinx.com/
http://developer.intel.com/design/
http://www.vcc.com/Hot2sys
http://java.sun.com/docs/books/tutorial/

Documents on the same site (http://wwwbode.cs.tum.edu/~tao/papers/index-A.html):
Using Simulation to Understand the Data Layout of Programs - Tao, Karl, Schulz (2001)
A Tool Environment for Efficient Execution of Shared Memory.. - Tao, Karl (2001)
Memory Access Behavior Analysis of NUMA-based Shared Memory.. - Tao, Karl, Schulz (2001)
