29 citations found.
M. Hall, P. Kogge, J. Koller, P. Diniz, J. Chame, J. Draper, J. LaCoss, J. Granacki, A. Srivastava, W. Athas, J. Brockman, V. Freeh, J. Park, and J. Shin. Mapping Irregular Applications to DIVA, A PIM-based Data-Intensive Architecture. In Supercomputing, Portland, OR, November 1999.


This paper is cited in the following contexts:

Performance Evaluation of Two Emerging Media Processors: - Chatterji, Narayanan.. (2003)   (1 citation)  (Correct)

....high performance. Future plans include examining a broader scope of application codes, as well as validating our results on real hardware as it becomes available. We also plan to evaluate more complex data parallel systems such as the Streaming Supercomputer [8] and the DIVA architecture [10]. Our long term goal is to evaluate these technologies as building blocks for future high performance multiprocessor systems. Acknowledgments The authors would like to thank Parry Husbands for his insight into the QRD algorithm, and Arin Fishkin for her artistic contribution. This work was ....

M. Hall, P. Kogge, J. Koller, P. Diniz, J. Chame, J. Draper, J. LaCoss, J. Granacki, A. Srivastava, W. Athas, J. Brockman, V. Freeh, J. Park, and J. Shin. Mapping irregular applications to DIVA, a PIM-based data-intensive architecture. Proc. of SC99, 1999.


Memory-Intensive Benchmarks: IRAM vs. Cache-Based.. - Gaeke, Husbands, Li.. (2002)   (Correct)

....across machines [Figure 9. Power efficiency (MOPS/Watt) for the Transitive, GUPS, SPMV, Hist, and Mesh benchmarks on R10K, P III, P4, Sparc, and EV6.] 10. Related Work. The VIRAM processor is one example of a system that uses mixed logic and DRAM [14,15]. Other examples include the DIVA project [10], the HTMT project [21], and the Mitsubishi M32R processor [17]. The DIVA project is directly addressing the use of this technology in a multiprocessor, so their focus has been on scalability [10]. The HTMT processor uses fine grained multi threading and forward looking hardware technology. The ....

.... example of a system that uses mixed logic and DRAM [14,15]. Other examples include the DIVA project [10], the HTMT project [21], and the Mitsubishi M32R processor [17]. The DIVA project is directly addressing the use of this technology in a multiprocessor, so their focus has been on scalability [10]. The HTMT processor uses fine grained multi threading and forward looking hardware technology. The support for control parallelism in HTMT might be an advantage for the Histogram and Mesh benchmarks, although it requires more complex hardware support. The Imagine processor does not use embedded ....

M. Hall, et al. Mapping irregular applications to DIVA, a PIM-based data-intensive architecture. Proc. SC99, 1999.


Performance Modeling and Interpretive Simulation of PIM.. - Baker, Prasanna (2002)   (Correct)

....architecture specific simulator, requiring several hours to run, is not suitable for iterative development or experiments on novel ideas. We provide a simulator which will allow faster development cycles and a better understanding of how an application will port to other PIM architectures [4, 7]. For more details and further results, see [2]. 1 Supported by the US DARPA Data Intensive Systems Program under contract F33615 99 1 1483 monitored by Wright Patterson Air Force Base and in part by an equipment grant from Intel Corporation. The PIM Simulator is available for download at ....

....Results 3.1 Conjugate Gradient Results Figure 1 shows the overall speedup of the biConjugate Gradient stressmark with respect to the number of active PIM elements. It compares results produced by our tool using a DIVA parameterized architecture to the cycle accurate simulation results in [4]. Time is normalized to a simulator standard. The label of our results, Overlap 0.8, denotes that 80% of the data transfer time is hidden underneath the computation time, via prefetching or other latency hiding techniques. The concept of overlap is discussed later in this paper. BiConjugate ....

M. Hall, P. Kogge, J. Koller, P. Diniz, J. Chame, J. Draper, J. LaCoss, J. Granacki, A. Srivastava, W. Athas, J. Brockman, V. Freeh, J. Park, and J. Shin. Mapping Irregular Applications to DIVA, a PIM-based Data-Intensive Architecture. In SC99.


Gilgamesh: A Multithreaded Processor-In-Memory Architecture.. - Sterling, Zima (2002)   (4 citations)  (Correct)

....expose substantially greater memory bandwidth while imposing significantly lower latency and requiring less power consumption than conventional systems. A number of projects began exploring this technology over the past few years, including IRAM, HTMT, DIVA, FlexRAM, Blue Gene BG C, and Pim Lite [29, 7, 17, 22, 2, 8]. This paper is an early presentation of the hardware and software strategy being developed for a new PIM based high end computer as part of the Gilgamesh Project conducted at the NASA Jet Propulsion Laboratory and the California Institute of Technology. Gilgamesh (billions of Logic Gate ....

....to physical address translation is included as an intrinsic function of every node and works with a distributed directory table as well as local cached address mappings. Thus a full virtual address space shared across all MIND chips is supported. It is noted that the DIVA project at USC ISI [17, 18] is also developing PIM architecture concepts in support of distributed shared memory. The MIND architecture employs an efficient light weight protocol, referred to as parcels, in support of message driven computation among MIND chips and their nodes. A parcel is a variable length communication ....

[Article contains additional citation context not shown here]

M. Hall, J. Koller, P. Diniz, J. Chame, J. Draper, J. LaCoss, J. Granacki, J. Brockman, A. Srivastava, W. Athas, V. Freeh, J. Shin, and J. Park. Mapping Irregular Applications to DIVA, a PIM-Based Data Intensive Architecture. Proceedings SC'99, November 1999.


Automatically Mapping Code in an Intelligent Memory.. - Lee, Solihin, Torrellas (2001)   (2 citations)  (Correct)

....memory on a single chip can potentially deliver high performance by enabling low latency and high bandwidth communication between processor and memory. This type of architecture, which is popularly known as intelligent memory or processor in memory, has been recently proposed for many systems [8, 11, 12, 13, 16, 19, 22, 25]. Some proposals use this architecture for the main processing unit in the system. Examples of such systems are IRAM [13], Shamrock [12], Raw [25], and Smart Memories [16], among others. Other proposals, instead, use this architecture for the memory system, replacing plain memory chips. In this ....

....instead, use this architecture for the memory system, replacing plain memory chips. In this case, intelligent memory chips act as co processors in memory that execute code when signaled by the host (main) processor. Examples of proposed systems using this approach are Active Pages [19], DIVA [8], and FlexRAM [11]. In this second class of systems, we have a heterogeneous mix of processors: host and memory processors. A host processor is more powerful, is backed up by a deep cache hierarchy, and has a higher memory latency. A memory processor is typically less powerful, has a lower memory ....

[Article contains additional citation context not shown here]

M. Hall, P. Kogge, J. Koller, P. Diniz, J. Chame, J. Draper, J. LaCoss, J. Granacki, J. Brockman, A. Srivastava, W. Athas, V. Freeh, J. Shin, and J. Park. Mapping Irregular Applications to DIVA, a PIM-Based Data-Intensive Architecture. In Supercomputing


Memory-Side Prefetching for Linked Data Structures - Hughes, Adve (2001)   (5 citations)  (Correct)

....necessitating best case conditions for each. The best case for memory side prefetching would integrate the prefetch engine with memory and address translation hardware. There have been significant recent advances in processor in memory (PIM) systems, which integrate the processor with memory [12, 15, 20, 22, 24, 37]. For the large class of applications where a single PIM chip does not provide sufficient memory, systems based on multiple PIM chips have been proposed (e.g., IBM's Blue Gene [15] and Execube [22]). Wallach predicts high performance systems of 2009 will be built solely from multiple PIM chips ....

.... Since the focus of this work is not to suggest an optimal hardware organization for a PIM, for simplicity, we use a common model of a multiprocessor node, but place all of the components on the same chip (Figure 2(b)). Other, more aggressive PIM architectures have been proposed, including [10, 11, 12, 20, 24]. These systems exploit data parallelism to take advantage of the high bandwidth available on PIMs. However, they do little to further decrease memory latencies; therefore, they are unlikely to enable faster LDS traversals than our base architecture. The novel feature of our memory side ....

[Article contains additional citation context not shown here]

M. Hall et al. Mapping Irregular Applications to DIVA, a PIM-based Data-Intensive Architecture. In Proc. of Supercomputing'99, 1999.


Cluster Performance and the Implications for.. - Lee, DeMatteis.. (2000)   (1 citation)  (Correct)

....hardware designers have had to use increasingly larger caches and to employ numerous techniques to overlap operations to hide latency, such as speculative execution, prefetching, and hardware multithreading. This has also motivated the research in Processing In Memory (PIM) architectures [9] [Figure 5. Globus testbed bandwidth and latency distributions: count vs. bandwidth (Mbits/s) and count vs. latency (ms), 100 log-sized bins each.] where much ....

M. Hall et al. Mapping irregular applications to DIVA, a PIM-based data-intensive architecture. Supercomputing `99, 1999.


Automatic Code Mapping on an Intelligent Memory Architecture - Solihin, Lee, Torrellas (2001)   (Correct)

....on a single chip can potentially deliver high performance by enabling low latency and high bandwidth communication between processor and memory. This type of architecture, which is popularly known as intelligent memory or processing in memory, has been recently proposed in several research systems [8, 12, 13, 14, 15, 18, 23, 26, 28] and used in a few commercial parts [20, 22]. Some proposals use this architecture for the main processing unit in the system. Examples of such systems are EXECUBE [13] and successors [14], IRAM [15], Raw [28], and Smart Memories [18], among others. Other proposals, instead, use this architecture ....

....for the memory system, to replace plain memory chips. In this case, intelligent memory chips act as co processors in memory that execute code when signaled by the host (main) processor. Examples of proposed systems that follow this approach are Active Pages [23], FlexRAM [12], and DIVA [8]. In this second class of systems, we have a heterogeneous mix of processors: host and memory processors. A host processor is a state of the art high end processor. It is backed up by a deep cache hierarchy and suffers a high latency to access memory. A memory processor is typically less ....

[Article contains additional citation context not shown here]

M. Hall, P. Kogge, J. Koller, P. Diniz, J. Chame, J. Draper, J. LaCoss, J. Granacki, J. Brockman, A. Srivastava, W. Athas, V. Freeh, J. Shin, and J. Park. Mapping Irregular Applications to DIVA, a PIM-Based Data-Intensive Architecture. In Supercomputing


Macroservers: An Execution Model for DRAM Processor-In-Memory.. - Zima, al. (2000)   (Correct)

....in Memory or PIM technology and architecture has emerged as one of the most important domains of parallel computer architecture research and development. It is being pursued as a means of accelerating conventional systems for array processing [50] and for manipulating irregular data structures [25]. It is being considered as a basis for scalable spaceborne computing [54], as smart memory to manage systems resources in a hybrid technology multithreaded architecture for ultra scale computing [55], and most recently as the means for achieving Petaflops performance [33]. PIM exploits recent ....

M. Hall, J. Koller, P. Diniz, J. Chame, J. Draper, J. LaCoss, J. Granacki, J. Brockman, A. Srivastava, W. Athas, V. Freeh, J. Shin, and J. Park. Mapping Irregular Applications to DIVA, a PIM-Based Data Intensive Architecture. Proceedings SC'99, November 1999.


Providing Hardware DSM Performance at Software DSM Cost - Heinrich, Speight (2000)   (Correct)

....have improved memory bandwidth, but this does nothing to address memory latency or reduce the number of cache misses incurred by the processor. One approach to reducing the gap between processor and memory performance is to move processing into the memory system by using active memories [5,11,12,30,34,35,37]. Schemes vary, but either parts of a program that have poor cache behavior are executed in the memory system, thereby reducing cache misses and memory bandwidth requirements, or address remapping techniques are used to re structure data (like linked lists or non unit stride accesses) so that the ....

M. Hall et al. Mapping Irregular Applications to DIVA, A PIM-based Data-Intensive Architecture. Supercomputing, Portland, OR, Nov. 1999.


Implications of a PIM Architectural Model for MPI - Rodrigues, Murphy, Kogge.. (2003)   Self-citation (Hall Kogge Brockman)   (Correct)

No context found.

M. Hall, P. Kogge, J. Koller, P. Diniz, J. Chame, J. Draper, J. LaCoss, J. Granacki, A. Srivastava, W. Athas, J. Brockman, V. Freeh, J. Park, and J. Shin. Mapping Irregular Applications to DIVA, A PIM-based Data-Intensive Architecture. In Supercomputing, Portland, OR, November 1999.


Characterizing a New Class of Threads in Scientific .. - Rodrigues, Murphy, ..   Self-citation (Hall Kogge)   (Correct)

No context found.

M. Hall, P. Kogge, J. Koller, P. Diniz, J. Chame, J. Draper, J. LaCoss, J. Granacki, A. Srivastava, W. Athas, J. Brockman, V. Freeh, J. Park, and J. Shin. Mapping Irregular Applications to DIVA, A PIM-based Data-Intensive Architecture. In Supercomputing, Portland, OR, November 1999.


Evaluating Compiler Technology for Control-Flow.. - Shin, Hall, Chame   Self-citation (Hall Chame Shin)   (Correct)

No context found.

Mary Hall, Peter Kogge, Jeff Koller, Pedro Diniz, Jacqueline Chame, Jeff Draper, Jeff LaCoss, John Granacki, Apoorv Srivastava, William Athas, Jay Brockman, Vincent Freeh, Joonseok Park, and Jaewook Shin. Mapping irregular applications to DIVA, a PIM-based data-intensive architecture. In ACM International Conference on Supercomputing, November 1999.


Implications of a PIM Architectural Model for MPI - Arun Rodrigues Richard (2003)   Self-citation (Hall Kogge Brockman)   (Correct)

No context found.

M. Hall, P. Kogge, J. Koller, P. Diniz, J. Chame, J. Draper, J. LaCoss, J. Granacki, A. Srivastava, W. Athas, J. Brockman, V. Freeh, J. Park, and J. Shin. Mapping Irregular Applications to DIVA, A PIM-based Data-Intensive Architecture. In Supercomputing, Portland, OR, November 1999.


Trading Bandwidth for Latency: Managing Continuations through.. - Murphy, Kogge (2002)   Self-citation (Kogge)   (Correct)

No context found.

Mary Hall, Peter Kogge, Jeff Koller, Pedro Diniz, Jacqueline Chame, Jeff Draper, Jeff LaCoss, John Granacki, Apoorv Srivastava, William Athas, Jay Brockman, Vincent Freeh, Joonseok Park, and Jaewook Shin. Mapping Irregular Applications to DIVA, A PIM-based Data-Intensive Architecture. In Supercomputing, Portland, OR, November 1999.


Compiler-Controlled Caching in Superword Register Files for.. - Shin, Chame, Hall (2002)   (2 citations)  Self-citation (Hall Chame Shin)   (Correct)

No context found.

M. Hall, P. Kogge, J. Koller, P. Diniz, J. Chame, J. Draper, J. LaCoss, J. Granacki, A. Srivastava, W. Athas, J. Brockman, V. Freeh, J. Park, and J. Shin. Mapping irregular applications to DIVA, a PIM-based data-intensive architecture. In ACM International Conference on Supercomputing, November 1999.


The Architecture of the DIVA Processing-in-Memory Chip - Draper, Chame, Hall.. (2002)   (3 citations)  Self-citation (Hall)   (Correct)

....features are suitable for conventional processors and embedded systems on a chip, such as the design of the WideWord unit. In two previous papers, we presented the DIVA system architecture, memory model and simulated performance improvements due to coarse grain parallelism in PIMs for 3 programs [9], and we described system software requirements and memory management functionality [10]. This paper focuses on the DIVA PIM device and makes the following unique contributions. It is the first detailed description of the DIVA PIM microarchitecture. It pinpoints some of the design issues ....

....require a more complex interconnection network and are the topic of future research. Parcels, application code, and data contain virtual addresses. To translate these addresses without the overhead of maintaining conventional page tables at each node, we classify DIVA memory according to usage [9]: 1) global memory visible to the host and PIM nodes; 2) dumb memory allocated as conventional pages in a host application's virtual space and untouched by PIM node processing; and, 3) local memory used exclusively by PIM node routines. To condense translation information, rather than page ....

M. Hall et al. Mapping irregular applications to DIVA, a PIM-based Data-Intensive Architecture. In Proceedings of Supercomputing, Nov. 1999.


Evaluation of OpenMP for the Cyclops Multithreaded.. - Almasi, Ayguade.. (2003)   (2 citations)  (Correct)

No context found.

M. W. Hall, P. Kogge, J. Koller, P. Diniz, J. Chame, J. Draper, J. LaCoss, J. Brockman, W. Athas, A. Srivastava, V. Freeh, J. Shin, and J. Park. Mapping irregular applications to DIVA, a PIM-based data-intensive architecture. In Proceedings of SC99, November 1999.


Active Memory Clusters: Efficient Multiprocessing on.. - Heinrich, Speight.. (2002)   (1 citation)  (Correct)

No context found.

Hall, M., et al.: Mapping Irregular Applications to DIVA, A PIM-based Data-Intensive Architecture. Supercomputing, Portland, OR, Nov. 1999.


Dissecting Cyclops: A Detailed Analysis of a.. - Almasi, Cascaval, .. (2002)   (1 citation)  (Correct)

No context found.

M. W. Hall, P. Kogge, J. Koller, P. Diniz, J. Chame, J. Draper, J. LaCoss, J. Brockman, W. Athas, A. Srivastava, V. Freeh, J. Shin, and J. Park. Mapping irregular applications to DIVA, a PIM-based data-intensive architecture. In Proceedings of SC99, November 1999.


Cache Coherence Protocol Design for Active Memory Systems - Chaudhuri, Kim, Heinrich   (Correct)

No context found.

M. Hall et al. Mapping Irregular Applications to DIVA, A PIM-based Data-Intensive Architecture. Supercomputing, Portland, OR, Nov. 1999.


The Gilgamesh Processor-in-Memory Architecture and Its.. - Hans Zima And (2001)   (1 citation)  (Correct)

No context found.

M. Hall, J. Koller, P. Diniz, J. Chame, J. Draper, J. LaCoss, J. Granacki, J. Brockman, A. Srivastava, W. Athas, V. Freeh, J. Shin, and J. Park. Mapping Irregular Applications to DIVA, a PIM-Based Data Intensive Architecture. Proceedings SC'99, November 1999.


Efficient Remapping Mechanisms for an Adaptable Memory System - Zhang (2002)   (Correct)

No context found.

M. W. Hall, P. Kogge, J. Koller, P. Diniz, J. Chame, J. Draper, J. LaCoss, J. Brockman, W. Athas, A. Srivastava, V. Freeh, J. Shin, and J. Park. Mapping irregular applications to DIVA, a PIM-based data-intensive architecture. In Proceedings of 1999.


Evaluation of a Multithreaded Architecture for Cellular.. - Calin Cascaval Jose (2002)   (1 citation)  (Correct)

No context found.

M. W. Hall, P. Kogge, J. Koller, P. Diniz, J. Chame, J. Draper, J. LaCoss, J. Brockman, W. Athas, A. Srivastava, V. Freeh, J. Shin, and J. Park. Mapping irregular applications to DIVA, a PIM-based data-intensive architecture. In Proceedings of SC99, November 1999.


Programming the FlexRAM Parallel Intelligent Memory.. - Fraguela, Renau.. (2003)   (1 citation)  (Correct)

No context found.

M. Hall, P. Kogge, J. Koller, P. Diniz, J. Chame, J. Draper, J. LaCoss, J. Granacki, J. Brockman, A. Srivastava, W. Athas, V. Freeh, J. Shin, and J. Park. Mapping Irregular Applications to DIVA, a PIM-Based Data-Intensive Architecture. In Supercomputing, November 1999.



CiteSeer.IST - Copyright Penn State and NEC