Results 1 - 10 of 161

Token flow control

by Amit Kumar, et al.
"... As companies move towards many-core chips, an efficient on-chip communication fabric to connect these cores assumes critical importance. To address limitations to wire delay scalability and increasing bandwidth demands, state-of-the-art on-chip networks use a modular packet-switched design with route ..."
Cited by 635 (35 self)
synthetic traffic and traces from the SPLASH-2 benchmark suite show reduction in packet latency by up to 77.1% with up to 39.6% reduction in average router energy consumption as compared to a state-of-the-art baseline packet-switched design. For the same saturation throughput as the baseline network, TFC

The SGI Origin: A ccNUMA highly scalable server

by James Laudon, Daniel Lenoski - In Proceedings of the 24th International Symposium on Computer Architecture (ISCA ’97), 1997
"... The SGI Origin 2000 is a cache-coherent non-uniform memory access (ccNUMA) multiprocessor designed and manufactured by Silicon Graphics, Inc. The Origin system was designed from the ground up as a multiprocessor capable of scaling to both small and large processor counts without any bandwidth, laten ..."
Cited by 497 (0 self)
the Origin 2000 and then describes its architecture and implementation. In addition, performance results are presented for the NAS Parallel Benchmarks V2.2 and the SPLASH2 applications. Finally, the Origin system is compared to other contemporary commercial ccNUMA systems.

TreadMarks: Distributed Shared Memory on Standard Workstations and Operating Systems

by Pete Keleher, Alan L. Cox, Sandhya Dwarkadas, Willy Zwaenepoel - In Proceedings of the 1994 Winter USENIX Conference, 1994
"... TreadMarks is a distributed shared memory (DSM) system for standard Unix systems such as SunOS and Ultrix. This paper presents a performance evaluation of TreadMarks running on Ultrix using DECstation-5000/240's that are connected by a 100-Mbps switch-based ATM LAN and a 10-Mbps Ethernet. Ou ..."
Cited by 526 (17 self)
of Water from the SPLASH benchmark suite, we achieved only moderate speedups (4.0) due to the high communication and synchronization rate. Speedups decline on the 10-Mbps Ethernet (5.5 for Jacobi, 6.5 for TSP, 4.2 for Quicksort, 5.1 for ILINK, and 2.1 for Water), reflecting the bandwidth limitations

LogTM: Log-based Transactional Memory

by Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark D. Hill, David A. Wood - in HPCA, 2006
"... Transactional memory (TM) simplifies parallel programming by guaranteeing that transactions appear to execute atomically and in isolation. Implementing these properties includes providing data version management for the simultaneous storage of both new (visible if the transaction commits) and old (r ..."
Cited by 282 (11 self)
detection on evicted blocks and fast commit (using lazy cleanup). Second, LogTM handles aborts in (library) software with little performance penalty. Evaluations running micro- and SPLASH-2 benchmarks on a 32-way multiprocessor support our decision to optimize for commit by showing that only 1-2

PARSEC vs. SPLASH-2: A quantitative comparison of two multithreaded benchmark suites on Chip-Multiprocessors

by Christian Bienia, Sanjeev Kumar, Kai Li - Proceedings of the IEEE International Symposium on Workload Characterization (IISWC ’08), 2008
"... The PARSEC benchmark suite was recently released and has been adopted by a significant number of users within a short amount of time. This new collection of workloads is not yet fully understood by researchers. In this study we compare the SPLASH-2 and PARSEC benchmark suites with each other to gai ..."
Cited by 51 (3 self)
The PARSEC benchmark suite was recently released and has been adopted by a significant number of users within a short amount of time. This new collection of workloads is not yet fully understood by researchers. In this study we compare the SPLASH-2 and PARSEC benchmark suites with each other

Performance Evaluation of Two Home-Based Lazy Release Consistency Protocols for Shared Virtual Memory Systems

by Yuanyuan Zhou, Liviu Iftode, Kai Li - In Proceedings of the Operating Systems Design and Implementation Symposium, 1996
"... This paper investigates the performance of shared virtual memory protocols on large-scale multicomputers. Using experiments on a 64-node Paragon, we show that the traditional Lazy Release Consistency (LRC) protocol does not scale well, because of the large number of messages it requires, the large a ..."
Cited by 160 (20 self)
overlapping to the base LRC protocol, with similar results. Our experiments were done using five of the Splash-2 benchmarks. We report overall execution times, as well as detailed breakdowns of elapsed time, message traffic, and memory use for each of the protocols.

A Communication Characterisation of Splash-2 and Parsec

by Nick Barrow-Williams, Christian Fensch, Simon Moore
"... Recent benchmark suite releases such as Parsec specifically utilise the tightly coupled cores available in chip-multiprocessors to allow the use of newer, high performance, models of parallelisation. However, these techniques introduce additional irregularity and complexity to data sharing and are en ..."
Cited by 12 (0 self)
are presented for the full collection of Splash-2 and Parsec benchmarks. Our results aim to support the design of future communication systems for CMPs, encompassing coherence protocols, network-on-chip and thread mapping.

Neighborhood Prefetching on Multiprocessors Using Instruction History

by David M. Koppelman, 2000
"... A multiprocessor prefetch scheme is described in which a miss is followed by a prefetch of a group of lines, a neighborhood, surrounding the demand-fetched line. The neighborhood is based on the data address and the past behavior of the instruction that missed the cache. A neighborhood for an instru ..."
Cited by 18 (0 self)
access patterns. Neighborhood prefetching was compared to adaptive sequential prefetching using execution-driven simulation. Results show more useful prefetches and lower execution time for neighborhood prefetching for six of eight SPLASH-2 benchmarks. On eight SPLASH-2 benchmarks the average normalized

The Augmint Multiprocessor Simulation Toolkit for Intel x86 Architectures

by Anthony-Trung Nguyen, Maged Michael, Arun Sharma, Josep Torrellas, 1996
"... Most publicly-available simulation tools only simulate RISC architectures. These tools cannot capture the instruction mix and memory reference patterns of CISC architectures. In this paper, we present an overview of Augmint, an execution-driven multiprocessor simulation toolkit that fills this gap b ..."
Cited by 61 (6 self)
by supporting Intel x86 architectures. Augmint also supports trace-driven simulation for uniprocessors as well as multiprocessors, with minor effort on the part of simulator developers. Augmint runs m4-macro-extended C and C++ applications such as those in the SPLASH and SPLASH-2 benchmark suites. Augmint
