• Documents
  • Authors
  • Tables
  • Log in
  • Sign up
  • MetaCart
  • DMCA
  • Donate

CiteSeerX logo

Advanced Search Include Citations

Tools

Sorted by:
Try your query at:
Semantic Scholar Scholar Academic
Google Bing DBLP
Results 1 - 10 of 11,877
Next 10 →

Fast Parallel Algorithms for Short-Range Molecular Dynamics

by Steve Plimpton - JOURNAL OF COMPUTATIONAL PHYSICS , 1995
"... Three parallel algorithms for classical molecular dynamics are presented. The first assigns each processor a fixed subset of atoms; the second assigns each a fixed subset of inter-atomic forces to compute; the third assigns each a fixed spatial region. The algorithms are suitable for molecular dyn ..."
Abstract - Cited by 622 (6 self) - Add to MetaCart
. The algorithms are tested on a standard Lennard-Jones benchmark problem for system sizes ranging from 500 to 100,000,000 atoms on several parallel supercomputers -- the nCUBE 2, Intel iPSC/860 and Paragon, and Cray T3D. Comparing the results to the fastest reported vectorized Cray Y-MP and C90 algorithm shows

Software pipelining: An effective scheduling technique for VLIW machines

by Monica Lam , 1988
"... This paper shows that software pipelining is an effective and viable scheduling technique for VLIW processors. In software pipelining, iterations of a loop in the source program are continuously initiated at constant intervals, before the preceding iterations complete. The advantage of software pipe ..."
Abstract - Cited by 579 (3 self) - Add to MetaCart
number of iterations. Hierarchical reduction comple-ments the software pipelining technique, permitting a consis-tent performance improvement be obtained. The techniques proposed have been validated by an im-plementation of a compiler for Warp, a systolic array consist-ing of 10 VLIW processors

Static Scheduling of Synchronous Data Flow Programs for Digital Signal Processing

by Edward Ashford Lee, David G. Messerschmitt - IEEE TRANSACTIONS ON COMPUTERS , 1987
"... Large grain data flow (LGDF) programming is natural and convenient for describing digital signal processing (DSP) systems, but its runtime overhead is costly in real time or cost-sensitive applications. In some situations, designers are not willing to squander computing resources for the sake of pro ..."
Abstract - Cited by 592 (37 self) - Add to MetaCart
not be done at runtime, but can be done at compile time (statically), so the runtime overhead evaporates. The sample rates can all be different, which is not true of most current data-driven digital signal processing programming methodologies. Synchronous data flow is closely related to computation graphs, a

The nas parallel benchmarks

by D. H. Bailey, E. Barszcz, J. T. Barton, D. S. Browning, R. L. Carter, R. A. Fatoohi, P. O. Frederickson, T. A. Lasinski, H. D. Simon, V. Venkatakrishnan, S. K. Weeratunga - The International Journal of Supercomputer Applications , 1991
"... A new set of benchmarks has been developed for the performance evaluation of highly parallel supercomputers. These benchmarks consist of ve \parallel kernel " benchmarks and three \simulated application" benchmarks. Together they mimic the computation and data movement characterist ..."
Abstract - Cited by 686 (10 self) - Add to MetaCart
A new set of benchmarks has been developed for the performance evaluation of highly parallel supercomputers. These benchmarks consist of ve \parallel kernel " benchmarks and three \simulated application" benchmarks. Together they mimic the computation and data movement characteristics of large scale computational uid dynamics applications. The principal distinguishing feature of these benchmarks is their \pencil and paper " speci cation | all details of these benchmarks are speci ed only algorithmically. In this way many of the di culties associated with conventional benchmarking approaches on highly parallel systems are avoided. 1

A high-performance, portable implementation of the MPI message passing interface standard

by Ewing Lusk, Nathan Doss, Anthony Skjellum - Parallel Computing , 1996
"... MPI (Message Passing Interface) is a specification for a standard library for message passing that was defined by the MPI Forum, a broadly based group of parallel computer vendors, library writers, and applications specialists. Multiple implementations of MPI have been developed. In this paper, we d ..."
Abstract - Cited by 888 (67 self) - Add to MetaCart
MPI (Message Passing Interface) is a specification for a standard library for message passing that was defined by the MPI Forum, a broadly based group of parallel computer vendors, library writers, and applications specialists. Multiple implementations of MPI have been developed. In this paper, we describe MPICH, unique among existing implementations in its design goal of combining portability with high performance. We document its portability and performance and describe the architecture by which these features are simultaneously achieved. We also discuss the set of tools that accompany the free distribution of MPICH, which constitute the beginnings of a portable parallel programming environment. A project of this scope inevitably imparts lessons about parallel computing, the specification being followed, the current hardware and software environment for parallel computing, and project management; we describe those we have learned. Finally, we discuss future developments for MPICH, including those necessary to accommodate extensions to the MPI Standard now being contemplated by the MPI Forum. 1

U-Net: A User-Level Network Interface for Parallel and Distributed Computing

by Thorsten Von Eicken, Anindya Basu, Vineet Buch, Werner Vogels - In Fifteenth ACM Symposium on Operating System Principles , 1995
"... The U-Net communication architecture provides processes with a virtual view of a network interface to enable userlevel access to high-speed communication devices. The architecture, implemented on standard workstations using offthe-shelf ATM communication hardware, removes the kernel from the communi ..."
Abstract - Cited by 596 (17 self) - Add to MetaCart
The U-Net communication architecture provides processes with a virtual view of a network interface to enable userlevel access to high-speed communication devices. The architecture, implemented on standard workstations using offthe-shelf ATM communication hardware, removes the kernel from the communication path, while still providing full protection. The model presented by U-Net allows for the construction of protocols at user level whose performance is only limited by the capabilities of network. The architecture is extremely flexible in the sense that traditional protocols like TCP and UDP, as well as novel abstractions like Active Messages can be implemented efficiently. A U-Net prototype on an 8-node ATM cluster of standard workstations offers 65 microseconds round-trip latency and 15 Mbytes/sec bandwidth. It achieves TCP performance at maximum network bandwidth and demonstrates performance equivalent to Meiko CS-2 and TMC CM-5 supercomputers on a set of Split-C benchmarks. 1

Chebyshev and Fourier Spectral Methods

by John P. Boyd , 1999
"... ..."
Abstract - Cited by 778 (12 self) - Add to MetaCart
Abstract not found

What Every Computer Scientist Should Know About Floating-Point Arithmetic

by David Goldberg , 1991
"... ..."
Abstract - Cited by 483 (0 self) - Add to MetaCart
Abstract not found

The Landscape of Parallel Computing Research: A View from Berkeley

by Krste Asanovic, Ras Bodik, Bryan Christopher Catanzaro, Joseph James Gebis, Parry Husbands, Kurt Keutzer, David A. Patterson, William Lester Plishker, John Shalf, Samuel Webb Williams, Katherine A. Yelick - TECHNICAL REPORT, UC BERKELEY , 2006
"... ..."
Abstract - Cited by 468 (25 self) - Add to MetaCart
Abstract not found

Shared memory consistency models: A tutorial

by Sarita V. Adve, Kourosh Gharachorloo - IEEE Computer , 1996
"... Parallel systems that support the shared memory abstraction are becoming widely accepted in many areas of computing. Writing correct and efficient programs for such systems requires a formal specification of memory semantics, called a memory consistency model. The most intuitive model—sequential con ..."
Abstract - Cited by 435 (11 self) - Add to MetaCart
—sequential consistency—greatly restricts the use of many performance optimizations commonly used by uniprocessor hardware and compiler designers, thereby reducing the benefit of using a multiprocessor. To alleviate this problem, many current multiprocessors support more relaxed consistency models. Unfortunately
Next 10 →
Results 1 - 10 of 11,877
Powered by: Apache Solr
  • About CiteSeerX
  • Submit and Index Documents
  • Privacy Policy
  • Help
  • Data
  • Source
  • Contact Us

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2019 The Pennsylvania State University