Results 1 - 10
of
14,033
Route Packets, Not Wires: On-Chip Interconnection Networks
, 2001
"... Using on-chip interconnection networks in place of ad-hoc global wiring structures the top level wires on a chip and facilitates modular design. With this approach, system modules (processors, memories, peripherals, etc...) communicate by sending packets to one another over the network. The structur ..."
Abstract
-
Cited by 885 (10 self)
- Add to MetaCart
. The structured network wiring gives well-controlled electrical parameters that eliminate timing iterations and enable the use of high-performance circuits to reduce latency and increase bandwidth. The area overhead required to implement an on-chip network is modest, we estimate 6.6%. This paper introduces
Parallel database systems: the future of high performance database systems
- COMMUNICATIONS OF THE ACM
, 1992
"... Parallel database machine architectures have evolved from the use of exotic hardware to a software parallel dataflow architecture based on conventional shared-nothing hardware. These new designs provide impressive speedup and scaleup when processing relational database queries. This paper reviews t ..."
Abstract
-
Cited by 641 (13 self)
- Add to MetaCart
Parallel database machine architectures have evolved from the use of exotic hardware to a software parallel dataflow architecture based on conventional shared-nothing hardware. These new designs provide impressive speedup and scaleup when processing relational database queries. This paper reviews
A high-performance, portable implementation of the MPI message passing interface standard
- Parallel Computing
, 1996
"... MPI (Message Passing Interface) is a specification for a standard library for message passing that was defined by the MPI Forum, a broadly based group of parallel computer vendors, library writers, and applications specialists. Multiple implementations of MPI have been developed. In this paper, we d ..."
Abstract
-
Cited by 890 (65 self)
- Add to MetaCart
describe MPICH, unique among existing implementations in its design goal of combining portability with high performance. We document its portability and performance and describe the architecture by which these features are simultaneously achieved. We also discuss the set of tools that accompany the free
The SGI Origin: A ccNUMA highly scalable server
- In Proceedings of the 24th International Symposium on Computer Architecture (ISCA’97
, 1997
"... The SGI Origin 2000 is a cache-coherent non-uniform memory access (ccNUMA) multiprocessor designed and manufactured by Silicon Graphics, Inc. The Origin system was designed from the ground up as a multiprocessor capable of scaling to both small and large processor counts without any bandwidth, laten ..."
Abstract
-
Cited by 497 (0 self)
- Add to MetaCart
, latency, or cost cliffs.The Origin system consists of up to 512 nodes interconnected by a scalable Craylink network. Each node consists of one or two RlOOOO processors, up to 4 GB of coherent memory, and a connection to a portion of the X I0 10 subsystem. This paper discusses the motivation for building
Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors
- ACM Transactions on Computer Systems
, 1991
"... Busy-wait techniques are heavily used for mutual exclusion and barrier synchronization in shared-memory parallel programs. Unfortunately, typical implementations of busy-waiting tend to produce large amounts of memory and interconnect contention, introducing performance bottlenecks that become marke ..."
Abstract
-
Cited by 573 (32 self)
- Add to MetaCart
Busy-wait techniques are heavily used for mutual exclusion and barrier synchronization in shared-memory parallel programs. Unfortunately, typical implementations of busy-waiting tend to produce large amounts of memory and interconnect contention, introducing performance bottlenecks that become
Vogels, U-Net: a user-level network interface for parallel and distributed computing, in:
- Proceedings of the 15th ACM Symposium on Operating System Principles, ACM,
, 1995
"... Abstract The U-Net communication architecture provides processes with a virtual view of a network device to enable user-level access to high-speed communication devices. The architecture, implemented on standard workstations using off-the-shelf ATM communication hardware, removes the kernel from th ..."
Abstract
-
Cited by 597 (17 self)
- Add to MetaCart
Abstract The U-Net communication architecture provides processes with a virtual view of a network device to enable user-level access to high-speed communication devices. The architecture, implemented on standard workstations using off-the-shelf ATM communication hardware, removes the kernel from
Active Messages: a Mechanism for Integrated Communication and Computation
, 1992
"... The design challenge for large-scale multiprocessors is (1) to minimize communication overhead, (2) allow communication to overlap computation, and (3) coordinate the two without sacrificing processor cost/performance. We show that existing message passing multiprocessors have unnecessarily high com ..."
Abstract
-
Cited by 1054 (75 self)
- Add to MetaCart
The design challenge for large-scale multiprocessors is (1) to minimize communication overhead, (2) allow communication to overlap computation, and (3) coordinate the two without sacrificing processor cost/performance. We show that existing message passing multiprocessors have unnecessarily high
Multiscalar Processors
- In Proceedings of the 22nd Annual International Symposium on Computer Architecture
, 1995
"... Multiscalar processors use a new, aggressive implementation paradigm for extracting large quantities of instruction level parallelism from ordinary high level language programs. A single program is divided into a collection of tasks by a combination of software and hardware. The tasks are distribute ..."
Abstract
-
Cited by 589 (30 self)
- Add to MetaCart
Multiscalar processors use a new, aggressive implementation paradigm for extracting large quantities of instruction level parallelism from ordinary high level language programs. A single program is divided into a collection of tasks by a combination of software and hardware. The tasks
The BSD Packet Filter: A New Architecture for User-level Packet Capture
, 1992
"... Many versions of Unix provide facilities for user-level packet capture, making possible the use of general purpose workstations for network monitoring. Because network monitors run as user-level processes, packets must be copied across the kernel/user-space protection boundary. This copying can be m ..."
Abstract
-
Cited by 568 (2 self)
- Add to MetaCart
filter evaluator that is up to 20 times faster than the original design. BPF also uses a straightforward buffering strategy that makes its overall performance up to 100 times faster than Sun's NIT running on the same hardware. 1 Introduction Unix has become synonymous with high quality networking
The Google File System
- ACM SIGOPS OPERATING SYSTEMS REVIEW
, 2003
"... We have designed and implemented the Google File System, a scalable distributed file system for large distributed data-intensive applications. It provides fault tolerance while running on inexpensive commodity hardware, and it delivers high aggregate performance to a large number of clients. While s ..."
Abstract
-
Cited by 1501 (3 self)
- Add to MetaCart
We have designed and implemented the Google File System, a scalable distributed file system for large distributed data-intensive applications. It provides fault tolerance while running on inexpensive commodity hardware, and it delivers high aggregate performance to a large number of clients. While
Results 1 - 10
of
14,033