Results 1 - 10 of 320,133
Multiscalar Processors
- In Proceedings of the 22nd Annual International Symposium on Computer Architecture
, 1995
"... Multiscalar processors use a new, aggressive implementation paradigm for extracting large quantities of instruction level parallelism from ordinary high level language programs. A single program is divided into a collection of tasks by a combination of software and hardware. The tasks are distribute ..."
Cited by 585 (30 self)
Synchronization of 1-Way Connected Processors
"... We are given a network of n identical processors that work synchronously at discrete steps. At each time step every processor sends messages only to a given subset of its neighbouring processors and receives only from the remaining neighbours. The computation starts with one distinguished proc ..."
Complexity-effective superscalar processors
- In Proceedings of the 24th Annual International Symposium on Computer Architecture
, 1997
"... The performance tradeoff between hardware complexity and clock speed is studied. First, a generic superscalar pipeline is defined. Then the specific areas of register renaming, instruction window wakeup and selection logic, and operand bypassing are analyzed. Each is modeled and Spice simulated f ..."
Cited by 459 (5 self)
The performance tradeoff between hardware complexity and clock speed is studied. First, a generic superscalar pipeline is defined. Then the specific areas of register renaming, instruction window wakeup and selection logic, and operand bypassing are analyzed. Each is modeled and Spice simulated for feature sizes of 0.8µm, 0.35µm, and 0.18µm. Performance results and trends are expressed in terms of issue width and window size. Our analysis indicates that window wakeup and selection logic as well as operand bypass logic are likely to be the most critical in the future. A microarchitecture that simplifies wakeup and selection logic is proposed and discussed. This implementation puts chains of dependent instructions into queues, and issues instructions from multiple queues in parallel. Simulation shows little slowdown as compared with a completely flexible issue window when performance is measured in clock cycles. Furthermore, because only instructions at queue heads need to be awakened and selected, issue logic is simplified and the clock cycle is faster; consequently overall performance is improved. By grouping dependent instructions together, the proposed microarchitecture will help minimize performance degradation due to slow bypasses in future wide-issue machines.
A formal basis for architectural connection
- ACM TRANSACTIONS ON SOFTWARE ENGINEERING AND METHODOLOGY
, 1997
"... ..."
Managing Update Conflicts in Bayou, a Weakly Connected Replicated Storage System
- In Proceedings of the Fifteenth ACM Symposium on Operating Systems Principles
, 1995
"... Bayou is a replicated, weakly consistent storage system designed for a mobile computing environment that includes portable machines with less than ideal network connectivity. To maximize availability, users can read and write any accessible replica. Bayou's design has focused on supporting apph ..."
Cited by 506 (14 self)
A hexagonally connected processor array for Jacobi-type matrix algorithms
, 1990
"... Indexing terms: Parallel algorithms, systolic arrays, matrix computation. Jacobi-type matrix algorithms are mostly implemented on orthogonally connected processor arrays. In this letter, an alternative partitioning is described, resulting in a grid of hexagonally connected processors. This partitio ..."
Cited by 4 (4 self)
The x-Kernel: An Architecture for Implementing Network Protocols
- IEEE Transactions on Software Engineering
, 1991
"... This paper describes a new operating system kernel, called the x-kernel, that provides an explicit architecture for constructing and composing network protocols. Our experience implementing and evaluating several protocols in the x-kernel shows that this architecture is both general enough to acc ..."
Cited by 663 (21 self)
to accommodate a wide range of protocols, yet efficient enough to perform competitively with less structured operating systems. 1 Introduction Network software is at the heart of any distributed system. It manages the communication hardware that connects the processors in the system and it defines
Memory Consistency and Event Ordering in Scalable Shared-Memory Multiprocessors
- In Proceedings of the 17th Annual International Symposium on Computer Architecture
, 1990
"... Scalable shared-memory multiprocessors distribute memory among the processors and use scalable interconnection networks to provide high bandwidth and low latency communication. In addition, memory accesses are cached, buffered, and pipelined to bridge the gap between the slow shared memory and the f ..."
Cited by 735 (18 self)
Simultaneous Multithreading: Maximizing On-Chip Parallelism
, 1995
"... This paper examines simultaneous multithreading, a technique permitting several independent threads to issue instructions to a superscalar’s multiple functional units in a single cycle. We present several models of simultaneous multithreading and compare them with alternative organizations: a wide s ..."
Cited by 802 (48 self)
superscalar, a fine-grain multithreaded processor, and single-chip, multiple-issue multiprocessing architectures. Our results show that both (single-threaded) superscalar and fine-grain multithreaded architectures are limited in their ability to utilize the resources of a wide-issue processor. Simultaneous