Convex Press, Richardson, Texas, U.S.A. CONVEX Architecture Reference Manual (C Series), sixth edition, April 1992.


This paper is cited in the following contexts:
Decoupled Vector Architectures: a first look - Espasa, Valero (1995)

....benefits of decoupling. Finally, section 8 will present our conclusions and future work. 2 The Reference Architecture In order to compare the decoupled vector architecture to a standard, non decoupled vector architecture, we have taken as our reference model the Convex C34 series of processors [13]. We have designed a vector architecture, that we will refer to as the Reference Vector Architecture that is a close model of the C34 architecture, albeit the low level details of the particular implementation of the C34 have been overlooked. The C34 architecture was chosen mainly because we had a ....

Convex Press, Richardson, Texas, U.S.A. CONVEX Architecture Reference Manual (C Series), sixth edition, April 1992.


Registers Size Influence on Vector Architectures - Villa, Espasa, Valero

....vector architecture used as a baseline. Second, we will introduce the out of order vector architecture used. Finally, we will describe the tools used to generate traces and for simulating each architecture. 4.1 The Baseline Architecture We have used a machine loosely based on a Convex C3400 [8], as a baseline vector architecture. Even though this machine is a multiprocessor architecture, our work assumes a uniprocessor vector machine. Figure 3 shows a basic description of a C3400. • Scalar Unit The scalar unit executes all instructions that involve scalar registers (A and S ....

Convex Press, Richardson, Texas, U.S.A. CONVEX Architecture Reference Manual (C Series), sixth edition, April 1992.


Exploiting a New Level of DLP in Multimedia Applications. - Corbal, Valero, Espasa (1999)   (11 citations)

....the Burroughs Scientific Processor (BSP) [23] were designed for the high-end supercomputer market. The BSP compiler was able to generate matrix SIMD instructions from up to two Fortran nested loops with up to 6 different vector references. Even conventional vector processors such as the CONVEX C3 [24] included sub word level operations which allowed to perform two 32 bit parallel operations over the same 64 bit vector element. Nowadays, there is a resurgence of the idea of matrix ISAs for the multimedia domain. Some examples are current projects under development such as the Matrisc processor ....

CONVEX Architecture Reference Manual (C Series). Convex Press, Richardson, Texas, U.S.A, 1992.


MOM: a Matrix SIMD Instruction Set Architecture for.. - Corbal, Espasa, Valero

....diminishing returns. Since the current multimedia extensions are a particular case of a vector architecture, could traditional vector ISAs help us in breaking this parallelism limit Traditional vector architectures have been used for many years for high performance numerical applications [9, 10, 11, 12] and, more recently, have also been used in some neural and DSP applications [13] The current multimedia extensions are nothing more than a somewhat limited ISA vector extension with a fixed vector length (limited by the size of the data types) and a fixed vector stride (always consecutive memory ....

CONVEX Architecture Reference Manual (C Series). Convex Press, Richardson, Texas, U.S.A, 1992.


Out-of-Order Vector Architectures - Espasa, Valero, Smith (1997)   (10 citations)

.... vector machines only became commercially successful with the addition of vector registers in the Cray 1 [24] Following the Cray 1, a number of vector machines have been designed and sold, from supercomputers with very high vector bandwidths [19, 15] to more modest mini supercomputers [3, 22]. More recently, the value of vector architectures for desktop applications is being recognized. In particular, many DSP and multimedia applications graphics, compression, encryption are very well suited for vector implementation [18, 1] In a multimedia application, vector instructions are ....

....we propose is modeled after a Convex C3400. In this section we describe the base C3400 architecture and implementation (henceforth, the reference architecture) and the dynamic out of order vector architecture (referred to as OOOVA) 2. 1 The C3400 Reference Architecture The Convex C3400 [3] consists of a scalar unit and an independent vector unit. The scalar unit executes all instructions that involve scalar registers (A and S registers) and issues a maximum of one instruction per cycle. The vector unit consists of two computation units (FU1 and FU2) and one memory accessing unit ....

[Article contains additional citation context not shown here]

Convex Press, Richardson, Texas, U.S.A. CONVEX Architecture Reference Manual (C Series), sixth edition, April 1992.


Instruction level characterization of the Perfect Club.. - Espasa, Valero (1995)

....is organized in banks. It has 4 banks which hold 2 vector registers each. The vector registers hold 128 elements of 64 bits. Each bank has 2 read ports and 1 write port. The machine implements fully flexible chaining [18] except for loads, which can not be chained to a computation instruction. See [19] for further details. We have compiled all the Perfect Club benchmarks using the fortran compiler (Convex FC version V8.0) with optimization level O2 (which implies vectorization) Then, the output of these compilations are fed into Dixie which produces 1) a modified executable file with ....

Convex Press, Richardson, Texas, U.S.A. CONVEX Architecture Reference Manual (C Series), sixth edition, April 1992.


Command Vector Memory Systems: High Performance at Low Cost - Corbal, Espasa, Valero (1998)   (8 citations)

....3 Experimental setup In this section we present the tools and benchmarks used to (1) evaluate our proposed command based memory system, and (2) compare it to more traditional vector memory designs. 3. 1 The Vector Processor We use as our vector cpu an out of order version of a Convex C3400 [7]. This out of order design was introduced in [3] and uses register renaming in a similar fashion as a R10000 processor [8] to achieve out of order execution of all types of vector and scalar instructions (see figure 3) Instructions flow in order through the Fetch and Decode Rename stages and ....

Convex Press, Richardson, Texas, U.S.A. CONVEX Architecture Reference Manual (C Series), sixth edition, April 1992.


A Victim Cache for Vector Registers - Espasa, Valero (1997)   (1 citation)

....call return sequences might involve saving and restoring of vector registers. In the context of traditional vector architectures, it is typical to find only a few vector registers available to the programmer. The successive Cray machines [12, 6] and mini supercomputers such as Convex C3 Series [2] and Alliant [11] only have 8 logical vector registers. In this type of machines spill code can be very detrimental to performance due to two main reasons: first, when a vector register is spilled it involves a very large amount of data movement and, second, the distance, in terms of processor ....

....298 108 74.7 21 Table 1: Basic operation counts for the Perfect Club and Specfp92 programs (Columns 3 5 are in millions) 2 Spill Code in Vector Programs 2. 1 Benchmarks and Platform The platform chosen for this study is a Convex C3400 having 8 vector registers, each holding 128 64 bit elements [2]. The Convex C3400 consists of a scalar unit and an independent vector unit. The scalar unit issues a maximum of one instruction per cycle. The vector unit consists of two computation units and one memory accessing unit. All units are fully pipelined. The eight vector registers are connected to ....

Convex Press, Richardson, Texas, U.S.A. CONVEX Architecture Reference Manual (C Series), sixth edition, April 1992.


Effective Usage of Vector Registers in Advanced Vector.. - Villa, Espasa, Valero (1997)

....This work was supported by the Ministry of Education of Spain under contract 0429 95, by the Instituto de Cooperacion Iberoamericana (ICI) and by the CEPBA. vector machines have been designed and sold, from supercomputers with very high vector bandwidths [4, 5] to more modest mini supercomputers [6, 7]. The traditional approach to vector processor design has been to use an in order execution engine and achieve high performance exploiting the natural datalevel parallelism embedded in each vector instruction. Typically, traditional vector architectures have used very limited forms of ILP ....

....In order to investigate the effects of reducing the hardware vector register length we need a set of benchmarks compiled assuming different vector lengths. Unfortunately, no public domain vectorizing compiler is available and, therefore, we are forced to artificially fool the Convex compiler [6] to generate code as if the vector length was 16, 32 or 64 (instead of the real 128) To obtain the desired binaries we modified the source benchmarks as follows. Using the vectorization information produced by the Convex compiler, we located in the source code each vectorized loop. For each ....

[Article contains additional citation context not shown here]

Convex Press, Richardson, Texas, U.S.A. CONVEX Architecture Reference Manual (C Series), sixth edition, April 1992.


Effective Usage of Vector Registers in Advanced Vector.. - Villa, Espasa, Valero (1997)

....Nacional de Ciencia y Tecnologia (CONACYT) y This work was supported by the Ministry of Education of Spain under contract 0429 95, and by the CEPBA. vector machines have been designed and sold, from supercomputers with very high vector bandwidths [4, 5] to more modest mini supercomputers [6, 7]. The traditional approach to vector processor design has been to use an in order execution engine and achieve high performance exploiting the natural datalevel parallelism embedded in each vector instruction. Typically, traditional vector architectures have used very limited forms of ILP ....

....In order to investigate the effects of reducing the hardware vector register length we need a set of benchmarks compiled assuming different vector lengths. Unfortunately, no public domain vectorizing compiler is available and, therefore, we are forced to artificially fool the Convex compiler [6] to generate code as if the vector length was 16, 32 or 64 (instead of the real 128) To obtain the desired binaries we modified the source benchmarks as follows. Using the vectorization information produced by the Convex compiler, we located in the source code each vectorized loop. For each ....

[Article contains additional citation context not shown here]

Convex Press, Richardson, Texas, U.S.A. CONVEX Architecture Reference Manual (C Series), sixth edition, April 1992.


Advanced Vector Architectures - Espasa (1997)

.... machines only became commercially successful with the addition of vector registers in the Cray 1 [Rus78] Following the Cray 1, a number of vector machines have been designed and sold, from supercomputers with very high vector bandwidths [Oed92, KIS 94] to more modest mini supercomputers [Con92, PM86] The peak performance of vector supercomputers has been constantly improving from the original 160 Mflops per processor of the Cray 1 up to 5.5 Gflops per processor of the more recent NEC SX 3 [IW91] This improvement has been achieved both through better cycle times (from 12.5ns in the ....

....sequences, typically to intrinsic math routines, might involve saving and restoring of vector registers. Traditional vector machines have had a relatively small number of vector registers (8 is typical) The successive Cray machines [Rus78, Fat89] and mini supercomputers such as Convex C3 Series [Con92] and Alliant [PM86] only have 8 logical vector registers. In this type of machines spill code can be very detrimental to performance due to two main reasons: first, when a vector register is spilled it involves a very large amount of data movement and, second, the distance, in terms of processor ....

[Article contains additional citation context not shown here]

Convex Press, Richardson, Texas, U.S.A. CONVEX Architecture Reference Manual (C Series), sixth edition, April 1992.


Effective Usage of Vector Registers in Decoupled Vector.. - Villa, Espasa, Valero (1997)

....In order to investigate the effects of reducing the hardware vector register length we need a set of benchmarks compiled assuming different vector lengths. Unfortunately, no public domain vectorizing compiler is available and, therefore, we are forced to artificially fool the Convex compiler [5] to generate code as if the vector length was 16, 32 or 64 (instead of the real 128) To obtain the desired binaries we modified the source benchmarks as follows. Using the vectorization information produced by the Convex compiler, we located in the source code each vectorized loop. For each ....

.... loop performing steps of length VLZ and modifying the original vectorized loop to do at most VLZ iterations (see figure 1). To prevent the compiler from generating a doubly strip mined loop (our strip mining plus the natural strip mining introduced by the compiler) we used the MAXTRIPS directive [5]. This directive informed the compiler that the inner loop was performing fewer than 128 trips and thus no extra strip mining was generated. Using such a procedure we strip mined most (but not all) vectorized loops present in our ten benchmarks. Loops that escaped from this strip mining were vector ....

Convex Press, Richardson, Texas, U.S.A. CONVEX Architecture Reference Manual (C Series), sixth edition, April 1992.
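
The manual strip mining described in the two excerpts above (an outer loop stepping by VLZ, an inner loop doing at most VLZ iterations) can be sketched roughly as follows. This is a minimal illustration in Python, not the authors' Fortran transformation; the function names `strip_mine` and `vector_add` are hypothetical, and the MAXTRIPS directive has no counterpart here.

```python
def strip_mine(n, vlz):
    """Yield (start, length) strips covering range(n), each at most vlz long.

    Hypothetical helper: vlz plays the role of the emulated vector
    length (16, 32 or 64 in the excerpt, instead of the real 128).
    """
    if vlz <= 0:
        raise ValueError("vlz must be positive")
    for start in range(0, n, vlz):
        # The last strip may be shorter than vlz.
        yield start, min(vlz, n - start)


def vector_add(a, b, vlz=16):
    """Element-wise add written as an explicitly strip-mined loop."""
    out = [0.0] * len(a)
    for start, length in strip_mine(len(a), vlz):
        # Inner loop: never more than vlz iterations, so a vectorizing
        # compiler told about this bound would not need to strip mine again.
        for i in range(start, start + length):
            out[i] = a[i] + b[i]
    return out
```

For example, a 40-iteration loop with VLZ = 16 is split into strips of 16, 16 and 8 iterations.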


A Simulation Study of Decoupled Vector Architectures - Espasa, Valero

....important to point out that we used the output of the Convex compilers to evaluate our decoupled architecture. This means that the proposal studied in this paper is able to execute in a fully transparent manner an already existing instruction set. 2.1. The Reference Architecture The Convex C3400 [7] consists of a scalar unit and an independent vector unit (see fig. 1) The scalar unit executes all instructions that involve scalar registers (A and S registers) and issues a maximum of one instruction per cycle. The vector unit consists of two computation units (FU1 and FU2) and one memory ....

....Environment To assess the performance benefits of decoupled vector architectures we have taken a trace driven approach. The Perfect Club and Specfp92 programs have been chosen as our benchmarks [3]. The tracing procedure is as follows: the Perfect Club programs are compiled on a Convex C3480 [7] machine using the Fortran compiler (version 8.0) at optimization level O2 (which implies scalar optimizations plus vectorization). Then the executables are processed using Dixie [9], a tool that decomposes executables into basic blocks and then instruments the basic blocks to produce four types ....

Convex Press, Richardson, Texas, U.S.A. CONVEX Architecture Reference Manual (C Series), sixth edition, April 1992.


Decoupled Vector Architectures - Espasa, Valero (1996)   (9 citations)

....time and also reduce the total memory traffic. 2 Experimental Framework To assess the performance benefits of decoupled vector architectures we have taken a trace driven approach. The Perfect Club programs have been chosen as our benchmarks [8]. These programs are compiled on a Convex C3480 [3] machine and using Dixie [4] a detailed trace that describes its full execution is produced. The tracing procedure is as follows: the Perfect Club programs are compiled on a Convex C34 machine using the Fortran compiler (version 8.0) at optimization level O2 (which implies scalar optimizations ....

Convex Press, Richardson, Texas, U.S.A. CONVEX Architecture Reference Manual (C Series), sixth edition, April 1992.


Effective Usage of Vector Registers in Decoupled Vector.. - Villa, Espasa, Valero (1998)

.... [1, 2] but vector machines only became commercially successful with the addition of vector registers in the Cray 1 [3] Following the Cray 1, a number of vector machines have been designed and sold, from supercomputers with very high vector bandwidths [4, 5] to more modest mini supercomputers [6, 7]. The traditional approach to vector processor design has been to use an in order execution engine and achieve high performance exploiting the natural datalevel parallelism embedded in each vector instruction. Typically, traditional vector architectures have used very limited forms of ILP ....

....In order to investigate the effects of reducing the hardware vector register length we need a set of benchmarks compiled assuming different vector lengths. Unfortunately, no public domain vectorizing compiler is available and, therefore, we are forced to artificially fool the Convex compiler [6] to generate code as if the vector length was 16, 32 or 64 (instead of the real 128) To obtain the desired binaries we modified the source benchmarks as follows. Using the vectorization information produced by the Convex compiler, we located in the source code each vectorized loop. For each ....

[Article contains additional citation context not shown here]

Convex Press, Richardson, Texas, U.S.A. CONVEX Architecture Reference Manual (C Series), sixth edition, April 1992.


Quantitative Analysis of Vector Code - Roger Espasa (1995)   (1 citation)

....register register vector machine. Vector computations can only be performed on data that is in the vector registers and access to memory is through load store instructions. The compiler used in all cases is the Convex FC version V8.0 with optimization level O2 (which implies vectorization) See [Con92] for more details. The vector cpu consists of two functional units. The first one handles all vector operations except multiplication, division and square root. The second one handles all vector operations. Each functional unit has access to 8 vector registers. The vector registers are set up in ....

Convex Press, Richardson, Texas, U.S.A. CONVEX Architecture Reference Manual (C Series), sixth edition, April 1992.


A case for merging the ILP and DLP paradigms - Quintana, Espasa, Valero (1998)

....the notion of parallel operations. 3 Methodology This study will compare the relative merits of the ILP and ILP DLP models using both trace driven simulation and data gathered from hardware counters during real executions. We use instruction and memory traces from a Convex C3400 vector machine [7] and from a Mips R10000 microprocessor [37]. Traces on the Convex machine were gathered using the Dixie tool [14] while the R10000 measurements were obtained using the SimpleScalar toolset [3]. We start by briefly describing our benchmarks, the relevant aspects of both architectures, and then we ....

....file which is organized in banks. It has 4 banks which hold 2 vector registers each. The vector registers hold 128 elements of 64 bits. Each bank has 2 read ports and 1 write port. The machine implements fully flexible chaining except for loads, which can not be chained to a computation. See [7] for further details. The ILP DLP architecture (see fig. 1) is derived from this baseline machine by adding out of order execution and register renaming, in a very similar way as the R10000 [37] Instructions flow in order through the Fetch and Decode Rename stages and then go to one of the four ....

Convex Press, Richardson, Texas, U.S.A. CONVEX Architecture Reference Manual (C Series), sixth edition, April 1992.


Multithreaded Vector Architectures - Espasa, Valero (1997)   (3 citations)

....is not shown in the table as it will be varied during this study. The memory system modeled is as follows. We have a single address bus shared by all types of memory transactions (scalar vector and load store) and physically separate data busses for sending and receiving data to from main memory [4]. A vector load instruction (also gather instructions) pays an initial latency and then receives one datum per cycle. Vector store instructions do not pay latency since the processor sends the vector to memory and does not wait for the write operation to complete. We will use a value of 50 cycles ....

....that fully describe the execution of the program. In (c) this set of traces is fed into the simulator, which will do a cycle by cycle execution of the program and will gather performance results. programs have been chosen as our benchmarks [9, 10] These programs are compiled on a Convex C3480 [4] machine and using Dixie [6] a detailed trace that describes its full execution is produced. The tracing procedure is as follows (see figure 2) the benchmark programs are compiled on a Convex C34 machine using the Fortran compiler (version 8.0) at optimization level O2 (which implies scalar ....

Convex Press, Richardson, Texas, U.S.A. CONVEX Architecture Reference Manual (C Series), sixth edition, April 1992.
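
The memory timing rule quoted in the first excerpt of this entry (a vector load pays an initial latency and then receives one datum per cycle, while a vector store pays no latency) amounts to a simple cycle count. The sketch below is only an illustration of that rule: the function names are hypothetical, bus contention is ignored, and the latency is left as a parameter because the excerpt is truncated before it says what its 50-cycle value applies to.

```python
def vector_load_cycles(vector_length, latency):
    """Cycles until the last element of a vector load arrives.

    Assumes an uncontended bus: an initial latency, then one datum
    per cycle, as in the quoted memory model.
    """
    if vector_length < 0 or latency < 0:
        raise ValueError("vector_length and latency must be non-negative")
    return latency + vector_length


def vector_store_cycles(vector_length):
    """Cycles a vector store occupies the processor side.

    The quoted model charges no latency: the processor sends one
    element per cycle and does not wait for the writes to complete.
    """
    if vector_length < 0:
        raise ValueError("vector_length must be non-negative")
    return vector_length
```

With an illustrative latency of 50 cycles, a full 128-element load would take 50 + 128 = 178 cycles under this model, and the matching store 128 cycles.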


Automatic Self-Allocating Threads (asat) On The Convex.. - Charles Severance And (1995)   Self-citation (Corp)

....400,000 cycles after the context switch. In many situations, the cache impact dominated the overall cost of a context switch. Other dynamic, run time, thread management techniques which are geared toward compiler detected parallelism include: Automatic Self Adjusting Processors (ASAP) from Convex [9] and Autotasking on Cray Research [10] computers. Convex ASAP is based on hardware extensions to the architecture and requires very little run time library support. Cray s Autotasking is a software based approach and is supported in the run time library and the operating system scheduler. The ....

....processor will have similar problems in dealing with thread imbalance. A previous study of the benefits of Automatic Self Allocating Threads (ASAT) for the SGI Challenge was done in [5] DYNAMIC LOAD ON THE C 240 Since our concept is based on ASAP, we begin by examining the Convex C Series [9] hardware support for dynamic thread management. As the load changes dynamically, the number of processors and threads assigned to an application changes. The Convex CSeries can adjust the number of threads within a very small number of hardware instructions because thread management is built into ....

Convex Computer Corp., Convex Architecture Reference Manual (C-Series), Document DHW300, Convex Press, Richardson, TX, Apr. 1992.

