16 citations found.
K. Asanovic. Vector Microprocessors. PhD thesis, University of California at Berkeley, 1998.

This paper is cited in the following contexts:
Exploring the VLSI Scalability of Stream Processors - Khailany, Dally, Rixner.. (2003)   (3 citations)  (Correct)

....from stream processors, which execute VLIW instructions from a kernel in a SIMD fashion out of a SRF and contain LRFs to store intermediate results. Several authors have analyzed the VLSI costs of components of vector microprocessors as the number of functional units per vector lane is increased [2, 10, 18]. Kozyrakis also analyzed the natural vector lengths in media benchmarks and the performance of vector microprocessors as the number of FUs per vector lane are increased [11]. However, to our knowledge, no previously published studies explore VLSI costs or performance as vector microprocessors are ....

....therefore limiting speedup. In addition to algorithmic inefficiencies leading to poor speedups, applications also suffer from short stream effects [14] when stream lengths are comparable to the number of clusters, similar to performance effects due to short vector lengths in vector processors [2]. With short streams, the number of inner loop iterations executed per kernel call decreases, causing a larger fraction of execution time to be spent in loop prologues and epilogues rather than in kernel inner loops. Furthermore, since software pipelining is used extensively to optimize kernel ....

K. Asanovic. Vector Microprocessors. PhD thesis, University of California at Berkeley, 1998.
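
The short-stream effect described in the second context above can be illustrated with a toy cost model. This is only a sketch under stated assumptions, not the model used by Khailany et al.: the function name, the fixed per-call overhead (standing in for loop prologue and epilogue cycles), and the cycles-per-iteration parameter are all hypothetical.

```python
import math


def inner_loop_fraction(stream_length, num_clusters,
                        cycles_per_iteration, overhead_cycles):
    """Fraction of kernel time spent in the inner loop (toy model).

    Assumes each inner-loop iteration handles one stream element per
    cluster, and that every kernel call pays a fixed overhead for the
    loop prologue and epilogue. All parameters are hypothetical.
    """
    if min(stream_length, num_clusters, cycles_per_iteration) < 1:
        raise ValueError("lengths and cycle counts must be positive")
    if overhead_cycles < 0:
        raise ValueError("overhead_cycles must be non-negative")
    iterations = math.ceil(stream_length / num_clusters)
    loop_cycles = iterations * cycles_per_iteration
    return loop_cycles / (loop_cycles + overhead_cycles)


# A stream no longer than the cluster count runs a single iteration,
# so the fixed overhead dominates; a long stream amortizes it.
short = inner_loop_fraction(8, 8, 10, 30)
long_ = inner_loop_fraction(1024, 8, 10, 30)
```

In this model the useful fraction falls as the stream length approaches the number of clusters, which matches the qualitative behavior the excerpt describes.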


Application-Specific Memory Management for Embedded.. - Chiou, Jain, Rudolph, .. (1999)   (10 citations)  (Correct)

....from the nearly horizontal curves for column caching. The idea of statically partitioned caches is not new. The most common example are separate instruction and data caches. Some existing and proposed architectures support a pair of caches, one for spatial locality and one for temporal locality [14, 15, 5, 1, 7, 4]. These designs statically divide the two caches. Other processors support locking of data into the cache[3, 9] but do not include a way to tell if the desired data is in the cache. Sun Microsystems Corporation patented a mechanism [10] very similar to column caching that allows partitioning of a ....

K. Asanovic. Vector Microprocessors. PhD thesis, University of California at Berkeley, May 1998.


Application-Specific Memory Management for Embedded.. - Chiou, Jain, Devadas, .. (1999)   (10 citations)  (Correct)

....Work 5.1 Cache Mechanisms The idea of statically partitioned caches has been around for a long time. The most common example are separate instruction and data caches. Some existing and proposed architectures support a pair of caches, one for spatial locality and one for temporal locality [21, 23, 12, 2, 15, 7]. These designs statically separate the two caches in hardware, generally wasting resources since the partition is rarely exactly correct. Sun Microsystems Corporation holds a patent on a mechanism [17] very similar to column caching that allows partitioning of a cache between processes at cache ....

Krste Asanovic. Vector Microprocessors. PhD thesis, University of California at Berkeley, May 1998.


Energy-Efficient Register File Design - Tseng (1999)   (1 citation)  (Correct)

....The C switch r is the switching capacitance related to node r. The parameters used in this energy estimation model are based on a 0.6 µm n-well CMOS process technology with 3.3V power supply and two layers of metal. The design of register file and bypassing network is based on the T0 design [1] and is laid out using Magic [12]. The layout-to-circuit extraction tool, Space [17], is used to extract a circuit netlist for further circuit simulation. Space extracts capacitance to the substrate, fringe capacitance, crossover coupling capacitance, and capacitance between parallel wires. Hspice ....

K. Asanovic. Vector Microprocessors. PhD thesis, University of California at Berkeley, May 1998.


Vector Microprocessors for Desktop Computing - Stoodley, Lee (1999)   (3 citations)  (Correct)

.... (ILP). Although limit studies indicate that high levels of ILP (10 instructions per cycle (IPC) or more) are present in such programs [38][32], architectural and compiler innovations over the past decade have not been highly successful at extracting this parallelism to improve performance [22][2]. To see how much ILP researchers have been able to extract from these types of applications, we surveyed the IPC results presented in papers from three architecture conferences of 1998 and 1999 for two of the most Source gcc go Special Notes [37] Fig. 5 1.6 1.5 multiscalar: 4 Theta2 way OOO ....

....cycles while the third considers the cost effectiveness of providing performance for these applications. First, some of these applications can, in fact, be partially vectorized. Asanović has shown that, in some cases, performance critical components of these types of programs can be vectorized [2]. By modifying only 0.3% of the source code of one application and at most 7.5% of another, four of the eight SPEC95 integer benchmarks (compress, ijpeg, li, m88ksim) can be partially vectorized. These modifications result in an average 34% improvement in performance for the entire SPECint95 suite ....

[Article contains additional citation context not shown here]

Krste Asanović. Vector Microprocessors. University of California at Berkeley, 1998. Ph.D. Thesis.


Dynamic Cache Partitioning via Columnization - Chiou, Rudolph, Devadas, Ang (2000)   (4 citations)  (Correct)

....application cache footprints. This paper shows, in Section 3.3, that appropriate cache partitioning can have this beneficial effect. Static cache partitioning is an old idea. Instruction and data caches have long been split in Harvard architectures and spatial temporal caches are becoming popular[26, 29, 16, 2, 21, 13]. The static nature of these partitionings, however, often waste resources, allocating too much to one partition and not enough to another. Column caching, with its dynamic partitioning capability is thus vital for achieving the best partitioning. The unpredictability of caches have made them ....

....allows stream data to occupy more of the cache than it should and replace temporal data that should not be replaced. Many solutions have been proposed to address this problem. Perhaps the most common is a split cache, one part for spatial locality and the other part for temporal locality [26, 29, 16, 2, 21, 13]. These designs statically partition the available real estate between the two caches. Some rely on hardware based algorithms that separate the reference streams into one or the other cache while others keep information indicating which cache to use in the page table, allowing software to specify ....

K. Asanovic. Vector Microprocessors. PhD thesis, University of California at Berkeley, May 1998.


Exploiting a New Level of DLP in Multimedia Applications. - Corbal, Valero, Espasa (1999)   (11 citations)  (Correct)

....bigger than the MMX multimedia register file, the area costs are of the same order. The reduction of complexity of vector register files (due basically to the fact that we can interleave the elements of every vector register among several banks) has already been highlighted in previous works [17, 18]. way 1 way 2 way 4 way 8 ROB size 8 16 32 64 Load Store queue 4 8 16 32 Bimodal predictor 512 2K 4K 16K BTB entries 64 256 512 1024 INT simple complex 0 1 1 1 2 1 2 2 FP simple complex 0 1 1 1 2 1 2 2 MED simple complex 0 1 1 1 2 4 (2x2) memory ports 1 1 2 4 (2x2) INT log ph ....

Krste Asanovic. Vector Microprocessors. PhD thesis, University of California at Berkeley, 1998.


Exploiting a New Level of DLP in Multimedia Applications. - Corbal, Valero, Espasa (1999)   (11 citations)  (Correct)

....registers does not force us to increase the hardware requirements of the processor. The reduction of complexity of vector register files (due basically to the fact that we can interleave the elements of every vector register among several banks) has already been highlighted in previous works [14, 15]. way 1 way 2 way 4 way 8 ROB size 8 16 32 64 Load Store queue 4 8 16 32 Bimodal predictor 512 2K 4K 16K BTB entries 64 256 512 1024 INT simple complex 0 1 1 1 2 1 2 2 FP simple complex 0 1 1 1 2 1 2 2 MED simple complex 0 1 1 1 2 4 (2x2) memory ports 1 1 2 4 (2x2) INT log ph ....

Krste Asanovic. Vector Microprocessors. PhD thesis, University of California at Berkeley, 1998.


Vector Microprocessors for Desktop Computing - Stoodley, Lee (1998)   (3 citations)  (Correct)

.... in the processor and because the application program can access this register, the vector compiler can easily generate code that dynamically stripmines loops according to the implementation's vector length; the striplength is determined when the program runs rather than when it is compiled [14]. Of course, there are other, non-multimedia applications that are important to desktop computing including the SPEC integer benchmark suite and productivity applications such as word processors and presentation software. These types of programs are generally believed to be non-vectorizable. We ....

Krste Asanović. Vector Microprocessors. University of California at Berkeley, 1998. Ph.D. Thesis, Section 5.1.4.
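
The dynamic strip-mining mentioned in the context above can be sketched in a few lines. This is an illustrative software model only, assuming a hypothetical `max_vector_length` parameter that stands in for the implementation's vector length register; it is not code from the cited thesis or from any real vector ISA.

```python
def stripmined_add(a, b, max_vector_length):
    """Element-wise add, processed in strips of at most max_vector_length.

    max_vector_length is a hypothetical stand-in for the value a program
    would read from the machine at run time. Returns the result and the
    length of each strip that was processed.
    """
    if max_vector_length < 1:
        raise ValueError("max_vector_length must be at least 1")
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    n = len(a)
    result = [0] * n
    strip_lengths = []
    start = 0
    while start < n:
        # The strip length is chosen at run time; the last strip may be short.
        vl = min(max_vector_length, n - start)
        end = start + vl
        # Models one vector instruction operating on vl elements.
        result[start:end] = [x + y for x, y in zip(a[start:end], b[start:end])]
        strip_lengths.append(vl)
        start = end
    return result, strip_lengths


# The same loop runs unchanged on "machines" with different vector lengths.
sums_4, strips_4 = stripmined_add(list(range(10)), [1] * 10, 4)
sums_8, strips_8 = stripmined_add(list(range(10)), [1] * 10, 8)
```

Because the strip length is computed inside the loop rather than fixed at compile time, the same code adapts to either vector length, which is the property the excerpt attributes to reading the vector length at run time.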


Vector Microprocessors for Desktop Computing - Stoodley, Lee (1998)   (3 citations)  (Correct)

.... to characterize the performance of in-order versus out-of-order designs, thus allowing performance results to be evaluated and understood more quickly [10]. Vector architectures also reduce design complexity by virtue of consisting primarily of datapath structures with simple control circuitry [13][16]. These structures are easily replicated to produce the vector hardware. This also allows designers to reuse designs across different processor generations and thus provide scalable multi-generational implementations with minimal effort. Although the reduced complexity of a vector ....

....design. Additionally, it may be possible to improve performance by vectorizing some portions of these programs even though they are widely believed to be non-vectorizable. In his doctoral thesis, Asanović investigated the potential for vectorizing benchmarks in the SPEC95 integer suite [13]. He demonstrated that substantial speedups can be achieved by rewriting 4 of the 8 benchmarks to vectorize several key loops with very few localized code modifications. As few as 0.3% of the source code lines in one program had to be modified, and at most 7.5% in another. The vectorized programs ....

[Article contains additional citation context not shown here]

Krste Asanović. Vector Microprocessors. University of California at Berkeley, 1998. Ph.D. Thesis.


Simple Vector Microprocessors for Multimedia Applications - Lee, Stoodley (1998)   (20 citations)  (Correct)

....in other application areas has been reported elsewhere. Their effectiveness on scientific and engineering applications has been demonstrated by their historically dominant use in the supercomputing arena, while other researchers are currently investigating the vectorizability of SPECint programs [1]. The remainder of the paper is organized as follows. In the next section, we describe the details of the processors that we study. In Section 3, we give area estimates for the simple long vector processor and compare its area to those of existing OOO superscalar processors. In Section 4, we ....

....design rules and two metal layers, and was first fully functional in April 1995 at 45MHz. Unlike vector supercomputers, the T0 implementation is inexpensive by virtue of being fabricated as a single VLSI chip [23]. In addition to being inexpensive, T0 is also a nimble vector implementation [1]. Much of T0's nimbleness can be attributed to the tight integration of the scalar processor and vector hardware on a single die, thus reducing the scalar overhead of vector execution significantly. T0's single die implementation also allows back-to-back vector instructions to execute in the same ....

[Article contains additional citation context not shown here]

Krste Asanovic. Vector Microprocessors. PhD thesis, University of California at Berkeley, May 1998.


DLP + TLP Processors for the Next Generation of Media Workloads - Jesus Corbal Roger (2001)   (Correct)

No context found.

K. Asanovic. Vector Microprocessors. PhD thesis, University of California at Berkeley, 1998.


Unknown - (2001)   (Correct)

No context found.

Krste Asanovic. Vector Microprocessors. PhD thesis, University of California at Berkeley, 1998.


The VLSI Implementation and Evaluation of Area- and.. - Khailany (2003)   (5 citations)  (Correct)

No context found.

Krste Asanovic. Vector Microprocessors. PhD thesis, University of California at Berkeley, 1998.


The VLSI Implementation and Evaluation of Area- and.. - Khailany (2003)   (5 citations)  (Correct)

No context found.

Krste Asanovic. Vector Microprocessors. PhD thesis, University of California at Berkeley, 1998.


Stream Register Files with Indexed Access - Nuwan Jayasena Mattan (2004)   (2 citations)  (Correct)

No context found.

K. Asanovic. Vector Microprocessors. Ph.D. thesis, University of California at Berkeley, May 1998.
