15 citations found.
Hennessy, J., and Patterson, D. Computer Architecture - A Quantitative Approach. Morgan Kaufmann Publishers Inc., 1990.


This paper is cited in the following contexts:
General-Purpose Architecture Instruction Scheduling Techniques - De Sutter (1998)   (Correct)

....the compiler had to be organized in instructions. Microcode compaction techniques were used to increase the number of operations performed per instruction (read: per cycle). Today, we would call this the IPC (instructions per cycle), a factor of major importance, as shown by Hennessy and Patterson [44]. The instruction-level parallelism (ILP) that is found in most processors today, whether they are superscalar or VLIW architectures, resembles horizontal microcode in the way it increases IPC. Therefore, instruction scheduling techniques for pipelined processors are not new, but are being built ....

Hennessy, J., and Patterson, D. Computer Architecture - A Quantitative Approach. Morgan Kaufmann Publishers Inc., 1990.


A Framework for Parallel Job Scheduling - Subramanian (1995)   (Correct)

....development, debugging, testing, tinkering and tuning is inherently interactive. A consistent neglect of this fact (caused by a single-minded devotion to speed) was in some part responsible for the recent bankruptcies of Thinking Machines, Kendall Square Research and Cray Computer Corporation [HP96, page 628]. In short, if parallel computing is to become a pervasive technology, instead of a privilege of large government labs and companies, then we feel it must take the step to interactiveness that sequential computing took a long time ago. 3.3 The Goodness of a Scheduler As mentioned in ....

J.L. Hennessy and D.A. Patterson. Computer Architecture: A Quantitative Approach. Morgan Kaufmann Publishers, San Mateo, CA, second edition, 1996.


Emmerald: A Fast Matrix-Matrix Multiply Using Intel's SSE.. - Aberdeen, Baxter   (Correct)

....the L2 cache is 900 million single precision values per second. Both L1 and L2 caches are write back, meaning writes go to cache and main memory is only updated when an updated cache line is overwritten. For a detailed treatment of caches and translation lookaside buffers the reader is referred to [4, 5]. A standard matrix multiply technique for deep memory hierarchy machines (used in [3, 7, 14]) is matrix blocking. The fundamental concept is to break a large matrix-matrix multiply into a series of smaller matrix multiplies where the data required will fit entirely into cache. This is illustrated ....

J. L. Hennessy and D. A. Patterson. Computer Architecture: A Quantitative Approach. Morgan Kaufmann, 2nd edition, 1996.
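
The matrix blocking idea described in the context above can be sketched as follows. This is a minimal illustration in plain Python, not the SSE implementation from the cited paper; the function name `blocked_matmul` and the `block` parameter are hypothetical, and a real implementation would choose the tile size to match the cache.

```python
def blocked_matmul(a, b, block=2):
    """Multiply a (n x k) by b (k x m), both lists of lists, one tile at a time.

    `block` is a hypothetical tile edge length; each tile-sized sub-product
    touches only block x block pieces of a, b and c, which is what lets the
    working set fit in cache on a real machine.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, block):          # tile row of c
        for j0 in range(0, m, block):      # tile column of c
            for p0 in range(0, k, block):  # tile along the shared dimension
                for i in range(i0, min(i0 + block, n)):
                    for p in range(p0, min(p0 + block, k)):
                        a_ip = a[i][p]
                        for j in range(j0, min(j0 + block, m)):
                            c[i][j] += a_ip * b[p][j]
    return c


def naive_matmul(a, b):
    """Reference triple loop, used only to check the blocked version."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]
```

The blocked version performs the same arithmetic as the reference loop; only the order in which elements are visited changes.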


Advanced Evaluation Techniques for Java Platforms - Grawert, Weiler, Rosenstiel   (Correct)

....be determined. For many applications or parts of applications, this can be done by recording an accurate trace. This information allows a simple calculation of the execution time according to the following equation. However, this formula is too simple if such modern architectures are considered (see [6]) as processors using caches and pipelining. Using large BB helps to compensate pipelining slightly. But the size of BB used during platform evaluation is limited by the BB size of those within real applications. Therefore, this paper will introduce an extension toward traces, which is an a ....

J.L. Hennessy, D.A. Patterson, Computer Architecture: A Quantitative Approach, Morgan Kaufmann Publishers Inc., 1990


Report on the Parsytec Simulations - van Brummen, Papathanassiadis.. (1994)   (Correct)

....read quad word memory Hibric asyn acknowledge write quad word Table 3.4: Interactions between PowerStone components 24 DRAFT Simulation Report March 23, 1994 4 Application Modelling Most of the scientific applications deal with large multi dimensional floating point arrays. Quoting from [Hennesy90] on page 352: "However, big, long running, scientific programs have very large active data sets that are often accessed with low locality, yielding poor performance from the memory hierarchy." The resulting impact is a decrease in cache performance. The stochastic generator will have to generate ....

....an editor, and a text formatter, with six to seven processes of each class amounting to a total of 50 jobs. Static parameters like sizes of data and text segments were taken from real world systems, and dynamic parameters like average number of jumps and their distances were taken from literature [Hennesy90] and profile runs. Every job has its own private data and stack segment, and does not share data with other jobs. All jobs within one application class share their text segment. The UNIX kernel is simulated by a small process with shared data and text ....

J. L. Hennessy and D. A. Patterson. Computer Architecture: A Quantitative Approach. Morgan Kaufmann Publishers, Inc., San Mateo, California, 1990.


Matrix Multiplication: A Case Study of Algorithm Engineering - Eiron (1998)   (1 citation)  (Correct)

....properly choose a simple, but sufficiently accurate machine model. The objective is then to define an abstract model such that an algorithm optimized for it will perform well in practice. In our case, we deliberately decide to ignore the effects of the virtual memory subsystem (see, for example, [7]). Specifically, we ignore the paging mechanism, the use of virtual versus physical addresses for cache indexing, and the use of a Translation Lookaside Buffer (TLB) to shorten the address translation process. As a consequence, the negative effects of page faults and TLB misses are not taken into ....

J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, 2 ed., 1996.


Dynamic Reconfiguration of Field Programmable Gate Arrays - Lysaght, Dunlop (1993)   (1 citation)  (Correct)

....temporal locality are interpreted slightly differently from the meaning ascribed to them in conventional von Neumann environments. Nonetheless, overall performance remains dependent on a high cache hit to miss ratio which, in turn, requires that the principle of locality of reference must apply (Hennessy and Patterson 1990 and Lysaght 1991). 7 The operation of the logic cache optimises the ratio of active logic with respect to silicon area. This is very important for SRAM based FPGAs because of their logic density limitations. Despite the highly regular architectures which make them ideally suited to VLSI ....

....design across device packages with possible reductions in system speeds. Logic caching, if applicable, could reduce the number of packages saving space, power and ultimately money. The search for suitable applications resembles the task of converting programs to parallel processing environments. Hennessy and Patterson (1990) point out that this is not a trivial activity because of the need to address the complex interaction of three elements: these are the application, the algorithm chosen for implementing it and the architecture of the target system. It is a comparatively easy task to produce a new parallel ....

Hennessy, J. and Patterson, D., Computer Architecture: A Quantitative Approach, Morgan Kaufmann, California, USA, 1990.


Design and Evaluation of Feature Detectors - Baker (1998)   (2 citations)  (Correct)

....first method is subjective. The second does not use real images. It seems exceedingly unlikely that there is a single, simple method of fairly evaluating a feature detector. Even in a field as mature as computer architecture, there is no universally agreed upon way of measuring performance [93]. Instead, the usual approach is to apply a large number of benchmarks, each of which is designed to be typical of a certain range of applications. The overall performance on the benchmarks is then used to compare the different architectures. In the final part of this thesis, I advocate a similar ....

....results for a large number of other detectors must be made available. It seems exceedingly unlikely that there is a single, simple method of fairly evaluating an edge detector. Even in a field as mature as computer architecture, there is no universally agreed upon way of measuring performance [93]. Instead, the usual approach is to apply a large number of simple benchmark tests, each of which is designed to be typical of a range of applications. The overall performance on the benchmarks is then used to compare the architectures. The results on any specific benchmark are not assumed to ....

D.A. Patterson and J.L. Hennessy. Computer Architecture: A Quantitative Approach. Morgan Kaufmann, San Mateo, California, 1990.


Efficient Simulation of Caches under Optimal Replacement.. - Sugumar, Abraham (1993)   (35 citations)  (Correct)

....of the 1993 ACM SIGMETRICS Conference. have been proposed to improve cache performance, such as remapping basic blocks in memory [8, 14], blocking algorithms [4], using two or more levels of caching, miss caches [10] and shadow directories [18]. Thiebaut and Stone [21], Agarwal [1], Hill [6, 5] and others have proposed models which can explain or classify misses. These models are useful both for gaining insight in developing caching strategies and also for evaluating these strategies. In Thiebaut and Stone's model and in Agarwal's model, expressions for miss rates are derived in terms ....

....be reduced by up to 32 using replacement schemes more sophisticated than LRU for two-way set-associative caches. Some other comments and caveats about the miss components follow. Firstly, the compulsory miss component is negligible in most cases and much smaller than that reported for instance in [5]. Since we simulate much longer (or complete) traces, the cold start effects that contribute toward the compulsory miss component are amortized and are negligible. Furthermore, we simulate a single program at a time and do not account for multiprogramming effects [9]. Multiprogramming tends to ....

J. L. Hennessy and D. A. Patterson. Computer Architecture --- A Quantitative Approach. Morgan Kaufmann Publishers Inc., 1990.


Efficient Simulation of Multiple Cache Configurations using.. - Sugumar, Abraham (1991)   (5 citations)  (Correct)

....accessed only if the requested memory location is not present in the cache. Since the time to access the cache is usually much lower than the time to access main memory, caches help decrease the effective memory access time. There are excellent discussions on caches in the literature, for instance [13, 4]. In this paper new techniques for the efficient simulation of caches are presented. These techniques are expected to be helpful in the cache design process. An important aspect of cache design at the architectural level is deciding on the three parameters: cache size, line size and degree of ....

....simulate direct mapped caches of the same size but varying line sizes in one pass. Such a collection of cache designs is expected to occur frequently in the design process, because important constraints for physical (and virtual) caches fix the primary cache size to the virtual memory page size [4]. The algorithm uses a novel inclusion property between tag stores. A method for simulating set associative caches of varying numbers of sets and varying associativities is described by Mattson et al. [9]. This algorithm assumes that the set mapping is done by bit selection. This algorithm is ....

J. L. Hennessy and D. A. Patterson. Computer Architecture --- A Quantitative Approach. Morgan Kaufmann Publishers Inc., 1990.


Reduction of Cache Interference Misses through Selective.. - Santosh Abraham (1994)   (1 citation)  (Correct)

....latency of cache misses will be higher in such future processors with higher clock rates and wider issue widths. Therefore, it is important to develop techniques to reduce cache miss rates. Caches are characterized by the following three major parameters: cache size, line size and associativity [5]. In the standard bit selection mapping, the binary representation of an address is divided into three contiguous fields: tag, set, and line fields. The set field is used to index into one of the sets of the cache. The tag field is compared to the tags of the lines within a set. If the tag field ....

....work on characterizing cache misses into various categories as well as strategies for reducing cache misses in each category. We describe in greater detail hardware and software strategies that have been developed to reduce conflict misses. Thiebaut and Stone [14] [19], Agarwal [1], Hill [6] [5] and others have proposed models which can explain or classify misses. These models are useful both for gaining the insight required to develop caching strategies and for evaluating these strategies. In Thiebaut and Stone's model and in Agarwal's model, expressions for miss rates are derived in ....

J. L. Hennessy and D. A. Patterson. Computer Architecture --- A Quantitative Approach. Morgan Kaufmann Publishers Inc., 1990.
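
The bit selection mapping mentioned in the first context above (an address split into contiguous tag, set, and line fields) can be sketched as follows. This is a minimal illustration; the function name `split_address` and the default line size and set count are assumptions chosen for the example, not parameters from the cited paper.

```python
def split_address(addr, line_size=32, num_sets=128):
    """Split an address into (tag, set_index, line_offset) by bit selection.

    line_size and num_sets are hypothetical example values; both must be
    powers of two so that each field is a contiguous run of address bits.
    """
    for value in (line_size, num_sets):
        if value <= 0 or value & (value - 1):
            raise ValueError("line_size and num_sets must be powers of two")
    line_bits = line_size.bit_length() - 1   # low-order bits: byte within the line
    set_bits = num_sets.bit_length() - 1     # middle bits: which set to index
    line_offset = addr & (line_size - 1)
    set_index = (addr >> line_bits) & (num_sets - 1)
    tag = addr >> (line_bits + set_bits)     # remaining high bits: compared with stored tags
    return tag, set_index, line_offset
```

For example, with 32-byte lines and 128 sets, `split_address(0x12345)` yields tag 18, set 26, and line offset 5. Two addresses that share a set index but differ in tag compete for the lines of the same set, which is the source of the conflict misses discussed in the second context.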


New Svoboda-Tung Division - Montalvo, Parhi, Guyot (1998)   (Correct)

....2, and T clk = i 17:5 W 4 j unit delays 5 . In both designs, the term W 4 is due to the fan out of the control signals to all the W adder subtractor cells that control the division operation. Considering W = 53, to correspond to the double precision of the IEEE Std [15], the Speedup [28] of mr over MROR is $\mathrm{Spup}_{mr} = T_{div,MROR} / T_{div,mr} = 1.19$, where $T_{div,MROR}$ (1057.50 unit delays) is the computation time of MROR and $T_{div,mr}$ (891.75 unit delays) is the computation time of mr. The Speedup of mr over the other designs has been computed from the data published in [8] ....

J. Hennessy and D. Patterson, Computer Architecture: A Quantitative Approach. San Mateo, CA: Morgan Kaufmann Publishers, 1994.


Mermaid: Modelling and Evaluation Research in MIMD .. - Pimentel, van.. (1995)   (Correct)

....data will result either in a memory request or in a communication request depending on its distribution scheme. This model requires the issue of synchronization to be carefully considered in order to avoid data hazards. These hazards are similar to the data hazards with instruction pipelining [Hennessy90]. At generation level, the model translates the SPMD data requests to communication requests within a pre established virtual communication topology. At the level of virtual communication the connectivity of the unconstrained point to point SPMD data request is reduced to just those connections ....

J. L. Hennessy and D. A. Patterson. Computer Architecture: A Quantitative Approach. Morgan Kaufmann Publishers, Inc., San Mateo, California, 1990.


Transparent Remote Procedure Calls - Abram (1992)   (Correct)

....in ASCII format in one byte, and floating point numbers are converted to IEEE format for double precision floating point numbers. One byte is used to indicate the type of data to follow, and 64 bits are used to indicate the number of data items of this type. Our network standard uses Big Endian [34] byte ordering because of its popularity in UNIX related systems. Through the cooperative activity of all of these modules and services, we cover nearly all of the issues concerning a complete remote procedure call facility. We designed each component of the package to maximize efficiency, ....

D. A. Patterson and J. L. Hennessy. Computer Architecture: A Quantitative Approach. Morgan Kaufmann Publishers, Inc., San Mateo, California, 1990.
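
The wire layout described in the context above (one type byte, a 64-bit item count, IEEE double-precision values, Big Endian byte order) can be sketched with Python's standard `struct` module. This is an assumption-laden illustration, not the cited package's actual format: the type code `TYPE_DOUBLE = 1` and the function names are hypothetical, and the excerpt does not say how the fields are laid out beyond their sizes.

```python
import struct

TYPE_DOUBLE = 1  # hypothetical type code; the excerpt does not list the real values


def encode_doubles(values):
    """Pack values as: 1 type byte, 64-bit count, then IEEE doubles, all big-endian."""
    header = struct.pack(">BQ", TYPE_DOUBLE, len(values))  # ">" = big-endian, no padding
    body = struct.pack(">%dd" % len(values), *values)
    return header + body


def decode_doubles(data):
    """Inverse of encode_doubles; raises ValueError on an unexpected type byte."""
    type_code, count = struct.unpack_from(">BQ", data, 0)
    if type_code != TYPE_DOUBLE:
        raise ValueError("unexpected type code: %d" % type_code)
    return list(struct.unpack_from(">%dd" % count, data, 9))  # header is 9 bytes
```

With Big Endian ordering the most significant byte comes first, so the count 2 is sent as seven zero bytes followed by `0x02`.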


DLX Simulator Directed Profiling - Jagannath (1992)   (Correct)

....particular programs under study. DLXProf has a special feature that it is interactive with the program being simulated, but nonintrusive at the same time, unlike other conventional profilers. This has been achieved by using the DLX simulator, a RISC architecture simulator developed at Berkeley [13]. The profiler code has been embedded in the code for the simulator, and so the profile data can be obtained by the simulator during execution, without interfering with the natural execution of the program. The phases of compilation and execution are modified with the profiling added to look as in ....

Hennessy J. L. and Patterson D. A. Computer Architecture: A Quantitative Approach. Morgan Kaufmann, 1990.


CiteSeer.IST - Copyright Penn State and NEC