142 citations found.
H. S. Stone. High Performance Computer Architecture. Addison-Wesley 1993 (3rd ed.). ISBN: 0-201-52688-3.


This paper is cited in the following contexts:


Effects of Cache Mechanism on Wireless Data Access - Yi-Bing Lin Wei-Ru (2003)   (Correct)

....Figure 4) The typical cache size in a wireless terminal is not large. When the cache is full, some cached objects must be removed to accommodate new objects. We consider the least recently used (LRU) replacement policy. This policy is often utilized to manage cache memory in computer architecture [20], virtual memory in operating systems [19] and location tracking in mobile phone networks [10]. LRU uses the recent past as an approximation of the near future, and replaces the cached object that has not been used for the longest period of time. LRU associates with each cached object the time of ....

....application server are potentially accessed by a wireless terminal. Although the objects to be accessed vary from time to time, the number N is not significantly larger than the cache size of the wireless terminal. That is, the data access pattern of a wireless terminal exhibits temporal locality [20], which is the tendency for a wireless terminal to access in the near future those data objects referenced in the recent past. Temporal locality may not be observed in wireline Internet access because the desktop users typically navigate through several web sites at the same time. On the other ....

Stone, H. High-Performance Computer Architecture. Addison-Wesley, Reading, Massachusetts, 1990.
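
The LRU replacement policy described in the first excerpt of this entry can be illustrated with a minimal sketch. This is not code from the cited paper or from Stone's book. The class name `LRUCache` and its `get`/`put` methods are hypothetical, and the sketch tracks recency by ordering entries rather than by storing a timestamp per object, which is an assumption made for brevity.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical fixed-capacity cache that evicts the least recently used key."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        # Entries are kept in recency order: oldest first, newest last.
        self._items = OrderedDict()

    def get(self, key):
        """Return the cached value, or None on a miss; a hit refreshes recency."""
        if key not in self._items:
            return None
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key, value):
        """Insert or update a key, evicting the least recently used entry if full."""
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            # The first entry is the one unused for the longest time.
            self._items.popitem(last=False)


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")      # "a" becomes the most recently used object
cache.put("c", 3)   # the cache is full, so "b" is evicted
```

After these calls the cache holds "a" and "c": the recent past (the access to "a") is used as a prediction of the near future, so "b" is the object that gets replaced.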


Centering, Anaphora Resolution, and Discourse Structure - Walker (1998)   (1 citation)  (Correct)

....boundaries is not determined by boundary type. Finally, section 5 summarizes the discussion and outlines future work. 2 The Cache Model of Attentional State A cache is an easily accessible temporary location used for storing information that is currently being used by a computational procedure [Stone, 1987]. The fundamental idea of the cache model is that the functioning of the cache when processing discourse is analogous to that of a cache when executing a program on a computer. Just as discourses may be structured into goals and subgoals which contribute to achieving the purpose of the discourse, ....

Harold S. Stone. High Performance Computer Architecture. Addison Wesley, 1987.


Toward a Model of the Interaction of Centering with Global.. - Walker   (Correct)

....future work. The cache model is an extension of the AWM model in [Walker, 1993; Jordan and Walker, 1996] 3 2 The Cache Model of Attentional State A cache is an easily accessible temporary location used for storing information that is currently being used by a computational procedure [Stone, 1987]. The fundamental idea of the cache model is that the functioning of the cache when processing discourse is analogous to that of a cache when executing a program on a computer. Just as discourses may be structured into goals and subgoals which contribute to achieving the purpose of the discourse, ....

Harold S. Stone. High Performance Computer Architecture. Addison Wesley, 1987.


Impact of Memory Hierarchy on Program Partitioning and.. - Kaplow, Maniatty.. (1995)   (Correct)

....tectural simulator. There are both hardware and software methods for capturing the reference traces. Once obtained, they can be fed into an architectural simulator of the cache. The greatest drawback to this approach is that to be effective, traces have to be millions of references long [13]. Another problem, especially relevant to scientific numerical codes, is that the identity of the program components and structure that generated the address trace is lost, and therefore it is difficult to make conclusions about how to modify the source code to improve performance. Static ....

....for a Loop 2.1 Architectural model and parameters A MIMD, distributed memory architecture is assumed, with message passing communications between the processors. Each processor contains an execution unit and a memory hierarchy that includes at least one level of cache memory. As defined in [13], a cache is the first level of memory closest to the processor. It generally has access times that are commensurate with the instruction cycle time of the processor, and is therefore several times faster than main memory access time. Cache is an associatively addressed memory which at any ....

H. S. Stone. High-Performance Computer Architecture. Addison-Wesley, 1990.


Analyzing Concurrent and Fault-Tolerant Software using.. - Ciardo, Muppala, Trivedi (1992)   (8 citations)  (Correct)

....and reliability availability from computer systems. Higher levels of integration and newer techniques in VLSI design have made hardware with high performance and reliability, relatively inexpensive. Software, on the other hand, is becoming a major component in the overall cost of these systems [27]. Often, though, the software poses performance and reliability bottlenecks which should be discovered and eliminated. Improvements in software assessment methods for the design phase of the software life cycle are required to minimize costly redesigns and changes due to unanticipated performance ....

Stone, H. S. High-Performance Computer Architecture. Addison-Wesley, Reading, Massachusetts, 1987.


Memory Consistency Models of Bus-Based Multiprocessors - Jalal Kawash   (Correct)

....concurrently with the cache upon a write. This guarantees that the main memory reflects the last write performed in the system. 10 However, a cached value can be out of date. With write back, a cache line is only written back to main memory when it is replaced and only if it has been modified [10, 8]. Write back and write through may be accompanied with two variations or optimizations: write allocate and write once. Write allocate means that a line is read into cache if an attempted write misses the cache. Otherwise, if write allocate is not used, the cache is bypassed and the copy is ....

Stone HS. High-Performance Computer Architecture Addison-Wesley Publishing Company, 1990.


ParC - An Extension of C for Shared Memory Parallel.. - Ben-Asher, Feitelson..   (Correct)

....In such systems blocking is the only reasonable alternative. Recently there is much interest in wait free primitives, which are more suitable to truly parallel systems [68]. Examples include various read-modify-write instructions such as test and set [69], fetch and add [15], and compare and swap [70]. ParC provides a repertoire of low level primitives, that cover different synchronization behaviors, both blocking (semaphores and barrier) and wait free (fetch and add). High level primitives may be added if programming experience indicates that certain constructs are especially useful. ....

H. S. Stone, High-Performance Computer Architecture. Addison-Wesley, 2nd ed. (1990).


Memory Controller Policies for DRAM Power Management - Fan, Ellis, Lebeck (2001)   (12 citations)  (Correct)

....benchmarks have similar distributions regardless of the number of memory chips and physical page placement policy. Related studies trying to statistically characterize cache misses have not been successful because the distributions that best characterize the behavior do not have finite variance [8]. For our purposes it is sufficient to approximate the gap distribution. Thus, Figure 2 also plots the exponential distribution with the same mean gap size. ....

Table 2: Chi Square Test Results
Benchmark    Result
Compress95   Pass
Go           Pass
Netscape     Pass
Acroread     Fail
PowerPoint   Fail
Winword      Fail

Harold S. Stone. High-Performance Computer Architecture, chapter Memory System Design, pages 76--84. Addison Wesley, 1993.


A New Model for the Data Distribution Problem - Loos, Bramley (1998)   (Correct)

....(a) the size and distribution of the input matrices A and M are not known a priori and (b) residual effects such as the contents of the cache and synchronization delays from previous kernel calls are important. The previous cache contents can and do greatly change the cost of memory accesses [Sto90] because of the change in cache hit ratios. For example, the vector copy low level kernel has the ratio t_limit/t_small = 8:1. The AET simulator assumes the use of a least recently used cache replacement policy with a correction constant for other policies such as the random replacement policy ....

Stone H. S. (1990) High Performance Computer Architecture. Addison-Wesley, Reading, MA, second edition.


A Comparative Performance Study of a Fine-Grain.. - Prasad Kakulavarapu.. (2000)   (Correct)

....While modern processors can issue multiple instructions per cycle, they lack the features required to address fundamental issues in multiprocessing systems: latency, bandwidth and synchronization overheads. A well designed parallel system must balance the trade off between a fine task granularity [9] and the impact of communication latencies on performance. Coarse grain parallel systems can tolerate long School of Computer Science, McGill University, Montreal, Canada, Email: prasad cs.mcgill.ca y Dept. of Electrical and Computer Engineering, CAPSL, University of Delaware, 140, Evans Hall, ....

....systems do not fully exploit the parallelism existing in irregular parallelism. Fine-grain parallelism, on the other hand, enables further parallelization of many applications, but has proved to be difficult to support due to the higher relative cost of communication and synchronization latencies [9]. EARTH (Efficient Architecture for Running THreads) [5, 10] is a multi threaded architecture and program execution model that supports fine grain, non preemptive, lightweight threads, or fibers. EARTH is designed to allow the implementation of a multi threaded execution model with off the shelf ....

Harold S. Stone. High-Performance Computer Architecture. Addison-Wesley Publishing Company, 3rd edition, 1993.


Fault-Tolerant On-Line Adaptation Of Quorum Assignments For.. - Bearden (1998)   (Correct)

....performance by reducing the amount of concurrent processing. Scalability is improved by reducing the occurrence of communication bottlenecks caused by coordination, through design that evenly distributes the required communication. Thorough treatments of the topic of scalability are given in [56,125,153,154]. 1.2.1 Improving Availability and Scalability The following three techniques in distributed systems research can be applied to the design of distributed coordination protocols to improve availability and scalability. Fault tolerant coordination guarantees correct coordination even if some ....

Stone, H. S., High Performance Computer Architecture, Third Edition, Addison-Wesley, New York, 1993, 512 pages.


Scheduling on Heterogeneous Message Passing Parallel.. - Menascé, Porto   (Correct)

.... is measured by the, previously defined, processor power ratio (PPR). • The heterogeneity of the application is measured by its intrinsic serial fraction, F_s, which is obtained through the same procedure used in calculating T_norm [13]. • The intrinsic communication processing ratio [15], CPR, which is obtained dividing the overall communication demand of the application by the overall computation demand. 4.3 Evaluation Techniques We use an analytical method to obtain the overall execution time for a parallel application submitted to a certain scheduling policy. This analytical ....

Stone, Harold S., High-Performance Computer Architecture, Addison-Wesley Publishing Company, 1987.


NPSI Adaptive Synchronization Algorithms for Parallel Discrete.. - Srinivasan (1995)   (Correct)

....proposed here. Chapter 6 presents the results of this performance analysis. It is important to note that reduction networks have been proposed and constructed in practice, to support global computations such as barrier synchronization, summation, determining maxima and parallel prefix computation [Ston90, Hosh89, CrKn28, Blel89]. One such network is the control network in the CM-5 [Ponn93]. It is used to perform nonlocal data distribution operations such as broadcasting, combining (reduction and parallel prefix) bit wise operations and barrier synchronizations, very rapidly. For bit wise logical OR operations, it can ....

Stone, H., "High performance computer architecture", Addison-Wesley, Reading, MA, 1990


Structural Fault Testing of Embedded Cores Using Pipelining - Nourani, Papachristou (1999)   (3 citations)  (Correct)

....distribution of its t ij k cycle activities is another issue. The whole path, as shown in Figure 6 is similar to a pipeline system with N stages in which each stage requires t ij k cycles. The difference, however, with conventional pipelines is in the scheduling method. In conventional pipelining [Ston90], we define the pipeline clock period to be equal to the slowest stage delay and then schedule the activities accordingly. In our problem, we don't want to devise too many registers in the interface between cores to pile up all data packets. Instead, we have to implement an innovative mechanism by ....

....to packetize (serial to parallel or parallel to serial) the test data. For example, 16 bit test data would be disassembled into four packets (of 4 bit each) in 4 cycles to transfer through Core 1. Three bypass scheduling choices are shown. We used space time table similar to the reservation table [Ston90] in pipelining. Each row corresponds to a core and each column corresponds to a time step. An entry (C1, C2 or C3) in the table shows that the corresponding core is bypassing a packet of data in that cycle. For example, in all three schedules shown in this figure Core 1 bypasses a packet of 4 bit ....

H. Stone, High Performance Computer Architecture, Addison Wesley, 1990.


Parallelism in Structural Fault Testing of Embedded Cores - Nourani And Papachristou (1998)   (1 citation)  (Correct)

....cycles. However, the distribution of its t ij k cycle activities is another issue. The whole path is similar to a pipeline system with N stages in which each stage requires t ij k cycles. The difference, however, with conventional pipelines is in the scheduling method. In conventional pipelining [Ston90], we define the pipeline clock period to be equal to the slowest stage delay and then schedule the activities accordingly. In our problem, we don't want to devise too many registers in the interface between cores to pile up all data packets. Instead, we have to implement an innovative mechanism by ....

....to packetize (serial to parallel or parallel to serial) the test data. For example, 16 bit test data would be disassembled into four packets (of 4 bit each) in 4 cycles to transfer through Core 1. Three bypass scheduling choices are shown. We used space time table similar to the reservation table [Ston90] in pipelining. Each row corresponds to a core and each column corresponds to a time step. An entry (C1, C2 or C3) in the table shows that the corresponding core is bypassing a packet of data in that cycle. For example, in all three schedules shown in this figure Core 1 bypasses a packet of 4 bit ....

H. Stone, High Performance Computer Architecture, Addison Wesley, 1990.


A Neural Network Based Algorithm for the Scheduling.. - Nourani..   (Correct)

....pipelining. This section presents brief results for six design examples presented in the literature. This is the main parallelizable formula in our algorithm. There are many parallel architectures (such as mesh, hypercube and connection machine) by which this formula can be computed in parallel [14]. The following tables give a summary of the design results produced by our method. In Table 1, we compare our method with some methods found in the literature. We only consider the results for the fifth order elliptical filter in a non pipelined system. It contains 26 additions and 8 ....

Harold S. Stone, High Performance Computer Architecture,


Communications in Multiprocessor Machines - A Survey - Jean-Marie, Mussi, Syska (1994)   (Correct)

....memory [31, 12] distributed memory units are multiply accessed through software. This mechanism may be seen as an extension to the classical virtual memory mechanism used in modern operating systems. 1.3 Synchronism The various technological choices led to a great diversity in parallel machines [1, 2, 3, 18, 25, 26, 51]. Flynn introduced a classification [20] still authoritative if slightly dated. The classification is based on only two criteria: the type of instruction flow and the type of data flow treated by elementary processors. The flows are either simple or multiple. It is difficult to classify some ....

H. S. Stone. High-Performance Computer Architecture. Addison-Wesley, 1987.


Applying Programming Language Implementation Techniques to.. - Schnarr (2000)   (2 citations)  (Correct)

.... cold start bias were compared: COLD each sample starts with a cold cache, HALF initialize the cache during the first half of each sample and only collect data from the second half, PRIME only simulate the cache on accesses to sets that are initialized by earlier references in the sample [37][70], STITCH reuse the cache state from the end of the previous sample [1] and INITMR estimate the fraction of cold start misses that would have missed even if the cache state were known [80]. The results showed that, for the given traces, INITMR was the most effective at reducing cold start ....

H. S. Stone, High-Performance Computer Architecture, second ed., Reading, MA, Addison-Wesley, 1990.


Self-Correcting LRU Replacement Policies - Kampe, Stenstrom, Dubois   (Correct)

No context found.

H. S. Stone. High Performance Computer Architecture. Addison-Wesley 1993 (3rd ed.). ISBN: 0-201-52688-3.


Bounding Loop Iterations for Timing Analysis - Healy, Sjödin, Rustagi, Whalley (1998)   (22 citations)  (Correct)

No context found.

H. S. Stone, High-Performance Computer Architecture, Second Edition, Addison Wesley, Reading, MA (1990).


Latency Tolerant Architectures - Bennett (1998)   (2 citations)  (Correct)

No context found.

H.S. Stone. High-performance Computer Architecture. Addison-Wesley, Reading, MA, 1990.


Job Scheduling for the BlueGene/L System - Elie Krevat Jose (2002)   (2 citations)  (Correct)

No context found.

H. S. Stone. High-Performance Computer Architecture. Addison-Wesley, 1993.


A Parallel Algorithm for Solving a Tridiagonal Linear System.. - Ma, Harris, Jr.   (Correct)

No context found.

Harold S. Stone. High Performance Computer Architecture. Addison-Wesley Publishing Company, 3rd edition, 1993.


Trap-driven Memory Simulation - Uhlig (1995)   (2 citations)  (Correct)

No context found.

Stone, H. High-performance Computer Architecture. Reading, Massachusetts, Addison-Wesley, 1993.


Scheduling Parallel Computations in a Heterogeneous Environment - Weissman (1995)   (5 citations)  (Correct)

No context found.

H.S. Stone, High-Performance Computer Architecture, Addison-Wesley Publishing Company, 1987.



CiteSeer.IST - Copyright Penn State and NEC