37 citations found.
J. P. Singh, J. L. Hennessy, and A. Gupta, "Scaling parallel programs for multiprocessors: methodology and examples", IEEE Computer, 26(7), July 1993, pp. 42-50.


This paper is cited in the following contexts:
Systematic Approach for Workload Characterization of Parallel.. - Ferscha et al. (1994)

....was performed to derive application-specific expressions for the computation and communication time for each phase of the algorithm. Based on these expressions, speedup and the scaling behavior were modeled and compared to measurements on a transputer-based system and on a SUPRENUM cluster. In [Sing 93] several application-specific parameters are used to model the scaling behavior of an application. The influence of the parameters on various scaling models (time constraint, memory constraint) is investigated. 2.1.2 Signatures, Profiles and Shapes. A signature is a (graphical) representation of ....

J. P. Singh, J. L. Hennessy, and A. Gupta. "Scaling Parallel Programs for Multiprocessors: Methodology and Examples". IEEE Computer, pp. 42--50, July 1993.


The Forgotten Factor: Facts on Performance Evaluation and its.. - Feitelson (2002)   (2 citations)

....size may unintentionally also lead to scheduling by duration, if there is some statistical correlation between these two job attributes. As it turns out, the question of whether such a correlation exists is not easy to settle. Three application scaling models have been proposed in the literature [30, 23]: Fixed work. This assumes that the work done by a job is fixed, and parallelism is used to solve the same problems faster. Therefore the runtime is assumed to be inversely proportional to the degree of parallelism (negative correlation). This model is the basis for Amdahl's law. Fixed time. ....

J. P. Singh, J. L. Hennessy, and A. Gupta, "Scaling parallel programs for multiprocessors: methodology and examples". Computer 26(7), pp. 42-50, Jul 1993.
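The fixed-work model in the excerpt above can be written down directly. The sketch below is a minimal Python illustration, not code from any of the cited papers; the serial fraction `serial_fraction` (the assumed share of the single-processor runtime that cannot be parallelized) is added only to show how Amdahl's law follows from holding work fixed. With `serial_fraction = 0` the runtime is exactly inversely proportional to the degree of parallelism, as the excerpt states.

```python
def fixed_work_runtime(t1: float, p: int, serial_fraction: float = 0.0) -> float:
    """Runtime on p processors when the total work is fixed.

    Hypothetical helper: t1 is the single-processor runtime and
    serial_fraction is the assumed share of t1 that cannot be parallelized.
    """
    if p < 1:
        raise ValueError("p must be at least 1")
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must lie in [0, 1]")
    return t1 * (serial_fraction + (1.0 - serial_fraction) / p)


def amdahl_speedup(p: int, serial_fraction: float) -> float:
    """Fixed-work speedup T(1)/T(p), i.e. Amdahl's law."""
    return 1.0 / fixed_work_runtime(1.0, p, serial_fraction)


if __name__ == "__main__":
    # No serial part: runtime is exactly inversely proportional to p.
    print(fixed_work_runtime(100.0, 4))  # 25.0
    # A 5% serial part bounds the speedup below 1 / 0.05 = 20, however large p is.
    for p in (1, 16, 256, 4096):
        print(p, round(amdahl_speedup(p, 0.05), 2))
```

The bound of 20 in the usage comment comes from letting p grow without limit in the formula; it says nothing about how any particular job in the cited workloads behaves.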


Future Applicability of Bus-Based Shared Memory Multiprocessors - Sundaram, Eager

....size are scaled (holding the number of processors fixed). We assume that the more powerful systems of the future will be used to run larger problems, or obtain results of improved accuracy, rather than to simply solve the same problems in less time. Thus, a time-constrained scaling approach [14] is adopted, in which application parameters are scaled concurrently with the system parameters in such a way that the execution times before and after scaling the application and the machine are equal. Our focus is on asymptotic results; specifically, as the machine and the application are ....

....the applications that are considered in this work, and the manner by which the parameters of each are scaled. Section 4 presents our scaling analysis and the results of that analysis. Finally, in Section 5, conclusions are presented. 2 Scaling. There has been much previous work on scalability [3-8, 11-14, 17-21], although all of this work focuses on large-scale shared-memory multiprocessors and message-passing machines, and all but [4] is concerned with scaling the system size (number of processors). In contrast, we consider small-scale machines, and keep the number of processors fixed while scaling the ....

[Article contains additional citation context not shown here]

Singh, J.P., Hennessy, J.L., and Gupta, A., "Scaling Parallel Programs for Multiprocessors: Methodology and Examples", IEEE Computer, Vol.26, No.7 (July 1993), pp. 42-50.
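Time-constrained scaling, as defined in the excerpt, picks the new problem size so that execution time before and after scaling is the same. A minimal sketch, assuming (hypothetically) that work grows as W(n) = c·n^k in a single size parameter n and that the parallel run achieves perfect speedup; the condition W(n_p)/p = W(n_1) then gives n_p = n_1·p^(1/k). Real scaling studies, including the one quoted above, also account for communication and memory-system effects, which this sketch ignores.

```python
def time_constrained_size(n1: float, p: int, work_exponent: float) -> float:
    """Problem size that keeps ideal execution time equal to the baseline's.

    Assumes work W(n) = c * n**work_exponent and perfect speedup on p
    processors, so W(n_p) / p == W(n1)  =>  n_p = n1 * p**(1 / work_exponent).
    """
    if p < 1 or n1 <= 0 or work_exponent <= 0:
        raise ValueError("need p >= 1, n1 > 0 and work_exponent > 0")
    return n1 * p ** (1.0 / work_exponent)


if __name__ == "__main__":
    # Illustrative case: dense n x n matrix factorization, whose work grows
    # like n**3 while its data grows like n**2. Eight times the processors
    # only about doubles n ...
    n8 = time_constrained_size(1000, 8, 3)
    print(round(n8))
    # ... so total data grows about 4x and data per processor roughly halves.
    print(round((n8 / 1000) ** 2 / 8, 3))
```

Under this idealized model, time-constrained scaling of a superlinear-work problem leaves memory per processor shrinking as p grows, which is one reason the choice of scaling model matters.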


Job Characteristics of a Production Parallel Scientific.. - Feitelson, Nitzberg (1995)   (75 citations)

....
Parallelism   Mean     Std. dev.   CV
...           ...      ...         3.4
4             1116.7   4171.5      3.7
8             705.2    2344.3      3.3
16            569.3    1970.9      3.5
32            1305.3   3311.6      2.5
64            2350.8   4155.8      1.8
128           3280.1   4408.1      1.3
system        44.1     438.0       9.9
Table 2. Runtime statistics for jobs with different degrees of parallelism.
....increased available parallelism. Three models have been proposed [27, 25]: Fixed work. This assumes that the work done by a job is fixed, and parallelism is used to solve the same problems faster. Therefore the runtime is assumed to be inversely proportional to the degree of parallelism. This model is the basis for Amdahl's law [2]. Fixed time [10, 11]. Here it ....

J. P. Singh, J. L. Hennessy, and A. Gupta, "Scaling parallel programs for multiprocessors: methodology and examples". Computer 26(7), pp. 42--50, Jul 1993.
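The excerpt from Table 2 does not show its column headers, so reading the last column as the coefficient of variation (standard deviation divided by mean) is an interpretation. The short check below recomputes that ratio from the two preceding columns, using only the rows reproduced above, and lists any row where the rounded result disagrees with the printed value.

```python
# Rows of Table 2 as reproduced above: (parallelism, mean, std_dev, printed_cv).
# The column meanings are an assumption inferred from the numbers themselves.
ROWS = [
    ("4", 1116.7, 4171.5, 3.7),
    ("8", 705.2, 2344.3, 3.3),
    ("16", 569.3, 1970.9, 3.5),
    ("32", 1305.3, 3311.6, 2.5),
    ("64", 2350.8, 4155.8, 1.8),
    ("128", 3280.1, 4408.1, 1.3),
    ("system", 44.1, 438.0, 9.9),
]


def coefficient_of_variation(mean: float, std_dev: float) -> float:
    """Standard deviation relative to the mean."""
    return std_dev / mean


def mismatched_rows(rows):
    """Labels of rows whose recomputed CV, rounded to one decimal, differs."""
    return [
        label
        for label, mean, std_dev, printed_cv in rows
        if round(coefficient_of_variation(mean, std_dev), 1) != printed_cv
    ]


if __name__ == "__main__":
    print(mismatched_rows(ROWS))  # []
```

An empty list means every printed value is consistent with the std/mean reading of the columns.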


Performance Portability and Scalability in Shared-Address-Space.. - Jiang (2000)

....enough processor and node to make application performance reasonable, and our goal is not to compare with the absolute performance of other processors. As the number of processors changes, we need a scaling model under which to scale the problem size [81, 71]. We use the three major models, which may each be applicable in different circumstances: problem constrained (PC) or constant problem size scaling, time constrained (TC) scaling, and memory constrained (MC) scaling. Most of the chapter focuses on PC scaling, under which speedup is defined simply ....

....($\mathrm{Time}(p) = \mathrm{Time}(1)$). From (2.1), speedup can be measured as the increase in the useful work done during that fixed execution time: $\mathrm{Speedup}(p) = \mathrm{Work}(p) / \mathrm{Work}(1)$. Memory constrained scaling (MC): Here neither work nor time remains fixed (in fact, both can increase dramatically, as discussed in [71]). We must therefore resort to the full expression in (2.1), so $\mathrm{Speedup}(p) = \frac{\text{Increase in Work Done}}{\text{Increase in Time Taken}}$, where each increase is a ratio. For the memory constrained scaling expression, if the increase in execution time were only due to the increase in work and not due to ....

[Article contains additional citation context not shown here]

J. P. Singh, J. L. Hennessy, and A. Gupta. Scaling parallel programs for multiprocessors: Methodology and examples. IEEE Computer, 26(7):42--50, July 1993.
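The three speedup definitions in the excerpt can be written as small functions. This is a sketch that follows the excerpt's own definitions: under PC scaling the work is fixed, so speedup is the ratio of execution times; under TC scaling the time is fixed, so speedup is Work(p)/Work(1); under MC scaling both change, so speedup is the increase in work divided by the increase in time, each expressed as a ratio. The function names are illustrative, and "work" stands for whatever measure of useful work a given study uses.

```python
def speedup_pc(time_1: float, time_p: float) -> float:
    """Problem-constrained scaling: same problem, so compare execution times."""
    return time_1 / time_p


def speedup_tc(work_1: float, work_p: float) -> float:
    """Time-constrained scaling: execution time is held fixed, so compare work."""
    return work_p / work_1


def speedup_mc(work_1: float, time_1: float, work_p: float, time_p: float) -> float:
    """Memory-constrained scaling: increase in work done / increase in time taken."""
    return (work_p / work_1) / (time_p / time_1)


if __name__ == "__main__":
    # 64x more useful work done in 2x the time gives a speedup of 32.
    print(speedup_mc(1.0, 10.0, 64.0, 20.0))  # 32.0
    # With work held fixed, the MC expression reduces to the PC one ...
    print(speedup_mc(5.0, 100.0, 5.0, 25.0) == speedup_pc(100.0, 25.0))  # True
    # ... and with time held fixed, it reduces to the TC one.
    print(speedup_mc(1.0, 10.0, 8.0, 10.0) == speedup_tc(1.0, 8.0))  # True
```

Writing the MC form as the general case makes it visible that PC and TC are its two special cases, which is why the excerpt calls it the "full expression".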


Improving the Performance of Shared Virtual Memory on System Area.. - Bilas (1998)   (5 citations)

.... become increasingly complex and irregular as we try to solve more realistic problems, and it has also been shown to deliver very good performance when supported in hardware in tightly coupled multiprocessors, at least up to the 64-128 processor scale where experiments have been performed [58, 57, 48, 87, 88, 96, 89]. It is also the programming model of choice for small-scale multiprocessors, so it provides a graceful migration path. The last of the programming model possibilities (message passing everywhere) not only forces programmers to use explicit message passing but also does ....

J. P. Singh, J. L. Hennessy, and A. Gupta. Scaling parallel programs for multiprocessors: Methodology and examples. IEEE Computer, 26(7):42--50, July 1993.


Communication Optimizations for Irregular Scientific.. - Das, Uysal, Saltz, Hwang (1993)   (103 citations)

....the increase in system size [10]. Finally, in time constrained scaling, the application parameters are scaled so that the running time does not change while the system size changes. All of these models have some deficiencies and none of them are universal for all applications, as pointed out by [19]. There is a wide range of optimizations designed to improve the performance of problems on multiprocessor architectures. Any given optimization targets a certain class of problems and the effectiveness of the optimization varies with problem characteristics. We believe that it is also important ....

J. P. Singh, J. L. Hennessy, and A. Gupta. Scaling parallel programs for multiprocessors: Methodology and examples. IEEE Computer, pages 42--50, July 1993.


Advances in the Dataflow Computational Model - Najjar, Lee, Gao (1999)   (1 citation)

....origin or not) globally addressable memory provides a seamless extension of the memory model viewed by threads assigned to the same processing node. For applications with irregular and unpredictable data needs, the lack of global naming capability can hurt the programmability and efficiency [96, 104]. The ability to name memory objects globally will also facilitate the support of dynamic load balancing of threads. As a result, shared address space is adopted by most multithreaded execution models with dataflow origin. Consistency and Replication A memory consistency model represents a ....

J. P. Singh, J. L. Hennessy, and A. Gupta. Scaling parallel programs for multiprocessors: Methodology and examples. IEEE Computer, 26(7):42--50, 1993.


Cascaded Execution: Speeding Up Unparallelized.. - Anderson, Nguyen.. (1998)   (2 citations)

....[9] The original reference data set provided with wave5 is sized inappropriately for the caches on today's machines: the data set processed by each call to PARMVR is less than 300KB. Larger problem sizes provided with the benchmark grow along the time dimension but not in the space dimension [16]. Since the original data set was too small to be representative of problems likely to be run on today's parallel machines, we enlarged the problem by increasing the amount of data accessed in each loop. In the enlarged problem, the amount of data accessed by each loop ranges from 256KB to 17MB. ....

J. P. Singh, J. L. Hennessy, and A. Gupta. Scaling parallel programs for multiprocessors: Methodology and examples. Computer, 26(7):42--50, 1993.


Scalability and Load Imbalance for Domain Decomposition Based.. - Wilders

....is $N$. Extensive data with regard to timing and parallel performance (on an SP2) are available for linearly scaled problems, i.e. $N = N_0 Q$ (with $N_0 = 3200$) and $Q = 4, 9, 16$. As soon as $N$ varies, either in the analysis or in experiments, it is important to take application parameters into account [SHG93]. From the theory of hyperbolic difference schemes it is known that the Courant number is the vital similarity parameter. Therefore, we fix the Courant number. This means that both the spatial grid size $h$ (with $N = O(1/h^2)$) and the time step vary, with the ratio of the time step to $h$ held constant. For linearly scaled problems this ....

Singh J., Hennessy J., and Gupta A. (1993) Scaling parallel programs for multiprocessors: methodology and examples. IEEE Computer July: 42--50.
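The excerpt's point about application parameters can be made concrete with a little arithmetic. A sketch under stated assumptions: on the 2-D grid, $N = O(1/h^2)$, so $h$ shrinks like $1/\sqrt{N}$; a fixed Courant number makes the time step shrink in proportion to $h$, so simulating a fixed time interval (an assumption) needs a number of steps that grows like $\sqrt{N}$. If every grid point costs the same per step (also an assumption), total work grows like $N^{3/2}$. Treating $Q$ in $N = N_0 Q$ as the number of subdomains, one per processor (our reading, not stated in the excerpt), work per processor then grows like $\sqrt{Q}$ even though the grid points per processor stay at $N_0$.

```python
import math

N0 = 3200  # grid points per subdomain in the linearly scaled runs quoted above


def relative_work_per_processor(q: int, n0: int = N0) -> float:
    """Work per processor relative to the Q = 1 run.

    Assumes one subdomain per processor, time steps scaling like sqrt(N)
    (fixed Courant number, fixed simulated interval), and a constant cost
    per grid point per step.
    """
    n = n0 * q                       # linear scaling: N = N0 * Q
    growth = n / n0                  # growth of the grid, equal to Q
    steps_ratio = math.sqrt(growth)  # time step shrinks like h ~ 1/sqrt(N)
    total_work_ratio = growth * steps_ratio
    return total_work_ratio / q      # spread evenly over Q processors


if __name__ == "__main__":
    for q in (4, 9, 16):
        print(q, relative_work_per_processor(q))  # prints 4 2.0, then 9 3.0, then 16 4.0
```

So under these assumptions, linearly scaled runs at Q = 16 would do about four times as much work per processor as the Q = 1 run, even with the same memory footprint per processor.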


Conclusions - Scaling Model And (1997)

....Analysis, IEEE Int'l Conf. on Application-Specific Systems, Architectures, and Processors, 1997, pp. 304-315. [6] D. Royo, M. Valero-García, and A. González, A Jacobi-based Algorithm for Computing Symmetric Eigenvalues and Eigenvectors in a Two-dimensional Mesh, Research Report UPC-DAC-1997-54. [7] J.P. Singh, J.L. Hennessy and A. Gupta, Scaling Parallel Programs for Multiprocessors: Methodology and Examples, IEEE Computer, pp. 42-50, July 1993. [8] J.H. Wilkinson, The Algebraic Eigenvalue Problem, Clarendon Press, Oxford, 1965. .... the observation that A and U require $2n^2$ data ....

....into account in the Min operator to compute $n_L(p)$. 2.3.3 Memory constrained scalability analysis. In this case, when increasing the number of nodes, the user increases the problem size as much as possible without exceeding the amount of memory of the system (memory constrained scaling model [7]). Therefore, the figure of merit $F(p)$ is simply the size of the greatest problem which can be solved in a system of $p$ nodes: .... Since the problem size must be greater than or equal to G 1 (p), the constraint that limits the scalability of the system is, in this case: .... 3 Example of application ....

[Article contains additional citation context not shown here]

J.P. Singh, J.L. Hennessy and A. Gupta, "Scaling Parallel Programs for Multiprocessors: Methodology and Examples," IEEE Computer, pp. 42-50, July 1993.
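The figure of merit F(p) in the excerpt, the largest problem that fits in the memory of p nodes, can be sketched for the Jacobi eigenvalue example, where the excerpt notes that A and U require $2n^2$ data. The assumptions here are hypothetical: each node has R words of main memory, each matrix element takes one word, and nothing else competes for memory. F(p) is then the largest n with $2n^2 \le pR$.

```python
import math


def max_problem_size(p: int, words_per_node: int, arrays: int = 2) -> int:
    """Largest n with arrays * n**2 <= p * words_per_node.

    Hypothetical memory-constrained model: `arrays` n x n matrices (A and U
    in the excerpt, hence the default of 2) must fit in the aggregate memory
    of p nodes, one word per element.
    """
    if p < 1 or words_per_node < 0 or arrays < 1:
        raise ValueError("need p >= 1, words_per_node >= 0 and arrays >= 1")
    # For integer n, n**2 <= total / arrays exactly when n**2 <= total // arrays.
    return math.isqrt(p * words_per_node // arrays)


if __name__ == "__main__":
    R = 2 ** 20  # assumed words of memory per node
    for p in (1, 2, 8, 32):
        print(p, max_problem_size(p, R))
```

In this model F(p) grows only like the square root of p, because the data requirement is quadratic in n; `math.isqrt` keeps the boundary case exact instead of relying on a floating-point square root.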


Job Scheduling in Multiprogrammed Parallel Systems - Feitelson (1997)   (16 citations)

....While memory availability is indeed growing at an amazing rate, so are processor speeds and user requirements. Therefore, requiring jobs to fit into available memory forces users to take memory availability into account when designing programs, and might limit the computations that can be performed [536]. Providing virtual memory is increasingly recognized as a necessity [532]. Systems that use preemption, on the other hand, cannot afford not to support memory management as well. When PEs are shared among a number of jobs, so is the memory. The more jobs there are, the harder it is to ....

J. P. Singh, J. L. Hennessy, and A. Gupta, "Scaling parallel programs for multiprocessors: methodology and examples". Computer 26(7), pp. 42--50, Jul 1993.


Memory Usage in the LANL CM-5 Workload - Feitelson (1997)   (5 citations)

....run longer. However, it seems that all these models are oversimplified to the point where it is hard to correlate them with measured results. In particular, users configure their applications according to their needs rather than according to the way resources happen to be packaged in the machine [18]. Thus users rarely use all the memory available, on any size partition. It is true, however, that they tend to use more on larger partitions. Finally, we note that modeling the memory usage distribution itself is not easy, because it does not seem to be similar to commonly used analytical ....

J. P. Singh, J. L. Hennessy, and A. Gupta, "Scaling parallel programs for multiprocessors: methodology and examples". Computer 26(7), pp. 42--50, Jul 1993.


Developing Parallel Real-Time Applications in the Hamlet.. - van Steen, Dam, Vogel (1993)

....from development of parallel scientific applications in at least two ways: (1) the reason why parallelism is exploited, and (2) how it is exploited. ad. 1: Reasons for exploiting parallelism. An important reason for exploiting parallelism in the case of scientific applications is due to scaling [12]: we simply want to do more in the same time. This means that the volume of data that is to be processed is enlarged, or that the time step is decreased to reduce the computational error. However, the reason for exploiting parallelism in real-time applications originates from the demand to meet ....

J.P. Singh, J.L. Hennessy, and A. Gupta. "Scaling Parallel Programs for Multiprocessors: Methodology and Examples." Computer, 26(7):42--50, July 1993.


Performance of a Fully Parallel Sparse Solver - Heath, Raghavan (1996)   (11 citations)

....number of processors varies. Some plausible invariants are: total problem size (Amdahl 1967); work per processor (Gustafson 1988); total execution time (Worley 1990); memory per processor (Sun 1993); efficiency (Grama et al. 1993); and computational error, e.g. discretization error (Singh et al. 1993). While a fixed problem size is generally too restrictive, keeping the amount of memory used per processor constant as the number of processors grows often allows the problem to grow at an impractically high rate, since the amount of ....

Singh, J.P., Hennessy, J.L., and Gupta, A. 1993. Scaling parallel programs for multiprocessors: methodology and examples. IEEE Computer, 26(7):42--50.
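Holding work per processor constant, the Gustafson invariant in the list above, leads to the scaled-speedup formula $S(p) = s + (1 - s)p$, where $s$ is the serial fraction of the time measured on the parallel machine. This contrasts with the fixed-size (Amdahl) bound $1/(s + (1 - s)/p)$, in which $s$ is instead a fraction of the single-processor time, so the two values of $s$ are not directly comparable. The sketch below is a minimal illustration of the scaled-speedup formula only; it assumes the serial part does not grow with the problem.

```python
def gustafson_scaled_speedup(p: int, serial_fraction: float) -> float:
    """Scaled speedup with work per processor held constant (Gustafson 1988).

    serial_fraction is the share of the *parallel* run time spent in serial
    code, assumed not to grow as the problem is scaled up with p.
    """
    if p < 1 or not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("need p >= 1 and serial_fraction in [0, 1]")
    return serial_fraction + (1.0 - serial_fraction) * p


if __name__ == "__main__":
    # Scaled speedup grows linearly in p instead of saturating.
    for p in (1, 16, 64, 256):
        print(p, round(gustafson_scaled_speedup(p, 0.05), 2))
```

The linear growth is exactly why the choice of invariant matters: the same machine can look saturated under a fixed-size model and nearly linear under a fixed-work-per-processor model.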


Interactive Parallel Volume Rendering Using the PVR System - Silva, Lok, Kaufman

....systems. Algorithm researchers have concentrated on the design of highly efficient parallel volume rendering algorithms [SS91, MPS92, MPHK94, Hsu93, CC93, Neu93, NL92, WS93]. Their work has been concerned primarily with optimizing performance with respect to constant problem size (CPS) scaling [SHG93] and has largely ignored systems issues. Lately, some work has been done in developing distributed rendering systems that can make use of available parallel resources [ABSS94, AMD94, RLGB94]. Unfortunately, this latter work has focused on developing machine-dependent, self-contained applications ....

J. P. Singh, J. L. Hennessy, and A. Gupta. Scaling parallel programs for multiprocessors: Methodology and examples. IEEE Computer, 26(7):42--50, 1993.


Data-Parallel Numerical Weather Forecasting - Wolters, Cats, Gustafsson (1995)

....versions of the HIRLAM model with respect to the pure forecast calculations. As explained in section 2 each method has its own characteristics resulting in different spatial and temporal resolutions. This makes a comparison not trivial. A general discussion about this topic can be found in [9]. Our strategy consists of comparing the execution times of the different methods for performing calculations on the same physical area during the same simulated time span with the same accuracy. A first comparison between the gridpoint and spectral model can be based on the tables 1 and 2. In ....

J.P. Singh, J.L. Hennessy, and A. Gupta, Scaling Parallel Programs for Multiprocessors: Methodology and Examples, IEEE Computer, Vol. 26, No. 7, July 1993, 42--50.


Software-Directed Register Deallocation for.. - Lo, Parekh, Eggers, .. (1999)   (5 citations)

.... today's memory hierarchies, as well as those of tomorrow's low-cost processors, such as multimedia coprocessors, and (2) it provides a more appropriate ratio between data set and cache size, modeling programs with larger data sets or data sets with less data locality than those in our benchmarks [19]. We also examined a variety of register file sizes, ranging between 264 and 352, to gauge the sensitivity of the register file management techniques to register size. With more than 352 registers, other processor resources, such as the instruction queues, become performance bottlenecks. At the ....

J. P. Singh, J. L. Hennessy, and A. Gupta. Scaling parallel programs for multiprocessors: Methodology and examples. IEEE Computer, 26(7):42--50, July 1993.


A Methodology for User-Oriented Scalability Analysis - Royo, Valero-García.. (1997)   (1 citation)

....operation. In this work, we assume that W depends exclusively on n and, therefore, increasing the problem size is considered equivalent to increasing the input data size. Other authors have pointed out that the input data size is not always the only parameter which determines the problem size [6]. Examples of other parameters which influence the problem size are the numerical accuracy or the interval between timesteps. Anyway, we believe that the proposals of this paper can be easily extended to a more general view of the problem size. We denote by R the size of the main memory per node ....

J.P. Singh, J.L. Hennessy and A. Gupta, "Scaling Parallel Programs for Multiprocessors: Methodology and Examples," IEEE Computer, pp. 42-50, July 1993.


The Hamlet Design Entry System - An overview of ADL and its.. - van Steen et al. (1994)

....parallelism has received considerable attention from the scientific research community. In many cases, research has focussed on exploiting parallelism for solving problems of increasing size. In particular, solutions have been targeted towards increase of scalability of scientific applications [22]. However, the reasons for exploiting parallelism in real time applications originate from the demand to meet harder timing constraints rather than from scalability issues. Exploiting parallelism in these cases leads to more intricate models by which one can analyze the actual behavior of the ....

J.P. Singh, J.L. Hennessy, and A. Gupta. "Scaling Parallel Programs for Multiprocessors: Methodology and Examples." Computer, 26(7):42--50, July 1993.


Towards Modeling the Performance of a Fast Connected.. - Lumetta.. (1996)   (9 citations)

....size is obviously immense. We choose to prune the input space by using four graphs varying in mesh dimension and edge probability. For each graph, we scale the size of the graph with the number of processors, so that the number of nodes per processor is held constant, i.e. memory constrained scaling [29]. For each data point, we average execution time for twenty graph instances with the specified degree and edge probability. The result is presented as a normalized rate: millions of nodes processed per second (Mn/s). In this section, we explain each of these choices and build a framework for ....

....Memory constrained scaling reflects the desire to use larger machines to solve larger problems, rather than to solve the same problem in less time. However, if the inherent work in the algorithm increases too rapidly with problem size, problems on large machines run too long to be useful [29]. For connected components, however, the work is roughly proportional to the number of nodes in the graph. A connected components algorithm must mark each node, examine each edge, and label each component in the graph. Intuitively, one expects the total work W required for the algorithm to be a ....

[Article contains additional citation context not shown here]

J. P. Singh, J. Hennessy, A. Gupta, "Scaling Parallel Programs for Multiprocessors: Methodology and Examples," IEEE Computer 26(7), July 1993, pp. 42-50.
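The concern in the excerpt above, that under memory-constrained scaling problems on large machines may run too long when work grows faster than linearly in problem size, can be quantified with a sketch. The assumptions are hypothetical: data size grows in proportion to p, work grows as data^k, and speedup is perfect, so ideal execution time on p processors relative to one processor is p^k / p. For connected components the excerpt says work is roughly proportional to the number of nodes (k of about 1), which gives constant time; for comparison, dense matrix factorization has k = 3/2 (work like n^3 on n^2 data), so time grows like the square root of p.

```python
def mc_time_ratio(p: int, work_exponent: float) -> float:
    """Ideal execution time on p processors relative to 1 processor under
    memory-constrained scaling.

    Assumes data size grows linearly with p (fixed data per processor),
    work grows as data**work_exponent, and parallel speedup is perfect.
    """
    if p < 1 or work_exponent <= 0:
        raise ValueError("need p >= 1 and work_exponent > 0")
    work_ratio = float(p) ** work_exponent
    return work_ratio / p


if __name__ == "__main__":
    for p in (1, 64, 1024):
        # Roughly linear work (k = 1) versus dense factorization (k = 1.5).
        print(p, mc_time_ratio(p, 1.0), round(mc_time_ratio(p, 1.5), 2))
```

Even with perfect speedup, the k = 1.5 case runs 32 times longer on 1024 processors than on one, which is the kind of growth that makes memory-constrained results impractical for superlinear-work algorithms.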


Thread Scheduling for Cache Locality - Philbin, Edler, Anshus, Douglas, Li (1996)   (23 citations)

....time saved is about 7.9 seconds, close to the actual time saved (7.4 seconds). The crude analysis again does not take many important details into account. It only offers some implications of where the time was saved from. 4.4 N-body. N-body is a program that uses the Barnes-Hut algorithm [6, 36] to solve a three-dimensional N-body problem. Each body has a position, mass, and velocity. Every body exerts a gravitational force on every other body. The program computes the motion of the bodies during some number of time steps. For each time step (iteration), the algorithm creates a Barnes-Hut ....

J. P. Singh, J. L. Hennessy, and A. Gupta. Scaling Parallel Programs for Multiprocessors: Methodology and Examples. Computer, 26(7):42--50, July 1993.


Workload Modeling for Parallel Processing Systems - Kotsis (1995)

.... independent way (e.g. specifying the communication demands in terms of the type and size of messages to be transferred [Kats 92] and not in terms of the communication times). In performance modeling, domain-oriented parameters are mainly used in scalability studies [Azmy 92, Jako 93, Sing 93] but also in load balancing [Nico 89] or in providing assistance in parallelization of sequential programs [Fahr 93]. .... The problem size for the assignment problem is given by the dimension of the matrix. In the examples in this chapter, a problem size of 100 ....

J. P. Singh, J. L. Hennessy, and A. Gupta. "Scaling Parallel Programs for Multiprocessors: Methodology and Examples". IEEE Computer, pp. 42--50, July 1993.


A Metric for Parallel Poly-Algorithm Design - Luke, Banicescu, Li

....computational architecture and processors used. Defining work in terms of minimum number of operations avoids the tedious and often difficult measurement of the best sequential execution time. The scaling path was introduced in these definitions to address the concerns of Singh et al. [12]. They have shown that a naive scaling of problem size in N-body simulations yielded incorrect scaling analysis. In the case of N-body simulations, preserving accuracy while scaling problem size was suggested as a more realistic approach to scaling. In this paper it is assumed that the choice of ....

J. P. Singh, J. L. Hennessy, and A. Gupta. Scaling Parallel Programs for Multiprocessors: Methodology and Examples. In Computer, pages 42--50, July 1993.


The Optimal Effectiveness Metric for Parallel Application.. - Luke, Banicescu, Li

....computational architecture and processors used. Defining work in terms of minimum number of operations avoids the tedious and often difficult measurement of the best sequential execution time. The scaling path was introduced in these definitions to address the concerns of Singh et al. [17]. They have shown that a naive scaling of problem size in N-body simulations yielded incorrect scaling analysis. In the case of N-body simulations, preserving accuracy while scaling problem size was suggested as a more realistic approach to scaling. In this paper it is assumed that the choice of ....

J. P. Singh, J. L. Hennessy, and A. Gupta. Scaling Parallel Programs for Multiprocessors: Methodology and Examples. In Computer, pages 42--50, July 1993.


Portable and Efficient Parallel Computing Using the BSP .. - Goudreau, Lang, Rao.. (1998)   (7 citations)

....the problem to the number of processors. On the other hand, it can be argued that for many problems increasing the input size to the point of efficiency will eventually become unrealistic as the number of processors increases, due to the resulting increase in the overall execution time (see, e.g., [56]). Underlying this argument, however, is an assumption that as we increase the number of processors, the power of each individual processor stays the same. It is important to realize that as the speed and memory size of today's processors continue to increase rapidly, we will be able to run larger ....

J. P. Singh, J. L. Hennessy, and A. Gupta, "Scaling parallel programs for multiprocessors: Methodology and examples," Computer, vol. 26, no. 7, pp. 42--50, July 1993.


Software-Directed Register Deallocation for.. - Lo, Parekh, Eggers, .. (1997)   (5 citations)

.... today's memory hierarchies, as well as those of tomorrow's low-cost processors, such as multimedia coprocessors, and (2) it provides a more appropriate ratio between data set and cache size, modeling programs with larger data sets or data sets with less data locality than those in our benchmarks [19]. We also examined a variety of register file sizes, ranging between 264 and 352, to gauge the sensitivity of the register file management techniques to register size. With more than 352 registers, other processor resources, such as the instruction queues, become performance bottlenecks. At the ....

J. P. Singh, J. L. Hennessy, and A. Gupta. Scaling parallel programs for multiprocessors: Methodology and examples. IEEE Computer, 26(7):42--50, July 1993.


Cascaded Execution: Speeding Up Unparallelized.. - Anderson, Nguyen.. (1998)   (2 citations)

....[11] The original reference data set provided with wave5 is sized inappropriately for the caches on today's machines: the data set processed by each call to PARMVR is less than 300 KB. Larger problem sizes provided with the benchmark grow along the time dimension but not in the space dimension [24]. Since the original data set was too small to be representative of problems likely to be run on today's parallel machines, we enlarged the problem by increasing the amount of data accessed in each loop. In the enlarged problem, the amount of data accessed by each loop ranges from 256 KB to 17 MB. ....

J.P. Singh, J.L. Hennessy, and A. Gupta. Scaling Parallel Programs for Multiprocessors: Methodology and Examples. Computer, 26(7), 1993.


Tuning Compiler Optimizations for Simultaneous.. - Lo, Eggers, Levy.. (1997)   (2 citations)

....parameters. When there is a choice of values, the first (the more aggressive) represents a forecast for an SMT implementation roughly three years in the future and is used in all experiments. The second set is more typical of today's memory subsystems and is used to emulate larger data set sizes [29]; it is used in the tiling studies only. Table 2. We model the cache behavior, as well as bank and bus contention. Two TLB sizes were used for the loop distribution experiments (48 and 128 entries) to illustrate how the performance of loop distribution policies is sensitive to TLB size. The ....

J. P. Singh, J. L. Hennessy, and A. Gupta. Scaling parallel programs for multiprocessors: Methodology and examples. IEEE Computer, 26(7):42--50, July 1993.


Software-Directed Register Deallocation for.. - Lo, Parekh, Eggers, ..   (5 citations)

.... today's memory hierarchies, as well as those of tomorrow's low-cost processors, such as multimedia coprocessors, and (2) it provides a more appropriate ratio between data set and cache size, modeling programs with larger data sets or data sets with less data locality than those in our benchmarks [21]. We also examined a variety of register file sizes, ranging between 264 and 352, to gauge the sensitivity of the register file management techniques to register size. With more than 352 registers, other processor resources, such as the instruction queues, become performance bottlenecks. At the ....

J. P. Singh, J. L. Hennessy, and A. Gupta. Scaling parallel programs for multiprocessors: Methodology and examples. IEEE Computer, 26(7):42--50, July 1993.


The Myth of Scalable High Performance - Alpern, Carter (1995)   (4 citations)

....(or, at least, be bounded away from zero) as the number of processors increases. However, parallel efficiency cannot remain constant (as P gets very large) on a fixed-size problem instance. Some convention must be adopted that tells how to increase problem size with the number of processors [SHG93]. The particular convention chosen can make the difference between scalability and non-scalability. To get around this problem, Hockney [H91] advocates replacing two-dimensional scalability graphs (speed versus P) by three-dimensional performance landscapes of speed as a function of ....

Singh, J. P., J. L. Hennessy, and A. Gupta, "Scaling Parallel Programs for Multiprocessors: Methodology and Examples," Computer, Vol. 26, No. 7, pp. 42--50 (July, 1993).


A Methodology and an Evaluation of the SGI Origin2000 - Jiang, Singh   (3 citations)  Self-citation (Singh)

....usually measured as speedup over the best uniprocessor execution. Origin has a fast enough processor and node, and our goal is not to compare with the absolute performance of other processors. As the number of processors changes, we need a scaling model under which to scale the problem size [17, 14]. We use the three major models, which may each be applicable in different circumstances: problem constrained (PC) or constant problem size scaling, time constrained (TC) scaling, and memory constrained (MC) scaling. Most of the paper focuses on PC scaling, under which speedup is defined simply ....

.... ($\mathrm{Time}(p) = \mathrm{Time}(1)$). From (1), speedup can be measured as the increase in the useful work done during that fixed execution time: $\mathrm{Speedup}(p) = \mathrm{Work}(p) / \mathrm{Work}(1)$. Memory constrained scaling (MC): Here neither work nor time remains fixed (in fact, both can increase dramatically, as discussed in [14]). We must therefore resort to the full expression in (1), so $\mathrm{Speedup}(p) = \frac{\text{Increase in Work Done}}{\text{Increase in Time Taken}}$. For the memory constrained scaling expression, if the increase in execution time were only due to the increase in work and not due to overheads of parallelism, and if there ....

[Article contains additional citation context not shown here]

J. P. Singh, J. L. Hennessy, and A. Gupta. Scaling parallel programs for multiprocessors: Methodology and examples. IEEE Computer, 26(7):42--50, July 1993.


A Methodology and an Evaluation of the SGI Origin2000 - Dongming Jiang   (3 citations)  Self-citation (Singh)

....usually measured as speedup over the best uniprocessor execution. Origin has a fast enough processor and node, and our goal is not to compare with the absolute performance of other processors. As the number of processors changes, we need a scaling model under which to scale the problem size [17, 14]. We use the three major models, which may each be applicable in different circumstances: problem constrained (PC) or constant problem size scaling, time constrained (TC) scaling, and memory constrained (MC) scaling. Most of the paper focuses on PC scaling, under which speedup is defined simply ....

.... ($\mathrm{Time}(p) = \mathrm{Time}(1)$). From (1), speedup can be measured as the increase in the useful work done during that fixed execution time: $\mathrm{Speedup}(p) = \mathrm{Work}(p) / \mathrm{Work}(1)$. Memory constrained scaling (MC): Here neither work nor time remains fixed (in fact, both can increase dramatically, as discussed in [14]). We must therefore resort to the full expression in (1), so $\mathrm{Speedup}(p) = \frac{\text{Increase in Work Done}}{\text{Increase in Time Taken}}$. For the memory constrained scaling expression, if the increase in execution time were only due to the increase in work and not due to overheads of parallelism, and if there ....

[Article contains additional citation context not shown here]

J. P. Singh, J. L. Hennessy, and A. Gupta. Scaling parallel programs for multiprocessors: Methodology and examples. IEEE Computer, 26(7):42--50, July 1993.


Measuring and Analyzing Parallel Computing Scalability - Xiaodong Zhang Yong

No context found.

J. P. Singh, J. L. Hennessy, and A. Gupta, "Scaling parallel programs for multiprocessors: methodology and examples", IEEE Computer, 26(7), July 1993, pp. 42-50.


Exploiting Thread-Level Parallelism On . . . - Lo (1998)

No context found.

J. P. Singh, J. L. Hennessy, and A. Gupta. Scaling parallel programs for multiprocessors: Methodology and examples. IEEE Computer, 26(7):42--50, July 1993.


Computer-aided Support for Designing Parallel Real-Time.. - van Steen, Dam, Vogel (1994)

No context found.

J.P. Singh, J.L. Hennessy, and A. Gupta. "Scaling Parallel Programs for Multiprocessors: Methodology and Examples." Computer, 26(7):42--50, July 1993.


CiteSeer.IST - Copyright Penn State and NEC