Lowney, P.G., Freudenberger, S.M., Karzes, T.J., "The Multiflow Trace Scheduling Compiler", The Journal of Supercomputing, vol. 7, number 1-2, pp. 51-142, 1993

This paper is cited in the following contexts:

Microarchitectural Trade-offs in the Design of a.. - Balasubramonian..

....the drop in IPC is only 8 , when it had been 26 without the proposed mechanisms in place. This degradation can now more easily be overcome by the much faster clock expected with the 16 cluster design. 7 Related Work The idea of clustering has been around for a while. The Multiflow architecture [18] and the Limited Connectivity VLIW [9] were among the earlier works that distributed the register file among groups of ALUs. Farkas et al. [13] deal with a dynamically scheduled processor and also distribute the issue queue across the clusters. They do a compile time assignment of instructions to ....

P. Lowney, S. Freudenberger, T. Karzes, W. Lichtenstein, R. Nix, J. O'Donnell, and J. Ruttenberg. The Multiflow Trace Scheduling Compiler. Journal of Supercomputing, 7(1-2):51--142, May 1993.


Compiler Support for Scalable and Efficient Memory Systems - Barua, Lee, Amarasinghe.. (2001)   (2 citations)

....many kinds of memory disambiguation. Most of them are unrelated to bank disambiguation, which is concerned with the location of a reference. Rather, they are usually concerned with the dependence relation between references. Disambiguation of this type includes relative memory disambiguation [18], run time disambiguation [23], dynamic memory disambiguation [8, 12], and affine memory disambiguation [2, 4, 19, 34]. 7 Conclusion This paper presents Maps, a memory system for bank exposed architectures. Maps provides memory parallelism through a compiler managed set of decentralized memory ....

P. Lowney, S. Freudenberger, T. Karzes, W. Lichtenstein, R. Nix, J. O'Donnell, and J. Ruttenberg. The Multiflow Trace Scheduling Compiler. In Journal of Supercomputing, pages 51--142, Jan. 1993.


Non-Local Instruction Scheduling with Limited Code Growth - Keith Cooper Philip (1998)   (6 citations)

....Restrictions on moving operations between basic blocks are typically encoded in the dpg for the sequence. The first automated global scheduling technique was trace scheduling, originally described by Fisher [8]. The technique has been used successfully in several research and industrial compilers [7, 17]. In trace scheduling, the most frequently executed acyclic path through the function is determined using profile information. This trace is treated like a large basic block. A dpg is created for the trace, and the trace is scheduled using a list scheduler. Restrictions on inter block code ....

P. Geoffrey Lowney, Stephen M. Freudenberger, T. J. Karzes, W. D. Lichtenstein, Robert P. Nix, J. S. O'Donnell, and J. C. Ruttenberg. The Multiflow trace scheduling compiler. Journal of Supercomputing -- Special Issue, 7:51--142, July 1993.


Predicate Prediction for Efficient Out-of-order Execution - Weihaw Chuang Brad (2003)   (2 citations)

....profitability, and when selected uses internal CMOV (selects) to choose the results of the executed path (qualified true). Our prior work [4] examined a light weight ISA extension targeted at if conversion, called Phi predication. Phi predication is derived from select predication first seen in [14] with features for qualifying memory and predicate assignments, to increase the applicable control flow regions. Select predication always assigns its register destination, and behaves like regular RISC operations, thereby avoiding the multiple definition problem for out of order execution. This ....

P. G. Lowney, S. M. Freudenberger, T. J. Karzes, W. D. Lichtenstein, R. P. Nix, J. S. O'Donnell, and J. C. Ruttenberg. The Multiflow Trace Scheduling compiler. The Journal of Supercomputing, 7(1-2):51--142, May 1993.


Non-Local Instruction Scheduling with Limited Code Growth - Cooper, Schielke (1998)   (6 citations)

....Restrictions on moving operations between basic blocks are typically encoded in the dpg for the sequence. The first automated global scheduling technique was trace scheduling, originally described by Fisher [8]. The technique has been used successfully in several research and industrial compilers [7, 18]. In trace scheduling, the most frequently executed acyclic path through the function is determined using profile information. This trace is treated like a large basic block. A dpg is created for the trace, and the trace is scheduled using a list scheduler. Restrictions on interblock code motion ....

P. Geoffrey Lowney, Stephen M. Freudenberger, T. J. Karzes, W. D. Lichtenstein, Robert P. Nix, J. S. O'Donnell, and J. C. Ruttenberg. The Multiflow trace scheduling compiler. Journal of Supercomputing -- Special Issue, 7:51--142, July 1993.


Meld Scheduling: A Technique for Relaxing Scheduling.. - Abraham, Kathail, Deitrich (1998)   (1 citation)

....length. Thus, the superscalar meld scheduler does not cause any increase in schedule length while attempting to reduce stalls due to inter region dangles. 5 Related work There is a substantial body of work in the area of instruction scheduling for instruction level parallel (ILP) machines [1, 2, 4-7, 14, 15]. Most of the work, however, is directed at two related areas. The first area is the type of scheduling region, e.g. trace, superblock, hyperblock, general DAG, innermost loops. The motivation here is either to enlarge the scope of scheduling or to simplify compiler engineering. The second area ....

....will not affect the performance significantly. In contrast, both the Multiflow Trace machine and Cydra 5 relied on their respective compilers to manage all resources and latencies. Cydra 5 did have a latency stalling mechanism but only for memory operations. Consequently, the Multiflow compiler [2, 3, 14] and the Cydra 5 compiler [15] used some form of meld scheduling to ensure correctness and to get good performance. However, there is no evaluation of the benefits provided by meld scheduling over simple minded approaches such as padding. This paper generalizes the technique and quantifies the ....

[Article contains additional citation context not shown here]

G. Lowney, S. Freudenberger, T. Karzes, W. D. Lichtenstein, R. Nix, J. O'Donnell, and J. Ruttenberg, "The Multiflow Trace Scheduling Compiler," The Journal of Supercomputing, vol. 7, pp. 51-142, 1993.


Bitwidth Cognizant Architecture Synthesis of.. - Mahlke.. (2001)   (3 citations)

....the set of operations and the set of FUs into subsets before scheduling, and constraining the scheduler to bind operations to FUs of the same cluster. Operation clustering has traditionally addressed the problem of compiling programs for predefined hardware clusters of FUs and register files [8]. PICO balances the competing costs of supporting operation width and operation type by width clustering. In width clustering, the set of operations is first partitioned into subsets having similar type or similar width. After operation clusters are formed, FUs are allocated separately for each ....

....operations. As the hardware width is varied, bitwidth information on operations allows the system to determine the precise number of computational steps required for each operation. Scheduling within clusters has been used for VLIW architectures that are implemented as separate physical clusters [8] [27] [28] [29] [30]. These clustering heuristics are aimed at compilation for predefined VLIW architectures that have partitioned FUs and register files. In these machines, inter cluster communication is costly and may require the insertion of inter cluster copy operations. The goal is to ....

P. Lowney et al., "The Multiflow Trace scheduling compiler," The Journal of Supercomputing, vol. 7, pp. 51--142, Jan. 1993.


Exploiting Fine-Grain Thread Level Parallelism on.. - Keckler, Dally.. (1998)   (14 citations)

....in different parts of the program than outer loop parallelism, and that the granularity of the inner loop tasks is substantially smaller than outer loop tasks. 4. 1 Benchmarks The applications in this study are compiled using MMCC, the MAP C compiler, a derivative of the Multiflow C compiler [9]. The compiler is able to compile a sequential program across all three arithmetic clusters. However, for the experiments reported in this paper, MMCC produces sequential single cluster code, using all three execution units within a cluster as a 3 instruction wide statically scheduled machine. ....

LOWNEY, P. G., FREUDENBERGER, S. M., KARZES, T. J., LICHTENSTEIN, W. D., NIX, R. P., O'DONNELL, J. S., AND RUTTENBERG, J. C. The multiflow trace scheduling compiler. The Journal of Supercomputing 7, 1-2 (May 1993), 51-142.


Acceleration of First and Higher Order Recurrences on.. - Schlansker, Kathail (1993)   (4 citations)

....2.1 Interleaved Reduction We first describe an interleaved reduction method which is useful in the associative reduction of a number of terms to a scalar. It has been implemented within the Cydra 5 compiler (see [18] in which the method is called riffled reduction) and within the Trace compiler [12]. Consider the pseudo code of Table 1 which shows the original code for a reduction to scalar loop side by side with the interleaved reduction code. [Table 1: a) Original reduction, b) Interleaved reduction] ....

G. Lowney, et al. The Multiflow Trace Scheduling Compiler. The Journal of Supercomputing 7, 1/2 (1993), 51-142.
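
The interleaved (riffled) reduction described in the context above can be illustrated with a minimal Python sketch. It is not the Cydra 5 or Trace compiler implementation; the function name `interleaved_sum`, the parameter `b` (number of partial accumulators), and the choice of addition as the associative operator are assumptions made for illustration.

```python
def interleaved_sum(a, s_in=0, b=4):
    """Reduce the terms of `a` to a scalar using `b` partial accumulators.

    Hypothetical illustration: s[0] starts at the incoming value s_in and
    the remaining accumulators start at 0.
    """
    s = [s_in] + [0] * (b - 1)
    for i, x in enumerate(a):
        # Consecutive iterations update different accumulators, so the
        # additions no longer form a single serial dependence chain.
        s[i % b] += x
    # Final step: combine the partial results into one scalar.
    total = 0
    for partial in s:
        total += partial
    return total
```

For integers the result equals the ordinary serial sum. For floating point values the reassociation can change rounding, which is why the method is tied to associative reductions.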


Cluster Assignment and Instruction Scheduling for Partitioned.. - He

....working towards leaves (representing input values) at each node the best functional unit available at the time is chosen. The search is guided by the latency weighted depth of the nodes, so that a critical path of the computation is always searched first. The Multiflow Trace Scheduling Compiler [17] studied the ineffectiveness of bug on highly parallel code and proposed a revised algorithm. The algorithm first partitions code into components, each of which contains relatively little parallelism and a relatively large amount of shared data. It then creates a partitioning of the components into ....

....ordering the list of clusters and showed that uas outperforms bug. Chapter 3 Scope of Instruction A good instruction scheduler for partitioned register set machines must expose sufficient instruction level parallelism to effectively utilize the parallel hardware. As pointed out by the literature [11, 12, 17] only severely limited parallelism exists within basic blocks. To keep wide machines busy, we need to find more ilp by looking across basic block boundaries. To select a suitable scheduling scope for our instruction scheduler, we investigated five scheduling methods with different scopes: basic ....

P. G. Lowney, S. M. Freudenberger, T. J. Karzes, W. D. Lichtenstein, R. P. Nix, J. S. O'Donnell, and J. C. Ruttenberg. The multiflow trace scheduling compiler. The Journal of Supercomputing, 7:51--142, 1993.


Retrospective: - Software Pipelining An

....chips was, of course, just a milestone along the road to singlechip ILP processors. Today, all modern general purpose machines employ ILP and instruction scheduling is needed in all optimizing compilers. 2. CONTRIBUTIONS OF THE PAPER At the time the paper was written, trace scheduling[5] was considered to be the technique of choice for scheduling VLIW (Very Long Instruction Word) machines. This paper establishes software pipelining as a useful static scheduling technique for VLIW processors without requiring specialized architectural support. This paper has three major results. ....

P. G. Lowney, S. M. Freudenberger, T. J. Karzes, W. D. Lichtenstein, R. P. Nix, J. S. O'Donnell, and J. C. Ruttenberg. The Multiflow trace scheduling compiler. The Journal of Supercomputing, 7(1-2):51--142, 1993.


Phi-Predication for Light-Weight If-Conversion - Chuang, Calder, Ferrante (2003)

....out of order machines with predication. the instruction types are predicated, yet we are still able to efficiently perform if conversion on complex control flow. Our predicate ISA is derived from the predicated mechanisms used in the Multiflow architecture, which used a form of select predication [14, 5]. It is called Phi predication because we essentially use the same insertion location as φ functions found by Static Single Assignment (SSA) [6]. The overriding principle for the design of our Phi predication ISA was to make register writing instructions always write a value to the destination ....

....out of order IA64 processor. 2.1 Early Predicated ISA Some of the earliest forms of predication are found in the Multiflow and Cydra machines. These machines focused on fast in order, high ILP execution, as compared to our main interest in out of order execution. The Multiflow 200 and 300 series [14] implements only a select operation, while the 500 series [5] also implements conditional store and floating point instructions. A select operation instruction takes two data input registers, an input selector operand that chooses between the two, and always writes to an output destination 2 ....

[Article contains additional citation context not shown here]

P. G. Lowney, S. M. Freudenberger, T. J. Karzes, W. D. Lichtenstein, R. P. Nix, J. S. O'Donnell, and J. C. Ruttenberg. The Multiflow Trace Scheduling compiler. The Journal of Supercomputing, 7(1-2):51--142, May 1993.


Reducing the Complexity of the Register File in.. - Balasubramonian.. (2001)   (4 citations)

....in detail by Moudgill et al. [18] Wallace and Bagherzadeh [27] and Monreal et al. [17] propose delaying the allocation of registers until the time to actually write the value, thereby improving its utilization. Partitioned non hierarchical register file organizations have been proposed in the past [1, 4, 5, 8, 12, 15, 21]. These organizations have clusters of functional units, with each cluster having its own private register file. While these organizations reduce porting requirements per cluster, they still provide dedicated ports per functional unit, and they incur additional latency (in extra cycles) when ....

P. Lowney, S. Freudenberger, T. Karzes, W. Lichtenstein, R. Nix, J. O'Donnell, and J. Ruttenberg. The Multiflow Trace Scheduling Compiler. Journal of Supercomputing, 7(1-2):51--142, May 1993.


Energy Estimation and Optimization of Embedded.. - Bona, Sami..   (4 citations)

....(i.e. from 4 to 16 instructions issued per cycle) Lx comes with a complete software tool chain, where no visible changes are exposed to the programmer when the core is scaled and customized. The tool chain includes a sophisticated ILP compiler technology (derived from the Multiflow compiler [18]) and GNU tools and libraries. The Multiflow compiler includes both traditional high level optimization algorithms and aggressive code motion technology based on trace scheduling. A mix of synthesizable RTL and gate level netlist of the core processor has been used to perform the ....

P. Geoffrey Lowney, Stefan M. Freudenberger, Thomas J. Karzes, W. D. Lichtenstein, Robert P. Nix, John S. O'Donnell, and John C. Ruttenberg, "The Multiflow Trace Scheduling compiler," The Journal of Supercomputing, vol. 7, no. 1-2, pp. 51--142, 1993.


Computing Along the Critical Path - Tullsen, Calder (1998)   (1 citation)

.... compiler and processor optimizations, as shown in [2, 5, 25] Traditionally, critical path reduction optimizations have been done through a dynamic analysis of the control flow of a program [3] followed by a static analysis of the data dependences through a single high probability path or trace [14, 7, 19]. The prior work concentrates on finding and optimizing the most popular control trace path through the program, found using either edge or path profiling. In contrast, our approach concentrates on finding and optimizing the critical data paths through the complete execution of a program taking ....

P.G. Lowney, S.M. Freudenberger, T.J. Karzes, W.D. Lichtenstein, R.P. Nix, J.S. O'Donnell, and J.C. Ruttenberg. The multiflow trace scheduling compiler. Journal of Supercomputing, 7(1-2):51--142, May 1993.


Lx: A Technology Platform for Customizable VLIW.. - Faraboschi, Brown, .. (2000)   (41 citations)

....to be competitive with other 32 bit embedded platforms. Lx comes with a commercial software toolchain, where no visible changes are exposed to the programmer when the core is scaled and customized. The toolchain includes sophisticated ILP compiler technology (derived from the Multiflow compiler [7]) coupled with widely accepted GNU tools and libraries. The Multiflow compiler includes most traditional high level optimizations algorithms and aggressive code motion technology based on Trace Scheduling [5] It is considered one of the most optimized ILP compilers commercially available and is ....

Lowney, P. G. et al. (1993). "The Multiflow Trace Scheduling Compiler". The Journal of Supercomputing, 7(1/2):51-142.


Space-Time Scheduling of Instruction-Level.. - Lee, Barua.. (1998)   (46 citations)

....exploit more parallelism and thus require even more resources, the cracks in the view of a monolithic underlying processor can no longer be concealed. An early visible effect of the scalability problem in commercial architectures is apparent in the clustered organization of the Multiflow computer [19]. More recently, the Alpha 21264 [14] duplicates its register file to provide the requisite number of ports at a reasonable clock speed. As the amount of on chip processor resources continues to increase, the pressure toward this type of non uniform spatial structure will continue to mount. ....

....a hierarchy on the organization of hardware resources [22] A processor can be composed from replicated processing units whose pipelines are coupled together at the register level so that they can exploit ILP cooperatively. The VLIW Multiflow TRACE machine is a machine which adopts such a solution [19]. On the other hand, its main motivation for this organization is to provide enough register ports. Communication between clusters are performed via global busses, which in modern and future generation technology would severely degrade the clock speed of the machine. This problem points to the ....

[Article contains additional citation context not shown here]

P. Lowney, S. Freudenberger, T. Karzes, W. Lichtenstein, R. Nix, J. O'Donnell, and J. Ruttenberg. The Multiflow Trace Scheduling Compiler. In Journal of Supercomputing, pages 51--142, Jan. 1993.


Efficient Backtracking Instruction Schedulers - Abraham, Meleis, Baev (2000)

....within a basic block is limited and not sufficient for modern EPIC processors. Global schedulers use a larger scheduling region and perform code motion between basic blocks. The trace scheduler constructs a trace consisting of a linear chain of basic blocks with multiple entries and exits [5, 12]. Global schedulers restrict the scheduling regions to reduce the complexity of inserting compensation code in the side entries exits due to code motion [13, 14] The schedulers described in this paper use the superblock [15] and hyperblock [16] as the global scheduling region. Wavefront ....

P.G. Lowney, et al., "The Multiflow trace scheduling compiler," J. Supercomputing, vol. 7, no. 1, pp. 51-142, May 1993.


Compiling Regular Computations to Fine-Grained Linear Processor.. - Cronquist

....technique designed to produce efficient, compact object code for inner loops on VLIW machines. Parallelism is extracted by finding the maximum overlap of loop iterations that does not violate resource and dependence constraints. Another popular instruction level technique is trace scheduling [6, 8, 18]. Extra parallelism is found by coalescing multiple basic blocks along most frequently used paths as determined by the programmer or profiler. Given a directed acyclic graph of basic blocks, the most likely to be executed path, or trace, is merged into one super basic block. By applying standard ....

....with the extensions of space time mapping to handle dynamic control by Xue and Lengauer [32] handling software pipelined conditional branches becomes an interesting area for investigation. 7 Trace Scheduling Trace Scheduling is an instruction level compilation technique for VLIW machines [6, 8, 18]. The basic idea is to increase opportunity for parallelism by coalescing multiple basic blocks along most frequently used paths as determined by the programmer or a profiler. Given a directed acyclic graph (DAG) of basic blocks, the most likely to be executed path, or trace, is merged into one ....

[Article contains additional citation context not shown here]

P. G. Lowney, S. M. Freudenberger, T. J. Karzes, W. D. Lichtenstein, R. P. Nix, J. O'Donnell, and J. C. Ruttenberg. The multiflow trace scheduling compiler. Journal of Supercomputing, 7(1,2):51--142, 1993.


Maps: A Compiler-Managed Memory System for Raw Machines - Barua, Lee, Amarasinghe.. (1998)   (17 citations)

....and dependence inheritance; e) one possible outcome after partitioning. [Figure 6: Example of modulo unrolling. a) shows the original code (for i = 0 to 99 do A[i] endfor); b) shows the distribution of array A on a 4 processor Raw machine (Tile 0: A[0] A[4] A[8]; Tile 1: A[1] A[5] A[9]; Tile 2: A[2] A[6] A[10]; Tile 3: A[3] A[7] A[11]); c) shows the code after unrolling (for i = 0 to 99 step 4 do A[i+0] A[i+1] A[i+2] A[i+3] endfor).] After unrolling, each access refers to ....

....than point to point networks for communication. The lack of point to point VLIWs seems to explain the dearth of work on memory bank disambiguation for compiling for VLIWs. A different type of memory disambiguation is relevant on the more typical bus based VLIW machines such as the Multiflow Trace [10]. Relative memory disambiguation [10] aims to discover whether two memory accesses never refer to the same memory location. Successful disambiguation implies that accesses can be executed in parallel. Hence, relative memory disambiguation is more closely linked to dependence and pointer analysis ....

[Article contains additional citation context not shown here]

P. Lowney, S. Freudenberger, T. Karzes, W. Lichtenstein, R. Nix, J. O'Donnell, and J. Ruttenberg. The Multiflow Trace Scheduling Compiler. In Journal of Supercomputing, pages 51--142, Jan. 1993.
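
The modulo unrolling example in the first context above can be checked with a small Python sketch. It assumes a low-order interleaved layout in which element i of the array lives on tile i mod 4, as the figure's distribution suggests. The names `bank_of` and `banks_touched_per_access` are hypothetical and do not come from the Maps compiler.

```python
def bank_of(index, n_banks=4):
    # Assumed layout: array elements are interleaved across banks (tiles).
    return index % n_banks


def banks_touched_per_access(n=100, n_banks=4):
    """Model the unrolled loop 'for i = 0 to n-1 step n_banks'.

    The unrolled body holds the accesses A[i+0] .. A[i+n_banks-1]. For each
    access position k, collect the set of banks it touches over the loop.
    """
    touched = [set() for _ in range(n_banks)]
    for i in range(0, n, n_banks):
        for k in range(n_banks):
            if i + k < n:
                touched[k].add(bank_of(i + k, n_banks))
    return touched
```

Under these assumptions every access position in the unrolled body maps to exactly one bank, which is the property a compiler would need in order to bind each access to a fixed tile at compile time.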


Lifetime-sensitive Modulo Scheduling in a Production.. - Llosa, Ayguade.. (2001)   (4 citations)

....schedule them. In total is about twice as fast as the two other schedulers. 6. SMS in a production compiler In this section we describe an industrial implementation of SMS in the Equator Technologies, Inc. (ETI) optimizing compiler (introduced in [8]). ETI is a descendent of Multiflow Computer [26], Inc. that produces a family of VLIW processors for digital consumer products. 6.1. Target architecture ETI's MAP1000 processor is the target architecture used here. It is the first implementation of ETI's series of Media Accelerated Processors (MAP). The experiments were executed on a ....

P.G. Lowney, S.M. Freudenberger, T.J. Karzes, W.D. Lichtenstein, R.P. Nix, J.S. O'Donnell, J.C. Ruttenberg. The Multiflow trace scheduling compiler. Journal of Supercomputing, 7(1/2):51-142, 1993.


CARS: A New Code Generation Framework for Clustered ILP.. - Kailas, Ebcioglu..   (13 citations)

....contrast, CARS performs global register allocation in a single CARScheduling pass using pre computed use count of DEFs and tries to prevent the mapping mismatches. Cluster Scheduling: Pioneering work in code generation for clustered VLIW processors is done by Ellis [13]. The Multiflow compiler [33] performs cluster assignment using a modified version of the Bottom Up Greedy (BUG) algorithm proposed by Ellis in a number of steps and then performs register allocation and instruction scheduling in a combined manner. Desoli's Partial Component Clustering (PCC) algorithm [9] for clustered VLIW ....

P. G. Lowney, S. M. Freudenberger, T. J. Karzes, W. D. Lichtenstein, R. P. Nix, J. S. O'Donnell, and J. C. Ruttenberg. The Multiflow Trace Scheduling compiler. The Journal of Supercomputing, 7(1-2):51--142, May 1993.


CARS: A New Code Generation Framework for Clustered ILP.. - Kailas, Ebcioglu..   (13 citations)

....global register allocation in a single carscheduling pass using pre computed use count of DEFs. Moreover, cars by design tries to prevent the mapping mismatches. Cluster Scheduling: Pioneering work in code generation for clustered VLIW processors is done by Ellis [27]. The Multiflow compiler [42] performs cluster assignment using a modified version of the Bottom Up Greedy (BUG) algorithm proposed by Ellis in a number of steps and then performs register allocation and instruction scheduling in a combined manner. Desoli's Partial Component Clustering (PCC) algorithm [43] for clustered VLIW ....

P. G. Lowney, S. M. Freudenberger, T. J. Karzes, W. D. Lichtenstein, R. P. Nix, J. S. O'Donnell, and J. C. Ruttenberg, "The Multiflow Trace Scheduling compiler," The Journal of Supercomputing, vol. 7, pp. 51--142, May 1993.


Source Level Static Branch Prediction - Wong (1999)

....done only at runtime. This is undesirable because advanced compilers, especially those that attempt optimizations including some form of code motion and scheduling, global register allocation, inlining etc. need branch prediction information to achieve good results (see for example Lowney et al. [13]). The CRISP compiler [1] was among the first compilers to perform static branch prediction based on the source code. It detected loop based branches and by using previously gathered data, predicted the direction of branches based on the comparison operator and 1 This family of strategies is ....

....as taken. • Random. Here, the if and the else branches are given a 50-50 chance of being predicted as taken. The standard Unix random number generator drand48 was used for the generation of the prediction probability. This is precisely the strategy used in a trace scheduling compiler [13] which requires branch prediction at compile time to perform interprocedural code optimizations. However, there is no data on the effectiveness of this heuristic. • Heuristic S. This is based on a scoring system. Both the if and the else branches are examined as follows: if the ....

[Article contains additional citation context not shown here]

P. G. Lowney, S. M. Freudenberger, T. J. Karzes, W. D. Lichtenstein, R. P. Nix, J. S. O'Donnell, and J. C. Ruttenberg. (1993) The Multiflow Trace Scheduling Compiler. J. of Supercomputing. 7-1. 51-142.


A New Framework for Integrated Global Local Scheduling - Mantripragada, Jain, Dehnert (1998)   (7 citations)

....Section 4 describes the IGLS framework in detail. Section 5 presents the OOO issues and the adjustments made. Section 6 describes the compile speed issues that were addressed. Finally, Section 7 presents the experimental results and Section 8 summarizes this work. 2 Related Work Trace scheduling [7, 2] starts by picking the main (or the most frequently executed) trace in an acyclic flow graph. Operations are then allowed to move past branches or other independent operations within the trace. Compensation copies are inserted wherever necessary to maintain the original program order. As a result, ....

P. G. Lowney et al. The Multiflow Trace Scheduling Compiler. Journal of Supercomputing, pages 51--142, 1993.


Data Dependence Analysis of Assembly Code - Amme, Braun, Zehendner, Thomasset   (5 citations)

....register, the expression is simplified using rules of algebra, and two expressions are compared using the GCD test. The method is implemented in the Bulldog compiler, but it works on an intermediate level close to high level language. Other authors were inspired by Ellis, e.g. Lowney et al. [13], Bockle [4] and Ebcioglu et al. [15]. The approach presented by Ebcioglu is implemented in the Chameleon compiler [16] and works on assembly code. First, a procedure is transformed into SSA form [5] and loops are normalized. For gathering possible register values the same Procedure Name LOC ....

P. G. Lowney, S. M. Freudenberger, T. J. Karzes, W. D. Lichtenstein, R. P. Nix, J. S. O'Donnell, and J. C. Ruttenberg. The multiflow trace scheduling compiler. The Journal of Supercomputing, 7:51--142, 1993.


Dynamic Prediction of Critical Path Instructions - Tune, Liang, Tullsen, Calder (2001)   (20 citations)

....look for clues in the pipeline, such as those discussed here. 3 Related Work Compiler based critical path reduction optimizations have used dynamic analysis of the control flow of a program [3] followed by a static analysis of the data dependences through a single high probability path or trace [20, 8, 24]. The prior work in compiler based optimization concentrates on finding the most popular control trace path through the program, using either edge or path profiling. Static profiles assume a certain popular control path based on the training inputs or other heuristics, and cannot account for ....

P. Lowney, S. Freudenberger, T. Karzes, W. Lichtenstein, R. Nix, J. O'Donnell, and J. Ruttenberg. The multiflow trace scheduling compiler. Journal of Supercomputing, 7(1-2):51--142, May 1993.


Modulo Scheduling, Machine Representations, and.. - Eichenberger (1997)

....This dissertation focuses primarily on enhancing the first two components. To efficiently utilize the machine resources of a microprocessor, high performance compilers rely on precisely detailed machine models that account for the resources used by the operations of a schedule [19] [27] [42] [45] [59] [78]. Precise modeling of machine resources is critical to avoid resource contentions that may stall some of the pipelines or, in the absence of hardware interlocks, corrupt some of the results. Efficiently modeling the machine resources is important since high performance compilers spend a ....

....machine resources are efficiently modeled without restricting the functionality of the scheduling algorithms. In particular, our approach effectively supports schedulers that achieve high performance by using a backtracking mechanism that reverses a limited number of previous scheduling decisions [59], by using software pipelining to overlap the execution of consecutive loop iterations [48] [55] [57] [81], or a combination of both [27] [49] [78] [83]. Thus, by using our reduced machine description, the computational requirements of these scheduling algorithms, and other similar algorithms, should ....

[Article contains additional citation context not shown here]

P. G. Lowney, S. M. Freudenberger, T. J. Karzes, W. D. Lichtenstein, R. P. Nix, J. S. O'Donnell, and J. C. Ruttenberg. The Multiflow trace scheduling compiler. In The Journal of Supercomputing, volume 7, pages 51--142, 1993.


Feedback directed optimization in Compaq's compilation tools.. - Cohn, Lowney (1999)   (15 citations)  Self-citation (Lowney)

....schedules a single superblock at a time. Superblock formation also restructures the code so that the compiler can ignore the effects of infrequently executed paths. The tracer uses flow edge counts to select a trace, which is a frequently executed path through the flow graph [11]. Trace selection [11,13] starts with a seed flow graph edge. The trace is then grown forwards and backwards using the mutual most likely heuristic. The heuristic requires that for block A to be followed by block B in the trace, A must be B's most likely predecessor and B must be A's most likely successor. Loops ....

....performed poorly in some cases because early exits from the superblock are also early exits from the loop, which drastically reduces the trip count unless the same path is taken through the loop repeatedly. 3.4 Live on exit renamer The live on exit renamer was adapted from the Multiflow compiler [13]; it tries to remove a constraint that forces the compiler to create long dependent chains of operations in unrolled loops. After loops are unrolled, dependencies between uses and updates of scalar variables may prevent the scheduler from overlapping iterations. A common technique to solve this ....

P. G. Lowney et al., "The Multiflow Trace Scheduling Compiler," The Journal of Supercomputing, vol. 7, no. 1/2 (1993): 51-142.


Architecture-Independent Meta-Optimization by - Aggressive Tail Splitting (2004)

No context found.

Lowney, P.G., Freudenberger, S.M., Karzes, T.J., "The Multiflow Trace Scheduling Compiler", The Journal of Supercomputing, vol. 7, number 1-2, pp. 51-142, 1993


Advances in Adaptive Computer Technology - Koch (2004)

No context found.

Lowney P.G. et al., "The Multiflow Trace Scheduling Compiler", J. of Supercomputing, Vol. 7, No. 1-2, March 1993


The Use of Traces for Inlining in Java Programs - Borys Bradel And

No context found.

Lowney, P.G.: The multiflow trace scheduling compiler. The Journal of Supercomputing 7 (1993) 51--142


Operation Tables for Scheduling in the Presence of.. - Shrivastava.. (2004)

No context found.

P. G. Lowney, S. M. Freudenberger, T. J. Karzes, W. D. Lichtenstein, R. P. Nix, J. S. O'Donnell, and J. C. Ruttenberg. The Multiflow Trace Scheduling compiler. The Journal of Supercomputing, 7(1-2):51--142, 1993.


Compiler-Architecture Exploration using - Reservation Tables Generation

No context found.

P. G. Lowney et al. The multiflow trace scheduling compiler. J. Supercomputing, 7:51--142, 1993.


ACRES Architecture and Compilation - Ang, Schlansker (2004)

No context found.

P.G. Lowney, et al. The Multiflow Trace Scheduling Compiler. The Journal of Supercomputing, May. 7(1/2): pp. 51-142.


Efficient Backtracking Instruction Schedulers - Abraham (2000)

No context found.

P.G. Lowney, et al., "The Multiflow trace scheduling compiler," J. Supercomputing, vol. 7, no. 1, pp. 51-142, May 1993.


Clustering on the Move - Roos, Corporaal, Lamberts (2002)   (1 citation)

No context found.

P. Lowney, S. Freudenberger, T. Karzes, W. Lichtenstein, R. Nix, J. O'Donnell, and J. Ruttenberg. The Multiflow Trace Scheduling Compiler. Journal of Supercomputing, 7(1/2):51--142, Jan. 1993.


Data-Parallel Digital Signal Processors: Algorithm Mapping.. - Rajagopal (2004)

No context found.

P. G. Lowney, S. M. Freudenberger, T. J. Karzes, W. D. Lichtenstein, J. S. O'Donnell, and J. Ruttenberg. The multiflow trace scheduling compiler. The Journal of Supercomputing : Special issue on instruction level parallelism, 7(1-2):51--142, May 1993.


Efficient Modeling of Itanium® Architecture during.. - Chen, Liu, Ju, al. (2004)

No context found.

P. Lowney, S. Freudenberger, T. Karzes, W. Lichtenstein, R. Nix, J. O'Donnell, and J. Ruttenberg, "The Multiflow Trace Scheduling Compiler," Journal of Supercomputing, vol. 7, no. 1-2, pp. 51-142, May 1993.


Turning Predicate Information to Advantage to Improve Compiler.. - Simon (2002)

No context found.

P. G. Lowney, S. M. Freudenberger, T. J. Karzes, W. D. Lichtenstein, R. P. Nix, J. S. O'Donnell, and J. C. Ruttenberg. The Multiflow Trace Scheduling compiler. The Journal of Supercomputing, 7(1-2):51--142, May 1993.


Very Large Instruction Word Architectures - Binu Mathew What

No context found.

P.G. Lowney, S.M. Freudenberger, T.J. Karzes, W.D. Lichtenstein, R.P. Nix, J.S. O'Donnell, and J.C. Ruttenberg. The Multiflow trace scheduling compiler. Journal of Supercomputing, 7, 1993.


Effective Instruction Scheduling with Limited Registers - Chen (2001)

No context found.

P. Lowney, S. Freudenberger, T. Karzes, W. Lichtenstein, R. Nix, J. O'Donnell, and J. Ruttenberg. 1993. "The Multiflow Trace Scheduling Compiler," The Journal of Supercomputing 7(1/2), Kluwer Academic Publishers, May, pp. 51-142.


Exploiting Thread-Level Parallelism On . . . - Lo (1998)

No context found.

P. G. Lowney, S. M. Freudenberger, T. J. Karzes, W. Lichtenstein, R. P. Nix, J. S. O'Donnell, and J. C. Ruttenberg. The Multiflow trace scheduling compiler. Journal of Supercomputing, 7(1/2):51--142, May 1993.


Design and Implementation of a Dynamic - Matthai Philipose University

No context found.

P.G. Lowney, S.M. Freudenberger, T.J. Karzes, W.D. Lichtenstein, R.P. Nix, J.S. O'Donnell, and J.C. Ruttenberg. The Multiflow trace scheduling compiler. Journal of Supercomputing, 7, 1993.


Hardware Support for Dynamic Access Ordering: Performance of Some.. - McKee (1993)   (1 citation)

No context found.

Lowney, et al., "The Multiflow Trace Scheduling Compiler", Journal of Supercomputing, 7:1,2, May 1993.


Path-Sensitive, Value-Flow Optimizations of Programs - Bodik (1999)   (2 citations)

No context found.

P. Geoffrey Lowney, Stefan M. Freudenberger, Thomas J. Karzes, W. D. Lichtenstein, Robert P. Nix, John S. O'Donnell, and John C. Ruttenberg. The Multiflow Trace Scheduling compiler. The Journal of Supercomputing, 7(1-2):51--142, May 1993.


Automata-Based Symbolic Scheduling - Haynal (2000)   (3 citations)

No context found.

P. G. Lowney, et al., "The Multiflow Trace Scheduling Compiler", J. Supercomputing, vol. 7, no. 1, pp. 51-142, Jan. 1993.


Reducing The Impact Of Register Pressure On Software Pipelined Loops - Llosa (1996)   (8 citations)

No context found.

P.G. Lowney, S.M. Freudenberger, T.J. Karzes, W.D. Lichtenstein, R.P. Nix, J.S. O'Donnell, and J.C. Ruttenberg. The Multiflow trace scheduling compiler. The Journal of Supercomputing, 7(1/2):51--142, 1993.


Compilers for Instruction-Level Parallelism - Schlansker, al. (1997)   (4 citations)

No context found.

P.G. Lowney et al., "The Multiflow Trace Scheduling Compiler," J. Supercomputing, Vol. 7, No. 1/2, 1993, pp. 51-142.
