Results 1 - 10 of 68
SSA is Functional Programming
, 1998
Cited by 50 (0 self)
[Figure: the example program in SSA form, one basic block per line]
Block 1: i1 ← 1; j1 ← 1; k1 ← 0
Block 2: i2 ← φ(i7, i1); j2 ← φ(j7, j1); k2 ← φ(k7, k1); if k2 < 100
Block 3: i3 ← φ(i2); j3 ← φ(j2); k3 ← φ(k2); if j3 < 20
Block 4: i4 ← φ(i2); j4 ← φ(j2); k4 ← φ(k2); return j4
Block 5: i5 ← φ(i3); j5 ← φ(j3); k5 ← φ(k3); j8 ← i5; k8 ← k5 + 1
Block 6: i6 ← φ(i3); j6 ← φ(j3); k6 ← φ(k3); j9 ← k6; k9 ← k6 + 2
Block 7: i7 ← φ(i5, i6); j7 ← φ(j8, j9); k7 ← φ(k8, k9)
Yuck! This isn't "the right number of names!" There are too many variables and useless copies. More about this later. Meanwhile, we can view this program as a set of mutually recursive functions, where each function takes arguments i, j, k: function f1() = let i1 = 1, j1 = 1, k1 = 1 in f2(i1, j1, k1) function f2(i...
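The reading of SSA blocks as mutually recursive functions can be sketched in Python. This is a minimal illustration, not the paper's code: the imperative loop is inferred from the block listing in the excerpt, the initial values i = 1, j = 1, k = 0 are an assumption where the excerpt is hard to read, and the names `imperative` and `f1` through `f7` are chosen here for illustration.

```python
def imperative():
    """The loop that the SSA blocks appear to encode (assumed reading)."""
    i, j, k = 1, 1, 0
    while k < 100:
        if j < 20:
            j, k = i, k + 1
        else:
            j, k = k, k + 2
    return j


# One function per basic block; phi-functions become formal parameters.
def f1():
    return f2(1, 1, 0)          # entry block: assumed initial values

def f2(i, j, k):                # loop header: merges entry and back edge
    return f3(i, j, k) if k < 100 else f4(i, j, k)

def f3(i, j, k):                # branch on j
    return f5(i, j, k) if j < 20 else f6(i, j, k)

def f4(i, j, k):                # loop exit
    return j

def f5(i, j, k):                # then-branch: j <- i, k <- k + 1
    return f7(i, i, k + 1)

def f6(i, j, k):                # else-branch: j <- k, k <- k + 2
    return f7(i, k, k + 2)

def f7(i, j, k):                # join point: jump back to the loop header
    return f2(i, j, k)
```

Calling `f1()` gives the same result as `imperative()`. Python does not eliminate tail calls, so the functional version runs a few hundred frames deep, which stays inside the default recursion limit for this loop bound.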
Lambda-Dropping: Transforming Recursive Equations into Programs with Block Structure
, 2001
Cited by 32 (10 self)
Lambda-lifting a block-structured program transforms it into a set of recursive equations. We present the symmetric transformation: lambda-dropping. Lambda-dropping a set of recursive equations restores block structure and lexical scope. For lack ...
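The two program shapes the abstract contrasts can be sketched as follows. This is an illustrative Python example, not taken from the paper; the function names `power_block`, `power_lifted`, and `loop_lifted` are hypothetical.

```python
# Block-structured form: `loop` is local and reads `base` from the
# enclosing lexical scope.
def power_block(base, n):
    def loop(k, acc):
        return acc if k == 0 else loop(k - 1, acc * base)
    return loop(n, 1)


# Lambda-lifted form: top-level recursive equations. The free variable
# `base` has become an explicit parameter of `loop_lifted`.
def loop_lifted(base, k, acc):
    return acc if k == 0 else loop_lifted(base, k - 1, acc * base)

def power_lifted(base, n):
    return loop_lifted(base, n, 1)
```

Lambda-lifting goes from the first form to the second; lambda-dropping goes the other way, moving `loop_lifted` back inside its only caller and dropping the parameter that is then available through lexical scope.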
A Functional Correspondence between Call-by-Need Evaluators and Lazy Abstract Machines
, 2004
Single Assignment C -- efficient support for high-level array operations in a functional setting
, 2003
Silent Stores and Store Value Locality
- IEEE Transactions on Computers
, 2001
Cited by 23 (3 self)
Abstract: Value locality, a recently discovered program attribute that describes the likelihood of the recurrence of previously seen program values, has been studied enthusiastically in the recent published literature. Much of the energy has focused on refining the initial efforts at predicting load instruction outcomes, with the balance of the effort examining the value locality of either all register-writing instructions or a focused subset of them. Surprisingly, there has been very little published characterization of or effort to exploit the value locality of data words stored to memory by computer programs. This paper presents such a characterization, including detailed source-level analysis of the causes of silent stores, proposes both memory-centric (based on message passing) and producer-centric (based on program structure) prediction mechanisms for stored data values, introduces the concept of silent stores and new definitions of multiprocessor false sharing based on these observations, and suggests new techniques for aligning cache coherence protocols and microarchitectural store handling techniques to exploit the value locality of stores. We find that realistic implementations of these techniques can significantly reduce multiprocessor data bus traffic and are more effective at reducing address bus traffic than the addition of Exclusive state to an MSI coherence protocol. We also show that squashing of silent stores can provide uniprocessor speedups greater than the addition of store-to-load forwarding. Index Terms: value locality, value prediction, store optimization, false sharing, cache coherence.
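As a rough illustration of what a silent store is, the sketch below counts stores that write a value the location already holds. It is a toy model under stated assumptions, not the paper's methodology: the trace format of `(address, value)` pairs and the name `count_silent_stores` are hypothetical, and a first store to an unseen address is assumed never to be silent.

```python
def count_silent_stores(trace, memory=None):
    """Return (silent, total) for a trace of (address, value) stores.

    A store is counted as silent when the location already holds the value
    being written, so performing it would not change memory.
    """
    memory = {} if memory is None else dict(memory)
    silent = 0
    for address, value in trace:
        if address in memory and memory[address] == value:
            silent += 1              # a candidate for squashing
        else:
            memory[address] = value  # the store changes memory state
    return silent, len(trace)
```

For example, in the trace `[(0x10, 5), (0x10, 5), (0x14, 0), (0x10, 7), (0x14, 0)]` the second and fifth stores are silent.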
The pitfalls of verifying floating-point computations
- ACM Transactions on programming languages and systems
Cited by 16 (1 self)
Current critical systems often use a lot of floating-point computations, and thus the testing or static analysis of programs containing floating-point operators has become a priority. However, correctly defining the semantics of common implementations of floating-point is tricky, because semantics may change according to many factors beyond source-code level, such as choices made by compilers. We here give concrete examples of problems that can appear and solutions to implement in analysis software.
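One well-known reason the result can depend on choices made below the source level is that floating-point addition is not associative, so regrouping a sum can change its value. The Python sketch below shows this with IEEE 754 double precision; it is a generic illustration, and whether the paper uses this particular example is an assumption.

```python
def sum_left(a, b, c):
    """Evaluate (a + b) + c."""
    return (a + b) + c

def sum_right(a, b, c):
    """Evaluate a + (b + c), the same sum under a different grouping."""
    return a + (b + c)

# The large terms cancel exactly in the first grouping, leaving 1.0.
left = sum_left(1e20, -1e20, 1.0)
# In the second grouping, 1.0 is absorbed when added to -1e20.
right = sum_right(1e20, -1e20, 1.0)
```

Here `left` is `1.0` and `right` is `0.0`, although the two expressions are equal over the real numbers.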
Memory Coloring: A Compiler Approach for Scratchpad Memory Management
Cited by 16 (7 self)
Scratchpad memory (SPM), a fast software-managed on-chip SRAM, is now widely used in modern embedded processors. Compared to hardware-managed cache, it is more efficient in performance, power and area cost, and has the added advantage of better time predictability. This paper introduces a general-purpose compiler approach, called memory coloring, for efficiently allocating the arrays in a program to an SPM. The novelty of our approach lies in partitioning an SPM into a “register file”, splitting the live ranges of arrays to create potential data transfer statements between the SPM and off-chip memory, and finally, adapting an existing graph-colouring algorithm for register allocation to assign the arrays in the program into the register file. Our approach is efficient due to the practical efficiency of graph-colouring algorithms. We have implemented this work in SUIF and machSUIF. Preliminary results over benchmarks show that our approach represents a promising solution to automatic SPM management.
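The idea of treating SPM partitions like registers can be sketched with a plain greedy graph colouring. This is not the algorithm adapted in the paper: equal-sized slots, the degree-based ordering, and the names `color_arrays`, `interference`, and `num_slots` are all assumptions made for illustration, and live-range splitting is left out.

```python
def color_arrays(interference, num_slots):
    """Greedily assign each array an SPM slot, or None if none is free.

    `interference` maps an array name to the set of arrays whose live
    ranges overlap with it. Arrays with more conflicts are handled first.
    """
    assignment = {}
    order = sorted(interference, key=lambda a: (-len(interference[a]), a))
    for array in order:
        # Slots already held by interfering arrays cannot be reused.
        taken = {assignment.get(n) for n in interference[array]}
        free = [s for s in range(num_slots) if s not in taken]
        assignment[array] = free[0] if free else None  # None: stays off-chip
    return assignment
```

With two slots and arrays A, B, C that all interfere with each other, one of the three is left in off-chip memory, while an array D that only conflicts with A can share a slot with B.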
Compactly Representing First-Order Structures for Static Analysis
- In Proceedings of the 9th International Static Analysis Symposium
, 2002
Cited by 14 (1 self)
A fundamental bottleneck in applying sophisticated static analyses to large programs is the space consumed by abstract program states. This is particularly true when analyzing programs that make extensive use of heap-allocated data. The TVLA (Three-Valued Logic Analysis) program analysis and verification system models dynamic allocation precisely by representing program states as first-order structures. In such a representation, a finite collection of predicates is used to define states; the predicates range over a universe of individuals that may evolve (expand and contract) during analysis. Evolving first-order structures can be used to encode a wide variety of analyses, including most analyses whose abstract states are represented by directed graphs or maps. This paper addresses the problem of space consumption in such analyses by describing and evaluating two novel structure representation techniques. One technique uses ordered binary decision diagrams (OBDDs); the other uses a variant of a functional map data structure. Using a suite of benchmark analysis problems, we systematically compare the new representations with TVLA's existing state representation.
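To make the object being compressed more concrete, the sketch below models a three-valued first-order structure as a sparse map from predicate applications to truth values. It is neither TVLA's representation nor either of the two techniques the paper evaluates; the class `Structure`, its methods, and the choice to store only non-false entries are assumptions for illustration. The truth values 0, 1/2, 1 with `min` and `max` as conjunction and disjunction follow Kleene's three-valued logic.

```python
from fractions import Fraction

FALSE, UNKNOWN, TRUE = Fraction(0), Fraction(1, 2), Fraction(1)

def k_and(a, b):
    return min(a, b)    # Kleene conjunction

def k_or(a, b):
    return max(a, b)    # Kleene disjunction

def k_not(a):
    return 1 - a        # negation leaves UNKNOWN unchanged


class Structure:
    """A universe of individuals plus predicate values over them."""

    def __init__(self, individuals):
        self.individuals = set(individuals)
        self._values = {}   # only non-FALSE entries are stored

    def set(self, predicate, args, value):
        key = (predicate, tuple(args))
        if value == FALSE:
            self._values.pop(key, None)
        else:
            self._values[key] = value

    def get(self, predicate, args):
        return self._values.get((predicate, tuple(args)), FALSE)

    def stored_entries(self):
        return len(self._values)
```

A dense table for a binary predicate grows with the square of the universe size, which is one way to see why the space used by such structures becomes a bottleneck.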
Efficient Exploitation of Parallelism on Pentium III and Pentium 4 Processor-Based Systems
, 2001
Cited by 10 (1 self)
Systems based on the Pentium III and Pentium 4 processors enable the exploitation of parallelism at a fine- and medium-grained level. Dual- and quad-processor systems, for example, enable the exploitation of medium-grained parallelism by using multithreaded code that takes advantage of multiple control and arithmetic logic units. Streaming Single-Instruction-Multiple-Data (SIMD) extensions, on the other hand, enable the exploitation of fine-grained SIMD parallelism by vectorizing loops that perform a single operation on multiple elements in a data set. This paper provides a high-level overview of the automatic parallelization and vectorization methods used by the Intel C++/Fortran compiler developed at the Microcomputer Software Labs.
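The loop shape that vectorization produces can be imitated in plain Python. The sketch below is only a model of the transformation, not compiler output: the names `add_scalar` and `add_strip_mined` are hypothetical, and the default width of 4 is an assumption that matches four single-precision values in a 128-bit register.

```python
def add_scalar(a, b):
    """Original loop: one element per iteration."""
    return [x + y for x, y in zip(a, b)]

def add_strip_mined(a, b, width=4):
    """Same result, computed `width` elements at a time plus a cleanup loop."""
    n = min(len(a), len(b))
    main = n - n % width            # largest multiple of width within n
    out = []
    for start in range(0, main, width):
        # This chunk stands in for one SIMD instruction over `width` lanes.
        chunk_a = a[start:start + width]
        chunk_b = b[start:start + width]
        out.extend(x + y for x, y in zip(chunk_a, chunk_b))
    for idx in range(main, n):      # remainder elements, handled one by one
        out.append(a[idx] + b[idx])
    return out
```

Both functions return the same list; the second makes the main vector loop and the scalar remainder loop explicit.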
Code Generation for Embedded Processors
- 13th International Symposium on System Synthesis
, 2000
Cited by 10 (1 self)
The increasing use of programmable processors as IP blocks in embedded system design creates a need for C/C++ compilers capable of generating efficient machine code. Many of today's compilers for embedded processors suffer from insufficient code quality in terms of code size and performance. This violates the tight chip area and real-time constraints often imposed on embedded systems. The reason is that embedded processors typically show architectural features which are not well handled by classical compiler technology. This paper provides a survey of methods and techniques dedicated to efficient code generation for embedded processors. Emphasis is put on DSP and multimedia processors, for which better compiler technology is definitely required. In addition, some frontend aspects and recent trends in research and industry are briefly covered. The goal of these recent efforts in embedded code generation is to facilitate the step from assembly to high-level language progra...

