Results 1 - 10 of 15
POET: a scripting language for applying parameterized source-to-source program transformations
- In Software: Practice and Experience, 2012
Cited by 11 (3 self)
Abstract We present POET, a scripting language designed for applying advanced program transformations to code in arbitrary programming languages, as well as for building ad hoc translators between these languages. We have used POET to support a large number of compiler optimizations, including loop interchange, parallelization, blocking, fusion/fission, strength reduction, scalar replacement, and SSE vectorization, among others, and to fully support the code generation of several domain-specific languages, including automatic tester/timer generation and the automatic translation of a finite-state-machine-based behavior modeling language to C++/Java code. This paper presents key design and implementation decisions of the POET language and shows how to use various language features to significantly reduce the difficulty of supporting programmable compiler optimization for high performance computing and of supporting ad hoc code generation for various domain-specific languages.
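To illustrate one of the transformations named in this abstract, here is a minimal sketch of loop interchange on a hypothetical matrix-multiply kernel (the code is an invented example, not POET output): swapping the two inner loops changes the memory traversal order, often improving locality, without changing the result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Original i-j-k loop nest: b is read column-by-column.
Matrix matmul_ijk(const Matrix& a, const Matrix& b) {
    std::size_t n = a.size();
    Matrix c(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t k = 0; k < n; ++k)
                c[i][j] += a[i][k] * b[k][j];
    return c;
}

// After interchanging the j and k loops: b is now read row-by-row,
// the cache-friendly order for row-major storage. The result is
// identical because the additions merely happen in a different order.
Matrix matmul_ikj(const Matrix& a, const Matrix& b) {
    std::size_t n = a.size();
    Matrix c(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k)
            for (std::size_t j = 0; j < n; ++j)
                c[i][j] += a[i][k] * b[k][j];
    return c;
}
```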
An effective empirical search method for automatic software tuning
, 2005
Cited by 9 (2 self)
Empirical software optimization and tuning is an active research topic in the high performance computing community. Such a system adaptively generates optimized software using empirically searched parameters. Because the parameter search space is large, an appropriate search heuristic is an essential part of the system. This paper describes an effective search method that can be applied generally to empirical optimization. We apply this method to ATLAS (Automatically Tuned Linear Algebra Software), a system for empirically optimizing dense linear algebra kernels. Our experiments on four different platforms show that the new search scheme can produce parameters that lead ATLAS to generate a library with better performance.
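The empirical-search idea in this abstract can be sketched as a one-parameter-at-a-time (orthogonal) sweep. The cost function below is an invented stand-in for timing a generated kernel; it is not the paper's actual search method.

```cpp
#include <cassert>
#include <utility>

// Stand-in for an empirical measurement (e.g., compiling and timing
// one candidate kernel). Hypothetical optimum at x = 3, y = 5.
double cost(int x, int y) {
    return (x - 3) * (x - 3) + (y - 5) * (y - 5);
}

// One-parameter-at-a-time search: sweep each tuning parameter over
// its range while holding the others fixed at their current best.
// This evaluates O(ranges summed) candidates instead of the full
// cross product of the search space.
std::pair<int, int> orthogonal_search(int lo, int hi) {
    int bx = lo, by = lo;
    for (int x = lo; x <= hi; ++x)
        if (cost(x, by) < cost(bx, by)) bx = x;
    for (int y = lo; y <= hi; ++y)
        if (cost(bx, y) < cost(bx, by)) by = y;
    return {bx, by};
}
```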
A Comparison of Search Heuristics for Empirical Code Optimization
Cited by 9 (2 self)
Abstract—This paper describes the application of various search techniques to the problem of automatic empirical code optimization. The search process is a critical aspect of auto-tuning systems because the large size of the search space and the cost of evaluating candidate implementations make it infeasible to find the true optimum by brute force. We evaluate the effectiveness of Nelder-Mead simplex, genetic algorithms, simulated annealing, particle swarm optimization, orthogonal search, and random search in terms of the performance of the best candidate found under varying time limits.
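Of the heuristics this abstract compares, random search is the simplest to sketch. The cost function below is an invented stand-in for timing a candidate implementation, and the evaluation budget stands in for the paper's time limits; none of this is the paper's code.

```cpp
#include <cassert>
#include <random>

// Stand-in for measuring one candidate; hypothetical optimum at 42.
double measure(int x) { return (x - 42) * (x - 42); }

// Random search: draw candidates uniformly until the evaluation
// budget is exhausted, keeping the best one seen so far. A fixed
// seed makes a given run reproducible.
int random_search(int lo, int hi, int budget, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(lo, hi);
    int best = dist(gen);
    for (int i = 1; i < budget; ++i) {
        int cand = dist(gen);
        if (measure(cand) < measure(best)) best = cand;
    }
    return best;
}
```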
Self adapting numerical software (SANS) effort
- University of Tennessee, Computer Science Department
, 2006
Cited by 8 (3 self)
The challenge for the development of next-generation software is the successful management of the complex computational environment while delivering to the scientist the full power of flexible compositions of the available algorithmic alternatives. Self-Adapting Numerical Software (SANS) systems are intended to meet this significant challenge. The process of arriving at an efficient numerical solution of problems in computational science involves numerous decisions by a numerical expert. Attempts to automate such decisions distinguish three levels:
• Algorithmic decisions;
• Management of the parallel environment;
• Processor-specific tuning of kernels.
Additionally, at any of these levels we can decide to rearrange the user’s data. In this paper we look at a number of efforts at the University of Tennessee that are investigating these areas.
Classification and utilization of abstractions for optimization
- In Proc. 1st International Symposium on Leveraging Applications of Formal Methods, Paphos
, 2004
Cited by 7 (2 self)
Abstract. We define a novel approach to optimizing the use of libraries within applications. We propose that library-defined abstractions be classified to support their automated optimization; by leveraging these additional semantics, we enable library-specific optimization of application codes. We believe that such an approach entails the use of formal methods. We describe ROSE, a framework for building source-to-source translators, used for the high-level optimization of scientific applications. It is a common perception that performance is inversely proportional to the level of abstraction. Our work shows that this is not the case if the additional semantics of library-defined abstractions can be leveraged. ROSE allows the recognition of such abstractions and the optimization of their use in applications. We show how ROSE can be used to exploit these additional semantics during compile-time optimization and present promising results.
T. Panas. Extending automatic parallelization to optimize high-level abstractions for multicore
- In Proceedings of the 5th International Workshop on OpenMP (IWOMP), 2009
Cited by 4 (2 self)
Abstract. Automatic introduction of OpenMP for sequential applications has attracted significant attention recently because of the proliferation of multicore processors and the simplicity of using OpenMP to express parallelism for shared-memory systems. However, most previous research has only focused on C and Fortran applications operating on primitive data types. C++ applications using high-level abstractions, such as STL containers and complex user-defined types, are largely ignored due to the lack of research compilers that are readily able to recognize high-level object-oriented abstractions and leverage their associated semantics. In this paper, we automatically parallelize C++ applications using ROSE, a multiple-language source-to-source compiler infrastructure which preserves the high-level abstractions and allows us to unambiguously leverage their known semantics. Several representative parallelization candidate kernels are used to explore semantic-aware parallelization strategies for high-level abstractions, combined with extended compiler analyses. Those kernels include an array-based computation loop, a loop with task-level parallelism, and a domain-specific tree traversal. Our work extends the applicability of automatic parallelization to modern applications using high-level abstractions and exposes more opportunities to take advantage of multicore processors.
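The kind of output such a parallelizer produces can be sketched on an invented element-wise kernel (this is not the paper's code): since each iteration is independent, an OpenMP work-sharing directive can be inserted legally, and without OpenMP enabled the pragma is simply ignored and the loop runs serially with the same result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Element-wise array computation over a C++ container: iterations are
// independent, so a parallelizing compiler may insert the directive
// below. Compiled without OpenMP support, the pragma is ignored and
// the function still produces identical results.
std::vector<double> scale(const std::vector<double>& in, double factor) {
    std::vector<double> out(in.size());
    #pragma omp parallel for
    for (long i = 0; i < static_cast<long>(in.size()); ++i)
        out[i] = in[i] * factor;
    return out;
}
```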
Annotating User-Defined Abstractions for Optimization
Cited by 4 (0 self)
Although conventional compilers implement a wide range of optimization techniques, they frequently miss opportunities to optimize the use of abstractions, largely because they are not designed to recognize and use the relevant semantic information about such abstractions. In this position paper, we propose a set of annotations to help communicate high-level semantic information about abstractions to the compiler, thereby enabling the large body of traditional compiler optimizations to be applied to the use of those abstractions. Our annotations explicitly describe properties of abstractions that are needed to guarantee the applicability and profitability of a broad variety of such optimizations, including memoization, reordering, data layout transformations, and inlining and specialization.
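Memoization, the first optimization this abstract lists, can be sketched as follows (an invented example, not the paper's annotation syntax): an annotation asserting that a function is pure, i.e. side-effect-free and deterministic, is exactly what licenses a compiler to cache its results.

```cpp
#include <cassert>
#include <unordered_map>

// A pure function: same argument, same result, no side effects.
// An annotation declaring this property is what would license the
// memoizing transformation below.
long fib(int n) { return n < 2 ? n : fib(n - 1) + fib(n - 2); }

// The transformed version a compiler could emit once purity is known:
// results are cached by argument and recomputation is skipped,
// turning the exponential recursion into a linear one.
long fib_memo(int n) {
    static std::unordered_map<int, long> cache;
    auto it = cache.find(n);
    if (it != cache.end()) return it->second;
    long r = n < 2 ? n : fib_memo(n - 1) + fib_memo(n - 2);
    cache[n] = r;
    return r;
}
```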
Applying data copy to improve memory performance of general array computations
- In LCPC
, 2005
Cited by 4 (1 self)
Abstract. Data copy is an important compiler optimization which dynamically rearranges the layout of arrays by copying their elements into local buffers. Traditionally, array copy is considered expensive and has been applied only to the working sets of fully blocked computations. This paper presents an algorithm which automatically applies data copy to optimize the performance of general computations independent of blocking. The algorithm automatically decides where to insert copy operations and which regions of arrays to copy. In addition, when specialized, it is equivalent to a general scalar replacement algorithm on arbitrary array computations. The algorithm is fully implemented and has been applied to optimize several scientific kernels. The results show that the algorithm is highly effective and that data copy can significantly improve the performance of scientific computations, both when combined with blocking and when applied alone without blocking.
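The core data-copy idea can be sketched on an invented kernel (this is not the paper's algorithm): a column of a row-major matrix is strided in memory, so copying it once into a contiguous local buffer makes every subsequent traversal unit-stride.

```cpp
#include <cassert>
#include <vector>

// Sum column j of an n x n row-major matrix k times, reading the
// strided column directly on every pass.
double sum_col_direct(const std::vector<double>& m, int n, int j, int k) {
    double s = 0.0;
    for (int pass = 0; pass < k; ++pass)
        for (int i = 0; i < n; ++i)
            s += m[i * n + j];          // stride-n access each pass
    return s;
}

// Data copy: pay for one strided copy into a contiguous buffer, then
// traverse the buffer, so repeated passes become unit-stride.
double sum_col_copied(const std::vector<double>& m, int n, int j, int k) {
    std::vector<double> buf(n);
    for (int i = 0; i < n; ++i)
        buf[i] = m[i * n + j];          // one-time strided copy
    double s = 0.0;
    for (int pass = 0; pass < k; ++pass)
        for (int i = 0; i < n; ++i)
            s += buf[i];                // unit-stride access
    return s;
}
```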
An extensible open-source compiler infrastructure for testing
- In Proc. IBM Haifa Verification Conference, volume LNCS 3875
, 2005
Cited by 3 (2 self)
Abstract. Testing forms a critical part of the development process for large-scale software, and there is a growing need for automated tools that can read, represent, analyze, and transform the application’s source code to help carry out testing tasks. However, the support required to compile applications written in common general-purpose languages is generally inaccessible to the testing research community. In this paper, we report on an extensible, open-source compiler infrastructure called ROSE, which is currently in development at Lawrence Livermore National Laboratory. ROSE specifically targets developers who wish to build source-based tools that implement customized analyses and optimizations for large-scale C, C++, and Fortran 90 scientific computing applications (on the order of a million lines of code or more). However, much of this infrastructure can also be used to address problems in testing, and ROSE is by design broadly accessible to those without a formal compiler background. This paper details the interactions between testing of applications and the ways in which compiler technology can aid in the understanding of those applications. We emphasize the particular aspects of ROSE, such as support for the general analysis of whole programs, that are particularly well suited to the testing research community and the scale of the problems that community solves.
A ROSE-based OpenMP 3.0 research compiler supporting multiple runtime libraries
- Beyond Loop Level Parallelism in OpenMP: Accelerators, Tasking and More
, 2010
Cited by 3 (0 self)
Abstract. OpenMP is a popular and evolving programming model for shared-memory platforms. It relies on compilers to target modern hardware architectures for optimal performance. A variety of extensible and robust research compilers are key to OpenMP’s sustainable success in the future. In this paper, we present our efforts to build an OpenMP 3.0 research compiler for C, C++, and Fortran using the ROSE source-to-source compiler framework. Our goal is to support OpenMP research for ourselves and others. We have extended ROSE’s internal representation to handle all OpenMP 3.0 constructs, thus facilitating experimentation with them. Since OpenMP research is often complicated by the tight coupling of the compiler translation and the runtime system, we present a set of rules to define a common OpenMP runtime library (XOMP) on top of multiple runtime libraries. These rules additionally define how to build a set of translations targeting XOMP. Our work demonstrates how to reuse OpenMP translations across different runtime libraries, simplifying OpenMP research by decoupling the problematic dependence between the compiler translations and the runtime libraries. We present an evaluation of our work by demonstrating an analysis tool for OpenMP correctness. We also show how XOMP can be defined using both GOMP and Omni. Our comparative performance results against other OpenMP compilers demonstrate that our flexible runtime support does not incur additional overhead.
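The runtime-abstraction idea described here can be sketched as a table of function pointers that generated code targets, with each real runtime adapted behind it. All names below are invented for illustration; they are not the actual XOMP, GOMP, or Omni entry points.

```cpp
#include <cassert>

// A hypothetical common runtime interface: the compiler's translation
// calls only this, and each backend library is adapted to fill it in.
struct RuntimeIface {
    const char* name;
    void (*parallel_run)(void (*body)(int), int nthreads);
};

static int work_done = 0;
void region_body(int tid) { work_done += tid; }  // outlined region

// Backend adapters. Both run serially here for determinism; real
// adapters would forward to the underlying library's fork/join calls.
void serial_run(void (*f)(int), int nthreads) {
    for (int t = 0; t < nthreads; ++t) f(t);
}

RuntimeIface gomp_like{"gomp-like", serial_run};
RuntimeIface omni_like{"omni-like", serial_run};

// The generated code targets only the interface, so one translation
// works unchanged against either backend.
void translated_region(const RuntimeIface& rt) {
    rt.parallel_run(region_body, 4);
}
```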