Results 11 - 20 of 13,307
Superscalar GEMM-based level 3 BLAS – the ongoing evolution of a portable and high-performance library
APPLIED PARALLEL COMPUTING, LARGE SCALE SCIENTIFIC AND INDUSTRIAL PROBLEMS, 1998
Cited by 17 (5 self)
Recently, a first version of our GEMM-based level 3 BLAS for superscalar type processors was announced. A new feature is the inclusion of DGEMM itself. This DGEMM routine contains inline what we call a level 3 kernel routine, which is based on register blocking. Additionally, it features level 1 ...
GEMMW: A PORTABLE LEVEL 3 BLAS WINOGRAD VARIANT OF STRASSEN'S MATRIX-MATRIX MULTIPLY ALGORITHM
Matrix-matrix multiplication is normally computed using one of the BLAS or a reinvention of part of the BLAS. Unfortunately, the BLAS were designed with small matrices in mind. When huge, well conditioned matrices are multiplied together, the BLAS perform like the blahs, even on vector machines. For matrices where the coefficients are well conditioned, Winograd's variant of Strassen's algorithm offers some relief, but is rarely available in a quality form on most computers. We reconsider this method and offer a highly portable solution based on the Level 3 BLAS interface.
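As a rough illustration of the method this abstract refers to, the sketch below implements the Winograd variant of Strassen's algorithm (7 block multiplications and 15 block additions per level) in plain Python. It is not the GEMMW code: the function names, the `cutoff` parameter, the restriction to square matrices, and the fallback to a naive product for odd sizes are all assumptions made for the example.

```python
def add(X, Y):
    """Elementwise sum of two equally sized matrices (lists of rows)."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def sub(X, Y):
    """Elementwise difference of two equally sized matrices."""
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def naive(A, B):
    """Textbook triple-loop product, used below the recursion cutoff."""
    inner, cols = len(B), len(B[0])
    return [[sum(row[k] * B[k][j] for k in range(inner)) for j in range(cols)]
            for row in A]


def split(M):
    """Split a square matrix of even order into four quadrants."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def strassen_winograd(A, B, cutoff=2):
    """Multiply square matrices A and B with the Winograd variant.

    Assumption for this sketch: odd or small orders fall back to naive().
    """
    n = len(A)
    if n <= cutoff or n % 2:
        return naive(A, B)
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    # Eight pre-additions.
    S1 = add(A21, A22); S2 = sub(S1, A11); S3 = sub(A11, A21); S4 = sub(A12, S2)
    T1 = sub(B12, B11); T2 = sub(B22, T1); T3 = sub(B22, B12); T4 = sub(T2, B21)
    # Seven recursive block products.
    P1 = strassen_winograd(A11, B11, cutoff)
    P2 = strassen_winograd(A12, B21, cutoff)
    P3 = strassen_winograd(S4, B22, cutoff)
    P4 = strassen_winograd(A22, T4, cutoff)
    P5 = strassen_winograd(S1, T1, cutoff)
    P6 = strassen_winograd(S2, T2, cutoff)
    P7 = strassen_winograd(S3, T3, cutoff)
    # Seven post-additions.
    U2 = add(P1, P6); U3 = add(U2, P7); U4 = add(U2, P5)
    C11 = add(P1, P2); C12 = add(U4, P3)
    C21 = sub(U3, P4); C22 = add(U3, P5)
    # Reassemble the quadrants into one matrix.
    return ([l + r for l, r in zip(C11, C12)] +
            [l + r for l, r in zip(C21, C22)])
```

With integer entries the result can be compared exactly against `naive(A, B)`; with floating-point entries the two may differ in the last digits, which is why the abstract restricts attention to well conditioned matrices.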
Implementing Level-3 BLAS with BLIS: Early Experience FLAME Working Note #69
2013
BLIS is a new software framework for instantiating high-performance BLAS-like dense linear algebra libraries. We demonstrate how BLIS acts as a productivity multiplier by using it to implement the level-3 BLAS on a variety of current architectures. The systems for which we demonstrate the framework ...
Hyper-Systolic Algorithms for N-Body Computations and Parallel Level-3 BLAS Libraries
1998
Hyper-systolic algorithms represent a new class of parallel computing structures. Because of their regular communication and compute patterns they are well suited for implementation on most parallel architectures; in particular, high performance SIMD machines can benefit considerably. After a short explanation of the concept of hyper-systolic algorithms, their application to N-body computations and distributed matrix multiplication is discussed. Results from real implementations are presented. Keywords: systolic, hyper-systolic, parallel computing, SIMD, HPC, N-body problem, level-3 BLAS
Automatically tuned linear algebra software
CONFERENCE ON HIGH PERFORMANCE NETWORKING AND COMPUTING, 1998
"... This paper describes an approach for the automatic generation and optimization of numerical software for processors with deep memory hierarchies and pipelined functional units. The production of such software for machines ranging from desktop workstations to embedded processors can be a tedious and ..."
Cited by 477 (30 self)
... much of the technology and approach developed here can be applied to the other Level 3 BLAS, and the general strategy can have an impact on basic linear algebra operations in general and may be extended to other important kernel operations.
An Extended Set of Fortran Basic Linear Algebra Subprograms
ACM TRANSACTIONS ON MATHEMATICAL SOFTWARE, 1986
Cited by 526 (72 self)
This paper describes an extension to the set of Basic Linear Algebra Subprograms. The extensions are targeted at matrix-vector operations which should provide for efficient and portable implementations of algorithms for high performance computers.
LogP: Towards a Realistic Model of Parallel Computation
1993
Cited by 562 (15 self)
A vast body of theoretical research has focused either on overly simplistic models of parallel computation, notably the PRAM, or overly specific models that have few representatives in the real world. Both kinds of models encourage exploitation of formal loopholes, rather than rewarding development of techniques that yield performance across a range of current and future parallel machines. This paper offers a new parallel machine model, called LogP, that reflects the critical technology trends underlying parallel computers. It is intended to serve as a basis for developing fast, portable parallel algorithms and to offer guidelines to machine designers. Such a model must strike a balance between detail and simplicity in order to reveal important bottlenecks without making analysis of interesting problems intractable. The model is based on four parameters that specify abstractly the computing bandwidth, the communication bandwidth, the communication delay, and the efficiency of coupling communication and computation. Portable parallel algorithms typically adapt to the machine configuration, in terms of these parameters. The utility of the model is demonstrated through examples that are implemented on the CM-5.
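To make the four parameters concrete, the sketch below computes two simple costs under the usual LogP formulation. The abstract does not name the parameters; the names used here (`L` for latency, `o` for per-message processor overhead, `g` for the minimum gap between consecutive messages) are an assumption taken from the model's standard presentation, the processor count P is not needed for these two cases, and the function names are hypothetical.

```python
def point_to_point(L, o):
    """Time for one small message to travel between two processors.

    The sender pays overhead o, the network adds latency L, and the
    receiver pays overhead o again before the data is usable.
    """
    return o + L + o


def pipelined_sends(n, L, o, g):
    """Time until the last of n back-to-back messages has been received.

    Assumption for this sketch: a processor can inject a new message
    only every max(g, o) time units, so n messages need n - 1 such
    intervals before the final one starts its point-to-point trip.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    interval = max(g, o)
    return (n - 1) * interval + point_to_point(L, o)
```

An algorithm that "adapts to the machine configuration" in the abstract's sense would evaluate expressions like these for the target machine's parameter values and choose, for example, a message count or a broadcast tree shape accordingly.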
An Overview of the C++ Programming Language
1999
Cited by 1766 (15 self)
This overview of C++ presents the key design, programming, and language-technical concepts using examples to give the reader a feel for the language. C++ is a general-purpose programming language with a bias towards systems programming that supports efficient low-level computation, data abstraction, ...
The embryonic cell lineage of the nematode Caenorhabditis elegans
Dev. Biol., 1983
Cited by 503 (16 self)
The number of nongonadal nuclei in the free-living soil nematode Caenorhabditis elegans increases from about 550 in the newly hatched larva to about 810 in the mature hermaphrodite and to about 970 in the mature male. The pattern of cell divisions which leads to this increase is essentially invariant among individuals; rigidly determined cell lineages generate a fixed number of progeny cells of strictly specified fates. These lineages range in length from one to eight sequential divisions and lead to significant developmental changes in the neuronal, muscular, hypodermal, and digestive systems. Frequently, several blast cells follow the same asymmetric program of divisions; lineally equivalent progeny of such cells generally differentiate into functionally equivalent cells. We have determined these cell lineages by direct observation of the divisions, migrations, and deaths of individual cells in living nematodes. Many of the cell lineages are involved in sexual maturation. At hatching, the hermaphrodite and male are almost identical morphologically; by the adult stage, gross anatomical differences are obvious. Some of these sexual differences arise from blast cells whose division patterns are initially identical in the male and in the hermaphrodite but later diverge. In the hermaphrodite, these cells produce structures used in egg-laying and mating, whereas, in the male, they produce morphologically different structures which function before and during copulation. In addition, development of the male involves a number of lineages derived from cells which do not divide in the hermaphrodite. Similar postembryonic developmental events occur in other nematode species.
Bundle Adjustment - A Modern Synthesis
VISION ALGORITHMS: THEORY AND PRACTICE, LNCS, 2000
Cited by 555 (12 self)
This paper is a survey of the theory and methods of photogrammetric bundle adjustment, aimed at potential implementors in the computer vision community. Bundle adjustment is the problem of refining a visual reconstruction to produce jointly optimal structure and viewing parameter estimates. Topics covered include: the choice of cost function and robustness; numerical optimization including sparse Newton methods, linearly convergent approximations, updating and recursive methods; gauge (datum) invariance; and quality control. The theory is developed for general robust cost functions rather than restricting attention to traditional nonlinear least squares.