Results 1-10 of 10
WSMP: Watson Sparse Matrix Package
, 2000
Cited by 10 (0 self)
Part II – direct solution of general sparse systems Version 10.9
SWEEPING PRECONDITIONERS FOR ELASTIC WAVE PROPAGATION WITH SPECTRAL ELEMENT METHODS
, 2013
Cited by 4 (2 self)
Abstract. We present a parallel preconditioning method for the iterative solution of the time-harmonic elastic wave equation which makes use of higher-order spectral elements to reduce pollution error. In particular, the method leverages perfectly matched layer boundary conditions to efficiently approximate the Schur complement matrices of a block $LDL^T$ factorization. Both sequential and parallel versions of the algorithm are discussed, and results for large-scale problems from exploration geophysics are presented.
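The Schur complement recursion behind a block $LDL^T$ factorization can be illustrated on the simplest case. The following is a minimal sketch, assuming a symmetric tridiagonal matrix whose "blocks" are 1x1 scalars and whose Schur complements are computed exactly; it is not the method of the paper, which approximates the inverse Schur complements using perfectly matched layers. The function names `tridiag_ldlt` and `solve` are hypothetical.

```python
from fractions import Fraction


def tridiag_ldlt(diag, off):
    """Exact LDL^T of a symmetric tridiagonal matrix (1x1 blocks).

    diag[i] = A[i][i] and off[i] = A[i+1][i] = A[i][i+1].
    Returns (schur, lower): schur[i] is the i-th Schur complement
    (the diagonal of D) and lower[i] = L[i+1][i].
    """
    schur = [Fraction(diag[0])]
    lower = []
    for i, b in enumerate(off):
        l = Fraction(b) / schur[i]                 # L[i+1][i] = A[i+1][i] / S_i
        lower.append(l)
        # S_{i+1} = A[i+1][i+1] - A[i+1][i] * S_i^{-1} * A[i][i+1]
        schur.append(Fraction(diag[i + 1]) - l * Fraction(b))
    return schur, lower


def solve(diag, off, rhs):
    """Solve A x = rhs with one forward and one backward sweep."""
    schur, lower = tridiag_ldlt(diag, off)
    n = len(diag)
    y = [Fraction(r) for r in rhs]
    for i in range(1, n):                          # forward sweep: L y = rhs
        y[i] -= lower[i - 1] * y[i - 1]
    x = [y[i] / schur[i] for i in range(n)]        # diagonal solve: D z = y
    for i in range(n - 2, -1, -1):                 # backward sweep: L^T x = z
        x[i] -= lower[i] * x[i + 1]
    return schur, x
```

In a sweeping preconditioner each scalar would be replaced by a matrix block for one layer of the grid, and the expensive exact inverses of the Schur complements would be replaced by cheaper approximations.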
A PARALLEL SWEEPING PRECONDITIONER FOR HETEROGENEOUS 3D HELMHOLTZ EQUATIONS
Cited by 4 (3 self)
Abstract. A parallelization of a sweeping preconditioner for 3D Helmholtz equations without internal resonance is introduced and benchmarked for several challenging velocity models. The setup and application costs of the sequential preconditioner are shown to be $O(\gamma^2 N^{4/3})$ and $O(\gamma N \log N)$, where $\gamma(\omega)$ denotes the modestly frequency-dependent number of grid points per Perfectly Matched Layer. Several computational and memory improvements are introduced relative to using black-box sparse-direct solvers for the auxiliary problems, and competitive runtimes and iteration counts are reported for high-frequency problems distributed over thousands of cores. Two open-source packages are released along with this paper: Parallel Sweeping Preconditioner (PSP) and the underlying distributed multifrontal solver, Clique.
M.: Real-time stochastic optimization of complex energy systems on high performance computers
 Comput. Sci. Eng. 99(PrePrints
, 2014
Cited by 3 (2 self)
ABSTRACT We present a scalable framework that computes in operationally compatible time the optimal energy dispatch under uncertainty for complex energy systems of realistic sizes. In the US, power grid optimization problems are solved by each of the 10 independent system operators (ISOs). In the form of unit commitment (UC), such problems are the main component of day-ahead planning of generators and electricity markets, and they are currently solved in under one hour. We focus on the computing challenges stemming from one such evolutionary imperative: accounting for the variability in energy supply availability that occurs when renewable energy sources such as wind are used by using optimization. This results in vastly larger optimization problems, with several billion variables and constraints, because a large number of possible realizations of the uncertainty need to be considered to accurately capture the stochastic component of the problem. In addition, our models incorporate the transmission network of the State of Illinois, which contains approximately 2,000 transmission nodes, 2,500 transmission lines, 900 demand nodes, and 300 generation nodes. To bridge the space between scalability and performance, needed for the real-time solution of stochastic power grid optimization problems, we have recently proposed several
unknown title
based on an implicit numerical scheme and a nonlinear constitutive model. We illustrate our methodology with an application to regional scale modeling in the French Riviera, level algorithm mixing a graph-coloring algorithm for the upper nonlinear layer and a classical mesh partitioning approach for the rest of the domain we obtain a speedup of 3.6 in terms of elapsed time. We analyse the scaling of our algorithms on up to
Kinetic Dependence Graphs
Task graphs or dependence graphs are used in runtime systems to schedule tasks for parallel execution. In problem domains such as dense linear algebra and signal processing, dependence graphs can be generated from a program by static analysis. However, in emerging problem domains such as graph analytics, the set of tasks and dependences between tasks in a program are complex functions of runtime values and cannot be determined statically. In this paper, we introduce a novel approach for exploiting parallelism in such programs. This approach is based on a data structure called the kinetic dependence graph (KDG), which consists of a dependence graph together with update rules that incrementally update the graph to reflect changes in the dependence structure whenever a task is completed. We have implemented a simple programming model that allows programmers to write these applications at a high level of abstraction, and a runtime within the Galois system [15] that builds the KDG automatically and executes the program in parallel. On a suite of programs that are difficult to parallelize otherwise, we have obtained speedups of up to 33 on 40 cores, outperforming third-party implementations in many cases.
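The idea of a dependence graph paired with an update rule can be sketched in a few lines. The following is a simplified illustration, not the Galois implementation or its programming model: every name (`KineticDependenceGraph`, `add_task`, `sources`, `complete`, `run`) is hypothetical. It assumes that two tasks conflict when they touch a common node, that conflicting tasks must run in priority order, and that a completed task may only spawn tasks with a later priority on a subset of its own nodes. Under that last assumption, tasks with no predecessors remain safe to run together even after new tasks appear.

```python
class KineticDependenceGraph:
    """Dependence graph plus an update rule applied on task completion."""

    def __init__(self):
        self.tasks = {}   # name -> (priority, frozenset of touched nodes)
        self.preds = {}   # name -> names of tasks that must finish first

    def add_task(self, name, priority, nodes):
        nodes = frozenset(nodes)
        self.preds[name] = set()
        for other, (prio, other_nodes) in self.tasks.items():
            if nodes & other_nodes:                      # tasks conflict
                if (prio, other) < (priority, name):     # earlier task goes first
                    self.preds[name].add(other)
                else:
                    self.preds[other].add(name)
        self.tasks[name] = (priority, nodes)

    def sources(self):
        """Tasks with no unfinished predecessor; these may run concurrently."""
        return sorted(n for n, p in self.preds.items() if not p)

    def complete(self, name, new_tasks=()):
        """Update rule: drop the finished task, then add the tasks it spawned."""
        assert not self.preds[name], "only source tasks may complete"
        parent_priority, parent_nodes = self.tasks.pop(name)
        del self.preds[name]
        for pending in self.preds.values():
            pending.discard(name)
        for child, priority, nodes in new_tasks:
            # Assumption of this sketch: children are later and local.
            assert priority > parent_priority and set(nodes) <= parent_nodes
            self.add_task(child, priority, nodes)


def run(kdg, body):
    """Execute in rounds; body(name) returns the tasks spawned by name."""
    rounds = []
    while kdg.tasks:
        ready = kdg.sources()          # conceptually executed in parallel
        rounds.append(ready)
        for name in ready:
            kdg.complete(name, body(name))
    return rounds
```

Each round here runs sequentially for clarity; a real runtime would hand the source tasks of a round to worker threads and would need a more general safety test when spawned tasks can touch nodes outside their parent's neighborhood.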
Efficient Enforcement of Hard Articulation Constraints in the Presence of Closed Loops and Contacts
, 2014
In rigid body simulation, one must distinguish between contacts (so-called unilateral constraints) and articulations (bilateral constraints). For contacts and friction, iterative solution methods have proven most useful for interactive applications, often in combination with Shock-Propagation in cases with strong interactions between contacts (such as stacks), prioritizing performance and plausibility over accuracy. For articulation constraints, direct solution methods are preferred, because one can rely on a factorization with linear time complexity for tree-like systems, even in ill-conditioned cases caused by large mass ratios or high complexity. Despite recent advances, combining the performance advantages of direct and iterative solution methods has proven difficult, and the intricacy of articulations in interactive applications is often limited by the convergence speed of the iterative solution method in the presence of closed kinematic loops (i.e. auxiliary constraints) and contacts. We identify common performance bottlenecks in the dynamic simulation of unilateral and bilateral constraints and present a simulation method that scales well in the number of constraints even in ill-conditioned cases with frictional contacts, collisions and closed loops in the kinematic graph. For cases where many joints are connected to a single body, we propose a technique to increase the sparsity of the positive definite linear system. A solution to these bottlenecks is presented in this paper to make the simulation of a wider range of mechanisms possible in real time without extensive parameter tuning.
Mechanics
, 2011
A novel, hybrid parallel C++ framework for computational solid mechanics is developed and presented. The modular and extensible design of this framework allows it to support a wide variety of numerical schemes including discontinuous Galerkin formulations and higher-order methods, multiphysics problems, hybrid meshes made of different types of elements, and a number of different linear and nonlinear solvers. In addition, native, seamless support is included for hardware acceleration by Graphics Processing Units (GPUs) via NVIDIA's CUDA architecture for both single-GPU workstations and heterogeneous clusters of GPUs. The capabilities of the framework are demonstrated through a series of sample problems, including a laser-induced cylindrical shock propagation, a dynamic problem involving a microtruss array made of millions of elements, and a tension problem involving a shape memory alloy with a