Results 1-10 of 82
OptiX: A General Purpose Ray Tracing Engine
Abstract

Cited by 86 (3 self)
Figure 1: Images from various applications built with OptiX. Top: Physically based light transport through path tracing. Bottom: Ray tracing of a procedural Julia set, photon mapping, large-scale line-of-sight and collision detection, Whitted-style ray tracing of dynamic geometry, and ray-traced ambient occlusion. All applications are interactive.

The NVIDIA® OptiX™ ray tracing engine is a programmable system designed for NVIDIA GPUs and other highly parallel architectures. The OptiX engine builds on the key observation that most ray tracing algorithms can be implemented using a small set of programmable operations. Consequently, the core of OptiX is a domain-specific just-in-time compiler that generates custom ray tracing kernels by combining user-supplied programs for ray generation, material shading, object intersection, and scene traversal. This enables the implementation of a highly diverse set of ray tracing-based algorithms and applications, including interactive rendering, offline rendering, collision detection systems, artificial intelligence queries, and scientific simulations such as sound propagation. OptiX achieves high performance through a compact object model and the application of several ray tracing-specific compiler optimizations. For ease of use it exposes a single-ray programming model with full support for recursion and a dynamic dispatch mechanism similar to virtual function calls.
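The program-slot idea the abstract describes, an engine core that only knows how to recurse and dispatch while intersection, miss, and shading logic are user-supplied, can be illustrated with a toy single-ray tracer. Every name here (`intersect`, `miss`, `materials`) is invented for illustration and is not the OptiX API.

```python
# Toy sketch of a programmable single-ray pipeline: the "engine" recurses and
# dispatches; intersection, miss, and shading are user-supplied callables.
# All names are hypothetical; this is not the OptiX API.

def trace(ray, programs, depth=0, max_depth=3):
    """Trace one ray, dispatching to the hit material's shader like a virtual call."""
    if depth > max_depth:
        return (0.0, 0.0, 0.0)
    hit = programs["intersect"](ray)      # user-supplied intersection program
    if hit is None:
        return programs["miss"](ray)      # user-supplied miss program
    shader = programs["materials"][hit["material"]]
    # The shader may spawn secondary rays through the continuation we pass in.
    return shader(ray, hit, lambda r: trace(r, programs, depth + 1, max_depth))

# A minimal "scene": rays pointing up hit a red surface, everything else misses.
demo_programs = {
    "intersect": lambda ray: {"material": "red"} if ray["dir"][2] > 0 else None,
    "miss": lambda ray: (0.1, 0.1, 0.1),
    "materials": {"red": lambda ray, hit, trace_next: (1.0, 0.0, 0.0)},
}
```

The point of the sketch is the dispatch structure, not the shading: swapping the entries of `demo_programs` changes the algorithm (path tracing, collision queries, ...) without touching `trace` itself.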
Fast BVH Construction on GPUs
 IN PROC. EUROGRAPHICS ’09
, 2009
Abstract

Cited by 53 (9 self)
We present two novel parallel algorithms for rapidly constructing bounding volume hierarchies on many-core GPUs. The first uses a linear ordering derived from spatial Morton codes to build hierarchies extremely quickly and with high parallel scalability. The second is a top-down approach that uses the surface area heuristic (SAH) to build hierarchies optimized for fast ray tracing. Both algorithms are combined into a hybrid algorithm that removes existing bottlenecks in GPU construction performance and scalability, leading to significantly decreased build time. The resulting hierarchies are close in quality to optimized SAH hierarchies, but the construction process is substantially faster, leading to a significant net benefit when both construction and traversal cost are accounted for. Our preliminary results show that current GPU architectures can compete with CPU implementations of hierarchy construction running on multi-core systems. In practice, we can construct hierarchies of models with up to several million triangles and use them for fast ray tracing or other applications.
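The Morton-code ordering the first algorithm relies on interleaves the bits of quantized x, y, z coordinates so that sorting by code groups spatially nearby primitives. A small Python sketch; the bit-spreading constants are the standard 30-bit interleave, and quantization to a 1024³ grid is an assumption for illustration:

```python
def expand_bits(v: int) -> int:
    """Spread the lowest 10 bits of v so each is followed by two zero bits."""
    v = (v * 0x00010001) & 0xFF0000FF
    v = (v * 0x00000101) & 0x0F00F00F
    v = (v * 0x00000011) & 0xC30C30C3
    v = (v * 0x00000005) & 0x49249249
    return v

def morton3d(x: float, y: float, z: float) -> int:
    """30-bit Morton code for a point with coordinates in [0, 1]."""
    def quantize(c: float) -> int:
        return min(max(int(c * 1024.0), 0), 1023)  # clamp to a 1024^3 grid
    return (expand_bits(quantize(x)) << 2) \
         | (expand_bits(quantize(y)) << 1) \
         |  expand_bits(quantize(z))
```

Sorting primitives by `morton3d` of their centroids yields the linear ordering from which the hierarchy is then carved; the GPU version does the sort and node emission in parallel.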
Real-Time Parallel Hashing on the GPU
 In ACM SIGGRAPH Asia 2009 papers, SIGGRAPH ’09
, 2009
Abstract

Cited by 26 (6 self)
Figure 1: Overview of our construction for a voxelized Lucy model, colored by mapping x, y, and z coordinates to red, green, and blue respectively (far left). The 3.5 million voxels (left) are input as 32-bit keys and placed into buckets of ≤ 512 items, averaging 409 each (center). Each bucket then builds a cuckoo hash with three subtables and stores them in a larger structure with 5 million entries (right). Close-ups follow the progress of a single bucket, showing the keys allocated to it (center; the bucket is linear and wraps around left to right) and each of its completed cuckoo subtables (right). Finding any key requires checking only three possible locations.

We demonstrate an efficient data-parallel algorithm for building large hash tables of millions of elements in real time. We consider two parallel algorithms for the construction: a classical sparse perfect hashing approach, and cuckoo hashing, which packs elements densely by allowing an element to be stored in one of multiple possible locations. Our construction is a hybrid approach that uses both algorithms. We measure the construction time, access time, and memory usage of our implementations and demonstrate real-time performance on large datasets: for 5 million key-value pairs, we construct a hash table in 35.7 ms using 1.42 times as much memory as the input data itself, and we can access all the elements in that hash table in 15.3 ms. For comparison, sorting the same data requires 36.6 ms, but accessing all the elements via binary search requires 79.5 ms. Furthermore, we show how our hashing methods can be applied to two graphics applications: 3D surface intersection for moving data and geometric hashing for image matching.
Hardware-Accelerated Global Illumination by Image Space Photon Mapping
Abstract

Cited by 24 (3 self)
Figure 1: Image-space photon mapping can compute global illumination at interactive rates for scenes with multiple lights, caustics, shadows, and complex BSDFs. This scene renders at 26 Hz at 1920×1080. (Indirect and ambient intensity are amplified for comparison in this image.)

We describe an extension to photon mapping that recasts the most expensive steps of the algorithm, the initial and final photon bounces, as image-space operations amenable to GPU acceleration. This enables global illumination for real-time applications as well as accelerating it for offline rendering. Image Space Photon Mapping (ISPM) rasterizes a light-space bounce map of emitted photons surviving initial-bounce Russian roulette sampling on a GPU. It then traces photons conventionally on the CPU. Traditional photon mapping estimates final radiance by gathering photons from a kd-tree. ISPM instead scatters indirect illumination by rasterizing an array of photon volumes. Each volume bounds a filter kernel based on the a priori probability density of each photon path. These two steps exploit the fact that initial path segments from point lights and final ones into a pinhole camera each have a common center of projection. An optional step uses joint bilateral upsampling of irradiance to reduce the fill requirements of rasterizing photon volumes. ISPM preserves the accurate and physically-based nature of photon mapping, supports arbitrary BSDFs, and captures both high- and low-frequency illumination effects such as caustics and diffuse color interreflection. An implementation on a consumer GPU and 8-core CPU renders high-quality global illumination at up to 26 Hz at HD (1920×1080) resolution, for complex scenes containing moving objects and lights.
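The Russian roulette step mentioned above amounts to stochastically culling photons while rescaling the survivors' power so the estimator stays unbiased. A minimal sketch; the scalar photon-power representation and fixed survival probability are assumptions for illustration:

```python
import random

def russian_roulette(photon_powers, survival_prob, seed=42):
    """Keep each photon with probability p and scale its power by 1/p,
    so the expected total transported power is unchanged (unbiased)."""
    rng = random.Random(seed)
    return [power / survival_prob
            for power in photon_powers
            if rng.random() < survival_prob]
```

Fewer photons then need to be traced and rasterized, at the cost of variance; the 1/p rescaling is what keeps the average result correct.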
Data-Parallel Octrees for Surface Reconstruction
 IEEE TRANSACTIONS ON VISUALIZATION & COMPUTER GRAPHICS
Abstract

Cited by 23 (0 self)
We present the first parallel surface reconstruction algorithm that runs entirely on the GPU. Like existing implicit surface reconstruction methods, our algorithm first builds an octree for the given set of oriented points, then computes an implicit function over the space of the octree, and finally extracts an isosurface as a watertight triangle mesh. A key component of our algorithm is a novel technique for octree construction on the GPU. This technique builds octrees in real time and uses level-order traversals to exploit the fine-grained parallelism of the GPU. Moreover, the technique produces octrees that provide fast access to the neighborhood information of each octree node, which is critical for fast GPU surface reconstruction. With an octree so constructed, our GPU algorithm performs Poisson surface reconstruction, which produces high-quality surfaces through a global optimization. Given a set of 500K points, our algorithm runs at the rate of about five frames per second, which is over two orders of magnitude faster than previous CPU algorithms. To demonstrate the potential of our algorithm, we propose a user-guided surface reconstruction technique which reduces the topological ambiguity and improves reconstruction results for imperfect scan data. We also show how to use our algorithm to perform on-the-fly conversion from dynamic point clouds to surfaces as well as to reconstruct fluid surfaces for real-time fluid simulation.
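One common way to realize level-order octree construction is via Morton codes: the nodes at depth d are exactly the distinct 3d-bit prefixes of the point codes, so each level can be produced in one data-parallel pass over sorted codes. A serial Python sketch of that idea (the paper's GPU version also builds per-node neighbor links, omitted here):

```python
def octree_levels(morton_codes, max_depth):
    """Level-order node lists: the nodes at depth d are the unique
    3*d-bit Morton prefixes of the input point codes."""
    levels = []
    for d in range(max_depth + 1):
        shift = 3 * (max_depth - d)          # drop the bits below depth d
        levels.append(sorted({code >> shift for code in morton_codes}))
    return levels
```

Because every level is a set/sort/unique pass over an array, the whole construction maps onto GPU primitives (sort, compaction) without pointer chasing.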
Line Space Gathering for Single Scattering in Large Scenes
Abstract

Cited by 22 (1 self)
We present an efficient technique to render single scattering in large scenes with reflective and refractive objects and homogeneous participating media. Efficiency is obtained by evaluating the final radiance along a viewing ray directly from the lighting rays passing near it, and by rapidly identifying such lighting rays in the scene. To facilitate the search for nearby lighting rays, we convert lighting rays and viewing rays into 6D points and planes according to their Plücker coordinates and coefficients, respectively. In this 6D line space, the problem of searching for the closest lines becomes one of querying the closest points to a plane, which we significantly accelerate using a spatial hierarchy over the 6D points. This approach to lighting-ray gathering supports complex light paths with multiple reflections and refractions, and avoids the use of a volume representation, which is expensive for large-scale scenes. The method also utilizes far fewer lighting rays than the number of photons needed in traditional volumetric photon mapping, and does not discretize viewing rays into numerous steps for ray marching. With this approach, results similar to volumetric photon mapping are obtained efficiently in terms of both storage and computation.
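The Plücker machinery behind the 6D formulation is compact: a line through point p with direction d maps to the pair (d, p×d), and the reciprocal product of two lines vanishes exactly when they are coplanar (intersecting or parallel), which is what lets a "nearby lighting ray" query be phrased as a point-to-plane distance in 6D. A small sketch:

```python
def plucker(p, d):
    """Plücker coordinates (d, m) of the line through p with direction d, m = p x d."""
    m = (p[1] * d[2] - p[2] * d[1],
         p[2] * d[0] - p[0] * d[2],
         p[0] * d[1] - p[1] * d[0])
    return d, m

def reciprocal_product(l1, l2):
    """d1.m2 + d2.m1: zero iff the two lines are coplanar; its magnitude
    grows with the separation of the lines, which is what the query exploits."""
    (d1, m1), (d2, m2) = l1, l2
    return (sum(a * b for a, b in zip(d1, m2)) +
            sum(a * b for a, b in zip(d2, m1)))
```

Treating lighting rays as 6D points (d, m) and a viewing ray as the plane where this product vanishes turns "lines near my line" into a standard nearest-points-to-a-plane search.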
The State of the Art in Interactive Global Illumination
 COMPUTER GRAPHICS FORUM
Abstract

Cited by 21 (7 self)
The interaction of light and matter in the world surrounding us is of striking complexity and beauty. Since the very beginning of computer graphics, adequate modeling of these processes and their efficient computation have been intensively studied research topics, and the problem is still not solved. The inherent complexity stems from the underlying physical processes as well as from the global nature of the interactions that let light travel within a scene. This article reviews the state of the art in interactive global illumination computation, that is, methods that generate an image of a virtual scene in less than one second with a solution to the light transport that is as exact as possible, or at least plausible. Additionally, the theoretical background and attempts to classify the broad field of methods are described. The strengths and weaknesses of different approaches, when applied to the different visual phenomena arising from light interaction, are compared and discussed. Finally, the article concludes by highlighting design patterns for interactive global illumination and a list of open problems.
gProximity: Hierarchical GPU-based Operations for Collision and Distance Queries
 In Proceedings of Eurographics 2010
, 2010
Abstract

Cited by 20 (5 self)
We present novel parallel algorithms for collision detection and separation distance computation for rigid and deformable models that exploit the computational capabilities of many-core GPUs. Our approach uses thread and data parallelism to perform fast hierarchy construction, updating, and traversal using tight-fitting bounding volumes such as oriented bounding boxes (OBB) and rectangular swept spheres (RSS). We also describe efficient algorithms to compute a linear bounding volume hierarchy (LBVH) and update it using refitting methods. Moreover, we show that tight-fitting bounding volume hierarchies offer improved performance on GPU-like throughput architectures. We use our algorithms to perform discrete and continuous collision detection including self-collisions, as well as separation distance computation between non-overlapping models. In practice, our approach (gProximity) can perform these queries in a few milliseconds on a PC with an NVIDIA GTX 285 card on models composed of tens or hundreds of thousands of triangles used in cloth simulation, surgical simulation, virtual prototyping, and N-body simulation. Moreover, we observe more than an order of magnitude performance improvement over prior GPU-based algorithms.
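The refitting update mentioned above is a bottom-up pass that recomputes each internal bounding volume as the union of its children after the geometry deforms, without rebuilding the tree topology. A serial sketch using axis-aligned boxes (the paper uses tighter OBBs/RSS and a parallel level-by-level traversal):

```python
def aabb_union(a, b):
    """Union of two axis-aligned boxes, each given as (min_corner, max_corner)."""
    (alo, ahi), (blo, bhi) = a, b
    return (tuple(min(x, y) for x, y in zip(alo, blo)),
            tuple(max(x, y) for x, y in zip(ahi, bhi)))

def refit(node):
    """Post-order refit: leaves keep their boxes (assumed already updated
    from the deformed geometry); internal nodes take the union of children."""
    if node.get("left") is None:          # leaf
        return node["aabb"]
    node["aabb"] = aabb_union(refit(node["left"]), refit(node["right"]))
    return node["aabb"]
```

Refitting is cheap (one pass) but the boxes loosen as the model deforms, which is why such systems periodically rebuild the hierarchy from scratch.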
Speeding up Large-Scale Point-in-Polygon Test Based Spatial Join on GPUs. Technical report online at http://geoteci.engr.ccny.cuny.edu/pub/pipsp_tr.pdf
Abstract

Cited by 20 (15 self)
The Point-in-Polygon (PIP) test is fundamental to spatial databases and GIS. Motivated by the slow response times in joining large-scale point locations with polygons using traditional spatial databases and GIS, and by the massively data-parallel computing power of commodity GPU devices, we have designed and developed an end-to-end system completely on GPUs to associate points with the polygons that they fall within. The system includes an efficient module to generate point quadrants that have at most K points from large-scale unordered points, a simple grid-file based spatial filtering approach to associate point quadrants and polygons, and a PIP test module to assign polygons to points in a GPU computing block using both block- and thread-level parallelism. Experiments on joining 170 million points with more than 40 thousand polygons have resulted in a runtime of 11.165 seconds on an Nvidia Quadro 6000 GPU device. Compared with a baseline serial CPU implementation using state-of-the-art open source GIS packages, which requires 15.223 hours to complete, a speedup of 4,910X has been achieved. We further discuss several factors and parameters that may affect the system performance.
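The per-point kernel at the heart of such a system is typically the crossing-number (even-odd) test: cast a ray toward +x and count how many polygon edges it crosses. A scalar Python sketch of that inner logic (the block/thread decomposition the abstract describes is omitted):

```python
def point_in_polygon(px, py, poly):
    """Even-odd test: a point is inside iff a ray to +x crosses an odd
    number of edges. `poly` is a list of (x, y) vertices in order."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > py) != (y2 > py):        # edge straddles the horizontal line y = py
            # x coordinate where the edge crosses that line
            xcross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < xcross:               # crossing lies to the right of the point
                inside = not inside
    return inside
```

The straddle check `(y1 > py) != (y2 > py)` both avoids division by zero for horizontal edges and handles vertices touching the ray consistently.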
Micropolygon ray tracing with defocus and motion blur
Abstract

Cited by 17 (3 self)
Figure 1: A car rendered with defocus, motion blur, mirror reflection and ambient occlusion at 1280 × 720 resolution with 23 × 23 supersampling; panel (b) shows motion blur + defocus. The scene is tessellated into 48.9M micropolygons (i.e., 53.1 micropolygons per pixel). The blurred image is rendered in 4 minutes on an NVIDIA GTX 285 GPU. The image rendered in perfect focus takes 2 minutes and is provided to help the reader assess the defocus and motion blur effects.

We present a micropolygon ray tracing algorithm that is capable of efficiently rendering high-quality defocus and motion blur effects. A key component of our algorithm is a BVH (bounding volume hierarchy) based on 4D hypertrapezoids that project into 3D OBBs (oriented bounding boxes) in the spatial dimensions. This acceleration structure is able to provide tight bounding volumes for scene geometries, and is thus efficient in pruning intersection tests during ray traversal. More importantly, it can exploit the natural coherence in the time dimension in motion-blurred scenes. The structure can be quickly constructed by utilizing the micropolygon grids generated during micropolygon tessellation. Ray tracing of defocused and motion-blurred scenes is efficiently performed by traversing the structure. Both the BVH construction and ray traversal are easily implemented on GPUs and integrated into a GPU-based micropolygon renderer. In our experiments, our ray tracer performs up to an order of magnitude faster than the state-of-the-art rasterizers while consistently delivering an image quality equivalent to a maximum-quality rasterizer. We also demonstrate that the ray tracing algorithm can be extended to handle a variety of effects, such as secondary ray effects and transparency.