Sorting networks and their applications (1968)

by K. E. Batcher

Results 1 - 10 of 657

A survey of general-purpose computation on graphics hardware

by John D. Owens, David Luebke, Naga Govindaraju, Mark Harris, Jens Krüger, Aaron E. Lefohn, Tim Purcell, 2007
Abstract - Cited by 545 (18 self)
The rapid increase in the performance of graphics hardware, coupled with recent improvements in its programmability, have made graphics hardware a compelling platform for computationally demanding tasks in a wide variety of application domains. In this report, we describe, summarize, and analyze the latest research in mapping general-purpose computation to graphics hardware. We begin with the technical motivations that underlie general-purpose computation on graphics processors (GPGPU) and describe the hardware and software developments that have led to the recent interest in this field. We then aim the main body of this report at two separate audiences. First, we describe the techniques used in mapping general-purpose computation to graphics hardware. We believe these techniques will be generally useful for researchers who plan to develop the next generation of GPGPU algorithms and techniques. Second, we survey and categorize the latest developments in general-purpose application development on graphics hardware.

Performance analysis of k-ary n-cube interconnection networks

by William J. Dally - IEEE Transactions on Computers, 1990
Abstract - Cited by 359 (18 self)
VLSI communication networks are wire-limited. The cost of a network is not a function of the number of switches required, but rather a function of the wiring density required to construct the network. This paper analyzes communication networks of varying dimension under the assumption of constant wire bisection. Expressions for the latency, average case throughput, and hot-spot throughput of k-ary n-cube networks with constant bisection are derived that agree closely with experimental measurements. It is shown that low-dimensional networks (e.g., tori) have lower latency and higher hot-spot throughput than high-dimensional networks (e.g., binary n-cubes) with the same bisection width. Index Terms: communication networks, concurrent computing, interconnection networks, message-passing multiprocessors, parallel processing, VLSI.

Citation Context

...onal networks. 2 Preliminaries 2.1 k-ary n-cubes Many different network topologies have been proposed for use in concurrent computers: trees [6] [15] [21], Benes networks [4], Batcher sorting networks [2], shuffle exchange networks [23], Omega networks [14], indirect binary n-cube or flip networks [3] [22], and direct binary n-cubes [19], [17], [24] [Figure 1: A Binary 6-Cube Embedded in the Plane]. The...

Software Protection and Simulation on Oblivious RAMs

by Oded Goldreich, Rafail Ostrovsky, 1993
Abstract - Cited by 317 (15 self)
Software protection is one of the most important issues concerning computer practice. There exist many heuristics and ad-hoc methods for protection, but the problem as a whole has not received the theoretical treatment it deserves. In this paper we provide theoretical treatment of software protection. We reduce the problem of software protection to the problem of efficient simulation on oblivious RAM. A machine is oblivious if the sequence in which it accesses memory locations is equivalent for any two inputs with the same running time. For example, an oblivious Turing Machine is one for which the movement of the heads on the tapes is identical for each computation. (Thus, it is independent of the actual input.) What is the slowdown in the running time of any machine, if it is required to be oblivious? In 1979 Pippenger and Fischer showed how a two-tape oblivious Turing Machine can simulate, on-line, a one-tape Turing Machine, with a logarithmic slowdown in the running time. We s...

Parallel merge sort

by Richard Cole - SIAM Journal on Computing, 1988
Abstract - Cited by 316 (3 self)
We give a parallel implementation of merge sort on a CREW PRAM that uses n processors and O(log n) time; the constant in the running time is small. We also give a more complex version of the algorithm for the EREW PRAM; it also uses n processors and O(log n) time. The constant in the running time is still moderate, though not as small.

High Speed Switch Scheduling for Local Area Networks

by Thomas E. Anderson, Susan S. Owicki, James B. Saxe, Charles P. Thacker - ACM Transactions on Computer Systems, 1993
Abstract - Cited by 247 (3 self)
Current technology trends make it possible to build communication networks that can support high performance distributed computing. This paper describes issues in the design of a prototype switch for an arbitrary topology point-to-point network with link speeds of up to one gigabit per second. The switch deals in fixed-length ATM-style cells, which it can process at a rate of 37 million cells per second. It provides high bandwidth and low latency for datagram traffic. In addition, it supports real-time traffic by providing bandwidth reservations with guaranteed latency bounds. The key to the switch's operation is a technique called parallel iterative matching, which can quickly identify a set of conflict-free cells for transmission in a time slot. Bandwidth reservations are accommodated in the switch by building a fixed schedule for transporting cells from reserved flows across the switch; parallel iterative matching can fill unused slots with datagram traffic. Finally, we note that pa...
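The abstract above names parallel iterative matching but does not spell out its steps. The following is a minimal sketch under an assumption, namely the commonly described request/grant/accept scheme with uniformly random choices; the function name `parallel_iterative_match`, its parameters, and the dictionary return type are hypothetical and not taken from the paper.

```python
import random


def parallel_iterative_match(requests, iterations=4, rng=None):
    """Sketch of a request/grant/accept matching (assumed scheme).

    requests[i] is the set of output ports for which input i has a queued
    cell. Returns a dict mapping matched inputs to distinct outputs.
    """
    rng = rng or random.Random()
    match = {}               # input -> output
    matched_outputs = set()
    for _ in range(iterations):
        # Request: every unmatched input asks every unmatched output it needs.
        candidates = {}      # output -> list of requesting inputs
        for i, outs in enumerate(requests):
            if i in match:
                continue
            for o in sorted(outs):
                if o not in matched_outputs:
                    candidates.setdefault(o, []).append(i)
        if not candidates:
            break            # nothing left to match
        # Grant: each requested output picks one requesting input at random.
        grants = {}          # input -> list of outputs that granted it
        for o, inputs in candidates.items():
            grants.setdefault(rng.choice(inputs), []).append(o)
        # Accept: each input that received grants picks one at random.
        for i, outs in grants.items():
            o = rng.choice(outs)
            match[i] = o
            matched_outputs.add(o)
    return match
```

Each round with at least one outstanding request adds at least one pair, so a small fixed number of rounds yields a conflict-free set: no output is assigned to two inputs, and no input is assigned two outputs.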

Scans as primitive parallel operations

by G. E. Blelloch - IEEE Trans. Comput., 1989
Abstract - Cited by 185 (13 self)
Abstract not found

Citation Context

...ter all, our architectural justification claimed that the scan primitives bring the P-RAM models closer to reality. Table 4 compares implementations of the split radix sort and Batcher's bitonic sort [4] on the Connection Machine. We choose the bitonic sort for comparison because it is commonly cited as the most practical parallel sorting algorithm. I have also looked into implementing Cole's sort [1...
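The split radix sort mentioned in this context can be illustrated with a short sequential sketch. It assumes the usual formulation: for each bit, from least to most significant, a stable "split" moves keys with a 0 bit ahead of keys with a 1 bit, and the destination indices come from exclusive prefix sums (scans). The names `exclusive_scan`, `split`, and `split_radix_sort` are illustrative, and the loops stand in for operations that would run in parallel on a real machine.

```python
def exclusive_scan(values):
    """Exclusive prefix sum: out[i] is the sum of values[0..i-1]."""
    out, running = [], 0
    for v in values:
        out.append(running)
        running += v
    return out


def split(keys, flags):
    """Stable partition: keys with flag 0 first, then keys with flag 1."""
    not_flags = [1 - f for f in flags]
    zeros_before = exclusive_scan(not_flags)  # destination among the 0s
    ones_before = exclusive_scan(flags)       # offset among the 1s
    total_zeros = sum(not_flags)
    out = [None] * len(keys)
    for i, key in enumerate(keys):
        if flags[i] == 0:
            out[zeros_before[i]] = key
        else:
            out[total_zeros + ones_before[i]] = key
    return out


def split_radix_sort(keys, bits):
    """Sort non-negative integers that fit in `bits` bits."""
    for b in range(bits):
        flags = [(k >> b) & 1 for k in keys]
        keys = split(keys, flags)
    return keys
```

For example, `split_radix_sort([5, 7, 3, 1, 4, 2, 7, 2], 3)` returns `[1, 2, 2, 3, 4, 5, 7, 7]`. Because each split is stable, the ordering established by lower bits survives the passes over higher bits.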

Translating pseudo-boolean constraints into SAT

by Niklas Eén, Niklas Sörensson - Journal on Satisfiability, Boolean Modeling and Computation, 2006
Abstract - Cited by 181 (2 self)
In this paper, we describe and evaluate three different techniques for translating pseudo-boolean constraints (linear constraints over boolean variables) into clauses that can be handled by a standard SAT-solver. We show that by applying a proper mix of translation techniques, a SAT-solver can perform on a par with the best existing native pseudo-boolean solvers. This is particularly valuable in those cases where the constraint problem of interest is naturally expressed as a SAT problem, except for a handful of constraints. Translating those constraints to get a pure clausal problem will take full advantage of the latest improvements in SAT research. A particularly interesting result of this work is the efficiency of sorting networks to express pseudo-boolean constraints. Although tangential to this presentation, the result gives a suggestion as to how synthesis tools may be modified to produce arithmetic circuits more suitable for SAT based reasoning. Keywords: pseudo-Boolean, SAT-solver, SAT translation, integer linear programming

Citation Context

... computation to buckets of 1-bits, 2-bits, 4-bits, and so on for each power of 2; the unary representation permits us the use of any base for the coefficients (footnote 7: MINISAT+ uses odd-even merge sorters [9]). adderTree(vec〈queue〈signal〉〉 buckets, vec〈signal〉 result) { for (i = 0; i < buckets.size(); i++) { while (buckets[i].size() ≥ 3) { (x,y,z) = buckets[i].dequeue3 () buckets...
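The odd-even merge sorters mentioned in this context can be sketched as a generator of compare-exchange pairs. This is a standard recursive construction of Batcher's odd-even merge sort network for power-of-two sizes, not code from MINISAT+; the names `oddeven_merge`, `oddeven_merge_sort`, and `apply_network` are illustrative.

```python
def oddeven_merge(lo, hi, r):
    """Yield comparators merging the sorted halves of wires lo..hi (inclusive),
    looking only at wires spaced r apart."""
    step = r * 2
    if step < hi - lo:
        yield from oddeven_merge(lo, hi, step)      # even subsequence
        yield from oddeven_merge(lo + r, hi, step)  # odd subsequence
        for i in range(lo + r, hi - r, step):       # final clean-up layer
            yield (i, i + r)
    else:
        yield (lo, lo + r)


def oddeven_merge_sort_range(lo, hi):
    """Yield comparators sorting wires lo..hi (inclusive)."""
    if hi - lo >= 1:
        mid = lo + (hi - lo) // 2
        yield from oddeven_merge_sort_range(lo, mid)
        yield from oddeven_merge_sort_range(mid + 1, hi)
        yield from oddeven_merge(lo, hi, 1)


def oddeven_merge_sort(n):
    """Comparator list for n wires; n is assumed to be a power of two."""
    yield from oddeven_merge_sort_range(0, n - 1)


def apply_network(network, values):
    """Run a comparator network on a copy of values."""
    a = list(values)
    for i, j in network:
        if a[i] > a[j]:
            a[i], a[j] = a[j], a[i]
    return a
```

For four wires, `list(oddeven_merge_sort(4))` gives `[(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]`. The comparator sequence is fixed in advance and does not depend on the data, which is what allows such a network to be expressed as a circuit or as clauses.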

The Vesta Parallel File System

by Peter F. Corbett, Dror G. Feitelson - ACM Transactions on Computer Systems, 1996
Abstract - Cited by 167 (2 self)
Abstract not found

Photon Mapping on Programmable Graphics Hardware

by Timothy J. Purcell, Craig Donner, Mike Cammarano, Henrik Wann Jensen, Pat Hanrahan - Graphics Hardware, 2003
Abstract - Cited by 153 (4 self)
We present a modified photon mapping algorithm capable of running entirely on GPUs. Our implementation uses breadth-first photon tracing to distribute photons using the GPU. The photons are stored in a grid-based photon map that is constructed directly on the graphics hardware using one of two methods: the first method is a multipass technique that uses fragment programs to directly sort the photons into a compact grid. The second method uses a single rendering pass combining a vertex program and the stencil buffer to route photons to their respective grid cells, producing an approximate photon map. We also present an efficient method for locating the nearest photons in the grid, which makes it possible to compute an estimate of the radiance at any surface location in the scene. Finally, we describe a breadth-first stochastic ray tracer that uses the photon map to simulate full global illumination directly on the graphics hardware. Our implementation demonstrates that current graphics hardware is capable of fully simulating global illumination with progressive, interactive feedback to the user.

GPUTeraSort: High Performance Graphics Coprocessor Sorting for Large Database Management

by Naga K. Govindaraju, Jim Gray, Ritesh Kumar, Dinesh Manocha, 2006
Abstract - Cited by 148 (10 self)
We present a new algorithm, GPUTeraSort, to sort billion-record wide-key databases using a graphics processing unit (GPU). Our algorithm uses the data and task parallelism on the GPU to perform memory-intensive and compute-intensive tasks while the CPU is used to perform I/O and resource management. We therefore exploit both the high-bandwidth GPU memory interface and the lower-bandwidth CPU main memory interface and achieve higher memory bandwidth than purely CPU-based algorithms. GPUTeraSort is a two-phase task pipeline: (1) read disk, build keys, sort using the GPU, generate runs, write disk, and (2) read, merge, write. It also pipelines disk transfers and achieves near-peak I/O performance. We have tested the performance of GPUTeraSort on billion-record files using the standard Sort benchmark. In practice, a 3 GHz Pentium IV PC with $265 NVIDIA 7800 GT GPU is significantly faster than optimized CPU-based algorithms on much faster processors, sorting 60GB for a penny; the best reported PennySort price-performance. These results suggest that a GPU co-processor can significantly improve performance on large data processing tasks.

Citation Context

...ray is then swapped with the input array, and the comparisons are iteratively performed until the whole array is sorted. These sorting network algorithms map well to GPUs. The bitonic sorting network [10] sorts bitonic sequences in multiple merge steps. A bitonic sequence is a monotonic ascending or descending sequence. Given an input array a = (a0, a1, ..., an), the bitonic sorting algorithm proce...
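The merge steps of a bitonic sorting network can be sketched as follows. This is a plain sequential rendering of the standard network for power-of-two lengths, not the GPU implementation from the paper; the function name `bitonic_sort` is illustrative. Every iteration of the innermost loop is an independent compare-exchange, which is why each pass maps naturally onto data-parallel hardware.

```python
def bitonic_sort(values):
    """Sort a list whose length is a power of two with a bitonic network."""
    a = list(values)
    n = len(a)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    k = 2                      # size of the bitonic sequences being merged
    while k <= n:
        j = k // 2             # distance between compared elements
        while j >= 1:
            # One pass: all pairs (i, i ^ j) are independent of each other.
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a
```

For example, `bitonic_sort([3, 7, 4, 8, 6, 2, 1, 5])` returns `[1, 2, 3, 4, 5, 6, 7, 8]`. The network performs log2(n) * (log2(n) + 1) / 2 passes regardless of the input values.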
