Results 1–10 of 63
Evolving feature selection
 Tuv E., Peng H., Ding C., Long F., Berens M., Parsons L., Zhao Z., Yu L., Forman G.
Parallel and Distributed Graph Cuts by Dual Decomposition
, 2010
"... Graph cuts methods are at the core of many stateoftheart algorithms in computer vision due to their efficiency in computing globally optimal solutions. In this paper, we solve the maximum flow/minimum cut problem in parallel by splitting the graph into multiple parts and hence, further increase th ..."
Abstract

Cited by 25 (3 self)
Graph cuts methods are at the core of many state-of-the-art algorithms in computer vision due to their efficiency in computing globally optimal solutions. In this paper, we solve the maximum flow/minimum cut problem in parallel by splitting the graph into multiple parts and hence further increase the computational efficiency of graph cuts. Optimality of the solution is guaranteed by dual decomposition; more specifically, the solutions to the subproblems are constrained to be equal on the overlap using dual variables. We demonstrate that our approach allows both (i) faster processing on multi-core computers and (ii) the capability to handle larger problems by splitting the graph across multiple computers on a distributed network. Even though our approach does not give a theoretical guarantee of speedup, an extensive empirical evaluation on several applications with many different data sets consistently shows good performance. An open-source implementation of the dual decomposition method is also made publicly available.
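The dual-decomposition mechanism described in this abstract can be illustrated with a minimal toy sketch (this is not the paper's max-flow solver): a one-dimensional objective is split into two subproblems holding private copies of a shared variable, and a dual variable is updated by subgradient ascent until the copies agree on the "overlap".

```python
def dual_decomposition(steps=60, step_size=0.5):
    """Minimize (x-1)^2 + (x-3)^2 by splitting x into two copies x1, x2
    and enforcing x1 == x2 on the 'overlap' via a dual variable lam.
    Lagrangian: (x1-1)^2 + (x2-3)^2 + lam * (x1 - x2)."""
    lam = 0.0
    for _ in range(steps):
        x1 = 1.0 - lam / 2.0          # closed-form argmin of (x-1)^2 + lam*x
        x2 = 3.0 + lam / 2.0          # closed-form argmin of (x-3)^2 - lam*x
        lam += step_size * (x1 - x2)  # subgradient ascent on the dual
    return x1, x2, lam

x1, x2, lam = dual_decomposition()  # both copies converge to the joint optimum x = 2
```

Each subproblem is solved independently (in the paper, a max-flow on a subgraph); only the dual update couples them, which is what makes the scheme parallelizable.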
P.J.: Singular value decomposition on GPU using CUDA
 In: IPDPS ’09: Proceedings of the 2009 IEEE International Symposium on Parallel & Distributed Processing
, 2009
"... Linear algebra algorithms are fundamental to many computing applications. Modern GPUs are suited for many general purpose processing tasks and have emerged as inexpensive high performance coprocessors due to their tremendous computing power. In this paper, we present the implementation of singular ..."
Abstract

Cited by 15 (0 self)
Linear algebra algorithms are fundamental to many computing applications. Modern GPUs are suited for many general-purpose processing tasks and have emerged as inexpensive high-performance coprocessors due to their tremendous computing power. In this paper, we present the implementation of singular value decomposition (SVD) of a dense matrix on the GPU using the CUDA programming model. SVD is implemented using the twin steps of bidiagonalization followed by diagonalization. It has not been implemented on the GPU before. Bidiagonalization is implemented using a series of Householder transformations, which map well to BLAS operations. Diagonalization is performed by applying the implicitly shifted QR algorithm. Our complete SVD implementation significantly outperforms the MATLAB and Intel® Math Kernel Library (MKL) LAPACK implementations on the CPU. We show a speedup of up to 60 over the MATLAB implementation and up to 8 over the Intel MKL implementation on an Intel Dual Core 2.66 GHz PC with an NVIDIA GTX 280 for large matrices. We also give results for very large matrices on an NVIDIA Tesla S1070.
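The bidiagonalization step mentioned here can be sketched in NumPy (a CPU illustration of the Householder reductions, not the CUDA implementation); because the reflections are orthogonal, the bidiagonal matrix has the same singular values as the input.

```python
import numpy as np

def house(x):
    """Householder vector v such that (I - 2 v v^T) x is a multiple of e1."""
    v = np.array(x, dtype=float)
    norm_x = np.linalg.norm(v)
    if norm_x == 0.0:
        return v
    v[0] += np.copysign(norm_x, v[0])
    return v / np.linalg.norm(v)

def bidiagonalize(A):
    """Golub-Kahan: reduce A (m >= n) to upper bidiagonal form by
    alternating left and right Householder reflections."""
    B = np.array(A, dtype=float)
    m, n = B.shape
    for k in range(n):
        v = house(B[k:, k])                      # zero column k below the diagonal
        B[k:, k:] -= 2.0 * np.outer(v, v @ B[k:, k:])
        if k < n - 2:
            v = house(B[k, k + 1:])              # zero row k right of the superdiagonal
            B[k:, k + 1:] -= 2.0 * np.outer(B[k:, k + 1:] @ v, v)
    return B

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
B = bidiagonalize(A)   # upper bidiagonal, same singular values as A
```

The implicitly shifted QR step that follows in the paper would then operate on `B` alone, which is why the two-phase structure maps well to BLAS-style batched operations.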
Parallel graph-cuts by adaptive bottom-up merging
 In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition
, 2010
"... Graphcuts optimization is prevalent in vision and graphics problems. It is thus of great practical importance to parallelize the graphcuts optimization using today’s ubiquitous multicore machines. However, the current best serial algorithm by Boykov and Kolmogorov [4] (called the BK algorithm) ..."
Abstract

Cited by 15 (0 self)
Graph-cuts optimization is prevalent in vision and graphics problems. It is thus of great practical importance to parallelize graph-cuts optimization using today's ubiquitous multi-core machines. However, the current best serial algorithm by Boykov and Kolmogorov [4] (called the BK algorithm) still has the best empirical performance. It is nontrivial to parallelize, as expensive synchronization overhead easily offsets the advantage of parallelism. In this paper, we propose a novel adaptive bottom-up approach to parallelize the BK algorithm. We first uniformly partition the graph into a number of regularly-shaped disjoint subgraphs and process them in parallel, then incrementally merge the subgraphs in an adaptive way to obtain the global optimum. The new algorithm has three benefits: 1) it is more cache-friendly within smaller subgraphs; 2) it keeps balanced workloads among computing cores; 3) it causes little overhead and is adaptable to the number of available cores. Extensive experiments on common applications such as 2D/3D image segmentation and 3D surface fitting demonstrate the effectiveness of our approach.
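The partition-and-merge schedule can be sketched on its own, as below. The `solve` callback is a stub standing in for running BK max-flow on a subgraph, and the smallest-combined-size merge criterion is an assumed stand-in for the paper's adaptive merge order.

```python
def partition(h, w, th, tw):
    """Uniformly tile an h x w pixel grid into regularly-shaped regions."""
    regions, rid = {}, 0
    for i in range(0, h, th):
        for j in range(0, w, tw):
            regions[rid] = {(y, x)
                            for y in range(i, min(i + th, h))
                            for x in range(j, min(j + tw, w))}
            rid += 1
    return regions

def adjacent(a, b):
    """Two regions touch if any pixel of a has a 4-neighbor in b."""
    return any((y + dy, x + dx) in b
               for (y, x) in a for dy, dx in ((0, 1), (1, 0), (0, -1), (-1, 0)))

def merge_bottom_up(regions, solve):
    """Solve every tile, then repeatedly merge the adjacent pair with the
    smallest combined size and re-solve the merged subgraph."""
    regions = dict(regions)
    for r in regions:
        solve(regions[r])              # these run in parallel in the real algorithm
    while len(regions) > 1:
        _, a, b = min((len(regions[a]) + len(regions[b]), a, b)
                      for a in regions for b in regions
                      if a < b and adjacent(regions[a], regions[b]))
        regions[a] |= regions.pop(b)
        solve(regions[a])              # BK would reuse the previous flows here
    return next(iter(regions.values()))

calls = []
final = merge_bottom_up(partition(4, 4, 2, 2),
                        lambda region: calls.append(len(region)))
```

A 4x4 grid with 2x2 tiles yields four tile solves followed by three merge re-solves; the merge order keeps intermediate subgraphs small, which is the cache-friendliness the abstract refers to.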
Interactive Texture Segmentation using Random Forests and Total Variation
, 2009
"... Common methods for interactive texture segmentation rely on probability maps based on low dimensional features such as e.g. intensity or color, that are usually modeled using basic learning algorithms such as histograms or Gaussian Mixture Models. The use of low level features allows for fast genera ..."
Abstract

Cited by 14 (2 self)
Common methods for interactive texture segmentation rely on probability maps based on low-dimensional features such as intensity or color, which are usually modeled using basic learning algorithms such as histograms or Gaussian mixture models. The use of low-level features allows for fast generation of these hypotheses but limits applicability to a small class of images. We address this problem by learning complex descriptors with Random Forests and exploiting their inherent parallelism in a GPU implementation. The segmentation itself is based on a convex energy functional that uses weighted total variation regularization and a pointwise data term allowing for continuous foreground/background membership hypotheses. Its globally optimal solution is obtained by a fast primal-dual algorithm providing a reasonable convergence criterion. As a result, we present a versatile interactive texture segmentation framework. We show experiments with natural, artificial, and medical data and demonstrate superior results compared to two recent approaches.
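The total-variation half of such a pipeline can be sketched with Chambolle's projection algorithm for the unweighted ROF model (an assumption: the paper uses weighted TV with a different primal-dual solver, and the input below is a synthetic score map standing in for the Random Forest output).

```python
import numpy as np

def grad(u):
    gx, gy = np.zeros_like(u), np.zeros_like(u)
    gx[:, :-1] = u[:, 1:] - u[:, :-1]   # forward differences, Neumann boundary
    gy[:-1, :] = u[1:, :] - u[:-1, :]
    return gx, gy

def div(px, py):
    d = px.copy()
    d[:, 1:] -= px[:, :-1]
    d += py
    d[1:, :] -= py[:-1, :]
    return d

def tv(u):
    gx, gy = grad(u)
    return np.hypot(gx, gy).sum()

def tv_denoise(f, lam=0.5, iters=200, tau=0.125):
    """Chambolle's projection algorithm for min_u TV(u) + ||u - f||^2 / (2*lam);
    tau <= 1/8 guarantees convergence."""
    px, py = np.zeros_like(f), np.zeros_like(f)
    for _ in range(iters):
        gx, gy = grad(div(px, py) - f / lam)
        scale = 1.0 + tau * np.hypot(gx, gy)
        px = (px + tau * gx) / scale
        py = (py + tau * gy) / scale
    return f - lam * div(px, py)

# Synthetic membership map: clean vertical step plus noise
rng = np.random.default_rng(0)
clean = np.zeros((32, 32))
clean[:, 16:] = 1.0
noisy = clean + 0.3 * rng.standard_normal(clean.shape)
u = tv_denoise(noisy)   # thresholding u at 0.5 gives the segmentation
```

The per-pixel updates are independent, which is what makes this kind of scheme a good fit for the GPU implementation the abstract describes.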
K. Srinathan: A performance prediction model for the CUDA GPGPU platform
 In: the 16th IEEE International Conference on High Performance Computing (HiPC)
, 2009
"... The significant growth in computational power of modern Graphics Processing Units(GPUs) coupled with the advent of general purpose programming environments like NVIDA’s CUDA, has seen GPUs emerging as a very popular parallel computing platform. However, despite their popularity, there is no perfor ..."
Abstract

Cited by 14 (2 self)
The significant growth in computational power of modern Graphics Processing Units (GPUs), coupled with the advent of general-purpose programming environments like NVIDIA's CUDA, has seen GPUs emerge as a very popular parallel computing platform. However, despite their popularity, there is no performance model for any GPGPU programming environment. The absence of such a model makes it difficult to definitively assess the suitability of the GPU for solving a particular problem and is a significant impediment to the mainstream adoption of GPUs as a massively parallel (super)computing platform. In this paper we present a performance prediction model for the CUDA GPGPU platform. The model encompasses the various facets of the GPU architecture, such as scheduling, the memory hierarchy, and pipelining, among others. We also perform experiments that demonstrate the effects of various memory access strategies. The proposed model can be used to analyze pseudocode for a CUDA kernel to obtain a performance estimate, in a way similar to performing asymptotic analysis. We illustrate the usage of our model and its accuracy with three case studies: matrix multiplication, list ranking, and histogram generation.
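In the same spirit, a toy analytical model might estimate kernel time as the larger of compute and memory cycle counts spread over the cores; every constant and parameter name below is a made-up assumption for illustration, not the paper's model or a measured CUDA figure.

```python
def estimate_kernel_time(n_threads, compute_cycles, loads_per_thread,
                         coalescing_factor=1.0, mem_latency=400,
                         n_cores=1024, clock_hz=1.3e9):
    """Illustrative-only estimate of kernel runtime in seconds.
    coalescing_factor models uncoalesced global accesses costing a
    multiple of a coalesced one; all defaults are assumptions."""
    compute = n_threads * compute_cycles
    memory = n_threads * loads_per_thread * mem_latency * coalescing_factor
    # Latency hiding: compute and memory traffic overlap, so the slower
    # of the two dominates; total work is spread over all cores.
    return max(compute, memory) / n_cores / clock_hz

# Effect of a memory access strategy: coalesced vs. fully uncoalesced loads
t_coalesced = estimate_kernel_time(1 << 20, 100, 10, coalescing_factor=1.0)
t_uncoalesced = estimate_kernel_time(1 << 20, 100, 10, coalescing_factor=16.0)
```

Even a crude model of this shape predicts the qualitative effect the abstract's experiments measure: when a kernel is memory-bound, the coalescing factor translates directly into the slowdown.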
Hardware-efficient belief propagation
 in Proc. CVPR
, 2009
"... Abstract—Loopy belief propagation (BP) is an effective solution for assigning labels to the nodes of a graphical model such as the Markov random field (MRF), but it requires high memory, bandwidth, and computational costs. Furthermore, the iterative, pixelwise, and sequential operations of BP make ..."
Abstract

Cited by 14 (1 self)
Loopy belief propagation (BP) is an effective solution for assigning labels to the nodes of a graphical model such as the Markov random field (MRF), but it requires high memory, bandwidth, and computational costs. Furthermore, the iterative, pixel-wise, and sequential operations of BP make it difficult to parallelize the computation. In this paper, we propose two techniques to address these issues. The first technique is a new message-passing scheme named tile-based belief propagation that reduces the memory and bandwidth to a fraction of that of ordinary BP algorithms without performance degradation, by splitting the MRF into many tiles and only storing the messages across neighboring tiles. The tile-wise processing also enables data reuse and pipelining, resulting in efficient hardware implementation. The second technique is an O(L) parallel message construction algorithm that exploits the properties of robust functions for parallelization. We apply these two techniques to a VLSI circuit for stereo matching that generates high-resolution disparity maps in near real time. We also implement the proposed schemes on a GPU, where they are four times faster than standard BP on the GPU. Index Terms—Belief propagation, Markov random field, energy minimization, embedded systems, VLSI circuit design, general-purpose computation on GPU (GPGPU).
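O(L) message construction for a robust cost can be sketched for a truncated-linear smoothness term using the two-pass distance-transform trick of Felzenszwalb and Huttenlocher (an assumed stand-in; the paper's exact scheme may differ).

```python
def message_truncated_linear(h, T):
    """m[l] = min_{l'} h[l'] + min(|l - l'|, T), computed in O(L)."""
    m = list(h)
    for l in range(1, len(m)):             # forward pass of the distance transform
        m[l] = min(m[l], m[l - 1] + 1)
    for l in range(len(m) - 2, -1, -1):    # backward pass
        m[l] = min(m[l], m[l + 1] + 1)
    cap = min(h) + T                       # truncation of the robust cost
    return [min(v, cap) for v in m]

def brute_force_message(h, T):
    """O(L^2) reference implementation for checking."""
    L = len(h)
    return [min(h[k] + min(abs(l - k), T) for k in range(L)) for l in range(L)]
```

The two passes have no per-label data dependence beyond a prefix/suffix minimum, which is what makes this construction amenable to the parallel hardware the abstract targets.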
Medusa: Simplified Graph Processing on GPUs
, 2013
"... Graphs are common data structures for many applications, and efficient graph processing is a must for application performance. Recently, the graphics processing unit (GPU) has been adopted to accelerate various graph processing algorithms such as BFS and shortest paths. However, it is difficult to ..."
Abstract

Cited by 13 (4 self)
Graphs are common data structures for many applications, and efficient graph processing is a must for application performance. Recently, the graphics processing unit (GPU) has been adopted to accelerate various graph processing algorithms such as BFS and shortest paths. However, it is difficult to write correct and efficient GPU programs, and even more difficult for graph processing due to the irregularities of graph structures. To simplify graph processing on GPUs, we propose a programming framework called Medusa which enables developers to leverage the capabilities of GPUs by writing sequential C/C++ code. Medusa offers a small set of user-defined APIs and embraces a runtime system to automatically execute those APIs in parallel on the GPU. We develop a series of graph-centric optimizations based on the architectural features of GPUs for efficiency. Additionally, Medusa is extended to execute on multiple GPUs within a machine. Our experiments show that (1) Medusa greatly simplifies the implementation of GPGPU programs for graph processing, with many fewer lines of source code written by developers; and (2) the optimization techniques significantly improve the performance of the runtime system, making its performance comparable with or better than manually tuned GPU graph operations.
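A Medusa-like edge/vertex API might look as follows (names such as `edge_map` and `edge_proc` are hypothetical, and the sequential loop stands in for the GPU runtime): the user only writes the small per-edge function, here implementing BFS.

```python
def edge_map(adj, frontier, edge_proc):
    """'Runtime': apply a user-defined edge function to every edge leaving
    the frontier. Sequential here; a Medusa-style runtime would launch
    these applications in parallel on the GPU."""
    nxt = set()
    for u in frontier:
        for v in adj[u]:
            if edge_proc(u, v):
                nxt.add(v)
    return nxt

def bfs(adj, src):
    dist = {v: -1 for v in adj}
    dist[src] = 0

    def edge_proc(u, v):          # the only code a framework user writes
        if dist[v] == -1:
            dist[v] = dist[u] + 1
            return True           # v joins the next frontier
        return False

    frontier = {src}
    while frontier:
        frontier = edge_map(adj, frontier, edge_proc)
    return dist
```

The appeal of the design is visible even in this toy: the irregular traversal logic lives in the runtime, while the user-supplied function is straight-line code.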
Fast Joint Estimation of Silhouettes and Dense 3D Geometry from Multiple Images, IEEE Transactions on Pattern Analysis and Machine Intelligence, DOI 10.1109/TPAMI.2011.150
, 2011
"... Abstract—We propose a probabilistic formulation of joint silhouette extraction and 3D reconstruction given a series of calibrated 2D images. Instead of segmenting each image separately in order to construct a 3D surface consistent with the estimated silhouettes, we compute the most probable 3D shape ..."
Abstract

Cited by 12 (1 self)
We propose a probabilistic formulation of joint silhouette extraction and 3D reconstruction given a series of calibrated 2D images. Instead of segmenting each image separately in order to construct a 3D surface consistent with the estimated silhouettes, we compute the most probable 3D shape that gives rise to the observed color information. The probabilistic framework, based on Bayesian inference, enables robust 3D reconstruction by optimally taking into account the contribution of all views. We solve the arising maximum a posteriori shape inference in a globally optimal manner by convex relaxation techniques in a spatially continuous representation. For interactively provided user input in the form of scribbles specifying foreground and background regions, we build corresponding color distributions as multivariate Gaussians and find a volume occupancy that best fits this data in a variational sense. Compared to classical methods for silhouette-based multi-view reconstruction, the proposed approach does not depend on initialization and enjoys significant resilience to violations of the model assumptions due to background clutter, specular reflections, and camera sensor perturbations. In experiments on several real-world data sets, we show that exploiting a silhouette coherency criterion in a multi-view setting allows for dramatic improvements in silhouette quality over independent 2D segmentations, without any significant increase in computational effort. This results in more accurate visual hull estimation, needed by a multitude of image-based modeling approaches. We made use of recent advances in parallel computing with a GPU implementation of the proposed method, generating reconstructions on volume grids of more than 20 million voxels in up to 4.41 seconds. Index Terms—Shape from silhouettes, interactive segmentation, convex optimization.
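The scribble-driven color-model step can be sketched in isolation (assumptions: a single view, synthetic RGB data, and no shape optimization, which is where the paper's actual contribution lies): fit a multivariate Gaussian to foreground and background scribble pixels and classify by log-likelihood.

```python
import numpy as np

def fit_gaussian(samples):
    """Mean and (slightly regularized) covariance of an (N, d) sample set."""
    mu = samples.mean(axis=0)
    cov = np.cov(samples.T) + 1e-6 * np.eye(samples.shape[1])
    return mu, cov

def log_gauss(x, mu, cov):
    """Multivariate Gaussian log-density, vectorized over rows of x."""
    d = x - mu
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum('...i,ij,...j->...', d, np.linalg.inv(cov), d)
    return -0.5 * (maha + logdet + mu.size * np.log(2.0 * np.pi))

def classify_fg(pixels, fg_scrib, bg_scrib):
    """True where the foreground color model is more likely."""
    return (log_gauss(pixels, *fit_gaussian(fg_scrib))
            > log_gauss(pixels, *fit_gaussian(bg_scrib)))

# Synthetic 'scribbles': reddish foreground, bluish background (RGB in [0, 1])
rng = np.random.default_rng(0)
fg_scrib = rng.normal([0.8, 0.2, 0.2], 0.05, (200, 3))
bg_scrib = rng.normal([0.2, 0.2, 0.8], 0.05, (200, 3))
pred = classify_fg(rng.normal([0.8, 0.2, 0.2], 0.05, (100, 3)),
                   fg_scrib, bg_scrib)
```

In the paper, the resulting per-view likelihood ratios feed a convex volumetric energy rather than being thresholded per pixel as here.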