Results 1–10 of 36
Going deeper with convolutions
, 2014
Abstract

Cited by 45 (2 self)
We propose a deep convolutional neural network architecture codenamed Inception, which was responsible for setting the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14). The main hallmark of this architecture is the improved utilization of the computing resources inside the network. This was achieved by a carefully crafted design that allows for increasing the depth and width of the network while keeping the computational budget constant. To optimize quality, the architectural decisions were based on the Hebbian principle and the intuition of multi-scale processing. One particular incarnation used in our submission for ILSVRC14 is called GoogLeNet, a 22-layer-deep network, the quality of which is assessed in the context of classification and detection.
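The multi-scale intuition behind the Inception module can be illustrated with a toy sketch: filters of several sizes applied to the same input, with their outputs concatenated along a channel axis. This is a minimal 1-D NumPy analogue, not the actual GoogLeNet architecture; the filter sizes and averaging kernels are made up for illustration.

```python
import numpy as np

def conv1d_same(x, kernel):
    # "Same"-padded 1-D convolution so every branch keeps the input length.
    pad = len(kernel) // 2
    xp = np.pad(x, pad)
    return np.array([xp[i:i + len(kernel)] @ kernel for i in range(len(x))])

def inception_block_1d(x, kernels):
    # Run filters of several sizes on the same input and stack ("concatenate")
    # their outputs along a channel axis, as in multi-scale processing.
    return np.stack([conv1d_same(x, k) for k in kernels])

x = np.arange(8, dtype=float)
out = inception_block_1d(x, [np.ones(1), np.ones(3) / 3, np.ones(5) / 5])
print(out.shape)  # (3, 8): one channel per filter size, same spatial length
```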
GraphX: Graph Processing in a Distributed Dataflow Framework
 USENIX ASSOCIATION 11TH USENIX SYMPOSIUM ON OPERATING SYSTEMS DESIGN AND IMPLEMENTATION (OSDI ’14)
, 2014
Abstract

Cited by 16 (0 self)
In pursuit of graph processing performance, the systems community has largely abandoned general-purpose distributed dataflow frameworks in favor of specialized graph processing systems that provide tailored programming abstractions and accelerate the execution of iterative graph algorithms. In this paper we argue that many of the advantages of specialized graph processing systems can be recovered in a modern general-purpose distributed dataflow system. We introduce GraphX, an embedded graph processing framework built on top of Apache Spark, a widely used distributed dataflow system. GraphX presents a familiar composable graph abstraction that is sufficient to express existing graph APIs, yet can be implemented using only a few basic dataflow operators (e.g., join, map, group-by). To achieve performance parity with specialized graph systems, GraphX recasts graph-specific optimizations as distributed join optimizations and materialized view maintenance. By leveraging advances in distributed dataflow frameworks, GraphX brings low-cost fault tolerance to graph processing. We evaluate GraphX on real workloads and demonstrate that GraphX achieves an order of magnitude performance gain over the base dataflow framework and matches the performance of specialized graph processing systems while enabling a wider range of computation.
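The abstract's claim that a graph abstraction reduces to a few dataflow operators can be sketched in plain Python: one round of neighbor aggregation written as a join, a map, and a group-by over vertex and edge tables. This is an illustrative analogy with made-up data, not GraphX's actual API.

```python
from collections import defaultdict

# Vertices and edges as plain tables, dataflow-style.
vertices = {"a": 1.0, "b": 1.0, "c": 1.0}      # vertex table: id -> value
edges = [("a", "b"), ("a", "c"), ("b", "c")]   # edge table: (src, dst)

# join: attach each source vertex's value to its outgoing edges
joined = [(src, dst, vertices[src]) for src, dst in edges]

# map: turn each joined row into a (dst, contribution) message,
# normalizing by the source's out-degree
out_deg = defaultdict(int)
for src, _ in edges:
    out_deg[src] += 1
messages = [(dst, val / out_deg[src]) for src, dst, val in joined]

# group-by: sum messages per destination vertex
totals = defaultdict(float)
for dst, contrib in messages:
    totals[dst] += contrib

print(dict(totals))  # {'b': 0.5, 'c': 1.5}
```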
HYPERGRAPH-PARTITIONING-BASED FILL-REDUCING ORDERING
, 2009
Abstract

Cited by 13 (6 self)
A typical first step of a direct solver for a linear system Mx = b is reordering of the symmetric matrix M to improve the execution time and space requirements of the solution process. In this work, we propose a novel nested-dissection-based ordering approach that utilizes hypergraph partitioning. Our approach is based on a formulation of the graph partitioning by vertex separator (GPVS) problem as a hypergraph partitioning problem. This new formulation is immune to the deficiency of GPVS in a multilevel framework and hence enables better orderings. In matrix terms, our method relies on the existence of a structural factorization of the input matrix M in the form M = AAᵀ (or M = AD²Aᵀ). We show that the partitioning of the row-net hypergraph representation of the rectangular matrix A induces a GPVS of the standard graph representation of matrix M. In the absence of such a factorization, we also propose simple, yet effective structural factorization techniques that are based on finding an edge clique cover of the standard graph representation of matrix M, and hence are applicable to any arbitrary symmetric matrix M. Our experimental evaluation has shown that the proposed method achieves better orderings in comparison to state-of-the-art graph-based ordering tools, even for symmetric matrices where a structural M = AAᵀ factorization is not provided as an input. For matrices coming from linear programming problems, our method enables even faster and better orderings.
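The separator-based numbering that a nested-dissection ordering produces can be sketched on the simplest possible case, a path graph: recursively pick a separator, order the two halves first, and number the separator last. The hypergraph-partitioning machinery of the paper is not modeled here; this only shows the ordering pattern.

```python
def nested_dissection_order(n):
    # Toy nested dissection on a path graph 0-1-...-(n-1).
    def recurse(lo, hi, out):
        if lo > hi:
            return
        if hi - lo < 2:                 # tiny segment: order directly
            out.extend(range(lo, hi + 1))
            return
        mid = (lo + hi) // 2            # separator of this path segment
        recurse(lo, mid - 1, out)       # order the left part
        recurse(mid + 1, hi, out)       # order the right part
        out.append(mid)                 # separator is numbered last
    order = []
    recurse(0, n - 1, order)
    return order

print(nested_dissection_order(7))  # [0, 2, 1, 4, 6, 5, 3]: separator 3 last
```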
Combinatorial problems in solving linear systems
, 2009
Abstract

Cited by 9 (3 self)
Numerical linear algebra and combinatorial optimization are vast subjects, as is their interaction. In virtually all cases there should be a notion of sparsity for a combinatorial problem to arise. Sparse matrices therefore form the basis of the interaction of these two seemingly disparate subjects. As the core of many of today's numerical linear algebra computations consists of the solution of sparse linear systems by direct or iterative methods, we survey some combinatorial problems, ideas, and algorithms relating to these computations. On the direct methods side, we discuss issues such as matrix ordering; bipartite matching and matrix scaling for better pivoting; and task assignment and scheduling for parallel multifrontal solvers. On the iterative methods side, we discuss preconditioning techniques including incomplete factorization preconditioners, support graph preconditioners, and algebraic multigrid. In a separate part, we discuss the block triangular form of sparse matrices.
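One of the combinatorial kernels the survey mentions, bipartite matching for better pivoting, can be sketched with a standard augmenting-path algorithm. This is a generic textbook version on a made-up sparsity pattern, not tied to any particular solver discussed in the survey.

```python
def max_bipartite_matching(adj, n_rows, n_cols):
    # Augmenting-path maximum matching: match rows to columns so each matched
    # entry can serve as a structural pivot; adj[r] lists the columns holding
    # a nonzero in row r.
    match_col = [-1] * n_cols   # match_col[c] = row matched to column c

    def augment(r, seen):
        for c in adj[r]:
            if c in seen:
                continue
            seen.add(c)
            # Take a free column, or evict its current row if that row
            # can be rematched elsewhere.
            if match_col[c] == -1 or augment(match_col[c], seen):
                match_col[c] = r
                return True
        return False

    size = sum(augment(r, set()) for r in range(n_rows))
    return size, match_col

# 3x3 sparsity pattern: row 0 -> cols {0,1}, row 1 -> {0}, row 2 -> {1,2}
size, match_col = max_bipartite_matching([[0, 1], [0], [1, 2]], 3, 3)
print(size)  # 3: a perfect matching (full structural pivot set) exists
```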
Multithreaded Clustering for Multilevel Hypergraph Partitioning
 in IPDPS
, 2012
Abstract

Cited by 7 (1 self)
Abstract—Requirements for efficient parallelization of many complex and irregular applications can be cast as a hypergraph partitioning problem. The current state-of-the-art software libraries that provide tool support for the hypergraph partitioning problem were designed and implemented before the game-changing advancements in multicore computing. Hence, analyzing the structure of those tools with a view to designing multithreaded versions of the algorithms is a crucial task. The most successful partitioning tools are based on the multilevel approach. In this approach, a given hypergraph is coarsened to a much smaller one, a partition is obtained on the smallest hypergraph, and that partition is projected to the original hypergraph while being refined on the intermediate hypergraphs. The coarsening operation corresponds to clustering the vertices of a hypergraph and is the most time-consuming task in a multilevel partitioning tool. We present three efficient multithreaded clustering algorithms which are well suited for multilevel partitioners. We compare their performance with that of the ones currently used in today's hypergraph partitioners. We show on a large number of real-life hypergraphs that our implementations, integrated into the commonly used partitioning library PaToH, achieve good speedups without reducing the clustering quality.
Keywords—Multilevel hypergraph partitioning; coarsening; multithreaded clustering algorithms; multicore programming
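The coarsening step described above, clustering vertices that share many nets, can be sketched sequentially in a few lines. The paper's actual contribution, doing this with multiple threads, is not modeled, and the greedy scoring rule below is a generic illustration rather than PaToH's heuristic.

```python
def greedy_clustering(hypergraph, n_vertices):
    # One sequential clustering pass: each unmatched vertex pairs with the
    # unmatched neighbor it shares the most nets with.
    nets_of = [[] for _ in range(n_vertices)]
    for net_id, net in enumerate(hypergraph):
        for v in net:
            nets_of[v].append(net_id)

    cluster = list(range(n_vertices))   # cluster leader of each vertex
    matched = [False] * n_vertices
    for v in range(n_vertices):
        if matched[v]:
            continue
        scores = {}                     # neighbor -> number of shared nets
        for net_id in nets_of[v]:
            for u in hypergraph[net_id]:
                if u != v and not matched[u]:
                    scores[u] = scores.get(u, 0) + 1
        if scores:
            best = max(scores, key=scores.get)
            cluster[best] = v           # merge best into v's cluster
            matched[v] = matched[best] = True
    return cluster

# Two nets {0,1,2} and {1,2,3}: the greedy pass pairs 0 with 1, then 2 with 3.
print(greedy_clustering([[0, 1, 2], [1, 2, 3]], 4))  # [0, 0, 2, 2]
```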
On sharedmemory parallelization of a sparse matrix scaling algorithm
Abstract

Cited by 6 (0 self)
Abstract—We discuss efficient shared-memory parallelization of sparse matrix computations whose main traits resemble those of the sparse matrix-vector multiply operation. Such computations are difficult to parallelize because of the relatively small computational granularity, characterized by a small number of operations per data access. Our main application is a sparse matrix scaling algorithm which is more memory-bound than the sparse matrix-vector multiplication operation. We take the application and parallelize it using standard OpenMP programming principles. Apart from the common race-condition-avoiding constructs, we do not reorganize the algorithm. Rather, we identify associated performance metrics and describe models to optimize them. By using these models, we implement parallel matrix scaling algorithms for two well-known sparse matrix storage formats. Experimental results show that simple parallelization attempts which leave data/work partitioning to the runtime scheduler can suffer from the overhead of avoiding race conditions, especially as the number of threads increases. The proposed algorithms perform better by optimizing the identified performance metrics and reducing that overhead.
Keywords—Shared-memory parallelization, sparse matrices, hypergraphs, matrix scaling
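As a rough illustration of what a matrix scaling algorithm computes, here is a simple iterative equilibration in dense NumPy, under the assumption that the goal is driving row and column magnitudes toward 1. The paper's specific algorithm, its sparse storage formats, and its OpenMP parallelization are not reproduced; this is only the serial numerical idea.

```python
import numpy as np

def equilibrate(A, iters=20):
    # Alternately divide each row and each column by the square root of its
    # maximum absolute value; all row/column maxima converge toward 1
    # (a common equilibration scheme for matrices with no zero row/column).
    A = A.astype(float).copy()
    for _ in range(iters):
        r = np.sqrt(np.abs(A).max(axis=1, keepdims=True))
        A /= r
        c = np.sqrt(np.abs(A).max(axis=0, keepdims=True))
        A /= c
    return A

A = np.array([[4.0, 0.0], [2.0, 100.0]])
S = equilibrate(A)
print(np.abs(S).max(axis=1))  # row maxima close to 1
```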
UMPa: A Multi-objective, Multi-level Partitioner for Communication Minimization
Abstract

Cited by 5 (2 self)
Abstract. We propose a directed hypergraph model and a refinement heuristic to distribute communicating tasks among the processing units in a distributed-memory setting. The aim is to achieve load balance and minimize the maximum data sent by a processing unit. We also take two other communication metrics into account with a tie-breaking scheme. With this approach, task distributions that cause excessive use of the network, or a bottleneck processor which participates in almost all of the communication, are avoided. We show on a large number of problem instances that our model improves the maximum data sent by a processor by up to 34% for parallel environments with 4, 16, 64, and 256 processing units, compared to the state of the art, which only minimizes the total communication volume.
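The "maximum data sent by a processing unit" metric can be sketched directly: for each part, count the distinct (task, remote part) pairs it must communicate, and take the maximum over parts. The toy task graph below is a made-up instance, not UMPa's model or code.

```python
from collections import defaultdict

def max_send_volume(edges, part):
    # edges: directed (src_task, dst_task) dependencies; part: task -> part id.
    # A part sends a task's datum once per distinct remote part that needs it.
    sent = defaultdict(set)                 # part -> {(task, remote_part)}
    for src, dst in edges:
        if part[src] != part[dst]:
            sent[part[src]].add((src, part[dst]))
    return max((len(s) for s in sent.values()), default=0)

edges = [(0, 1), (0, 2), (3, 2)]
part = {0: 0, 1: 1, 2: 1, 3: 0}             # tasks 0,3 on part 0; 1,2 on part 1
print(max_send_volume(edges, part))  # 2: part 0 sends task 0's and task 3's data
```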
Auto-tuning Sparse Matrix-Vector Multiplication for Multicore