
## The Tao of Parallelism in Algorithms (2011)

Venue: PLDI

Citations: 41 (12 self)

### Citations

10619 | Introduction to Algorithms.
- Cormen, Leiserson, et al.
- 1990
Citation Context: ...red to as the Church-Rosser property since the most famous example of this behavior is β-reduction in λ-calculus. Dataflow graph execution [4, 15] and the preflow-push algorithm for computing maxflow [12] also exhibit this behavior. In other algorithms, the output may be different for different choices of active nodes, but all such outputs are acceptable, so the implementation can still pick any activ...
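The context above describes algorithms whose final result is independent of the order in which active nodes are processed. A minimal Python sketch of this Church-Rosser-like behavior, using a hypothetical `reachable` worklist fixpoint (illustrative only, not code from the paper):

```python
import random

def reachable(edges, source, seed):
    """Worklist reachability: whatever order active nodes are picked in,
    the fixpoint (the set of reached nodes) is the same, mirroring the
    Church-Rosser-like behavior described in the text."""
    rng = random.Random(seed)
    succs = {}
    for u, v in edges:
        succs.setdefault(u, []).append(v)
    reached, worklist = {source}, [source]
    while worklist:
        # Pick an arbitrary active node; any choice yields the same result.
        u = worklist.pop(rng.randrange(len(worklist)))
        for v in succs.get(u, []):
            if v not in reached:
                reached.add(v)
                worklist.append(v)
    return reached

edges = [(0, 1), (1, 2), (0, 3), (3, 2), (2, 4)]
results = {frozenset(reachable(edges, 0, seed)) for seed in range(20)}
assert len(results) == 1  # every processing order converges to the same set
```

The graph and the random scheduling policy are stand-ins; the point is only that the fixpoint is order-independent.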

3439 | Mapreduce: Simplified data processing on large clusters.
- Dean, Ghemawat
- 2008
Citation Context: ...SCM) algorithm for phylogeny reconstruction implements this algorithm for a set of trees. These algorithms can be expressed using an ordered set iterator. The reduce operation in the map-reduce model [13] can be implemented in many ways; an in-place reduction can be implemented by an unordered iterator in which the operator replaces randomly chosen pairs of elements from the set with the result of app...
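The in-place reduction described here (repeatedly replace a randomly chosen pair of elements with the operator's result) can be sketched in Python; `unordered_reduce` is a hypothetical name for illustration, and correctness assumes an associative, commutative operator:

```python
import random

def unordered_reduce(items, op, seed=0):
    """In-place reduction as an unordered iterator: repeatedly remove a
    randomly chosen pair of elements and put back op(a, b). For an
    associative, commutative op the final value does not depend on the
    random choices."""
    rng = random.Random(seed)
    work = list(items)
    while len(work) > 1:
        a = work.pop(rng.randrange(len(work)))
        b = work.pop(rng.randrange(len(work)))
        work.append(op(a, b))
    return work[0]

assert unordered_reduce([1, 2, 3, 4], lambda a, b: a + b, seed=42) == 10
```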

1936 | Information Theory, Inference and Learning Algorithms.
- MacKay
- 2003
Citation Context: ...etc. are used to extract network properties [10]. • Machine-learning algorithms like belief propagation and survey propagation are based on message-passing in a factor graph, a sparse bipartite graph [44]. • Data-mining algorithms like k-means and agglomerative clustering operate on sets and multisets [59]. • Simulations of electrical circuits and battlefields often use event-driven (discrete-event) s...

1707 | A Discipline of Programming.
- Dijkstra
- 1976
Citation Context: ...atisfy the quality constraints and are acceptable outcomes of the refinement process [11]. This is an example of Dijkstra’s don’t-care non-determinism (also known as committed-choice non-determinism) [16]. Figure 2 shows the pseudocode for mesh refinement. Each iteration of the while-loop refines one bad triangle; we call this computation an activity. DMR can be performed in parallel since bad triangl...

1106 | Introduction to Data Mining.
- Tan, Steinbach, et al.
- 2006
Citation Context: ... and survey propagation are based on message-passing in a factor graph, a sparse bipartite graph [44]. • Data-mining algorithms like k-means and agglomerative clustering operate on sets and multisets [59]. • Simulations of electrical circuits and battlefields often use event-driven (discrete-event) simulation [49] over networks of nodes. • Optimizing compilers perform iterative and elimination-based d...

1031 | Transactional Memory: Architectural Support for Lock-free Data Structures.
- Herlihy, Moss
- 1993
Citation Context: ... each graph API method that modifies the graph makes a copy of the data before modification, as is done in other systems that use speculation such as transactional memory and thread-level speculation [25, 29, 55, 61]. If active elements are not ordered, the activity commits when the application of the operator is complete, and all acquired locks are then released. If active elements are ordered, active nodes can ...

980 | Virtual time.
- Jefferson
- 1983
Citation Context: ...nted the development of techniques and tools that make it easier to produce parallel implementations. Domain specialists have written parallel programs for some of the algorithms discussed above (see [5, 20, 31, 33] among others). There are also parallel graph libraries such as Boost [22] and STAPL [2]. However, it is difficult to extract broadly applicable abstractions, principles, and mechanisms from these imp...

763 | Cilk: An Efficient Multithreaded Runtime System.
- Blumofe, Joerg, et al.
- 1996
Citation Context: ... If all data dependences are subsumed by call/return control dependences, the recursive calls can be executed safely in parallel. The Cilk project has explored this approach to exploiting parallelism [9]. This approach obviously requires an efficient partitioner for the data structure. Partitioning is straightforward for structured and semi-structured topologies, so most task-parallelism studies have...

722 | An Introduction to Parallel Algorithms.
- JaJa
- 1992
Citation Context: ...that we call tao-analysis after the three key dimensions of the analysis. In the literature on parallel programming, there are many abstractions of parallel machines, such as a variety of PRAM models [32]. Tao-analysis can be viewed as an abstraction of algorithms that distills out properties important for parallelization, hiding unnecessary detail. Tao-analysis reveals that a generalized data-paralle...

634 | Multilevel k-way partitioning scheme for irregular graphs.
- Karypis, Kumar
- 1998

630 | Program Analysis and Specialization for the C Programming Language.
- Andersen
- 1994
Citation Context: .... In contrast to Prim’s algorithm, this algorithm is unordered because particles can be inserted into the tree in any order. Andersen-style inclusion-based points-to analysis: This compiler algorithm [3] is an example of a refinement morph on general graphs. It builds a points-to graph in which nodes represent pro... [spilled pseudocode figure: Graph g = /* read in input graph */; Tree mst; /* create empty tree */ ...]
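As a hedged illustration of the refinement-morph character of inclusion-based analysis, here is a toy Python worklist solver for Andersen-style constraints. It handles only address-of and copy constraints; the `andersen` function name and the constraint encoding are invented for this sketch:

```python
def andersen(addr_of, copy_edges):
    """Toy worklist solver for Andersen-style inclusion constraints with
    two constraint forms: p = &x (addr_of pairs) and p = q (copy pairs,
    meaning pts(p) must include pts(q)). Points-to sets only ever grow,
    which is what makes the operator a refinement morph."""
    pts = {}
    for p, x in addr_of:
        pts.setdefault(p, set()).add(x)
    succ = {}
    for p, q in copy_edges:  # p = q  =>  pts(p) is a superset of pts(q)
        succ.setdefault(q, []).append(p)
    worklist = list(pts)
    while worklist:
        q = worklist.pop()
        for p in succ.get(q, []):
            before = len(pts.setdefault(p, set()))
            pts[p] |= pts.get(q, set())
            if len(pts[p]) > before:  # new facts added: reprocess p
                worklist.append(p)
    return pts

pts = andersen([("a", "x")], [("b", "a"), ("c", "b")])
assert pts["c"] == {"x"}
```

A real implementation (as in the cited work) also handles load/store constraints and cycle elimination; this sketch shows only the grow-only propagation.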

482 | Language support for lightweight transactions.
- Harris, Fraser
- 2003
Citation Context: ... each graph API method that modifies the graph makes a copy of the data before modification, as is done in other systems that use speculation such as transactional memory and thread-level speculation [25, 29, 55, 61]. If active elements are not ordered, the activity commits when the application of the operator is complete, and all acquired locks are then released. If active elements are ordered, active nodes can ...

450 | A simple parallel algorithm for the maximal independent set problem.
- Luby
- 1986
Citation Context: ...nodes represent the active nodes from the algorithm, and edges represent conflicts between activities. Luby’s randomized parallel algorithm can be used to find a maximal independent set of activities [43], and this set of activities can be executed in parallel without synchronization. This approach can be viewed as building and exploiting the dependence graph level by level, with barrier synchronizati...
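A round-based Python sketch of Luby's algorithm as used in this context; `luby_mis` and its graph encoding are assumptions for illustration, and the sequential rounds stand in for what would be parallel steps on real hardware:

```python
import random

def luby_mis(nodes, edges, seed=0):
    """Round-based sketch of Luby's randomized maximal-independent-set
    algorithm. Each round, every live node draws a random priority; a node
    that beats all of its live neighbors joins the MIS, and it and its
    neighbors then drop out. All nodes within a round are independent and
    could be processed in parallel."""
    rng = random.Random(seed)
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    live, mis = set(nodes), set()
    while live:
        prio = {v: rng.random() for v in live}
        winners = {v for v in live
                   if all(prio[v] > prio[w] for w in adj[v] if w in live)}
        mis |= winners
        live -= winners | {w for v in winners for w in adj[v]}
    return mis

# On a 6-cycle, the result is independent for any seed.
mis = luby_mis(list(range(6)), [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)])
assert all(not (u in mis and v in mis)
           for u, v in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)])
```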

360 | Compilers: Principles, Techniques, and Tools.
- Aho
- 1986
Citation Context: ...driven (discrete-event) simulation [49] over networks of nodes. • Optimizing compilers perform iterative and elimination-based dataflow analysis on structures like inter-procedural controlflow graphs [1]. • Even in computational science, n-body methods use spatial decomposition trees [6], and finite-element methods use 2D and 3D meshes produced using algorithms like Delaunay mesh generation and refin...

294 | A hierarchical O(N log N) force-calculation algorithm.
- Barnes, Hut
- 1986
Citation Context: ...s perform iterative and elimination-based dataflow analysis on structures like inter-procedural controlflow graphs [1]. • Even in computational science, n-body methods use spatial decomposition trees [6], and finite-element methods use 2D and 3D meshes produced using algorithms like Delaunay mesh generation and refinement [11]. Unfortunately, we currently have few insights into the structure of paral...

288 | Distributed discrete-event simulation.
- Misra
- 1986
Citation Context: ...ining algorithms like k-means and agglomerative clustering operate on sets and multisets [59]. • Simulations of electrical circuits and battlefields often use event-driven (discrete-event) simulation [49] over networks of nodes. • Optimizing compilers perform iterative and elimination-based dataflow analysis on structures like inter-procedural controlflow graphs [1]. • Even in computational science, n...

267 | Optimizing Compilers for Modern Architectures.
- Allen, Kennedy
- 2002
Citation Context: ... dependence analysis in the literature, and methods based on integer linear programming are successful for regular, dense array programs in which array subscripts are affine functions of loop indices [35]; LU with pivoting requires fractal symbolic analysis [48]. Compile-time coordination is also possible regardless of the topology if the algorithm is a topology-driven local computation and the neighbor...

261 | The LRPD Test: Speculative Run-Time Parallelization of Loops with Privatization and Reduction Parallelization. In
- Rauchwerger, Padua
- 1995
Citation Context: ... each graph API method that modifies the graph makes a copy of the data before modification, as is done in other systems that use speculation such as transactional memory and thread-level speculation [25, 29, 55, 61]. If active elements are not ordered, the activity commits when the application of the operator is complete, and all acquired locks are then released. If active elements are ordered, active nodes can ...

244 | Petri Nets.
- Peterson
- 1977
Citation Context: ...erent for different choices of active nodes, but all such outputs are acceptable, so the implementation can still pick any active node for execution. Delaunay mesh refinement and Petri net simulation [52] are examples. The preflow-push algorithm also exhibits this behavior if the algorithm outputs both the min-cut and the max-flow. Even for unordered algorithms, iteration execution order may affect ca...

237 | Programming Parallel Algorithms.
- Blelloch
- 1996
Citation Context: ...orhood is a large part of the graph, activities are likely to conflict and intra-operator parallelism may be more important. Inter/intra-operator parallelism is an example of nested data-parallelism [8]. The sparsity of the graph usually plays a major role in this balance between inter- and intra-operator parallelism. For many operators, neighborhoods include all the neighbors of the active node, so...

232 | A Scalable Approach to Thread-Level Speculation.
- Steffan, Colohan, et al.
- 2000
Citation Context: ... each graph API method that modifies the graph makes a copy of the data before modification, as is done in other systems that use speculation such as transactional memory and thread-level speculation [25, 29, 56, 60, 64]. If active elements are not ordered, the activity commits when the application of the operator is complete, and all acquired locks are then released. If active elements are ordered, active nodes can ...

209 | Is it a tree, a DAG, or a cyclic graph? A shape analysis for heap-directed pointers in C.
- Ghiya, Hendren
- 1996
Citation Context: ...[22] and STAPL [2]. However, it is difficult to extract broadly applicable abstractions, principles, and mechanisms from these implementations. Another approach is to use points-to and shape analysis [19, 27, 30] to find data structure invariants that might be used to prove independence of computations. This approach has been successful in parallelizing n-body methods like Barnes-Hut that are organized arou...

207 | Speculative versioning cache.
- Gopal, Vijaykumar, et al.
- 1998

203 | Executing a program on the MIT tagged-token dataflow architecture.
- Arvind, Nikhil
- 1990
Citation Context: ...in which active nodes are processed, a property that is referred to as the Church-Rosser property since the most famous example of this behavior is β-reduction in λ-calculus. Dataflow graph execution [4, 15] and the preflow-push algorithm for computing maxflow [12] also exhibit this behavior. In other algorithms, the output may be different for different choices of active nodes, but all such outputs are ...

179 | Optimistic Parallelism Requires Abstractions.
- Kulkarni, Pingali, et al.
- 2007
Citation Context: ...gorithms. To address this problem, we can use the Galois programming model, which is a sequential, object-oriented programming model (such as sequential Java), augmented with two Galois set iterators [40]: DEFINITION 1. Galois set iterators: • Unordered-set iterator: foreach (e in Set S) {B(e)} The loop body B(e) is executed for each element e of set S. The order in which iterations execute is indeter...
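The unordered-set iterator defined in this context can be approximated in Python; `foreach_unordered` is a hypothetical helper (not the Galois API) showing the two key properties: arbitrary iteration order, and the ability to add new elements to the workset while iteration is in progress:

```python
import random

def foreach_unordered(initial, body, seed=0):
    """Galois-style unordered-set iterator sketch: iterations execute in an
    indeterminate order, and the body may add new elements to the workset
    during iteration (unlike ordinary Java/C++ set iterators)."""
    rng = random.Random(seed)
    workset = list(initial)
    while workset:
        e = workset.pop(rng.randrange(len(workset)))
        body(e, workset.append)  # body may push new active elements

# Example: the body discovers new active elements (e // 2) as it runs.
seen = set()
def body(e, push):
    if e not in seen:
        seen.add(e)
        if e > 1:
            push(e // 2)

foreach_unordered([8, 5], body)
assert seen == {1, 2, 4, 5, 8}
```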

178 | Randomized incremental construction of Delaunay and Voronoi diagrams.
- Guibas, Knuth, et al.
- 1990
Citation Context: ...e natural implementations of most operators such as Delaunay mesh refinement are cautious². In contrast, the operator for the well-known Delaunay triangulation algorithm of Guibas, Knuth and Sharir [23] does not have a naturally cautious implementation. It performs graph mutations called edge flips, which are done incrementally. Unordered algorithms with cautious operator implementations can be exec...

177 | Guaranteed-quality mesh generation for curved surfaces.
- Chew
- 1993
Citation Context: ...en in computational science, n-body methods use spatial decomposition trees [6], and finite-element methods use 2D and 3D meshes produced using algorithms like Delaunay mesh generation and refinement [11]. Unfortunately, we currently have few insights into the structure of parallelism and locality in irregular algorithms, and this has stunted the development of techniques and tools that make it easier...

168 | Parallelizing programs with recursive data structures
- Hendren, Nicolau
- 1990
Citation Context: ...[22] and STAPL [2]. However, it is difficult to extract broadly applicable abstractions, principles, and mechanisms from these implementations. Another approach is to use points-to and shape analysis [19, 27, 30] to find data structure invariants that might be used to prove independence of computations. This approach has been successful in parallelizing n-body methods like Barnes-Hut that are organized arou...

163 | Programming with sets: an introduction to SETL.
- Schwartz, Dewar, et al.
- 1986
Citation Context: ... be useful for search problems. Some unordered algorithms can also be written using divide-and-conquer, as discussed in Section 7. Set iterators were first introduced in the SETL programming language [57] and can now be found in most object-oriented languages such as Java and C++. However, new elements cannot be added to sets while iterating over them, which is possible with Galois set iterators. Note...

144 | Dependence analysis for pointer variables.
- Horwitz, Pfeiffer, et al.
- 1989
Citation Context: ...[22] and STAPL [2]. However, it is difficult to extract broadly applicable abstractions, principles, and mechanisms from these implementations. Another approach is to use points-to and shape analysis [19, 27, 30] to find data structure invariants that might be used to prove independence of computations. This approach has been successful in parallelizing n-body methods like Barnes-Hut that are organized arou...

133 | Exploiting coarse-grained task, data, and pipeline parallelism in stream programs.
- Gordon, Thies, et al.
- 2006
Citation Context: ...This is a topology-driven, unordered algorithm, which is a refinement morph for So since elements are added incrementally to So during execution. Streams: A stream operator in languages like StreamIt [21] is a refinement morph from its input streams to its output streams (streams are non-strict sequences [53]). Stateless and stateful stream operators can be expressed using unordered and ordered iterat...

132 | Network Analysis: Methodological Foundations.
- Brandes, Erlebach, et al.
- 2005
Citation Context: ...a structures are extremely sparse graphs in which nodes represent people and edges represent relationships. Algorithms for betweenness-centrality, maxflow, etc. are used to extract network properties [10]. • Machine-learning algorithms like belief propagation and survey propagation are based on message-passing in a factor graph, a sparse bipartite graph [44]. • Data-mining algorithms like k-means and ...

117 | Rendering complex scenes with memory-coherent ray tracing
- Pharr, Kolb, et al.
- 1997
Citation Context: ...me direction but may eventually diverge. Rather than using an a priori grouping of rays, groups can be dynamically updated as rays propagate through a scene, maintaining locality throughout execution [52]. 5.4 Discussion Some irregular applications have multiple phases, and each phase may use a different operator. N-body methods like Barnes-Hut are good examples; the tree-building phase uses a refinem...

113 | Algorithms + Data Structures = Programs.
- Wirth
- 1978
Citation Context: ...actions, Section 3 introduces a data-centric formulation of algorithms, called the operator formulation of algorithms. In the spirit of Niklaus Wirth’s aphorism “Program = Algorithm + Data Structure” [63], we express algorithms in terms of operations on abstract data types (ADTs), independently of the concrete data structures used to implement the abstract data types. This formulation is the basis of ...

110 | Patterns for Parallel Programming.
- Mattson, Sanders, et al.
- 2004
Citation Context: ... Section 6.1. The structural analysis of algorithms presented in this section is different from the many efforts in the literature to identify parallelism patterns, such as the work of Mattson et al. [45], Snir’s Parallel Processing Patterns [36] and the Berkeley motifs [51]. These approaches place related algorithms into categories (dense linear algebra, n-body methods, etc.), whereas this paper prop...

89 | Transactional boosting: a methodology for highly-concurrent transactional objects.
- Herlihy, Koskinen
- 2008
Citation Context: ...ads synchronize using transactions, and the overheads of executing this synchronization construct are reduced using optimistic synchronization. In addition, most TM systems other than boosted systems [28] perform memory-level conflict checking rather than ADT-level conflict checking. This results in spurious conflicts that prevent efficient parallel execution. For example, if an iteration of an unorde...

87 | Commutativity Analysis: A New Analysis Technique for Parallelizing Compilers.
- Rinard, Diniz
- 1997
Citation Context: ...data structure invariants that might be used to prove independence of computations. This approach has been successful in parallelizing n-body methods like Barnes-Hut that are organized around trees [17], but most of the applications discussed above use sparse graphs with no particular structure, so shape analysis techniques fail to find any parallelism. These difficulties have seemed insurmountable,...

82 | Sur la sphère vide. Izvestia Akademii Nauk SSSR.
- Delaunay
- 1934
Citation Context: ...ment meshing and graphics. The Delaunay triangulation for a set of points in a plane is the triangulation in which each triangle satisfies a certain geometric constraint called the Delaunay condition [14]. In many applications, triangles are required to satisfy additional quality constraints, and this is accomplished by a process of iterative refinement that re... (footnote 1: Available from http://iss.ices.utexas.e...)

75 | Load balancing and data locality in adaptive hierarchical nbody methods: Barnes-hut, fast multipole, and radiosity.
- Singh, Holt, et al.
- 1995
Citation Context: ...an be taken in Barnes-Hut: particles in the same region of space are likely to traverse similar parts of the octree during the force computation and thus can be processed together to improve locality [58]. In some cases, reader algorithms can be more substantially transformed to further enhance locality. In ray-tracing, bundled rays begin propagating through the scene in the same direction but may eve...

68 | Distributed memory compiler design for sparse problems
- Wu, Das, et al.
- 1995
Citation Context: ...runtime after the input graph is given but before the program is executed. We call this strategy just-in-time coordination, and it is a generalization of the inspector-executor method of Saltz et al. [64]. For topology-driven algorithms, active nodes are known once the input is given, so the remaining problems are the determination of neighborhoods and ordering. In local computation algorithms amenabl...

59 | The ant and the grasshopper: fast and accurate pointer analysis for millions of lines of code.
- Hardekopf, Lin
- 2007
Citation Context: ...ded, reducing synchronization costs. Figure 15 shows the performance of this approach on the eight-core machine compared to a highly optimized sequential reference implementation written by Hardekopf [24]. The results are for two of the benchmarks from Hardekopf’s suite. This is the first successful parallelization of Andersen-style points-to analysis. 7. Extensions to the amorphous data-parallelism m...

48 | Highly parallel sparse Cholesky factorization.
- Gilbert
- 1990
Citation Context: ...nted the development of techniques and tools that make it easier to produce parallel implementations. Domain specialists have written parallel programs for some of the algorithms discussed above (see [5, 20, 31, 33] among others). There are also parallel graph libraries such as Boost [22] and STAPL [2]. However, it is difficult to extract broadly applicable abstractions, principles, and mechanisms from these imp...

36 | STAPL: an adaptive, generic parallel C++ library.
- An, Jula, et al.
- 2003
Citation Context: .... Domain specialists have written parallel programs for some of the algorithms discussed above (see [5, 20, 31, 33] among others). There are also parallel graph libraries such as Boost [22] and STAPL [2]. However, it is difficult to extract broadly applicable abstractions, principles, and mechanisms from these implementations. Another approach is to use points-to and shape analysis [19, 27, 30] to fi...

34 | Lifting sequential graph algorithms for distributed-memory parallel computation.
- Gregor, Lumsdaine
- 2005
Citation Context: ...implementations. Domain specialists have written parallel programs for some of the algorithms discussed above (see [5, 20, 31, 33] among others). There are also parallel graph libraries such as Boost [22] and STAPL [2]. However, it is difficult to extract broadly applicable abstractions, principles, and mechanisms from these implementations. Another approach is to use points-to and shape analysis [19,...

33 | Fast Shared-Memory Algorithms for Computing the Minimum Spanning Forest of Sparse Graphs
- Bader, Cong
Citation Context: ...nted the development of techniques and tools that make it easier to produce parallel implementations. Domain specialists have written parallel programs for some of the algorithms discussed above (see [5, 20, 31, 33] among others). There are also parallel graph libraries such as Boost [22] and STAPL [2]. However, it is difficult to extract broadly applicable abstractions, principles, and mechanisms from these imp...

32 | How much parallelism is there in irregular applications?
- Kulkarni, Burtscher, et al.
- 2009
Citation Context: ...n be generated, the available parallelism at any step corresponds to the width of the dependence graph at that step. In this paper, we will present parallelism profiles produced by the ParaMeter tool [38]. Figure 8 shows the parallelism profiles of Boruvka’s and Prim’s minimal spanning tree (MST) algorithms; the input is a random graph. As explained in Section 5.1, Boruvka’s algorithm is unordered whi...

16 | Algebraic Approach to Graph Transformation Based on Single Pushout Derivations.
- Löwe
- 1990
Citation Context: ...cribe a notation and an implementation for doing this [50]. The metaphor of operators acting on neighborhoods is reminiscent of notions in term-rewriting systems, such as graph grammars in particular [18, 42]. The semantics of functional language programs are usually specified using term rewriting systems that describe how expressions can be replaced by other expressions within the context of the function...

16 | Fractal symbolic analysis
- Mateev, Menon, et al.
- 2001
Citation Context: ...d on integer linear programming are successful for regular, dense array programs in which array subscripts are affine functions of loop indices [35]; LU with pivoting requires fractal symbolic analysis [48]. Compile-time coordination is also possible regardless of the topology if the algorithm is a topology-driven local computation and the neighborhood of an activity is just the active node itself, igno...

15 | Structure-Driven Optimizations for Amorphous Data-Parallel Programs
- Méndez-Lojo, Nguyen, et al.
- 2010
Citation Context: ...imization in their DMR implementation; phrasing this optimization in terms of algorithmic structure permits a general-purpose system like Galois to use it for other algorithms (see Méndez-Lojo et al. [47]). Prountzos et al. [54] present a shape analysis for determining cautiousness using static analysis. 4.3 Exploiting structure for coordinated scheduling Scheduling Strategy Coordinated Autonomous Com...

14 | Exploiting the commutativity lattice.
- Kulkarni, Nguyen, et al.
- 2011
Citation Context: ...ven if they perform reduction operations on shared variables, for example. To keep the discussion simple, we do not describe these alternatives here but refer the interested reader to Kulkarni et al. [39]. The literature on PRAM algorithms has explored similar variations such as the EREW and combining CRCW models [32, 62]. 4.1.1 Parallelism profiles For an algorithm like matrix multiplication for whic...

13 | The Combining DAG: A Technique for Parallel Data Flow Analysis.
- Kramer, Gupta, et al.
- 1994
Citation Context: ...alues. This idea can be used recursively on the reduced graph. Sub-graph contraction can be performed in parallel. This approach to parallel dataflow analysis has been studied by Ryder [41] and Soffa [37]. 5.1.3 General morph Some applications make structural updates that are neither refinements nor coarsenings, but many of these updates may nevertheless be performed in parallel. Some algorithms build...

12 | Parallel and distributed derivations in the single-pushout approach.
- Ehrig, Löwe
- 1993
Citation Context: ...cribe a notation and an implementation for doing this [50]. The metaphor of operators acting on neighborhoods is reminiscent of notions in term-rewriting systems, such as graph grammars in particular [18, 42]. The semantics of functional language programs are usually specified using term rewriting systems that describe how expressions can be replaced by other expressions within the context of the function...

9 | Ghost Cell Pattern.
- Kjolstad, Snir
- 2010
Citation Context: ...lgorithms presented in this section is different from the many efforts in the literature to identify parallelism patterns, such as the work of Mattson et al. [45], Snir’s Parallel Processing Patterns [36] and the Berkeley motifs [51]. These approaches place related algorithms into categories (dense linear algebra, n-body methods, etc.), whereas this paper proposes structural decompositions of irregula...

9 | Efficient demand-driven evaluation, part 1.
- Pingali, Arvind
- 1985
Citation Context: ...d incrementally to So during execution. Streams: A stream operator in languages like StreamIt [21] is a refinement morph from its input streams to its output streams (streams are non-strict sequences [53]). Stateless and stateful stream operators can be expressed using unordered and ordered iteration on sequences. Prim’s MST algorithm: Most algorithms that build trees in a top-down fashion use refinem...

8 | Ordered vs. unordered: a comparison of parallelism and work-efficiency in irregular algorithms
- Hassaan, Burtscher, et al.
- 2011
Citation Context: ...can be solved by both ordered and unordered algorithms, but the critical path is shorter for the unordered algorithm although it may perform more work than its ordered counterpart (see Hassaan et al. [26] for more details). 4.2 Exploiting structure to reduce overheads The overheads of the baseline system can be reduced by exploiting structure when it is present. The following structure is very importa...

8 | Sparse parallel delaunay mesh refinement
- Hudson, Miller, et al.
- 2007

8 | Synthesizing concurrent schedulers for irregular algorithms.
- Nguyen, Pingali
- 2011
Citation Context: ...may affect cache performance and the number of executed iterations, so control of iteration order is useful for efficiency. Nguyen and Pingali describe a notation and an implementation for doing this [50]. The metaphor of operators acting on neighborhoods is reminiscent of notions in term-rewriting systems, such as graph grammars in particular [18, 42]. The semantics of functional language programs ar...

6 | Engineering a compact parallel Delaunay algorithm in 3D
- Blandford, Blelloch, et al.
- 2006
Citation Context: ... be executed speculatively without buffering updates or making backup copies of modified data because all conflicts are detected during the read-only phase of the operator execution. Blandford et al. [7] exploit this optimization in their DMR implementation; phrasing this optimization in terms of algorithmic structure permits a general-purpose system like Galois to use it for other algorithms (see Mé...

6 | A comprehensive approach to parallel data flow analysis
- Lee, Ryder
- 1992
Citation Context: ...mine dataflow values. This idea can be used recursively on the reduced graph. Sub-graph contraction can be performed in parallel. This approach to parallel dataflow analysis has been studied by Ryder [41] and Soffa [37]. 5.1.3 General morph Some applications make structural updates that are neither refinements nor coarsenings, but many of these updates may nevertheless be performed in parallel. Some a...

4 | A shape analysis for optimizing parallel graph programs
- Prountzos, Manevich, et al.
- 2011
Citation Context: ...mplementation; phrasing this optimization in terms of algorithmic structure permits a general-purpose system like Galois to use it for other algorithms (see Méndez-Lojo et al. [47]). Prountzos et al. [54] present a shape analysis for determining cautiousness using static analysis. 4.3 Exploiting structure for coordinated scheduling Scheduling Strategy Coordinated Autonomous Compile-time Just-in-time R...

2 | Dataflow ideas for supercomputers.
- Dennis
- 1984
Citation Context: ...in which active nodes are processed, a property that is referred to as the Church-Rosser property since the most famous example of this behavior is β-reduction in λ-calculus. Dataflow graph execution [4, 15] and the preflow-push algorithm for computing maxflow [12] also exhibit this behavior. In other algorithms, the output may be different for different choices of active nodes, but all such outputs are ...

2 | Efficient parallel algorithms for 2-dimensional Ising spin models.
- Santos, Feng, et al.
- 2002
Citation Context: ...assing algorithms such as belief propagation and survey propagation [44], Petri nets, and discrete-event simulation [33, 49]. In other algorithms, such as some approaches to solving spin Ising models [56], the pattern of node updates is determined by externally-generated data such as random numbers. Although graph topology is typically less important for data-driven algorithms than it is in topology-dr...

2 | A Parallel Solution Strategy for Irregular, Dynamic Problems
- Verbrugge
- 2006
Citation Context: ...write rule schemas (it is not clear that Delaunay mesh refinement, for example, can be specified using graph grammars, although some progress along these lines is reported by Panangaden and Verbrugge [60]). Nevertheless, the terminology of graph grammars may be useful for providing a theoretical foundation for the operator formulation of algorithms. For example, in the context of Figure 7, the graph s...

2 | Explicit multi-threading (XMT) bridging models for instruction parallelism.
- Vishkin, Dascal, et al.
- 1998
Citation Context: ... describe these alternatives here but refer the interested reader to Kulkarni et al. [39]. The literature on PRAM algorithms has explored similar variations such as the EREW and combining CRCW models [32, 62]. 4.1.1 Parallelism profiles For an algorithm like matrix multiplication for which a static dependence graph can be generated, it is possible to give closed-form estimates of the critical path length ...

2 | Cellular automata neighborhood survey. http://cell-auto.com/neighbourhood/index.html
- Tyler
Citation Context: ...he states at time t−1 of cells in some neighborhood around c. This iterative scheme can be written using unordered iterators. A large variety of neighborhoods (stencils) are used in cellular automata [62]. Finite-differences: A similar state update scheme is used in finite-difference methods for the numerical solution of partial differential equations (PDEs) where it is known as Jacobi iteration. In t...
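A minimal Python sketch of the synchronous update scheme described here, on a one-dimensional ring of cells; the `step` helper and the Rule 90 example (each cell becomes the XOR of its two neighbors) are illustrative assumptions, not code from the cited survey:

```python
def step(cells, rule):
    """One synchronous cellular-automaton step on a ring: each cell's new
    state depends only on the previous-generation states of its neighborhood,
    so all per-cell updates within a step are independent (an unordered
    iterator over the cells)."""
    n = len(cells)
    return [rule(cells[(i - 1) % n], cells[i], cells[(i + 1) % n])
            for i in range(n)]

# Rule 90: each cell becomes the XOR of its left and right neighbors.
xor_rule = lambda left, center, right: left ^ right
state = step([0, 0, 0, 1, 0, 0, 0], xor_rule)
assert state == [0, 0, 1, 0, 1, 0, 0]
```

Jacobi iteration for finite differences has the same shape, with `rule` replaced by a weighted average of the neighborhood.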

1 | Parallel Anderson-style points-to analysis.
- Méndez-Lojo, Mathew, et al.
- 2010
Citation Context: ...ementation. Best speedup over reference for gimp and mplayer are 3.63 and 3.62 respectively. constraints can be viewed in terms of the application of three graph rewrite rules to the constraint graph [46]. Each rewrite rule adds edges to the graph but does not remove existing nodes or edges, so the operator is a strong refinement morph. The baseline implementation described in Section 4.1 would acquir...