S. Smith, M. Mercer, and B. Underwood. An analysis of several approaches to circuit partitioning for parallel logic simulation. In Proc. Int. Conference on Computer Design, IEEE, pages 664--667, 1987.
....Warp simulators; optimizations that address communication overhead have focused almost exclusively on minimizing the number of messages generated by the simulation. For example, improved partitioning strategies have been developed to minimize the number of messages generated by the application [1,6,36,57]. Similarly, several optimizations that minimize messages generated by the kernel have been studied (lazy cancellation [49], efficient GVT estimations [41], bounding optimism [20, 47]). In this thesis, DyMA has been proposed to reduce the average communication overhead experienced by the ....
Smith, S. P., Underwood, B., and Mercer, M. R. An analysis of several approaches to circuit partitioning for parallel logic simulation. In Proceedings of the
....concurrency and better performance over other partitioning algorithms, if it can achieve a good compromise among three competing goals: interprocessor communication, load balancing, and concurrency. The only partitioning algorithm that considers concurrency is the string partitioning algorithm [89]. However, because it does not consider factors other than concurrency, it does not produce good performance [89]. Different parallel simulation protocols lead to major performance differences. For example, oblivious simulation that is not PDES is better for some circuits while optimistic ....
....three competing goals: interprocessor communication, load balancing, and concurrency. The only partitioning algorithm that considers concurrency is the string partitioning algorithm [89]. However, because it does not consider factors other than concurrency, it does not produce good performance [89]. Different parallel simulation protocols lead to major performance differences. For example, oblivious simulation that is not PDES is better for some circuits while optimistic event-driven simulation is better for others. The reason is that optimistic event-driven simulation algorithms have ....
S. P. Smith, B. Underwood, and M. R. Mercer, "An Analysis of Several Approaches to Circuit Partitioning for Parallel Logic Simulation," IEEE International Conference on Computer Design, pp. 664-667, 1987.
....optimizations: (a) partitioning; (b) rollback relaxation; and (c) fine-grained communication optimizations. The following subsections detail each of these optimizations. 5.1. Partitioning. To extract better performance from parallel logic simulators, partitioning techniques are necessary [16, 41, 45]. The partitioning techniques can exploit either the parallelism inherent in (i) the simulation algorithm, or (ii) the circuit being simulated. The amount of parallelism that can be gained from the former method is limited ....
.... algorithms, in general, concentrate on achieving speedup by improving concurrency, minimizing interprocessor communication, and balancing the processor workload based on the circuit being simulated [2]. Several techniques have been developed to partition logic circuits for parallel simulation [2, 16, 28, 41, 45]. The algorithms address various issues related to concurrency, communication, and load balancing. In addition to investigating and implementing existing partitioning algorithms, we have developed a new partitioning algorithm based on a multilevel heuristic. The new multilevel approach [46] to ....
S. P. Smith, B. Underwood, and M. R. Mercer. An analysis of several approaches to circuit partitioning for parallel logic simulation. In Proceedings of the 1987 International Conference on Computer Design, pages 664-667. IEEE, New York, 1987.
....large circuits. Hence, dividing and assigning the workload equally across the processors such that the overheads are minimized is critical for maximum performance. Partitioning, the process which computes such an assignment, is therefore a key factor in achieving speedup in parallel simulations [10, 31, 35]. Traditionally, partitioning techniques were designed to exploit either the parallelism inherent in (i) the simulation algorithm, or (ii) the model being simulated [27]. The amount of parallelism that can be gained from the former techniques is limited ....
....communication but places no emphasis on concurrency. In any case, load balance can be maintained in both strategies. 3.3. Topological Level Partitioning. The techniques described above do not directly address the problem of concurrency. Unlike such techniques, the topological partitioning scheme [10, 31] attempts to improve concurrency by a level sorting scheme. In this strategy, the nodes in the circuit graph are first level sorted according to their topological level in the graph. Nodes at the same topological level are then assigned to different partitions. Load balancing does not pose a problem ....
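The level sorting step in the excerpt above can be sketched as follows. This is a minimal illustration and not the cited authors' implementation: it assumes an acyclic (combinational) circuit given as a hypothetical `fanout` dictionary mapping each gate to the gates it drives, treats gates with no fan-in as level 0, and deals nodes to partitions with a running round-robin counter so that nodes on the same level land in different partitions while the partition sizes stay balanced. The names `topological_levels` and `level_partition` are assumptions introduced here.

```python
from collections import defaultdict, deque


def topological_levels(fanout):
    """Return {node: level} for an acyclic circuit graph.

    `fanout` maps each node to the nodes it drives (hypothetical input
    format). Nodes with no fan-in get level 0; every other node gets
    1 + the maximum level of the nodes driving it.
    """
    nodes = set(fanout)
    for outs in fanout.values():
        nodes.update(outs)
    indeg = {n: 0 for n in nodes}
    for outs in fanout.values():
        for v in outs:
            indeg[v] += 1
    level = {n: 0 for n in nodes if indeg[n] == 0}
    queue = deque(sorted(level))
    while queue:
        u = queue.popleft()
        for v in fanout.get(u, ()):
            level[v] = max(level.get(v, 0), level[u] + 1)
            indeg[v] -= 1
            if indeg[v] == 0:  # all drivers of v have been processed
                queue.append(v)
    return level


def level_partition(fanout, k):
    """Deal level-sorted nodes round-robin into k partitions."""
    level = topological_levels(fanout)
    by_level = defaultdict(list)
    for node, lvl in level.items():
        by_level[lvl].append(node)
    parts = [[] for _ in range(k)]
    counter = 0  # running counter keeps partition sizes within one node
    for lvl in sorted(by_level):
        for node in sorted(by_level[lvl]):
            parts[counter % k].append(node)
            counter += 1
    return parts
```

For example, with `fanout = {"a": ["g1"], "b": ["g1", "g2"], "c": ["g2"], "g1": ["g3"], "g2": ["g3"], "g3": []}` and `k = 2`, the two gates on level 1 (`g1` and `g2`) end up in different partitions, so they can be evaluated concurrently.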
S. P. Smith, B. Underwood, and M. R. Mercer, An analysis of several approaches to circuit partitioning for parallel logic simulation, in Proceedings of the 1987 International Conference on Computer Design, IEEE, New York, 1987, pp. 664-667.
....been extended. Optimizations that address the communication overhead have focused almost exclusively on minimizing the number of messages generated by the simulation. For example, improved partitioning strategies have been developed to minimize the number of messages generated by the application [1, 2, 10, 18]. Similarly, several optimizations that minimize messages generated by the kernel have been studied (lazy cancelation [16], efficient GVT calculations [12], bounding optimism [5, 15]). In contrast, there have been no efforts to optimize the operation of the communication subsystem of the simulator ....
S. P. Smith, B. Underwood, and M. R. Mercer. An analysis of several approaches to circuit partitioning for parallel logic simulation. In Proceedings of the 1987 International Conference on Computer Design, pages 664--667. IEEE, New York, 1987.
....algorithms included in the study. 1 Introduction. Parallel simulation tools are frequently used to simulate large and complex digital circuits in order to reduce time for simulation [3]. To extract better performance from parallel logic simulators, partitioning techniques are necessary [5, 18, 19]. The partitioning techniques can exploit either the parallelism inherent in (i) the simulation algorithm, or (ii) the circuit being simulated. The amount of parallelism that can be gained from the former method is limited by the algorithm used for simulation. The latter method attempts to improve ....
....instantaneous workload for load balancing, was presented by Hong [14]. Bagrodia et al. [2] have illustrated the use of an acyclic multiway partitioning scheme for gate-level simulations. A partitioning scheme based on fanout/fanin cone clustering starting from the input gates was studied by Smith [18]. A random partitioning scheme that assigns nodes to partitions in a random and load-balanced manner was reported in [15]. A major bottleneck for the random partitioner is communication. A depth-first traversal of the circuit graph can also be utilized for partitioning by assigning nodes to ....
S. P. Smith, B. Underwood, and M. R. Mercer. An analysis of several approaches to circuit partitioning for parallel logic simulation. In Proceedings of the 1987 International Conference on Computer Design, pages 664--667. IEEE, New York, 1987.
....one fan-out and one fan-in. This algorithm ensures that there is at least one fan-out of a gate in the same partition. The scheme aims at maximizing the concurrency but without any consideration of the communication between partitions. Fan-in cone and fan-out cone partitioning by Smith et al. [4] tends towards reducing communication. The fan-in cone of a gate A is the set of all gates which affect the output of gate A. The fan-out cone of A is the set of all gates affected by the output of A. In fan-in cone partitioning, first the fan-in cones of each gate are found. Then, the gates driven by ....
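The two cone definitions in the excerpt above amount to graph reachability in opposite directions. The sketch below only computes the cones; it does not reproduce the partitioning step, which is elided in the excerpt. The `fanout` dictionary format and the function names `invert`, `cone`, `fan_in_cone`, and `fan_out_cone` are assumptions made for illustration.

```python
def invert(fanout):
    """Build the fan-in map (gate -> gates driving it) from a fan-out map."""
    fanin = {n: [] for n in fanout}
    for u, outs in fanout.items():
        for v in outs:
            fanin.setdefault(v, []).append(u)
    return fanin


def cone(start, edges):
    """Return every gate reachable from `start` by following `edges`.

    `start` itself is included only if it lies on a feedback loop.
    """
    seen = set()
    stack = [start]
    while stack:
        u = stack.pop()
        for v in edges.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def fan_in_cone(gate, fanout):
    """All gates that can affect the output of `gate`."""
    return cone(gate, invert(fanout))


def fan_out_cone(gate, fanout):
    """All gates affected by the output of `gate`."""
    return cone(gate, fanout)
```

Because every gate in a fan-in cone feeds the same head gate, placing a whole cone in one partition keeps those signals local, which is consistent with the excerpt's remark that cone partitioning tends to reduce communication.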
Smith S. P., Underwood B., and Mercer M. R., "An analysis of several approaches to circuit partitioning for parallel logic simulation," in Proceedings of the 1987 International Conference on Computer Design, IEEE, New York, 664-667, 1987.
....based on asynchronous logic simulation techniques that, while novel, falls short of achieving high processor efficiency. In [143], Patil et al. present a circuit-partitioned approach applicable to shared-memory machines that incorporates techniques from parallel logic simulation [139], [144]. 7.3.3 Pattern partitioning. For combinational circuits, fault simulation can be trivially parallelized by partitioning the test vector set. The only significant issue is load balance, similar to the fault partitioning case. For sequential circuits, the problem is much more difficult, because ....
S. P. Smith, W. Underwood, and M. R. Mercer, "An analysis of several approaches to circuit partitioning for parallel logic simulation," in Proceedings of the International Conference on Computer Design, pp. 664--667, 1987.
....version. This directly influences model partitioning. Assigning a logical element to a model part requires the same assignment for all logical elements of the whole model which are able to contribute to a change of an input value of the considered element during one cycle. If special fan-in cones [SUM87, Man92, MTSDA93] are taken as basic building blocks for model partitioning, the demand mentioned above is fulfilled. A model partition is directly related to certain workloads of the processors involved in later parallel simulation and communication overhead between cooperating TEXSIM instances. ....
....the underlying hardware. Due to our parallelization approach, cutting signals of $M_S$ during a partitioning of $M$ is only permitted at cycle boundaries related to the clock cycle algorithm mentioned above. Therefore, we are forced to define basic units for partitioning which are known as cones [SUM87, Man92, MTSDA93]. These units comprise elements of $M_B$ with a limiting head element out of $M_O \cup M_L$. For further investigations of the partitioning problem we fix the following formal definitions with respect to an arbitrarily chosen hardware model $M$: Definition 2. The fan-in cone $co_I(x)$ of an ....
S. P. Smith, B. Underwood, and M. R. Mercer. An analysis of several approaches to circuit partitioning for parallel logic simulation. In Proceedings IEEE International Conference on Computer Design (ICCD), pages 664-667, 1987.
....its fanout components, etc. until a primary output is reached. The string of components formed above is assigned to a processor, and the process repeats, forming another string. Analogous to the depth-first search implicit in string partitioning, fanin and fanout cones (proposed by Smith et al. [25]) spread out from an initial gate in a breadth-first manner. Many logic partitioning algorithms borrow ideas from physical partitioning algorithms originally developed to address the placement problem. For example, the Fiduccia and Mattheyses [12] min-cut algorithm and other graph-based bisection ....
E.J. Smith, B. Underwood, and M.R. Mercer. An analysis of several approaches to circuit partitioning for parallel logic simulation. In Proc. of the Int'l Conf. on Computer Design, pages 664--667, IEEE, 1987.
....specifically for logic simulation have been reported in the past. A method called partitioning by element strings was proposed in [18], which tries to assign linear strings of logic blocks as a single entity to different processors. Several approaches to circuit partitioning were evaluated in [19] by using two different metrics: one which measures the concurrency and another which measures the communication. The partitioning strategies were finally evaluated by counting the number of messages in an actual event-driven simulator. An analysis of parallel logic simulation on several ....
S. P. Smith, W. Underwood, and M. R. Mercer, "An analysis of several approaches to circuit partitioning for parallel logic simulation," Proc. Int. Conf. Computer Design, pp. 664-667, 1987.
....neglected. Our current work focuses on the optimization of Time Warp, the application of the oracle log method to examine and to improve the conservative algorithms, and the integration of additional partitioning schemes that use application-specific knowledge (e.g. strings and cone partitioning) [29]. Additionally, conservative and optimistic methods will be combined into hybrid schemes, compared, and tuned using our trace-based analysis and visualization tools. Finally, we are implementing a dynamic load balancing scheme. This makes sense because in static partitioning schemes the criteria for ....
Smith, S., Mercer, M., and Underwood, B. (1987) An Analysis of Several Approaches to Circuit Partitioning for Parallel Logic Simulation. Proc. of Int. Conference on Computer Design, IEEE, pp. 664 -- 667
....with large circuits from the ISCAS89 benchmark suite yielded speedup factors between 2 and 4.2 on 7 processors. The speedup is calculated relative to a true sequential simulator. According to the authors, an important role is played by the applied partitioning scheme. They used cone partitioning [SMU87a] with enhancements to incorporate the estimated circuit behavior into the partitioning algorithm. 5.2.7 The DACAPO III Project. The DACAPO III system is a multi-level, mixed-mode simulation system for logic simulation [GRA90a]. It has been developed at the University of Dortmund and runs as a ....
....elements for partitioning. The problem with these methods is whether there is enough inherent parallelism within such coarse-grained divisions. In particular, for hierarchical designs, often only one component out of several might be active at a time. 6.1.6 Cone Partitioning. Cone partitioning [SMU87a] is well suited for clocked circuits that contain latches. Starting with the latches of a circuit, all units that feed a latch are added to the input cone of that latch. This procedure is repeated recursively for the added units until either a primary input or the output of a latch is reached. As ....
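The latch-based procedure described in the excerpt above can be sketched as a backward traversal that stops at latch outputs. This is an illustrative reading, not the cited implementation: the `fanin` dictionary (unit -> units driving it), the `latches` set, and the function name `latch_input_cones` are assumptions, and primary inputs are included in the cone as its boundary, which the excerpt leaves open.

```python
def latch_input_cones(fanin, latches):
    """Return {latch: set of units in that latch's input cone}.

    `fanin` maps each unit to the units driving it (hypothetical format).
    The backward walk from each latch stops when it reaches the output of
    any latch, or a primary input (a unit with no drivers).
    """
    latches = set(latches)
    cones = {}
    for latch in sorted(latches):
        members = set()
        stack = [latch]
        while stack:
            unit = stack.pop()
            for driver in fanin.get(unit, ()):
                if driver in latches or driver in members:
                    continue  # latch output reached, or unit already added
                members.add(driver)
                stack.append(driver)
        cones[latch] = members
    return cones
```

Units shared by several cones appear in each of them here; how overlapping cones are merged or assigned to processors is not covered by the excerpt and is left out of the sketch.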
Smith, S., Mercer, M., and Underwood, B. (1987) An Analysis of Several Approaches to Circuit Partitioning for Parallel Logic Simulation, Proc. Int. Conference on Computer Design, IEEE, pp. 664 -- 667