G. C. Sih. Multiprocessor Scheduling to Account for Interprocessor Communication. PhD thesis, University of California, Berkeley, 1991.
....DIF specifications of randomly generated, synthetic benchmarks. This can be useful for more extensive testing of tools and algorithms beyond the set of available application models. The benchmark generator is based on an implementation of Sih's dataflow graph generation algorithm [15], which constructs application-like graphs by mimicking patterns found in practical dataflow models. DIF specifications and intermediate representations can also be converted automatically into the input format of dot [10], a well-known graph visualization ....
G. C. Sih. Multiprocessor Scheduling to Account for Interprocessor Communication. Ph.D. thesis, Department of EECS, UC Berkeley, April 1991.
....Alternatively, given a constraint on the number of links, the best result minimizes the makespan. Experiments. Our techniques for scheduling and interconnection pattern synthesis operate in conjunction with a given list scheduling strategy. In these experiments, we employed the DLS algorithm [11] as the underlying list scheduling strategy, although, as described in Section 5, any list scheduling algorithm could have been used. We examined a set of DSP application benchmarks and scheduled them using two different scheduling modes, one that incorporates only feasibility information (to ....
G. C. Sih, Multiprocessor Scheduling to Account for Interprocessor Communication, Ph.D. Thesis, Department of Electrical Engineering and Computer Sciences, University of California at Berkeley, April 1991.
....APG. [Figure 5: (a) An SDF graph, and (b) its equivalent APG.] For an efficient algorithm that systematically constructs the equivalent APG from a consistent SDF graph, we refer the reader to [47]. Similar techniques can be employed to map CSDF specifications into equivalent APG representations. We refer to an APG representation of an SDF or CSDF application specification as a dataflow application graph, or simply, an application graph. In other words, an application graph is an ....
.... associated with synchronization, and can be used as post-processing steps to any of the partitioning algorithms discussed in Section 4, as well as to a wide variety of multiprocessor scheduling algorithms for dataflow graphs, such as those described in [19, 24, 47]. In this section, we present an overview of these approaches to synchronization optimization. Specifically, we discuss two closely related graph-theoretic models, the IPC graph [48] and the synchronization graph [9], that are used to model the self-timed execution of a given parallel schedule ....
G. C. Sih, "Multiprocessor Scheduling to Account for Interprocessor Communication," Ph.D. Thesis, Department of Electrical Engineering and Computer Sciences, University of California at Berkeley, 1991.
....processor targets, some reasonable scheduling objectives might include minimization of data or program memory requirements. For multiprocessor targets, minimizing makespan, or maximizing flowtime are more likely objectives. For some examples of scheduling heuristics that have been applied, see [Sih91]. 3. Dynamic Dataflow. Although SDF is adequate for representing large parts of many algorithms, it is rarely sufficient for expressing an entire program. A more general dataflow model is needed in order to express data-dependent iteration, conditionals, and recursion. The addition of two actors, ....
Gilbert C. Sih, "Multiprocessor Scheduling to Account for Inter-processor Communication", Ph.D. Thesis, Memorandum No. UCB/ERL M91/29, UC Berkeley, CA 94720, April 22, 1991.
....which are similar to RSFGs, and proposed a method to derive a periodic admissible schedule which does not require unbounded storage. Sih and Lee have studied the scheduling of acyclic synchronous dataflow graphs, and proposed several heuristic solution methods for scheduling the acyclic graphs [17]. Compile-time scheduling of dataflow program graphs with dynamic constructs was reported in [7]. Multiprocessor scheduling with interprocessor communication delays has been studied in [17]. Except for [17], which deals with acyclic graphs, all other work follows the block scheduling approach. ....
.... dataflow graphs, and proposed several heuristic solution methods for scheduling the acyclic graphs [17]. Compile-time scheduling of dataflow program graphs with dynamic constructs was reported in [7]. Multiprocessor scheduling with interprocessor communication delays has been studied in [17]. Except for [17], which deals with acyclic graphs, all other work follows the block scheduling approach. In this approach, the nodes in one iteration must be scheduled before any nodes in the subsequent iteration are scheduled. The reader can verify that an optimal schedule using the block ....
G.C. Sih. Multiprocessor scheduling to account for interprocessor communication. Memorandum, UCB/ERL M91/29 (Ph.D. Dissertation), EECS Department, University of California, Berkeley, 1991.
....of actors on processors, and determining when each actor fires are performed at compile time [5]. Although determining an optimal fully static schedule is NP-hard, several heuristics have been proposed for this problem. Some of these heuristics generate a non-overlapped blocked schedule [6], whereas others generate overlapped schedules using a modified list scheduling technique [7], [8]. Thus, if processors are available, .... [Figure 1: Fully static schedule on five processors; execution times A, B, F: 3; C, H: 5; D: 2.]
....can be inferred from the schedule, and hence these buffers can be statically allocated. In some cases it is advantageous to unfold a graph by a certain unfolding factor, say , and schedule iterations of the graph together in order to exploit inter-iteration parallelism more effectively [1], [6]. The unfolded graph contains copies of each actor of the original graph. In that case and are defined for all the nodes of the unfolded graph (i.e., and are defined for the first invocations of each actor) and is the iteration ....
G. C. Sih, "Multiprocessor Scheduling to account for Interprocessor Communication," Ph. D. Thesis, Department of EECS, University of California Berkeley, April 1991.
....The design of heterogeneous multiprocessor targets, in which more than one type of programmable processor can be used, is also underway. Parallel schedulers that use different cost functions for partitioning and scheduling the code on heterogeneous programmable components have been developed [27]. 4.3.2 Code Generation for Hardware Synthesis. A Silage [14] code generation domain has also been developed under Ptolemy. Silage is a functional or applicative language (each operation can be thought of as applying a function on a set of inputs and generating a set of outputs). It is possible to ....
G. C. Sih, "Multiprocessor Scheduling to Account for Interprocessor Communication", Ph.D. Thesis, Electronics Research Laboratory, University of California, Berkeley, CA 94720, April 1991.
....advantage of the processing power of the distributed architecture, each processing graph will be analyzed at compile time. This analysis will identify connected segments in the processing graph which contain transitions that can be statically assigned and scheduled using established techniques [5, 6, 7, 8]. Directed cycles in the processing graph will also be identified. The results will be used to assign the transitions of the processing graph to the processors in the target architecture. The goal in this assignment is to provide maximum throughput within specified latency constraints. This ....
....transitions. If a segment contains only ordinary transitions, which always consume and produce exactly one token at each adjacent place, then the segment returns to its previous state after each transition fires once. While partitioning of the segments and assignment to processors is not trivial, [5] provides a partial analysis of partitioning and assignment. Analyzing the more coarse grained structure of the segments and connections between them is more difficult, because with the special transitions, the amount of data produced and consumed varies over time. Therefore, the scheduling and ....
G. C. Sih, Multiprocessor Scheduling to Account for Interprocessor Communication, Ph.D. thesis, Memorandum No. UCB/ERL M91/29, Electronics Research Laboratory, University of California at Berkeley, April, 1991.
....a manner that simplifies the DAG without hiding much exploitable parallelism. It is important to note that each resultant cluster is mapped onto a single processor. This observation motivates the modification of execution-time-minimizing clustering heuristics [5, 6, 7, 8] for use on the SDF graph. With these multiprocessor scheduling clustering heuristics, the resultant clusters are to be mapped onto a single processor. By clustering the SDF graph we also have the opportunity to use specialized uniprocessor SDF schedulers, which can optimize for such parameters as ....
....not deadlock if and only if it has an acyclic precedence graph. Likewise, an SDF graph that does deadlock must have at least one cycle in the precedence graph. Therefore, we must not introduce cycles into the precedence graph by the clusterings we do. SDF precedence graph expansion is detailed in [7]. We have developed a theorem, called the SDF composition theorem, which establishes four clustering criteria that together provide a sufficient condition that a given clustering operation involving two adjacent nodes does not introduce deadlock. The first condition prevents the introduction of ....
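The acyclicity condition quoted above can be tested mechanically with a topological sort. The following is a minimal sketch using Kahn's algorithm. It assumes the precedence graph has already been built, with firings numbered 0..n-1 and one directed edge per intra-iteration precedence. The function name `is_acyclic` is hypothetical and not taken from the cited work.

```python
from collections import deque


def is_acyclic(num_nodes: int, edges: list[tuple[int, int]]) -> bool:
    """Return True if the directed precedence graph has no cycle (Kahn's algorithm)."""
    indegree = [0] * num_nodes
    succ: list[list[int]] = [[] for _ in range(num_nodes)]
    for u, v in edges:
        succ[u].append(v)
        indegree[v] += 1

    # Start from every firing that has no unsatisfied predecessor.
    ready = deque(n for n in range(num_nodes) if indegree[n] == 0)
    visited = 0
    while ready:
        n = ready.popleft()
        visited += 1
        for m in succ[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                ready.append(m)

    # If some firing was never released, it lies on (or behind) a cycle.
    return visited == num_nodes
```

Under the quoted result, a `False` return for the precedence graph of a clustered SDF graph would indicate that the clustering introduced deadlock.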
G. C. Sih, Multiprocessor scheduling to account for interprocessor communication, Ph.D. Dissertation, UCB/ERL M91/29, Electronics Research Laboratory, University of California at Berkeley, 1991.
....for HSDF graphs if there are no resource constraints, there has been little work so far on constructing these schedules under processor and communication constraints. In contrast, there are many heuristics that are able to construct good non-overlapped schedules when these constraints are present [7], [12]. Hence, a practical reason for studying blocked, non-overlapped schedules is to get upper bounds on the performance we can expect when we use these heuristics. In [13], an algorithm is given that systematically increases the blocking factor until the critical path in the unfolded graph is ....
G. Sih, "Multiprocessor Scheduling to Account for Interprocessor Communication", Ph.D. Thesis, Memo. No. UCB/ERL M91/29, Electronics Research Lab., UC Berkeley, April 1991
....derive a periodic admissible schedule which does not require unbounded storage [26]. However, the schedules follow the block scheduling approach, in which the nodes of one iteration must be scheduled before any nodes of the next iteration. This work was extended in [40], where the study focused on scheduling acyclic graphs. Both the optimality and the complexity of the scheduling methods remain open issues. In [33], Parhi and Messerschmitt studied static rate-optimal scheduling of iterative dataflow programs via optimal unfolding. For iterative ....
....know that actors a and c need to fire three times while b fires only twice. Let the firings of a be named a0, a1, and a2, those of b be named b0 and b1, and those of c be named c0, c1, and c2. Fig. 3 depicts the equivalent homogeneous graph. A procedure to construct an equivalent homogeneous graph is presented in [40]. As mentioned earlier, we do not allow two firings of an actor to take place concurrently. Therefore, the three instances of actor a, for example, fire sequentially. That is, the firing of a1 in the k-th iteration can start only after a0 of the k-th iteration has completed its firing. This dependency, ....
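The firing counts in this passage (three firings each for a and c, two for b) are the entries of the SDF repetitions vector. That vector is the smallest positive integer solution of the balance equations q[src] · produced = q[dst] · consumed on every arc. Below is a minimal sketch that assumes a connected graph. The token rates in the usage example (a→b: 2 produced, 3 consumed; b→c: 3 produced, 2 consumed) are hypothetical values chosen only to reproduce those counts, since the figure's actual rates are not shown here. `repetitions_vector` is likewise a hypothetical name.

```python
from fractions import Fraction
from functools import reduce
from math import gcd, lcm


def repetitions_vector(actors, arcs):
    """Solve q[src] * produced == q[dst] * consumed for every arc.

    actors: list of actor names (the graph is assumed to be connected).
    arcs:   list of (src, dst, produced, consumed) tuples.
    Returns the smallest positive integer firing count per actor.
    """
    adj = {a: [] for a in actors}
    for src, dst, p, c in arcs:
        adj[src].append((dst, Fraction(p, c)))  # q[dst] = q[src] * p / c
        adj[dst].append((src, Fraction(c, p)))  # q[src] = q[dst] * c / p

    # Propagate rational firing rates from an arbitrary reference actor.
    rate = {actors[0]: Fraction(1)}
    stack = [actors[0]]
    while stack:
        a = stack.pop()
        for b, ratio in adj[a]:
            if b not in rate:
                rate[b] = rate[a] * ratio
                stack.append(b)

    # A consistent SDF graph satisfies every balance equation.
    for src, dst, p, c in arcs:
        if rate[src] * p != rate[dst] * c:
            raise ValueError("inconsistent SDF graph: balance equations have no solution")

    # Scale to integers, then reduce to the smallest solution.
    scale = lcm(*(r.denominator for r in rate.values()))
    counts = [int(rate[a] * scale) for a in actors]
    g = reduce(gcd, counts)
    return {a: n // g for a, n in zip(actors, counts)}
```

For example, `repetitions_vector(["a", "b", "c"], [("a", "b", 2, 3), ("b", "c", 3, 2)])` returns `{"a": 3, "b": 2, "c": 3}`. Each firing count then becomes that many nodes (a0, a1, a2, and so on) in the equivalent homogeneous graph.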
G. C. Sih. Multiprocessor Scheduling to Account for Interprocessor Communication. PhD thesis, U. of Calif. at Berkeley, 1991. Publ. as Memorandum UCB/ERL M91/29.
....vectorized, as discussed in [27]. Some other consequences of the choice of blocking factor in uniprocessor code generation are discussed later in this paper. Several scheduling problems for SDF and related models have been addressed: constructing efficient multiprocessor schedules is discussed in [26, 28]; Ritz et al. discuss vectorization [27]; the problem of organizing loops is examined in [2]; and compiler scheduling techniques for efficient register allocation are presented in [25]. In this paper, we assume that a schedule has been constructed under one or more of these criteria. In other ....
G. Sih, "Multiprocessor Scheduling to Account for Interprocessor Communication," Memorandum UCB/ERL M91/29, Electronics Research Laboratory, University of California at Berkeley, May 1991.
....and there are no data-dependent actors in the program graph. Even in this restricted domain of applications, algorithms that accomplish an optimal scheduling have combinatorial complexity, except in certain trivial cases. Therefore, good heuristic methods have been developed over the years [4], [6], [7], [8]. Also, most of the scheduling techniques are applied to a completely expanded dataflow graph and assume that an actor is assigned to a processor as an indivisible unit. It is simpler, however, to treat a data-dependent actor as a schedulable indivisible unit. Regarding a macro actor as ....
G. C. Sih, "Multiprocessor Scheduling to Account for Interprocessor Communications", Ph.D. Thesis, University of California, Berkeley, April 1991.
....then it is possible in principle to construct an optimal schedule for the execution of a dataflow graph on a given system of parallel processors with no run-time overhead for scheduling purposes. In practice, the multiprocessor scheduling problem is NP-complete even in the simplest cases (see [Sih91] for a detailed discussion of NP-completeness as it applies to multiprocessor scheduling), so most researchers use heuristic methods to obtain near-optimal schedules with various definitions of goodness. Many of these techniques are elaborations on Hu's list scheduling [Hu61] ....
....output nodes. The output node is connected to the source actor of the removed arc, and the input node is connected to the destination actor of the removed arc. A unified algorithm for expansion of the graph to the homogeneous form together with construction of the APG is given in an appendix of [Sih91]. [Figure 2.4: A simple regular dataflow graph (actors A and B; arc labels 3 and 2) and its associated acyclic precedence graph (nodes A1, A2, B1, B2, B3). Numbers adjacent to arcs specify the numbers of tokens produced and consumed.] Given a specific schedule for a regular dataflow graph, memory requirements for each arc may be ....
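For a two-actor graph like the one in Figure 2.4, the precedence arcs of the APG can be derived by numbering the tokens on the arc. Firing i of the source produces tokens (i-1)p+1 through ip, and firing j of the sink consumes tokens (j-1)c+1 through jc. B_j then depends on every A_i whose token range overlaps its own. Below is a minimal sketch for a single arc with no initial delays. It is not the unified algorithm from [Sih91], and the function `expand_arc` is hypothetical. The usage note assumes, as the node names A1, A2 and B1 to B3 suggest, that A produces 3 tokens per firing and B consumes 2.

```python
def expand_arc(src, dst, produced, consumed, q_src, q_dst):
    """Precedence arcs between firings of src and dst for one delay-free SDF arc.

    produced/consumed: tokens per firing; q_src/q_dst: firings per iteration.
    Firings are named like "A1", "A2", ... (1-based, per iteration).
    """
    # Balance equation: both ends must move the same number of tokens per iteration.
    assert q_src * produced == q_dst * consumed, "rates and firing counts are unbalanced"

    edges = []
    for j in range(1, q_dst + 1):
        first = (j - 1) * consumed + 1   # first token read by dst firing j
        last = j * consumed              # last token read by dst firing j
        # Source firing i writes tokens (i-1)*produced+1 .. i*produced.
        i_first = (first - 1) // produced + 1
        i_last = (last - 1) // produced + 1
        for i in range(i_first, i_last + 1):
            edges.append((f"{src}{i}", f"{dst}{j}"))
    return edges
```

With A firing twice and B three times, `expand_arc("A", "B", 3, 2, 2, 3)` yields the arcs A1→B1, A1→B2, A2→B2 and A2→B3. B2 depends on both firings of A because it reads token 3 (from A1) and token 4 (from A2).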
G. C. Sih, "Multiprocessor Scheduling to Account for Interprocessor Communication," Memorandum No. UCB/ERL M91/29 (Ph.D. Thesis), U. C. Berkeley, 1991.
....arc of an actor. Then, the actor may fire: the node consumes these input tokens, performs its associated action (e.g., computing a FIR filter output), and transfers the result tokens to the outgoing arcs. The design systems named above may also generate code for simulation in software (e.g., [20, 18, 1]) or for prototyping an application in hardware (e.g., by generating VHDL code [23]). However, a pure software solution or a pure hardware solution may not be the optimal choice for implementing a dataflow model. For example, a software solution may be too slow to satisfy real-time data rate ....
....dataflow graphs. For example, the approach described by Bhattacharyya [1] treats the problem of optimally generating code for uniprocessor (all nodes implemented in software) implementations of SDF graphs. The construction of multiprocessor schedules has also been attacked by several authors, e.g., [20], though not at the level of rapidly prototyping such solutions on heterogeneous architectures. The construction of mixed hardware/software schedules involving the generation of a partition of an SDF graph into actors realized in hardware and actors for which a processor schedule is derived, ....
G. C. Sih. Multiprocessor scheduling to account for interprocessor communication. Technical Report UCB/ERL 91/29, Ph.D. dissertation, Dept. of EECS, UC Berkeley, Berkeley, CA 94720, U.S.A., April 1991.
.... namely (i) the periodic admissible parallel schedules, also known as block schedules, proposed by Lee and Messerschmitt [12]; (ii) the optimal scheduling buffer allocation (OSBA) algorithm of Ning and Gao [15]; and (iii) our own multi-rate software pipelining (MRSP) algorithm [7]. As in [19], to obtain general trends over a broad range of test inputs, we propose to evaluate our scheduling method on randomly generated RSFGs. We will supplement these results with those obtained from a set of sample DSP applications, namely, 1) phase-locked loop, 2) voice-band modulation, 3) power ....
....found in [5]. Due to the lack of availability of standard multi-rate benchmark programs, and to extract general trends over a broad range of test inputs, we propose to evaluate our scheduling method on randomly generated well-behaved RSFGs. A similar approach was followed in other studies, e.g., in [13, 19]. We supplement this study with results obtained from a set of DSP algorithms. The DSP algorithms considered in our experiments are: 1) phase-locked loop, 2) voice-band modulation, 3) power spectrum, 4) auto-correlation, 5) periodogram, 6) comb filters, 7) spectrum analyzer, and 8) a ....
G.C. Sih. Multiprocessor scheduling to account for interprocessor communication. Memorandum, UCB/ERL M91/29 (Ph.D. Dissertation), EECS Department, University of California, Berkeley, 1991.
....several nice properties are obtained for these graphs. In particular, self-timed scheduling, i.e., assigning actors to processors and specifying their firing order, can be done efficiently at compile time. Nearly optimal static periodic multiprocessor schedules can be obtained for these graphs [6]. Also, given approximate execution times for the actors, one can constrain the order in which processors would need to access shared resources at run time without unduly sacrificing performance. It is seen that a large set of DSP algorithms fall into the SDF paradigm. In our group at Berkeley, we ....
G. C. Sih, and E. A. Lee, "Multiprocessor Scheduling to Account for Inter-processor Communication", Ph.D. Thesis, ERL, UC Berkeley, April 1991.
....IPC delay needs to be considered in multiprocessor scheduling. Multiprocessor scheduling with non-zero IPC delay has received considerable attention in recent years. There are a number of proposed heuristics to extend the list scheduling technique to multiprocessors with non-zero IPC delay models [16, 18, 17, 8, 15, 3, 12]. List scheduling heuristics with non-zero IPC delay are also widely used as basic scheduling routines in more complex multiprocessor scheduling heuristics [14, 10, 6, 13]. Unfortunately, there is little quantitative data to compare the performance of list scheduling heuristics with non-zero IPC ....
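The following is a minimal sketch of a generic greedy variant of list scheduling with a non-zero IPC delay. Nodes are prioritized by static level, which is the longest computation-only path to an exit node. Each selected node goes on the processor that gives it the earliest start time, where data from a predecessor on another processor arrives only after that arc's IPC delay. This variant is an illustrative assumption. It is not Sih's dynamic level scheduling or any specific heuristic from the references cited above. It appends each node to the end of a processor's timeline rather than inserting it into idle gaps. `list_schedule` is a hypothetical name.

```python
def list_schedule(exec_time, edges, num_procs):
    """Greedy list scheduling of a DAG with per-arc IPC delays.

    exec_time: dict node -> execution time.
    edges:     dict (u, v) -> IPC delay paid only if u and v run on different processors.
    Returns dict node -> (processor, start, finish).
    """
    succ = {n: [] for n in exec_time}
    pred = {n: [] for n in exec_time}
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)

    # Static level: node's own time plus the longest successor chain (IPC ignored).
    level = {}

    def static_level(n):
        if n not in level:
            level[n] = exec_time[n] + max((static_level(s) for s in succ[n]), default=0)
        return level[n]

    for n in exec_time:
        static_level(n)

    schedule = {}
    proc_free = [0] * num_procs  # time at which each processor becomes idle
    remaining = set(exec_time)
    while remaining:
        # Ready nodes: every predecessor already scheduled.
        ready = [n for n in remaining if all(p in schedule for p in pred[n])]
        # Highest static level first; ties broken by name for determinism.
        node = min(ready, key=lambda n: (-level[n], str(n)))

        best_proc, best_start = None, None
        for p in range(num_procs):
            # Data arrives immediately from the same processor, after the IPC delay otherwise.
            data_ready = max(
                (schedule[u][2] + (0 if schedule[u][0] == p else edges[(u, node)])
                 for u in pred[node]),
                default=0,
            )
            start = max(proc_free[p], data_ready)
            if best_start is None or start < best_start:
                best_proc, best_start = p, start

        finish = best_start + exec_time[node]
        schedule[node] = (best_proc, best_start, finish)
        proc_free[best_proc] = finish
        remaining.remove(node)
    return schedule
```

Consider a four-node diamond (A→B, A→C, B→D, C→D) with execution times 2, 3, 1, 2, a unit IPC delay on every arc, and two processors. The sketch places A, B and D on processor 0 and C on processor 1, for a makespan of 7. D stays on processor 0 because receiving B's result on processor 1 would cost the extra IPC delay.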
....of scheduling. Another measure of graph size is average node degree. In our preliminary study we tested graphs with varying sparseness and found no impact on performance. Therefore, we adopted the number of nodes as the measure of graph size. Graph parallelism is measured by the following formula [15]: $\text{Parallelism} = \sum_{i=1}^{n} W_i \,/\, \max_j L(N_j)$ (2). This is the lower bound on the number of processors required to execute the graph in time bounded by the critical path (the longest path from an initial task to an exit task) if IPC costs are not included. When no confusion may occur, we use the ....
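The parallelism formula quoted above can be computed in one pass over a topological order. This minimal sketch assumes that W_i is the computation weight of node i and that L(N_j) is the length of the longest path from an entry node up to and including N_j. Under that reading, the maximum of L(N_j) is the critical-path length. IPC costs are ignored, as in the text. The function name `parallelism` is hypothetical.

```python
from graphlib import TopologicalSorter


def parallelism(weights, preds):
    """Total work divided by critical-path length, ignoring IPC costs.

    weights: dict node -> W_i (computation weight).
    preds:   dict node -> iterable of predecessor nodes (omit or empty for entry nodes).
    """
    graph = {n: set(preds.get(n, ())) for n in weights}
    longest = {}  # L(N_j): heaviest path from an entry node ending at N_j
    for n in TopologicalSorter(graph).static_order():
        longest[n] = weights[n] + max((longest[p] for p in graph[n]), default=0)
    return sum(weights.values()) / max(longest.values())
```

Take the diamond A→B, A→C, B→D, C→D with weights 2, 3, 1, 2. The total work is 8 and the critical path A→B→D has length 7, so the parallelism is 8/7 ≈ 1.14. In other words, roughly one processor suffices to run this graph within its critical-path time when IPC costs are ignored.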
G. C. Sih. Multiprocessor scheduling to account for interprocessor communication. Memorandum No. UCB/ERL M91/29, Electronics Research Laboratory, University of California at Berkeley, 1991. PhD thesis.
....its predecessor system, Gabriel [7]. The looped structure produced by the clustering algorithm, with a bit of postprocessing to eliminate duplicate tests, is suitable for code generation for a single processor. Extensions to multiprocessor scheduling, obtained by extending techniques such as those of [6] to support data-dependent actors, are also contemplated. For the special case of bounded-cycle-length graphs, minimax scheduling is the logical criterion, especially in hard real-time systems. For the more general case of data-dependent iteration, we plan to apply techniques from [9] ....
G. Sih, "Multiprocessor Scheduling to Account for Interprocessor Communication", Ph.D. Thesis, Memorandum No. UCB/ERL M91/29, UC Berkeley, CA 94720, April 22, 1991
G. C. Sih. Multiprocessor Scheduling to Account for Interprocessor Communication. PhD thesis, University of California, Berkeley, 1991.
G. C. Sih, "Multiprocessor Scheduling to Account for Interprocessor Communication", Ph.D. Dissertation, ERL, University of California, Berkeley, CA 94720, April 22, 1991.
G. Sih, "Multiprocessor Scheduling to Account for Interprocessor Communication", Memorandum No. UCB/ERL M91/29, Electronics Research Laboratory, University of California at Berkeley, April 1991.
G. Sih, "Multiprocessor Scheduling to Account for Interprocessor Communication", PhD Thesis, University of California at Berkeley, 1991.