26 citations found.
H. Printz, "Automatic mapping of large signal processing systems to a parallel machine," Ph.D. dissertation, Carnegie Mellon Univ., Pittsburgh, PA, 1991.

This paper is cited in the following contexts:

Resynchronization for Multiprocessor DSP Systems - Bhattacharyya, Sriram, Lee (2000)

....by each actor is fixed. Although the model is too restricted for many general purpose applications, iterative SDF has proven to be a useful framework for representing a significant class of DSP algorithms, and it has been used as a foundation for numerous DSP design environments [10], [26], [40], [42]. A wide variety of techniques have been developed to schedule SDF specifications for efficient multiprocessor implementation (e.g., [1], [2], [11], [17], [18], [30], [36], [40], [44], and [47]). The techniques developed in this paper can be used as a post processing step to improve the ....

.... class of DSP algorithms, and it has been used as a foundation for numerous DSP design environments [10], [26], [40], [42]. A wide variety of techniques have been developed to schedule SDF specifications for efficient multiprocessor implementation (e.g., [1], [2], [11], [17], [18], [30], [36], [40], [44], and [47]). The techniques developed in this paper can be used as a post processing step to improve the performance of implementations that use any of these scheduling techniques. Each SDF edge has associated a nonnegative integer delay. SDF delays represent initial tokens, and specify ....

H. Printz, "Automatic mapping of large signal processing systems to a parallel machine," Ph.D. dissertation, School of Computer Science, Carnegie Mellon Univ., May 1991.


A Hierarchical Multiprocessor Scheduling Framework For.. - Pino, Bhattacharyya, Lee (1995)

....SDF graph as the precedence DAG (directed acyclic graph) of , or simply as the DAG of . Unfortunately, the expansion due to the repetition count of each SDF node can lead to an exponential growth of nodes in the DAG. This growth has been overlooked in previous SDF multiprocessor scheduling work [9, 10]. An SDF graph that exhibits this growth is shown in figure 3. It is easily seen that the number of nodes in the corresponding DAG is . [Figure 2. The precedence graph for the SDF graph of figure 1.] ....

H. Printz, Automatic mapping of large signal processing systems to a parallel machine, Ph.D. Dissertation CMU-CS-91-101, Carnegie Mellon, 1991.


Dataflow Process Networks - Lee, Parks (1995)   (103 citations)

....express nondeterminism, and may or may not be as expressive as other languages. Granular Lucid, for example, is a coordination language with the semantics of Lucid [56]. Coordination languages with dataflow semantics are described by Suhler et al. [94], Gifford and Lucassen [44], Onanian [77], Printz [82], and Rasure and Williams [84]. Contrast these to the approach of Reekie [85] and the DSP Station from Mentor Graphics [41], where new actors are defined in a language with semantics identical to those of the visual language. There are compelling advantages to that approach, in that all compiler ....

....by contrast, parallelism has always been implicit. This is, in part, due to the scarce use of recursion. A dataflow graph typically reveals a great deal of parallelism that can be exploited either by run time hardware [5] or, if the firing sequence is sufficiently predictable, a compiler [45][82][90][91]. Dataflow process networks can combine the best of these. Parallelism can be implicit, and higher order functions can be used to simplify the syntax of the graphical specification. The phased execution, in which the static higher order functions are evaluated during a setup phase, is ....

H. Printz, "Automatic Mapping of Large Signal Processing Systems to a Parallel Machine," Memorandum CMU-CS-91-101, School of Computer Science, Carnegie Mellon University, Ph.D. Thesis, May 15, 1991.


Dataflow Process Networks - Lee (1994)   (103 citations)

....Granular Lucid, for example, is a coordination language with the semantics of Lucid [41]. Coordination languages with dataflow semantics are described by Suhler et al. [76], Gifford and Lucassen [29], Onanian [61], Printz [65], and Rasure and Williams [67]. Contrast these to the approach of Reekie [68] and the DSP Station from Mentor Graphics [26], where new actors are defined in a language with identical semantics to the visual language. There are compelling advantages to that approach, in that all compiler ....

....by contrast, parallelism has always been implicit. This is, in part, due to the scarce use of recursion. A dataflow graph typically reveals a great deal of parallelism that can be exploited either by run time hardware [3] or, if the firing sequence is sufficiently predictable, a compiler [30][65][73][74]. Dataflow process networks can combine the best of these. Parallelism can be implicit, and higher order functions can be used to simplify the syntax of the graphical specification. The phased execution, in which the static higher order functions are evaluated during a setup phase, is ....

H. Printz, "Automatic Mapping of Large Signal Processing Systems to a Parallel Machine," Memorandum CMU-CS-91-101, School of Computer Science, Carnegie Mellon University, Ph.D. Thesis, May 15, 1991.


A Hierarchical Multiprocessor Scheduling Framework For.. - Pino, Bhattacharyya, Lee (1995)

....SDF graph as the precedence DAG (directed acyclic graph) of , or simply as the DAG of . Unfortunately, the expansion due to the repetition count of each SDF node can lead to an exponential growth of nodes in the DAG. This growth has been overlooked in previous SDF multiprocessor scheduling work [9, 10]. An SDF graph that exhibits this growth is shown in figure 3. It is easily seen that the number of nodes in the corresponding DAG is . [Figure 2. The precedence graph for the SDF graph of figure 1.] ....

H. Printz, Automatic mapping of large signal processing systems to a parallel machine, Ph.D. Dissertation CMU-CS-91-101, Carnegie Mellon, 1991.


Communication and Memory Requirements as the Basis .. - Subhlok.. (1994)   (21 citations)

....Compilation and optimization of programs for private memory parallel computers has been a very active area of research for several years. Several parallelizing compilers have been developed for data parallel programs, including Fortran D [15] and Vienna Fortran [4], and for task parallel programs [9, 10]. Recent research shows that a large class of applications contain task and data parallelism [7] and it is important to exploit them in a single compiler .... [Footnote 2: The number of processors in our machine is too small for a meaningful discussion with the full size stereo.] ....

PRINTZ, H. Automatic Mapping of Large Signal Processing Systems to a Parallel Machine. PhD thesis, School of Computer Science, Carnegie Mellon University, 1991. Also available as report CMU-CS-91-101.


Advanced Systolic Design - Lavenier, Quinton, Rajopadhye (1999)

....The data can be transmitted using an application specific protocol which directly analyzes the header of the messages and explicitly tells the communication system where to deposit the messages. This protocol management can be directly used by parallel program generators such as Apply [29], AL [30, 31], or C-stolic [32]. The iWarp's second interesting communication innovation is the implementation of logical channels between processors. Logical channels provide both a higher degree of connectivity than physical links, and a mechanism for delivering guaranteed communication bandwidth for certain ....

H. Printz, H. Kung, T. Mummert, and P. Scherer, "Automatic mapping of large signal processing systems to a parallel machine," in Proc. of SPIE, Real-Time Signal Processing XII, (San Diego, CA, USA), Aug. 1989.


The Token Flow Model - Buck, Lee (1992)   (14 citations)

....$\Gamma r = o$, where $o$ is a vector full of zeros, and $r$ is the repetition vector containing the $r_i$ for each actor. Printz calls (4) the balance equations [Pri91]. For the topology matrix in (9) one solution is . (5) In fact, this solution is the smallest one with integer entries. For a connected SDF graph, it is shown in [Lee87a] that a necessary condition to be able to construct an admissible periodic schedule is that the rank of $\Gamma$ be equal to one less ....

H. Printz, "Automatic Mapping of Large Signal Processing Systems to a Parallel Machine," Memorandum CMU-CS-91-101, School of Computer Science, Carnegie Mellon University, Ph.D. Thesis, May 15, 1991.


A novel framework for multi-rate scheduling in DSP applications - Govindarajan, Gao (1993)   (6 citations)

....Introduction In this paper, we are interested in constructing efficient compile time (static) schedules for large grain, synchronous signal processing applications [11, 14]. Large grain means that each task in the applications generates tens, or even hundreds, of outputs per invocation. Each of these tasks represents operations such as an FIR filter, FFT, or bandshifting. Synchronous means that the amount of input consumed by each task, and the amount of output ....

....that the amount of input consumed by each task, and the amount of output generated, are known in advance and invariable. Large grain synchronous systems are important as they can represent a large class of DSP computations, which includes a variety of narrowband and broadband sonar systems [11, 14]. Large grained synchronous signal processing applications can be represented in the form of dataflow graphs, where the nodes (also called actors) represent a computation task, and arcs (or channels) represent communication between them. The communication can be viewed as a sequence of tokens ....

[Article contains additional citation context not shown here]

H. Printz. Automatic mapping of large signal processing systems to a parallel machine. Memorandum CMU-CS-91-101 (Ph.D. Dissertation), Computer Science Department, Carnegie Mellon University, 1991.


Optimizing Synchronization in Multiprocessor.. - Bhattacharyya, Sriram, .. (1995)

....and Messerschmitt [23] and Chao and Sha [6] have developed systematic techniques for exploiting overlapped execution to generate schedules that have optimal throughput, assuming zero cost for IPC. Other work has focused on taking IPC costs into account during scheduling, such as that described in [1, 21, 27, 31]; these efforts have not attempted to exploit overlapped execution of dataflow graph iterations. Similarly, in [10] Govindarajan and Gao develop techniques to simultaneously maximize throughput, possibly using overlapped execution, and minimize buffer memory requirements under the assumption of ....

....the token that is consumed by the filter, but also on the invocations that produce the preceding tokens, where is the order of the filter. Such dependence can easily be evaluated with an additional dataflow parameter on each actor input that specifies the number of past tokens that are accessed [27]. Using this information, together with the invocation counts specified by , we obtain the precedence relationships specified by the graph of Figure 8(b) in which the th invocation of actor is labeled , and each edge specifies that invocation requires data produced by invocation iteration ....

H. Printz, Automatic Mapping of Large Signal Processing Systems to a Parallel Machine, Ph.D. thesis, Memorandum CMU-CS-91-101, School of Computer Science, Carnegie Mellon University, May, 1991.


Automatic Mapping of Task and Data Parallel Programs for.. - Jaspal Subhlok (1993)   (10 citations)

....Compilation and optimization of programs for private memory parallel computers has been a very active area of research for several years. Several parallelizing compilers have been developed for data parallel programs, including Fortran D [13] and Vienna Fortran [3], and for task parallel programs [8, 9]. Recent research shows that a large class of applications contain task and data parallelism [6] and it is important to exploit them in a single compiler framework [4, 5, 12]. There is also a large body of literature on partitioning, load balancing and scheduling of parallel programs [1, 10]. We ....

PRINTZ, H. Automatic Mapping of Large Signal Processing Systems to a Parallel Machine. PhD thesis, School of Computer Science, Carnegie Mellon University, 1991. Also available as report CMU-CS-91-101.


A Hierarchical Multiprocessor Scheduling System for DSP.. - Pino, Bhattacharyya, Lee (1995)   (3 citations)

....a manner that simplifies the DAG without hiding much exploitable parallelism. It is important to note that each resultant cluster is mapped onto a single processor. This observation motivates the modification of execution time minimizing clustering heuristics [5, 6, 7, 8] for use on the SDF graph. With these multiprocessor scheduling clustering heuristics, the resultant clusters are to be mapped onto a single processor. By clustering the SDF graph we also have the opportunity to use specialized uniprocessor SDF schedulers, which can optimize for such parameters as ....

H. Printz, Automatic mapping of large signal processing systems to a parallel machine, Ph.D. Dissertation CMU-CS-91-101, Carnegie Mellon, 1991.


Communication and Memory Requirements as the Basis for.. - Jaspal Subhlok (1994)   (21 citations)

....transforms, narrowband tracking radar, and multibaseline stereo. We examine the tradeoffs between various mappings for them and show how the framework is used to obtain efficient mappings. 1 Introduction Parallelizing compilers typically support data parallelism [7, 25] or task parallelism [16, 19], but not both. However, many applications contain a mix of task and data parallelism [1, 6, 10, 12] and it is important for the compiler to exploit both styles [8, 11, 24]. In this paper, we address how program characteristics influence the tradeoffs in task and data parallel programs, ....

PRINTZ, H. Automatic Mapping of Large Signal Processing Systems to a Parallel Machine. PhD thesis, School of Computer Science, Carnegie Mellon University, 1991. Also available as report CMU-CS-91-101.


Generating Compact Code From Dataflow Specifications.. - Bhattacharyya, Buck.. (1995)   (4 citations)

....actors, consume a fixed number of data items, called tokens or samples, per invocation and produce a fixed number of output samples per invocation. SDF and related models have been studied extensively in the context of synthesizing assembly code for signal processing applications, for example [8, 9, 10, 11, 17, 19, 20, 21]. Figure 1 shows a simple SDF graph with three actors, labeled A, B and C. Each arc is annotated with the number of samples produced by its source and the number of samples consumed by its sink. Thus, actor A produces two samples on its output arc each time it is invoked and B consumes one sample ....

H. Printz, "Automatic Mapping of Large Signal Processing Systems to a Parallel Machine", Memorandum No. CMU-CS-91-101, School of Computer Science, Carnegie-Mellon University, May 1991.


Rate-Optimal Schedule for Multi-Rate DSP Computations - Govindarajan, Gao (1993)

....available in VLSI DSP architectures grows rapidly, mapping DSP programs to exploit parallelism at all levels becomes increasingly important. In this paper, we are interested in constructing efficient compile-time (static) schedules for large grain, synchronous signal processing applications [25, 35]. Large grain means that each task in the applications generates tens, or even hundreds, of outputs per invocation. Each of these tasks represents operations such as an FIR filter, FFT, or bandshifting. Synchronous means that the amount of input consumed by each task, and the amount of output ....

....means that the amount of input consumed by each task, and the amount of output generated, are known in advance and invariable. Large grain synchronous systems are important as they can represent a large class of DSP computations, which includes a variety of narrowband and broadband sonar systems [25, 35]. Such large grained synchronous signal processing applications can be represented in the form of dataflow graphs, where the nodes (also called actors) represent a computation task, and arcs (or channels) represent communication between them. The communication can be viewed as a sequence of tokens ....

[Article contains additional citation context not shown here]

H. Printz. Automatic Mapping of Large Signal Processing Systems to a Parallel Machine. PhD thesis, Carnegie-Mellon U., 1991. Publ. as Memorandum CMU-CS-91-101, Dept. of Comp. Sci.


Unknown

....to specifying signal processing systems [11]. However, there are ongoing efforts towards augmenting such languages to make them more suitable; for example, [18] proposes extensions to the C language. There have been several efforts toward developing compiler techniques for SDF and related models [12, 21, 25, 26, 27]. Ho [16] developed the first compiler for pure SDF semantics. The compiler, part of the Gabriel design environment [21], was targeted to the Motorola DSP56000 and the code that it produced was markedly more efficient than that of existing C compilers. However, due to its inefficient ....

....vectorized, as discussed in [27]. Some other consequences of the choice of blocking factor in uniprocessor code generation are discussed later in this paper. Several scheduling problems for SDF and related models have been addressed: constructing efficient multiprocessor schedules is discussed in [26, 28]; Ritz et al. discuss vectorization [27]; the problem of organizing loops is examined in [2]; and compiler scheduling techniques for efficient register allocation are presented in [25]. In this paper, we assume that a schedule has been constructed under one or more of these criteria. In other ....

H. Printz, "Automatic Mapping of Large Signal Processing Systems to a Parallel Machine", Memorandum CMU-CS-91-101, School of Computer Science, Carnegie Mellon University, May 1991.


Compiling Task and Data Parallel Programs for.. - Gross, Hinrichs.. (1992)   (1 citation)

....each data parallel task in sequence, using all processors for each task. However, this can lead to poor processor utilization, since the number of sensors and the sampling rate (and thus the size of the data) are limited by hard physical constraints. This kind of phenomenon was encountered in [14], where it was shown that only up to 8 processors could be used efficiently in a purely data parallel spectral detection system. The solution is to exploit both styles of parallelism by allocating a subset of processors to each data parallel task. There are a number of different approaches to ....

H. W. Printz. Automatic Mapping of Large Signal Processing Systems to a Parallel Machine. PhD thesis, Carnegie Mellon, May 1991.


Looped Schedules For Dataflow Descriptions Of Multirate.. - Bhattacharyya, Lee (1994)   (2 citations)

....by its sink actor. The D on the arc between B and C represents a unit delay, which can be viewed as an initial sample that is queued on the arc. SDF and related models have been studied extensively in the context of synthesizing assembly code for signal processing applications, for example [7, 8, 9, 10, 17, 18, 19, 20]. In SDF, iteration is defined as the repetition induced when the number of samples produced on an arc (per invocation of the source actor) does not match the number of samples consumed (per sink invocation) [12]. For example, in figure 1, actor B must be invoked two times for every invocation of ....

H. Printz, "Automatic Mapping of Large Signal Processing Systems to a Parallel Machine", Memorandum CMU-CS-91-101, School of Computer Science, Carnegie-Mellon University, May 1991.


A Compiler Scheduling Framework For Minimizing Memory.. - Shuvra Bhattacharyya (1993)   (1 citation)

....of Synchronous Dataflow (SDF) programming [10], a restricted form of dataflow programming in which the number of data items produced and consumed by each functional block is known at compile time. SDF or closely related semantics underlie many software design environments for signal processing [3, 4, 6, 7, 11, 12, 13, 14, 15]. Figure 1 shows a simple SDF graph. The numbers at both ends of each arc designate the rates at which blocks produce and consume data samples. For example, block Z consumes 20 samples from its input arc each time it is invoked, and Y produces 10 samples on its output arc. The 10D on the arc ....

H. Printz, "Automatic Mapping of Large Signal Processing Systems to a Parallel Machine", Memorandum CMU-CS-91-101, School of Computer Science, Carnegie-Mellon University, May 1991, PhD Thesis.


Minimizing Buffer Requirements under Rate-Optimal.. - Govindarajan, Gao, Desai (1994)   (1 citation)

....At best, an MBRO schedule can perform with only 50% of the buffer requirement for an MRSP schedule or for a block schedule and without compromising the computation rate. 1 Introduction Large grain synchronous dataflow graphs (originally proposed by Lee et al. [16] and subsequently used by [23, 25]) have been widely accepted as a powerful programming model for DSP applications. Here large grain means that each task in the applications generates (or consumes) multiple, sometimes tens or even hundreds of, samples per invocation. Each of these tasks represents operations such as an FIR ....

....by each task, and the amount of output generated, are known a priori and are fixed. Large grain regular dataflow networks are important as they can represent a large class of DSP computations, such as speech coding, auto correlation, power spectrum, voice band modulation, and a variety of filters [16, 23]. A large grained dataflow graph consists of tasks (also called nodes or actors) and arcs (or channels) representing communication between tasks. The communication can be viewed as a sequence of tokens passing through an arc. The temporal sequences of values passed along the communication channels ....

[Article contains additional citation context not shown here]

H. Printz. Automatic mapping of large signal processing systems to a parallel machine. Memorandum CMU-CS-91-101 (Ph.D. Dissertation), Computer Science Department, Carnegie-Mellon University, 1991.


A Comparative Study of DSP Multiprocessor List Scheduling Heuristics - Liao (1993)   (8 citations)

.... of proposed heuristics to extend the list scheduling technique to multiprocessors with non zero IPC delay models [16, 18, 17, 8, 15, 3, 12]. List scheduling heuristics with non zero IPC delay are also widely used as basic scheduling routines in more complex multiprocessor scheduling heuristics [14, 10, 6, 13]. Unfortunately, there is little quantitative data to compare the performance of list scheduling heuristics with non zero IPC delay. This paper presents a comparative study of a collection of list scheduling heuristics with non zero IPC delay. A LGDFG can be characterized by the graph parallelism ....

....the connection setup time, y represents the transmission and synchronization overhead per data unit, and d is the number of data units to be sent from one processor to another. More elaborate IPC cost models considering different interconnection networks and routing mechanisms can be found in [15, 6, 13]. The IPC cost model for crossbar switch connected multiprocessors used in this paper can be treated as a special case of that presented in [15]. 3 List Scheduling Heuristics In this section, we briefly describe the heuristics used in our study. We choose to implement a representative sample of ....

H. Printz. Automatic Mapping of Large Signal Processing Systems to a Parallel Machine. PhD thesis, Carnegie-Mellon University, 1991. Published as Memorandum CMU-CS-91-101, Department of Computer Science.


Exploiting Task and Data Parallelism on a Multicomputer - Subhlok (1993)   (55 citations)

.... and finer grained parallelism within each task [7]. For example, a sonar spectral detection system consists of a sequence of parallel pipelines, where each pipeline consists of a sequence of downsampling time domain filters, followed by an FFT, thresholding, and other postprocessing operators [16]. Two widely used styles of parallelism for private memory multicomputers are data parallelism [9, 11, 20, 19, 10] and task parallelism [6, 13, 12, 14]. Data parallelism is typically expressed as a single thread of control operating on data sets distributed over all nodes. It is especially useful ....

Printz, H., Kung, H. T., Mummert, T., and Scherer, P. Automatic mapping of large signal processing systems to a parallel machine. In Proceedings of SPIE Symposium, Real-Time Signal Processing XI (San Diego, CA, Aug. 1989), Society of Photo-Optical Instrumentation Engineers, pp. 2--16.


Multidimensional Synchronous Dataflow - Murthy, Lee (2002)   (4 citations)

No context found.

H. Printz, "Automatic mapping of large signal processing systems to a parallel machine," Ph.D. dissertation, Carnegie Mellon Univ., Pittsburgh, PA, 1991.


A Comparison of Clustering and Scheduling Techniques for.. - Kianzad, Bhattacharyya (2003)

No context found.

H. Printz, Automatic Mapping of Large Signal Processing Systems to a Parallel Machine. Ph.D. Thesis, School of Computer Science, Carnegie Mellon University, May 1991.


A Parallelizing Compiler Based on Partial Evaluation - Surati (1993)   (1 citation)

No context found.

H. Printz, "Automatic Mapping of Large Signal Processing Systems to a Parallel Machine," Carnegie Mellon Computer Science Department Technical Report CMU-CS-91-101, May 1991.

CiteSeer.IST - Copyright Penn State and NEC