| D. Nassimi and S. Sahni. Data broadcasting in SIMD computers. IEEE Trans. on Computers 30:2, pp. 101--106, 1981. |
....increased proportional to the number of processors installed in the system. It fully depends on the algorithm designed and the system architecture proposed. Mesh-connected computers (MCCs) are one well-known example of a parallel processing system [20]. [Fig. 11: Plastic image. Fig. 12: Frog image.] Even though the architecture of the MCC is simple and regular, its bus system has no reconfigurability and is not acceptable for those algorithms requiring global communications. Researchers overcame such drawbacks by equipping it with a reconfigurable bus system. Several reconfigurable parallel ....
D. Nassimi and S. Sahni, "Data broadcasting in SIMD computers," IEEE Trans. Comput., vol. C-30, pp. 101--107, Feb. 1981.
....distributed memory parallel machine. A companion paper [24] deals with the problem of performing dynamic permutations. 1 Introduction Let n be the number of elements distributed across p processors. In a Random Access Read (RAR) each of the n elements may need to read data from another element [22]. The data is available in array D. Each element has the index of the element from which data is needed in array P. That is, element i needs D(P(i)). Figure 1 shows an example of a RAR. [Figure 1: RAR example; values after the RAR: D(7), D(0), D(7), D(1), D(6), D(3), D(0).] In a Random Access Write (RAW) each of ....
....index of the element from which data is needed in array P. That is, element i needs D(P(i)). Figure 1 shows an example of a RAR. [Figure 1: RAR example; values after the RAR: D(7), D(0), D(7), D(1), D(6), D(3), D(0).] In a Random Access Write (RAW) each of the n elements may need to write data to another element [22]. The data is available in array D. Each element has, in array P, the index of the element to which it has to send its data. Unlike the RAR case, it is possible to have collisions during a RAW. This happens when two or more data elements are written to the same destination. When collisions are ....
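The RAR and RAW definitions quoted above can be illustrated with a minimal sequential sketch. It only shows what the two operations compute, not the parallel algorithm of the cited paper. The function names, the use of `None` for "no request", and the `combine` rule for colliding writes are assumptions made for illustration; the excerpt is truncated before it says how collisions are handled.

```python
def random_access_read(D, P):
    """Element i receives D[P[i]]; P[i] is None when element i reads nothing."""
    return [D[p] if p is not None else None for p in P]


def random_access_write(D, P, n_slots, combine=min):
    """Element i sends D[i] to slot P[i].

    Colliding writes are merged with `combine` (an assumed policy; the
    source excerpt does not state one).
    """
    out = [None] * n_slots
    for value, dest in zip(D, P):
        if dest is None:
            continue  # this element writes nothing
        out[dest] = value if out[dest] is None else combine(out[dest], value)
    return out


if __name__ == "__main__":
    print(random_access_read([10, 11, 12, 13], [3, 0, 3, 1]))
    print(random_access_write([5, 2, 9], [1, 1, 0], 3))
```

Two elements may read the same source without conflict, which is why only the write needs a collision rule.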
[Article contains additional citation context not shown here]
D. Nassimi and S. Sahni. Data Broadcasting in SIMD Computers, IEEE Transactions on Computers C-30(2):101-107 (1981).
.... next sections use the following basic operations known to be executable in logarithmic time on a hypercube or shuffle-exchange network: • parallel prefix operation and segmented parallel prefix operation [Sch80]; • monotone routing (the relative ordering of the data items remains unchanged) [NS81]; • sparse enumeration sort (sorting p^α data items on a p-processor hypercube, for some fixed α < 1) [NS82]; • parentheses-structured routing (routing between matching pairs in a well-formed string of parentheses) [MW92]. 3 Tree Contraction on the Hypercube In this section we show how ....
D. Nassimi and S. Sahni. Data broadcasting in SIMD computers. IEEE Transactions on Computers, C-30:101--107, 1981.
.... inverse concentration routing, monotone routing (each processor is the source and the destination of at most one data item, the order of the data items is preserved by the routing, and, in concentration routings, the data items are concentrated to the leftmost processors, one item per processor) [NS81]. [Figure 1: Subcube allocation by a monotone routing] Lemma 1 There is an O(log log n log n) time algorithm for the subcube allocation problem with expansion 1. Proof: It is sufficient to sort the given intervals according to their size in descending order, and to concentrate the ....
D. Nassimi and S. Sahni. Data broadcasting in SIMD computers. IEEE Transactions on Computers, C-30:101--107, 1981.
....connected computers by performing a sort. Consider the case of area determination. The pixels are first sorted by the field comp. Next the first and last pixel in each sequence with the same comp value is identified. The distance between these can be obtained by performing a data concentration [7] of the ID of the processors containing the last pixel in each sequence. While the same technique can be applied to an RMESH, more efficient algorithms result from a different technique. On an NN RMESH, the area and perimeter can be determined in O (logN) time while it takes O (N) time to sort ....
D. Nassimi and S. Sahni, "Data broadcasting in SIMD computers", IEEE Transactions on Computers, vol C-30, no. 2, Feb. 1981, pp 101-107.
....algorithms for parallel machines with processors interconnected as a hypercube of dimension d. The hypercube interconnection network is used in many parallel architectures. It is not easy to conceive parallel algorithms for such distributed memory machines that fully exploit their parallelism. In [4, 7, 8] the hypercube is viewed as a matrix whose rows and columns are formed by subhypercubes. This matricial visualization, together with a series of properties derived from the recursive definition of the hypercube, constitutes a useful instrument to design parallel algorithms for machines of such ....
....that MMM 1 is better for n. In this section we present some basic communication operations, and propose a notation to simplify the algorithm description. In Section 2, we review the matricial visualization of the hypercube and give some of its properties. In [12] some algorithms presented in [7, 8] are described using the basic operations and the properties of the matricial visualization. It can be seen that such descriptions are simpler than the original ones. In Section 3, we present the SIMD algorithms for the basic operations on the hypercube, together with their time complexities. ....
[Article contains additional citation context not shown here]
Nassimi, D. and Sahni, S., "Data Broadcasting in SIMD Computers", IEEE Trans. Comput., Vol. C-30, N. 2, p. 101-107, February 1981.
.... see that, because of the communication diameter, the problems in this paper have time complexities Ω(√n). In this paper, we will frequently use Θ(√n) time standard mesh operations such as sorting, random access read, random access write, compression, parallel prefix, and list ranking [4, 23, 24, 25, 29]. 2.2 The Multisearch Problem Let G = (V, E) be a directed or undirected graph of size n = |V| + |E|, where the out-degree or degree, respectively, of any vertex is bounded by some constant. Let U be a universe of possible search queries on G. Define the search path of a query q ∈ U, denoted ....
D. Nassimi and S. Sahni. Data broadcasting in SIMD computers. IEEE Transactions on Computers, C-30(2):101--107, February 1981.
....time complexity O(log n). These operations include segmented parallel prefix and monotonic routing which together allow a monotonic read. Thus the read is monotonic, i.e. for any pair of processors p i and p j , with i j, which want to read data on processors p h and p k , we have h k. We refer to [8, 9] for a detailed discussion of these operations. Another operation we use is sorting n numbers, which can be done in time O(log n log log n) [4]. We shall occasionally need to solve problems on subcubes of a hypercube. We can obtain subcubes of dimension d d by selecting all 2 d nodes ....
D. Nassimi and S. Sahni. Data broadcasting in SIMD computers, IEEE Transactions on Computers, C-30(2), pp. 101-107, February 1981.
.... CM2 [St87]. On constant size data, arithmetic operations on each processor and communication between processors which are adjacent along a fixed dimension take time O(1). In this paper we will use many basic vector operations such as parallel prefix, monotonic routing and bitonic merge [NS81, B68]. The parallel prefix sum of a vector V is the vector W, with $W[k] = \sum_{i=0}^{k} V[i]$, $0 \le k < p$. Instead of summing we can perform any binary associative operation, e.g. copying. A further generalization is the segmented parallel prefix. The vector is split into segments and the parallel prefix ....
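The prefix and segmented prefix operations described in the excerpt above can be sketched sequentially as follows. This shows only the values they produce, under assumptions of my own: the names `prefix` and `segmented_prefix` and the boolean `starts` vector marking segment boundaries are illustrative, and the loop is linear-time rather than the logarithmic-time hypercube procedure the citing papers rely on.

```python
from operator import add


def prefix(V, op=add):
    """W[k] = V[0] op V[1] op ... op V[k] for any associative op."""
    W = []
    for k, v in enumerate(V):
        W.append(v if k == 0 else op(W[-1], v))
    return W


def segmented_prefix(V, starts, op=add):
    """Like prefix, but the running value restarts wherever starts[k] is True."""
    W = []
    for k, v in enumerate(V):
        W.append(v if k == 0 or starts[k] else op(W[-1], v))
    return W


def copy_first(a, b):
    """The 'copying' operation: keep the left operand."""
    return a


if __name__ == "__main__":
    print(prefix([1, 2, 3, 4]))
    print(segmented_prefix([7, 0, 0, 9, 0], [True, False, False, True, False], copy_first))
```

With `copy_first` as the operation, a segmented prefix spreads the first value of each segment across that segment, which is the usual way a segmented broadcast is expressed.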
D. Nassimi and S. Sahni. Data broadcasting in SIMD computers. IEEE Trans. on Computers 30:2, pp. 101--106, 1981.
....3. Parallel Quadtree Manipulation Methods (New Results Highlighted) The time and space complexities listed in Table 3 for manipulating linear quadtrees with path encoding on a hypercube are obtained from [2] by using standard PRAM simulation on a hypercube, as described by Nassimi and Sahni ([13]) together with Cypher and Plaxton's deterministic hypercube sorting algorithm ([4]). Follows from [2] by standard PRAM simulation on a hypercube as described in [12] together with [4]. This operation is trivial for pointer based quadtrees, and listed for completeness only. The hypercube ....
....starting with the leaves (which are 10 given) Since it is a complete tree, at each stage the addresses of the nodes of the subsequent level can be immediately computed. Thus, Step 3 requires time O(log 2 M) because each level can be constructed using a concentrate and distribute operation [13]. Step 4 is a multi way search operation as outlined in Section 2.3, with traveling messages represented by query processes. Hence, it requires time O(h log M) O(log 2 M) Note that, Step 4 does not change the topology of the tree but marks only the nodes to be deleted. In Step 5, the marked ....
[Article contains additional citation context not shown here]
D. Nassimi and S. Sahni, "Data broadcasting in SIMD computers," IEEE Transactions on Computers, Vol. 30, No. 2, 1981, pp. 101-106.
....operations within each block. In most applications this scheduling problem is easy, although this is not always true. The corollary follows without assumption. Now let us look at simulations between different types of PRAM. In all these results, a simulated NC algorithm remains in NC. Theorem 2 [4, 54, 69]. A parallel computation that can be performed in time t, using p strong CRCW processors, can also be performed in time t log p, using p EREW processors. Theorem 3. A parallel computation that can be performed in time t, using p strong CRCW processors, can also be performed in time (a) O(t) ....
D. Nassimi and S. Sahni, Data Broadcasting in SIMD Computers. IEEE Trans. Comput. C-30, 1981, 101--107.
....0 m n. Each record has associated with it a destination address in the range 0 through n − 1, with the restriction that the destination addresses form a monotonically increasing sequence. The monotonic routing algorithm sends each of the m records to its destination address within the array [7]. Special cases of monotonic routing include the concentrate, in which the m records are routed to the first m array locations, the inverse concentrate, in which the m records are originally located in the first m array locations, and the increment, in which each of the m records is moved to the ....
....moved to the next higher array location. Bit Permute Complement (BPC) routing performs a permutation of n records where the destination addresses are calculated by permuting and complementing the bits of the source addresses [8]. Broadcasting copies a record from one processor to all n processors [7]. Bitonic merging is the basic operation underlying Batcher's bitonic sort. Given two sorted lists, each of length at most n, this operation merges them into a single sorted list. A BPC route must be used to reverse one of the two lists before the merge can be performed. Odd-even bitonic merges ....
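The monotonic routing and concentrate operations described in the two excerpts above can be sketched sequentially as follows. This is only a model of the input/output behaviour, not the routing algorithm of the cited paper. The function names and the use of `None` for an empty array location are assumptions made for illustration.

```python
def monotonic_route(records, dests, n):
    """Place records[i] at location dests[i] of an n-element array.

    The destination addresses must be strictly increasing, so the relative
    order of the records is preserved and no two records collide.
    """
    if any(a >= b for a, b in zip(dests, dests[1:])):
        raise ValueError("destination addresses must be strictly increasing")
    out = [None] * n
    for record, dest in zip(records, dests):
        out[dest] = record
    return out


def concentrate(array):
    """Route the m non-empty records to the first m locations, keeping order."""
    records = [x for x in array if x is not None]
    return monotonic_route(records, list(range(len(records))), len(array))


def inverse_concentrate(array, dests):
    """Records start in the first m locations and are spread out to dests."""
    return monotonic_route(array[:len(dests)], dests, len(array))


if __name__ == "__main__":
    print(concentrate([None, "a", None, "b", "c"]))
    print(inverse_concentrate(["a", "b", "c", None, None], [1, 3, 4]))
```

Expressing concentrate and inverse concentrate through the same `monotonic_route` helper mirrors the excerpt's statement that both are special cases of monotonic routing.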
D. Nassimi and S. Sahni. Data broadcasting in SIMD computers. IEEE Transactions on Computers, C--30:101--107, 1981.
....memory models. Specifically, let a cube type computer of size p denote a hypercube, shuffle-exchange or cube-connected cycles computer with p processors, and let Sort(n; p) be the worst case time required to sort n records on a cube type computer of size p. Algorithms given by Nassimi and Sahni [11] can be used to implement a single operation of a powerful shared memory model (the Priority CRCW PRAM) that has n processors and n words of memory using a cube type computer of size p in O(Sort(n; p)) time. Thus improvements in sorting can lead to improvements in a wide range of other problems. ....
....with n processors in O(log n) time. Bit Permute Complement (BPC) routing performs a permutation of n records where the destination addresses are calculated by permuting and complementing the bits of the source addresses [12]. Broadcasting copies a record from one processor to all n processors [11]. Benes routing is a technique for implementing arbitrary permutations efficiently [4]. Benes routing permutes n records by sending each record first to an intermediate destination and then to its final destination given by the permutation. The intermediate destinations are chosen so that the ....
David Nassimi and Sartaj Sahni. Data broadcasting in SIMD computers. IEEE Transactions on Computers, C-30(2):101--107, February 1981.
....values received. Each column j processor can determine which S value is needed in each iteration of the while loop. Let Z(i, j) be such that PE [i, j ] needs S(Z(i, j ) j ) in the current iteration. All PEs can get their S values in O(log n) time using the random access read (RAR) algorithm of [NASS81]. However, since Z (i, j ) Z(i 1, j) the sort steps used in the RAR algorithm may be eliminated and the RAR computed in O (logn) time. Since log 2 n iterations of the while loop are made in the forward pass, the total time needed for the forward pass is O (log n) In the backward pass, again, ....
D. Nassimi and S. Sahni, "Data broadcasting in SIMD Computers", IEEE Transactions on Computers, No. 2, Vol. C-30, 1981, pp 101-107.
....in O (logN) time using an N processor hypercube. Additionally, we develop mesh algorithms for translation, rotation, and scaling that complete in O(N) time and use only O(1) memory per processor. While these operations can be performed in these time and space bounds using random access writes [4], our algorithms are elegant and have smaller constant factors associated with the complexity. In the next section we describe our hypercube and mesh models and some basic data 1. Throughout this chapter, we assume N is a power of 2. Sanjay Ranka and Sartaj Sahni movement ....
....performed with i unit routes. As in the case of the SIMD shift, the MIMD shift is also easily modified to an end off zero fill shift without increasing the number of unit routes. 4. 3 Row and Column Reordering These are special cases of the random access write (RAW) operation defined in [4]. We assume an NN array logical view of an N PE hypercube. In a row reordering the destination processor, dest (p) for data in any PE is another PE in the same row. The dest ( values in each row of the NN processor array are either nondecreasing left to right for all rows or nonincreasing ....
[Article contains additional citation context not shown here]
D. Nassimi and S. Sahni, "Data Broadcasting in SIMD computers", IEEE Transactions on Computers, No. 2, Vol. C-30, Feb 1981, pp. 101-107.
....computers. Several researchers have developed algorithms for various data routing patterns in hypercubes. For example, the general one-to-one data routing problem may be solved by sorting on the destination tags [NASS82a]. Efficient algorithms for random access reads and writes are developed in [NASS81] and an optimal algorithm to perform all data routes that fit into the class of bit permute complement permutations is developed in [NASS82b]. Broadcasting and personalized communication are considered in [JOHN87b]. Hypercube algorithms for data routing problems that arise in Gaussian elimination ....
D. Nassimi and S. Sahni, "Data broadcasting in SIMD computers", IEEE Transactions on Computers, C-30, No 2, Feb 1981, pp 101-107.
No context found.
D. Nassimi and S. Sahni. Data broadcasting in SIMD computers. IEEE Transactions on Computers, C-30(2):101-107, Feb. 1981.
....gets every pair to its correct destination processor. In group 0, Step i is a concentrate localized to the group, and in the remaining groups, Step i is a generalized concentrate in which the ranks have been increased by the same amount. In all groups we may use the mesh concentrate algorithm of [10] to accomplish the routing in 4(v 1) electronic moves. Step 3 is also a concentrate as the [r NJ values of the pairs are in ascending order from 0, 1, 2, So Steps i and 3 take 4(v 1) electronic moves each in the SIMD model and 2(v 1) in the MIMD model [10] Therefore, the overall ....
....mesh concentrate algorithm of [10] to accomplish the routing in 4(v 1) electronic moves. Step 3 is also a concentrate as the [r NJ values of the pairs are in ascending order from 0, 1, 2, So Steps i and 3 take 4(v 1) electronic moves each in the SIMD model and 2(v 1) in the MIMD model [10]. Therefore, the overall complexity of concentrate is 8(v 1) electronic and 2 OTIS moves in the SIMD model and 4(v 1) electronic and 2 OTIS moves in the MIMD model. We can improve the SIMD time to 7(v 1) electronic and 2 OTIS moves by using a better mesh concentrate algorithm than the one ....
[Article contains additional citation context not shown here]
David Nassimi and Sartaj Sahni, "Data broadcasting in SIMD computers," IEEE Transactions on Computers, vol. C-30, no. 2, pp. 101-107, Feb. 1981.
....on an MIMD computer [RANK88b] As in the case of the SIMD shift, the MIMD shift is also easily modified to an end off zero fill shift without increasing the number of unit routes. 2.3. 3 Row and Column Reordering These are special cases of the random access write (RAW) operation defined in [NASS81]. We assume an NN array logical view of an N PE hypercube (cf. Section 2.2) In a row reordering the destination processor for data in any PE is another PE in the same row. Hence, it is sufficient for each PE to simply have a value dest (p) which gives the column index of the destination PE. ....
....the column index of the destination PE. Furthermore, the dest ( values in each row of the NN processor array are either nondecreasing left to right for all rows or nonincreasing left to right for all rows. Because of this monotonicity of the dest values, the sort step of the RAW algorithm of [NASS81] may be replaced by a step that does a data concentration. This data concentration takes O(log N) time [NASS81]. Another change is needed when dest is a nonincreasing function. In this case, the ranking step of [NASS81] does a reverse ranking (i.e. right to left rather than left to right). In ....
[Article contains additional citation context not shown here]
D. Nassimi and S. Sahni, "Data broadcasting in SIMD computers," IEEE Trans. Comput. C-27(2) pp. 2-7 (1979).
No context found.
Nassimi,D., and Sahni,S., "Data Broadcasting in SIMD Computers," IEEE Transactions on Computers, vol.C30, no.2, 1981, pp.101-107.