| Blelloch, G. & C.R. Rosenberg, 1987. Network learning on the Connection Machine, Proceedings Tenth International Joint Conference on Artificial Intelligence, Milan, Italy. |
....lines separate general purpose computers, transputer systems, and neurocomputers, respectively. Computer No of PEs Network size FP MCUPS Sun 3 [19] 1 NETTALK 1 32 0.034 NCube 4 [5] 16 Optimal network 1 32 0.19 Sun SparcStation 10 [84] 1 1 32 1.1 Alpha Station [84] 1 1 32 3.2 CM 2 [85] 16K NETTALK (60) 1 2.8 Cray 2 [22] 4 257 x 256 x 131,072 1 32 10 1 iPSC 860 [19] 32 NETTALK (80) 2000 32 11 MP 1 [14] 4096 128 x 64 x 16 1536 32 12 IBM RISC 6000 550 [86] 1 1000 x 1000 x 1 1000 32 17.6 Cray X MP [22] 4 257 x 256 x 131,072 1 32 18 1 CM 5 [23] 32 NETTALK (80) lbb 32 ....
G. Blelloch and C. R. Rosenberg, "Network learning on the Connection Machine," Proc. of IJCAI87, pp. 323--326, 1987.
....1.6 CMOS 8e8 y y Hirai [14] digital 1.2 CMOS 6, 84 8.4e4 n n y DNNA [16] stochastic 32, 1024 2e8 n n y STONN [24] stochastic 20, 2000 5e8 n n y TInMANN [8] stochastic 2.0 CMOS 1, y n y Hitachi [25] digital wsi 0.8 CMOS 540, 34K 1.4e8 n n y CM1 [5] mpp 3e6 y y y Warp [20] systolic 17e6 y y y AAP 2 [22] array 18e6 y y y SPRINT [12] systolic , 1e6 12e6 100W, 8J y y y neuron [23] 1e11,1e15 8 bit 1e16 1W, 0.1fJ y n y y Table 1: Summary of existing implementations. In the right four columns, l = ....
G. Blelloch and C. R. Rosenberg. Network learning on the Connection Machine. In Proceedings of the Tenth International Joint Conference on Artificial Intelligence, pages 323--326, 1987. Milan, Italy.
....online update allows n p weight changes to be applied, whereas the most extreme form of pooled update allows only 1. For some data sets (the NetTalk training set among them) this is a decided disadvantage. Using online update, NetTalk can be learned in 10 complete passes through the training set[3], a performance unlikely to be matched by the 10 updates allowed by strict pooled update. It is, however, an open question whether pooled update is worse in general. For some tasks, it appears to work better; in some recent experiments performed by one of the authors (Witbrock) training ....
....lower execution efficiency, tree update takes considerably less time than ring update when more than 4 processors are used. 6.3. Performance Measurements We measured the performance on our simulator using the NetTalk [17] text-to-phoneme benchmark (as did Pomerleau et al. [15] and Blelloch and Rosenberg [3]). The network consists of an input layer with 203 units and a true unit, a hidden layer with 60 units, and an output layer of 26 units. The input layer is fully connected to the hidden layer, and the hidden layer is fully connected .... [Plot: performance in MCPS] ....
Blelloch, G. and Rosenberg, C.R., "Network learning on the Connection Machine", Proceedings of the Tenth International Joint Conference on Artificial Intelligence, Milan, Italy, 1987.
....BP can be parallelized either by network partitioning or by pattern partitioning. In network partitioning schemes, nodes and weights of the neural network are partitioned among different processors, and thus the computations of node activations, node errors and weight changes are parallelized [6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17]. In pattern partitioning, individual weight changes due to various learning patterns are computed concurrently [17, 18, 19, 20]. Pattern partitioning and network partitioning can also be combined [6, 11, 12, 14, 20, 21] to form hybrid schemes. Several machine architectures including linear ....
....on hypercubes. Most previous parallel formulations of BP on hypercube or on related architectures used vertical sectioning [15], pattern partitioning [18, 19], or a hybrid of vertical sectioning and pattern partitioning [11, 12]. For hypercubes, the authors are aware of only one exception [8], in which each node and weight of the BP network is mapped onto a separate processor of the CM-2. As discussed in [11, 12], this method incurs too much communication overhead. The paper is organized as follows. Section 2 overviews the serial BP algorithm and its existing parallel formulations. ....
G. Blelloch and C. R. Rosenberg. Network learning on the connection machine. Technical report, MIT, November 1986.
....measured in CUPS (Connections Weights Updated Per Second). In the case where learning by block epoch is used, CUPS is given by the number of weight change values computed per second. See Table 1 for the performance of some computers. Computer No of cells MCUPS Sun 3 [5] 1 0.034 iPSC 860 [5] 32 11 CM 1 [16] 2.8 Warp [13] 20 32 CM 2 [11] 65536 40 MP 1216 [7] 16384 41 Cellular arch. [14] 4175 52 Sandy [15] 60 135 (Est.) Table 1: Performance of other computers running NETtalk. In order to compare parallel BP algorithms and show how well they performed for an increasing number of processors, four ....
G. Blelloch and C. R. Rosenberg. Network learning on the Connection Machine. In Proc. of IJCAI87, pages 323--326, 1987.
....are SIMD machines; cellular automata are MIMD machines. Most implementations of these architectures to date have consisted of large numbers of individually packaged chips placed on multiple circuit boards in racks of equipment with significant investments in power supplies and thermal engineering [5, 14, 3]. Several attempts have been made to build waferscale systems, with limited success due to yield problems. The opportunity exists to develop tile based architectures which map efficiently onto multichip modules. The size and composition of the tiles can be optimized to maximize overall system ....
G. Blelloch and C. R. Rosenberg. Network learning on the Connection Machine. In Proceedings of the Tenth International Joint Conference on Artificial Intelligence, pages 323--326, 1987. Milan, Italy.
....in the DataVault, a parallel array of disk drives (TMC, 1991a), and accessed as and when required. A mapping of feed forward networks onto the CM 2, in which each node is mapped onto a processor and each link is mapped onto two processors (one processor at each end of the link), is reported in (Blelloch and Rosenberg, 1987). A survey of several implementations of back propagation on the CM 2 can be found in (Singer, 1990). In addition to considering the advantages and disadvantages of the various approaches, (Singer, 1990) also compares simulation performance. Implementation of recurrent back propagation on the CM 2 ....
Blelloch, G. and Rosenberg, C. R. (1987). Network learning on the Connection Machine. In Proceedings of the Tenth International Joint Conference on Artificial Intelligence, Milan, Italy.
....step, or 138 MCPS. 3.10 Neurocomputer implementations There have been several papers describing implementations of neural nets on massively parallel processors or neural coprocessors. We only mention those that have reported performance by way of comparison with neurochips. Blelloch and Rosenberg [12] discuss mapping backpropagation onto the Connection Machine, reporting 3 MCUPS. Pomerleau et al. [61] discuss mapping backpropagation onto the CMU Warp, reporting 17 MCUPS. Watanabe et al. [75] discuss backpropagation on the NTT AAP 2, reporting 18 MCUPS. De Groot and Parker [27] describe mapping ....
....[18] digital 1.6 CMOS 8e8 yes yes Hirai [36] digital 1.2 CMOS 6, 84 8.4e4 no yes DNNA [42] stochastic 32, 1024 2e8 no yes STONN [78] stochastic 20, 2000 5e8 no yes TInMANN [16] stochastic 2.0 CMOS 1, yes yes Hitachi [81] digital wsi 0.8 CMOS 540, 34K 1.4e8 no yes CM1 [12] mpp 3e6 yes Warp [61] systolic 17e6 yes AAP 2 [75] array 18e6 yes SPRINT [27] systolic , 1e6 12e6 100W, 8 J yes neuron 1e11,1e15 1e16 1W, 0.1fJ yes yes Table 1: Summary of existing implementations Table 1 gives a summary of existing implementations. It is not ....
G. Blelloch and C. R. Rosenberg. Network learning on the Connection Machine. In Proceedings of the Tenth International Joint Conference on Artificial Intelligence, pages 323--326, 1987. Milan, Italy.
....be very small in order to ensure convergence. After some preliminary trials, we settled on a learning rate of (nd)^{-1} for the reconstructors and (2nd)^{-1} for the compressor. Parallel training Parallelization of the training set provides a significant speedup to back propagation (Blelloch & Rosenberg, 1987). By removing dependencies from the original RAAM training regimen and parallelizing the algorithm on a 4096 processor Maspar MP2, we were able to run large scale experiments with full parallelization over the training sets. Data We tested our methods on four different data sets, each consisting ....
Blelloch, G., Rosenberg, C.R. 1987. Network learning on the Connection Machine, Proceedings Tenth International Joint Conference on Artificial Intelligence, Milan, Italy.
....rate must be very small in order to ensure successful training. After some preliminary trials, we settled on a learning rate of (nd)^{-1} for the reconstructors and (2nd)^{-1} for the compressor. Parallelization of the training set provides a significant speed up to backpropagation (Blelloch & Rosenberg, 1987). By removing dependencies from the original RAAM training regimen and parallelizing the algorithm on a 4096 processor Maspar MP2, we were able to run large scale experiments with full parallelization over the training sets. 5 Results The results are shown in Table 2, where n is the number of ....
Blelloch, G. & C.R. Rosenberg, 1987. Network learning on the Connection Machine, Proceedings Tenth International Joint Conference on Artificial Intelligence, Milan, Italy.
.... This slows down communication between non adjacent processors, and limits its viability to systolic implementations[ For massively parallel machines, a method of allocating one processor for each cell and two processors for each weight has been suggested recently by Blelloch and Rosenberg[5]. This spreads the weights of the connections of any cell over several processors. These weights can be efficiently processed using scan operations and pipelining. Using such a representation, the CM 2 is able to up to 64M interconnects. For the Ametek 2010, a configuration of 1024 mesh connected ....
.... TI[52] TI Explorer neural simulator WARP, Linear systolic array 10 17M 320K BP CMU[8, 39] of ten processors (NETtalk) Butterfly, 128 68020 nodes 16M 120M Runs Rochester BBN[4, 8] connected by connectionist butterfly network;VAX simulator[13] CM 2, 64K single bit processors, 13M 64M BP[5] Thinking Machines hypercube network; NETtalk) Corp. 38, 8] VAX or Symbolics MX 1 16 16 digital signal 120M 50M Projected performance MIT processors; Lincoln Labs[8, 22] Lisp machine X MP 2, Two Cray processors 50M 2M Estimated performance Cray Research[8] with shared memory IPS = Interconnects ....
G. Blelloch and C. R. Rosenberg. Network learning on the Connection Machine. In Proc. of the 10th Int'l Joint Conf. on Artificial Intelligence, pages 323--326, Milan, Italy, Aug. 1987.
....vector multiplication. The algorithm is therefore well suited for applications in which the matrix is only used a few times (as is often the case with adaptive meshes). Segmented operations have been used for sparse matrix vector multiplication with good success on the Connection Machine [9, 36], but the application to vector multiprocessors is new. We have implemented the SEGMV algorithm on the Cray Y MP C90 and have compared its running time to various other algorithms on several sparse matrices from the Harwell Boeing collection and industrial application codes. Figure 1 summarizes ....
.... can be used to implement many data-parallel algorithms for problems with irregular structures, including sparse matrix routines [5, 6]. Other uses of segmented scans include computer graphics [15], object recognition [37], processing image contours [12], parallel quicksort [5], machine learning [9], and network optimization [28]. Because of their usefulness for such problems, hardware support was included in the Connection Machine CM 5 [26] for segmented scans, and the proposed High Performance Fortran (HPF) standard [21] contains scan intrinsics (called PREFIX and SUFFIX) with an optional ....
G. E. Blelloch and C. R. Rosenberg. Network learning on the Connection Machine. In Proceedings of the AAAI Spring Symposium Series: Parallel Models of Intelligence, pages 355--362, August 1987.
No context found.
Blelloch, G. & C.R. Rosenberg, 1987. Network learning on the Connection Machine, Proceedings Tenth International Joint Conference on Artificial Intelligence, Milan, Italy.
No context found.
G. Blelloch and C. R. Rosenberg, "Network learning on the Connection Machine," Proc. of IJCAI87, pp. 323--326, 1987.