Kimberly Keeton, David A. Patterson, and Thomas E. Anderson. LogP Quantified: The Case for Low-Overhead Local Area Networks. In Hot Interconnects III, Stanford University, Stanford, CA, August 1995.
....processed serially. Each subtransaction accesses six data pages, and the proportion of write operations is specified as 0.6. The execution site of each subtransaction is selected randomly. For a transaction related message, communication delay is calculated as 0.2ms (message size in KB) 0.1ms [33, 21]. Disk access delay for stable logging consists of rotational delay and data transfer delay. For the delay time, 4.17ms of rotational delay and 30 MB/sec of data transfer rate are used [23]. Since a dedicated log disk is assumed for logging, there is no seek time involved with log writes. We have ....
Kimberly K. Keeton, Thomas E. Anderson, and David A. Patterson. LogP quantified: The case for low-overhead local area networks. In A Symposium on High Performance Interconnects, August 1995.
.... messaging systems [7, 18, 19, 25, 32, 35, 36] offer the best communication performance, as they create a fast communication path that bypasses the traditional in-kernel messaging protocol stack (e.g. TCP/IP), which is a serious obstacle in exploiting the high performance of modern networks [1, 15, 16]. Although most of these lightweight messaging systems are successful in delivering the raw network performance to higher level applications, there remain a number of issues that are not well addressed by these lightweight messaging systems, such as: Only focus on point-to-point performance ....
K. Keeton, T. Anderson, and D. Patterson. LogP quantified: The case for low-overhead local area networks. In Hot Interconnects III: A Symposium on High Performance Interconnects, Aug. 1995.
....DMA into the host memory to finish, which sets the marginal cost of 11.6 ns/byte. If a receiving process is asleep, latency is dominated by interrupt service and process context switching time. We observed 4 78s for packets with no payload, which compares favorably with other recent reports [Jones96, Keeton95, vonEiken95]. On the same machine, a context switch provoked by a semaphore takes 31s. 4.3 Bandwidth and packet size. Using 4KB packets, the bottleneck is the I/O bus interface. The observed slope of the latency function in Figure 7 is 30.7 MB/s, from which we infer that the LANai control program achieves a ....
Kimberly Keeton, Thomas Anderson, and David Patterson. LogP quantified: the case for low-overhead local area networks. Presented at Hot Interconnects III (Stanford, CA), August 1995. Available at http://now.cs.berkeley.edu/Papers/Papers/hotinter95-tcp.ps.
....The reciprocal of G characterizes the available per processor communication bandwidth for long messages. LoGPC [5] is a new model where application specific parameters are introduced to account for network and resource contention effects. LogP is quantified for low overhead local area networks in [14]. The performance assessment of LogP for fast network interfaces is presented in [11]. The majority of performance measurements for point-to-point MPI communications are based on measuring the round trip time or the average transfer time between two processes. Benchmarks like the COMMS1 and COMMS2 ....
K. Keeton, T. Anderson, and D. Patterson, "LogP Quantified: The Case for Low-Overhead Local Area Networks," Hot Interconnects III: A Symp. on High Performance Interconnects, Stanford University, Stanford, CA, Aug. 10-12, 1995.
....each generation, going from 33, to 60, 167, and finally 400 MHz. Intuitively, the reduction in the performance gap between general purpose and specialized messaging systems is easy to understand: the high cost of general purpose messaging arises from the large number of protocol instructions [28]. As processor speed increases, the cost of executing instructions drops correspondingly. Critically, however, I/O devices have not kept pace with processor speed. Consider that in 1992, a typical processor and I/O bus (e.g. a SPARC 2) both ran at 33 MHz. At these speeds, the cost of executing ....
....models [4, 13, 26]. However, these models were always in the context of specialized communication systems, thus making comparisons to general purpose systems difficult. In a more general networking context, much work has been done to quantify the performance of existing IP protocol stacks [27, 28], or increase their performance via copy reduction techniques [8, 9, 15, 29, 40]. Unlike the analysis of specialized messaging, the performance of these general purpose systems is rarely defined in terms of architectural models. It is difficult from these studies to determine the effect of the ....
K. Keeton, D. A. Patterson, and T. E. Anderson. LogP Quantified: The Case for Low-Overhead Local Area Networks. In Hot Interconnects 3, Stanford University, Stanford, CA, August 1995.
....ISSC( be completed, and the one way network interface overhead takes only 175 4 processor cycles (0.825 µs). This measure indicates that the inter-node communication in MANNA machines is very efficient when compared with networks of workstations, where the overhead may be as high as hundreds of microseconds [9]. 4.1 Cache Performance. To measure the performance of I-Structures and ISSC, we selected four different benchmarks: Dense matrix multiplication, Conjugate Gradient, Hopfield network, and Sparse matrix multiplication. Dense matrix multiplication is a simple-minded, nonblocking algorithm that ....
.... and receiving of network packages may take from hundreds to thousands of cycles depending on the design of the network interface [4]. In some machines, a parallel environment is built on top of the TCP protocol and the communication interface overhead may be as high as hundreds of microseconds [9]. Even with some improved protocols, like Fast Sockets [13] and Active Messages [14], it still costs 40-60 microseconds to send a message to the network. [Figure residue: two plots of Absolute Speedup versus Number of Nodes] ....
K. Keeton, T. Anderson, and D. Patterson. LogP Quantified: The Case for Low-Overhead Local Area Networks. In Hot Interconnects III: A Symposium on High Performance Interconnects, August 1995.
....the code complexity to do so might be taxing on the programmer and the spatial data locality still cannot be exploited. In some machines, a parallel environment is built on top of the TCP protocol and the communication interface overhead may be as high as hundreds of microseconds [9]. Even with some improved protocols, like Fast Sockets [15] and Active Messages [16], it still costs 40-60 microseconds to send a message to the network. For instance, in the multi-packet delivery implementation of Active Messages in the CM-5 machine, it costs 6221 instructions for sending a ....
.... 4 Conclusions. Do software caches really work? In this paper, we demonstrated that a software implementation of an I-Structure cache, i.e. ISSC, can deliver performance gains for most distributed memory systems which don't have extremely fast inter-node communications, such as networks of workstations [3, 9, 15, 18]. ISSC caches values obtained through split-phase transactions in the operation of an I-Structure. It also exploits spatial data locality by clustering individual element requests into blocks. Our experiment results show that the inclusion of ISSC in a parallel system that provides split-phase ....
K. Keeton, T. Anderson, and D. Patterson. LogP Quantified: The Case for Low-Overhead Local Area Networks. In Hot Interconnects III: A Symposium on High Performance Interconnects, August 1995.
....each generation, going from 33, to 60, 167, and finally 400 MHz. Intuitively, the reduction in the performance gap between general purpose and specialized messaging systems is easy to understand: the high cost of general purpose messaging arises from the large number of protocol instructions [28]. As processor speed in.... [Figure residue: "Specialized Message Layer Performance"; time in µs; series RTT, gap, L, Or, Os; years 1992, 1994, 1997, 2000] Figure 1. LogP performance of 4 Active Message Layers. The figure shows the LogP performance of 4 specialized message layers using the ....
....models [4, 13, 26]. However, these models were always in the context of specialized communication systems, thus making comparisons to general purpose systems difficult. In a more general networking context, much work has been done to quantify the performance of existing IP protocol stacks [27, 28], or increase their performance via copy reduction techniques [8, 9, 15, 29, 40]. Unlike the analysis of specialized messaging, the performance of these general purpose systems is rarely defined in terms of architectural models. It is difficult from these studies to determine the effect of the ....
K. Keeton, D. A. Patterson, and T. E. Anderson. LogP Quantified: The Case for Low-Overhead Local Area Networks. In Hot Interconnects 3, Stanford University, Stanford, CA, August 1995.
....nodes are available, limiting the scale of experiments. Second, there are no widely available implementations of portable libraries like PVM or MPI that have been highly tuned for the clusters. Third, the popular FDDI and Ethernet are slower than interconnects of massively parallel processors [Keeton95]. For example, with a single sender and receiver on an FDDI HP workstation cluster, we obtained 3.33 MB/sec on average out of the peak FDDI bandwidth of 12.5 MB/sec using MPICH 1.0.11 [MPICH]. ....
K. K. Keeton, T. E. Anderson and D. A. Patterson, LogP Quantified: The Case for Low-Overhead Local Area Networks, Hot Interconnects III, August 1995.
....made about the optimization problem. We pick two platforms: the IBM SP2 with a custom network, and a network of Sparc workstations (NOW) connected by a commodity network (Myrinet). IBM's message passing library MPL and MPICH 2 are used for communication. Details of the networks can be found in [25, 24, 16]. We want to measure how large messages are rewarded by the network, while estimating the local buffer copy cost to collect small messages. Figure 5 shows the profiling code and results. The top curve shows the bandwidth of local bcopy as a function of buffer size. The bottom curve plots network ....
K. Keeton, T. Anderson, and D. Patterson. LogP quantified: The case for low-overhead local area networks. In Proc. Hot Interconnects III: A Symposium on High Performance Interconnects, Stanford, CA, Aug. 1995.
.... s communication interface overhead may take from hundreds to thousands of cycles depending on the design of the network interface [7]. In some machines, a parallel environment is built on top of the TCP protocol and the communication interface overhead may be as high as hundreds of microseconds [17]. Even with some improved protocols, like Fast Sockets [23] and Active Messages [26], it still costs 40-60 microseconds to send a message to the network. Thus, in our second set of experiments, shown in Figure 6, we add 10 µs (500 processor cycles) to the cost of both I-structure and ISSC ....
....incurred by remote operations. In this paper, we demonstrated that even a software emulation of an I-structure cache can yield performance improvements. We also demonstrated that such performance gains will be even more evident in machines with higher latencies, such as networks of workstations [17, 23]. The concept of I-Structure caches is not limited to software implementations. While some researchers concentrate on the development of faster network interfaces [6, 11, 22], I-Structure caches can be implemented with dedicated hardware, a decoupled general purpose processor or even integrated ....
K. Keeton, T. Anderson, and D. Patterson. LogP Quantified: The Case for Low-Overhead Local Area Networks. In Hot Interconnects III: A Symposium on High Performance Interconnects, August 1995.
....of microseconds required to perform a single step on a serial machine (43800), we performed a LogP analysis of the system for the SMPs, the NOW, and for a mythical Ethernet system. We obtained the appropriate values for the parameters from papers by the NOW group at Berkeley [Lumetta et al., 1997; Keeton et al., 1995]. Combining these numbers into alpha-beta values, we then obtained the following timing curves. [Figure residue: "Sequential Analysis" plot] ....
Keeton, K.K., Anderson, T.E., and Patterson, D.A. (1995). LogP quantified: the case for low-overhead local area networks. Presented at Hot Interconnects III: A Symposium on High Performance Interconnects, Stanford University.
....kilobyte. Kay and Pasquale [59] found that the median message sizes for TCP and UDP (mostly generated by the Network File System) traffic in a departmental network were 32 and 128 bytes respectively. They also found that 99% of TCP and 86% of the UDP traffic was less than 200 bytes. Keeton, et al. [60] analyzed a debit-credit benchmark on a commercial database and found that all messages were less than 200 bytes. In the seven parallel scientific applications I studied in this chapter, I found that the average message size ranges between 19 and 230 bytes (Table 4.3). Current microprocessors offer ....
Kimberly A. Keeton, Thomas E. Anderson, and David A. Patterson. LogP Quantified: The Case for Low-Overhead Local Area Networks. In Hot Interconnects III, 1995.
....but Autonet could not and thus could not be tolerated. Consequently, Autonet cannot carry host traffic while reconfiguration is in progress. This could cause deadlock, should inconsistent forwarding tables arise. Thus, direct application access to the network was not supported, though many studies [16, 17, 18, 19] now show that such direct bindings of physical communication resources to virtual ones are necessary for obtaining high performance as networks move into the gigabyte per second range. Processors in each switch periodically execute a distributed topology acquisition algorithm. This updates ....
K. Keeton, T. Anderson, and D. Patterson. LogP Quantified: The Case for Low-Overhead Local Area Networks. In Proc. of Hot Interconnects III, August 1995.
.... latency (L) as defined by the LogP model [40]. We assume that a message arrives in exactly L time units (even when a process sends a message .... Table 7.1: Measured Network Latency. The table shows the network latency of recent MPPs and Networks of Workstations. System, latency in µs [source]: Cray T3D, 2 [6]; TMC CM-5, 6 [167]; Intel Paragon, 6 [38]; FDDI, 6 [115]; Myrinet, 11 [38]; Fore ATM, 33 [166]; Switched Ethernet, 52 [89]. Variable, description, value: L, network latency, 10 µs; o, overhead, 0; g, gap, 0; P, number of processors, 32; W, wake up from message arrival, 50 µs and 200 µs; Q, duration of time slice; SSC : ....
Kimberly Keeton, David A. Patterson, and Thomas E. Anderson. LogP Quantified: The Case for Low-Overhead Local Area Networks. In Hot Interconnects III, Stanford University, Stanford, CA, August 1995.
.... et al. reported a strong sensitivity of application performance to communication overhead [18]. Keeton et al. measured communication overhead and latency, and reported that increased overhead had a significant impact on the performance of applications with small message request-response behavior [16]. Berry et al. [4] developed prototype implementations of the VI Architecture on 100 Mbit/sec Ethernet, and on Myrinet. Both implementations used software emulations. In addition, they programmed the Lanai network interface controller on the Myrinet NIC to perform the VI emulation. The latter ....
K. Keeton, D. A. Patterson, and T. E. Anderson, "LogP Quantified: The Case for Low-Overhead Local Area Networks," presented at Hot Interconnects III, 1995.
....we conclude 12 that for small messages, system call overhead and streams interface overhead dominate the total time required to transmit the messages, while for large messages, the time is dominated by the copying overhead. This result coincides with the observations from other research efforts [4, 10, 20], which claim that for small messages, per-message overheads (e.g. system call overhead, context switch time, protocol-specific processing time, etc.) dominate the total cost for data transfer, and for large messages, per-byte overheads (e.g. data checksumming, data movement, etc.) are the main ....
K. K. Keeton, T. E. Anderson, and D. A. Patterson, "LogP Quantified: The Case for Low-Overhead Local Area Networks", Proc. of Hot Interconnects III, August 1995.
....bandwidths of network links and the decreasing latencies of network switches cause software overheads to dominate communication costs. This issue has been studied in the context of message passing libraries and their alternatives for MPPs [2] and in the context of TCP/IP performance for LANs [30]. In both of these domains, small messages are common, and software overhead determines their performance. To the extent that small messages are necessary, overhead in communication software determines network utilization and performance. For example, a TCP/IP implementation with 100 microseconds ....
K. Keeton, T. Anderson, and D. Patterson, "LogP Quantified: The Case for Low-Overhead Local Area Networks", In Proceedings of Hot Interconnects III, August 1995.
....running over the aforementioned ATM-based environments. Unlike many types of applications, communications performance cannot be solely characterized by throughput; this is only possible for applications generating very large messages in a one-way exchange between sending and receiving hosts [6]. Communications in parallel computing applications approach the request-response model, since each task sends data to other tasks and expects other data from them. In this model, per-message overheads set a limit on the achievable performance and therefore, as discussed in [7], latency is a ....
K. K. Keeton, T. E. Anderson, and D. A. Patterson. LogP Quantified: The Case for Low-Overhead Local Area Networks. In Proceedings of Hot Interconnects III, 1995.
No context found.
KEETON, K., PATTERSON, D. A., AND ANDERSON, T. E. LogP Quantified: The Case for Low-Overhead Local Area Networks. In Hot Interconnects III (Stanford University, Stanford, CA, August 1995).
No context found.
KEETON, K., PATTERSON, D. A., AND ANDERSON, T. E. LogP Quantified: The Case for Low-Overhead Local Area Networks. In Hot Interconnects III (Stanford University, Stanford, CA, August 1995).
....the performance required by applications like xFS. First, the serverless architecture requires the use of many small control messages, whose efficient transmission is crucial to the performance of xFS. Traditional RPC implementations layered on top of heavy weight protocols (such as TCP and UDP) [26] cannot satisfy this performance requirement. Second, instead of using simple two party request reply exchanges, xFS transactions can require the cooperation of several machines. Unfortunately, when we synthesize multi party communication using RPC, we cannot benefit from the semantic advantages ....
Keeton, K. K., Anderson, T. E., and Patterson, D. A. LogP Quantified: The Case for Low-Overhead Local Area Networks. In Proc. of 1995 Hot Interconnects III (August 1995).
No context found.
Kimberly Keeton, David A. Patterson, and Thomas E. Anderson. LogP Quantified: The Case for Low-Overhead Local Area Networks. In Hot Interconnects III, Stanford University, Stanford, CA, August 1995.
No context found.
K. Keeton, T. Anderson, and D. Patterson, "LogP Quantified: The Case for Low-Overhead Local Area Networks," Hot Interconnects III: A Symp. on High Performance Interconnects, Stanford Univ., Stanford, CA, Aug. 1995.
No context found.
K. Keeton, T.E. Anderson, and D.A. Patterson, "LogP Quantified: The Case for Low-Overhead Local Area Networks," Hot Interconnects III Symp. Record, 1995; http://http.cs.berkeley.edu/~kkeeton/Papers/papers.html.