Results 1 - 10 of 119

Effects of communication latency, overhead, and bandwidth in a cluster architecture

by Richard P. Martin, Amin M. Vahdat, David E. Culler, Thomas E. Anderson - In Proceedings of the 24th Annual International Symposium on Computer Architecture , 1997
"... This work provides a systematic study of the impact of communication performance on parallel applications in a high performance network of workstations. We develop an experimental system in which the communication latency, overhead, and bandwidth can be independently varied to observe the effects on ..."
Abstract - Cited by 108 (6 self)
, slowing down by a factor of 60 on 32 processors when overhead is increased from 3 to 103 µs. Applications in this study are also sensitive to per-message bandwidth, but are surprisingly tolerant of increased latency and lower per-byte bandwidth. Finally, most applications demonstrate a highly linear
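
For intuition on why per-message overhead dominates at this grain, here is a minimal LogGP-style cost sketch in Python. Apart from the 3 µs and 103 µs overheads quoted above, every parameter and value is an illustrative assumption, not the paper's apparatus or measurements.

# Minimal LogGP-style sketch: per-message cost = send overhead o + latency L
# + per-byte gap * message size. All values other than the two overheads
# quoted in the abstract are assumed for illustration.
def message_cost_us(o_us, L_us=10.0, gap_us_per_byte=0.01, msg_bytes=64):
    return o_us + L_us + gap_us_per_byte * msg_bytes

def comm_time_s(num_msgs, o_us):
    return num_msgs * message_cost_us(o_us) / 1e6

msgs = 1_000_000  # hypothetical per-processor message count
for o in (3.0, 103.0):  # overhead values (microseconds) from the abstract
    print(f"o = {o:5.1f} us -> communication time ~ {comm_time_s(msgs, o):6.1f} s")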

A Trace-Driven Analysis of the UNIX 4.2 BSD File System

by John Ousterhout, Herve Da Costa, David Harrison, John A. Kunze, Mike Kupfer, James G. Thompson , 1985
"... We analyzed the UNIX 4.2 BSD file system by recording userlevel activity in trace files and writing programs to analyze the traces. The tracer did not record individual read and write operations, yet still provided tight bounds on what information was accessed and when. The trace analysis shows that ..."
Abstract - Cited by 277 (5 self)
that the average file system bandwidth needed per user is low (a few hundred bytes per second). Most of the files accessed are open only a short time and are accessed sequentially. Most new information is deleted or overwritten within a few minutes of its creation. We also wrote a simulator that uses the traces

High Performance RDMA-Based MPI Implementation over InfiniBand

by Jiuxing Liu, Jiesheng Wu, Dhabaleswar K. Panda - In 17th Annual ACM International Conference on Supercomputing (ICS '03), 2003
"... Although InfiniBand Architecture is relatively new in the high performance computing area, it o#ers many features which help us to improve the performance of communication subsystems. One of these features is Remote Direct Memory Access (RDMA) operations. In this paper, we propose a new design of MP ..."
Abstract - Cited by 126 (31 self)
currently delivers a latency of 6.8 microseconds for small messages and a peak bandwidth of 871 Million Bytes (831 Mega Bytes) per second. Performance evaluation at the MPI level shows that for small messages, our RDMA-based design can reduce the latency by 24%, increase the bandwidth by over 104
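
As a rough illustration of how MPI-level latency and bandwidth figures like those quoted above are typically obtained, here is a ping-pong microbenchmark sketch using mpi4py. This is an assumption-laden stand-in, not the paper's implementation or test harness, and it assumes exactly two ranks.

# Ping-pong sketch: estimates one-way small-message latency and large-message
# bandwidth at the MPI level. Uses mpi4py and NumPy; run with two ranks, e.g.
#   mpirun -np 2 python pingpong.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
peer = 1 - rank          # assumes exactly two ranks
iters = 1000

for size in (4, 4 * 1024, 1024 * 1024):   # message sizes in bytes
    buf = np.zeros(size, dtype=np.uint8)
    comm.Barrier()
    t0 = MPI.Wtime()
    for _ in range(iters):
        if rank == 0:
            comm.Send(buf, dest=peer)
            comm.Recv(buf, source=peer)
        else:
            comm.Recv(buf, source=peer)
            comm.Send(buf, dest=peer)
    dt = MPI.Wtime() - t0
    if rank == 0:
        one_way_s = dt / (2 * iters)
        print(f"{size:>8} B  latency {one_way_s * 1e6:8.2f} us  "
              f"bandwidth {size / one_way_s / 1e6:10.1f} MB/s")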

The Relative Importance of Memory Latency, Bandwidth, and Branch Limits to Performance

by Norman P. Jouppi, Parthasarathy Ranganathan - In The Workshop on Mixing Logic and DRAM: Chips that Compute and Remember , 1997
"... This study investigates the relative importance of memory latency, memory bandwidth, and branch predictability in determining limits to processor performance. We use an aggressive simulation model with few other limits to study the performance of SPEC92 benchmarks. Our basic machine model assumes a ..."
Abstract - Cited by 18 (0 self)
to performance until it exceeds 100 to 200 cycles. Memory bandwidth is not usually a significant limit either. In systems with memory latency of 16 cycles and perfect branch predictability, many applications require less than 6 bytes per cycle, while all but one perform well if 100 bytes per cycle are available
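
For scale, a quick conversion of the bytes-per-cycle figures quoted above into absolute memory bandwidth. The clock frequency below is an assumption for illustration, not a number from the paper.

# Convert per-cycle bandwidth demand into GB/s at an assumed core clock.
clock_hz = 1e9                      # assumed 1 GHz clock, illustrative only
for bytes_per_cycle in (6, 100):    # figures quoted in the abstract snippet
    print(f"{bytes_per_cycle:>3} B/cycle at {clock_hz / 1e9:.0f} GHz "
          f"= {bytes_per_cycle * clock_hz / 1e9:.0f} GB/s")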

Efficiently Measuring Bandwidth at All Time Scales

by Frank Uyeda, Luca Foschini, Fred Baker, Subhash Suri, George Varghese
"... The need to identify correlated traffic bursts at various, and especially fine-grain, time scales has become pressing in modern data centers. The combination of Gigabit link speeds and small switch buffers have led to “microbursts”, which cause packet drops and large increases in latency. Our paper ..."
Abstract - Cited by 2 (0 self)
kernel introduce minimal overhead on applications running at 10 Gbps, consume orders of magnitude less memory than event logging (hundreds of bytes per second versus Megabytes per second), but still provide good accuracy for bandwidth measures at any time scale. Our techniques can be implemented
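
To illustrate the underlying idea of answering bandwidth queries at arbitrary time scales from compact per-interval byte counts, here is a simplified sketch. It is not the paper's data structure, which is far more memory-efficient; the class, tick granularity, and method names below are assumptions for illustration.

# Simplified sketch: keep a cumulative byte counter sampled on a fine tick
# grid, then recover average bandwidth over any window by differencing.
class ByteCounterLog:
    def __init__(self, tick_us=10):
        self.tick_us = tick_us
        self.samples = [0]          # cumulative bytes at each tick boundary
        self.pending = 0

    def on_packet(self, size_bytes):
        self.pending += size_bytes

    def on_tick(self):              # call once every tick_us microseconds
        self.samples.append(self.samples[-1] + self.pending)
        self.pending = 0

    def bandwidth_gbps(self, start_tick, end_tick):
        """Average bandwidth over [start_tick, end_tick) in Gbit/s."""
        byte_count = self.samples[end_tick] - self.samples[start_tick]
        seconds = (end_tick - start_tick) * self.tick_us / 1e6
        return byte_count * 8 / seconds / 1e9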

High Performance Fair Bandwidth Allocation for Resilient Packet Rings

by V. Gambiroza, Y. Liu, P. Yuan, E. Knightly , 2002
"... The Resilient Packet Ring (RPR) IEEE 802.17 standard is under development as a new high-speed backbone technology for metropolitan area networks. A key performance objective of RPR is to simultaneously achieve high utilization, spatial reuse, and fairness, an objective not achieved by current techno ..."
Abstract - Cited by 9 (1 self)
Distributed Virtual-time Scheduling in Rings (DVSR). The key idea is for nodes to compute a simple lower bound of temporally and spatially aggregated virtual time using per-ingress counters of packet (byte) arrivals. We show that with this information propagated along the ring, each node can remotely

Can High Bandwidth and Latency Justify Large Cache Blocks in Scalable Multiprocessors?

by Ricardo Bianchini, Thomas J. Leblanc - Computer Science Department, University of Rochester , 1994
"... this paper, we examine the relationship between these factors in the context of large-scale, network-based, cache-coherent, shared-memory multiprocessors. Prompted by expected increases in interconnection network bandwidth (particularly with the use of optical networks), we consider whether or not i ..."
Abstract - Cited by 5 (3 self)
of bandwidth on the choice of block size for each program in our application suite, using the miss rate and mean cost per reference as our main evaluation metrics. Our results show that block sizes between 32 and 128 bytes provide the best performance for our applications. Larger blocks usually result
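
A minimal sketch of a mean-cost-per-reference style metric of the kind named above, showing the block-size trade-off: larger blocks lower the miss rate but raise the per-miss transfer cost. The miss-rate curve and machine parameters below are placeholders, not data from the paper.

# Mean cost per reference = hit cost + miss rate * (latency + block/bandwidth).
def miss_penalty_cycles(block_bytes, latency_cycles=100, bytes_per_cycle=4):
    return latency_cycles + block_bytes / bytes_per_cycle

def mean_cost_per_reference(miss_rate, block_bytes, hit_cycles=1):
    return hit_cycles + miss_rate * miss_penalty_cycles(block_bytes)

hypothetical_miss_rates = {32: 0.040, 64: 0.028, 128: 0.022, 256: 0.020}
for block, rate in hypothetical_miss_rates.items():
    print(f"{block:>4}-byte blocks: "
          f"{mean_cost_per_reference(rate, block):.2f} cycles/reference")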

High-Bandwidth Packet Switching on the Raw General-Purpose Architecture

by Gleb Albertovich Chuvpilo , 2002
"... One of the distinct features of modern Internet routers is that most performance-critical tasks, such as the switching of packets, is currently done using Application Specific Integrated Circuits (ASICs) or custom-designed hardware. The only few cases when off-the-shelf general-purpose processors or ..."
Abstract
MHz Raw processor is able to switch 3.3 million packets per second at peak rate, which results in a throughput of 26.9 gigabits per second for 1,024-byte packets. Second, it shows that it is possible to obtain an efficient mapping of a dynamic communications pattern, such as the connections
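
The throughput figure quoted above follows directly from the packet rate and packet size; a quick arithmetic check:

# 3.3 million packets/s of 1,024-byte packets, expressed in Gbit/s.
packets_per_s = 3.3e6
packet_bytes = 1024
print(f"{packets_per_s * packet_bytes * 8 / 1e9:.1f} Gbit/s")  # ~27.0, consistent with the quoted 26.9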

Very low power pipelines using significance compression

by Ramon Canal, Antonio González, James E. Smith , 2000
"... Data, addresses, and instructions are compressed by maintaining only significant bytes with two or three extension bits appended to indicate the significant byte positions. This significance compression method is integrated into a 5-stage pipeline, with the extension bits flowing down the pipeline t ..."
Abstract - Cited by 67 (2 self)
. A byte serial pipeline is the simplest implementation, but suffers a CPI (cycles per instruction) increase of 79 % compared with a conventional 32-bit pipeline. Widening certain pipeline stages in order to balance processing bandwidth leads to an implementation with a CPI 24 % higher than
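
A small sketch of the significance-compression idea described above: keep only the low-order bytes of a 32-bit word that carry information, plus a count of significant bytes. The exact tag encoding in the paper (two or three extension bits per value) may differ from this simplification.

# Compress a 32-bit two's-complement word to its significant low-order bytes.
def compress(word32):
    b = [(word32 >> (8 * i)) & 0xFF for i in range(4)]   # little-endian bytes
    n = 4
    while n > 1:
        sign_fill = 0xFF if (b[n - 2] & 0x80) else 0x00
        if b[n - 1] == sign_fill:   # top byte is pure sign extension -> drop it
            n -= 1
        else:
            break
    return n, b[:n]

def decompress(n, data):
    word = 0
    for i, byte in enumerate(data):
        word |= byte << (8 * i)
    if data[-1] & 0x80:             # sign-extend from the top significant byte
        word |= (0xFFFFFFFF << (8 * n)) & 0xFFFFFFFF
    return word

for value in (0x00000005, 0xFFFFFFFB, 0x00012345):
    n, data = compress(value)
    assert decompress(n, data) == value
    print(f"0x{value:08X} -> {n} significant byte(s)")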

Distributed Computing Economics

by Jim Gray , 2003
"... Computing economics are changing. Today there is rough price parity between: (1) one database access; (2) 10 bytes of network traffic; (3) 100,000 instructions; (4) 10 bytes of disk storage; and (5) a megabyte of disk bandwidth. This has implications for how one structures Internet-scale distributed ..."
Abstract - Cited by 55 (0 self)
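
The price parities listed above imply a simple break-even rule for when to ship data across the network rather than compute where the data lives; a back-of-the-envelope check:

# One unit of cost ~ 10 bytes of network traffic ~ 100,000 instructions
# (ratios quoted in the abstract), so useful work must amortize the traffic.
instructions_per_unit = 100_000
network_bytes_per_unit = 10
print(f"Break-even: ~{instructions_per_unit // network_bytes_per_unit:,} instructions "
      f"of useful work per byte of network traffic")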