Performance Analysis of TLS Web Servers
- In Proceedings of the Network and Distributed Systems Security Symposium (NDSS)
, 2002
An Efficient Programmable 10 Gigabit Ethernet Network Interface Card
- In Proc. of HPCA
, 2005
Cited by 15 (3 self)
This paper explores the hardware and software mechanisms necessary for an efficient programmable 10 Gigabit Ethernet network interface card. Network interface processing requires support for the following characteristics: a large volume of frame data, frequently accessed frame metadata, and high frame rate processing. This paper proposes three mechanisms to improve programmable network interface efficiency. First, a partitioned memory organization enables low-latency access to control data and high-bandwidth access to frame contents from a high-capacity memory. Second, a novel distributed task-queue mechanism enables parallelization of frame processing across many low-frequency cores, while using software to maintain total frame ordering. Finally, the addition of two new atomic read-modify-write instructions reduces frame ordering overheads by 50%. Combining these hardware and software mechanisms enables a network interface card to saturate a full-duplex 10 Gb/s Ethernet link by utilizing 6 processor cores and 4 banks of on-chip SRAM operating at 166 MHz, along with external 500 MHz GDDR SDRAM.
Spinach: A Liberty-based simulator for programmable network interface architectures
- In Proceedings of the SIGPLAN/SIGBED Conference on Languages, Compilers, and Tools for Embedded Systems (LCTES)
, 2004
Cited by 13 (6 self)
This paper presents Spinach, a new simulator toolset specifically designed to target programmable network interface architectures. Spinach models both system components that are common to all programmable environments (e.g., ALUs, control and data paths, registers, instruction processing) and components that are specific to the embedded systems and network interface environments (e.g., software-controlled scratchpad memory, hardware assists for DMA and medium access control). Spinach is built on the Liberty Simulation Environment (LSE) and exploits LSE’s modularity to support easy reconfiguration of programmable network interface cards (NICs) and embedded systems, enabling wide design space exploration with little or no code variation. For example, the same underlying C code is used whether supporting a uniprocessor Gigabit network interface, a multiprocessor Gigabit interface, or a multiprocessor 10 Gigabit interface with a highly heterogeneous memory system. The only difference is in a small number of lines of high-level scripting code used to configure the various modules into a simulation model. Spinach is validated by modeling the Tigon-2 programmable Ethernet controller by Alteon Websystems running actual Ethernet processing firmware and by comparing the reported results to actual hardware benchmarks. Spinach is then used to obtain new insights about the performance of Gigabit and 10 Gigabit network interfaces.
Exploiting Task-level Concurrency in a Programmable Network Interface
, 2003
Cited by 11 (4 self)
the functionality of network services but lead to instruction processing overheads when compared to application-specific network interfaces. This paper aims to offset those performance disadvantages by exploiting task-level concurrency in the workload to parallelize the network interface firmware for a programmable controller with two processors. By carefully partitioning the handler procedures that process various events related to the progress of a packet, the system can minimize sharing, achieve load balance, and efficiently utilize on-chip storage. Compared to the uniprocessor firmware released by the manufacturer, the parallelized network interface firmware increases throughput by 65% for bidirectional UDP traffic of maximum-sized packets, 157% for bidirectional UDP traffic of minimum-sized packets, and 32-107% for real network services. This parallelization results in performance within 10-20% of a modern ASIC-based network interface for real network services.
Communication breakdown: Analyzing CPU usage in commercial web workloads
- In ISPASS
, 2004
Cited by 6 (2 self)
There is increasing concern among developers that future web servers running commercial workloads may be limited by network processing overhead in the CPU as 10 Gb Ethernet becomes prevalent. We analyze CPU usage of real hardware running popular commercial workloads, with an emphasis on identifying networking overhead. Contrary to much popular belief, our experiments show that network processing is unlikely to be a problem for workloads that perform significant data processing. For the dynamic web serving workloads we examine, networking overhead is negligible (3% or less), and data processing limits performance. However, for web servers that serve static content, network processing can significantly impact performance (up to 25% of CPU cycles). With an analytical model, we calculate the maximum possible improvement in throughput due to protocol offload to be 50% for the static web workloads.
Protection strategies for direct access to virtualized I/O devices
- In Proceedings of the 2008 USENIX Annual Technical Conference
, 2008
Cited by 6 (0 self)
Commodity virtual machine monitors forbid direct access to I/O devices by untrusted guest operating systems in order to provide protection and sharing. However, both I/O memory management units (IOMMUs) and recently proposed software-based methods can be used to reduce the overhead of I/O virtualization by providing untrusted guest operating systems with safe, direct access to I/O devices. This paper explores the performance and safety tradeoffs of strategies for using these mechanisms. The protection strategies presented in this paper provide equivalent inter-guest protection among operating system instances. However, they provide varying levels of intra-guest protection from driver software and incur varying levels of overhead. A simple direct-map strategy incurs the least overhead, providing native-level performance but offering no enhanced protection from misbehaving device drivers within the guest operating system. Additional protection against guest drivers can be achieved by limiting IOMMU page-table mappings to memory buffers that are actually used in I/O transfers. Furthermore, the cost incurred by this limitation can be minimized by aggressively reusing these mappings. Surprisingly, a software-only strategy that does not use an IOMMU at all performs competitively with, and sometimes better than, hardware-based strategies while maintaining strict inter-guest isolation.
Analyzing NIC Overheads in Network-Intensive Workloads
- In 8th Workshop on Computer Architecture Evaluation using Commercial Workloads (CAECW), Feb 2005
, 2004
Cited by 4 (0 self)
Modern high-bandwidth networks place a significant strain on host I/O subsystems. However, despite the practical ubiquity of TCP/IP over Ethernet for high-speed networking, the vast majority of end-host networking research continues in the current paradigm of the network interface as a generic peripheral device. As a result, proposed optimizations focus on purely software changes, or on moving some of the computation from the primary CPU to the off-chip network interface controller (NIC). We look at an alternative approach: leave the kernel TCP/IP stack unchanged, but eliminate bottlenecks by closer attachment of the NIC to the CPU and memory system.
Network interface data caching
- IEEE Transactions on Computers
, 2005
Cited by 4 (0 self)
Network interface data caching reduces local interconnect traffic on network servers by caching frequently-requested content on a programmable network interface. The operating system on the host CPU determines which data to store in the cache and for which packets it should use data from the cache. To facilitate data reuse across multiple packets and connections, the cache only stores application-level response content (such as HTTP data), with application-level and networking headers generated by the host CPU. Network interface data caching reduces PCI traffic by 12-61 percent for six Web workloads on a prototype implementation of a uniprocessor Web server. This traffic reduction improves peak throughput for three workloads by 6-36 percent. Index Terms: Web servers, local interconnects, network interfaces, operating systems.
The Performance Potential of an Integrated Network Interface
, 2004
Cited by 3 (2 self)
High-bandwidth TCP/IP networking is a core component of current and future computer systems. Though networking is central to computing today, the vast majority of end-host networking research focuses on the current paradigm of the network interface being merely a peripheral device. Most optimizations focus solely on software changes or on moving some of the computation from the primary CPU to the off-chip network interface controller (NIC). We present an alternative approach for achieving high performance networking. Rather than increasing the complexity of the NIC, we directly integrate a conventional NIC on the CPU die.
Design Alternatives for a High-Performance Self-Securing Ethernet Network Interface
, 2007
Cited by 2 (1 self)
This paper presents and evaluates a strategy for integrating the Snort network intrusion detection system into a high-performance programmable Ethernet network interface card (NIC), considering the impact of several possible hardware and software design choices. While currently proposed ASIC, FPGA, and TCAM systems can match incoming string content in real-time, the system proposed also supports the stream reassembly and HTTP content transformation capabilities of Snort. This system, called LineSnort, parallelizes Snort using concurrency across TCP sessions and executes those parallel tasks on multiple low-frequency pipelined RISC processors embedded in the NIC. LineSnort additionally exploits opportunities for intra-session concurrency. The system also includes dedicated hardware for high-bandwidth data transfers and for high-performance string matching. Detailed results obtained by simulating various software and hardware configurations show that the proposed system can achieve intrusion detection throughputs in excess of 1 Gigabit per second for fairly large rule sets. Such performance requires the system to use hardware-assisted string matching and a small shared data cache. The system can extract performance through increases in processor clock frequency or parallelism, allowing additional flexibility for designers to achieve performance within specified area or power budgets. By efficiently offloading the computationally difficult task of intrusion detection to the network interface, LineSnort enables intrusion detection to run directly on PC-based network servers rather than just at powerful edge-based appliances. As a result, LineSnort has the potential to protect servers against the growing menace of LAN-based attacks, whereas traditional edge-based intrusion detection deployments can only protect against external attacks.

