High-performance Packet Classification Algorithm for Many-core and Multithreaded Network Processor
CASES'06, 2006
Abstract
Cited by 10 (1 self)
Packet classification is crucial for the Internet to provide more value-added services and guaranteed quality of service. Besides hardware-based solutions, many software-based classification algorithms have been proposed. However, classifying at 10Gbps or higher is a challenging problem, and it is still one of the performance bottlenecks in core routers. In general, classification algorithms face the same challenge of balancing high classification speed against low memory requirements. This paper proposes a modified Recursive Flow Classification (RFC) algorithm, Bitmap-RFC, which significantly reduces the memory requirements of RFC by applying a bitmap compression technique. To increase classification speed, we experiment with exploiting the architectural features of a many-core, multithreaded architecture, from algorithm design through algorithm implementation. As a result, Bitmap-RFC strikes a good balance between speed and space: it maintains high classification speed while significantly reducing memory space. This paper investigates the main NPU software design aspects that have dramatic performance impacts on any NPU-based implementation: memory space reduction, instruction selection, data allocation, task partitioning, and latency hiding. We experiment with an architecture-aware design principle to guarantee the high performance of the classification algorithm on an NPU implementation. The experimental results show that the Bitmap-RFC algorithm achieves 10Gbps speed or higher and scales well on the Intel IXP2800 NPU.
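The bitmap compression idea can be illustrated with a small sketch. It assumes a general run-based scheme: a bitmap marks each position where a table entry differs from its predecessor, only one value per run is stored, and an entry is recovered by counting set bits (a population count). The class name `BitmapCompressedTable` and the sample table are hypothetical, and this is not claimed to be Bitmap-RFC's exact memory layout.

```python
from typing import List, Sequence


class BitmapCompressedTable:
    """Lookup table compressed with a run-start bitmap (illustrative sketch).

    Bit i of `bitmap` is set when values[i] differs from values[i - 1]
    (bit 0 is always set); `elements` keeps only the first value of each run.
    """

    def __init__(self, values: Sequence[int]) -> None:
        if not values:
            raise ValueError("table must be non-empty")
        self.size = len(values)
        self.bitmap = 0
        self.elements: List[int] = []
        for i, value in enumerate(values):
            if i == 0 or value != values[i - 1]:
                self.bitmap |= 1 << i          # mark the start of a new run
                self.elements.append(value)    # store the run's value once

    def lookup(self, index: int) -> int:
        if not 0 <= index < self.size:
            raise IndexError(index)
        # Run starts at positions 0..index, minus one, give the run's
        # position in `elements`.
        mask = (1 << (index + 1)) - 1
        run = bin(self.bitmap & mask).count("1") - 1
        return self.elements[run]


# Assumed input: a chunk table with long runs of repeated equivalence-class IDs,
# which is the case where this kind of compression saves memory.
table = [3, 3, 3, 7, 7, 3, 3, 3]
compressed = BitmapCompressedTable(table)
print(compressed.elements)                        # [3, 7, 3]
print([compressed.lookup(i) for i in range(8)])   # [3, 3, 3, 7, 7, 3, 3, 3]
```

A lookup costs one masked popcount plus one array read. An implementation tuned for an NPU would likely split the bitmap into fixed-width words with precomputed prefix counts so that each popcount stays bounded; that refinement is an assumption and is omitted here.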
Advances in the Dataflow Computational Model
1999
Abstract
Cited by 5 (0 self)
The dataflow program graph execution model, or dataflow for short, is an alternative to the stored-program (von Neumann) execution model. Because it relies on a graph representation of programs, the strengths of the dataflow model are very much the complements of those of the stored-program one. In the thirty or so years since it was proposed, the dataflow model of computation has been used and developed in many areas of computing research: from programming languages to processor design, and from signal processing to reconfigurable computing. This paper reviews the current state of the art in the applications of the dataflow model of computation, focusing on three areas: multithreaded computing, signal processing, and reconfigurable computing.
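The defining rule of the dataflow model, that an operation fires as soon as all of its operand tokens are available rather than when a program counter reaches it, can be sketched as a small interpreter. The function `run_dataflow` and the graph format (a node name mapped to an operation and its operand names) are hypothetical, chosen only for illustration; the graph is assumed to be acyclic.

```python
import operator
from collections import deque
from typing import Callable, Dict, List, Tuple

# node name -> (operation, operand names); operands name graph inputs or other nodes
Graph = Dict[str, Tuple[Callable[..., int], List[str]]]


def run_dataflow(graph: Graph, inputs: Dict[str, int]) -> Dict[str, int]:
    """Fire each node once all of its operand tokens exist (acyclic graph assumed)."""
    values: Dict[str, int] = dict(inputs)
    consumers: Dict[str, List[str]] = {}   # producer name -> nodes waiting on it
    missing: Dict[str, int] = {}           # node name -> operand tokens still absent
    for name, (_, operands) in graph.items():
        missing[name] = sum(1 for op in operands if op not in values)
        for op in operands:
            consumers.setdefault(op, []).append(name)
    ready = deque(name for name, count in missing.items() if count == 0)
    while ready:
        name = ready.popleft()
        op_fn, operands = graph[name]
        values[name] = op_fn(*(values[op] for op in operands))  # the node fires
        for consumer in consumers.get(name, []):  # deliver a token to each consumer
            missing[consumer] -= 1
            if missing[consumer] == 0:
                ready.append(consumer)
    return values


# (a + b) * (a - b): "sum" and "diff" depend only on inputs, "prod" on both of them.
graph: Graph = {
    "sum": (operator.add, ["a", "b"]),
    "diff": (operator.sub, ["a", "b"]),
    "prod": (operator.mul, ["sum", "diff"]),
}
print(run_dataflow(graph, {"a": 5, "b": 3})["prod"])  # 16
```

Here `sum` and `diff` become ready at the same time; a dataflow machine could fire them in parallel, whereas this sequential sketch simply takes them from the ready queue in order.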
Symbolic Partitioning and Scheduling of Parameterized Task Graphs
In IEEE International Conference on Parallel and Distributed Systems (ICPADS'98), 1998
Abstract
Cited by 4 (4 self)
The DAG-based task graph model has been found effective in scheduling for performance prediction and optimization of parallel applications. However, the scheduling complexity and solution normally depend on the problem size. In this paper, we propose a symbolic scheduling scheme for a parameterized task graph, which models coarse-grain DAG parallelism independently of the problem size. The algorithm first derives symbolic clusters that group tasks so as to minimize communication while preserving parallelism, and then evenly assigns the task clusters to processors. The runtime system executes the clusters on each processor in a multithreaded fashion. This paper also presents preliminary experimental results that demonstrate the effectiveness of our techniques.
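The second step, assigning task clusters evenly to processors, can be sketched with a greedy longest-processing-time-first heuristic: the heaviest remaining cluster goes to the currently least-loaded processor. This is a swapped-in, non-symbolic technique used purely for illustration, not the paper's scheduling scheme; the function name `assign_clusters` and the per-cluster work estimates are hypothetical.

```python
import heapq
from typing import Dict, List


def assign_clusters(cluster_work: Dict[str, float], num_procs: int) -> List[List[str]]:
    """Greedy LPT: heaviest cluster first, each onto the least-loaded processor."""
    if num_procs <= 0:
        raise ValueError("num_procs must be positive")
    loads = [(0.0, p) for p in range(num_procs)]  # min-heap of (load, processor id)
    heapq.heapify(loads)
    placement: List[List[str]] = [[] for _ in range(num_procs)]
    # Sort by descending work; break ties by name so the result is deterministic.
    for name, work in sorted(cluster_work.items(), key=lambda kv: (-kv[1], kv[0])):
        load, proc = heapq.heappop(loads)
        placement[proc].append(name)
        heapq.heappush(loads, (load + work, proc))
    return placement


clusters = {"c0": 7, "c1": 5, "c2": 4, "c3": 3, "c4": 1}  # hypothetical work estimates
print(assign_clusters(clusters, 2))  # [['c0', 'c3'], ['c1', 'c2', 'c4']]
```

With two processors the sample clusters split into loads of 10 and 10. The paper works with a parameterized task graph whose structure is independent of the problem size, which this concrete-input sketch does not attempt to model.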
X.: Practice of parallelizing network applications on multi-core architectures
In: Proceedings of the 23rd International Conference on Supercomputing, ICS ’09, 2009
Abstract
Cited by 4 (1 self)
The industry-wide shift to multi-core architectures has aroused great interest in parallelizing sequential applications. However, it is very difficult to parallelize fine-grained applications for multi-core architectures because of insufficient hardware support for fast communication and synchronization. Fortunately, network applications can be decomposed into pipelined structures that are amenable to streaming-based parallel processing. Realizing the potential of pipelining on multi-core architectures requires reevaluating the basic tradeoffs in parallel processing, including those between load balance and data locality and between general lock mechanisms and special lock-free data structures. This paper presents the practice of building a high-performance multi-core based network processing platform in which connection-affinity and lock-free design principles are applied
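Connection affinity, one of the design principles named in this abstract, is commonly realized by hashing a packet's flow identifier so that every packet of a connection is dispatched to the same core. Below is a minimal sketch, assuming a 5-tuple flow key and a CRC-32 hash (both assumptions, not details from the paper); `worker_for` is a hypothetical name. The endpoints are put in a canonical order so that both directions of a TCP connection map to the same worker.

```python
import ipaddress
import zlib
from typing import Tuple

# (src_ip, src_port, dst_ip, dst_port, protocol number)
FiveTuple = Tuple[str, int, str, int, int]


def worker_for(flow: FiveTuple, num_workers: int) -> int:
    """Pick a worker so all packets of one connection, in both directions, share it."""
    src_ip, src_port, dst_ip, dst_port, proto = flow
    a = (int(ipaddress.ip_address(src_ip)), src_port)
    b = (int(ipaddress.ip_address(dst_ip)), dst_port)
    lo, hi = min(a, b), max(a, b)  # canonical endpoint order: A->B and B->A agree
    key = f"{lo[0]}:{lo[1]}-{hi[0]}:{hi[1]}/{proto}".encode()
    return zlib.crc32(key) % num_workers  # stable across runs, unlike str hash()


forward = ("10.0.0.1", 40000, "10.0.0.2", 80, 6)  # TCP client -> server
reverse = ("10.0.0.2", 80, "10.0.0.1", 40000, 6)  # server -> client
print(worker_for(forward, 8) == worker_for(reverse, 8))  # True
```

Because each connection's state is then touched by only one worker, that state can usually be kept without locks; this is the usual motivation for connection affinity, offered here as general background rather than as the paper's result.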
Scalable Packet Classification Using Interpreting—A Crossplatform Multi-core Solution
In PPoPP'08, 2008
Abstract
Cited by 3 (1 self)
Packet classification is an enabling technology for supporting advanced Internet services. It is still a challenge for a software solution to achieve 10Gbps (line-rate) classification speed. This paper presents a classification algorithm that can be efficiently implemented on a multi-core architecture with or without cache. The algorithm embraces the holistic notion of exploiting application characteristics, considering the capabilities of the CPU and the memory hierarchy, and performing appropriate data partitioning. The classification algorithm adopts two stages: searching on a reduction tree and searching on a list of ranges. This decision is based on a classification heuristic: the size of the range list is limited after the first-stage search. Optimizations are then designed to speed up the two-stage execution. To exploit the speed gap (1) between the CPU and
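The two-stage search can be sketched in one dimension: a first stage narrows a key to a bucket, and a second stage scans that bucket's short list of ranges. In this sketch a binary search over sorted split points stands in for the paper's reduction tree, which it is not meant to reproduce; the functions `build_buckets` and `classify`, the split points, and the rules are all hypothetical.

```python
import bisect
from typing import List, Optional, Tuple

Rule = Tuple[int, int, str]  # inclusive (low, high) range and a rule name


def build_buckets(rules: List[Rule], boundaries: List[int]) -> List[List[Rule]]:
    """Place each rule in every first-stage bucket its range overlaps.

    `boundaries` must be sorted; bucket i holds keys k with
    bisect_right(boundaries, k) == i. Rules are given in priority order and
    appended in that order, so each bucket's list stays priority-sorted.
    """
    buckets: List[List[Rule]] = [[] for _ in range(len(boundaries) + 1)]
    for low, high, name in rules:
        first = bisect.bisect_right(boundaries, low)
        last = bisect.bisect_right(boundaries, high)
        for i in range(first, last + 1):
            buckets[i].append((low, high, name))
    return buckets


def classify(key: int, boundaries: List[int], buckets: List[List[Rule]]) -> Optional[str]:
    # Stage 1: binary search picks the bucket (stand-in for the reduction tree).
    bucket = buckets[bisect.bisect_right(boundaries, key)]
    # Stage 2: linear scan of the short range list; the first match wins.
    for low, high, name in bucket:
        if low <= key <= high:
            return name
    return None


rules = [(0, 99, "A"), (50, 199, "B"), (0, 1023, "default")]  # priority order
boundaries = [64, 128, 256]
buckets = build_buckets(rules, boundaries)
print([classify(k, boundaries, buckets) for k in (10, 70, 150, 300, 2000)])
# ['A', 'A', 'B', 'default', None]
```

Choosing split points so that no bucket's range list grows long is what keeps the second-stage linear scan cheap, in the spirit of the heuristic stated in the abstract.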
Symbolic Partitioning and Scheduling of Parameterized Task Graphs
1999