An Approach Towards Enabling Intelligent Networking Services for Distributed Multimedia Applications

Size: px
Start display at page:

Download "An Approach Towards Enabling Intelligent Networking Services for Distributed Multimedia Applications"

Transcription

1 An Approach Towards Enabling Intelligent Networking Services for Distributed Multimedia Applications Srikanth Sundaragopalan, Ada Gavrilovska, Sanjay Kumar, Karsten Schwan {srikanth, ada, ksanjay, Center for Experimental Research in Computer Science Georgia Institute of Technology Atlanta, Georgia Abstract An increase in network speeds and addition of new services in the Internet has increased the demand for intelligence and flexibility in network systems. This paper explores the extent to which an emergent class of programmable networking devices network processors, can be used to deliver more efficient, innovative services to multimedia applications. We present and experimentally evaluate an architecture model, which enables the dynamic creation of application-specific services on programmable communication devices, using the Intel IXP2400 network processor and an image manipulation application. 1. Introduction An increase in speed of networks and addition of new services in the Internet has increased the demand for intelligence and flexibility in network systems. Distributed multiplayer video games based on distributed virtual environments are becoming increasingly popular [12]. These applications involve periodic exchange of messages between the participating machines using some distributed agreement protocol [13]. On the other hand distributed streaming applications involve large volumes of audio/video data to be transmitted across a wide range of clients [8]. Both of these classes of applications need two kinds of strict requirements: (1) bandwidth to send large amount of data and (2) processing power in participating hosts to ensure timely processing and distribution of the content on continuous basis. These applications typically rely on the existence of an underlying communication infrastructure, such as application-level overlays or peer-to-peer networks. Application-specific processing actions are executed at intermediate nodes to implement data distribution functionality and ensure end-user QoS requirements. In addition, intermediate nodes may execute other application components. Therefore, the deployment of new distributed services must have predictable impact of the performance of existing CPU loads at these nodes. Recent advancement in the network speed (Gigabit networks) has far reduced our concern about the bandwidth needed for such applications. However, the increase in sustainable bandwidth continues to present challenges. First, the growth in the network speed is fast approaching the memory interface speeds of general-purpose processors, making even a single memory reference impact the performance of the network application. Though the network is able to deliver data at high speeds, the overlay host machine becomes a bottleneck in processing and delivering them. Second, the increase in diverse mobile and wireless end-user platforms has created demand for continuous delivery of application services in resource poor environments (e.g., low-bandwidth connectivity, and limited computation power). In order to meet the high-bandwidth, low and predictable latency requirements of multimedia applications, additional processing needs to be applied to the data path in the overlays. We argue that such processing and distribution of high bandwidth data must be carried out in the fastpath to get the maximum performance. The fastpath processing could be entirely supported by dedicated hardware, if only for their cost and inflexibility. With emergence of high-speed programmable network processors (NPs), software based solutions are considered feasible compared to their costly hardware based counterparts. Performance of previous implementations of software based router and QoS has strengthened the cause of exploring various services that can be added at network processor level [1,5,6,7,11]. Network processors 1

2 such as IXP 2xxx series are one such high-speed programmable processor, which meets these requirements with less cost and greater flexibility through their programmable nature. This paper presents and evaluates a flexible architecture built for platforms with programmable network interfaces, which supports efficient execution of variety of services required in distributed media applications. Experimental results in Section 5 demonstrate the feasibility of implementing media services on communication cores, and the performance improvements, which this approach can deliver. The remainder of the paper is organized as follows. Section 2 discusses the different classes of services needed in distributed media applications targeted by our work. It also briefly describes the Intel IXP NPs, used as an implementation platform in our work. Next, in Section 3 we describe the proposed architecture. Details regarding the IXP2400-based implementation are presented in Section 4. Experimental results appear in Section 5. Concluding remarks and outline of future research directions are given at the end. 2. Distributed Media Services A wide range of multimedia services could be supported at the network level with the use of high-speed network processors. We can broadly classify these services as: Path transforming services (PathX) -- manipulation on the media stream, and Data transforming services (DataX) -- manipulation of the data content. The first set includes services such as routing, multicasting, broadcasting, prioritizing, filtering, etc. For example, removing certain frames from an MPEG stream to produce an MPEG video with reduced, but still satisfactory quality can be easily supported on these platforms. All of these services would require access and manipulation of the protocol and application level headers, but still leave the data content unmodified. The second category includes services such as transforming/transcoding the data in different format representation, down sampling image quality by modifying the image, cropping the image to match to client view, compression algorithms, encryption, etc. Applications which require such functionality at the network level, include visualization of remote experiments, or camera-captured data for a pool of clients with wide range of interests in the ongoing visualization, and/or which use devices with varying computational or networking capabilities [9]. These applications need services which (1) render appropriate view of imaging data on behalf of the clients, and (2) match the quality of the image stream to the client s interests, or device and connection capabilities. Finally, consider services like applicationlevel IP multicasting, needed in variety of distributed media applications [13]. In these services, once the data is received, it needs to be duplicated to n other members, as indicated in the routing tables. Such duplication involves heavy copying, and imposes additional loads on the host s memory and I/O infrastructure. Moreover these packets need to be received by the network adapter, processed through the IP and UDP 1 stacks and then delivered to the application, which manages multicast routing functionality. In addition, for large application level data, we need to consider the delays due to fragmentation and reassembly. One could remove this multi layer approach by handling such duplication at the network level itself. One approach could be to program such multicasting logic into programmable networking devices such as the IXP NPs. Such programmable processors provide sufficient headroom for carrying out application level packet manipulation and still be in fast path [4,2] IXP 2xxx Architecture The proposed architecture assumes the availability of programmable communication cores or network cards at (at least some) nodes participating in the distributed infrastructure. Our implementation uses network processors from the Intel IXP family, specifically the IXP2400, attached to standard Linux-based hosts, to represent these future platforms. Some of the design here assumes that any network processor will have similar architecture and functionality of Intel IXP The feature that is important from the perspective of our design is the presence of multiple processing ens (such as the microens of IXP2400), which can be pipelined in arbitrary manner to perform certain tasks.we briefly outline the key architectural features of the IXP2400 architecture. For more detailed information on the IXP2400 refer to [3]. The Intel IXP2xxx Network Processors have specialized hardware to support network 1 Multicast is supported with UDP only 2

3 operations, which gives the ability to implement network services with high packet throughput and low latency [3]. In addition to the network centric operations, it also provides support for data processing in the form of CRC checksum calculations, integer arithmetic etc. Figure 1 shows the functional units of IXP2400 network processor. The IXP2400 network processor offers an increased number of microens at higher speeds, additional memory, a faster host-anp PCI-based interconnect, and other technology improvements. The IXP2400 chip includes eight 8-way multithreaded microens for data movement and processing, an Xscale core for management and other functions, local SRAM and DRAM controllers and an integral PCI interface with three DMA channels. The Radisys ENP2611 board [9] on which the IXP2400 resides includes a 600MHz IXP2400, 256MB DRAM, 8MB SRAM, a POS-PHY Level 3 FPGA which connects to 3 Gigabit interfaces and a PCI interface. An Xscale core, running Linux, is primarily used for initialization, management and debugging. The IXP2400 is attached to hosts running standard Linux kernels over a PCI interface. Data is delivered to and from the host-resident application components through the IXP s network interfaces. There is wide range of development tools available which aid the application development for IXP Programmers. These include workbench simulators, SDK s, debugging utilities (data watch, memory watch etc), performancegathering utilities, etc. SDRAM CONTR OLLER PCI Interface Intel Xscale Core HASH UNIT MSF SCRATCH PAD SRAM CONTR OLLER Figure 1. IXP2400 Block Architecture (source, Intel IXA) 3. System Architecture Figure 2 presents the high-level view of the distributed infrastructure targeted by our work, and can be described as follows. Media data is exchanged on continuous basis on top of an application-level overlay. The data may traverse intermediate nodes on the path from sources to destinations. The processing at these intermediate nodes may involve solely overlay routing actions, or may require additional application-specific data content and path manipulations. Based on the processing actions required, or the existing loads at intermediate nodes, the data path may be mapped to their dedicated communication cores, at the network interface level. The mapping of processing actions to individual nodes in the overlay is responsibility of the middleware layer utilized by the distributed applications, and is not the focus 3

4 of this paper. Individual nodes enhanced with programmable communication interfaces, similar to those discussed in [2] are depicted in Figure 3. In addition to data distribution functionality, the general purpose host CPU may execute other application components. The distribution of data in the overlay, and deployment of processing on behalf of overlay clients is managed by a Distribution Manager. This layer drives any reconfiguration actions through which the data path through this node can be offloaded onto the communication interface (e.g., programmable NIC, attached NP, dedicated communication core, etc.). hosts Figure 2. Distributed Application Overlays control plane Rx fastpath hosts with programmable NIs Application Data Distribution Mgt data path control Fast Path (Re-)Configuration Data X Path X Tx Figure 3. System Architecture NI H O S T The core functionality implemented on the fast path is encapsulated by the Rx and Tx tasks, which implement receive- and transmit-side protocol processing. Additional DataX and PathX tasks may be deployed on the fast path to implemented new services, thereby forming a processing pipeline. Depending on the underlying platform architecture and resource availability, DataX and PathX tasks are mapped to separate execution contexts (e.g., threads, microens, processing elements). This may be particularly needed for DataX tasks, which have greater computational requirements and include repeated access to data content stored in chip memory. PathX tasks may be combined and executed jointly with the Tx processing. Compositions of DataX and PathX tasks can be formed to represent variety of services, ranging from data filter, transcoding, or multicast [2,7]. The processing executed as part of the DataX and PathX blocks can be dynamically configured through the host-resident FastPath Configuration interface. Fast path state and resource utilization is available at the host, so that host-resident control components can make estimates regarding service levels sustainable with the current fast path configuration. Currently, we assume that all QoS-related data path reconfigurations are triggered by the middleware. Our future work will consider augmenting the architecture with functionality to detect QoS-violating conditions, and trigger appropriate actions. 4. Implementation Detail 4.1. IXP2400-based Architecture Our current implementation assigns individual blocks from the fastpath pipeline to separate microens on the IXP2400 NPs. Each task may be executed by one or more hardware supported context, i.e. microen threads. Communication between pipeline stages is implemented through controlled access to shared ring buffers. SRAM based data descriptors describe the application level data, stored across multiple DRAM resident buffers. Communication with the host-resident control processes occurs via using shared mailboxes implemented on top of a PCI-based interface, similar to [10]. Fastpath reconfiguration requires involvement of the Xscale core, and has already been implemented for the previous generation IXP NPs with practically negligible service interruption (28-30 microseconds) Image Processing Application As an instance of the pipelined architecture, we have implemented a Packet Forwarding Application and an Image Filtering Application to benchmark application level data manipulation 4

5 in fast path. Figure 4 shows the implementation details of the Image Filtering Application. As shown in the figure we have implemented two flavors of the filtering algorithm to compare memory performance. In the first algorithm we assemble the entire Image on to SRAM and then do the gray filtering. We do this instead of directly manipulating the buffers on the DRAM for two reasons. The first being that, there are certain application level processing which needs the complete data to act upon. This makes the data manipulation algorithm simple. Another reason is that, repeated access to the application data would amortize the cost for copying it on to the SRAM as there is a significance cycle count difference between a DRAM read and a SRAM read operation. Also, reading the fragmented data on DRAM may be costly if there is huge number of such data reads. The second algorithm is complex in the sense; it needs to manipulate fragmented data in DRAM. But, avoiding the memory copy from DRAM to SRAM could trade off such a complexity. As said above, this kind of algorithm would make sense where there is a limited amount of application level data manipulation. Receive Micro En Scratch Ring with receiving packet handles Packet Buffers in DRAM 1. Assembles packet handles of all the fragments and forms a linked list. 2. Uses CAM as lookup to assembling the fragment handles 3. Triggers the Filter Micro en once all fragments have been received giving it the linked list Packet_dl Micro En Linked List of Packet Handles in SRAM Next Neighbor Ring used to trigger Filter Micro en Algorithm Assembles the entire application level data onto SRAM 2. Gray scale the Image 3. Fragment the Image 4. Enqueue the packet handles to Transmit Micro en Scratch Ring with to be transmitted packet handles Filter Micro en Algorithm Manipulates the Image directly on DRAM 2. Form the modified linked list of packet handles 4. Enqueue the packet handles to Transmit Micro en Transmit Micro en Figure 4. Image Processing Application 5

6 4.3. Efficient Multicast Implementation The pipeline architecture assumes that each microen is dedicated to a task. The data stream will pass through this pipeline, when it is in the fast path. The ability to assign the same task to multiple processing contexts permits us to exploit the IXP s hardware supported parallelism and create efficient implementations for variety of services. Consider the support for application level multicasting on attached network processors. We could build the routing logic to be executed at the core and do the duplication at the micro ens. This means that the core will get involved in multicast routing algorithms to build the routing tables and store it in shared space which the micro ens can access. When packets arrive from the network, the micro ens can use the routing tables to route the packets to the destinations. A straightforward implementation would be to duplicate these packets n times and send it. But, such an operation would involve a huge number of memory copying. As an alternate approach, one could exploit the parallelism provided by these micro ens and do intelligent duplication. in the routing table to write into the IP header of the packet. It then places a transmit request for that packet to the Transmit driver. The transmit driver transmits the first 64 of the packet and signals back the thread, which placed the request without freeing the packet buffer. Once the signal is received the thread goes ahead and picks up the next destination from the routing table and overwrites the IP header to the new value. This is absolutely acceptable, as the IP header has been transmitted in the first 64 and thus achieving a near zero duplication of packets. Once the thread is done with all destinations, the packet buffer is freed. The next packet from the queue is picked up and the procedure is started all over again. 5. Performance Analysis We have carried out different sets of experiments to analyze the performance of application level data manipulation. Host 2 User Kernel Routing Table Host1 NIC Host3 Rx Driver Thread-1 Rx Driver Figure 6. Host-based Test-bed Thread-2 Signal back Host 2 User Kernel Thread-8 Figure 5. Parallel Packet Processing Host1 IXP Host3 Figure 5 shows such logic. As and when packets arrive, they are assembled and are ready to be transmitted to their respective destinations. Unlike the traditional way of duplicating the packets n times, we will work on individual packets exclusively and modify their headers to reflect the destinations. In our current implementation, each of the eight hardware assisted thread of the micro en, works on individual packets. Each thread picks up a fully assembled packet and takes the first destination Figure 7. IXP-based Test-bed 5.1. General setup We run the set of experiments in different setups. The first configuration, presented in Figure 6, is a purely host-based, in which the data path from source to destination traverses an intermediate general purpose host node. In the second setup, the data path traverses an IXP NP on its route to destination (see Figure 7). In each case, additional processing is applied to the data at the 6

7 intermediate node. Our host machines are 4 CPU with Intel Xeon processors, each with 2.4 GHz clock speed. The IXP test-bed uses the aforementioned Radisys ENP2611 boards, with eight microens running at 600 MHz each. In order to simulate additional processing on the intermediate nodes in a distributed infrastructure, we run a CPU intensive process on the host machines, the applu application from the CPU2000 benchmark suite. The experiments use data streams generated from a flow of PPM images of varying sizes (which span across multiple Ethernet frames), and transmit these at maximum rates. For the host based experiments, images are sent from first host to second host via UDP. Here, the user level application waits for the entire data to be arrived and then either forwards or grey scales the image and sends it to the third host. In the third host, we either use raw sockets to capture Ethernet frames or receive the data using UDP depending on the performance measurement needed. For latency measurements we use UDP itself in host-3, but for measuring the throughput we need to capture Ethernet frames of these UDP packets and so we use raw sockets in host-3. For the IXP based test bed, the image is packetized into Ethernet frames and is pumped into the LAN through raw sockets. The IXP network processor receives them and either forwards or grey filters the Image before dispatching it to the third host. In host-3, we again use raw socket application to capture the Ethernet frames End-to-End latency PPM Images are fragmented into 1500 Ethernet frames and are put into the LAN using raw sockets API. The first timestamp is recorded when all the Ethernet frames are sent into the LAN from host-1 and the second one when we receive all the frames back on host-3. The difference between them is noted. Again, we repeat the same experiment with the host-1 and host-3 interchanged. This is done in order to remove the discrepancies in the two host machines clocks. We add the difference obtained in both the runs and divide it by 2 to get the actual end-to-end latency between these hosts. The experiment is carried out both on the host based test bed and the IXP based test bed. Tables 1 and 2 show the end-to-end latency for image of different sizes for the host-based and the IXP-based configuration, respectively. The results show that, there is a substantial latency difference between host-based test bed and the IXP-based test bed for images of lesser size and significantly less difference when the image size is big. This is because the kernel/user space overhead and the stack-processing (UDP/IP) overhead are quite significant compared to the processing carried out on the data content itself. Note that these latency measurements are for the image processing application, which requires computationally and data intensive processing at the intermediate node. For tasks with lower resource requirements, such as those implementing PathX services, the latency decreases are more significant for larger data sizes. Image Size Number of Ethernet Frame (1490 each) End-to-end Latency (micro secs) frames frames frames frame 1256 Table 1. End-to-end Latency for Host based Test bed Image Size Number of Ethernet Frame (1490 each) End-toend Latency (micro secs) 41 frames % 11 frames % Percentage decrease in Latency 3 frames % 1 frame % Table 2. End-to-end Latency for IXP based Test-bed 5.3. Throughput The throughput measurements are carried out in with the same setup as above. In the host based approach, Image data is sent over UDP from host-1. host-2 receives the entire image and does grey filtering on it and dispatches the filtered image to host-3. In host-3 we run a raw socket application to capture the Ethernet frames of the 7

8 UDP packets. The timestamp when the first Ethernet frame is received is recorded and then again the timestamp when the last one is received is recorded. Again, we evaluate a PathX forwarding task, and a DataX grey scaling task. The results are presented in Table 3. In both case the IXP-based implementation of the service outperforms the host-one. Simulation measurements indicate that the IXP-based implementation will significantly outperform the host-based implementation under increased network loads, in spite of the disparity of the computational capacity of the host CPUs vs. the IXP microens. IXPbased (Mbps) Host-based (Mbps) Gains (%) DataX PathX Table 3. Through measurements for the hostvs. IXP-based test-bed. Other services evaluated in our work on this and previous IXP-based platforms include (1) filtering of entire application-level messages (e.g., image frames), to meet end-user quality requirements, (2) multicast, (3) image cropping to match the data delivered to the client s current interests, (4) format translation, and others. The results of these experiments indicate that programmable networking devices can provide support for performance improvements in distributed applications with strict bandwidth or latency/jitter requirements. 6. Conclusions Technology advances have created the ability to bring intelligence into the networking infrastructure. This work explores one architecture, which using off-the-shelf programmable communication devices, creates enhanced platforms for flexible support for customized delivery of media streams. We demonstrate experimentally performance improvements attained on these platforms, for a range of services needed in distributed multimedia applications. Our future work will continue to investigate mechanisms for exploiting the programmability of the networking infrastructure and delivering scalable and predictable services to different classes of applications. We are currently considering the deployment of middleware-level functionality onto programmable NPs, generalization of the host-np control mechanisms, and techniques for efficient representation and caching of application-level meta-data regarding data formats and processing actions. Acknowledgment: Ramkumar Gandhapuneni and Prashant Thakare worked on the original implementation of the imaging application. References: 1. A. Gavrilovska, S. Kumar, K. Schwan, The Execution of Event-Action Rules on Programmable Network Processors, Workshop on Operating System and Architectural Support for the On-Demand IT Infrastructure (OASIS 2004), held with ASPLOS-XI, Boston, MA, Oct A. Gavrilovska, K. Schwan, O. Nordstrom, H. Seifu, Network Processors as Building Blocks in Overlay Networks, Hot Interconnects 11, Stanford, CA, Aug IXP Intel Network Processor Family, 4. T. Spalink, S. Karlin, L. Peterson, Y. Gottlieb, Building a Robust Software-Based Router Using Network Processors, Proceedings of 18 th SOSP 2001, Banff, Canada, Y-D. Lin, Y-N. Lin, S-C. Yang, Y-S. Lin, DiffServ over Network Processors: Implementation and Evaluation, Proc. of Hot Interconnects 10, Aug X. Zhuang, W. Shi, I. Paul, K. Schwan, Efficient Implementation of the DWCS Algorithm on High-Speed Programmable Network Processors, Proc. of Multimedia Networks and Systems (MMNS), Oct S. Roy, J. Ankcorn, S. Wee, An Architecture for Componentized, Network-Based Media Services, Proc. of IEEE International Conference on Multimedia and Expo, Jul, M. Wolf, Z. Cai, W. Huang, K. Schwan, SmartPointer: Personalized Scientific Data Portals in Your Hand, Supercomputing 2002, Nov Radisys Corporation. ENP-2611 Data Sheet K. Mackenzie, W. Shi, A. McDonald, I. Ganev, An Intel IXP1200-based Network Interface, Proceedings of the Workshop on Novel Uses of System Area Networks at HPCA (SAN ), Anaheim, CA,

9 11. Path 1 Network Technologies, Professional Digital Video Gateways for the Broadcaster and Multi-Service Operator, Delivered by Path 1 Network Technologies* and Intel Network Processors. White Paper, M.R.Stytz, Distributed virtual environments, Computer Graphics and Applications, IEEE, Volume: 16, Issue: 3, May C. Diot, L. Gautier, A distributed architecture for multiplayer interactive applications on the Internet, Network, IEEE, Volume: 13, Issue: 4, July August

Re-architecting Virtualization in Heterogeneous Multicore Systems

Re-architecting Virtualization in Heterogeneous Multicore Systems Re-architecting Virtualization in Heterogeneous Multicore Systems Himanshu Raj, Sanjay Kumar, Vishakha Gupta, Gregory Diamos, Nawaf Alamoosa, Ada Gavrilovska, Karsten Schwan, Sudhakar Yalamanchili College

More information

Stream Handlers: Application-specific Message Services on Attached Network Processors

Stream Handlers: Application-specific Message Services on Attached Network Processors Stream Handlers: Application-specific Message Services on Attached Network Processors Ada Gavrilovska, Kenneth Mackenzie, Karsten Schwan, and Austen McDonald College of Computing Georgia Institute of Technology

More information

Implementation of Adaptive Buffer in Video Receivers Using Network Processor IXP 2400

Implementation of Adaptive Buffer in Video Receivers Using Network Processor IXP 2400 The International Arab Journal of Information Technology, Vol. 6, No. 3, July 2009 289 Implementation of Adaptive Buffer in Video Receivers Using Network Processor IXP 2400 Kandasamy Anusuya, Karupagouder

More information

Topic & Scope. Content: The course gives

Topic & Scope. Content: The course gives Topic & Scope Content: The course gives an overview of network processor cards (architectures and use) an introduction of how to program Intel IXP network processors some ideas of how to use network processors

More information

Introduction to TCP/IP Offload Engine (TOE)

Introduction to TCP/IP Offload Engine (TOE) Introduction to TCP/IP Offload Engine (TOE) Version 1.0, April 2002 Authored By: Eric Yeh, Hewlett Packard Herman Chao, QLogic Corp. Venu Mannem, Adaptec, Inc. Joe Gervais, Alacritech Bradley Booth, Intel

More information

6.9. Communicating to the Outside World: Cluster Networking

6.9. Communicating to the Outside World: Cluster Networking 6.9 Communicating to the Outside World: Cluster Networking This online section describes the networking hardware and software used to connect the nodes of cluster together. As there are whole books and

More information

On Distributed Communications, Rand Report RM-3420-PR, Paul Baran, August 1964

On Distributed Communications, Rand Report RM-3420-PR, Paul Baran, August 1964 The requirements for a future all-digital-data distributed network which provides common user service for a wide range of users having different requirements is considered. The use of a standard format

More information

HIGH-PERFORMANCE NETWORKING :: USER-LEVEL NETWORKING :: REMOTE DIRECT MEMORY ACCESS

HIGH-PERFORMANCE NETWORKING :: USER-LEVEL NETWORKING :: REMOTE DIRECT MEMORY ACCESS HIGH-PERFORMANCE NETWORKING :: USER-LEVEL NETWORKING :: REMOTE DIRECT MEMORY ACCESS CS6410 Moontae Lee (Nov 20, 2014) Part 1 Overview 00 Background User-level Networking (U-Net) Remote Direct Memory Access

More information

High bandwidth, Long distance. Where is my throughput? Robin Tasker CCLRC, Daresbury Laboratory, UK

High bandwidth, Long distance. Where is my throughput? Robin Tasker CCLRC, Daresbury Laboratory, UK High bandwidth, Long distance. Where is my throughput? Robin Tasker CCLRC, Daresbury Laboratory, UK [r.tasker@dl.ac.uk] DataTAG is a project sponsored by the European Commission - EU Grant IST-2001-32459

More information

Netronome NFP: Theory of Operation

Netronome NFP: Theory of Operation WHITE PAPER Netronome NFP: Theory of Operation TO ACHIEVE PERFORMANCE GOALS, A MULTI-CORE PROCESSOR NEEDS AN EFFICIENT DATA MOVEMENT ARCHITECTURE. CONTENTS 1. INTRODUCTION...1 2. ARCHITECTURE OVERVIEW...2

More information

TOWARDS IQ-APPLIANCES: QUALITY-AWARENESS IN INFORMATION VIRTUALIZATION

TOWARDS IQ-APPLIANCES: QUALITY-AWARENESS IN INFORMATION VIRTUALIZATION TOWARDS IQ-APPLIANCES: QUALITY-AWARENESS IN INFORMATION VIRTUALIZATION A Thesis Presented to The Academic Faculty by Radhika Niranjan Mysore In Partial Fulfillment of the Requirements for the Degree Master

More information

440GX Application Note

440GX Application Note Overview of TCP/IP Acceleration Hardware January 22, 2008 Introduction Modern interconnect technology offers Gigabit/second (Gb/s) speed that has shifted the bottleneck in communication from the physical

More information

A Content Delivery Accelerator in Data-Intensive Servers

A Content Delivery Accelerator in Data-Intensive Servers A Content Delivery Accelerator in Data-Intensive Servers Joon-Woo Cho, Hyun-Jin Choi, Seung-Ho Lim and Kyu-Ho Park Computer Engineering Research Laboratory, EECS Korea Advanced Institute of Science and

More information

Ruler: High-Speed Packet Matching and Rewriting on Network Processors

Ruler: High-Speed Packet Matching and Rewriting on Network Processors Ruler: High-Speed Packet Matching and Rewriting on Network Processors Tomáš Hrubý Kees van Reeuwijk Herbert Bos Vrije Universiteit, Amsterdam World45 Ltd. ANCS 2007 Tomáš Hrubý (VU Amsterdam, World45)

More information

A Next Generation Home Access Point and Router

A Next Generation Home Access Point and Router A Next Generation Home Access Point and Router Product Marketing Manager Network Communication Technology and Application of the New Generation Points of Discussion Why Do We Need a Next Gen Home Router?

More information

Design and Implementation of A P2P Cooperative Proxy Cache System

Design and Implementation of A P2P Cooperative Proxy Cache System Design and Implementation of A PP Cooperative Proxy Cache System James Z. Wang Vipul Bhulawala Department of Computer Science Clemson University, Box 40974 Clemson, SC 94-0974, USA +1-84--778 {jzwang,

More information

Improving DPDK Performance

Improving DPDK Performance Improving DPDK Performance Data Plane Development Kit (DPDK) was pioneered by Intel as a way to boost the speed of packet API with standard hardware. DPDK-enabled applications typically show four or more

More information

DESIGN AND IMPLEMENTATION OF AN AVIONICS FULL DUPLEX ETHERNET (A664) DATA ACQUISITION SYSTEM

DESIGN AND IMPLEMENTATION OF AN AVIONICS FULL DUPLEX ETHERNET (A664) DATA ACQUISITION SYSTEM DESIGN AND IMPLEMENTATION OF AN AVIONICS FULL DUPLEX ETHERNET (A664) DATA ACQUISITION SYSTEM Alberto Perez, Technical Manager, Test & Integration John Hildin, Director of Network s John Roach, Vice President

More information

Performance Evaluation of Myrinet-based Network Router

Performance Evaluation of Myrinet-based Network Router Performance Evaluation of Myrinet-based Network Router Information and Communications University 2001. 1. 16 Chansu Yu, Younghee Lee, Ben Lee Contents Suez : Cluster-based Router Suez Implementation Implementation

More information

Fast packet processing in the cloud. Dániel Géhberger Ericsson Research

Fast packet processing in the cloud. Dániel Géhberger Ericsson Research Fast packet processing in the cloud Dániel Géhberger Ericsson Research Outline Motivation Service chains Hardware related topics, acceleration Virtualization basics Software performance and acceleration

More information

Utilizing Linux Kernel Components in K42 K42 Team modified October 2001

Utilizing Linux Kernel Components in K42 K42 Team modified October 2001 K42 Team modified October 2001 This paper discusses how K42 uses Linux-kernel components to support a wide range of hardware, a full-featured TCP/IP stack and Linux file-systems. An examination of the

More information

Chapter 5.6 Network and Multiplayer

Chapter 5.6 Network and Multiplayer Chapter 5.6 Network and Multiplayer Multiplayer Modes: Event Timing Turn-Based Easy to implement Any connection type Real-Time Difficult to implement Latency sensitive 2 Multiplayer Modes: Shared I/O Input

More information

Multimedia Streaming. Mike Zink

Multimedia Streaming. Mike Zink Multimedia Streaming Mike Zink Technical Challenges Servers (and proxy caches) storage continuous media streams, e.g.: 4000 movies * 90 minutes * 10 Mbps (DVD) = 27.0 TB 15 Mbps = 40.5 TB 36 Mbps (BluRay)=

More information

6WINDGate. White Paper. Packet Processing Software for Wireless Infrastructure

6WINDGate. White Paper. Packet Processing Software for Wireless Infrastructure Packet Processing Software for Wireless Infrastructure Last Update: v1.0 - January 2011 Performance Challenges for Wireless Networks As advanced services proliferate and video consumes an ever-increasing

More information

Netchannel 2: Optimizing Network Performance

Netchannel 2: Optimizing Network Performance Netchannel 2: Optimizing Network Performance J. Renato Santos +, G. (John) Janakiraman + Yoshio Turner +, Ian Pratt * + HP Labs - * XenSource/Citrix Xen Summit Nov 14-16, 2007 2003 Hewlett-Packard Development

More information

Supra-linear Packet Processing Performance with Intel Multi-core Processors

Supra-linear Packet Processing Performance with Intel Multi-core Processors White Paper Dual-Core Intel Xeon Processor LV 2.0 GHz Communications and Networking Applications Supra-linear Packet Processing Performance with Intel Multi-core Processors 1 Executive Summary Advances

More information

Case Study: Using System Tracing to Improve Packet Forwarding Performance

Case Study: Using System Tracing to Improve Packet Forwarding Performance Case Study: Using System Tracing to Improve Packet Forwarding Performance Sebastien Marineau-Mes, Senior Networking Architect, sebastien@qnx.com Abstract Symmetric multiprocessing (SMP) can offer enormous

More information

Advanced Computer Networks. End Host Optimization

Advanced Computer Networks. End Host Optimization Oriana Riva, Department of Computer Science ETH Zürich 263 3501 00 End Host Optimization Patrick Stuedi Spring Semester 2017 1 Today End-host optimizations: NUMA-aware networking Kernel-bypass Remote Direct

More information

A Simulation: Improving Throughput and Reducing PCI Bus Traffic by. Caching Server Requests using a Network Processor with Memory

A Simulation: Improving Throughput and Reducing PCI Bus Traffic by. Caching Server Requests using a Network Processor with Memory Shawn Koch Mark Doughty ELEC 525 4/23/02 A Simulation: Improving Throughput and Reducing PCI Bus Traffic by Caching Server Requests using a Network Processor with Memory 1 Motivation and Concept The goal

More information

Securing Grid Data Transfer Services with Active Network Portals

Securing Grid Data Transfer Services with Active Network Portals Securing Grid Data Transfer Services with Active Network Portals Onur Demir 1 2 Kanad Ghose 3 Madhusudhan Govindaraju 4 Department of Computer Science Binghamton University (SUNY) {onur 1, mike 2, ghose

More information

Optimizing TCP Receive Performance

Optimizing TCP Receive Performance Optimizing TCP Receive Performance Aravind Menon and Willy Zwaenepoel School of Computer and Communication Sciences EPFL Abstract The performance of receive side TCP processing has traditionally been dominated

More information

Measurement-based Analysis of TCP/IP Processing Requirements

Measurement-based Analysis of TCP/IP Processing Requirements Measurement-based Analysis of TCP/IP Processing Requirements Srihari Makineni Ravi Iyer Communications Technology Lab Intel Corporation {srihari.makineni, ravishankar.iyer}@intel.com Abstract With the

More information

Hardware Assisted Recursive Packet Classification Module for IPv6 etworks ABSTRACT

Hardware Assisted Recursive Packet Classification Module for IPv6 etworks ABSTRACT Hardware Assisted Recursive Packet Classification Module for IPv6 etworks Shivvasangari Subramani [shivva1@umbc.edu] Department of Computer Science and Electrical Engineering University of Maryland Baltimore

More information

Implementation and Analysis of Large Receive Offload in a Virtualized System

Implementation and Analysis of Large Receive Offload in a Virtualized System Implementation and Analysis of Large Receive Offload in a Virtualized System Takayuki Hatori and Hitoshi Oi The University of Aizu, Aizu Wakamatsu, JAPAN {s1110173,hitoshi}@u-aizu.ac.jp Abstract System

More information

Frank Miller, George Apostolopoulos, and Satish Tripathi. University of Maryland. College Park, MD ffwmiller, georgeap,

Frank Miller, George Apostolopoulos, and Satish Tripathi. University of Maryland. College Park, MD ffwmiller, georgeap, Simple Input/Output Streaming in the Operating System Frank Miller, George Apostolopoulos, and Satish Tripathi Mobile Computing and Multimedia Laboratory Department of Computer Science University of Maryland

More information

NetFPGA Hardware Architecture

NetFPGA Hardware Architecture NetFPGA Hardware Architecture Jeffrey Shafer Some slides adapted from Stanford NetFPGA tutorials NetFPGA http://netfpga.org 2 NetFPGA Components Virtex-II Pro 5 FPGA 53,136 logic cells 4,176 Kbit block

More information

Introduction to OpenOnload Building Application Transparency and Protocol Conformance into Application Acceleration Middleware

Introduction to OpenOnload Building Application Transparency and Protocol Conformance into Application Acceleration Middleware White Paper Introduction to OpenOnload Building Application Transparency and Protocol Conformance into Application Acceleration Middleware Steve Pope, PhD Chief Technical Officer Solarflare Communications

More information

Shared Address Space I/O: A Novel I/O Approach for System-on-a-Chip Networking

Shared Address Space I/O: A Novel I/O Approach for System-on-a-Chip Networking Shared Address Space I/O: A Novel I/O Approach for System-on-a-Chip Networking Di-Shi Sun and Douglas M. Blough School of Electrical and Computer Engineering Georgia Institute of Technology Atlanta, GA

More information

ALCATEL Edge Services Router

ALCATEL Edge Services Router ALCATEL 7420 Edge Services Router Alcatel builds next generation networks, delivering integrated end-to-end voice and data networking solutions to established and new carriers, as well as enterprises and

More information

An Intelligent NIC Design Xin Song

An Intelligent NIC Design Xin Song 2nd International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2016) An Intelligent NIC Design Xin Song School of Electronic and Information Engineering Tianjin Vocational

More information

Network Processors Outline

Network Processors Outline High-Performance Networking The University of Kansas EECS 881 James P.G. Sterbenz Department of Electrical Engineering & Computer Science Information Technology & Telecommunications Research Center The

More information

A Scalable High-Performance Active Network Node

A Scalable High-Performance Active Network Node A Scalable High-Performance Active Network Node D. Decasper, B. Plattner ETH Zurich G. Parulkar, S. Choi, J. DeHart, T. Wolf Washington U Presented by Jacky Chu Motivation Apply Active Network over gigabits

More information

ANIC Host CPU Offload Features Overview An Overview of Features and Functions Available with ANIC Adapters

ANIC Host CPU Offload Features Overview An Overview of Features and Functions Available with ANIC Adapters ANIC Host CPU Offload Features Overview An Overview of Features and Functions Available with ANIC Adapters ANIC Adapters Accolade s ANIC line of FPGA-based adapters/nics help accelerate security and networking

More information

FPGA Augmented ASICs: The Time Has Come

FPGA Augmented ASICs: The Time Has Come FPGA Augmented ASICs: The Time Has Come David Riddoch Steve Pope Copyright 2012 Solarflare Communications, Inc. All Rights Reserved. Hardware acceleration is Niche (With the obvious exception of graphics

More information

RealMedia Streaming Performance on an IEEE b Wireless LAN

RealMedia Streaming Performance on an IEEE b Wireless LAN RealMedia Streaming Performance on an IEEE 802.11b Wireless LAN T. Huang and C. Williamson Proceedings of IASTED Wireless and Optical Communications (WOC) Conference Banff, AB, Canada, July 2002 Presented

More information

1/5/2012. Overview of Interconnects. Presentation Outline. Myrinet and Quadrics. Interconnects. Switch-Based Interconnects

1/5/2012. Overview of Interconnects. Presentation Outline. Myrinet and Quadrics. Interconnects. Switch-Based Interconnects Overview of Interconnects Myrinet and Quadrics Leading Modern Interconnects Presentation Outline General Concepts of Interconnects Myrinet Latest Products Quadrics Latest Release Our Research Interconnects

More information

A-GEAR 10Gigabit Ethernet Server Adapter X520 2xSFP+

A-GEAR 10Gigabit Ethernet Server Adapter X520 2xSFP+ Product Specification NIC-10G-2BF A-GEAR 10Gigabit Ethernet Server Adapter X520 2xSFP+ Apply Dual-port 10 Gigabit Fiber SFP+ server connections, These Server Adapters Provide Ultimate Flexibility and Scalability

More information

LUSTRE NETWORKING High-Performance Features and Flexible Support for a Wide Array of Networks White Paper November Abstract

LUSTRE NETWORKING High-Performance Features and Flexible Support for a Wide Array of Networks White Paper November Abstract LUSTRE NETWORKING High-Performance Features and Flexible Support for a Wide Array of Networks White Paper November 2008 Abstract This paper provides information about Lustre networking that can be used

More information

Whitepaper / Benchmark

Whitepaper / Benchmark Whitepaper / Benchmark Web applications on LAMP run up to 8X faster with Dolphin Express DOLPHIN DELIVERS UNPRECEDENTED PERFORMANCE TO THE LAMP-STACK MARKET Marianne Ronström Open Source Consultant iclaustron

More information

Evaluating the Impact of RDMA on Storage I/O over InfiniBand

Evaluating the Impact of RDMA on Storage I/O over InfiniBand Evaluating the Impact of RDMA on Storage I/O over InfiniBand J Liu, DK Panda and M Banikazemi Computer and Information Science IBM T J Watson Research Center The Ohio State University Presentation Outline

More information

Intra-MIC MPI Communication using MVAPICH2: Early Experience

Intra-MIC MPI Communication using MVAPICH2: Early Experience Intra-MIC MPI Communication using MVAPICH: Early Experience Sreeram Potluri, Karen Tomko, Devendar Bureddy, and Dhabaleswar K. Panda Department of Computer Science and Engineering Ohio State University

More information

An FPGA-Based Optical IOH Architecture for Embedded System

An FPGA-Based Optical IOH Architecture for Embedded System An FPGA-Based Optical IOH Architecture for Embedded System Saravana.S Assistant Professor, Bharath University, Chennai 600073, India Abstract Data traffic has tremendously increased and is still increasing

More information

Fundamental Questions to Answer About Computer Networking, Jan 2009 Prof. Ying-Dar Lin,

Fundamental Questions to Answer About Computer Networking, Jan 2009 Prof. Ying-Dar Lin, Fundamental Questions to Answer About Computer Networking, Jan 2009 Prof. Ying-Dar Lin, ydlin@cs.nctu.edu.tw Chapter 1: Introduction 1. How does Internet scale to billions of hosts? (Describe what structure

More information

Cisco Unified Computing System Delivering on Cisco's Unified Computing Vision

Cisco Unified Computing System Delivering on Cisco's Unified Computing Vision Cisco Unified Computing System Delivering on Cisco's Unified Computing Vision At-A-Glance Unified Computing Realized Today, IT organizations assemble their data center environments from individual components.

More information

Data Path acceleration techniques in a NFV world

Data Path acceleration techniques in a NFV world Data Path acceleration techniques in a NFV world Mohanraj Venkatachalam, Purnendu Ghosh Abstract NFV is a revolutionary approach offering greater flexibility and scalability in the deployment of virtual

More information

Reducing CPU and network overhead for small I/O requests in network storage protocols over raw Ethernet

Reducing CPU and network overhead for small I/O requests in network storage protocols over raw Ethernet Reducing CPU and network overhead for small I/O requests in network storage protocols over raw Ethernet Pilar González-Férez and Angelos Bilas 31 th International Conference on Massive Storage Systems

More information

A Multimedia Streaming Server/Client Framework for DM64x

A Multimedia Streaming Server/Client Framework for DM64x SEE THEFUTURE. CREATE YOUR OWN. A Multimedia Streaming Server/Client Framework for DM64x Bhavani GK Senior Engineer Ittiam Systems Pvt Ltd bhavani.gk@ittiam.com Agenda Overview of Streaming Application

More information

1. INTRODUCTION light tree First Generation Second Generation Third Generation

1. INTRODUCTION light tree First Generation Second Generation Third Generation 1. INTRODUCTION Today, there is a general consensus that, in the near future, wide area networks (WAN)(such as, a nation wide backbone network) will be based on Wavelength Division Multiplexed (WDM) optical

More information

A Convergence Architecture for GRID Computing and Programmable Networks

A Convergence Architecture for GRID Computing and Programmable Networks A Convergence Architecture for GRID Computing and Programmable Networks Christian Bachmeir, Peter Tabery, Dimitar Marinov, Georgi Nachev, and Jörg Eberspächer Munich University of Technology, Institute

More information

On Network CoProcessors for Scalable, Predictable Media Services

On Network CoProcessors for Scalable, Predictable Media Services IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 14, NO. 7, JULY 2003 655 On Network CoProcessors for Scalable, Predictable Media Services Raj Krishnamurthy, Student Member, IEEE, Karsten Schwan,

More information

Road Map. Road Map. Motivation (Cont.) Motivation. Intel IXA 2400 NP Architecture. Performance of Embedded System Application on Network Processor

Road Map. Road Map. Motivation (Cont.) Motivation. Intel IXA 2400 NP Architecture. Performance of Embedded System Application on Network Processor Performance of Embedded System Application on Network Processor 2006 Spring Directed Study Project Danhua Guo University of California, Riverside dguo@cs.ucr.edu 06-07 07-2006 Motivation NP Overview Programmability

More information

RDMA-like VirtIO Network Device for Palacios Virtual Machines

RDMA-like VirtIO Network Device for Palacios Virtual Machines RDMA-like VirtIO Network Device for Palacios Virtual Machines Kevin Pedretti UNM ID: 101511969 CS-591 Special Topics in Virtualization May 10, 2012 Abstract This project developed an RDMA-like VirtIO network

More information

Network Processors. Douglas Comer. Computer Science Department Purdue University 250 N. University Street West Lafayette, IN

Network Processors. Douglas Comer. Computer Science Department Purdue University 250 N. University Street West Lafayette, IN Network Processors Douglas Comer Computer Science Department Purdue University 250 N. University Street West Lafayette, IN 47907-2066 http://www.cs.purdue.edu/people/comer Copyright 2003. All rights reserved.

More information

NETWORK SIMULATION USING NCTUns. Ankit Verma* Shashi Singh* Meenakshi Vyas*

NETWORK SIMULATION USING NCTUns. Ankit Verma* Shashi Singh* Meenakshi Vyas* NETWORK SIMULATION USING NCTUns Ankit Verma* Shashi Singh* Meenakshi Vyas* 1. Introduction: Network simulator is software which is very helpful tool to develop, test, and diagnose any network protocol.

More information

Figure 1: Ad-Hoc routing protocols.

Figure 1: Ad-Hoc routing protocols. Performance Analysis of Routing Protocols for Wireless Ad-Hoc Networks Sukhchandan Lally and Ljiljana Trajković Simon Fraser University Vancouver, British Columbia Canada E-mail: {lally, ljilja}@sfu.ca

More information

PacketShader: A GPU-Accelerated Software Router

PacketShader: A GPU-Accelerated Software Router PacketShader: A GPU-Accelerated Software Router Sangjin Han In collaboration with: Keon Jang, KyoungSoo Park, Sue Moon Advanced Networking Lab, CS, KAIST Networked and Distributed Computing Systems Lab,

More information

An Extensible Message-Oriented Offload Model for High-Performance Applications

An Extensible Message-Oriented Offload Model for High-Performance Applications An Extensible Message-Oriented Offload Model for High-Performance Applications Patricia Gilfeather and Arthur B. Maccabe Scalable Systems Lab Department of Computer Science University of New Mexico pfeather@cs.unm.edu,

More information

LegUp: Accelerating Memcached on Cloud FPGAs

LegUp: Accelerating Memcached on Cloud FPGAs 0 LegUp: Accelerating Memcached on Cloud FPGAs Xilinx Developer Forum December 10, 2018 Andrew Canis & Ruolong Lian LegUp Computing Inc. 1 COMPUTE IS BECOMING SPECIALIZED 1 GPU Nvidia graphics cards are

More information

A 400Gbps Multi-Core Network Processor

A 400Gbps Multi-Core Network Processor A 400Gbps Multi-Core Network Processor James Markevitch, Srinivasa Malladi Cisco Systems August 22, 2017 Legal THE INFORMATION HEREIN IS PROVIDED ON AN AS IS BASIS, WITHOUT ANY WARRANTIES OR REPRESENTATIONS,

More information

LM1000STXR4 Gigabit Ethernet Load Module

LM1000STXR4 Gigabit Ethernet Load Module Gigabit Ethernet Load Module Gigabit Ethernet Load Module Ixia's Gigabit Ethernet Load Modules offer complete Layer 2-3 network and routing/bridging protocol testing functionality in a single platform.

More information

Video capture using GigE Vision with MIL. What is GigE Vision

Video capture using GigE Vision with MIL. What is GigE Vision What is GigE Vision GigE Vision is fundamentally a standard for transmitting video from a camera (see Figure 1) or similar device over Ethernet and is primarily intended for industrial imaging applications.

More information

Reliable Stream Analysis on the Internet of Things

Reliable Stream Analysis on the Internet of Things Reliable Stream Analysis on the Internet of Things ECE6102 Course Project Team IoT Submitted April 30, 2014 1 1. Introduction Team IoT is interested in developing a distributed system that supports live

More information

Towards Effective Packet Classification. J. Li, Y. Qi, and B. Xu Network Security Lab RIIT, Tsinghua University Dec, 2005

Towards Effective Packet Classification. J. Li, Y. Qi, and B. Xu Network Security Lab RIIT, Tsinghua University Dec, 2005 Towards Effective Packet Classification J. Li, Y. Qi, and B. Xu Network Security Lab RIIT, Tsinghua University Dec, 2005 Outline Algorithm Study Understanding Packet Classification Worst-case Complexity

More information

CS 640: Introduction to Computer Networks

CS 640: Introduction to Computer Networks CS 640: Introduction to Computer Networks Midterm I 10/19/2006 Allotted time: 11:00AM to 12:30 PM (90 minutes) Name: Answers in bold italic font UW -ID Number: 1. There are 6 questions in this mid-term.

More information

Message Passing Architecture in Intra-Cluster Communication

Message Passing Architecture in Intra-Cluster Communication CS213 Message Passing Architecture in Intra-Cluster Communication Xiao Zhang Lamxi Bhuyan @cs.ucr.edu February 8, 2004 UC Riverside Slide 1 CS213 Outline 1 Kernel-based Message Passing

More information

Operating Systems. 16. Networking. Paul Krzyzanowski. Rutgers University. Spring /6/ Paul Krzyzanowski

Operating Systems. 16. Networking. Paul Krzyzanowski. Rutgers University. Spring /6/ Paul Krzyzanowski Operating Systems 16. Networking Paul Krzyzanowski Rutgers University Spring 2015 1 Local Area Network (LAN) LAN = communications network Small area (building, set of buildings) Same, sometimes shared,

More information

Growth. Individual departments in a university buy LANs for their own machines and eventually want to interconnect with other campus LANs.

Growth. Individual departments in a university buy LANs for their own machines and eventually want to interconnect with other campus LANs. Internetworking Multiple networks are a fact of life: Growth. Individual departments in a university buy LANs for their own machines and eventually want to interconnect with other campus LANs. Fault isolation,

More information

MA5400 IP Video Gateway. Introduction. Summary of Features

MA5400 IP Video Gateway. Introduction. Summary of Features MA5400 IP Video Gateway Introduction The MA5400 IP Video Gateway bridges the gap between the MPEG-2 and IP domains providing an innovative solution to the need to transport real-time broadcast quality

More information

Adaptive Video Acceleration. White Paper. 1 P a g e

Adaptive Video Acceleration. White Paper. 1 P a g e Adaptive Video Acceleration White Paper 1 P a g e Version 1.0 Veronique Phan Dir. Technical Sales July 16 th 2014 2 P a g e 1. Preface Giraffic is the enabler of Next Generation Internet TV broadcast technology

More information

Impact of End-to-end QoS Connectivity on the Performance of Remote Wireless Local Networks

Impact of End-to-end QoS Connectivity on the Performance of Remote Wireless Local Networks Impact of End-to-end QoS Connectivity on the Performance of Remote Wireless Local Networks Veselin Rakocevic School of Engineering and Mathematical Sciences City University London EC1V HB, UK V.Rakocevic@city.ac.uk

More information

OpenFlow Software Switch & Intel DPDK. performance analysis

OpenFlow Software Switch & Intel DPDK. performance analysis OpenFlow Software Switch & Intel DPDK performance analysis Agenda Background Intel DPDK OpenFlow 1.3 implementation sketch Prototype design and setup Results Future work, optimization ideas OF 1.3 prototype

More information

Intel PRO/1000 PT and PF Quad Port Bypass Server Adapters for In-line Server Appliances

Intel PRO/1000 PT and PF Quad Port Bypass Server Adapters for In-line Server Appliances Technology Brief Intel PRO/1000 PT and PF Quad Port Bypass Server Adapters for In-line Server Appliances Intel PRO/1000 PT and PF Quad Port Bypass Server Adapters for In-line Server Appliances The world

More information

Enabling Efficient and Scalable Zero-Trust Security

Enabling Efficient and Scalable Zero-Trust Security WHITE PAPER Enabling Efficient and Scalable Zero-Trust Security FOR CLOUD DATA CENTERS WITH AGILIO SMARTNICS THE NEED FOR ZERO-TRUST SECURITY The rapid evolution of cloud-based data centers to support

More information

Networking in a Vertically Scaled World

Networking in a Vertically Scaled World Networking in a Vertically Scaled World David S. Miller Red Hat Inc. LinuxTAG, Berlin, 2008 OUTLINE NETWORK PRINCIPLES MICROPROCESSOR HISTORY IMPLICATIONS FOR NETWORKING LINUX KERNEL HORIZONTAL NETWORK

More information

Implementing RapidIO. Travis Scheckel and Sandeep Kumar. Communications Infrastructure Group, Texas Instruments

Implementing RapidIO. Travis Scheckel and Sandeep Kumar. Communications Infrastructure Group, Texas Instruments White Paper Implementing RapidIO Travis Scheckel and Sandeep Kumar Communications Infrastructure Group, Texas Instruments In today s telecommunications market, slow and proprietary is not the direction

More information

Linux Storage System Analysis for e.mmc With Command Queuing

Linux Storage System Analysis for e.mmc With Command Queuing Linux Storage System Analysis for e.mmc With Command Queuing Linux is a widely used embedded OS that also manages block devices such as e.mmc, UFS and SSD. Traditionally, advanced embedded systems have

More information

19: Networking. Networking Hardware. Mark Handley

19: Networking. Networking Hardware. Mark Handley 19: Networking Mark Handley Networking Hardware Lots of different hardware: Modem byte at a time, FDDI, SONET packet at a time ATM (including some DSL) 53-byte cell at a time Reality is that most networking

More information

Gateware Defined Networking (GDN) for Ultra Low Latency Trading and Compliance

Gateware Defined Networking (GDN) for Ultra Low Latency Trading and Compliance Gateware Defined Networking (GDN) for Ultra Low Latency Trading and Compliance STAC Summit: Panel: FPGA for trading today: December 2015 John W. Lockwood, PhD, CEO Algo-Logic Systems, Inc. JWLockwd@algo-logic.com

More information

Hierarchically Aggregated Fair Queueing (HAFQ) for Per-flow Fair Bandwidth Allocation in High Speed Networks

Hierarchically Aggregated Fair Queueing (HAFQ) for Per-flow Fair Bandwidth Allocation in High Speed Networks Hierarchically Aggregated Fair Queueing () for Per-flow Fair Bandwidth Allocation in High Speed Networks Ichinoshin Maki, Hideyuki Shimonishi, Tutomu Murase, Masayuki Murata, Hideo Miyahara Graduate School

More information

OPEN COMPUTE PLATFORMS POWER SOFTWARE-DRIVEN PACKET FLOW VISIBILITY, PART 2 EXECUTIVE SUMMARY. Key Takeaways

OPEN COMPUTE PLATFORMS POWER SOFTWARE-DRIVEN PACKET FLOW VISIBILITY, PART 2 EXECUTIVE SUMMARY. Key Takeaways OPEN COMPUTE PLATFORMS POWER SOFTWARE-DRIVEN PACKET FLOW VISIBILITY, PART 2 EXECUTIVE SUMMARY This is the second of two white papers that describe how the shift from monolithic, purpose-built, network

More information

Introduction to Ethernet Latency

Introduction to Ethernet Latency Introduction to Ethernet Latency An Explanation of Latency and Latency Measurement The primary difference in the various methods of latency measurement is the point in the software stack at which the latency

More information

P51: High Performance Networking

P51: High Performance Networking P51: High Performance Networking Lecture 6: Programmable network devices Dr Noa Zilberman noa.zilberman@cl.cam.ac.uk Lent 2017/18 High Throughput Interfaces Performance Limitations So far we discussed

More information

Reduces latency and buffer overhead. Messaging occurs at a speed close to the processors being directly connected. Less error detection

Reduces latency and buffer overhead. Messaging occurs at a speed close to the processors being directly connected. Less error detection Switching Operational modes: Store-and-forward: Each switch receives an entire packet before it forwards it onto the next switch - useful in a general purpose network (I.e. a LAN). usually, there is a

More information

OSI Layer OSI Name Units Implementation Description 7 Application Data PCs Network services such as file, print,

OSI Layer OSI Name Units Implementation Description 7 Application Data PCs Network services such as file, print, ANNEX B - Communications Protocol Overheads The OSI Model is a conceptual model that standardizes the functions of a telecommunication or computing system without regard of their underlying internal structure

More information

HPCC Random Access Benchmark Excels on Data Vortex

HPCC Random Access Benchmark Excels on Data Vortex HPCC Random Access Benchmark Excels on Data Vortex Version 1.1 * June 7 2016 Abstract The Random Access 1 benchmark, as defined by the High Performance Computing Challenge (HPCC), tests how frequently

More information

Network Processors and their memory

Network Processors and their memory Network Processors and their memory Network Processor Workshop, Madrid 2004 Nick McKeown Departments of Electrical Engineering and Computer Science, Stanford University nickm@stanford.edu http://www.stanford.edu/~nickm

More information

Remote Health Monitoring for an Embedded System

Remote Health Monitoring for an Embedded System July 20, 2012 Remote Health Monitoring for an Embedded System Authors: Puneet Gupta, Kundan Kumar, Vishnu H Prasad 1/22/2014 2 Outline Background Background & Scope Requirements Key Challenges Introduction

More information

EqualLogic Storage and Non-Stacking Switches. Sizing and Configuration

EqualLogic Storage and Non-Stacking Switches. Sizing and Configuration EqualLogic Storage and Non-Stacking Switches Sizing and Configuration THIS WHITE PAPER IS FOR INFORMATIONAL PURPOSES ONLY, AND MAY CONTAIN TYPOGRAPHICAL ERRORS AND TECHNICAL INACCURACIES. THE CONTENT IS

More information

A Low Latency Solution Stack for High Frequency Trading. High-Frequency Trading. Solution. White Paper

A Low Latency Solution Stack for High Frequency Trading. High-Frequency Trading. Solution. White Paper A Low Latency Solution Stack for High Frequency Trading White Paper High-Frequency Trading High-frequency trading has gained a strong foothold in financial markets, driven by several factors including

More information

TCP/IP THE TCP/IP ARCHITECTURE

TCP/IP THE TCP/IP ARCHITECTURE TCP/IP-1 The Internet Protocol (IP) enables communications across a vast and heterogeneous collection of networks that are based on different technologies. Any host computer that is connected to the Internet

More information