TCP offload engines for high-speed data processing

TCP/IP over Ethernet has become the dominant packet processing protocol. Ethernet networks now run at ever higher speeds with the development of 1, 10 and 40Gbps technologies, and TCP/IP processing has to scale to match. That means a tremendous amount of protocol logic must be executed, which can consume a great deal of CPU time and memory depending on the system architecture and bus speeds. The faster the computer and its data buses, the more seamless the information transfer. There are many ways of combining the hardware interface with the processing-intensive OSI (Open Systems Interconnection) TCP/IP protocol software. Read on to see some popular options and how they may be employed.

A logical first step is to use the CPU to drive the OSI stack. In a single-core computer, the logic that executes the TCP/IP stack sits in an executable portion of that computer. Merely sitting there occupies some memory, and driving the stack consumes a lot of CPU resources. The faster the stack needs to execute, the less time is left to run the other applications on that computer.

When a multi-core CPU is used, one of the simplest things to do is move the OSI TCP/IP stack software onto a core all by itself. The dedicated core then executes the communications protocol on its own, leaving the remaining cores free for any other work that needs to be done at that node.

Another option is an enhanced-performance OSI architecture. In the OSI model, the Physical Layer cannot be changed, and the Data Link Layer must be present in some form to resolve station-to-station addressing. However, by streamlining the communication between the MAC Layer and the Application Layer, essentially consolidating the logic that normally executes at the Presentation, Session, Transport and Network Layers, a tremendous reduction in code execution can be achieved. This shrinks the stack logic, the load on the CPU and the associated memory requirements, which means faster execution. The cost of this type of setup is sacrificing some of the protocols and features that would otherwise be supported in layers 3-6.
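Of the software-only options above, the dedicated-core approach is easy to approximate on a general-purpose OS by confining the thread that performs all socket I/O to one core, leaving the remaining cores for application work. The following is a minimal sketch, assuming Linux and glibc; the core number and port are illustrative, not taken from the article.

    /*
     * Sketch: run a TCP receive loop on a thread pinned to one core
     * (NET_CORE), so the other cores remain free for application work.
     * Assumes Linux/glibc; core number and port are illustrative.
     */
    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <unistd.h>

    #define NET_CORE    1        /* hypothetical core reserved for networking */
    #define LISTEN_PORT 5001     /* hypothetical data port                    */

    static void *net_thread(void *arg)
    {
        (void)arg;
        int lfd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr = { .sin_family = AF_INET,
                                    .sin_port   = htons(LISTEN_PORT),
                                    .sin_addr   = { htonl(INADDR_ANY) } };
        bind(lfd, (struct sockaddr *)&addr, sizeof(addr));
        listen(lfd, 1);

        int cfd = accept(lfd, NULL, NULL);
        char buf[64 * 1024];
        while (recv(cfd, buf, sizeof(buf), 0) > 0)
            ;                    /* consume data; hand off to worker cores here */
        close(cfd);
        close(lfd);
        return NULL;
    }

    int main(void)
    {
        pthread_t tid;
        pthread_create(&tid, NULL, net_thread, NULL);

        /* Restrict the networking thread to NET_CORE only. */
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(NET_CORE, &set);
        pthread_setaffinity_np(tid, sizeof(set), &set);

        pthread_join(tid, NULL);
        return 0;
    }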

The silicon stack

Using a silicon stack to offload work is another way of increasing performance. A silicon stack is essentially an auxiliary processor whose sole purpose is to process communications. Silicon stacks provide additional capabilities such as IPv4/IPv6, iWARP RDMA, iSCSI, FCoE, TCP DDP (direct data placement) and full TCP offload. The elegance of the silicon stack is that the entire OSI TCP/IP stack can be implemented without impacting the performance of the application logic.

Silicon stack vs software stack performance

Examining throughput per port, the software stack is limited to roughly 40MB/s, whereas the silicon stack can sustain 250MB/s on 1GbE and 2,500MB/s on 10GbE.
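The offload capabilities listed above are exposed in different ways: full TCP offload relies on vendor drivers, while the stateless offloads an adapter advertises (checksum, scatter-gather, TCP segmentation and so on) can be queried on a Linux host through the standard ethtool ioctl. A minimal sketch of such a query follows; the interface name eth0 is a placeholder.

    /*
     * Sketch: query which stateless offloads a NIC/driver advertises via the
     * ethtool ioctl. Assumes Linux; "eth0" is a placeholder interface name.
     * Full TCP offload (TOE) is not exposed through this interface.
     */
    #include <linux/ethtool.h>
    #include <linux/sockios.h>
    #include <net/if.h>
    #include <sys/ioctl.h>
    #include <sys/socket.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    static void show_offload(int fd, const char *ifname, __u32 cmd, const char *label)
    {
        struct ethtool_value val = { .cmd = cmd };
        struct ifreq ifr;

        memset(&ifr, 0, sizeof(ifr));
        strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
        ifr.ifr_data = (void *)&val;

        if (ioctl(fd, SIOCETHTOOL, &ifr) == 0)
            printf("%-24s %s\n", label, val.data ? "on" : "off");
        else
            printf("%-24s unavailable\n", label);
    }

    int main(void)
    {
        const char *ifname = "eth0";                 /* placeholder */
        int fd = socket(AF_INET, SOCK_DGRAM, 0);     /* any socket works for ethtool */

        show_offload(fd, ifname, ETHTOOL_GRXCSUM, "rx checksum offload");
        show_offload(fd, ifname, ETHTOOL_GTXCSUM, "tx checksum offload");
        show_offload(fd, ifname, ETHTOOL_GSG,     "scatter-gather");
        show_offload(fd, ifname, ETHTOOL_GTSO,    "TCP segmentation (TSO)");
        show_offload(fd, ifname, ETHTOOL_GGRO,    "generic receive (GRO)");

        close(fd);
        return 0;
    }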

The host CPU overhead of the silicon stack implementation is extremely low, as the silicon stack is in essence an engine running in parallel with the CPU. The host CPU overhead of the software stack is comparatively high, since the software stack competes for CPU resources to an increasing degree as the communication speed rises.

Latency is the time it takes for a transmission to start once all of the parameters for the transmission have been configured. Here too the silicon stack clearly outperforms the software stack, as the silicon stack implementation uses CPU resources only minimally. Determinism is the variation in latency when sending and receiving packets; again the silicon stack wins because of its limited impact on CPU resources. As for reliability under load, the silicon stack shows no noticeable change in performance, while the software stack is affected as resources are shared with other applications.

TCP offload engine

How is a TCP offload engine (TOE) integrated into an embedded system? Many single-board computers have XMC sites that can be used to plug in an XMC form-factor TCP offload engine, which can potentially support multiple 10Gb Ethernet ports. The SBC will most likely have one or more 1Gb Ethernet ports as well, driven by the software stack executing on the SBC itself. In the context of the overall system, the TCP offload engine provides several extremely high-speed ports operating in parallel with the 1Gb Ethernet port(s) on the SBC. The TOE card can be implemented as a switch, routing information from one network to another. Traffic could be routed from one 10Gb port through the SBC for some packet work and modification and then shipped off across the bus, or incoming data could be processed and then sent out on the 1Gb Ethernet port. The advantage of adding a TOE is gaining access to 10Gb Ethernet ports on an SBC without impacting the SBC's processing power.
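The latency and determinism described above can be observed at application level with a simple request/response loop against an echo service; the spread between the best and worst samples gives a rough picture of jitter. A minimal sketch, assuming Linux and a TCP echo server already listening at a placeholder address:

    /*
     * Sketch: measure request/response latency over TCP and report the spread
     * (a rough indicator of determinism). Assumes Linux and an echo service
     * already listening at ECHO_HOST:ECHO_PORT (placeholders).
     */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <sys/socket.h>
    #include <time.h>
    #include <unistd.h>

    #define ECHO_HOST "192.168.1.10"   /* placeholder target */
    #define ECHO_PORT 7
    #define SAMPLES   1000

    static double now_us(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec * 1e6 + ts.tv_nsec / 1e3;
    }

    int main(void)
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr = { .sin_family = AF_INET,
                                    .sin_port   = htons(ECHO_PORT) };
        inet_pton(AF_INET, ECHO_HOST, &addr.sin_addr);
        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            perror("connect");
            return 1;
        }

        char msg[64] = "ping", rsp[64];
        double best = 1e12, worst = 0.0;

        for (int i = 0; i < SAMPLES; i++) {
            double t0 = now_us();
            send(fd, msg, sizeof(msg), 0);
            recv(fd, rsp, sizeof(rsp), MSG_WAITALL);
            double dt = now_us() - t0;
            if (dt < best)  best  = dt;
            if (dt > worst) worst = dt;
        }

        printf("latency: best %.1f us, worst %.1f us, jitter %.1f us\n",
               best, worst, worst - best);
        close(fd);
        return 0;
    }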

Special FPGA packet processing

Some processes require special packet formatting or manipulation. By placing an FPGA in one of the XMC sites, this work can be offloaded so that as little of the processing as possible is done on the SBC. For instance, images captured and arriving over gigabit Ethernet may need an overlay applied. This can be done right within the FPGA, with the result sent back to the CPU for any additional processing and then back across the bus to some other location. The Acromag XMC 6280 mezzanine card, available via Acal BFi, is an ideal TCP/IP offload engine with four 10GbE ports.
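For illustration only, the overlay described above amounts to a per-pixel blend; the value of the FPGA is running it at line rate so the SBC never touches the pixel data. A host-side C equivalent of the kind of operation that would be pushed into the fabric might look like this (the 8-bit greyscale format and buffer sizes are assumptions):

    /*
     * Sketch of the per-pixel overlay work described above, written as host C
     * for clarity; in the system described it would run in FPGA fabric.
     * The 8-bit greyscale format is an assumption.
     */
    #include <stddef.h>
    #include <stdint.h>

    /* out = (1 - a) * frame + a * overlay, with alpha in 0..255 */
    static void overlay_frame(uint8_t *frame, const uint8_t *overlay,
                              const uint8_t *alpha, size_t npixels)
    {
        for (size_t i = 0; i < npixels; i++) {
            uint32_t a = alpha[i];
            frame[i] = (uint8_t)(((255u - a) * frame[i] + a * overlay[i] + 127u) / 255u);
        }
    }

    int main(void)
    {
        uint8_t frame[4]   = { 10, 20, 30, 40 };
        uint8_t overlay[4] = { 200, 200, 200, 200 };
        uint8_t alpha[4]   = { 0, 64, 128, 255 };

        overlay_frame(frame, overlay, alpha, 4);
        return 0;
    }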

Heightened expectations for technology today

The latest trend in Ethernet traffic is sensors. More and more of them are being used in almost every embedded application, a typical example being situational awareness. To obtain a better understanding of what is going on in the world, more data has to be captured and analysed. More data spills over into increased storage requirements and all of the system architecture issues that go with it, whether the data is being processed or moved from place to place. Expectations have been amplified in terms of better results, more sophisticated algorithms and faster CPUs. Technologies continue to move forward, and engineers concentrate on using the latest of them. Communications technology needs to meet today's requirements while making it possible to stay ahead of whatever new specification requirements the future may dictate.