Implementing Ultra Low Latency Data Center Services with Programmable Logic

Size: px
Start display at page:

Download "Implementing Ultra Low Latency Data Center Services with Programmable Logic"

Transcription

1 Implementing Ultra Low Latency Data Center Services with Programmable Logic John W. Lockwood, CEO: Algo-Logic Systems, Inc. (408) D Martin Ave., Santa Clara, CA 95050

2 INCREASING SILICON DIE AREA Why the Move to Programmable Logic? There are large challenges in scaling the performance of software now. The question is: What s next? We took a bet on programmable hardware. - Doug Burger, Microsoft Driving Metrics in the Data Center Latency: Reduce delay Avoid jitter Throughput Processing packets at line rate Handle 10G, 25G, 40G, and 100G Power: Driving cost of OpEx TIME Micro 1971 Sequential Processing CPU 1985 Optimized Sequential CPU CPU 2006 Multicore GPU CPU GPU CPU CPU FPGA Field Programmable Gate Array (FPGA) logic moves into the CPU Microsoft accelerates BING search with FPGA Intel acquires Altera CPU 2013 Add Vector Processing = SILICONE DIE AREA Add Logic Processing 2015 Algo-Logic Systems Inc., All rights reserved. 2

3 Servers Accelerated with FPGA Gateware FPGA Augments Existing Servers Can run on an expansion card (same size as a GPU) Or may be integrated into the CPU socket GDN Applications run on FPGA Implements low-latency, low-power, high-throughput data processing Accelerated Server RAM RAM RAM RAM HDD SSD RAM FPGA RAM CPU Cores CPU 10G 40G 100 GE Rack Server 2015 Algo-Logic Systems Inc., All rights reserved. 3

4 Example of Low Latency Service: Key/Value Store Examples: Directory Forwarding Tables Data Deduplication Key Company Phone # IP Address Content Hash Value Algo-Logic (408) Interface : MAC Address Eth6 : 02:33:29:F2:AB:CC Storage Block ID XYZ Order ID Symbol, Side, Price Stock Trading ATY AAPL, B, Virtex Edge List Graph Search v140 v201, v206, v225 Key/Value Store (KVS) Simplifies implementation of large-scale distributed computation algorithms Data Center Servers exchanges data over standard Ethernet Challenges Operating System delays packets and limits throughput Per-core processing inefficient at high-speed packet processing Solutions Bypass kernel bypass with DPDK Offload of packet processing with FPGA 2015 Algo-Logic Systems Inc., All rights reserved. 4

5 Mobile Application Servers Need Fast Key/Value Stores Scalable backend services to share data Sensors { location, bio, movement,.. } Social { status, dating, updates, multi-player games.. } Media { video/security, audio/music,.. } Communication { network status, handoff, short messages.. } Database { users, providers, payments, travel, authentication,.. } Must be able to scale as the number of users grows Scale up to provide the best latency, throughput, and power Scale out to increase storage capacity, throughput, and redundancy Example: Mobile location sharing 2015 Algo-Logic Systems Inc., All rights reserved. 5

6 Case Study: Implementing Uber with KVS Uber in ,037 drivers in the US completed 4 or more trips New drivers doubled every 6 months for past 2 years Number of Uber Users = 8M Number of Cities = 290 Total trips = 140M Daily Trips = 1M Analysis and Assumptions Assuming 25% of the drivers are active 25% of 160k drivers = 40k active cars (<48k) Drivers update position once per second = 40k IOW Implementation Uber with on an Algo-Logic KVS card Washington Post, Dec Algo-Logic Systems Inc., All rights reserved. 6

7 Algo-Logic s KVS solution for Mobile Applications KVS on FPGA Application Server Scale UP for FASTER access to shared data 2015 Algo-Logic Systems Inc., All rights reserved. 7

8 Algo-Logic s KVS solution for Mobile Applications KVS on FPGA KVS on FPGA 2U KVS System on FPGA 2U KVS System on FPGA KVS on FPGA 2U KVS System on FPGA 2U System 2U System App Servers And SCALE-OUT quickly to increase storage capacity and throughput 2015 Algo-Logic Systems Inc., All rights reserved. 8

9 Provisioning and Measurements with GDN-Switch 2015 Algo-Logic Systems Inc., All rights reserved. 9

10 Linux Software Socket KVS OCSM 10g Ethernet Intel 10G NIC Kernel Driver Message Process LEGEND Data Transfer = Algo-Logic software on Intel GE NIC and Core i7-4770k CPU 2015 Algo-Logic Systems Inc., All rights reserved. 10

11 Algo-Logic KVS with DPDK: Bypass the Kernel Receive Queue Dequeue Enqueue OCSM LEGEND Control Handoff = 10g Ethernet Intel DPDK Supported NIC Store Dequeue Message Buffer Message Process Note: Message read once into CPU Cache Response Generation Data Transfer = Transmit Queue Enqueue Algo-Logic software on Intel GE NIC and Core i7-4770k CPU 2015 Algo-Logic Systems Inc., All rights reserved. 11

12 Algo-Logic GDN-Search: ALGO-LOGIC S KVS FPGA SOLUTION in FPGA FOR KEY/VALUE SEARCH OPTIMISED FOR LARGE NUMBER OF KEY/VALUE SEARCH ENTRIES OCSM 10g Ethernet FPGA MAC0 RX MAC0 TX Parser Reconstruct Handler REQUEST GENERATOR 32B OCSM Header Identifier RESPONSE GENERATOR 32B OCSM Header Reconstruct Key-Value Extractor Key-Value Response Decoder Key-Value Search Key: 96b, Value: 96b EMSE2 On-Chip Memory OCSM 10g Ethernet MAC1 RX MAC1 TX Parser Reconstruct Handler REQUEST GENERATOR 64B OCSM Header Identifier RESPONSE GENERATOR 64B OCSM Header Reconstruct Key-Value Extractor Key-Value Response Decoder Key-Value Search Key: 96b, Value: 352b EMSE2 Off-Chip Memory Algo-Logic gateware on Nallatech P385 with Altera Stratix V A7 FPGA 2015 Algo-Logic Systems Inc., All rights reserved. 12

13 Trends for Adding Storage around FPGAs Six banks of memory controllers QDR SRAM RLDRAM DDR3, DDR4 64 lanes of SERDES SATA disk and Flash Serial memories Hybrid Memory Cube (HMC) Mosys Bandwidth Engine (BE2, BE3) New Memories 3D Xpoint with DDR4 Interface Potential for Terabytes of Memory on each card 2015 Algo-Logic Systems Inc., All rights reserved. 13

14 Implementation of KVS with Socket I/O, DPDK, and FPGA Benchmark same application Key/Value Store (KVS) Running on the same PC Intel i7-4770k CPU, NIC, and Altera Stratix V A7 FPGA With three different implementations Socket I/O, DPDK, FPGA OCSM LEGEND Data Transfer = 10g Ethernet Socket I/O Intel 10G NIC Kernel Driver Message Process DPDK Algo-Logic software on Intel GE NIC and Core i7-4770k CPU Receive Queue Dequeue FPGA Enqueue OCSM LEGEND Control Handoff = 10g Ethernet Intel DPDK Supported NIC Store Dequeue Message Buffer Message Process Note: Message read once into CPU Cache Response Generation OCSM 10g Ethernet Parser Modifier REQUEST GENERATOR OCSM Header Identifier RESPONSE GENERATOR OCSM Header Reconstruct Key/Value Extractor Key/Value Search Response Decoder Exact Match Search Engine (EMSE) Data Transfer = Transmit Queue Enqueue Algo-Logic gateware Algo-Logic software on Nallatech P385 with on Intel GE NIC Altera Stratix V A7 FPGA and Core i7-4770k CPU 2015 Algo-Logic Systems Inc., All rights reserved. 14

15 KVS Hardware in Data Center Rack FPGA GDN-Classify Associative Rule-Match CAM PHY MAC Key Extractor Parser Flow or ACL Target Queues MAC s PHYs KVS in Software KVS in DPDK KVS in FPGA Rack of Search Servers 10G 40G Additional KVS Servers Provision Controller UPS Power 2015 Algo-Logic Systems Public 2015 Algo-Logic Systems Inc., All rights reserved. 15

16 Load Testing off the KVS Implementations GEN: Traffic Generator KVS Implementations Mellanox 10G NIC Intel G NIC Kernel Driver Process Message Traffic Generator Intel G NIC Intel G NIC Intel DPDK EAL Process Message Mellanox Nallatech Process 10G NIC P385 10G Message 2015 Algo-Logic Systems Inc., All rights reserved. 16

17 Full-Rate Processing (100% of TX Load) Stratix V FPGA with EMSE-2 Drops no s DPDK Drops s Software Drops s 2015 Algo-Logic Systems Inc., All rights reserved. 17

18 Sockets Power Consumption Profile (10M s with 40 CSM Messages/) 2015 Algo-Logic Systems Inc., All rights reserved. 18

19 DPDK Power Consumption Profile (10M s with 40 CSM Messages/) 2015 Algo-Logic Systems Inc., All rights reserved. 19

20 Latency Measurement with GDN-Classify 10G Rx T_out- T_in 40G Tx KVS Server Loopback GDN Switch 10G Tx Time Stamp 40G Rx Search Latency (Different for Software, DPDK, and GDN-Search) + Switch Latency (constant for GDN-Switch) Round trip latency of GDN Switch is deterministic (constant) Round trip latency of KVS Sever = Total Round trip Time Round trip time with 10G loopback on GDN Switch 2015 Algo-Logic Systems Inc., All rights reserved. 20

21 Tighter Spread = Less Jitter Peercentage of Observed s Percentage of s Observed [%] KVS Latency in FPGA, DPDK, and Sockets 50.00% 45.00% 40.00% 35.00% Latency Comparison 100k packets, 1 OCSM per packet, 1k pps Altera Stratix V RTL Average: 0.467µs KVS in FPGA: Best Latency, No Jitter 0.70% 0.60% KVS in Software Worst Latency Worst Jitter Socket Implementation Latency Distribution with One OCSM/ Intel i7 Average: 41.54µs 30.00% 25.00% KVS in DPDK: Lowers Latency, Some Jitter 0.50% 0.40% 0.30% 0.20% Sockets RTL Sockets DPDK 20.00% 0.10% 15.00% DPDK Average: 6.29µs 0.00% Latency Distribution [µs] 10.00% 5.00% Sockets Average: 41.40µs 0.00% Latency Distribution [µs] Lowest Lower Latency = Faster Response 2015 Algo-Logic Systems Inc., All rights reserved. 21

22 Measured Latency, Throughput, and Power Results FPGA PHY MAC GDN-Classify Associative Rule-Match CAM Key Extractor Parser Flow or ACL Target Queues MACs PHYs All Datapaths Summary Latency (µseconds) Tested Throughput (CSMs/sec) Power (µjoules/csm) KVS in Software Sockets KVS in DPDK KVS in FPGA Rack of Search Servers Additional KVS Servers Provision Controller 10G 40G DPDK RTL All Datapaths Summary Latency (µseconds) Maximum Throughput (CSMs/sec) Power (µjoules/csm) GDN vs. Sockets 88x less 13x 21x less GDN vs. DPDK 14x less 3.2x 13x less UPS Power 2015 Algo-Logic Systems Inc., All rights reserved. 22

23 Conclusions: Programmable Hardware in the Data Center Lowers Latency 88x faster than Linux networking sockets 14x faster than optimized DPDK (kernel bypass) Increases Throughput (IOPs) 3x to 13x improvement in throughput Lowers Capital Expenditures (CapEx) Reduces Power 13x to 21x reduction in power Reduces Operating Expenditures (OpEx) 2015 Algo-Logic Systems Inc., All rights reserved. 23

Gateware Defined Networking (GDN) for Ultra Low Latency Trading and Compliance

Gateware Defined Networking (GDN) for Ultra Low Latency Trading and Compliance Gateware Defined Networking (GDN) for Ultra Low Latency Trading and Compliance STAC Summit: Panel: FPGA for trading today: December 2015 John W. Lockwood, PhD, CEO Algo-Logic Systems, Inc. JWLockwd@algo-logic.com

More information

Implementing Ultra Low Latency Data Center Services with Programmable Logic

Implementing Ultra Low Latency Data Center Services with Programmable Logic Implementing Ultra Low Latency Data Center Services with Programmable Logic John W. Lockwood, Madhu Monga Algo-Logic Systems, Inc., Santa Clara, CA 95050 JWLockwd@Algo-Logic.com, Madhu@Algo-Logic.com Abstract

More information

Hardware NVMe implementation on cache and storage systems

Hardware NVMe implementation on cache and storage systems Hardware NVMe implementation on cache and storage systems Jerome Gaysse, IP-Maker Santa Clara, CA 1 Agenda Hardware architecture NVMe for storage NVMe for cache/application accelerator NVMe for new NVM

More information

High Performance Packet Processing with FlexNIC

High Performance Packet Processing with FlexNIC High Performance Packet Processing with FlexNIC Antoine Kaufmann, Naveen Kr. Sharma Thomas Anderson, Arvind Krishnamurthy University of Washington Simon Peter The University of Texas at Austin Ethernet

More information

INT 1011 TCP Offload Engine (Full Offload)

INT 1011 TCP Offload Engine (Full Offload) INT 1011 TCP Offload Engine (Full Offload) Product brief, features and benefits summary Provides lowest Latency and highest bandwidth. Highly customizable hardware IP block. Easily portable to ASIC flow,

More information

A Low Latency Solution Stack for High Frequency Trading. High-Frequency Trading. Solution. White Paper

A Low Latency Solution Stack for High Frequency Trading. High-Frequency Trading. Solution. White Paper A Low Latency Solution Stack for High Frequency Trading White Paper High-Frequency Trading High-frequency trading has gained a strong foothold in financial markets, driven by several factors including

More information

Cisco HyperFlex HX220c M4 and HX220c M4 All Flash Nodes

Cisco HyperFlex HX220c M4 and HX220c M4 All Flash Nodes Data Sheet Cisco HyperFlex HX220c M4 and HX220c M4 All Flash Nodes Fast and Flexible Hyperconverged Systems You need systems that can adapt to match the speed of your business. Cisco HyperFlex Systems

More information

Re-Architecting Cloud Storage with Intel 3D XPoint Technology and Intel 3D NAND SSDs

Re-Architecting Cloud Storage with Intel 3D XPoint Technology and Intel 3D NAND SSDs Re-Architecting Cloud Storage with Intel 3D XPoint Technology and Intel 3D NAND SSDs Jack Zhang yuan.zhang@intel.com, Cloud & Enterprise Storage Architect Santa Clara, CA 1 Agenda Memory Storage Hierarchy

More information

Accelerating Ceph with Flash and High Speed Networks

Accelerating Ceph with Flash and High Speed Networks Accelerating Ceph with Flash and High Speed Networks Dror Goldenberg VP Software Architecture Santa Clara, CA 1 The New Open Cloud Era Compute Software Defined Network Object, Block Software Defined Storage

More information

Experience with the NetFPGA Program

Experience with the NetFPGA Program Experience with the NetFPGA Program John W. Lockwood Algo-Logic Systems Algo-Logic.com With input from the Stanford University NetFPGA Group & Xilinx XUP Program Sunday, February 21, 2010 FPGA-2010 Pre-Conference

More information

Netronome NFP: Theory of Operation

Netronome NFP: Theory of Operation WHITE PAPER Netronome NFP: Theory of Operation TO ACHIEVE PERFORMANCE GOALS, A MULTI-CORE PROCESSOR NEEDS AN EFFICIENT DATA MOVEMENT ARCHITECTURE. CONTENTS 1. INTRODUCTION...1 2. ARCHITECTURE OVERVIEW...2

More information

How to Network Flash Storage Efficiently at Hyperscale. Flash Memory Summit 2017 Santa Clara, CA 1

How to Network Flash Storage Efficiently at Hyperscale. Flash Memory Summit 2017 Santa Clara, CA 1 How to Network Flash Storage Efficiently at Hyperscale Manoj Wadekar Michael Kagan Flash Memory Summit 2017 Santa Clara, CA 1 ebay Hyper scale Infrastructure Search Front-End & Product Hadoop Object Store

More information

Agilio CX 2x40GbE with OVS-TC

Agilio CX 2x40GbE with OVS-TC PERFORMANCE REPORT Agilio CX 2x4GbE with OVS-TC OVS-TC WITH AN AGILIO CX SMARTNIC CAN IMPROVE A SIMPLE L2 FORWARDING USE CASE AT LEAST 2X. WHEN SCALED TO REAL LIFE USE CASES WITH COMPLEX RULES TUNNELING

More information

INT G bit TCP Offload Engine SOC

INT G bit TCP Offload Engine SOC INT 10011 10 G bit TCP Offload Engine SOC Product brief, features and benefits summary: Highly customizable hardware IP block. Easily portable to ASIC flow, Xilinx/Altera FPGAs or Structured ASIC flow.

More information

InfiniBand Networked Flash Storage

InfiniBand Networked Flash Storage InfiniBand Networked Flash Storage Superior Performance, Efficiency and Scalability Motti Beck Director Enterprise Market Development, Mellanox Technologies Flash Memory Summit 2016 Santa Clara, CA 1 17PB

More information

Extremely Fast Distributed Storage for Cloud Service Providers

Extremely Fast Distributed Storage for Cloud Service Providers Solution brief Intel Storage Builders StorPool Storage Intel SSD DC S3510 Series Intel Xeon Processor E3 and E5 Families Intel Ethernet Converged Network Adapter X710 Family Extremely Fast Distributed

More information

Pactron FPGA Accelerated Computing Solutions

Pactron FPGA Accelerated Computing Solutions Pactron FPGA Accelerated Computing Solutions Intel Xeon + Altera FPGA 2015 Pactron HJPC Corporation 1 Motivation for Accelerators Enhanced Performance: Accelerators compliment CPU cores to meet market

More information

Catapult: A Reconfigurable Fabric for Petaflop Computing in the Cloud

Catapult: A Reconfigurable Fabric for Petaflop Computing in the Cloud Catapult: A Reconfigurable Fabric for Petaflop Computing in the Cloud Doug Burger Director, Hardware, Devices, & Experiences MSR NExT November 15, 2015 The Cloud is a Growing Disruptor for HPC Moore s

More information

INT-1010 TCP Offload Engine

INT-1010 TCP Offload Engine INT-1010 TCP Offload Engine Product brief, features and benefits summary Highly customizable hardware IP block. Easily portable to ASIC flow, Xilinx or Altera FPGAs INT-1010 is highly flexible that is

More information

An Intelligent NIC Design Xin Song

An Intelligent NIC Design Xin Song 2nd International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2016) An Intelligent NIC Design Xin Song School of Electronic and Information Engineering Tianjin Vocational

More information

Fast packet processing in the cloud. Dániel Géhberger Ericsson Research

Fast packet processing in the cloud. Dániel Géhberger Ericsson Research Fast packet processing in the cloud Dániel Géhberger Ericsson Research Outline Motivation Service chains Hardware related topics, acceleration Virtualization basics Software performance and acceleration

More information

Reducing CPU and network overhead for small I/O requests in network storage protocols over raw Ethernet

Reducing CPU and network overhead for small I/O requests in network storage protocols over raw Ethernet Reducing CPU and network overhead for small I/O requests in network storage protocols over raw Ethernet Pilar González-Férez and Angelos Bilas 31 th International Conference on Massive Storage Systems

More information

CS 856 Latency in Communication Systems

CS 856 Latency in Communication Systems CS 856 Latency in Communication Systems Winter 2010 Latency Challenges CS 856, Winter 2010, Latency Challenges 1 Overview Sources of Latency low-level mechanisms services Application Requirements Latency

More information

OCP Engineering Workshop - Telco

OCP Engineering Workshop - Telco OCP Engineering Workshop - Telco Low Latency Mobile Edge Computing Trevor Hiatt Product Management, IDT IDT Company Overview Founded 1980 Workforce Approximately 1,800 employees Headquarters San Jose,

More information

Getting Real Performance from a Virtualized CCAP

Getting Real Performance from a Virtualized CCAP Getting Real Performance from a Virtualized CCAP A Technical Paper prepared for SCTE/ISBE by Mark Szczesniak Software Architect Casa Systems, Inc. 100 Old River Road Andover, MA, 01810 978-688-6706 mark.szczesniak@casa-systems.com

More information

10G bit UDP Offload Engine (UOE) MAC+ PCIe SOC IP

10G bit UDP Offload Engine (UOE) MAC+ PCIe SOC IP Intilop Corporation 4800 Great America Pkwy Ste-231 Santa Clara, CA 95054 Ph: 408-496-0333 Fax:408-496-0444 www.intilop.com 10G bit UDP Offload Engine (UOE) MAC+ PCIe INT 15012 (Ultra-Low Latency SXUOE+MAC+PCIe+Host_I/F)

More information

Be Fast, Cheap and in Control with SwitchKV. Xiaozhou Li

Be Fast, Cheap and in Control with SwitchKV. Xiaozhou Li Be Fast, Cheap and in Control with SwitchKV Xiaozhou Li Goal: fast and cost-efficient key-value store Store, retrieve, manage key-value objects Get(key)/Put(key,value)/Delete(key) Target: cluster-level

More information

Higher Level Programming Abstractions for FPGAs using OpenCL

Higher Level Programming Abstractions for FPGAs using OpenCL Higher Level Programming Abstractions for FPGAs using OpenCL Desh Singh Supervising Principal Engineer Altera Corporation Toronto Technology Center ! Technology scaling favors programmability CPUs."#/0$*12'$-*

More information

A New Era of Hardware Microservices in the Cloud. Doug Burger Distinguished Engineer, Microsoft UW Cloud Workshop March 31, 2017

A New Era of Hardware Microservices in the Cloud. Doug Burger Distinguished Engineer, Microsoft UW Cloud Workshop March 31, 2017 A New Era of Hardware Microservices in the Cloud Doug Burger Distinguished Engineer, Microsoft UW Cloud Workshop March 31, 2017 Moore s Law Dennard Scaling has been dead for a decade Moore s La is o er

More information

Persistent Memory. High Speed and Low Latency. White Paper M-WP006

Persistent Memory. High Speed and Low Latency. White Paper M-WP006 Persistent Memory High Speed and Low Latency White Paper M-WP6 Corporate Headquarters: 3987 Eureka Dr., Newark, CA 9456, USA Tel: (51) 623-1231 Fax: (51) 623-1434 E-mail: info@smartm.com Customer Service:

More information

NVMe : Redefining the Hardware/Software Architecture

NVMe : Redefining the Hardware/Software Architecture NVMe : Redefining the Hardware/Software Architecture Jérôme Gaysse, IP-Maker Santa Clara, CA 1 NVMe Protocol How to implement the NVMe protocol? SW, HW/SW or HW? 2- NVMe command ready CPU 1-Host driver

More information

Efficient Data Movement in Modern SoC Designs Why It Matters

Efficient Data Movement in Modern SoC Designs Why It Matters WHITE PAPER Efficient Data Movement in Modern SoC Designs Why It Matters COPROCESSORS OFFLOAD AND ACCELERATE SPECIFIC WORKLOADS, HOWEVER DATA MOVEMENT EFFICIENCY ACROSS THE PROCESSING CORES AND MEMORY

More information

ARISTA: Improving Application Performance While Reducing Complexity

ARISTA: Improving Application Performance While Reducing Complexity ARISTA: Improving Application Performance While Reducing Complexity October 2008 1.0 Problem Statement #1... 1 1.1 Problem Statement #2... 1 1.2 Previous Options: More Servers and I/O Adapters... 1 1.3

More information

6.9. Communicating to the Outside World: Cluster Networking

6.9. Communicating to the Outside World: Cluster Networking 6.9 Communicating to the Outside World: Cluster Networking This online section describes the networking hardware and software used to connect the nodes of cluster together. As there are whole books and

More information

GPUs and Emerging Architectures

GPUs and Emerging Architectures GPUs and Emerging Architectures Mike Giles mike.giles@maths.ox.ac.uk Mathematical Institute, Oxford University e-infrastructure South Consortium Oxford e-research Centre Emerging Architectures p. 1 CPUs

More information

The Missing Piece of Virtualization. I/O Virtualization on 10 Gb Ethernet For Virtualized Data Centers

The Missing Piece of Virtualization. I/O Virtualization on 10 Gb Ethernet For Virtualized Data Centers The Missing Piece of Virtualization I/O Virtualization on 10 Gb Ethernet For Virtualized Data Centers Agenda 10 GbE Adapters Built for Virtualization I/O Throughput: Virtual & Non-Virtual Servers Case

More information

Cisco HyperFlex HX220c Edge M5

Cisco HyperFlex HX220c Edge M5 Data Sheet Cisco HyperFlex HX220c Edge M5 Hyperconvergence engineered on the fifth-generation Cisco UCS platform Rich digital experiences need always-on, local, high-performance computing that is close

More information

Programmable Server Adapters: Key Ingredients for Success

Programmable Server Adapters: Key Ingredients for Success WHITE PAPER Programmable Server Adapters: Key Ingredients for Success IN THIS PAPER, WE DIS- CUSS ARCHITECTURE AND PRODUCT REQUIREMENTS RELATED TO PROGRAM- MABLE SERVER ADAPTERS FORHOST-BASED SDN, AS WELL

More information

VM Migration Acceleration over 40GigE Meet SLA & Maximize ROI

VM Migration Acceleration over 40GigE Meet SLA & Maximize ROI VM Migration Acceleration over 40GigE Meet SLA & Maximize ROI Mellanox Technologies Inc. Motti Beck, Director Marketing Motti@mellanox.com Topics Introduction to Mellanox Technologies Inc. Why Cloud SLA

More information

Cisco HyperFlex HX220c M4 and HX220c M4 All Flash Nodes

Cisco HyperFlex HX220c M4 and HX220c M4 All Flash Nodes Data Sheet Cisco HyperFlex HX220c M4 and HX220c M4 All Flash Nodes Fast and Flexible Hyperconverged Systems You need systems that can adapt to match the speed of your business. Cisco HyperFlex Systems

More information

NetFPGA Hardware Architecture

NetFPGA Hardware Architecture NetFPGA Hardware Architecture Jeffrey Shafer Some slides adapted from Stanford NetFPGA tutorials NetFPGA http://netfpga.org 2 NetFPGA Components Virtex-II Pro 5 FPGA 53,136 logic cells 4,176 Kbit block

More information

Cisco HyperFlex HX220c M4 Node

Cisco HyperFlex HX220c M4 Node Data Sheet Cisco HyperFlex HX220c M4 Node A New Generation of Hyperconverged Systems To keep pace with the market, you need systems that support rapid, agile development processes. Cisco HyperFlex Systems

More information

PVPP: A Programmable Vector Packet Processor. Sean Choi, Xiang Long, Muhammad Shahbaz, Skip Booth, Andy Keep, John Marshall, Changhoon Kim

PVPP: A Programmable Vector Packet Processor. Sean Choi, Xiang Long, Muhammad Shahbaz, Skip Booth, Andy Keep, John Marshall, Changhoon Kim PVPP: A Programmable Vector Packet Processor Sean Choi, Xiang Long, Muhammad Shahbaz, Skip Booth, Andy Keep, John Marshall, Changhoon Kim Fixed Set of Protocols Fixed-Function Switch Chip TCP IPv4 IPv6

More information

Improving Ceph Performance while Reducing Costs

Improving Ceph Performance while Reducing Costs Improving Ceph Performance while Reducing Costs Applications and Ecosystem Solutions Development Rick Stehno Santa Clara, CA 1 Flash Application Acceleration Three ways to accelerate application performance

More information

Evaluation Report: Improving SQL Server Database Performance with Dot Hill AssuredSAN 4824 Flash Upgrades

Evaluation Report: Improving SQL Server Database Performance with Dot Hill AssuredSAN 4824 Flash Upgrades Evaluation Report: Improving SQL Server Database Performance with Dot Hill AssuredSAN 4824 Flash Upgrades Evaluation report prepared under contract with Dot Hill August 2015 Executive Summary Solid state

More information

FMS18 Invited Session 101-B1 Hardware Acceleration Techniques for NVMe-over-Fabric

FMS18 Invited Session 101-B1 Hardware Acceleration Techniques for NVMe-over-Fabric Flash Memory Summit 2018 Santa Clara, CA FMS18 Invited Session 101-B1 Hardware Acceleration Techniques for NVMe-over-Fabric Paper Abstract: The move from direct-attach to Composable Infrastructure is being

More information

Using FPGAs to accelerate NVMe-oF based Storage Networks

Using FPGAs to accelerate NVMe-oF based Storage Networks Using FPGAs to accelerate NVMe-oF based Storage Networks Deboleena Sakalley IP & Solutions Architect, Xilinx Santa Clara, CA 1 Agenda NVMe-oF Offload in FPGA NVMe-oF Integrated Solution Solution Architecture

More information

High Performance Memory Opportunities in 2.5D Network Flow Processors

High Performance Memory Opportunities in 2.5D Network Flow Processors High Performance Memory Opportunities in 2.5D Network Flow Processors Jay Seaton, VP Silicon Operations, Netronome Larry Zu, PhD, President, Sarcina Technology LLC August 6, 2013 2013 Netronome 1 Netronome

More information

Next Generation Architecture for NVM Express SSD

Next Generation Architecture for NVM Express SSD Next Generation Architecture for NVM Express SSD Dan Mahoney CEO Fastor Systems Copyright 2014, PCI-SIG, All Rights Reserved 1 NVMExpress Key Characteristics Highest performance, lowest latency SSD interface

More information

Fusion Engine Next generation storage engine for Flash- SSD and 3D XPoint storage system

Fusion Engine Next generation storage engine for Flash- SSD and 3D XPoint storage system Fusion Engine Next generation storage engine for Flash- SSD and 3D XPoint storage system Fei Liu, Sheng Qiu, Jianjian Huo, Shu Li Alibaba Group Santa Clara, CA 1 Software overhead become critical Legacy

More information

Accelerated Programmable Services. FPGA and GPU augmented infrastructure.

Accelerated Programmable Services. FPGA and GPU augmented infrastructure. Accelerated Programmable Services FPGA and GPU augmented infrastructure graeme.burnett@hatstand.com Here and Now Market data 10GbE feeds common moving to 40GbE then 100GbE Software feed handlers can barely

More information

Learning with Purpose

Learning with Purpose Network Measurement for 100Gbps Links Using Multicore Processors Xiaoban Wu, Dr. Peilong Li, Dr. Yongyi Ran, Prof. Yan Luo Department of Electrical and Computer Engineering University of Massachusetts

More information

Virtual Switch Acceleration with OVS-TC

Virtual Switch Acceleration with OVS-TC WHITE PAPER Virtual Switch Acceleration with OVS-TC HARDWARE ACCELERATED OVS-TC PROVIDES BETTER CPU EFFICIENCY, LOWER COMPLEXITY, ENHANCED SCALABILITY AND INCREASED NETWORK PERFORMANCE COMPARED TO KERNEL-

More information

HADP Talk BlueDBM: An appliance for Big Data Analytics

HADP Talk BlueDBM: An appliance for Big Data Analytics HADP Talk BlueDBM: An appliance for Big Data Analytics Sang-Woo Jun* Ming Liu* Sungjin Lee* Jamey Hicks+ John Ankcorn+ Myron King+ Shuotao Xu* Arvind* *MIT Computer Science and Artificial Intelligence

More information

Flash vs. Disk Storage: Testing Workloads is Key

Flash vs. Disk Storage: Testing Workloads is Key Flash vs. Disk Storage: Testing Workloads is Key Len Rosenthal VP of Marketing Flash Memory Summit 2013 Santa Clara, CA 1 Overview The leader in Storage Performance Validation. Our Mission: To provide

More information

Ruler: High-Speed Packet Matching and Rewriting on Network Processors

Ruler: High-Speed Packet Matching and Rewriting on Network Processors Ruler: High-Speed Packet Matching and Rewriting on Network Processors Tomáš Hrubý Kees van Reeuwijk Herbert Bos Vrije Universiteit, Amsterdam World45 Ltd. ANCS 2007 Tomáš Hrubý (VU Amsterdam, World45)

More information

Product Overview. Programmable Network Cards Network Appliances FPGA IP Cores

Product Overview. Programmable Network Cards Network Appliances FPGA IP Cores 2018 Product Overview Programmable Network Cards Network Appliances FPGA IP Cores PCI Express Cards PMC/XMC Cards The V1151/V1152 The V5051/V5052 High Density XMC Network Solutions Powerful PCIe Network

More information

PUSHING THE LIMITS, A PERSPECTIVE ON ROUTER ARCHITECTURE CHALLENGES

PUSHING THE LIMITS, A PERSPECTIVE ON ROUTER ARCHITECTURE CHALLENGES PUSHING THE LIMITS, A PERSPECTIVE ON ROUTER ARCHITECTURE CHALLENGES Greg Hankins APRICOT 2012 2012 Brocade Communications Systems, Inc. 2012/02/28 Lookup Capacity and Forwarding

More information

Aerospike Scales with Google Cloud Platform

Aerospike Scales with Google Cloud Platform Aerospike Scales with Google Cloud Platform PERFORMANCE TEST SHOW AEROSPIKE SCALES ON GOOGLE CLOUD Aerospike is an In-Memory NoSQL database and a fast Key Value Store commonly used for caching and by real-time

More information

Accelerating Real-Time Big Data. Breaking the limitations of captive NVMe storage

Accelerating Real-Time Big Data. Breaking the limitations of captive NVMe storage Accelerating Real-Time Big Data Breaking the limitations of captive NVMe storage 18M IOPs in 2u Agenda Everything related to storage is changing! The 3rd Platform NVM Express architected for solid state

More information

Maximum Performance. How to get it and how to avoid pitfalls. Christoph Lameter, PhD

Maximum Performance. How to get it and how to avoid pitfalls. Christoph Lameter, PhD Maximum Performance How to get it and how to avoid pitfalls Christoph Lameter, PhD cl@linux.com Performance Just push a button? Systems are optimized by default for good general performance in all areas.

More information

Flash Controller Solutions in Programmable Technology

Flash Controller Solutions in Programmable Technology Flash Controller Solutions in Programmable Technology David McIntyre Senior Business Unit Manager Computer and Storage Business Unit Altera Corp. dmcintyr@altera.com Flash Memory Summit 2012 Santa Clara,

More information

Netronome 25GbE SmartNICs with Open vswitch Hardware Offload Drive Unmatched Cloud and Data Center Infrastructure Performance

Netronome 25GbE SmartNICs with Open vswitch Hardware Offload Drive Unmatched Cloud and Data Center Infrastructure Performance WHITE PAPER Netronome 25GbE SmartNICs with Open vswitch Hardware Offload Drive Unmatched Cloud and NETRONOME AGILIO CX 25GBE SMARTNICS SIGNIFICANTLY OUTPERFORM MELLANOX CONNECTX-5 25GBE NICS UNDER HIGH-STRESS

More information

P51: High Performance Networking

P51: High Performance Networking P51: High Performance Networking Lecture 6: Programmable network devices Dr Noa Zilberman noa.zilberman@cl.cam.ac.uk Lent 2017/18 High Throughput Interfaces Performance Limitations So far we discussed

More information

Dell PowerEdge R730xd Servers with Samsung SM1715 NVMe Drives Powers the Aerospike Fraud Prevention Benchmark

Dell PowerEdge R730xd Servers with Samsung SM1715 NVMe Drives Powers the Aerospike Fraud Prevention Benchmark Dell PowerEdge R730xd Servers with Samsung SM1715 NVMe Drives Powers the Aerospike Fraud Prevention Benchmark Testing validation report prepared under contract with Dell Introduction As innovation drives

More information

Toward Seamless Integration of RAID and Flash SSD

Toward Seamless Integration of RAID and Flash SSD Toward Seamless Integration of RAID and Flash SSD Sang-Won Lee Sungkyunkwan Univ., Korea (Joint-Work with Sungup Moon, Bongki Moon, Narinet, and Indilinx) Santa Clara, CA 1 Table of Contents Introduction

More information

Introduction to Infiniband

Introduction to Infiniband Introduction to Infiniband FRNOG 22, April 4 th 2014 Yael Shenhav, Sr. Director of EMEA, APAC FAE, Application Engineering The InfiniBand Architecture Industry standard defined by the InfiniBand Trade

More information

Solid Access Technologies, LLC

Solid Access Technologies, LLC Newburyport, MA, USA USSD 200 USSD 200 The I/O Bandwidth Company Solid Access Technologies, LLC Solid Access Technologies, LLC Why Are We Here? The Storage Perfect Storm Traditional I/O Bottleneck Reduction

More information

NVMe Over Fabrics (NVMe-oF)

NVMe Over Fabrics (NVMe-oF) NVMe Over Fabrics (NVMe-oF) High Performance Flash Moves to Ethernet Rob Davis Vice President Storage Technology, Mellanox Santa Clara, CA 1 Access Time Access in Time Micro (micro-sec) Seconds Why NVMe

More information

PacketShader: A GPU-Accelerated Software Router

PacketShader: A GPU-Accelerated Software Router PacketShader: A GPU-Accelerated Software Router Sangjin Han In collaboration with: Keon Jang, KyoungSoo Park, Sue Moon Advanced Networking Lab, CS, KAIST Networked and Distributed Computing Systems Lab,

More information

Meltdown and Spectre Interconnect Performance Evaluation Jan Mellanox Technologies

Meltdown and Spectre Interconnect Performance Evaluation Jan Mellanox Technologies Meltdown and Spectre Interconnect Evaluation Jan 2018 1 Meltdown and Spectre - Background Most modern processors perform speculative execution This speculation can be measured, disclosing information about

More information

FlexNIC: Rethinking Network DMA

FlexNIC: Rethinking Network DMA FlexNIC: Rethinking Network DMA Antoine Kaufmann Simon Peter Tom Anderson Arvind Krishnamurthy University of Washington HotOS 2015 Networks: Fast and Growing Faster 1 T 400 GbE Ethernet Bandwidth [bits/s]

More information

FPGA Implementation of Erasure Codes in NVMe based JBOFs

FPGA Implementation of Erasure Codes in NVMe based JBOFs FPGA Implementation of Erasure Codes in NVMe based JBOFs Manoj Roge Director, Data Center Xilinx Inc. Santa Clara, CA 1 Acknowledgement Shre Shah Data Center Architect, Xilinx Santa Clara, CA 2 Agenda

More information

Storage Protocol Offload for Virtualized Environments Session 301-F

Storage Protocol Offload for Virtualized Environments Session 301-F Storage Protocol Offload for Virtualized Environments Session 301-F Dennis Martin, President August 2016 1 Agenda About Demartek Offloads I/O Virtualization Concepts RDMA Concepts Overlay Networks and

More information

Enyx soft-hardware design services and development framework for FPGA & SoC

Enyx soft-hardware design services and development framework for FPGA & SoC soft-hardware design services and development framework for FPGA & SoC Smart NIC Smart Switch Your custom hardware hardware acceleration experts 3rd party IP Cores AXI ARM DMA CPU Your own soft-hardware

More information

N V M e o v e r F a b r i c s -

N V M e o v e r F a b r i c s - N V M e o v e r F a b r i c s - H i g h p e r f o r m a n c e S S D s n e t w o r k e d f o r c o m p o s a b l e i n f r a s t r u c t u r e Rob Davis, VP Storage Technology, Mellanox OCP Evolution Server

More information

Data Path acceleration techniques in a NFV world

Data Path acceleration techniques in a NFV world Data Path acceleration techniques in a NFV world Mohanraj Venkatachalam, Purnendu Ghosh Abstract NFV is a revolutionary approach offering greater flexibility and scalability in the deployment of virtual

More information

NFS/RDMA over 40Gbps iwarp Wael Noureddine Chelsio Communications

NFS/RDMA over 40Gbps iwarp Wael Noureddine Chelsio Communications NFS/RDMA over 40Gbps iwarp Wael Noureddine Chelsio Communications Outline RDMA Motivating trends iwarp NFS over RDMA Overview Chelsio T5 support Performance results 2 Adoption Rate of 40GbE Source: Crehan

More information

An Introduction to the QorIQ Data Path Acceleration Architecture (DPAA) AN129

An Introduction to the QorIQ Data Path Acceleration Architecture (DPAA) AN129 July 14, 2009 An Introduction to the QorIQ Data Path Acceleration Architecture (DPAA) AN129 David Lapp Senior System Architect What is the Datapath Acceleration Architecture (DPAA)? The QorIQ DPAA is a

More information

The Road to ExaScale. Advances in High-Performance Interconnect Infrastructure. September 2011

The Road to ExaScale. Advances in High-Performance Interconnect Infrastructure. September 2011 The Road to ExaScale Advances in High-Performance Interconnect Infrastructure September 2011 diego@mellanox.com ExaScale Computing Ambitious Challenges Foster Progress Demand Research Institutes, Universities

More information

Development of Communication LSI for 10G-EPON

Development of Communication LSI for 10G-EPON INFORMATION & COMMUNICATIONS Development of Communication LSI for Fumio DAIDO*, Akinobu YOSHIMURA, Shinichi KOUYAMA and Shingo NISHIOKA GE-PON (Gigabit Ethernet Passive Optical Network) systems have been

More information

Today s Data Centers. How can we improve efficiencies?

Today s Data Centers. How can we improve efficiencies? Today s Data Centers O(100K) servers/data center Tens of MegaWatts, difficult to power and cool Very noisy Security taken very seriously Incrementally upgraded 3 year server depreciation, upgraded quarterly

More information

Total Cost of Ownership Analysis for a Wireless Access Gateway

Total Cost of Ownership Analysis for a Wireless Access Gateway white paper Communications Service Providers TCO Analysis Total Cost of Ownership Analysis for a Wireless Access Gateway An analysis of the total cost of ownership of a wireless access gateway running

More information

Evaluation of the Chelsio T580-CR iscsi Offload adapter

Evaluation of the Chelsio T580-CR iscsi Offload adapter October 2016 Evaluation of the Chelsio T580-CR iscsi iscsi Offload makes a difference Executive Summary As application processing demands increase and the amount of data continues to grow, getting this

More information

Cloud Acceleration with FPGA s. Mike Strickland, Director, Computer & Storage BU, Altera

Cloud Acceleration with FPGA s. Mike Strickland, Director, Computer & Storage BU, Altera Cloud Acceleration with FPGA s Mike Strickland, Director, Computer & Storage BU, Altera Agenda Mission Alignment & Data Center Trends OpenCL and Algorithm Acceleration Networking Acceleration Data Access

More information

LegUp: Accelerating Memcached on Cloud FPGAs

LegUp: Accelerating Memcached on Cloud FPGAs 0 LegUp: Accelerating Memcached on Cloud FPGAs Xilinx Developer Forum December 10, 2018 Andrew Canis & Ruolong Lian LegUp Computing Inc. 1 COMPUTE IS BECOMING SPECIALIZED 1 GPU Nvidia graphics cards are

More information

Industry Collaboration and Innovation

Industry Collaboration and Innovation Industry Collaboration and Innovation Open Coherent Accelerator Processor Interface OpenCAPI TM - A New Standard for High Performance Memory, Acceleration and Networks Jeff Stuecheli April 10, 2017 What

More information

Network Design Considerations for Grid Computing

Network Design Considerations for Grid Computing Network Design Considerations for Grid Computing Engineering Systems How Bandwidth, Latency, and Packet Size Impact Grid Job Performance by Erik Burrows, Engineering Systems Analyst, Principal, Broadcom

More information

Hardware Evolution in Data Centers

Hardware Evolution in Data Centers Hardware Evolution in Data Centers 2004 2008 2011 2000 2013 2014 Trend towards customization Increase work done per dollar (CapEx + OpEx) Paolo Costa Rethinking the Network Stack for Rack-scale Computers

More information

OpenStack Networking: Where to Next?

OpenStack Networking: Where to Next? WHITE PAPER OpenStack Networking: Where to Next? WHAT IS STRIKING IS THE PERVASIVE USE OF OPEN VSWITCH (OVS), AND AMONG NEUTRON FEATURES, THE STRONG INTEREST IN SOFTWARE- BASED NETWORKING ON THE SERVER,

More information

FPGAs and Networking

FPGAs and Networking FPGAs and Networking Marc Kelly & Richard Hughes-Jones University of Manchester 12th July 27 1 Overview of Work Looking into the usage of FPGA's to directly connect to Ethernet for DAQ readout purposes.

More information

1G Bit TCP+UDP Offload Engine (TOE+UOE) Hardware IP Core

1G Bit TCP+UDP Offload Engine (TOE+UOE) Hardware IP Core Intilop Corporation 4800 Great America Pkwy Ste-231 Santa Clara, CA 95054 Ph: 408-496-0333 Fax:408-496-0444 www.intilop.com 1G bit TCP+UDP Offload Engine MAC + Host_IF (Same PHY Port) INT 2511 (Ultra-Low

More information

10 G Bit TCP+UDP Offload Engine (TOE+UOE) Hardware IP Core

10 G Bit TCP+UDP Offload Engine (TOE+UOE) Hardware IP Core Intilop Corporation 4800 Great America Pkwy Ste-231 Santa Clara, CA 95054 Ph: 408-496-0333 Fax:408-496-0444 www.intilop.com 10G bit TCP+UDP Offload Engine MAC + PCIe + Host_IF (Same PHY Port) INT 25012

More information

HPE SimpliVity. The new powerhouse in hyperconvergence. Boštjan Dolinar HPE. Maribor Lancom

HPE SimpliVity. The new powerhouse in hyperconvergence. Boštjan Dolinar HPE. Maribor Lancom HPE SimpliVity The new powerhouse in hyperconvergence Boštjan Dolinar HPE Maribor Lancom 2.2.2018 Changing requirements drive the need for Hybrid IT Application explosion Hybrid growth 2014 5,500 2015

More information

Speeding up Linux TCP/IP with a Fast Packet I/O Framework

Speeding up Linux TCP/IP with a Fast Packet I/O Framework Speeding up Linux TCP/IP with a Fast Packet I/O Framework Michio Honda Advanced Technology Group, NetApp michio@netapp.com With acknowledge to Kenichi Yasukata, Douglas Santry and Lars Eggert 1 Motivation

More information

CS 261 Fall Mike Lam, Professor. Memory

CS 261 Fall Mike Lam, Professor. Memory CS 261 Fall 2016 Mike Lam, Professor Memory Topics Memory hierarchy overview Storage technologies SRAM DRAM PROM / flash Disk storage Tape and network storage I/O architecture Storage trends Latency comparisons

More information

MIPI : Advanced Driver Assistance System

MIPI : Advanced Driver Assistance System MIPI : Advanced Driver Assistance System application and system development Richard Sproul Charles Qi - Gabriele Zarri (Cadence) esame Conference Sophia Antipolis 05 October 2015 ADAS : some history FORD

More information

Designing Next Generation FS for NVMe and NVMe-oF

Designing Next Generation FS for NVMe and NVMe-oF Designing Next Generation FS for NVMe and NVMe-oF Liran Zvibel CTO, Co-founder Weka.IO @liranzvibel Santa Clara, CA 1 Designing Next Generation FS for NVMe and NVMe-oF Liran Zvibel CTO, Co-founder Weka.IO

More information

Five Key Steps to High-Speed NAND Flash Performance and Reliability

Five Key Steps to High-Speed NAND Flash Performance and Reliability Five Key Steps to High-Speed Flash Performance and Reliability Presenter Bob Pierce Flash Memory Summit 2010 Santa Clara, CA 1 NVM Performance Trend ONFi 2 PCM Toggle ONFi 2 DDR SLC Toggle Performance

More information

Flash Trends: Challenges and Future

Flash Trends: Challenges and Future Flash Trends: Challenges and Future John D. Davis work done at Microsoft Researcher- Silicon Valley in collaboration with Laura Caulfield*, Steve Swanson*, UCSD* 1 My Research Areas of Interest Flash characteristics

More information