Gateware Defined Networking (GDN) for Ultra Low Latency Trading and Compliance


1 Gateware Defined Networking (GDN) for Ultra Low Latency Trading and Compliance
STAC Summit, Panel: "FPGA for Trading Today", December 2015
John W. Lockwood, PhD, CEO, Algo-Logic Systems, Inc. (408)
Algo-Logic Systems Inc., All rights reserved. STAC FPGA Panel

2 GDN Powers Algo-Logic IP Cores, Pre-built FPGA Applications, and Systems
GDN: Gateware Defined Networking
Diagram: accelerated server (FPGA gateware, NIC+FPGA, CPU cores, HDD, SSD) with 10G, 40G, and 100 GE links
IP cores: low latency MAC, TCP, protocol parsers, order book cores
Pre-programmed apps in multiple FPGA vendor devices
Pre-programmed apps in multiple FPGA cards
Integrated switch solutions
Integrated server systems
Data center deployments to co-location facilities


3 Algo-Logic's Family of Accelerated Finance Applications
Tick-to-Trade System
Low Latency Library
Full Order Book
Low Latency TCP: 76 ns MAC to application
10GE PHY/MAC: 89 ns round-trip latency
Market Data Filter
Protocol Parsers: all major exchanges
Diagram: accelerated server (FPGA, CPU cores, HDD, SSD) with 10G, 40G, and 100 GE links
Algorithms in Logic: all apps run in the FPGA
Not STAC Benchmarks

4 Algo-Logic's Tick-to-Trade System: Full Offload to FPGA
System software (your server): your control/GUI interface(s), your unique design requirements, in-house integration and API customizations.
Exchanged between the server and the FPGA card (FPGA interfaces + API customizations): top-of-order-book info; orders, symbols, trigger criteria, etc.; heartbeats, echo info, stats and status; execution reports and logs.
FPGA card (at your co-location site):
Receive path: 10GE multicast market data → Algo-Logic ULL PHY+MAC → Algo-Logic Market Data Filter module (UDP parser) → Algo-Logic protocol parsing libraries → Algo-Logic full order-book processing → filtered trade events and top-of-book data.
Decision: your trading logic, algorithms, order criteria and triggers → risk checks module → inject market orders.
Transmit path: Algo-Logic 76-nanosecond TCP/UDP endpoint → Algo-Logic ULL PHY+MAC → orders to exchange(s); execution reports return from exchange(s).
Several Algo-Logic stages accept IP customizations (the GDN* design customization areas).
* Gateware Defined Networking (GDN)
Algo-Logic Systems GDN ULL trading solutions are sub-microsecond; traditional legacy software trading systems run at 20 to 50 microseconds.
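The receive, decide, and transmit flow of slide 4 can be illustrated in software. The Python sketch below only shows the order of the stages (top-of-book update, trigger, risk check, order out). The message fields, the buy-below-a-limit trading rule, and the size and notional risk limits are all hypothetical, and in the GDN system each stage is FPGA logic rather than software.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TopOfBook:
    # Prices are integer ticks so comparisons are exact (no float rounding).
    bid_px: Optional[int] = None
    bid_qty: int = 0
    ask_px: Optional[int] = None
    ask_qty: int = 0


@dataclass
class Order:
    symbol: str
    side: str  # "B" = buy, "S" = sell
    price: int
    qty: int


def update_book(book: TopOfBook, side: str, price: int, qty: int) -> None:
    """Apply a (hypothetical) parsed top-of-book update to the book."""
    if side == "B":
        book.bid_px, book.bid_qty = price, qty
    else:
        book.ask_px, book.ask_qty = price, qty


def trigger(symbol: str, book: TopOfBook, limit_px: int, want_qty: int) -> Optional[Order]:
    """Hypothetical trading rule: buy when the best ask is at or below limit_px."""
    if book.ask_px is not None and book.ask_qty > 0 and book.ask_px <= limit_px:
        return Order(symbol, "B", book.ask_px, min(want_qty, book.ask_qty))
    return None


def passes_risk(order: Order, max_qty: int, max_notional: int) -> bool:
    """Hypothetical pre-trade risk check: cap order size and notional value."""
    return order.qty <= max_qty and order.price * order.qty <= max_notional


def tick_to_trade(symbol: str, book: TopOfBook, side: str, price: int, qty: int, *,
                  limit_px: int, want_qty: int, max_qty: int,
                  max_notional: int) -> Optional[Order]:
    """One market-data event in, at most one risk-checked order out."""
    update_book(book, side, price, qty)
    order = trigger(symbol, book, limit_px, want_qty)
    if order is not None and passes_risk(order, max_qty, max_notional):
        return order  # the next stage would encode it and send it via the TCP endpoint
    return None
```

For example, `tick_to_trade("AAPL", TopOfBook(), "S", 10050, 300, limit_px=10060, want_qty=100, max_qty=500, max_notional=2_000_000)` returns `Order(symbol='AAPL', side='B', price=10050, qty=100)`. Keeping each stage as a separate small function mirrors the pipeline on the slide, where the risk check sits between the trading logic and order injection.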

5 Algo-Logic's Key Value Store (KVS)
A key/value store (KVS) simplifies the implementation of large-scale distributed computation algorithms; data center servers exchange data over standard Ethernet.
Examples:
Application | Key | Value | Example key | Example value
Directory | Company | Phone # | Algo-Logic | (408)
Forwarding tables | IP address | Interface : MAC address | | Eth6 : 02:33:29:F2:AB:CC
Data deduplication | Content hash | Storage block ID | | XYZ
Stock trading | Order ID | Symbol, side, price | ATY | AAPL, B
Graph search | Vertex | Edge list | v140 | v201, v206, v225
Challenges: the operating system delays packets and limits throughput; per-core processing is inefficient for high-speed packet processing.
Solutions: kernel bypass with DPDK; offload of packet processing to an FPGA.
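Every example on slide 5 is an exact-match lookup: a full key maps to one value. One way to build such a table with a bounded worst-case lookup time is open addressing with a capped probe count, sketched below in Python. The slides do not describe how Algo-Logic's table is organized, so the class name, the linear-probing scheme, and the `capacity` and `max_probes` parameters are assumptions for illustration.

```python
from typing import Hashable, List, Optional, Tuple


class ExactMatchTable:
    """Fixed-capacity, open-addressing hash table with a bounded probe count.

    Bounding the number of probes bounds lookup time, which matters when
    lookups must fit a fixed hardware pipeline. This is an illustrative
    design choice, not a description of Algo-Logic's search engine.
    Deletion is not supported, which keeps the probe logic simple.
    """

    def __init__(self, capacity: int = 1024, max_probes: int = 4) -> None:
        if capacity < 1 or max_probes < 1:
            raise ValueError("capacity and max_probes must be positive")
        self.capacity = capacity
        self.max_probes = max_probes
        self.slots: List[Optional[Tuple[Hashable, object]]] = [None] * capacity

    def _slot(self, key: Hashable, probe: int) -> int:
        # Linear probing: try consecutive slots after the key's home slot.
        return (hash(key) + probe) % self.capacity

    def insert(self, key: Hashable, value: object) -> bool:
        """Insert or overwrite; returns False if all probed slots hold other keys."""
        for probe in range(self.max_probes):
            i = self._slot(key, probe)
            entry = self.slots[i]
            if entry is None or entry[0] == key:
                self.slots[i] = (key, value)
                return True
        return False

    def lookup(self, key: Hashable) -> Optional[object]:
        """Return the stored value, or None on a miss (exact key match only)."""
        for probe in range(self.max_probes):
            entry = self.slots[self._slot(key, probe)]
            if entry is None:
                return None  # no deletions, so an empty slot ends the search
            if entry[0] == key:
                return entry[1]
        return None
```

Using the graph-search row of the table: after `table = ExactMatchTable()` and `table.insert("v140", ("v201", "v206", "v225"))`, `table.lookup("v140")` returns the edge list and any unknown vertex returns `None`. The trade-off of capping probes is that an insert can fail when its probe window is full, so a real design needs an overflow policy.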


6 Implementation of KVS with Socket I/O, Kernel Bypass, and GDN in FPGA
Benchmark: the same Key/Value Store (KVS) application, running on the same PC (Intel Core i7-4770K CPU, Intel 82598 NIC, and Altera Stratix V A7 FPGA), with three different implementations:
Traditional socket I/O: Algo-Logic software on an Intel 10G NIC and Core i7-4770K CPU; packets pass from the NIC through the kernel driver to message processing.
Kernel bypass with DPDK: Algo-Logic software on an Intel DPDK-supported NIC and Core i7-4770K CPU; receive queue dequeue → message buffer store → message processing → response generation → transmit queue enqueue. Note: each message is read once into the CPU cache.
GDN in FPGA: Algo-Logic gateware on a Nallatech P385 card with an Altera Stratix V A7 FPGA; 10g Ethernet in → parser (header identifier, key/value extractor) → key/value search in the Exact Match Search Engine (EMSE) → modifier (response generator, header reconstruct) → 10g Ethernet out. The diagram also shows an OCSM request generator and response decoder.
Legend: data transfer and control handoff; all links are 10g Ethernet.
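The GDN datapath on slide 6 processes each request in fixed stages: identify the header, extract the key and value, perform an exact-match search, and generate a response. The Python sketch below walks through the same stages for one request. The wire format (a 1-byte opcode and two 2-byte big-endian lengths, followed by key and value bytes), the status codes, and the helper names are hypothetical; the slides do not define the OCSM message layout.

```python
import struct
from typing import Dict, Tuple

# Hypothetical wire format (an assumption, not Algo-Logic's OCSM format):
# opcode (1 byte), key length (2 bytes), value length (2 bytes), big-endian,
# followed by the key bytes and the value bytes.
HEADER = struct.Struct("!BHH")
RESP_HEADER = struct.Struct("!BH")
OP_GET, OP_PUT = 0, 1
STATUS_OK, STATUS_MISS, STATUS_BAD = 0, 1, 2


def parse_request(payload: bytes) -> Tuple[int, bytes, bytes]:
    """Header identification and key/value extraction."""
    if len(payload) < HEADER.size:
        raise ValueError("truncated header")
    op, klen, vlen = HEADER.unpack_from(payload, 0)
    key_end = HEADER.size + klen
    val_end = key_end + vlen
    if len(payload) < val_end:
        raise ValueError("truncated body")
    return op, payload[HEADER.size:key_end], payload[key_end:val_end]


def build_response(status: int, value: bytes = b"") -> bytes:
    """Response generation: status byte, value length, value bytes."""
    return RESP_HEADER.pack(status, len(value)) + value


def make_request(op: int, key: bytes, value: bytes = b"") -> bytes:
    """Client-side helper that encodes a request in the format above."""
    return HEADER.pack(op, len(key), len(value)) + key + value


def handle(payload: bytes, store: Dict[bytes, bytes]) -> bytes:
    """Parse one request, perform the exact-match operation, build the reply."""
    try:
        op, key, value = parse_request(payload)
    except ValueError:
        return build_response(STATUS_BAD)
    if op == OP_PUT:
        store[key] = value
        return build_response(STATUS_OK)
    if op == OP_GET:
        hit = store.get(key)  # exact match on the full key
        if hit is None:
            return build_response(STATUS_MISS)
        return build_response(STATUS_OK, hit)
    return build_response(STATUS_BAD)
```

A store followed by a lookup, `handle(make_request(OP_PUT, b"ATY", b"AAPL,B"), store)` then `handle(make_request(OP_GET, b"ATY"), store)`, returns `b"\x00\x00\x06AAPL,B"`. In software every stage runs sequentially on a CPU core; the slide's point is that the FPGA implements these stages as a hardware pipeline directly behind the Ethernet MAC.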

7 Measured Latency, Throughput, and Power Results
Diagram: FPGA datapath (PHY, MAC, GDN traffic classifier with parser, key extractor, and associative rule-match CAM, flow or ACL target queues, MACs, PHYs); rack of search servers, additional KVS servers, provision controller, and UPS power, with 10G and 40G links.
Implementations compared: KVS in software sockets, KVS in DPDK, KVS in FPGA (RTL).
Summary, all datapaths:
Comparison | Latency (µs) | Maximum throughput (CSMs/sec) | Power (µJ/CSM)
GDN vs. sockets | 88x less | 13x higher | 21x less
GDN vs. DPDK | 14x less | 3.2x higher | 13x less
Advance Results: Not STAC Benchmarks
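The relative figures in this table follow from simple arithmetic: energy per operation (µJ/CSM) is power divided by throughput, and each "Nx less" is a baseline-to-GDN ratio. The Python sketch below assumes power in watts and throughput in CSMs per second, and recomputes the latency ratios from the average latencies reported on slide 8 (41.40 µs for sockets, 6.29 µs for DPDK, 0.467 µs for the FPGA). The function names are illustrative.

```python
def energy_per_csm_uj(power_watts: float, csm_per_sec: float) -> float:
    """Energy per operation in microjoules: watts (J/s) divided by operations per second."""
    return power_watts * 1e6 / csm_per_sec


def times_better(baseline: float, candidate: float) -> float:
    """Ratio baseline / candidate, e.g. how many times lower a latency is."""
    return baseline / candidate


# Average KVS latencies reported on slide 8, in microseconds.
AVG_LATENCY_US = {"sockets": 41.40, "dpdk": 6.29, "fpga": 0.467}

for baseline in ("sockets", "dpdk"):
    ratio = times_better(AVG_LATENCY_US[baseline], AVG_LATENCY_US["fpga"])
    print(f"FPGA vs {baseline}: {ratio:.1f}x lower average latency")
```

This prints ratios of about 88.7x (sockets) and 13.5x (DPDK), which agree with the 88x and 14x latency figures in the table. The slides do not give the absolute power or throughput values, so `energy_per_csm_uj` is shown only as the formula behind the µJ/CSM column.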

8 Tighter Spread = Less Jitter
Figure: KVS latency in FPGA, DPDK, and sockets (percentage of observations vs. latency distribution [µs]); 100k packets, 1 OCSM per packet, 1k pps.
KVS in FPGA (Altera Stratix V, RTL): best latency, no jitter; average 0.467 µs.
KVS in DPDK: lower latency, some jitter; average 6.29 µs.
KVS in software (sockets): worst latency, worst jitter; average 41.40 µs. Inset: socket implementation latency distribution with one OCSM, Intel i7, average 41.54 µs.
Lower latency = faster response.
Advance Results: Not STAC Benchmarks
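Slide 8 compares jitter visually, by the width of each latency histogram. The slides do not say which numeric metric was used; a common way to quantify spread, sketched below in Python as an assumption, is the standard deviation together with the gap between the 99th percentile and the median. The function name and the chosen percentiles are illustrative.

```python
import statistics
from typing import Dict, Sequence


def latency_summary(samples_us: Sequence[float]) -> Dict[str, float]:
    """Summarize a latency distribution; the spread metrics quantify jitter."""
    if len(samples_us) < 2:
        raise ValueError("need at least two samples")
    ordered = sorted(samples_us)
    # 99 cut points; index 49 is the median, index 98 the 99th percentile.
    pct = statistics.quantiles(ordered, n=100, method="inclusive")
    return {
        "mean": statistics.fmean(ordered),
        "stdev": statistics.stdev(ordered),
        "p50": pct[49],
        "p99": pct[98],
        "spread_p99_p50": pct[98] - pct[49],
        "min": ordered[0],
        "max": ordered[-1],
    }
```

Passing the per-request latencies of each implementation (100k samples in the slide's test) yields one row of comparable numbers per implementation; a narrow distribution such as the FPGA's shows up as a small `stdev` and a small `spread_p99_p50`. Requires Python 3.8 or later for `statistics.quantiles` and `statistics.fmean`.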

9 Key Results: Gateware Defined Networking
Gateware Defined Networking (GDN):
Lowers latency: 7x to 45x over optimized DPDK and traditional Linux networking software
Increases throughput: 3x to 13x improvement in throughput per server
Reduces power: 13x to 21x less power per server
Advance Results: Not STAC Benchmarks
