Gateware Defined Networking (GDN) for Ultra Low Latency Trading and Compliance

Gateware Defined Networking (GDN) for Ultra Low Latency Trading and Compliance STAC Summit: Panel: FPGA for trading today: December 2015 John W. Lockwood, PhD, CEO Algo-Logic Systems, Inc. JWLockwd@algo-logic.com (408) 707-3746 2015 Algo-Logic Systems Inc., All rights reserved. STAC FPGA Panel http://algo-logic.com

GDN Powers Algo-Logic IP Cores, Pre-built FPGA Applications, and Systems GDN Gateware Defined Networking Accelerated Server IP CORE FPGA Gateware HDD SSD NIC+FPGA CPU Cores CPU 10G 40G 100 GE Low Latency MAC,TCP, Protocol Parsers Order Book cores Pre-Programmed apps in multiple FPGA vendor devices Pre-Programmed apps in multiple FPGA Cards Integrated Switch Solutions Integrated Server Systems Data Center Deployments to Co-Location Facility 2015 Algo-Logic Systems Inc., All rights reserved. STAC FPGA Panel 2

Algo-Logic s Family of Accelerated Finance Applications Tick-to-Trade System Low Latency Library Full Order Book Low Latency TCP 76 ns MAC to Application 10GE PHY/MAC 89 ns Round-trip latency Market Data Filter Protocol Parsers All major exchanges Accelerated Server FPGA HDD SSD CPU Cores CPU 10G 40G 100 GE Algorithms in Logic: All apps run in FPGA Not STAC Benchmarks 2015 Algo-Logic Systems Inc., All rights reserved. STAC FPGA Panel 3

Algo-Logic s Tick-to-Trade System: Full Offload to FPGA System Software Your control/gui interface(s) running on your server Your Unique Design Requirements In-House integration & API customizations GDN* Design Customization Areas Top of Order- Book Info Orders, Symbols, Trigger criteria etc. Heartbeats, echo info, stats & status Execution Reports & logs * Gateware Defined Networking (GDN) FPGA Card FPGA interfaces + API customizations 10GE multicast data Algo- Logic ULL PHY+ MAC IP Customizations Algo-Logic Market Data Filter Module UDP Parser IP Customizations Algo-Logic Protocol Parsing Libraries IP Customizations Algo-Logic Full Order- Book Processing Filtered Trade events & Top-of- Book data Your Trading Logic, Algorithms, Order Criteria & triggers Risk Checks module Inject market Orders IP Customizations Algo-Logic 76-nanosecond TCP/UDP Endpoint Algo- Logic ULL PHY+ MAC Orders to Exchange(s) Execution Reports from Exchange(s) (at your co-location site) Algo-Logic Systems GDN ULL Trading Solutions are sub-microsecond 2015 Algo-Logic Systems Inc., All rights reserved. STAC FPGA Panel 4 Traditional Legacy Software Trading Systems at 20 to 50 microseconds

Algo-Logic s Key Value Store (KVS) Examples: Directory Key Value Company Phone # Algo-Logic (408) 707-3740 Key/Value Store (KVS) Simplifies implementation of large-scale distributed computation algorithms Data Center Servers exchanges data over standard Ethernet Forwarding Tables Data Deduplication IP Address Interface : MAC Address 204.2.34.5 Eth6 : 02:33:29:F2:AB:CC Content Hash Storage Block ID XYZ 948830038411 Order ID Symbol, Side, Price Stock Trading ATY11217911101 AAPL, B, 126.75 Virtex Edge List Graph Search v140 v201, v206, v225 Challenges Operating System delays packets and limits throughput Per-core processing inefficient at high-speed packet processing Solutions Bypass kernel bypass with DPDK Offload of packet processing with FPGA 2015 Algo-Logic Systems Inc., All rights reserved. STAC FPGA Panel 5

Implementation of KVS with Socket I/O, Kernel Bypass, and GDN in FPGA Benchmark same application Key/Value Store (KVS) Running on the same PC Intel i7-4770k CPU, 82598 NIC, and Altera Stratix V A7 FPGA With three different implementations Socket I/O, Kernel bypass with DPDK, FPGA OCSM LEGEND Data Transfer = 10g Ethernet Traditional Socket I/O Intel 10G NIC Kernel Driver Message Process Kernel Bypass with DPDK Algo-Logic software on Intel 82598 10GE NIC and Core i7-4770k CPU Receive Queue Dequeue GDN in FPGA Enqueue OCSM LEGEND Control Handoff = 10g Ethernet Intel 82598 DPDK Supported NIC Store Dequeue Message Buffer Message Process Note: Message read once into CPU Cache Response Generation OCSM 10g Ethernet Parser Modifier REQUEST GENERATOR OCSM Header Identifier RESPONSE GENERATOR OCSM Header Reconstruct Key/Value Extractor Key/Value Search Response Decoder Exact Match Search Engine (EMSE) Data Transfer = Transmit Queue Enqueue Algo-Logic gateware Algo-Logic software on Nallatech P385 with on Intel 82598 10GE NIC Altera Stratix V A7 FPGA and Core i7-4770k CPU 2015 Algo-Logic Systems Inc., All rights reserved. STAC FPGA Panel 6

Measured Latency, Throughput, and Power Results FPGA PHY MAC GDN-Traffic Classifier Associative Rule-Match CAM Key Extractor Parser Flow or ACL Target Queues MACs PHYs All Datapaths Summary Latency (µseconds) Tested Throughput (CSMs/sec) Power (µjoules/csm) KVS in Software Sockets 41.54 4.0 11 KVS in DPDK KVS in FPGA Rack of Search Servers Additional KVS Servers Provision Controller UPS Power 10G 40G DPDK 6.434 16 6.6 RTL 0.467 15 0.52 All Datapaths Summary Latency (µseconds) Maximum Throughput (CSMs/sec) Power (µjoules/csm) GDN vs. Sockets 88x less 13x 21x less GDN vs. DPDK 14x less 3.2x 13x less Advance Results: http://algo-logic.com/kvs-whitepaper Not STAC Benchmarks 2015 Algo-Logic Systems Inc., All rights reserved. STAC FPGA Panel 7

Tighter Spread = Less Jitter Peercentage of Observed s Percentage of s Observed [%] KVS Latency in FPGA, DPDK, and Sockets 50.00% 45.00% 40.00% 35.00% Latency Comparison 100k packets, 1 OCSM per packet, 1k pps Altera Stratix V RTL Average: 0.467µs KVS in FPGA: Best Latency, No Jitter 0.70% 0.60% KVS in Software Worst Latency Worst Jitter Socket Implementation Latency Distribution with One OCSM/ Intel i7 Average: 41.54µs 30.00% 25.00% KVS in DPDK: Lowers Latency, Some Jitter 0.50% 0.40% 0.30% 0.20% Sockets RTL Sockets DPDK 20.00% 0.10% 15.00% 10.00% 5.00% DPDK Average: 6.29µs 0.00% 38.0 39.0 40.0 41.0 42.0 43.0 44.0 45.0 46.0 47.0 48.0 Latency Distribution [µs] Sockets Average: 41.40µs Advance Results: http://algologic.com/kvswhitepaper 0.00% 0 5.5 11 16.5 22 27.5 33 38.5 44 Latency Distribution [µs] Lowest Lower Latency = Faster Response Not STAC Benchmarks 2015 Algo-Logic Systems Inc., All rights reserved. STAC FPGA Panel 8

Key Results: Gateware Defined Networking Gateware Defined Networking (GDN) Lowers Latency 7x to 45x over optimized DPDK and traditional Linux networking software Increases Throughput 3x to 13x improvement in Throughput / Server Reduces Power 13x to 21x less Power / Server Advance Results: http://algo-logic.com/kvs-whitepaper Not STAC Benchmarks 2015 Algo-Logic Systems Inc., All rights reserved. STAC FPGA Panel 9

Thank You! Algo-Logic Systems, Inc. Phone: (408) 707-3740 Web: http://algo-logic.com Email: info@algo-logic.com Corporate Headquarters 2255-D Martin Ave Santa Clara, CA 95050 2015 Algo-Logic Systems Inc., All rights reserved. STAC FPGA Panel 11