Accelerated Programmable Services. FPGA and GPU augmented infrastructure.

Alisha Hunter

1 Accelerated Programmable Services. FPGA and GPU augmented infrastructure

2 Future State Trading Architecture: FSTA Quad Chart

Here and Now
- Market data
  - 10GbE feeds common, moving to 40GbE then 100GbE
  - Software feed handlers can barely handle 10GbE
- Risk/Derivatives/Price Generation
  - CUDA farms offer 75% cost reduction but have poor I/O
- FPGA
  - Promises a tenth of the power (25W device)
  - 100GbE handled with ease

Technology SWOT
- Strengths: faster, less power, less cost
- Weaknesses: complex (long dev/testing cycle), litany of failures
- Opportunities: enhanced performance, less energy use
- Threats: software engineers

Future State Ideas
- Hybrid hardware/software systems
- GPU/FPGA/Multicore/SoC/ARM/Phi
- Embedded strategies (Terra/Lua/OpenCL)
- ePCIe/NTB
- Synthetic fill-rate graded trading venues
- Analogue Trading
- Geodesic Trading
- Trade Notarisation
- Ultra-high accuracy time
- Binary market data

Hatstand Capabilities and Services
- APS component catalogue
- Bespoke Engineering
  - OpenCL, CUDA, FPGA (ImpulseC, VHDL, Verilog, HDL)
  - HFT: lock free, low latency
- Strategy and Architecture
- CVA and Monte Carlo on GPU solution from Xcelerit
- Strong industry partnerships and alliances

3 Software Techniques Common in HFT
- Atomics
- Lock-free (Shavit et al., Fraser et al.)
- Disruptor (n-m queue with back pressure; Thompson et al.)
- Sinks, Sources and Actors (Xcelerit)
- Work-stealing queue
- Asynchronous threading with user space locking
- JIT data path techniques
- Intel Intrinsics
  - Streaming SIMD Extensions (SSE)
  - ASCII/int conversions (x3 speed up)
  - Cache management (prefetching)
- Software Transactional Memory
- Distributed Order Management
- Memory and Cache Management (Agner Fog)
  - Prefaulting (TLB/Huge Pages/mlock)
  - __builtin_expect() (L1 cache misses)
  - False sharing (L2 cache misses)
  - Instruction timing
  - Assembler analytics
  - Hand optimisation
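
As a rough illustration of three items on this slide (atomics, lock-free queues and false sharing), here is a minimal sketch of a single-producer/single-consumer ring buffer. The names SpscRing, try_push and try_pop are hypothetical and do not come from the slides, and the 64-byte cache line is an assumption that holds on typical x86 parts. It is much simpler than the Disruptor or the n-m queues cited above: it supports exactly one producer thread and one consumer thread.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical single-producer/single-consumer ring buffer (illustrative only).
// N must be a power of two so that index wrapping is a cheap bit mask.
template <typename T, std::size_t N>
class SpscRing {
    static_assert(N >= 2 && (N & (N - 1)) == 0, "N must be a power of two");

public:
    // Producer thread only. Returns false when the ring is full, which is
    // the point where a caller would apply back pressure.
    bool try_push(const T& value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == N) return false;                 // full
        buf_[head & (N - 1)] = value;
        head_.store(head + 1, std::memory_order_release);   // publish the slot
        return true;
    }

    // Consumer thread only. Returns false when the ring is empty.
    bool try_pop(T& out) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (head == tail) return false;                     // empty
        out = buf_[tail & (N - 1)];
        tail_.store(tail + 1, std::memory_order_release);   // free the slot
        return true;
    }

private:
    // Assumed 64-byte cache lines: keeping the two counters on separate
    // lines avoids false sharing between producer and consumer cores.
    alignas(64) std::atomic<std::size_t> head_{0};
    alignas(64) std::atomic<std::size_t> tail_{0};
    alignas(64) T buf_[N];
};
```

A feed handler thread would call try_push for each decoded message and a strategy thread would spin on try_pop. Neither side takes a lock or makes a system call, which is the property the slide is pointing at.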

4 HFT Techniques continued
- Operating System
  - Kernel bypass
  - CPU pinning (SCHED_FIFO etc.)
  - Customised kernels/schedulers
  - Data plane hacking (DPDK)
- Networking
  - Multicast IGMP snooping
  - Xorp
  - HSRP/NAT avoidance
  - Firewall/switch bypass
  - Customised NIC drivers
  - Cut-through switches
- NIC Card Techniques
  - Flow steering
  - Receive Side Scaling
  - Ethernet packet access (ef_vi, VMA)
- Hardware Techniques
  - DDI (cache injection)
  - Jitter reduction (platform interrupts analysis: sysjitter/ftq)
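
The jitter analysis mentioned on this slide can be approximated in user space by spinning on a clock and recording any unusually long gap between consecutive readings. Such a gap suggests the thread was interrupted or descheduled. The sketch below is written in the spirit of tools such as sysjitter and ftq but is not their implementation. JitterStats and measure_jitter are hypothetical names, and std::chrono::steady_clock is assumed to be fine-grained enough on the target platform.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical result type for the probe below.
struct JitterStats {
    std::uint64_t samples;        // clock readings taken
    std::uint64_t interruptions;  // gaps longer than the threshold
    std::int64_t  max_gap_ns;     // longest gap between two readings
};

// Spins for `run_for`, reading the clock back to back. A gap above
// `threshold` is counted as an interruption of this thread.
JitterStats measure_jitter(std::chrono::milliseconds run_for,
                           std::chrono::nanoseconds threshold) {
    using clock = std::chrono::steady_clock;
    JitterStats stats{0, 0, 0};
    const auto start = clock::now();
    auto prev = start;
    for (;;) {
        const auto now = clock::now();
        const std::int64_t gap_ns =
            std::chrono::duration_cast<std::chrono::nanoseconds>(now - prev).count();
        ++stats.samples;
        if (gap_ns > stats.max_gap_ns) stats.max_gap_ns = gap_ns;
        if (gap_ns > threshold.count()) ++stats.interruptions;
        prev = now;
        if (now - start >= run_for) break;
    }
    return stats;
}
```

Running the probe on a pinned, isolated core and again on an unpinned one, with a threshold of a few microseconds, gives a first impression of how much the platform itself contributes to latency variation.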

5 FPGA Market Data Feeds
- NASDAQ TotalView-ITCH 4.1 FPGA feed
  - 9KB jumbo frames
  - ~100 ITCH messages per 40 Gbps
  - 125 million bytes per second
  - ~1 million ITCH messages per second (1 message per microsecond)
- Parsing in software with 100% reliability is impossible (even at 10GbE)
  - Minimum server jitter is 3 microseconds
  - Add PCI transfer buffering, OS scheduling, cache misses, TLB misses etc.
  - SSE won't help much
- FPGA based parsing is mandatory for FIX/ITCH messages
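
The argument on this slide is mostly arithmetic, so it can be checked with a few lines of code. The sketch below uses only the slide's own figures (125 million bytes per second, about 1 million messages per second, 3 microseconds of minimum jitter); the function names are hypothetical. It shows that a single 3 microsecond stall already puts a software parser about three messages behind.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kNsPerSec = 1000000000ULL;

// Average time between messages at a given message rate.
constexpr std::uint64_t inter_arrival_ns(std::uint64_t msgs_per_sec) {
    return kNsPerSec / msgs_per_sec;
}

// Average message size implied by a byte rate and a message rate.
constexpr std::uint64_t avg_bytes_per_message(std::uint64_t bytes_per_sec,
                                              std::uint64_t msgs_per_sec) {
    return bytes_per_sec / msgs_per_sec;
}

// Number of messages that arrive while the parser is stalled for stall_ns.
constexpr std::uint64_t messages_during_stall(std::uint64_t msgs_per_sec,
                                              std::uint64_t stall_ns) {
    return msgs_per_sec * stall_ns / kNsPerSec;
}
```

With the slide's numbers, inter_arrival_ns(1000000) is 1000 ns, avg_bytes_per_message(125000000, 1000000) is 125 bytes, and messages_during_stall(1000000, 3000) is 3. Any further delay from PCI transfers, scheduling or cache and TLB misses adds to that backlog.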

6 FPGA Trading Use Cases
- Session management (timed sign-in, re-sign-in, group cancel)
- BGP/IGMP session management and address re-advertisement
- A/B line arbitration
- Simulsend: route diversity for fibre/microwave
- Protocol conversion: FIX/ITCH to binary translation
- Common format conversions: C structure, Protocol Buffers, MessagePack, LBM, Thrift
- Symbol shredding, flow steering, market data QoS, temporal queues
- Multicast emission (rebroadcasting)
- Market map with full depth in user space memory
- Rules engine: risk checks, kill switch
- Crossing engine: deterministic, accurate timestamp
- VWAP/TWAP/volatility/real-time risk
- High-accuracy packet time stamping
- Flow capture (drop copy, flow notarisation)
- Transactional order manager using ePCIe and Non-Transparent Bridge
- Virtualisation: data de-duplication and versioning
- Throttle management
- Templatised trading, TCP offload
- Exchange, network and platform jitter collection and analytics
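
To make one of these use cases concrete, here is a minimal software sketch of A/B line arbitration. It assumes that both redundant lines carry the same stream and that every message has a monotonically increasing sequence number. AbArbiter and Verdict are hypothetical names, and an FPGA implementation would express the same comparison in hardware rather than C++. The sketch forwards a message only when it is the next one expected, so whichever line delivers it first wins and the copy from the other line is dropped.

```cpp
#include <cassert>
#include <cstdint>

// Outcome of offering one message (from either line) to the arbiter.
enum class Verdict {
    Forward,    // next expected sequence number: pass downstream
    Duplicate,  // already forwarded from the other line: drop
    Ahead       // this line skipped a message: wait for the other line
};

// Hypothetical arbiter: no buffering and no gap timeout, for brevity.
class AbArbiter {
public:
    explicit AbArbiter(std::uint64_t first_seq) : next_(first_seq) {}

    Verdict on_message(std::uint64_t seq) {
        if (seq < next_) return Verdict::Duplicate;
        if (seq > next_) return Verdict::Ahead;
        ++next_;
        return Verdict::Forward;
    }

    std::uint64_t next_expected() const { return next_; }

private:
    std::uint64_t next_;  // lowest sequence number not yet forwarded
};
```

If line A loses message 3 and then delivers message 4, the arbiter reports Ahead and waits; line B's copies of 3 and 4 are then forwarded in order. A production design would also buffer out-of-order messages and give up on a gap after a timeout when both lines lose the same message, which this sketch deliberately leaves out.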

7 Hatstand Services
- COTS integration: HFT/FPGA/GPU/Multicore
- Strategy and Architecture
- CVA, IRD and Monte Carlo on GPU solution from Xcelerit
  - 22x+ performance improvement over grid/multicore
- Hybrid hardware/software development
  - SolarFlare FDK
  - Xcelerit SDK
  - Impulse C
  - OpenCL
  - VHDL, Verilog, SystemVerilog
- Bespoke HFT software engineering
- Strong industry partnerships and alliances
  - Altera, Enyx, ImpulseC, Nallatech, SolarFlare AOE

FPGA Augmented ASICs: The Time Has Come

David Riddoch, Steve Pope. Copyright 2012 Solarflare Communications, Inc. All Rights Reserved.

Hardware acceleration is niche (with the obvious exception of graphics