Accelerating Data Center Workloads with FPGAs
  Enno Lübbers
  NorCAS 2017, Linköping, Sweden
Intel technologies' features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at intel.com.
Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase. For more complete information about performance and benchmark results, visit http://www.intel.com/performance.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit http://www.intel.com/performance.
Cost reduction scenarios described are intended as examples of how a given Intel-based product, in the specified circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction.
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether referenced data are accurate.
© 2017 Intel Corporation. Intel, the Intel logo, Stratix, Arria, and Xeon are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as property of others.
The Coming Flood of Data
  By 2020:
  Average Internet user: 1.5 GB of traffic per day
  Autonomous vehicles: 4 TB of data per day
  Connected airplane: 5 TB of data per day
  Smart factory: 1 PB of data per day
  Cloud video providers: 750 PB of video per day
  Source: Amalgamation of analyst data and Intel analysis.
"The purpose of computing is insight, not numbers."
  Richard Hamming
The World of Compute Must Accelerate
  Autonomous driving: 1 GB/s real-time processing; sense, understand, react safely in <1 second
  Networking & 5G: 1,000x increase in bandwidth; 100x more connected devices; 1 ms end-to-end round trip delay
  Cloud computing: data on the planet grows from 4.4 ZB (2013) to 44 ZB (2020)
  Need for scale, throughput, compute efficiency and lowest latency
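The cloud computing figures on this slide (4.4 ZB in 2013, 44 ZB in 2020) imply a compound annual growth rate, which is easy to work out as a sanity check. The following is a minimal sketch; the function name is illustrative and not taken from the talk, and it assumes constant year-over-year growth between the two data points.

```python
def compound_annual_growth_rate(start, end, years):
    """Return the constant yearly growth rate that turns `start` into `end`."""
    return (end / start) ** (1.0 / years) - 1.0

# Figures from the slide: 4.4 ZB (2013) and 44 ZB (2020).
rate = compound_annual_growth_rate(4.4, 44.0, 2020 - 2013)
print(f"Implied growth: {rate:.1%} per year")
```

Under that assumption, a tenfold increase over seven years corresponds to roughly 39% growth per year.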
Acceleration of compute means heterogeneous compute
  Dedicated accelerators for maximum compute efficiency
  Flexible accelerators for maximum compute flexibility
  Shift towards software-defined hardware
  Diagram labels: Host CPU; Dedicated Accelerators (ASSP/ASIC); Flexible Accelerators (FPGA)
FPGAs critical to heterogeneous architectures
  Delivering the performance of hardware with the programmability of software
  Flexible, reprogrammable
  Inherently parallel
  Low latency
  High performance
  Energy efficient
FPGAs Can Help: Target Applications
  Data analytics: >10X performance improvement; 40% TCO reduction
  Artificial intelligence: 5X power reduction; flexible precision types
  Video transcoding: 2X increase in video performance; 30% reduction in power
  Cyber security: >1000X higher operations/watt
  Financial trading: sub-microsecond latencies
  Genomics: 2-3X GATK application performance
Solving Real-World Problems: Database Acceleration
  Real-time data analytics: 5X+ faster
  Traditional data warehousing: 2X+
  Storage: 3X+
  Test configuration: Supermicro* SuperServer 2028U-TR4+ with Super X10DRU-i+ mainboard, 2X Intel Xeon E5-2695 v4 CPUs, 8X Samsung* 32GB DDR4-2400 ECC RAM.
  Note: This is SQL to relational database, not SQL to semi/unstructured data. TPC-DS benchmark used for data warehousing tests.
  Tests measure performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
Solving Real-World Problems: Genomics Sequencing
  PairHMM algorithm speedup: 50X
  Overall pipeline speedup: 1.2X
  Test configuration: Intel Xeon processor E5-2699 v4 at 2.20 GHz, 2 sockets, 22 cores/socket, 256 GB RAM, 2 TB Intel SSD DC P3700, Intel Arria 10 GX Development Kit, compared to Intel Xeon processor E5-2699 v4 at 2.20 GHz with Advanced Vector Extensions (AVX), 2 sockets, 22 cores/socket, 256 GB RAM, 2 TB Intel SSD DC P3700.
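The gap between the 50X kernel speedup and the 1.2X pipeline speedup is what Amdahl's law predicts when the accelerated kernel accounts for only part of the run time. The following is a minimal sketch, assuming both figures describe the same run and that nothing else in the pipeline changed; the function names are illustrative and not taken from the talk.

```python
def overall_speedup(fraction, kernel_speedup):
    """Amdahl's law: speedup of the whole job when `fraction` of the
    original run time is accelerated by `kernel_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / kernel_speedup)

def accelerated_fraction(overall, kernel_speedup):
    """Invert Amdahl's law to recover the accelerated share of run time."""
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / kernel_speedup)

# Figures from the slide: 50X PairHMM speedup, 1.2X overall pipeline speedup.
share = accelerated_fraction(1.2, 50.0)
print(f"PairHMM share of original run time: {share:.1%}")
print(f"Upper bound with an infinitely fast kernel: {1.0 / (1.0 - share):.2f}X")
```

Under these assumptions PairHMM would account for about 17% of the original run time, so even an infinitely fast kernel would leave the pipeline speedup close to 1.2X. Further gains would have to come from accelerating other pipeline stages.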
Solving Real-World Problems: Storage
  NVMe over Fabric: 57-72% lower latency
  Test configuration: Quanta D51B with Intel Xeon processor E5-2600 v3, Attala Systems Development Host NVMe Adapter with Intel Arria 10 FPGA, Intel SSD DC P3700 STD, compared to Quanta D51B with Intel Xeon processor E5-2600 v3, Mellanox ConnectX*-4 Lx EN RNIC, Intel SSD DC P3700 STD.
Autonomous Driving Acceleration
  Acceleration of image and sensor processing
  ASSP accelerates vision processing
  FPGA accelerates fusion, machine learning and security
  Diagram labels: CPU, ASSP, FPGA
Smart NIC and Cloud Acceleration
  Acceleration of networking, compute and storage
  NIC ASIC accelerates networking functions
  FPGA accelerates compute such as transcoding, encryption and custom networking functions
  Diagram labels: Host, CPU, NIC ASIC, FPGA, Smart NIC, Network
Intel Xeon + Integrated FPGA strengths:
  Intel Xeon socket-compatible
  High Intel Xeon-to-FPGA bandwidth
  FPGA coherent with Intel Xeon memory
  Very low Intel Xeon-to-FPGA latency
Intel Xeon + Discrete FPGA strengths:
  FPGA size choice
  Provides processor-pairing choice
  Compatible with standard motherboard
  Compatible with 1RU rack server
  Dedicated memory (Arria 10 board)
Seamless application migration
Same virtualization framework, drivers, libraries, and development tools
Portfolio choice, consistent development experience and investment protection
Diagram labels:
  Intel Xeon + Integrated FPGA: Xeon Processor, FPGA, PCIe, UPI, Accelerators
  Intel Xeon + Discrete FPGA: Xeon Processor, PCIe, FPGA, Accelerators
https://www.altera.com/solutions/acceleration-hub/platforms.html
https://www.altera.com/pac
Software Developers are the New FPGA Developers
  "I don't speak FPGA! What is the programming model, and where are the compilers, libraries and tools I am used to?"
Diagram labels (solution stack):
  Application Software
  Software Development Kits
  Reference Platforms
  Integrated Frameworks and Tools
  Language Support, Compilers and Libraries/IP
  Platform Development Tools
  Audience: SW Developers, HW Developers
  Ecosystem: Open Source, Community, Partners
Diagram labels (developer roles and tools):
  Spectrum: Software Developer to Hardware Developer
  Roles: Software Developer, Algorithm Designer, IP Library Developer, HDL Designer
  Tools: Intel SoC FPGA Embedded Design Suite (EDS), Intel FPGA SDK for OpenCL, DSP Builder for Intel FPGAs, Intel HLS Compiler, Intel Quartus Prime Design Software
Gap: Creating full-stack accelerated applications on FPGA can be difficult and time consuming
Diagram labels (OpenCL-based stack):
  User Application
  OpenCL API
  OpenCL Runtime
  Driver
  Board Support Package (BSP)
  OpenCL SDK
  Software Frameworks; Library Infrastructure; Library Kernels; Compute Primitives; Infrastructure Primitives; User Design
  Provides kernel abstraction; orchestration and accelerator management
  Arrows: Increase Abstraction; Increase User Base
Fast hot-swapping of accelerator functions
Accessible from virtual machines and containers
Support for leading cloud orchestrators
Stack layers shown:
  Orchestration / Rack Level Management
  User Applications
  Frameworks & SDKs
  Acceleration Libraries
  Platform Development Tools
  Open Programmable Acceleration Engine (OPAE)
  FPGA Interface Manager (FIM)
  Accelerator Functions
  Intel Xeon Processor with FPGA Platforms
Discovery, acquisition, access, and management of FPGA resources
Thin abstraction to provide portability across platform variants and generations
Simple C API plus optional language bindings
Open-source (BSD / GPLv2): https://01.org/opae
Stack layers shown:
  User Applications, Acceleration Libraries, Frameworks & SDKs
  OPAE Language Bindings (OPAE C++ API, OPAE Python API)
  OPAE C API (libopae-c API library)
  FPGA Driver Interface
  Intel FPGA Driver
Diagram labels (deep learning stack):
  Intel Nervana DL Studio
  SDKs
  Deep Learning Framework
  Intel Deep Learning Deployment Toolkit
  Topology Deployment Tool
  Deep-Learning Inference Library
  DLA run-time
  OpenCL run-time
  PCIe driver
  Annotations: C++ deep learning accelerator API; API for data transfers and kernel synchronization; low-level device communication
Public and Private Cloud
  Users launch a workload through FPGA-enabled orchestration software
  Accelerator functions: end-user developed, Intel-developed, 3rd-party developed
  Examples: machine learning, encryption, compression, big data analytics, NFV, vSwitch
  Resource pool: storage, network, compute
  Diagram labels: VM, AF
Growing the FPGA-Enabled Ecosystem
  IP and solutions: extensive online catalog of accelerator functions, boards, and reference designs; broad partner network; Intel Cloud, Networking & Storage Builder programs
  Developer community: AI Academy; Intel Developer Zone (IDZ); Rocketboards.org
  Universities: reaching over 200,000 students per year with FPGA publications, workshops and hands-on research labs
  Committed to Open Source vision
The flood of data and new user experiences drive up compute needs
FPGAs are critical for acceleration in heterogeneous compute platforms
Programming models must evolve to enable FPGAs for the software developer
Intel offers a comprehensive solutions stack including FPGA-optimized libraries, compilers, tools, frameworks and SDK integration, and an FPGA-enabled ecosystem
Links:
  Intel FPGA Acceleration Hub: https://www.altera.com/solutions/acceleration-hub/
  Open Programmable Acceleration Engine: http://01.org/opae and http://github.com/opae
  Intel Hardware Accelerator Research Program: https://software.intel.com/en-us/hardwareaccelerator-research-program