Accelerating Data Center Workloads with FPGAs

Enno Lübbers
NorCAS 2017, Linköping, Sweden

Intel technologies features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at intel.com.
Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase. For more complete information about performance and benchmark results, visit http://www.intel.com/performance.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more complete information visit http://www.intel.com/performance.
Cost reduction scenarios described are intended as examples of how a given Intel-based product, in the specified circumstances and configurations, may affect future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction.
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document. Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm whether referenced data are accurate.
© 2017 Intel Corporation. Intel, the Intel logo, Stratix, Arria, and Xeon are trademarks of Intel Corporation in the U.S. and/or other countries. *Other names and brands may be claimed as property of others.

The Coming Flood of Data (by 2020)
Average Internet user: 1.5 GB of traffic per day
Autonomous vehicles: 4 TB of data per day
Connected airplane: 5 TB of data per day
Smart factory: 1 PB of data per day
Cloud video providers: 750 PB of video per day
Source: Amalgamation of analyst data and Intel analysis.

"The purpose of computing is insight, not numbers." (Richard Hamming)

The World of Compute Must Accelerate
Autonomous driving: 1 GB/s real-time processing; sense, understand, react safely in <1 second
Networking & 5G: 1,000x increase in bandwidth; 100x more connected devices; 1 ms end-to-end round-trip delay
Cloud computing: 4.4 ZB of data on the planet in 2013, 44 ZB in 2020
Need for scale, throughput, compute efficiency and lowest latency
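As a worked example of the data-volume figures on this slide (4.4 ZB in 2013, 44 ZB in 2020), the implied compound annual growth rate can be sketched as follows. The function name `cagr` is illustrative and not from the talk, and the calculation assumes smooth exponential growth between the two data points, which the slide itself does not claim.

```python
import math


def cagr(start, end, years):
    """Compound annual growth rate between two values `years` apart."""
    return (end / start) ** (1.0 / years) - 1.0


# Figures from the slide: 4.4 ZB in 2013, 44 ZB in 2020.
rate = cagr(4.4, 44.0, 2020 - 2013)

# Time for the data volume to double at that constant rate.
doubling_years = math.log(2.0) / math.log(1.0 + rate)

if __name__ == "__main__":
    print(f"implied growth rate: {rate:.1%} per year")
    print(f"doubling time: {doubling_years:.1f} years")
```

Under that assumption, a tenfold increase over seven years corresponds to roughly 39% growth per year, or a doubling about every 2.1 years.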

Acceleration of Compute Means Heterogeneous Compute
Host CPU with dedicated accelerators (ASSP/ASIC) and flexible accelerators (FPGA)
Dedicated accelerators for maximum compute efficiency
Flexible accelerators for maximum compute flexibility
Shift towards software-defined hardware

FPGAs Critical to Heterogeneous Architectures
Delivering the performance of hardware with the programmability of software
Flexible and reprogrammable
Inherently parallel
Low latency
High performance
Energy efficient

FPGAs Can Help: Target Applications
Data analytics: >10x performance improvement; 40% TCO reduction
Artificial intelligence: 5X power reduction; flexible precision types
Video transcoding: 2X increase in video performance; 30% reduction in power
Cyber security: >1000X higher operations/watt
Financial trading: sub-microsecond latencies
Genomics: 2-3X GATK application performance

Solving Real-World Problems: database acceleration 5X+ FASTER REAL-TIME DATA ANALYTICS Traditional Data Warehousing Storage 2X+ 3X+ Test configuration: Supermicro* SuperServer 2028U-TR4+ with Super X10DRU-i+ Mainboard, 2X Intel Xeon E5-2695 v4 CPUs, 8X Samsung* 32GB DDR4-2400 ECC RAM. Note: This is SQL to relational database, not SQL to semi/unstructured data. TPC DS benchmark used for data warehousing tests. Tests measure performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.

Solving Real-World Problems: Genomics Sequencing
50X PairHMM algorithm speedup
1.2X overall pipeline speedup
Test configuration: Intel Xeon processor E5-2699 v4 at 2.20 GHz, 2 sockets, 22 cores/socket, 256 GB RAM, 2 TB Intel SSD DC P3700, Intel Arria 10 GX Development Kit compared to Intel Xeon processor E5-2699 v4 at 2.20 GHz with Advanced Vector Extensions (AVX), 2 sockets, 22 cores/socket, 256 GB RAM, 2 TB Intel SSD DC P3700. Tests measure performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.
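The gap between the 50X kernel speedup and the 1.2X pipeline speedup is what Amdahl's law predicts when the accelerated kernel is a small share of total runtime. The sketch below inverts Amdahl's law to estimate that share. It assumes that both figures refer to the same pipeline run, that PairHMM is the only accelerated stage, and that offload overhead is negligible; none of this is stated on the slide, and the function names are illustrative.

```python
def amdahl_speedup(fraction, kernel_speedup):
    """Overall speedup when `fraction` of the original runtime
    is accelerated by a factor of `kernel_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / kernel_speedup)


def accelerated_fraction(overall_speedup, kernel_speedup):
    """Invert Amdahl's law: share of the original runtime that must
    have been spent in the accelerated kernel."""
    return (1.0 - 1.0 / overall_speedup) / (1.0 - 1.0 / kernel_speedup)


if __name__ == "__main__":
    # Figures from the slide: 50X kernel speedup, 1.2X overall speedup.
    p = accelerated_fraction(1.2, 50.0)
    print(f"implied PairHMM share of original runtime: {p:.1%}")
    print(f"round-trip check: {amdahl_speedup(p, 50.0):.2f}x overall")
```

Under these assumptions PairHMM would account for about 17% of the original runtime, so further kernel speedup alone could barely move the overall figure; the remaining stages dominate.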

Solving Real-World Problems: Storage
NVMe over Fabric: 57-72% lower latency
Test configuration: Quanta D51B with Intel Xeon processor E5-2600 v3, Attala Systems Development Host NVMe Adapter with Intel Arria 10 FPGA, Intel SSD DC P3700 STD, compared to Quanta D51B with Intel Xeon processor E5-2600 v3, Mellanox ConnectX*-4 Lx EN RNIC, Intel SSD DC P3700 STD. Tests measure performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual performance. Consult other sources of information to evaluate performance as you consider your purchase. For more complete information about performance and benchmark results, visit www.intel.com/benchmarks.

Autonomous Driving Acceleration
Acceleration of image and sensor processing
Components shown: CPU, ASSP, FPGA
ASSP accelerates vision processing
FPGA accelerates fusion, machine learning and security

Smart NIC and Cloud Acceleration
Acceleration of networking, compute and storage
Components shown: host CPU; Smart NIC (NIC ASIC, FPGA); network
NIC ASIC accelerates networking functions
FPGA accelerates compute such as transcoding, encryption and custom networking functions

Intel Xeon + Integrated FPGA (FPGA attached via UPI and PCIe)
Strengths:
Intel Xeon socket-compatible
High Intel Xeon-to-FPGA bandwidth
FPGA coherent with Intel Xeon memory
Very low Intel Xeon-to-FPGA latency

Intel Xeon + Discrete FPGA (FPGA attached via PCIe)
Strengths:
FPGA size choice
Provides processor-pairing choice
Compatible with standard motherboard
Compatible with 1RU rack server
Dedicated memory (Arria 10 board)

Common to both:
Seamless application migration
Same virtualization framework, drivers, libraries, and development tools
Portfolio choice, consistent development experience and investment protection

https://www.altera.com/solutions/acceleration-hub/platforms.html
https://www.altera.com/pac

Software Developers Are the New FPGA Developers
"I don't speak FPGA! What is the programming model, and where are the compilers, libraries and tools I am used to?"

Layers as listed on the slide, from SW developers to HW developers:
Application Software
Software Development Kits
Reference Platforms
Integrated Frameworks and Tools
Language Support, Compilers and Libraries/IP
Platform Development Tools
Ecosystem: open source, community, partners

Tools across the spectrum from software developer to hardware developer
Roles: software developer, algorithm designer, IP library developer, HDL designer
Tools:
Intel SoC FPGA Embedded Design Suite (EDS)
Intel FPGA SDK for OpenCL
DSP Builder for Intel FPGAs
Intel HLS Compiler
Intel Quartus Prime Design Software

OpenCL-based stack, as labeled on the slide: User Application, OpenCL API, OpenCL Runtime, OpenCL SDK, Driver, Board Support Package (BSP)
Additional labels: Software Frameworks, Orchestration and Accelerator Management, Library Kernels, Compute Primitives, Infrastructure Primitives
Directions shown: increase abstraction, increase user base
Gap: Creating full-stack accelerated applications on FPGA can be difficult and time consuming

Fast hot-swapping of accelerator functions
Accessible from virtual machines and containers
Support for leading cloud orchestrators
Layers as listed on the slide:
Orchestration / Rack Level Management
User Applications
Frameworks & SDKs
Acceleration Libraries
Platform Development Tools
Open Programmable Acceleration Engine (OPAE)
FPGA Interface Manager (FIM)
Accelerator Functions
Intel Xeon Processor with FPGA Platforms

Discovery, acquisition, access, and management of FPGA resources
Thin abstraction to provide portability across platform variants and generations
Simple C API plus optional language bindings
Layers as listed on the slide:
User Applications, Acceleration Libraries, Frameworks & SDKs
OPAE Language Bindings (OPAE C++ API, OPAE Python API)
OPAE C API (libopae-c API Library)
FPGA Driver Interface
Intel FPGA Driver
Open source (BSD / GPLv2)
https://01.org/opae
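The discovery, acquisition, access and release steps named on this slide can be illustrated with a small in-memory toy model. This is not the OPAE API: every class and function name below (`Accelerator`, `Handle`, `enumerate_accelerators`, `open_accelerator`) is hypothetical, the GUIDs are placeholders, and registers are modeled as a plain dictionary. It only sketches the lifecycle such a thin abstraction typically exposes, assuming exclusive ownership of an opened accelerator.

```python
class Accelerator:
    """Hypothetical stand-in for one accelerator function on an FPGA."""

    def __init__(self, guid):
        self.guid = guid
        self.owner = None      # handle that currently holds the resource
        self.registers = {}    # toy register file: offset -> 64-bit value


class Handle:
    """Grants register access to one acquired accelerator."""

    def __init__(self, accel):
        self._accel = accel

    def _check(self):
        if self._accel is None:
            raise RuntimeError("handle is closed")

    def write64(self, offset, value):
        self._check()
        self._accel.registers[offset] = value & 0xFFFFFFFFFFFFFFFF

    def read64(self, offset):
        self._check()
        return self._accel.registers.get(offset, 0)

    def close(self):
        """Release the accelerator so another client can acquire it."""
        if self._accel is not None:
            self._accel.owner = None
            self._accel = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()


def enumerate_accelerators(pool, guid=None):
    """Discovery: return accelerators matching an optional GUID filter."""
    return [a for a in pool if guid is None or a.guid == guid]


def open_accelerator(accel):
    """Acquisition: take exclusive ownership or fail if already in use."""
    if accel.owner is not None:
        raise RuntimeError("accelerator is already in use")
    handle = Handle(accel)
    accel.owner = handle
    return handle


if __name__ == "__main__":
    pool = [Accelerator("guid-a"), Accelerator("guid-b")]
    match = enumerate_accelerators(pool, guid="guid-b")[0]
    with open_accelerator(match) as h:
        h.write64(0x18, 42)
        print(h.read64(0x18))
```

Modeling acquisition as an explicit, exclusive open step is one way to keep two applications from driving the same accelerator at once; the context manager ensures the resource is released even if the access code raises.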

Deep-learning inference software stack, as listed on the slide:
Intel Nervana DL Studio
SDKs: Deep Learning Framework; Intel Deep Learning Deployment Toolkit
Topology Deployment Tool
Deep-Learning Inference Library
DLA run-time: C++ deep learning accelerator API
OpenCL run-time: API for data transfers and kernel synchronization
PCIe driver: low-level device communication

Public and private cloud users launch a workload
Accelerator functions: end-user developed, Intel-developed, or 3rd-party developed
Examples: machine learning, encryption, compression, big data analytics, NFV, vSwitch
Orchestration software (FPGA-enabled)
Resource pool: storage, network, compute (VM, AF)

Growing the FPGA-Enabled Ecosystem
IP and solutions: extensive online catalog of accelerator functions, boards, and reference designs; broad partner network; Intel Cloud, Networking & Storage Builder programs
Developer community: AI Academy; Intel Developer Zone (IDZ); Rocketboards.org
Universities: reaching over 200,000 students per year with FPGA publications, workshops and hands-on research labs
Committed to open source vision

The flood of data and new user experiences drive up compute needs
FPGAs are critical for acceleration in heterogeneous compute platforms
Programming models must evolve to enable FPGAs for the software developer
Intel offers a comprehensive solutions stack including FPGA-optimized libraries, compilers, tools, frameworks and SDK integration, and an FPGA-enabled ecosystem
Intel FPGA Acceleration Hub: https://www.altera.com/solutions/acceleration-hub/
Open Programmable Acceleration Engine: http://01.org/opae and http://github.com/opae
Intel Hardware Accelerator Research Program: https://software.intel.com/en-us/hardwareaccelerator-research-program