Extending the Power of FPGAs The Journey has Begun Salil Raje Xilinx Corporate Vice President Software and IP Products Development
Agenda The Evolution of FPGAs and FPGA Programming IP-Centric Design with High Level Languages Software Defined Systems
The Evolution of FPGAs and FPGA Programming
The Evolution of Programmable Devices Logic Cells 1M 3D ICs 10K Programmable SoCs FPGAs 100 PLDs 1985 1995 2005 2015 2025
The Progression of FPGA Design Methodology Logic Software Defined Cells 1M IP-Centric with High-level Languages 10K Schematics RTL Programmable SoCs 3D ICs FPGAs 100 PLDs 1985 1995 2005 2015 2025
The Shift in Developer Personas Application Developer Logic Software Defined Cells 1M Hardware Designer Algorithm Developer Embedded SW Dev IP-Centric with High-level Languages 10K Hardware Designer Schematics RTL Programmable SoCs 3D ICs FPGAs 100 PLDs 1985 1995 2005 2015 2025
IP-Centric Design with High Level Languages IP-Centric Design with High Level Languages
Step 1: Leverage Hard and Soft IP + Embedded Processors Example of Hard IP: Zynq MPSOC Examples of Complex Soft IP AXI-MM AXI-Lite AXI-MM interconnect AXI-Lite interconnect AXI-MM AXI-MM AXI-Lite VDMA Deinterlacer V Scaler H Scaler CSC 422-444 420-422 420-422 Letterboxing AXI4-S AXI4-S AXI4-S AXI4-S AXI4-S AXI4-S AXI4-S AXI4-S AXI4-S AXI4-S AXI4-S AXI4-S AXI4-S AXI4-S AXI4-S AXI4-S AXI4-S AXI4-S router 10x10 AXI4-S OTN Subsystem Video Subsystem HMC Controller Digital Pre- Distortion SmartConnect
Step 2: Develop New IP blocks in C/C++ Algorithmic Specification Micro-architecture Exploration RTL Implementation FPGA Integration Create IP from C/C++/System C algorithm specification Abstract algorithm verification 10,000x faster than RTL sim Traditional FPGA design experience not required
Step 3: Use Automated IP Assembly = IP Assembly Example: Zynq Processor Subsystem + Video Subsystem + 6 IP Blocks 4700 lines of VHDL (top-level connectivity only) Video Processing IP Subsystem
High Level Design Case Study: GainSpeed Venture-backed start-up Products for cable operators to: Meet skyrocketing capacity requirements of streaming video Cost-effectively migrate networks to a software-driven, all-ip architecture Need to be 10x better and 10x cheaper than much larger incumbents Have a much smaller team and need to work smarter 11
Previous Approach to Design 100K+ lines of RTL RTL RTL (VHDL) (VHDL) RTL (VHDL) Test Bench Test Test Bench Bench C) C) (System C) Testbench same as driver code Model Sim Minimal Test Cases Synthesis P&R System Debug (Chipscope) Exhaustive Corner Cases Used Virtex-6 240Ts, targeting 200+ MHz Ran P&R on 100 servers Spent 20% of time designing and 80% making it work Took a team of 10 engineers working for 2 years 12
Current Design Methodology : HLS + IPI Low 1000s lines of C code RTL IP RTL Blocks (VHDL) (VHDL) (C code)) Test Bench Test Test Bench Bench (System C) C) (C code) C Compiler Exhaustive Test Cases HLS IPI Synthesis Kintex 480T + off-the-shelf parts Used HLS to build 80% of the IP Blocks DSP functions, closed-loop timing recovery, DMA engines, etc Fast functional simulation in C P&R Much better coverage achieved earlier Team of 2 people working for 6 months System Debug (Chipscope) System-level Debug 13
Automated IP Assembly Eliminated grunt-work in wiring IP 14
Overall Project Results Elapsed time from project start to running system in lab: 6 months Total number of IP blocks integrated: 30+ Leveraged key IP cores: SRIO, 10G Ethernet MAC, MIG controller, FIR Compiler, Reed-Solomon Design running at 368 MHz in Kintex-7 Enabled co-debug with software developers 15
The Era of Software Defined Systems
Why FPGAs for Software Defined Systems? The Era of Virtualization Reconfigurable computing, storage and networking in the cloud The Thirst for Acceleration Heterogeneous computing Compute-intensive algorithms DNA sequencing Search engines Video processing Encryption/Decryption Packet routing FPGAs and Programmable SoCs: Power-efficient Reconfigurable Massively-Parallel Compute Engines
Example of FPGAs as Accelerators Smith-Waterman DNA Sequencing Application Reference Compares Query(N) with Reference(M) genome strings Query Involves MxN Matrix Computation and Dynamic Programming Maximal parallelism along diagonals Xilinx Virtex-7 690T (reference) Intel Xeon E5-2697 12 core Ratio Virtex-7 vs Intel 12 core Intel Xeon Phi 5110P 60 core Ratio Virtex-7 vs Intel 60 Core GCUPS 7700 1975 390 3000 257 Watts 2800 13000 022 22500 012 GCUPS/Watt 275 015 1810 013 2063
SDSoc: Software Defined SoC Development Applications: Machine Vision Driver Assistance/ADAS Software-Defined Radio (SDR) Wireless Radio Surveillance UAV / Drones Full System Optimizing Compiler ARM Code Main( ) C/C++ Development System-level Profiling Mark C/C++ Functions for Acceleration GCC Connectivity HLS+ SP&R Standard Eclipse IDE Accelerator Func( ) Embedded ARM Processor Subsystem Programmable Logic
SDAccel: Software Defined Algorithm Acceleration Sample Applications: Machine Learning Bioinformatics Graph Processing Stringology Data Analytics Modelling Science Codes Signal Processing Video & Image Processing Software-Defined FPGA Acceleration
Platforms Enable Software Defined FPGA Systems Pre-defined Platform Hardware System Performance Partial Design Board Algorithms Support Reconfig Analysis & Host Software Stack
Summary HW designers: SW developers: C-based IP development + highlevel IP assembly are the next step beyond RTL Software-defined algorithm development + platforms will enable you to exploit the power of FPGAs & SoCs We re making major investments in next generation silicon and tools that will revolutionize FPGA design