HW/SW Programmable Engine:

Size: px
Start display at page:

Download "HW/SW Programmable Engine:"

Transcription

1 HW/SW Programmable Engine: Domain Specific Architecture for Project Everest Juanjo Noguera, Goran Bilski, Jan Langer, Baris Ozgul, Tim Tuan, David Clarke, Peter McColgan, Sneha Date, Zachary Dickman, Pedro Duarte, Stephan Munz, Jose Marques, Sridhar Subramanian, Saurabh Mathur, Malav P. Shah, Chris Dick, Paul Newson, Kaushik Barman, Ramon Uribe, Helen Tarn, Engin Tunali, Richard Walke, Vinod Kathail, Shail Aditya Gupta, Akella Sastry, Mukund Sivaraman, Rishi Surendran, Abraham Lee, Chia-Jui Hsu, Kumud Bhandari, Jack Lo, Kristof Denolf, Phil James-Roxby, Samuel Bayliss, Kees Vissers, Ralph Wittig, Gaurav Singh 21 st August 2018

2 Heterogenous Computing Architecture Everest Heterogeneous Architecture Application-Level Performance Enabled by SW Programmable Engine Processing System SW Programmable Engine Programmable Logic Compute Efficiency Reduced Power Programmable I/O (GT, AMS) 20x Xilinx 7nm Everest 4x Xilinx 16nm UltraScale+ 40% Less Power ML Inference 5G Wireless Power >> 2 7nm Everest is the First Product to Integrate this New Architecture Tape Out in 2018

3 Everest Architecture Block Diagram SW Programmable Engine Domain Specific Architecture Hardened 7nm technology Throughput-oriented, low-latency Processing System Application Processor SW Programmable Engine SW PE SW PE SW PE SW PE SW PE SW PE I/O (GT, AMS) Transceivers PCIe Programmable Logic Soft logic Real-Time Processor OCM Programmable Logic DDR HBM Flexibility LUT BRAM Custom memory hierarchy SMMU DSP URAM AMS Chips with Different Implementations of the Architecture will be Announced Shortly >> 3

4 Market Requirements and Trends: Data Center Increasing Compute Lower Latency Requirements Heterogeneous Workloads ML Becoming an Essential Part of Data Processing Video + ML Genomics + ML Risk Modelling + ML Database + ML Network IPS + ML Storage + ML Source: Canziani et. al, An Analysis of Deep Neural Network Models for Practical Applications >> 4

5 ML Inference on Heterogenous Architecture Convolutions Fully Connected Layers Pooling Activations single depth slice X y i = 0 y y i = x i x y i = a i x i y y i = x i Y ReLU ReLU/PReLU Video Genomics Storage Database Network IPS Risk modeling Processing System SW Programmable Engine Programmable Logic I/O (GT, AMS) Feature Map Data Volume* Custom Hierarchy *Figure credit: >> 5

6 Packet Processing and Wired Backhaul Higher Layer Processing Baseband Processing Switching Beam Forming & MMIO + Some Baseband Transforms Digital Radio ADC / DAC Analogue Radio Antenna Array Market Requirements and Trends: Wireless 5G 5G Complexity is 100X that of 4G Still Evolving Standard New Technologies in 5G Massive MIMO Multiple antenna, frequency bands Changing functional partitioning ETRI RWS , 5G Vision and Enabling Technologies: ETRI Perspective 3GPP RAN Workshop Phoenix, Dec Transport & CTRL L2 L7 Modulation & FEC IQ Switch Linear Algebra ifft/ FFT DUC, CFR, DPD, DDC PA, LNA, Diplexer >> 6

7 Packet Processing and Wired Backhaul Higher Layer Processing Baseband Processing Switching Beam Forming & MMIO + Some Baseband Transforms Digital Radio ADC / DAC Analogue Radio Antenna Array 5G Wireless on Heterogenous Architecture 5G Wireless Infrastructure (i.e., base-station) Digital Radio with ADC/DAC Compute Maps to PE Mapping Example CPRI DPD Update DUC DPD AMS Control Maps to PS Processing System DPD Update SW Programmable Engine DUC DPD Programmable Logic I/O AMS CPRI 1: DUC: Digital Up Converter 2: DPD: Digital Pre-Distortion 3: AMS: Analog Mix-Signal (ADC/DAC) 4: CPRI: Common Public Radio Interface >> 7 I/O Maps to PL

8 Architecture Overview

9 Tile-Based Architecture Non-Blocking Interconnect Up to 200+ GB/s bandwidth per tile PS I/O PL Interconnect Local Multi-bank implementation Shared across neighbor cores Local ISA-based Processor ML Extensions ISA-based Processor Programmable (e.g., C/C++) Data Mover 5G Wireless Extensions >> 9 Cascade Interface Partial results to next core Data Mover Non-neighbor data communication Integrated synchronization primitives

10 Tile-Based Architecture (Continued) PS I/O Massive multi-core engines Increase in compute, memory and communication bandwidth PL Modular and scalable architecture Instantiate multiple tiles for more compute 10 s-100 s Instances* *Device Specific Distributed memory hierarchy Maximize memory bandwidth Architectural Focus: Deterministic Performance & Low Latency >> 10

11 Data Movement Architecture: Examples (1/2) 1 Dataflow Pipeline Overlap Compute and Communication PE 0 B0 B1 PE 1 B2 B3 PE 2 Comm Compute Comm Compute Comm Compute 2 Dataflow Graph 3 Streaming Multicast Mem PE Mem PE PE 0 PE 1 PE Mem PE Mem PE PE 2 Mem PE Mem PE PE 3 >> 11

12 Data Movement Architecture: Examples (2/2) 4 Non-neighbor communication PE 0 PE 1 Physical Mapping 5 Cascade streaming PE 1 PE 2 PE 3 PE 0 PE 2 PE 3 >> 12

13 Device Integration PS PL I/O Up to TB/s of bandwidth between PE and PL * *Device Specific CDC CDC CDC Programmable Logic Clock-domain crossing (CDC) between PL & PE clocks DSP LUT BRAM URAM >> 13

14 Data Movement Architecture: Examples PL Integration PL Accelerator PL Accelerator Pre/Post Processing Co-processing Mem PE 0 PE 1 Mem Custom On-chip Hierarchy Custom On-Chip Hierarchy Leverages on-chip memory resources in PL (e.g., BRAM, URAM) High on-chip memory bandwidth and lower memory access latency >> 14

15 Programming Environment

16 Programming Environment Full Device Programming Solution Fully Integrated with Tools for Other Everest Fabrics High Level Programming Languages (e.g., C/C++) C/C++ C/C++ DPD Update CPRI DUC DPD AMS Device-Level Programming SW Programmable Engine PS Programmable Logic I/O CPU Compiler SW PE Compiler PL Compiler >> 16

17 Complete Development Stack C/C++ C/C++ Device-Level Programming CPU Compiler SW PE Compiler PL Compiler 1 Full SW Programming Tool Chain (Single- and Multi-) 2 Performance-Optimized Libraries (Examples) 3 Run-Time (Examples) IDE Wireless 4G/5G Library Error Management Compiler Debugger ML (BLAS) Library Management Boot + Configuration Performance Analysis Vision Library Power/Thermal Management >> 17

18 High-level Compiler for ML Inference Users Working at ML Framework Level Users don t directly program PE Deep Learning Frameworks Seamless Migration Same experience as 16nm devices Current models continue to work Xilinx ML Compiler Xilinx 16nm UltraScale+ Xilinx Everest w/ PE >> 18

19 Application Results

20 Kernel-Level Performance: ML and Wireless 5G Processor Efficiency Peak Theoretical Performance 95% 80% 98% ML Convolutions FFT DPD Block-based Matrix Multiplication (32 64) (64 32) 1024-pt FFT/iFFT Volterra-based forward-path DPD High Compute Efficiency for Key Functions in ML and Wireless 5G >> 20

21 Programmable Engine: Summary Image Classification (GoogleNet CNN) Sub 1ms Latency 20x Application-level Performance Enabled by SW Programmable Engine Massive MIMO Radio (DUC, DDC, CFR, DPD) ML Inference 5G Wireless Power Reduction Xilinx Everest 7nm TX Sample Rate Up to 2Gsps 4x Xilinx UltraScale+ 16nm 40% Less Power Compute Efficiency Domain specific architecture Increase in compute density Xilinx 7nm Everest Heterogenous Architecture High-throughput, low-latency PL flexibility Custom memory hierarchy Multiple Applications ML Inference for Cloud DC Wireless 5G: Radio, Baseband ADAS/AD embedded vision Wired: DOCSIS cable access SW Programmable SW programmable (e.g., C/C++) Compile, execute, debug Optimized software libraries >> 23

22 KEYNOTE Adaptable Intelligence: The Next Computing Era Victor Peng, Xilinx CEO AUGUST 21 A.M. Connect Learn Share XDF connects software developers and system designers to the deep expertise of Xilinx engineers, partners, and industry leaders. Silicon Valley October 1-2 Learn More >> 22

23 Adaptable. Intelligent.

Versal: AI Engine & Programming Environment

Versal: AI Engine & Programming Environment Engineering Director, Xilinx Silicon Architecture Group Versal: Engine & Programming Environment Presented By Ambrose Finnerty Xilinx DSP Technical Marketing Manager October 16, 2018 MEMORY MEMORY MEMORY

More information

Versal: The New Xilinx Adaptive Compute Acceleration Platform (ACAP) in 7nm

Versal: The New Xilinx Adaptive Compute Acceleration Platform (ACAP) in 7nm Engineering Director, Xilinx Silicon Architecture Group Versal: The New Xilinx Adaptive Compute Acceleration Platform (ACAP) in 7nm Presented By Kees Vissers Fellow February 25, FPGA 2019 Technology scaling

More information

Adaptable Intelligence The Next Computing Era

Adaptable Intelligence The Next Computing Era Adaptable Intelligence The Next Computing Era Hot Chips, August 21, 2018 Victor Peng, CEO, Xilinx Pervasive Intelligence from Cloud to Edge to Endpoints >> 1 Exponential Growth and Opportunities Data Explosion

More information

Xilinx DNN Processor An Inference Engine, Network Compiler + Runtime for Xilinx FPGAs

Xilinx DNN Processor An Inference Engine, Network Compiler + Runtime for Xilinx FPGAs ilinx DNN Proceor An Inference Engine, Network Compiler Runtime for ilinx FPGA Rahul Nimaiyar, Brian Sun, Victor Wu, Thoma Branca, Yi Wang, Jutin Oo, Elliott Delaye, Aaron Ng, Paolo D'Alberto, Sean Settle,

More information

Xilinx ML Suite Overview

Xilinx ML Suite Overview Xilinx ML Suite Overview Yao Fu System Architect Data Center Acceleration Xilinx Accelerated Computing Workloads Machine Learning Inference Image classification and object detection Video Streaming Frame

More information

Zynq-7000 All Programmable SoC Product Overview

Zynq-7000 All Programmable SoC Product Overview Zynq-7000 All Programmable SoC Product Overview The SW, HW and IO Programmable Platform August 2012 Copyright 2012 2009 Xilinx Introducing the Zynq -7000 All Programmable SoC Breakthrough Processing Platform

More information

RFNoC : RF Network on Chip Martin Braun, Jonathon Pendlum GNU Radio Conference 2015

RFNoC : RF Network on Chip Martin Braun, Jonathon Pendlum GNU Radio Conference 2015 RFNoC : RF Network on Chip Martin Braun, Jonathon Pendlum GNU Radio Conference 2015 Outline Motivation Current situation Goal RFNoC Basic concepts Architecture overview Summary No Demo! See our booth,

More information

Universal Hardware Platform for

Universal Hardware Platform for Universal Hardware Platform for OpenAir5G OpenAir5G Daqing Li From BUPT 1. Preface 2. Universal Platform for Baseband CONTENT Unit(BBU) 3. Universal Platform for Remote Radio Unit(RRU) 4. Accelerate Cards

More information

Signal Conversion in a Modular Open Standard Form Factor. CASPER Workshop August 2017 Saeed Karamooz, VadaTech

Signal Conversion in a Modular Open Standard Form Factor. CASPER Workshop August 2017 Saeed Karamooz, VadaTech Signal Conversion in a Modular Open Standard Form Factor CASPER Workshop August 2017 Saeed Karamooz, VadaTech At VadaTech we are technology leaders First-to-market silicon Continuous innovation Open systems

More information

OCP Engineering Workshop - Telco

OCP Engineering Workshop - Telco OCP Engineering Workshop - Telco Low Latency Mobile Edge Computing Trevor Hiatt Product Management, IDT IDT Company Overview Founded 1980 Workforce Approximately 1,800 employees Headquarters San Jose,

More information

FPGA Entering the Era of the All Programmable SoC

FPGA Entering the Era of the All Programmable SoC FPGA Entering the Era of the All Programmable SoC Ivo Bolsens, Senior Vice President & CTO Page 1 Moore s Law: The Technology Pipeline Page 2 Industry Debates on Cost Page 3 Design Cost Estimated Chip

More information

High Capacity and High Performance 20nm FPGAs. Steve Young, Dinesh Gaitonde August Copyright 2014 Xilinx

High Capacity and High Performance 20nm FPGAs. Steve Young, Dinesh Gaitonde August Copyright 2014 Xilinx High Capacity and High Performance 20nm FPGAs Steve Young, Dinesh Gaitonde August 2014 Not a Complete Product Overview Page 2 Outline Page 3 Petabytes per month Increasing Bandwidth Global IP Traffic Growth

More information

Revolutionizing the Datacenter

Revolutionizing the Datacenter Power-Efficient Machine Learning using FPGAs on POWER Systems Ralph Wittig, Distinguished Engineer Office of the CTO, Xilinx Revolutionizing the Datacenter Join the Conversation #OpenPOWERSummit Top-5

More information

The Application of FPGAs for Wireless Base-Station Connectivity

The Application of FPGAs for Wireless Base-Station Connectivity White Paper: 7 Series and UltraScale FPGAs WP450 (v1.1) September 29, 2015 The Application of FPGAs for Wireless Base-Station Connectivity By: Paul Newson Painstakingly architected to be a generation ahead,

More information

Data-Centric Innovation Summit DAN MCNAMARA SENIOR VICE PRESIDENT GENERAL MANAGER, PROGRAMMABLE SOLUTIONS GROUP

Data-Centric Innovation Summit DAN MCNAMARA SENIOR VICE PRESIDENT GENERAL MANAGER, PROGRAMMABLE SOLUTIONS GROUP Data-Centric Innovation Summit DAN MCNAMARA SENIOR VICE PRESIDENT GENERAL MANAGER, PROGRAMMABLE SOLUTIONS GROUP Devices / edge network Cloud/data center Removing data Bottlenecks with Fpga acceleration

More information

Overcoming the Memory System Challenge in Dataflow Processing. Darren Jones, Wave Computing Drew Wingard, Sonics

Overcoming the Memory System Challenge in Dataflow Processing. Darren Jones, Wave Computing Drew Wingard, Sonics Overcoming the Memory System Challenge in Dataflow Processing Darren Jones, Wave Computing Drew Wingard, Sonics Current Technology Limits Deep Learning Performance Deep Learning Dataflow Graph Existing

More information

XPU A Programmable FPGA Accelerator for Diverse Workloads

XPU A Programmable FPGA Accelerator for Diverse Workloads XPU A Programmable FPGA Accelerator for Diverse Workloads Jian Ouyang, 1 (ouyangjian@baidu.com) Ephrem Wu, 2 Jing Wang, 1 Yupeng Li, 1 Hanlin Xie 1 1 Baidu, Inc. 2 Xilinx Outlines Background - FPGA for

More information

How to Efficiently Implement Flexible and Full-Featured Digital Radio Solutions Using All Programmable SoCs

How to Efficiently Implement Flexible and Full-Featured Digital Radio Solutions Using All Programmable SoCs Delivering a Generation Ahead How to Efficiently Implement Flexible and Full-Featured Digital Radio Solutions Using All Programmable SoCs Agenda Introduction to Mobile Network Introduction to Xilinx Solution

More information

Scaling Convolutional Neural Networks on Reconfigurable Logic Michaela Blott, Principal Engineer, Xilinx Research

Scaling Convolutional Neural Networks on Reconfigurable Logic Michaela Blott, Principal Engineer, Xilinx Research Scaling Convolutional Neural Networks on Reconfigurable Logic Michaela Blott, Principal Engineer, Xilinx Research Nick Fraser (Xilinx & USydney) Yaman Umuroglu (Xilinx & NTNU) Giulio Gambardella (Xilinx)

More information

Maximizing heterogeneous system performance with ARM interconnect and CCIX

Maximizing heterogeneous system performance with ARM interconnect and CCIX Maximizing heterogeneous system performance with ARM interconnect and CCIX Neil Parris, Director of product marketing Systems and software group, ARM Teratec June 2017 Intelligent flexible cloud to enable

More information

COSMOS Architecture and Key Technologies. June 1 st, 2018 COSMOS Team

COSMOS Architecture and Key Technologies. June 1 st, 2018 COSMOS Team COSMOS Architecture and Key Technologies June 1 st, 2018 COSMOS Team COSMOS: System Architecture (2) System design based on three levels of SDR radio node (S,M,L) with M,L connected via fiber to optical

More information

Frequency Domain Acceleration of Convolutional Neural Networks on CPU-FPGA Shared Memory System

Frequency Domain Acceleration of Convolutional Neural Networks on CPU-FPGA Shared Memory System Frequency Domain Acceleration of Convolutional Neural Networks on CPU-FPGA Shared Memory System Chi Zhang, Viktor K Prasanna University of Southern California {zhan527, prasanna}@usc.edu fpga.usc.edu ACM

More information

Timing & Synchronization in Wireless Infrastructure

Timing & Synchronization in Wireless Infrastructure Timing & Synchronization in Wireless Infrastructure Harpinder Singh Matharu Senior Product Manager, Comms Division, ilinx 11/4/2010 Copyright 2010 ilinx Topics Wireless Infrastructure Market Trends Timing

More information

GRVI Phalanx Update: Plowing the Cloud with Thousands of RISC-V Chickens. Jan Gray

GRVI Phalanx Update: Plowing the Cloud with Thousands of RISC-V Chickens. Jan Gray If you were plowing a field, which would you rather use: two strong oxen or 1024 chickens? Seymour Cray GRVI Phalanx Update: Plowing the Cloud with Thousands of RISC-V Chickens Jan Gray jan@fpga.org http://fpga.org

More information

Differentiation in Flexible Data Center Interconnect

Differentiation in Flexible Data Center Interconnect Differentiation in Flexible Data Center Interconnect Industry Leadership Momentum Founded 1984 20,000 customers $238B FY15 revenue 3,500 employees world wide >55% Market segment share 3500+ Patents 60

More information

Zynq Ultrascale+ Architecture

Zynq Ultrascale+ Architecture Zynq Ultrascale+ Architecture Stephanie Soldavini and Andrew Ramsey CMPE-550 Dec 2017 Soldavini, Ramsey (CMPE-550) Zynq Ultrascale+ Architecture Dec 2017 1 / 17 Agenda Heterogeneous Computing Zynq Ultrascale+

More information

White Paper Using Cyclone III FPGAs for Emerging Wireless Applications

White Paper Using Cyclone III FPGAs for Emerging Wireless Applications White Paper Introduction Emerging wireless applications such as remote radio heads, pico/femto base stations, WiMAX customer premises equipment (CPE), and software defined radio (SDR) have stringent power

More information

Building blocks for 64-bit Systems Development of System IP in ARM

Building blocks for 64-bit Systems Development of System IP in ARM Building blocks for 64-bit Systems Development of System IP in ARM Research seminar @ University of York January 2015 Stuart Kenny stuart.kenny@arm.com 1 2 64-bit Mobile Devices The Mobile Consumer Expects

More information

Simplifying FPGA Design with A Novel Network-on-Chip Architecture

Simplifying FPGA Design with A Novel Network-on-Chip Architecture Simplifying FPGA Design with A Novel Network-on-Chip Architecture ABSTRACT John Malsbury Ettus Research 1043 N Shoreline Blvd Suite 100 +1 (650) 967-2870 john.malsbury@ettus.com As wireless communications

More information

Deep Learning Accelerators

Deep Learning Accelerators Deep Learning Accelerators Abhishek Srivastava (as29) Samarth Kulshreshtha (samarth5) University of Illinois, Urbana-Champaign Submitted as a requirement for CS 433 graduate student project Outline Introduction

More information

NVIDIA'S DEEP LEARNING ACCELERATOR MEETS SIFIVE'S FREEDOM PLATFORM. Frans Sijstermans (NVIDIA) & Yunsup Lee (SiFive)

NVIDIA'S DEEP LEARNING ACCELERATOR MEETS SIFIVE'S FREEDOM PLATFORM. Frans Sijstermans (NVIDIA) & Yunsup Lee (SiFive) NVIDIA'S DEEP LEARNING ACCELERATOR MEETS SIFIVE'S FREEDOM PLATFORM Frans Sijstermans (NVIDIA) & Yunsup Lee (SiFive) NVDLA NVIDIA DEEP LEARNING ACCELERATOR IP Core for deep learning part of NVIDIA s Xavier

More information

SDSoC: Session 1

SDSoC: Session 1 SDSoC: Session 1 ADAM@ADIUVOENGINEERING.COM What is SDSoC SDSoC is a system optimising compiler which allows us to optimise Zynq PS / PL Zynq MPSoC PS / PL MicroBlaze What does this mean? Following the

More information

XMC-RFSOC-A. XMC Module Xilinx Zynq UltraScale+ RFSOC. Overview. Key Features. Typical Applications. Advanced Information Subject To Change

XMC-RFSOC-A. XMC Module Xilinx Zynq UltraScale+ RFSOC. Overview. Key Features. Typical Applications. Advanced Information Subject To Change Advanced Information Subject To Change XMC-RFSOC-A XMC Module Xilinx Zynq UltraScale+ RFSOC Overview PanaTeQ s XMC-RFSOC-A is a XMC module based on the Zynq UltraScale+ RFSoC device from Xilinx. The Zynq

More information

An introduction to Machine Learning silicon

An introduction to Machine Learning silicon An introduction to Machine Learning silicon November 28 2017 Insight for Technology Investors AI/ML terminology Artificial Intelligence Machine Learning Deep Learning Algorithms: CNNs, RNNs, etc. Additional

More information

Hot Chips 2017 Xilinx 16nm Datacenter Device Family with In-Package HBM and CCIX Interconnect Gaurav Singh Sagheer Ahmad, Ralph Wittig, Millind

Hot Chips 2017 Xilinx 16nm Datacenter Device Family with In-Package HBM and CCIX Interconnect Gaurav Singh Sagheer Ahmad, Ralph Wittig, Millind Hot Chips 2017 Xilinx 16nm Datacenter Device Family with In-Package HBM and Interconnect Gaurav Singh Sagheer Ahmad, Ralph Wittig, Millind Mittal, Ygal Arbel, Arun VR, Suresh Ramalingam, Kiran Puranik,

More information

Enabling Technology for the Cloud and AI One Size Fits All?

Enabling Technology for the Cloud and AI One Size Fits All? Enabling Technology for the Cloud and AI One Size Fits All? Tim Horel Collaborate. Differentiate. Win. DIRECTOR, FIELD APPLICATIONS The Growing Cloud Global IP Traffic Growth 40B+ devices with intelligence

More information

Arm s First-Generation Machine Learning Processor

Arm s First-Generation Machine Learning Processor Arm s First-Generation Machine Learning Processor Ian Bratt 2018 Arm Limited Introducing the Arm Machine Learning (ML) Processor Optimized ground-up architecture for machine learning processing Massive

More information

FINN: A Framework for Fast, Scalable Binarized Neural Network Inference

FINN: A Framework for Fast, Scalable Binarized Neural Network Inference FINN: A Framework for Fast, Scalable Binarized Neural Network Inference Yaman Umuroglu (XIR & NTNU), Nick Fraser (XIR & USydney), Giulio Gambardella (XIR), Michaela Blott (XIR), Philip Leong (USydney),

More information

Cost-Optimized Backgrounder

Cost-Optimized Backgrounder Cost-Optimized Backgrounder A Cost-Optimized FPGA & SoC Portfolio for Part or All of Your System Optimizing a system for cost requires analysis of every silicon device on the board, particularly the high

More information

How to validate your FPGA design using realworld

How to validate your FPGA design using realworld How to validate your FPGA design using realworld stimuli Daniel Clapham National Instruments ni.com Agenda Typical FPGA Design NIs approach to FPGA Brief intro into platform based approach RIO architecture

More information

Enabling FPGAs in Hyperscale Data Centers

Enabling FPGAs in Hyperscale Data Centers J. Weerasinghe; IEEE CBDCom 215, Beijing; 13 th August 215 Enabling s in Hyperscale Data Centers J. Weerasinghe 1, F. Abel 1, C. Hagleitner 1, A. Herkersdorf 2 1 IBM Research Zurich Laboratory 2 Technical

More information

SDACCEL DEVELOPMENT ENVIRONMENT. The Xilinx SDAccel Development Environment. Bringing The Best Performance/Watt to the Data Center

SDACCEL DEVELOPMENT ENVIRONMENT. The Xilinx SDAccel Development Environment. Bringing The Best Performance/Watt to the Data Center SDAccel Environment The Xilinx SDAccel Development Environment Bringing The Best Performance/Watt to the Data Center Introduction Data center operators constantly seek more server performance. Currently

More information

Fast Hardware For AI

Fast Hardware For AI Fast Hardware For AI Karl Freund karl@moorinsightsstrategy.com Sr. Analyst, AI and HPC Moor Insights & Strategy Follow my blogs covering Machine Learning Hardware on Forbes: http://www.forbes.com/sites/moorinsights

More information

Optimizing HW/SW Partition of a Complex Embedded Systems. Simon George November 2015.

Optimizing HW/SW Partition of a Complex Embedded Systems. Simon George November 2015. Optimizing HW/SW Partition of a Complex Embedded Systems Simon George November 2015 Zynq-7000 All Programmable SoC HP ACP GP Page 2 Zynq UltraScale+ MPSoC Page 3 HW/SW Optimization Challenges application()

More information

Adaptable Computing The Future of FPGA Acceleration. Dan Gibbons, VP Software Development June 6, 2018

Adaptable Computing The Future of FPGA Acceleration. Dan Gibbons, VP Software Development June 6, 2018 Adaptable Computing The Future of FPGA Acceleration Dan Gibbons, VP Software Development June 6, 2018 Adaptable Accelerated Computing Page 2 Three Big Trends The Evolution of Computing Trend to Heterogeneous

More information

Accelerating Data Center Workloads with FPGAs

Accelerating Data Center Workloads with FPGAs Accelerating Data Center Workloads with FPGAs Enno Lübbers NorCAS 2017, Linköping, Sweden Intel technologies features and benefits depend on system configuration and may require enabled hardware, software

More information

Next Generation Enterprise Solutions from ARM

Next Generation Enterprise Solutions from ARM Next Generation Enterprise Solutions from ARM Ian Forsyth Director Product Marketing Enterprise and Infrastructure Applications Processor Product Line Ian.forsyth@arm.com 1 Enterprise Trends IT is the

More information

Nutaq μdigitizer FPGA-based, multichannel, high-speed DAQ solutions PRODUCT SHEET

Nutaq μdigitizer FPGA-based, multichannel, high-speed DAQ solutions PRODUCT SHEET Nutaq μdigitizer FPGA-based, multichannel, high-speed DAQ solutions PRODUCT SHEET QUEBEC I MONTREAL I NEW YORK I nutaq.com Nutaq μdigitizer Supports a wide variety of high-speed I/O A/D and D/A converters

More information

GRVI Phalanx. A Massively Parallel RISC-V FPGA Accelerator Accelerator. Jan Gray

GRVI Phalanx. A Massively Parallel RISC-V FPGA Accelerator Accelerator. Jan Gray GRVI Phalanx A Massively Parallel RISC-V FPGA Accelerator Accelerator Jan Gray jan@fpga.org Introduction FPGA accelerators are hot MSR Catapult. Intel += Altera. OpenPOWER + Xilinx FPGAs as computers Massively

More information

Near Memory Key/Value Lookup Acceleration MemSys 2017

Near Memory Key/Value Lookup Acceleration MemSys 2017 Near Key/Value Lookup Acceleration MemSys 2017 October 3, 2017 Scott Lloyd, Maya Gokhale Center for Applied Scientific Computing This work was performed under the auspices of the U.S. Department of Energy

More information

SODA: Stencil with Optimized Dataflow Architecture Yuze Chi, Jason Cong, Peng Wei, Peipei Zhou

SODA: Stencil with Optimized Dataflow Architecture Yuze Chi, Jason Cong, Peng Wei, Peipei Zhou SODA: Stencil with Optimized Dataflow Architecture Yuze Chi, Jason Cong, Peng Wei, Peipei Zhou University of California, Los Angeles 1 What is stencil computation? 2 What is Stencil Computation? A sliding

More information

New Software-Designed Instruments

New Software-Designed Instruments 1 New Software-Designed Instruments Nicholas Haripersad Field Applications Engineer National Instruments South Africa Agenda What Is a Software-Designed Instrument? Why Software-Designed Instrumentation?

More information

Towards a Uniform Template-based Architecture for Accelerating 2D and 3D CNNs on FPGA

Towards a Uniform Template-based Architecture for Accelerating 2D and 3D CNNs on FPGA Towards a Uniform Template-based Architecture for Accelerating 2D and 3D CNNs on FPGA Junzhong Shen, You Huang, Zelong Wang, Yuran Qiao, Mei Wen, Chunyuan Zhang National University of Defense Technology,

More information

New! New! New! New! New!

New! New! New! New! New! New! New! New! New! New! Model 5950 Features Supports Xilinx Zynq UltraScale+ RFSoC FPGAs 18 GB of DDR4 SDRAM On-board GPS receiver PCI Express (Gen. 1, 2 and 3) interface up to x8 LVDS connections to

More information

Ettus Research Update

Ettus Research Update Ettus Research Update Matt Ettus Ettus Research GRCon13 Outline 1 Introduction 2 Recent New Products 3 Third Generation Introduction Who am I? Core GNU Radio contributor since 2001 Designed

More information

Simplifying FPGA Design for SDR with a Network on Chip Architecture

Simplifying FPGA Design for SDR with a Network on Chip Architecture Simplifying FPGA Design for SDR with a Network on Chip Architecture Matt Ettus Ettus Research GRCon13 Outline 1 Introduction 2 RF NoC 3 Status and Conclusions USRP FPGA Capability Gen

More information

RISC-V based core as a soft processor in FPGAs Chowdhary Musunuri Sr. Director, Solutions & Applications Microsemi

RISC-V based core as a soft processor in FPGAs Chowdhary Musunuri Sr. Director, Solutions & Applications Microsemi Power Matters. TM RISC-V based core as a soft processor in FPGAs Chowdhary Musunuri Sr. Director, Solutions & Applications Microsemi chowdhary.musunuri@microsemi.com RIC217 1 Agenda A brief introduction

More information

5G-PICTURE Project: Converged 5G

5G-PICTURE Project: Converged 5G 5G-PICTURE Project: Converged 5G Fronthaul/ Backhaul Infrastructure based on Dis-Aggregated RAN Presenter: Ioanna Mesogiti Senior R&D Engineer, MBA, MSc COSMOTE - Mobile Telecommunications S.A. R&D Projects

More information

Maximizing Server Efficiency from μarch to ML accelerators. Michael Ferdman

Maximizing Server Efficiency from μarch to ML accelerators. Michael Ferdman Maximizing Server Efficiency from μarch to ML accelerators Michael Ferdman Maximizing Server Efficiency from μarch to ML accelerators Michael Ferdman Maximizing Server Efficiency with ML accelerators Michael

More information

KeyStone C66x Multicore SoC Overview. Dec, 2011

KeyStone C66x Multicore SoC Overview. Dec, 2011 KeyStone C66x Multicore SoC Overview Dec, 011 Outline Multicore Challenge KeyStone Architecture Reminder About KeyStone Solution Challenge Before KeyStone Multicore performance degradation Lack of efficient

More information

Third Genera+on USRP Devices and the RF Network- On- Chip. Leif Johansson Market Development RF, Comm and SDR

Third Genera+on USRP Devices and the RF Network- On- Chip. Leif Johansson Market Development RF, Comm and SDR Third Genera+on USRP Devices and the RF Network- On- Chip Leif Johansson Market Development RF, Comm and SDR About Ettus Research Leader in soeware defined radio and signals intelligence Maker of USRP

More information

DNNBuilder: an Automated Tool for Building High-Performance DNN Hardware Accelerators for FPGAs

DNNBuilder: an Automated Tool for Building High-Performance DNN Hardware Accelerators for FPGAs IBM Research AI Systems Day DNNBuilder: an Automated Tool for Building High-Performance DNN Hardware Accelerators for FPGAs Xiaofan Zhang 1, Junsong Wang 2, Chao Zhu 2, Yonghua Lin 2, Jinjun Xiong 3, Wen-mei

More information

Extending the Power of FPGAs

Extending the Power of FPGAs Extending the Power of FPGAs The Journey has Begun Salil Raje Xilinx Corporate Vice President Software and IP Products Development Agenda The Evolution of FPGAs and FPGA Programming IP-Centric Design with

More information

Scalable and Modularized RTL Compilation of Convolutional Neural Networks onto FPGA

Scalable and Modularized RTL Compilation of Convolutional Neural Networks onto FPGA Scalable and Modularized RTL Compilation of Convolutional Neural Networks onto FPGA Yufei Ma, Naveen Suda, Yu Cao, Jae-sun Seo, Sarma Vrudhula School of Electrical, Computer and Energy Engineering School

More information

Extending the Power of FPGAs to Software Developers:

Extending the Power of FPGAs to Software Developers: Extending the Power of FPGAs to Software Developers: The Journey has Begun Salil Raje Xilinx Corporate Vice President Software and IP Products Group Page 1 Agenda The Evolution of FPGAs and FPGA Programming

More information

ECE 8823: GPU Architectures. Objectives

ECE 8823: GPU Architectures. Objectives ECE 8823: GPU Architectures Introduction 1 Objectives Distinguishing features of GPUs vs. CPUs Major drivers in the evolution of general purpose GPUs (GPGPUs) 2 1 Chapter 1 Chapter 2: 2.2, 2.3 Reading

More information

Bidirectional 10&40 km Optical PHY for 50GbE. Xinyuan Wang Huawei Technologies

Bidirectional 10&40 km Optical PHY for 50GbE. Xinyuan Wang Huawei Technologies Bidirectional 10&40 km Optical PHY for 50GbE Xinyuan Wang Huawei Technologies Background In IEEE 802 March plenary meeting, Call For Interest Bidirectional 10Gb/s and 25Gb/s optical access PHYs is accepted

More information

Enabling Technologies for Next Generation Wireless Systems

Enabling Technologies for Next Generation Wireless Systems Enabling Technologies for Next Generation Wireless Systems Dr. Amitava Ghosh Nokia Fellow Nokia Bell Labs 10 th March, 2016 1 Nokia 2015 Heterogeneous use cases diverse requirements >10 Gbps peak data

More information

Deliver Femtocell Basestations Quickly and Cost Effectively with Altera HardCopy Structured ASICs Altera Corporation Public

Deliver Femtocell Basestations Quickly and Cost Effectively with Altera HardCopy Structured ASICs Altera Corporation Public Deliver Femtocell Basestations Quickly and Cost Effectively with HardCopy Structured ASICs Agenda Femtocell overview Benefits and challenges Enabling low-cost, low-risk, fast time-to-market femtocell designs

More information

Simplify System Complexity

Simplify System Complexity 1 2 Simplify System Complexity With the new high-performance CompactRIO controller Arun Veeramani Senior Program Manager National Instruments NI CompactRIO The Worlds Only Software Designed Controller

More information

The WINLAB Cognitive Radio Platform

The WINLAB Cognitive Radio Platform The WINLAB Cognitive Radio Platform IAB Meeting, Fall 2007 Rutgers, The State University of New Jersey Ivan Seskar Software Defined Radio/ Cognitive Radio Terminology Software Defined Radio (SDR) is any

More information

CCIX: a new coherent multichip interconnect for accelerated use cases

CCIX: a new coherent multichip interconnect for accelerated use cases : a new coherent multichip interconnect for accelerated use cases Akira Shimizu Senior Manager, Operator relations Arm 2017 Arm Limited Arm 2017 Interconnects for different scale SoC interconnect. Connectivity

More information

A Closer Look at the Epiphany IV 28nm 64 core Coprocessor. Andreas Olofsson PEGPUM 2013

A Closer Look at the Epiphany IV 28nm 64 core Coprocessor. Andreas Olofsson PEGPUM 2013 A Closer Look at the Epiphany IV 28nm 64 core Coprocessor Andreas Olofsson PEGPUM 2013 1 Adapteva Achieves 3 World Firsts 1. First processor company to reach 50 GFLOPS/W 3. First semiconductor company

More information

Scalable and Dynamically Updatable Lookup Engine for Decision-trees on FPGA

Scalable and Dynamically Updatable Lookup Engine for Decision-trees on FPGA Scalable and Dynamically Updatable Lookup Engine for Decision-trees on FPGA Yun R. Qu, Viktor K. Prasanna Ming Hsieh Dept. of Electrical Engineering University of Southern California Los Angeles, CA 90089

More information

Exploring OpenCL Memory Throughput on the Zynq

Exploring OpenCL Memory Throughput on the Zynq Exploring OpenCL Memory Throughput on the Zynq Technical Report no. 2016:04, ISSN 1652-926X Chalmers University of Technology Bo Joel Svensson bo.joel.svensson@gmail.com Abstract The Zynq platform combines

More information

GRVI Phalanx Update: Plowing the Cloud with Thousands of RISC-V Chickens. Jan Gray

GRVI Phalanx Update: Plowing the Cloud with Thousands of RISC-V Chickens. Jan Gray If you were plowing a field, which would you rather use: two strong oxen or 1024 chickens? Seymour Cray GRVI Phalanx Update: Plowing the Cloud with Thousands of RISC-V Chickens Jan Gray jan@fpga.org http://fpga.org

More information

Cognitive Radio Platform Research at WINLAB

Cognitive Radio Platform Research at WINLAB Cognitive Radio Platform Research at WINLAB December 2, 2010 Zoran Miljanic and Ivan Seskar WINLAB Rutgers University www.winlab.rutgers.edu 1 WiNC2R objectives Programmable processing of phy and higher

More information

Massively Parallel Processor Breadboarding (MPPB)

Massively Parallel Processor Breadboarding (MPPB) Massively Parallel Processor Breadboarding (MPPB) 28 August 2012 Final Presentation TRP study 21986 Gerard Rauwerda CTO, Recore Systems Gerard.Rauwerda@RecoreSystems.com Recore Systems BV P.O. Box 77,

More information

Dr. Evaldas Stankevičius, Regulatory and Security Expert.

Dr. Evaldas Stankevičius, Regulatory and Security Expert. 2018-08-23 Dr. Evaldas Stankevičius, Regulatory and Security Expert Email: evaldas.stankevicius@tele2.com 1G: purely analog system. 2G: voice and SMS. 3G: packet switching communication. 4G: enhanced mobile

More information

White Paper. The advantages of using a combination of DSP s and FPGA s. Version: 1.0. Author: Louis N. Bélanger. Date: May, 2004.

White Paper. The advantages of using a combination of DSP s and FPGA s. Version: 1.0. Author: Louis N. Bélanger. Date: May, 2004. White Paper The advantages of using a combination of DSP s and FPGA s Version: 1.0 Author: Louis N. Bélanger Date: May, 2004 Lyrtech Inc The advantages of using a combination of DSP s and FPGA s DSP and

More information

The Many Dimensions of SDR Hardware

The Many Dimensions of SDR Hardware The Many Dimensions of SDR Hardware Plotting a Course for the Hardware Behind the Software Sept 2017 John Orlando Epiq Solutions LO RFIC Epiq Solutions in a Nutshell Schaumburg, IL EST 2009 N. Virginia

More information

Towards 5G RAN Virtualization Enabled by Intel and ASTRI*

Towards 5G RAN Virtualization Enabled by Intel and ASTRI* white paper Communications Service Providers C-RAN Towards 5G RAN Virtualization Enabled by Intel and ASTRI* ASTRI* has developed a flexible, scalable, and high-performance virtualized C-RAN solution to

More information

Brainchip OCTOBER

Brainchip OCTOBER Brainchip OCTOBER 2017 1 Agenda Neuromorphic computing background Akida Neuromorphic System-on-Chip (NSoC) Brainchip OCTOBER 2017 2 Neuromorphic Computing Background Brainchip OCTOBER 2017 3 A Brief History

More information

ITU Arab Forum on Future Networks: "Broadband Networks in the Era of App Economy", Tunis - Tunisia, Feb. 2017

ITU Arab Forum on Future Networks: Broadband Networks in the Era of App Economy, Tunis - Tunisia, Feb. 2017 On the ROAD to 5G Ines Jedidi Network Products, Ericsson Maghreb ITU Arab Forum on Future Networks: "Broadband Networks in the Era of App Economy", Tunis - Tunisia, 21-22 Feb. 2017 agenda Why 5G? What

More information

An 80-core GRVI Phalanx Overlay on PYNQ-Z1:

An 80-core GRVI Phalanx Overlay on PYNQ-Z1: An 80-core GRVI Phalanx Overlay on PYNQ-Z1: Pynq as a High Productivity Platform For FPGA Design and Exploration Jan Gray jan@fpga.org http://fpga.org/grvi-phalanx FCCM 2017 05/03/2017 Pynq Workshop My

More information

借助 SDSoC 快速開發複雜的嵌入式應用

借助 SDSoC 快速開發複雜的嵌入式應用 借助 SDSoC 快速開發複雜的嵌入式應用 May 2017 What Is C/C++ Development System-level Profiling SoC application-like programming Tools and IP for system-level profiling Specify C/C++ Functions for Acceleration Full System

More information

Communication Systems Design in Practice

Communication Systems Design in Practice Communication Systems Design in Practice Jacob Kornerup, Ph.D. LabVIEW R&D National Instruments '87 '88 '89 '90 '91 '92 '93 '94 '95 '96 '97 '98 '99 '00 '01 '02 03 04 '05 '06 '07 '08 '09 '10 '11 '12 '13

More information

The importance of RAN to Core validation as networks evolve to support 5G

The importance of RAN to Core validation as networks evolve to support 5G The importance of RAN to Core validation as networks evolve to support 5G Stephen Hire Vice President Asia Pacific Cobham Wireless October 2017 Commercial in Confidence Cobham Wireless The industry standard

More information

Did I Just Do That on a Bunch of FPGAs?

Did I Just Do That on a Bunch of FPGAs? Did I Just Do That on a Bunch of FPGAs? Paul Chow High-Performance Reconfigurable Computing Group Department of Electrical and Computer Engineering University of Toronto About the Talk Title It s the measure

More information

Tile Processor (TILEPro64)

Tile Processor (TILEPro64) Tile Processor Case Study of Contemporary Multicore Fall 2010 Agarwal 6.173 1 Tile Processor (TILEPro64) Performance # of cores On-chip cache (MB) Cache coherency Operations (16/32-bit BOPS) On chip bandwidth

More information

Algorithm-Architecture Co- Design for Efficient SDR Signal Processing

Algorithm-Architecture Co- Design for Efficient SDR Signal Processing Algorithm-Architecture Co- Design for Efficient SDR Signal Processing Min Li, limin@imec.be Wireless Research, IMEC Introduction SDR Baseband Platforms Today are Usually Based on ILP + DLP + MP Massive

More information

RapidIO.org Update.

RapidIO.org Update. RapidIO.org Update rickoco@rapidio.org June 2015 2015 RapidIO.org 1 Outline RapidIO Overview Benefits Interconnect Comparison Ecosystem System Challenges RapidIO Markets Data Center & HPC Communications

More information

Communication Systems Design in Practice

Communication Systems Design in Practice Communication Systems Design in Practice Jacob Kornerup, Ph.D. LabVIEW R&D National Instruments A Word About National Instruments Annual Revenue: $1.14 billion Global Operations: Approximately 6,870 employees;

More information

High Performance Embedded Applications. Raja Pillai Applications Engineering Specialist

High Performance Embedded Applications. Raja Pillai Applications Engineering Specialist High Performance Embedded Applications Raja Pillai Applications Engineering Specialist Agenda What is High Performance Embedded? NI s History in HPE FlexRIO Overview System architecture Adapter modules

More information

Software Defined Modem A commercial platform for wireless handsets

Software Defined Modem A commercial platform for wireless handsets Software Defined Modem A commercial platform for wireless handsets Charles F Sturman VP Marketing June 22 nd ~ 24 th Brussels charles.stuman@cognovo.com www.cognovo.com Agenda SDM Separating hardware from

More information

In-chip and Inter-chip Interconnections and data transportations for Future MPAR Digital Receiving System

In-chip and Inter-chip Interconnections and data transportations for Future MPAR Digital Receiving System In-chip and Inter-chip Interconnections and data transportations for Future MPAR Digital Receiving System A presentation for LMCO-MPAR project 2007 briefing Dr. Yan Zhang School of Electrical and Computer

More information

Simulation, prototyping and verification of standards-based wireless communications

Simulation, prototyping and verification of standards-based wireless communications Simulation, prototyping and verification of standards-based wireless communications Colin McGuire, Neil MacEwen 2015 The MathWorks, Inc. 1 Real Time LTE Cell Scanner with MATLAB and Simulink 2 Real time

More information

FlexRIO. FPGAs Bringing Custom Functionality to Instruments. Ravichandran Raghavan Technical Marketing Engineer. ni.com

FlexRIO. FPGAs Bringing Custom Functionality to Instruments. Ravichandran Raghavan Technical Marketing Engineer. ni.com FlexRIO FPGAs Bringing Custom Functionality to Instruments Ravichandran Raghavan Technical Marketing Engineer Electrical Test Today Acquire, Transfer, Post-Process Paradigm Fixed- Functionality Triggers

More information

Unified Deep Learning with CPU, GPU, and FPGA Technologies

Unified Deep Learning with CPU, GPU, and FPGA Technologies Unified Deep Learning with CPU, GPU, and FPGA Technologies Allen Rush 1, Ashish Sirasao 2, Mike Ignatowski 1 1: Advanced Micro Devices, Inc., 2: Xilinx, Inc. Abstract Deep learning and complex machine

More information

Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs

Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs Ritchie Zhao 1, Weinan Song 2, Wentao Zhang 2, Tianwei Xing 3, Jeng-Hau Lin 4, Mani Srivastava 3, Rajesh Gupta 4, Zhiru

More information