HW/SW Programmable Engine:
|
|
- Emerald Rice
- 5 years ago
- Views:
Transcription
1 HW/SW Programmable Engine: Domain Specific Architecture for Project Everest Juanjo Noguera, Goran Bilski, Jan Langer, Baris Ozgul, Tim Tuan, David Clarke, Peter McColgan, Sneha Date, Zachary Dickman, Pedro Duarte, Stephan Munz, Jose Marques, Sridhar Subramanian, Saurabh Mathur, Malav P. Shah, Chris Dick, Paul Newson, Kaushik Barman, Ramon Uribe, Helen Tarn, Engin Tunali, Richard Walke, Vinod Kathail, Shail Aditya Gupta, Akella Sastry, Mukund Sivaraman, Rishi Surendran, Abraham Lee, Chia-Jui Hsu, Kumud Bhandari, Jack Lo, Kristof Denolf, Phil James-Roxby, Samuel Bayliss, Kees Vissers, Ralph Wittig, Gaurav Singh 21 st August 2018
2 Heterogenous Computing Architecture Everest Heterogeneous Architecture Application-Level Performance Enabled by SW Programmable Engine Processing System SW Programmable Engine Programmable Logic Compute Efficiency Reduced Power Programmable I/O (GT, AMS) 20x Xilinx 7nm Everest 4x Xilinx 16nm UltraScale+ 40% Less Power ML Inference 5G Wireless Power >> 2 7nm Everest is the First Product to Integrate this New Architecture Tape Out in 2018
3 Everest Architecture Block Diagram SW Programmable Engine Domain Specific Architecture Hardened 7nm technology Throughput-oriented, low-latency Processing System Application Processor SW Programmable Engine SW PE SW PE SW PE SW PE SW PE SW PE I/O (GT, AMS) Transceivers PCIe Programmable Logic Soft logic Real-Time Processor OCM Programmable Logic DDR HBM Flexibility LUT BRAM Custom memory hierarchy SMMU DSP URAM AMS Chips with Different Implementations of the Architecture will be Announced Shortly >> 3
4 Market Requirements and Trends: Data Center Increasing Compute Lower Latency Requirements Heterogeneous Workloads ML Becoming an Essential Part of Data Processing Video + ML Genomics + ML Risk Modelling + ML Database + ML Network IPS + ML Storage + ML Source: Canziani et. al, An Analysis of Deep Neural Network Models for Practical Applications >> 4
5 ML Inference on Heterogenous Architecture Convolutions Fully Connected Layers Pooling Activations single depth slice X y i = 0 y y i = x i x y i = a i x i y y i = x i Y ReLU ReLU/PReLU Video Genomics Storage Database Network IPS Risk modeling Processing System SW Programmable Engine Programmable Logic I/O (GT, AMS) Feature Map Data Volume* Custom Hierarchy *Figure credit: >> 5
6 Packet Processing and Wired Backhaul Higher Layer Processing Baseband Processing Switching Beam Forming & MMIO + Some Baseband Transforms Digital Radio ADC / DAC Analogue Radio Antenna Array Market Requirements and Trends: Wireless 5G 5G Complexity is 100X that of 4G Still Evolving Standard New Technologies in 5G Massive MIMO Multiple antenna, frequency bands Changing functional partitioning ETRI RWS , 5G Vision and Enabling Technologies: ETRI Perspective 3GPP RAN Workshop Phoenix, Dec Transport & CTRL L2 L7 Modulation & FEC IQ Switch Linear Algebra ifft/ FFT DUC, CFR, DPD, DDC PA, LNA, Diplexer >> 6
7 Packet Processing and Wired Backhaul Higher Layer Processing Baseband Processing Switching Beam Forming & MMIO + Some Baseband Transforms Digital Radio ADC / DAC Analogue Radio Antenna Array 5G Wireless on Heterogenous Architecture 5G Wireless Infrastructure (i.e., base-station) Digital Radio with ADC/DAC Compute Maps to PE Mapping Example CPRI DPD Update DUC DPD AMS Control Maps to PS Processing System DPD Update SW Programmable Engine DUC DPD Programmable Logic I/O AMS CPRI 1: DUC: Digital Up Converter 2: DPD: Digital Pre-Distortion 3: AMS: Analog Mix-Signal (ADC/DAC) 4: CPRI: Common Public Radio Interface >> 7 I/O Maps to PL
8 Architecture Overview
9 Tile-Based Architecture Non-Blocking Interconnect Up to 200+ GB/s bandwidth per tile PS I/O PL Interconnect Local Multi-bank implementation Shared across neighbor cores Local ISA-based Processor ML Extensions ISA-based Processor Programmable (e.g., C/C++) Data Mover 5G Wireless Extensions >> 9 Cascade Interface Partial results to next core Data Mover Non-neighbor data communication Integrated synchronization primitives
10 Tile-Based Architecture (Continued) PS I/O Massive multi-core engines Increase in compute, memory and communication bandwidth PL Modular and scalable architecture Instantiate multiple tiles for more compute 10 s-100 s Instances* *Device Specific Distributed memory hierarchy Maximize memory bandwidth Architectural Focus: Deterministic Performance & Low Latency >> 10
11 Data Movement Architecture: Examples (1/2) 1 Dataflow Pipeline Overlap Compute and Communication PE 0 B0 B1 PE 1 B2 B3 PE 2 Comm Compute Comm Compute Comm Compute 2 Dataflow Graph 3 Streaming Multicast Mem PE Mem PE PE 0 PE 1 PE Mem PE Mem PE PE 2 Mem PE Mem PE PE 3 >> 11
12 Data Movement Architecture: Examples (2/2) 4 Non-neighbor communication PE 0 PE 1 Physical Mapping 5 Cascade streaming PE 1 PE 2 PE 3 PE 0 PE 2 PE 3 >> 12
13 Device Integration PS PL I/O Up to TB/s of bandwidth between PE and PL * *Device Specific CDC CDC CDC Programmable Logic Clock-domain crossing (CDC) between PL & PE clocks DSP LUT BRAM URAM >> 13
14 Data Movement Architecture: Examples PL Integration PL Accelerator PL Accelerator Pre/Post Processing Co-processing Mem PE 0 PE 1 Mem Custom On-chip Hierarchy Custom On-Chip Hierarchy Leverages on-chip memory resources in PL (e.g., BRAM, URAM) High on-chip memory bandwidth and lower memory access latency >> 14
15 Programming Environment
16 Programming Environment Full Device Programming Solution Fully Integrated with Tools for Other Everest Fabrics High Level Programming Languages (e.g., C/C++) C/C++ C/C++ DPD Update CPRI DUC DPD AMS Device-Level Programming SW Programmable Engine PS Programmable Logic I/O CPU Compiler SW PE Compiler PL Compiler >> 16
17 Complete Development Stack C/C++ C/C++ Device-Level Programming CPU Compiler SW PE Compiler PL Compiler 1 Full SW Programming Tool Chain (Single- and Multi-) 2 Performance-Optimized Libraries (Examples) 3 Run-Time (Examples) IDE Wireless 4G/5G Library Error Management Compiler Debugger ML (BLAS) Library Management Boot + Configuration Performance Analysis Vision Library Power/Thermal Management >> 17
18 High-level Compiler for ML Inference Users Working at ML Framework Level Users don t directly program PE Deep Learning Frameworks Seamless Migration Same experience as 16nm devices Current models continue to work Xilinx ML Compiler Xilinx 16nm UltraScale+ Xilinx Everest w/ PE >> 18
19 Application Results
20 Kernel-Level Performance: ML and Wireless 5G Processor Efficiency Peak Theoretical Performance 95% 80% 98% ML Convolutions FFT DPD Block-based Matrix Multiplication (32 64) (64 32) 1024-pt FFT/iFFT Volterra-based forward-path DPD High Compute Efficiency for Key Functions in ML and Wireless 5G >> 20
21 Programmable Engine: Summary Image Classification (GoogleNet CNN) Sub 1ms Latency 20x Application-level Performance Enabled by SW Programmable Engine Massive MIMO Radio (DUC, DDC, CFR, DPD) ML Inference 5G Wireless Power Reduction Xilinx Everest 7nm TX Sample Rate Up to 2Gsps 4x Xilinx UltraScale+ 16nm 40% Less Power Compute Efficiency Domain specific architecture Increase in compute density Xilinx 7nm Everest Heterogenous Architecture High-throughput, low-latency PL flexibility Custom memory hierarchy Multiple Applications ML Inference for Cloud DC Wireless 5G: Radio, Baseband ADAS/AD embedded vision Wired: DOCSIS cable access SW Programmable SW programmable (e.g., C/C++) Compile, execute, debug Optimized software libraries >> 23
22 KEYNOTE Adaptable Intelligence: The Next Computing Era Victor Peng, Xilinx CEO AUGUST 21 A.M. Connect Learn Share XDF connects software developers and system designers to the deep expertise of Xilinx engineers, partners, and industry leaders. Silicon Valley October 1-2 Learn More >> 22
23 Adaptable. Intelligent.
Versal: AI Engine & Programming Environment
Engineering Director, Xilinx Silicon Architecture Group Versal: Engine & Programming Environment Presented By Ambrose Finnerty Xilinx DSP Technical Marketing Manager October 16, 2018 MEMORY MEMORY MEMORY
More informationVersal: The New Xilinx Adaptive Compute Acceleration Platform (ACAP) in 7nm
Engineering Director, Xilinx Silicon Architecture Group Versal: The New Xilinx Adaptive Compute Acceleration Platform (ACAP) in 7nm Presented By Kees Vissers Fellow February 25, FPGA 2019 Technology scaling
More informationAdaptable Intelligence The Next Computing Era
Adaptable Intelligence The Next Computing Era Hot Chips, August 21, 2018 Victor Peng, CEO, Xilinx Pervasive Intelligence from Cloud to Edge to Endpoints >> 1 Exponential Growth and Opportunities Data Explosion
More informationXilinx DNN Processor An Inference Engine, Network Compiler + Runtime for Xilinx FPGAs
ilinx DNN Proceor An Inference Engine, Network Compiler Runtime for ilinx FPGA Rahul Nimaiyar, Brian Sun, Victor Wu, Thoma Branca, Yi Wang, Jutin Oo, Elliott Delaye, Aaron Ng, Paolo D'Alberto, Sean Settle,
More informationXilinx ML Suite Overview
Xilinx ML Suite Overview Yao Fu System Architect Data Center Acceleration Xilinx Accelerated Computing Workloads Machine Learning Inference Image classification and object detection Video Streaming Frame
More informationZynq-7000 All Programmable SoC Product Overview
Zynq-7000 All Programmable SoC Product Overview The SW, HW and IO Programmable Platform August 2012 Copyright 2012 2009 Xilinx Introducing the Zynq -7000 All Programmable SoC Breakthrough Processing Platform
More informationRFNoC : RF Network on Chip Martin Braun, Jonathon Pendlum GNU Radio Conference 2015
RFNoC : RF Network on Chip Martin Braun, Jonathon Pendlum GNU Radio Conference 2015 Outline Motivation Current situation Goal RFNoC Basic concepts Architecture overview Summary No Demo! See our booth,
More informationUniversal Hardware Platform for
Universal Hardware Platform for OpenAir5G OpenAir5G Daqing Li From BUPT 1. Preface 2. Universal Platform for Baseband CONTENT Unit(BBU) 3. Universal Platform for Remote Radio Unit(RRU) 4. Accelerate Cards
More informationSignal Conversion in a Modular Open Standard Form Factor. CASPER Workshop August 2017 Saeed Karamooz, VadaTech
Signal Conversion in a Modular Open Standard Form Factor CASPER Workshop August 2017 Saeed Karamooz, VadaTech At VadaTech we are technology leaders First-to-market silicon Continuous innovation Open systems
More informationOCP Engineering Workshop - Telco
OCP Engineering Workshop - Telco Low Latency Mobile Edge Computing Trevor Hiatt Product Management, IDT IDT Company Overview Founded 1980 Workforce Approximately 1,800 employees Headquarters San Jose,
More informationFPGA Entering the Era of the All Programmable SoC
FPGA Entering the Era of the All Programmable SoC Ivo Bolsens, Senior Vice President & CTO Page 1 Moore s Law: The Technology Pipeline Page 2 Industry Debates on Cost Page 3 Design Cost Estimated Chip
More informationHigh Capacity and High Performance 20nm FPGAs. Steve Young, Dinesh Gaitonde August Copyright 2014 Xilinx
High Capacity and High Performance 20nm FPGAs Steve Young, Dinesh Gaitonde August 2014 Not a Complete Product Overview Page 2 Outline Page 3 Petabytes per month Increasing Bandwidth Global IP Traffic Growth
More informationRevolutionizing the Datacenter
Power-Efficient Machine Learning using FPGAs on POWER Systems Ralph Wittig, Distinguished Engineer Office of the CTO, Xilinx Revolutionizing the Datacenter Join the Conversation #OpenPOWERSummit Top-5
More informationThe Application of FPGAs for Wireless Base-Station Connectivity
White Paper: 7 Series and UltraScale FPGAs WP450 (v1.1) September 29, 2015 The Application of FPGAs for Wireless Base-Station Connectivity By: Paul Newson Painstakingly architected to be a generation ahead,
More informationData-Centric Innovation Summit DAN MCNAMARA SENIOR VICE PRESIDENT GENERAL MANAGER, PROGRAMMABLE SOLUTIONS GROUP
Data-Centric Innovation Summit DAN MCNAMARA SENIOR VICE PRESIDENT GENERAL MANAGER, PROGRAMMABLE SOLUTIONS GROUP Devices / edge network Cloud/data center Removing data Bottlenecks with Fpga acceleration
More informationOvercoming the Memory System Challenge in Dataflow Processing. Darren Jones, Wave Computing Drew Wingard, Sonics
Overcoming the Memory System Challenge in Dataflow Processing Darren Jones, Wave Computing Drew Wingard, Sonics Current Technology Limits Deep Learning Performance Deep Learning Dataflow Graph Existing
More informationXPU A Programmable FPGA Accelerator for Diverse Workloads
XPU A Programmable FPGA Accelerator for Diverse Workloads Jian Ouyang, 1 (ouyangjian@baidu.com) Ephrem Wu, 2 Jing Wang, 1 Yupeng Li, 1 Hanlin Xie 1 1 Baidu, Inc. 2 Xilinx Outlines Background - FPGA for
More informationHow to Efficiently Implement Flexible and Full-Featured Digital Radio Solutions Using All Programmable SoCs
Delivering a Generation Ahead How to Efficiently Implement Flexible and Full-Featured Digital Radio Solutions Using All Programmable SoCs Agenda Introduction to Mobile Network Introduction to Xilinx Solution
More informationScaling Convolutional Neural Networks on Reconfigurable Logic Michaela Blott, Principal Engineer, Xilinx Research
Scaling Convolutional Neural Networks on Reconfigurable Logic Michaela Blott, Principal Engineer, Xilinx Research Nick Fraser (Xilinx & USydney) Yaman Umuroglu (Xilinx & NTNU) Giulio Gambardella (Xilinx)
More informationMaximizing heterogeneous system performance with ARM interconnect and CCIX
Maximizing heterogeneous system performance with ARM interconnect and CCIX Neil Parris, Director of product marketing Systems and software group, ARM Teratec June 2017 Intelligent flexible cloud to enable
More informationCOSMOS Architecture and Key Technologies. June 1 st, 2018 COSMOS Team
COSMOS Architecture and Key Technologies June 1 st, 2018 COSMOS Team COSMOS: System Architecture (2) System design based on three levels of SDR radio node (S,M,L) with M,L connected via fiber to optical
More informationFrequency Domain Acceleration of Convolutional Neural Networks on CPU-FPGA Shared Memory System
Frequency Domain Acceleration of Convolutional Neural Networks on CPU-FPGA Shared Memory System Chi Zhang, Viktor K Prasanna University of Southern California {zhan527, prasanna}@usc.edu fpga.usc.edu ACM
More informationTiming & Synchronization in Wireless Infrastructure
Timing & Synchronization in Wireless Infrastructure Harpinder Singh Matharu Senior Product Manager, Comms Division, ilinx 11/4/2010 Copyright 2010 ilinx Topics Wireless Infrastructure Market Trends Timing
More informationGRVI Phalanx Update: Plowing the Cloud with Thousands of RISC-V Chickens. Jan Gray
If you were plowing a field, which would you rather use: two strong oxen or 1024 chickens? Seymour Cray GRVI Phalanx Update: Plowing the Cloud with Thousands of RISC-V Chickens Jan Gray jan@fpga.org http://fpga.org
More informationDifferentiation in Flexible Data Center Interconnect
Differentiation in Flexible Data Center Interconnect Industry Leadership Momentum Founded 1984 20,000 customers $238B FY15 revenue 3,500 employees world wide >55% Market segment share 3500+ Patents 60
More informationZynq Ultrascale+ Architecture
Zynq Ultrascale+ Architecture Stephanie Soldavini and Andrew Ramsey CMPE-550 Dec 2017 Soldavini, Ramsey (CMPE-550) Zynq Ultrascale+ Architecture Dec 2017 1 / 17 Agenda Heterogeneous Computing Zynq Ultrascale+
More informationWhite Paper Using Cyclone III FPGAs for Emerging Wireless Applications
White Paper Introduction Emerging wireless applications such as remote radio heads, pico/femto base stations, WiMAX customer premises equipment (CPE), and software defined radio (SDR) have stringent power
More informationBuilding blocks for 64-bit Systems Development of System IP in ARM
Building blocks for 64-bit Systems Development of System IP in ARM Research seminar @ University of York January 2015 Stuart Kenny stuart.kenny@arm.com 1 2 64-bit Mobile Devices The Mobile Consumer Expects
More informationSimplifying FPGA Design with A Novel Network-on-Chip Architecture
Simplifying FPGA Design with A Novel Network-on-Chip Architecture ABSTRACT John Malsbury Ettus Research 1043 N Shoreline Blvd Suite 100 +1 (650) 967-2870 john.malsbury@ettus.com As wireless communications
More informationDeep Learning Accelerators
Deep Learning Accelerators Abhishek Srivastava (as29) Samarth Kulshreshtha (samarth5) University of Illinois, Urbana-Champaign Submitted as a requirement for CS 433 graduate student project Outline Introduction
More informationNVIDIA'S DEEP LEARNING ACCELERATOR MEETS SIFIVE'S FREEDOM PLATFORM. Frans Sijstermans (NVIDIA) & Yunsup Lee (SiFive)
NVIDIA'S DEEP LEARNING ACCELERATOR MEETS SIFIVE'S FREEDOM PLATFORM Frans Sijstermans (NVIDIA) & Yunsup Lee (SiFive) NVDLA NVIDIA DEEP LEARNING ACCELERATOR IP Core for deep learning part of NVIDIA s Xavier
More informationSDSoC: Session 1
SDSoC: Session 1 ADAM@ADIUVOENGINEERING.COM What is SDSoC SDSoC is a system optimising compiler which allows us to optimise Zynq PS / PL Zynq MPSoC PS / PL MicroBlaze What does this mean? Following the
More informationXMC-RFSOC-A. XMC Module Xilinx Zynq UltraScale+ RFSOC. Overview. Key Features. Typical Applications. Advanced Information Subject To Change
Advanced Information Subject To Change XMC-RFSOC-A XMC Module Xilinx Zynq UltraScale+ RFSOC Overview PanaTeQ s XMC-RFSOC-A is a XMC module based on the Zynq UltraScale+ RFSoC device from Xilinx. The Zynq
More informationAn introduction to Machine Learning silicon
An introduction to Machine Learning silicon November 28 2017 Insight for Technology Investors AI/ML terminology Artificial Intelligence Machine Learning Deep Learning Algorithms: CNNs, RNNs, etc. Additional
More informationHot Chips 2017 Xilinx 16nm Datacenter Device Family with In-Package HBM and CCIX Interconnect Gaurav Singh Sagheer Ahmad, Ralph Wittig, Millind
Hot Chips 2017 Xilinx 16nm Datacenter Device Family with In-Package HBM and Interconnect Gaurav Singh Sagheer Ahmad, Ralph Wittig, Millind Mittal, Ygal Arbel, Arun VR, Suresh Ramalingam, Kiran Puranik,
More informationEnabling Technology for the Cloud and AI One Size Fits All?
Enabling Technology for the Cloud and AI One Size Fits All? Tim Horel Collaborate. Differentiate. Win. DIRECTOR, FIELD APPLICATIONS The Growing Cloud Global IP Traffic Growth 40B+ devices with intelligence
More informationArm s First-Generation Machine Learning Processor
Arm s First-Generation Machine Learning Processor Ian Bratt 2018 Arm Limited Introducing the Arm Machine Learning (ML) Processor Optimized ground-up architecture for machine learning processing Massive
More informationFINN: A Framework for Fast, Scalable Binarized Neural Network Inference
FINN: A Framework for Fast, Scalable Binarized Neural Network Inference Yaman Umuroglu (XIR & NTNU), Nick Fraser (XIR & USydney), Giulio Gambardella (XIR), Michaela Blott (XIR), Philip Leong (USydney),
More informationCost-Optimized Backgrounder
Cost-Optimized Backgrounder A Cost-Optimized FPGA & SoC Portfolio for Part or All of Your System Optimizing a system for cost requires analysis of every silicon device on the board, particularly the high
More informationHow to validate your FPGA design using realworld
How to validate your FPGA design using realworld stimuli Daniel Clapham National Instruments ni.com Agenda Typical FPGA Design NIs approach to FPGA Brief intro into platform based approach RIO architecture
More informationEnabling FPGAs in Hyperscale Data Centers
J. Weerasinghe; IEEE CBDCom 215, Beijing; 13 th August 215 Enabling s in Hyperscale Data Centers J. Weerasinghe 1, F. Abel 1, C. Hagleitner 1, A. Herkersdorf 2 1 IBM Research Zurich Laboratory 2 Technical
More informationSDACCEL DEVELOPMENT ENVIRONMENT. The Xilinx SDAccel Development Environment. Bringing The Best Performance/Watt to the Data Center
SDAccel Environment The Xilinx SDAccel Development Environment Bringing The Best Performance/Watt to the Data Center Introduction Data center operators constantly seek more server performance. Currently
More informationFast Hardware For AI
Fast Hardware For AI Karl Freund karl@moorinsightsstrategy.com Sr. Analyst, AI and HPC Moor Insights & Strategy Follow my blogs covering Machine Learning Hardware on Forbes: http://www.forbes.com/sites/moorinsights
More informationOptimizing HW/SW Partition of a Complex Embedded Systems. Simon George November 2015.
Optimizing HW/SW Partition of a Complex Embedded Systems Simon George November 2015 Zynq-7000 All Programmable SoC HP ACP GP Page 2 Zynq UltraScale+ MPSoC Page 3 HW/SW Optimization Challenges application()
More informationAdaptable Computing The Future of FPGA Acceleration. Dan Gibbons, VP Software Development June 6, 2018
Adaptable Computing The Future of FPGA Acceleration Dan Gibbons, VP Software Development June 6, 2018 Adaptable Accelerated Computing Page 2 Three Big Trends The Evolution of Computing Trend to Heterogeneous
More informationAccelerating Data Center Workloads with FPGAs
Accelerating Data Center Workloads with FPGAs Enno Lübbers NorCAS 2017, Linköping, Sweden Intel technologies features and benefits depend on system configuration and may require enabled hardware, software
More informationNext Generation Enterprise Solutions from ARM
Next Generation Enterprise Solutions from ARM Ian Forsyth Director Product Marketing Enterprise and Infrastructure Applications Processor Product Line Ian.forsyth@arm.com 1 Enterprise Trends IT is the
More informationNutaq μdigitizer FPGA-based, multichannel, high-speed DAQ solutions PRODUCT SHEET
Nutaq μdigitizer FPGA-based, multichannel, high-speed DAQ solutions PRODUCT SHEET QUEBEC I MONTREAL I NEW YORK I nutaq.com Nutaq μdigitizer Supports a wide variety of high-speed I/O A/D and D/A converters
More informationGRVI Phalanx. A Massively Parallel RISC-V FPGA Accelerator Accelerator. Jan Gray
GRVI Phalanx A Massively Parallel RISC-V FPGA Accelerator Accelerator Jan Gray jan@fpga.org Introduction FPGA accelerators are hot MSR Catapult. Intel += Altera. OpenPOWER + Xilinx FPGAs as computers Massively
More informationNear Memory Key/Value Lookup Acceleration MemSys 2017
Near Key/Value Lookup Acceleration MemSys 2017 October 3, 2017 Scott Lloyd, Maya Gokhale Center for Applied Scientific Computing This work was performed under the auspices of the U.S. Department of Energy
More informationSODA: Stencil with Optimized Dataflow Architecture Yuze Chi, Jason Cong, Peng Wei, Peipei Zhou
SODA: Stencil with Optimized Dataflow Architecture Yuze Chi, Jason Cong, Peng Wei, Peipei Zhou University of California, Los Angeles 1 What is stencil computation? 2 What is Stencil Computation? A sliding
More informationNew Software-Designed Instruments
1 New Software-Designed Instruments Nicholas Haripersad Field Applications Engineer National Instruments South Africa Agenda What Is a Software-Designed Instrument? Why Software-Designed Instrumentation?
More informationTowards a Uniform Template-based Architecture for Accelerating 2D and 3D CNNs on FPGA
Towards a Uniform Template-based Architecture for Accelerating 2D and 3D CNNs on FPGA Junzhong Shen, You Huang, Zelong Wang, Yuran Qiao, Mei Wen, Chunyuan Zhang National University of Defense Technology,
More informationNew! New! New! New! New!
New! New! New! New! New! Model 5950 Features Supports Xilinx Zynq UltraScale+ RFSoC FPGAs 18 GB of DDR4 SDRAM On-board GPS receiver PCI Express (Gen. 1, 2 and 3) interface up to x8 LVDS connections to
More informationEttus Research Update
Ettus Research Update Matt Ettus Ettus Research GRCon13 Outline 1 Introduction 2 Recent New Products 3 Third Generation Introduction Who am I? Core GNU Radio contributor since 2001 Designed
More informationSimplifying FPGA Design for SDR with a Network on Chip Architecture
Simplifying FPGA Design for SDR with a Network on Chip Architecture Matt Ettus Ettus Research GRCon13 Outline 1 Introduction 2 RF NoC 3 Status and Conclusions USRP FPGA Capability Gen
More informationRISC-V based core as a soft processor in FPGAs Chowdhary Musunuri Sr. Director, Solutions & Applications Microsemi
Power Matters. TM RISC-V based core as a soft processor in FPGAs Chowdhary Musunuri Sr. Director, Solutions & Applications Microsemi chowdhary.musunuri@microsemi.com RIC217 1 Agenda A brief introduction
More information5G-PICTURE Project: Converged 5G
5G-PICTURE Project: Converged 5G Fronthaul/ Backhaul Infrastructure based on Dis-Aggregated RAN Presenter: Ioanna Mesogiti Senior R&D Engineer, MBA, MSc COSMOTE - Mobile Telecommunications S.A. R&D Projects
More informationMaximizing Server Efficiency from μarch to ML accelerators. Michael Ferdman
Maximizing Server Efficiency from μarch to ML accelerators Michael Ferdman Maximizing Server Efficiency from μarch to ML accelerators Michael Ferdman Maximizing Server Efficiency with ML accelerators Michael
More informationKeyStone C66x Multicore SoC Overview. Dec, 2011
KeyStone C66x Multicore SoC Overview Dec, 011 Outline Multicore Challenge KeyStone Architecture Reminder About KeyStone Solution Challenge Before KeyStone Multicore performance degradation Lack of efficient
More informationThird Genera+on USRP Devices and the RF Network- On- Chip. Leif Johansson Market Development RF, Comm and SDR
Third Genera+on USRP Devices and the RF Network- On- Chip Leif Johansson Market Development RF, Comm and SDR About Ettus Research Leader in soeware defined radio and signals intelligence Maker of USRP
More informationDNNBuilder: an Automated Tool for Building High-Performance DNN Hardware Accelerators for FPGAs
IBM Research AI Systems Day DNNBuilder: an Automated Tool for Building High-Performance DNN Hardware Accelerators for FPGAs Xiaofan Zhang 1, Junsong Wang 2, Chao Zhu 2, Yonghua Lin 2, Jinjun Xiong 3, Wen-mei
More informationExtending the Power of FPGAs
Extending the Power of FPGAs The Journey has Begun Salil Raje Xilinx Corporate Vice President Software and IP Products Development Agenda The Evolution of FPGAs and FPGA Programming IP-Centric Design with
More informationScalable and Modularized RTL Compilation of Convolutional Neural Networks onto FPGA
Scalable and Modularized RTL Compilation of Convolutional Neural Networks onto FPGA Yufei Ma, Naveen Suda, Yu Cao, Jae-sun Seo, Sarma Vrudhula School of Electrical, Computer and Energy Engineering School
More informationExtending the Power of FPGAs to Software Developers:
Extending the Power of FPGAs to Software Developers: The Journey has Begun Salil Raje Xilinx Corporate Vice President Software and IP Products Group Page 1 Agenda The Evolution of FPGAs and FPGA Programming
More informationECE 8823: GPU Architectures. Objectives
ECE 8823: GPU Architectures Introduction 1 Objectives Distinguishing features of GPUs vs. CPUs Major drivers in the evolution of general purpose GPUs (GPGPUs) 2 1 Chapter 1 Chapter 2: 2.2, 2.3 Reading
More informationBidirectional 10&40 km Optical PHY for 50GbE. Xinyuan Wang Huawei Technologies
Bidirectional 10&40 km Optical PHY for 50GbE Xinyuan Wang Huawei Technologies Background In IEEE 802 March plenary meeting, Call For Interest Bidirectional 10Gb/s and 25Gb/s optical access PHYs is accepted
More informationEnabling Technologies for Next Generation Wireless Systems
Enabling Technologies for Next Generation Wireless Systems Dr. Amitava Ghosh Nokia Fellow Nokia Bell Labs 10 th March, 2016 1 Nokia 2015 Heterogeneous use cases diverse requirements >10 Gbps peak data
More informationDeliver Femtocell Basestations Quickly and Cost Effectively with Altera HardCopy Structured ASICs Altera Corporation Public
Deliver Femtocell Basestations Quickly and Cost Effectively with HardCopy Structured ASICs Agenda Femtocell overview Benefits and challenges Enabling low-cost, low-risk, fast time-to-market femtocell designs
More informationSimplify System Complexity
1 2 Simplify System Complexity With the new high-performance CompactRIO controller Arun Veeramani Senior Program Manager National Instruments NI CompactRIO The Worlds Only Software Designed Controller
More informationThe WINLAB Cognitive Radio Platform
The WINLAB Cognitive Radio Platform IAB Meeting, Fall 2007 Rutgers, The State University of New Jersey Ivan Seskar Software Defined Radio/ Cognitive Radio Terminology Software Defined Radio (SDR) is any
More informationCCIX: a new coherent multichip interconnect for accelerated use cases
: a new coherent multichip interconnect for accelerated use cases Akira Shimizu Senior Manager, Operator relations Arm 2017 Arm Limited Arm 2017 Interconnects for different scale SoC interconnect. Connectivity
More informationA Closer Look at the Epiphany IV 28nm 64 core Coprocessor. Andreas Olofsson PEGPUM 2013
A Closer Look at the Epiphany IV 28nm 64 core Coprocessor Andreas Olofsson PEGPUM 2013 1 Adapteva Achieves 3 World Firsts 1. First processor company to reach 50 GFLOPS/W 3. First semiconductor company
More informationScalable and Dynamically Updatable Lookup Engine for Decision-trees on FPGA
Scalable and Dynamically Updatable Lookup Engine for Decision-trees on FPGA Yun R. Qu, Viktor K. Prasanna Ming Hsieh Dept. of Electrical Engineering University of Southern California Los Angeles, CA 90089
More informationExploring OpenCL Memory Throughput on the Zynq
Exploring OpenCL Memory Throughput on the Zynq Technical Report no. 2016:04, ISSN 1652-926X Chalmers University of Technology Bo Joel Svensson bo.joel.svensson@gmail.com Abstract The Zynq platform combines
More informationGRVI Phalanx Update: Plowing the Cloud with Thousands of RISC-V Chickens. Jan Gray
If you were plowing a field, which would you rather use: two strong oxen or 1024 chickens? Seymour Cray GRVI Phalanx Update: Plowing the Cloud with Thousands of RISC-V Chickens Jan Gray jan@fpga.org http://fpga.org
More informationCognitive Radio Platform Research at WINLAB
Cognitive Radio Platform Research at WINLAB December 2, 2010 Zoran Miljanic and Ivan Seskar WINLAB Rutgers University www.winlab.rutgers.edu 1 WiNC2R objectives Programmable processing of phy and higher
More informationMassively Parallel Processor Breadboarding (MPPB)
Massively Parallel Processor Breadboarding (MPPB) 28 August 2012 Final Presentation TRP study 21986 Gerard Rauwerda CTO, Recore Systems Gerard.Rauwerda@RecoreSystems.com Recore Systems BV P.O. Box 77,
More informationDr. Evaldas Stankevičius, Regulatory and Security Expert.
2018-08-23 Dr. Evaldas Stankevičius, Regulatory and Security Expert Email: evaldas.stankevicius@tele2.com 1G: purely analog system. 2G: voice and SMS. 3G: packet switching communication. 4G: enhanced mobile
More informationWhite Paper. The advantages of using a combination of DSP s and FPGA s. Version: 1.0. Author: Louis N. Bélanger. Date: May, 2004.
White Paper The advantages of using a combination of DSP s and FPGA s Version: 1.0 Author: Louis N. Bélanger Date: May, 2004 Lyrtech Inc The advantages of using a combination of DSP s and FPGA s DSP and
More informationThe Many Dimensions of SDR Hardware
The Many Dimensions of SDR Hardware Plotting a Course for the Hardware Behind the Software Sept 2017 John Orlando Epiq Solutions LO RFIC Epiq Solutions in a Nutshell Schaumburg, IL EST 2009 N. Virginia
More informationTowards 5G RAN Virtualization Enabled by Intel and ASTRI*
white paper Communications Service Providers C-RAN Towards 5G RAN Virtualization Enabled by Intel and ASTRI* ASTRI* has developed a flexible, scalable, and high-performance virtualized C-RAN solution to
More informationBrainchip OCTOBER
Brainchip OCTOBER 2017 1 Agenda Neuromorphic computing background Akida Neuromorphic System-on-Chip (NSoC) Brainchip OCTOBER 2017 2 Neuromorphic Computing Background Brainchip OCTOBER 2017 3 A Brief History
More informationITU Arab Forum on Future Networks: "Broadband Networks in the Era of App Economy", Tunis - Tunisia, Feb. 2017
On the ROAD to 5G Ines Jedidi Network Products, Ericsson Maghreb ITU Arab Forum on Future Networks: "Broadband Networks in the Era of App Economy", Tunis - Tunisia, 21-22 Feb. 2017 agenda Why 5G? What
More informationAn 80-core GRVI Phalanx Overlay on PYNQ-Z1:
An 80-core GRVI Phalanx Overlay on PYNQ-Z1: Pynq as a High Productivity Platform For FPGA Design and Exploration Jan Gray jan@fpga.org http://fpga.org/grvi-phalanx FCCM 2017 05/03/2017 Pynq Workshop My
More information借助 SDSoC 快速開發複雜的嵌入式應用
借助 SDSoC 快速開發複雜的嵌入式應用 May 2017 What Is C/C++ Development System-level Profiling SoC application-like programming Tools and IP for system-level profiling Specify C/C++ Functions for Acceleration Full System
More informationCommunication Systems Design in Practice
Communication Systems Design in Practice Jacob Kornerup, Ph.D. LabVIEW R&D National Instruments '87 '88 '89 '90 '91 '92 '93 '94 '95 '96 '97 '98 '99 '00 '01 '02 03 04 '05 '06 '07 '08 '09 '10 '11 '12 '13
More informationThe importance of RAN to Core validation as networks evolve to support 5G
The importance of RAN to Core validation as networks evolve to support 5G Stephen Hire Vice President Asia Pacific Cobham Wireless October 2017 Commercial in Confidence Cobham Wireless The industry standard
More informationDid I Just Do That on a Bunch of FPGAs?
Did I Just Do That on a Bunch of FPGAs? Paul Chow High-Performance Reconfigurable Computing Group Department of Electrical and Computer Engineering University of Toronto About the Talk Title It s the measure
More informationTile Processor (TILEPro64)
Tile Processor Case Study of Contemporary Multicore Fall 2010 Agarwal 6.173 1 Tile Processor (TILEPro64) Performance # of cores On-chip cache (MB) Cache coherency Operations (16/32-bit BOPS) On chip bandwidth
More informationAlgorithm-Architecture Co- Design for Efficient SDR Signal Processing
Algorithm-Architecture Co- Design for Efficient SDR Signal Processing Min Li, limin@imec.be Wireless Research, IMEC Introduction SDR Baseband Platforms Today are Usually Based on ILP + DLP + MP Massive
More informationRapidIO.org Update.
RapidIO.org Update rickoco@rapidio.org June 2015 2015 RapidIO.org 1 Outline RapidIO Overview Benefits Interconnect Comparison Ecosystem System Challenges RapidIO Markets Data Center & HPC Communications
More informationCommunication Systems Design in Practice
Communication Systems Design in Practice Jacob Kornerup, Ph.D. LabVIEW R&D National Instruments A Word About National Instruments Annual Revenue: $1.14 billion Global Operations: Approximately 6,870 employees;
More informationHigh Performance Embedded Applications. Raja Pillai Applications Engineering Specialist
High Performance Embedded Applications Raja Pillai Applications Engineering Specialist Agenda What is High Performance Embedded? NI s History in HPE FlexRIO Overview System architecture Adapter modules
More informationSoftware Defined Modem A commercial platform for wireless handsets
Software Defined Modem A commercial platform for wireless handsets Charles F Sturman VP Marketing June 22 nd ~ 24 th Brussels charles.stuman@cognovo.com www.cognovo.com Agenda SDM Separating hardware from
More informationIn-chip and Inter-chip Interconnections and data transportations for Future MPAR Digital Receiving System
In-chip and Inter-chip Interconnections and data transportations for Future MPAR Digital Receiving System A presentation for LMCO-MPAR project 2007 briefing Dr. Yan Zhang School of Electrical and Computer
More informationSimulation, prototyping and verification of standards-based wireless communications
Simulation, prototyping and verification of standards-based wireless communications Colin McGuire, Neil MacEwen 2015 The MathWorks, Inc. 1 Real Time LTE Cell Scanner with MATLAB and Simulink 2 Real time
More informationFlexRIO. FPGAs Bringing Custom Functionality to Instruments. Ravichandran Raghavan Technical Marketing Engineer. ni.com
FlexRIO FPGAs Bringing Custom Functionality to Instruments Ravichandran Raghavan Technical Marketing Engineer Electrical Test Today Acquire, Transfer, Post-Process Paradigm Fixed- Functionality Triggers
More informationUnified Deep Learning with CPU, GPU, and FPGA Technologies
Unified Deep Learning with CPU, GPU, and FPGA Technologies Allen Rush 1, Ashish Sirasao 2, Mike Ignatowski 1 1: Advanced Micro Devices, Inc., 2: Xilinx, Inc. Abstract Deep learning and complex machine
More informationAccelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs
Accelerating Binarized Convolutional Neural Networks with Software-Programmable FPGAs Ritchie Zhao 1, Weinan Song 2, Wentao Zhang 2, Tianwei Xing 3, Jeng-Hau Lin 4, Mani Srivastava 3, Rajesh Gupta 4, Zhiru
More information