FPGA-based Evaluation Platform for Disaggregated Computing
- Angela Gaines
- 5 years ago
1 FPGA-based Evaluation Platform for Disaggregated Computing. Dimitris Theodoropoulos, Nikolaos Alachiotis, and Dionisis Pnevmatikatos. This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No.
2 Outline: Introduction (motivation, resource disaggregation, dReDBox); Evaluation platform (architecture overview, software support); Hardware prototype (multiple interconnected FPGAs); Experimental evaluation (matrix multiply test case); Conclusions.
3-4 Current server infrastructure: Server trays typically comprise compute, memory, and acceleration (ACC) resources linked by an intra-tray network. Advantages: efficient SW communication; low power consumption. Drawbacks: resource proportionality is fixed at design time; reduced granularity when allocating resources to virtual machines.
5-7 Resource allocation example (server-centric): VM 1 needs 3 memory units and 1 CPU, VM 2 needs 2 memory units and 1 CPU, and VM 3 needs 3 memory units and 1 CPU; they are placed across Servers 1-3. Memory utilization: 66%; CPU utilization: 50%. A fourth VM, VM 4, needs 4 memory units and 3 CPUs: IT DOES NOT FIT.
8-9 Resource disaggregation: In the server-centric approach, each tray bundles its own compute, memory, and accelerator (ACC) resources behind an intra-tray network. In the resource-centric approach, resources of the same type are grouped together (e.g., a tray populated only with accelerators) and allocated from these pools.
11 Resource allocation example (disaggregated): With the same VMs 1-3 in place (memory utilization: 66%; CPU utilization: 50%), VM 4 (4 memory units, 3 CPUs) now FITS, because its demand is served from the resources left free across Servers 1-3. Memory utilization: 100%; CPU utilization: 100%.
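The utilization figures in the allocation example can be reproduced with simple arithmetic. The sketch below assumes (the slides do not say so) that each server offers 4 memory units and 2 CPUs; under that assumption VMs 1-3 use 8 of 12 memory units and 3 of 6 CPUs, no single server has room for VM 4, and pooling the leftover resources admits it exactly. The `Demand` type and the helper functions are illustrative names, not part of the platform.

```python
from dataclasses import dataclass


@dataclass
class Demand:
    mem: int  # memory units
    cpu: int  # CPUs


# Assumption (not stated on the slides): each server has 4 memory units and 2 CPUs.
SERVER_CAPACITY = Demand(mem=4, cpu=2)
NUM_SERVERS = 3

VMS = {"VM 1": Demand(3, 1), "VM 2": Demand(2, 1), "VM 3": Demand(3, 1)}
VM4 = Demand(4, 3)


def server_centric_fits(placed: list[Demand], new: Demand) -> bool:
    """A VM fits only if ONE server has enough free memory and enough free CPUs."""
    for used in placed:  # one VM per server in this example
        free_mem = SERVER_CAPACITY.mem - used.mem
        free_cpu = SERVER_CAPACITY.cpu - used.cpu
        if new.mem <= free_mem and new.cpu <= free_cpu:
            return True
    return False


def disaggregated_fits(placed: list[Demand], new: Demand) -> bool:
    """With pooled resources, only the system-wide free totals matter."""
    free_mem = NUM_SERVERS * SERVER_CAPACITY.mem - sum(d.mem for d in placed)
    free_cpu = NUM_SERVERS * SERVER_CAPACITY.cpu - sum(d.cpu for d in placed)
    return new.mem <= free_mem and new.cpu <= free_cpu


def utilization(placed: list[Demand]) -> tuple[float, float]:
    """Return (memory utilization, CPU utilization) over all servers."""
    total_mem = NUM_SERVERS * SERVER_CAPACITY.mem
    total_cpu = NUM_SERVERS * SERVER_CAPACITY.cpu
    return (sum(d.mem for d in placed) / total_mem,
            sum(d.cpu for d in placed) / total_cpu)


placed = list(VMS.values())  # VMs 1-3 on Servers 1-3
mem_u, cpu_u = utilization(placed)
print(f"memory {mem_u:.0%}, CPU {cpu_u:.0%}")               # memory 67%, CPU 50%
print("server-centric:", server_centric_fits(placed, VM4))  # server-centric: False
print("disaggregated:", disaggregated_fits(placed, VM4))    # disaggregated: True
mem_u, cpu_u = utilization(placed + [VM4])
print(f"memory {mem_u:.0%}, CPU {cpu_u:.0%}")               # memory 100%, CPU 100%
```

Python's `:.0%` format rounds 8/12 to 67%, whereas the slides report it truncated as 66%.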
12 The dReDBox Architecture. Paradigm shift: from mainboard-as-a-unit to disaggregated function-block-as-a-unit.
13 Evaluation Platform
14 Contribution: We provide a tool for writing and optimizing code for disaggregated environments by enabling runs on real hardware. This facilitates exploring the tradeoffs between the overhead of remote data accesses and the effect of caches, either local or remote, in real-world execution scenarios.
15 Platform architecture: Three types of resources, referred to as blocks (cblock, mblock, ablock), are connected through a low-latency interconnect. A master-worker scheme is used, with a cblock serving as the master and memory blocks and/or accelerators being the workers. [Figure: each block pairs a processing unit (PU) with off-chip memory and a DMA + PHY pair; the ablock additionally hosts accelerators (Accel.) behind an interconnect (I/C).]
16 Platform architecture: cblock. Provides general-purpose processing capacity: a PU plus memory for instructions/data, and DMA + PHY pairs for remote access to multiple workers.
17 Platform architecture: mblock. Provides memory resources. Its PU manages access to the local memories and enables computations near memory; a DMA + PHY pair facilitates remote data transfers.
18 Platform architecture: ablock. Provides the infrastructure to deploy custom hardware accelerators. Its PU controls the accelerators and facilitates communication with the master; an interconnect enables quick construction of accelerator-based datapaths.
19-22 Software stack for inter-block communication, from top to bottom:
- User Layer (application SW): API-based user application code.
- API Layer (SW API): the platform's API exposed to the user, which drives operations on the Block Layer: remote memory allocation, memcpy from/to remote blocks, accelerator start/poll, local/remote block debug.
- Block Layer: block-specific firmware (cblock, mblock, ablock) for initialization, testing, and debugging, plus execution of user task code on the mblock and ablock.
- Physical Layer (PHY layer): the PHY IP module (Aurora IP) for block-to-block data transfers.
23 Inter-block communication flow (xblock = mblock or ablock; the master sends each request and the xblock replies):
- allocatev: alloc, size, type → ACK, addr, id
- readv: read, id, idx → ACK, data
- writev: write, id, idx, data → ACK
- memcpy ToBlock: toblock, id → ready; data → ACK
- memcpy FromBlock: fromblock, id → ACK, data
- near-memory TaskProcess: taskproc, tid, in ids, out ids → ACK
- near-memory TaskAccel: taskacc, acid, in ids, out ids → ACK
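The request/reply exchanges of the communication flow can be modeled as a small message handler on the worker side. The sketch below is a hypothetical, in-process model: the message names (alloc, read, write, toblock, fromblock, taskproc, taskacc) and reply fields follow the slide, but the tuple encoding, the `XBlock` class, the `("data", payload)` message for the second ToBlock phase, the address arithmetic, and the helper functions are assumptions for illustration. On the real platform these messages travel over the Aurora-based PHY layer and are served by block firmware.

```python
from __future__ import annotations

from typing import Callable

# A task maps the contents of its input objects to the contents of its outputs.
Task = Callable[[list[list[int]]], list[list[int]]]


class XBlock:
    """Hypothetical model of an mblock/ablock serving the master's requests."""

    def __init__(self, base_addr: int = 0x1000_0000) -> None:
        self.objects: dict[int, list[int]] = {}   # object id -> contents
        self.next_id = 0
        self.next_addr = base_addr                # assumed base of off-chip memory
        self.pending_toblock: int | None = None   # id waiting for a ToBlock payload
        self.tasks: dict[int, Task] = {}          # tid -> near-memory task code
        self.accels: dict[int, Task] = {}         # acid -> accelerator model

    def handle(self, msg: tuple) -> tuple:
        # Second phase of ToBlock: after replying "ready", the next message
        # carries the payload for the pending object.
        if self.pending_toblock is not None:
            obj_id, self.pending_toblock = self.pending_toblock, None
            self.objects[obj_id] = list(msg[1])
            return ("ACK",)

        op = msg[0]
        if op == "alloc":                       # allocatev
            _, size, _elem_type = msg
            obj_id, addr = self.next_id, self.next_addr
            self.objects[obj_id] = [0] * size
            self.next_id += 1
            self.next_addr += 4 * size          # assumes 4-byte elements
            return ("ACK", addr, obj_id)
        if op == "read":                        # readv
            _, obj_id, idx = msg
            return ("ACK", self.objects[obj_id][idx])
        if op == "write":                       # writev
            _, obj_id, idx, data = msg
            self.objects[obj_id][idx] = data
            return ("ACK",)
        if op == "toblock":                     # memcpy ToBlock, phase 1
            self.pending_toblock = msg[1]
            return ("ready",)
        if op == "fromblock":                   # memcpy FromBlock
            return ("ACK", list(self.objects[msg[1]]))
        if op in ("taskproc", "taskacc"):       # TaskProcess / TaskAccel
            _, task_id, in_ids, out_ids = msg
            table = self.tasks if op == "taskproc" else self.accels
            outputs = table[task_id]([self.objects[i] for i in in_ids])
            for out_id, out in zip(out_ids, outputs):
                self.objects[out_id] = list(out)
            return ("ACK",)
        raise ValueError(f"unknown request {op!r}")


def expect(reply: tuple, expected: tuple) -> None:
    if reply != expected:
        raise RuntimeError(f"protocol error: got {reply!r}, expected {expected!r}")


def memcpy_to_block(blk: XBlock, obj_id: int, data: list[int]) -> None:
    expect(blk.handle(("toblock", obj_id)), ("ready",))
    expect(blk.handle(("data", data)), ("ACK",))


def memcpy_from_block(blk: XBlock, obj_id: int) -> list[int]:
    status, data = blk.handle(("fromblock", obj_id))
    if status != "ACK":
        raise RuntimeError(f"protocol error: got {status!r}")
    return data


# Usage: allocate two objects, copy data in, run a near-memory task, copy out.
mblock = XBlock()
_, _, src = mblock.handle(("alloc", 4, "int32"))
_, _, dst = mblock.handle(("alloc", 4, "int32"))
memcpy_to_block(mblock, src, [1, 2, 3, 4])
mblock.tasks[0] = lambda ins: [[2 * x for x in ins[0]]]  # hypothetical task 0
mblock.handle(("taskproc", 0, [src], [dst]))
print(memcpy_from_block(mblock, dst))  # [2, 4, 6, 8]
```

Keeping ToBlock as a two-phase exchange (request, "ready", payload, ACK) mirrors the slide and lets the worker prepare a buffer before data arrives; a single-message variant would be simpler but would not match the described flow.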
24 Hardware Prototype
25 Hardware prototype: a three-block prototype with one SoC with reconfigurable logic per block type, built on Xilinx ZC706 boards (Zynq 7045 MP: 2 ARM Cortex-A9s and Kintex-7 programmable logic). [Figure: three ZC706 boards (cblock, mblock, ablock); the cblock links to the mblock over SMA connectors and to the ablock over SFP, each link using Aurora 64B66B cores and DMA engines; on the cblock, the Zynq7045 PS7 reaches the memory and peripheral interconnects through its ACP and MGP0 ports.]
26-27 Hardware prototype: The Processing Unit on the cblock is the master. It is connected to local memory and to the memory interconnect through the ACP (Accelerator Coherency Port), which enables cache-coherent remote data transfers.
28 Experimental Evaluation
29-33 Experimental evaluation: matrix multiplication, accelerated with mmult, an HLS-based IP by Xilinx. Four execution scenarios (application mappings) are evaluated:
- PS: the computation runs on the cblock processing system.
- PS-: the cblock computes on data stored in the mblock, reached over the SMA interface.
- PS-NEAR-: the computation runs near memory on the mblock, reached over the SMA interface.
- PS-ACCEL: the mmult accelerator runs on the ablock, reached over the SFP interface.
34-35 Experimental evaluation: PS- code excerpt (shown as an image). The code is organized into Declaration, Allocation, Initialization, and Execution phases, and performs remote memory accesses to the mblock over the SMA interface.
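The PS- code itself appears only as an image, annotated with Declaration, Allocation, Initialization, and Execution phases and with remote memory accesses. As a hypothetical illustration of that structure (not the authors' code), the sketch below keeps the matrices in a `RemoteMemory` object standing in for the mblock and counts every element-level remote read and write. The class, its methods, and `mmult_remote` are assumed names. Without a local cache, a naive n×n multiplication issues 2n³ remote reads, which shows why caching remote data matters in this scenario.

```python
class RemoteMemory:
    """Hypothetical stand-in for memory on an mblock, reached over the link."""

    def __init__(self) -> None:
        self.buffers: dict[int, list[float]] = {}
        self.reads = 0   # number of element-level remote reads
        self.writes = 0  # number of element-level remote writes

    def allocate(self, size: int) -> int:
        buf_id = len(self.buffers)
        self.buffers[buf_id] = [0.0] * size
        return buf_id

    def read(self, buf_id: int, idx: int) -> float:
        self.reads += 1  # each call models one remote transfer
        return self.buffers[buf_id][idx]

    def write(self, buf_id: int, idx: int, value: float) -> None:
        self.writes += 1
        self.buffers[buf_id][idx] = value


def mmult_remote(mem: RemoteMemory, n: int) -> int:
    # Declaration / Allocation: three row-major n*n matrices on the remote block.
    a, b, c = (mem.allocate(n * n) for _ in range(3))
    # Initialization: the master writes the inputs remotely (A[i] = i, B = ones).
    for i in range(n * n):
        mem.write(a, i, float(i))
        mem.write(b, i, 1.0)
    # Execution: naive triple loop; every operand fetch is a remote read.
    for i in range(n):
        for j in range(n):
            acc = 0.0
            for k in range(n):
                acc += mem.read(a, i * n + k) * mem.read(b, k * n + j)
            mem.write(c, i * n + j, acc)
    return c


mem = RemoteMemory()
c = mmult_remote(mem, 4)
print(mem.reads, mem.writes)  # 128 48
```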
36 Performance comparison: speedups relative to the reference PS scenario, plotted against matrix size, with the cache capacities marked (L1: 32KB, L2: 512KB).
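The 32KB and 512KB marks on the speedup plots match the cache sizes of the Zynq-7000's Cortex-A9 (32KB L1 data cache per core, 512KB shared L2). A first-order way to relate them to matrix size is to ask when all three operands fit. The sketch assumes 32-bit elements and that A, B, and C must all be resident; it ignores associativity, conflict misses, and other data sharing the cache, so the thresholds are rough.

```python
import math

ELEM_BYTES = 4    # assumption: 32-bit (single-precision) elements
NUM_MATRICES = 3  # A, B and the result C
CACHES = {"L1": 32 * 1024, "L2": 512 * 1024}


def matrix_footprint(n: int) -> int:
    """Bytes needed to hold A, B and C for an n x n multiplication."""
    return NUM_MATRICES * ELEM_BYTES * n * n


def largest_fitting_n(cache_bytes: int) -> int:
    """Largest n with matrix_footprint(n) <= cache_bytes."""
    return math.isqrt(cache_bytes // (NUM_MATRICES * ELEM_BYTES))


for name, size in CACHES.items():
    print(name, largest_fitting_n(size))  # L1 52, then L2 209
```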
37 Speedup, PS- (mblock over the SMA interface): PS outperforms PS- in all cases due to sufficient capacity of the
38 Speedup, PS-NEAR- (mblock over the SMA interface): the cache is large enough to fit the data.
39 Speedup, PS-NEAR-: better cache utilization on the mblock hides the transfer overhead, owing to the ACP.
40 Speedup, PS-NEAR-: excessive memory requirements plus transfer overhead.
41 Speedup, PS-ACCEL (mmult on the ablock over the SFP interface): transfer overhead is comparable with computation.
42 Speedup, PS-ACCEL: favorable computation-to-communication ratio.
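Slides 41-42 explain the accelerator results in terms of the computation-to-communication ratio. Under the usual counting for dense n×n multiplication (2n³ operations, 3n² elements transferred), that ratio grows linearly with n. The sketch below assumes 32-bit elements and that each matrix crosses the SFP link exactly once, with no tiling; `flops` and `bytes_moved` are illustrative helpers.

```python
ELEM_BYTES = 4  # assumption: 32-bit elements


def flops(n: int) -> int:
    """n^3 multiply-adds, counted as 2 n^3 floating-point operations."""
    return 2 * n ** 3


def bytes_moved(n: int) -> int:
    """A and B sent to the ablock, C returned: 3 n^2 elements in total."""
    return 3 * n * n * ELEM_BYTES


for n in (16, 32, 64, 128):
    # 2n^3 / (12 n^2) simplifies to n / 6 FLOP per transferred byte.
    print(f"n={n:4d}: {flops(n) / bytes_moved(n):6.2f} FLOP/byte")
```

Doubling n doubles the ratio, which is consistent with the observation that transfer overhead is comparable with computation for smaller matrices and amortized for larger ones; the actual crossover depends on link bandwidth and accelerator throughput, which the slides do not give.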
43 Conclusions: Presented an FPGA-based evaluation platform for preparing and optimizing code for disaggregated environments. - Multiple FPGA boards assume different roles, e.g., compute, memory, and acceleration. - Software support and a user-friendly API eliminate deployment/adoption overhead. - The platform facilitates exploring the tradeoffs between the overhead of remote data accesses and the effect of caches, either local or remote, in real-world execution scenarios.
44 Future work: Add functionality and flexibility to the software stack: - Support for task-based execution to facilitate parallelism - Support for library-based legacy codes. Augment acceleration capabilities and flexibility: - Partial reconfiguration - Task offloading to processors near accelerators for better performance.
45 Thank you!
AN OPEN-SOURCE VHDL IP LIBRARY WITH PLUG&PLAY CONFIGURATION Jiri Gaisler Gaisler Research, Första Långgatan 19, 413 27 Göteborg, Sweden Abstract: Key words: An open-source IP library based on the AMBA-2.0
More informationInterrupts in Zynq Systems
Interrupts in Zynq Systems C r i s t i a n S i s t e r n a U n i v e r s i d a d N a c i o n a l d e S a n J u a n A r g e n t i n a Exception / Interrupt Special condition that requires a processor's
More informationCombining Arm & RISC-V in Heterogeneous Designs
Combining Arm & RISC-V in Heterogeneous Designs Gajinder Panesar, CTO, UltraSoC gajinder.panesar@ultrasoc.com RISC-V Summit 3 5 December 2018 Santa Clara, USA Problem statement Deterministic multi-core
More informationSerial. Parallel. CIT 668: System Architecture 2/14/2011. Topics. Serial and Parallel Computation. Parallel Computing
CIT 668: System Architecture Parallel Computing Topics 1. What is Parallel Computing? 2. Why use Parallel Computing? 3. Types of Parallelism 4. Amdahl s Law 5. Flynn s Taxonomy of Parallel Computers 6.
More informationEnergy and Performance Exploration of Accelerator Coherency Port Using Xilinx ZYNQ
Energy and Performance Exploration of Accelerator Coherency Port Using Xilinx ZYNQ Mohammadsadegh Sadri *, Christian Weis, Norbert Wehn, and Luca Benini * * Department of Electrical, Electronic and Information
More informationHardware-Software Codesign. 1. Introduction
Hardware-Software Codesign 1. Introduction Lothar Thiele 1-1 Contents What is an Embedded System? Levels of Abstraction in Electronic System Design Typical Design Flow of Hardware-Software Systems 1-2
More informationXPU A Programmable FPGA Accelerator for Diverse Workloads
XPU A Programmable FPGA Accelerator for Diverse Workloads Jian Ouyang, 1 (ouyangjian@baidu.com) Ephrem Wu, 2 Jing Wang, 1 Yupeng Li, 1 Hanlin Xie 1 1 Baidu, Inc. 2 Xilinx Outlines Background - FPGA for
More informationMulti-protocol controller for Industry 4.0
Multi-protocol controller for Industry 4.0 Andreas Schwope, Renesas Electronics Europe With the R-IN Engine architecture described in this article, a device can process both network communications and
More informationCEC 450 Real-Time Systems
CEC 450 Real-Time Systems Lecture 9 Device Interfaces October 20, 2015 Sam Siewert This Week Exam 1 86.4 Ave, 4.93 Std Dev, 91 High Solutions Posted on Canvas Questions? Monday Went Over in Class Assignment
More informationSpeeding Up Robot Control Software Through Seamless Integration With FPGA
Speeding Up Robot Control Software Through Seamless Integration With Xuan Sang LE 1,2, Luc Fabresse 1, Jannik Laval 3, Jean-Christophe Le Lann 2, Loic Lagadec 2 and Noury Bouraqadi 1 1 Mines Douai-DIA,
More informationCo-synthesis and Accelerator based Embedded System Design
Co-synthesis and Accelerator based Embedded System Design COE838: Embedded Computer System http://www.ee.ryerson.ca/~courses/coe838/ Dr. Gul N. Khan http://www.ee.ryerson.ca/~gnkhan Electrical and Computer
More informationSpartan-6 & Virtex-6 FPGA Connectivity Kit FAQ
1 P age Spartan-6 & Virtex-6 FPGA Connectivity Kit FAQ April 04, 2011 Getting Started 1. Where can I purchase a kit? A: You can purchase your Spartan-6 and Virtex-6 FPGA Connectivity kits online at: Spartan-6
More informationOpenCAPI Technology. Myron Slota Speaker name, Title OpenCAPI Consortium Company/Organization Name. Join the Conversation #OpenPOWERSummit
OpenCAPI Technology Myron Slota Speaker name, Title OpenCAPI Consortium Company/Organization Name Join the Conversation #OpenPOWERSummit Industry Collaboration and Innovation OpenCAPI Topics Computation
More informationDesign AXI Master IP using Vivado HLS tool
W H I T E P A P E R Venkatesh W VLSI Design Engineer and Srikanth Reddy Sr.VLSI Design Engineer Design AXI Master IP using Vivado HLS tool Abstract Vivado HLS (High-Level Synthesis) tool converts C, C++
More informationBoosting the Performance of FPGA-based Graph Processor using Hybrid Memory Cube: A Case for Breadth First Search
Boosting the Performance of FPGA-based Graph Processor using Hybrid Memory Cube: A Case for Breadth First Search Jialiang Zhang, Soroosh Khoram and Jing Li 1 Outline Background Big graph analytics Hybrid
More informationCMP Conference 20 th January Director of Business Development EMEA
CMP Conference 20 th January 2011 eric.lalardie@arm.com Director of Business Development EMEA +33 6 07 83 09 60 1 1 Unparalleled Applicability ARM Cortex Advanced Processors Architectural innovation, compatibility
More informationRe-architecting Virtualization in Heterogeneous Multicore Systems
Re-architecting Virtualization in Heterogeneous Multicore Systems Himanshu Raj, Sanjay Kumar, Vishakha Gupta, Gregory Diamos, Nawaf Alamoosa, Ada Gavrilovska, Karsten Schwan, Sudhakar Yalamanchili College
More informationVenezia: a Scalable Multicore Subsystem for Multimedia Applications
Venezia: a Scalable Multicore Subsystem for Multimedia Applications Takashi Miyamori Toshiba Corporation Outline Background Venezia Hardware Architecture Venezia Software Architecture Evaluation Chip and
More informationHardware Design. MicroBlaze 7.1. This material exempt per Department of Commerce license exception TSU Xilinx, Inc. All Rights Reserved
Hardware Design MicroBlaze 7.1 This material exempt per Department of Commerce license exception TSU Objectives After completing this module, you will be able to: List the MicroBlaze 7.1 Features List
More informationNetworks-on-Chip Router: Configuration and Implementation
Networks-on-Chip : Configuration and Implementation Wen-Chung Tsai, Kuo-Chih Chu * 2 1 Department of Information and Communication Engineering, Chaoyang University of Technology, Taichung 413, Taiwan,
More information