Challenges of Heterogeneous MPSoC for Image Processing

1 Challenges of Heterogeneous MPSoC for Image Processing
DGLR 2017
Walter Stechele, Institute for Integrated Systems, Technische Universität München

2 Overview
- Reconfigurable hardware
- Hardware <---> software migration
- Heterogeneous MPSoC
- Application mapping and resource-aware programming
- Case studies from driver assistance and robotic vision

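3 AutoVision Processor
[Table: engines (Shape Engine, Contrast Engine, Taillight Engine, Optical Flow) used per driving scenario (highway, tunnel entrance, tunnel, city); two engines are marked for each scenario.]
[Figure: FPGA block diagram with C1, C2, and I/O on an on-chip bus; engine slots Eng0 and Eng1 configured through ICAP; memory controller and video interface; partial bitstreams for MatE, CensusE, TaillightE, ConEng, ShapeEng, and EdgeEng stored in SDRAM.]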

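4 Optical Flow
[Figure: HW/SW partitioning of the optical flow pipeline with the blocks Census Transformation, Matching, Post Proc, and Draw features; a list of features and a status are exchanged between HW and SW.]
Design flow: SW algorithm (Matlab) -> SW algorithm (OpenCV) -> Profiling -> HW/SW partitioning -> HW accelerator -> Demonstrator
Results: optical flow (640x480, ~17000 features)
- Core2 Duo 1.86 GHz: 40 ms
- Engine: 22.17 ms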

5 Census transformation
1) Compute a signature for every pixel within Frame t_k by comparing the pixel with its neighbours (>, <, =)
2) Same comparison in Frame t_{k-1}, one frame period (40 ms) apart
3) Match the signatures from both images: correspondence / motion vector
[1] F Stein: Efficient Computation of Optical Flow Using the Census Transform, DAGM-Symposium, 2004
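The signature computation in step 1 can be sketched as follows. This is a minimal illustration, not the implementation from [1]: the 3x3 neighbourhood, the tolerance `CENSUS_EPS`, and the function name `census_signature` are assumptions chosen for brevity. The three comparison symbols on the slide suggest one ternary digit per neighbour, which is what the sketch encodes.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed tolerance: differences within +/-CENSUS_EPS count as "equal". */
#define CENSUS_EPS 4

/* Ternary census signature of pixel (x, y) in a row-major 8-bit image.
 * Each of the 8 neighbours in a 3x3 window contributes one base-3 digit:
 * 0 = darker, 1 = similar, 2 = brighter than the centre pixel. */
uint32_t census_signature(const uint8_t *img, int width, int height,
                          int x, int y)
{
    /* The 3x3 window must lie completely inside the image. */
    assert(x >= 1 && y >= 1 && x < width - 1 && y < height - 1);

    int centre = img[y * width + x];
    uint32_t sig = 0;
    for (int dy = -1; dy <= 1; dy++) {
        for (int dx = -1; dx <= 1; dx++) {
            if (dx == 0 && dy == 0)
                continue;                 /* skip the centre itself */
            int p = img[(y + dy) * width + (x + dx)];
            uint32_t digit = 1;           /* "=" */
            if (p > centre + CENSUS_EPS)
                digit = 2;                /* ">" */
            else if (p < centre - CENSUS_EPS)
                digit = 0;                /* "<" */
            sig = sig * 3 + digit;        /* append one ternary digit */
        }
    }
    return sig;
}
```

Because only the ordering of intensities enters the signature, a uniform brightness change between the two frames leaves it unchanged, which is what makes signatures usable as matching keys.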

6 Algorithmic redesign: Software Version [1]
3 steps: low pass filter, census transformation, matching
- Signature = address/key: the signature serves as address, the pixel coordinates as value
- Non-consecutive memory regions (no bursting possible)
- Counter update requires a read for every write operation
- Memory consumption is too high because of the table-based indexing scheme
- Global matching: motion vectors across the whole image are possible
- Algorithm unsuitable for FPGA implementation!
[Figure: frames t_k and t_{k-1} (n x m pixels) feeding a table indexed by signature, with a counter and (x,y) entries per signature.]
[1] F Stein: Efficient Computation of Optical Flow Using the Census Transform, DAGM-Symposium, 2004

7 Algorithmic redesign: Hardware Version [2]
3 steps: low pass filter, census transformation, matching
- Signature = value: the signature serves as value, the pixel coordinates as address
- Bursting possible
- Table-based approach removed completely -> no counter update
- Local matching: motion vectors are possible only within a neighborhood (225 parallel paths and comparisons in one clock cycle)
- Algorithm in that form unsuitable for software!
[Figure: frames t_k and t_{k-1} (n x m pixels) stored as signature images.]
[2] C Claus, A Laika, Li Jia, W Stechele: High performance FPGA based optical flow calculation using the census transformation, IV
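A sequential sketch of the local matching scheme may help to see why it suits hardware and not software. It assumes that the 225 comparisons correspond to a 15x15 search window (radius 7), which the slide does not state, and it accepts a match only when it is unique; both choices, and the name `match_local`, are illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define SEARCH_RADIUS 7   /* assumed: (2 * 7 + 1)^2 = 225 candidates */

/* Searches frame t_{k-1} around (x, y) for the signature that pixel
 * (x, y) has in frame t_k. Returns 1 if exactly one candidate matches;
 * (*dx, *dy) then holds the offset to the matching position in the
 * previous frame. The offsets are meaningless when 0 is returned. */
int match_local(const uint32_t *sig_cur, const uint32_t *sig_prev,
                int width, int height, int x, int y, int *dx, int *dy)
{
    assert(x >= 0 && x < width && y >= 0 && y < height);

    uint32_t s = sig_cur[y * width + x];
    int hits = 0;
    for (int v = -SEARCH_RADIUS; v <= SEARCH_RADIUS; v++) {
        for (int u = -SEARCH_RADIUS; u <= SEARCH_RADIUS; u++) {
            int px = x + u, py = y + v;
            if (px < 0 || py < 0 || px >= width || py >= height)
                continue;                 /* candidate outside the image */
            if (sig_prev[py * width + px] == s) {
                *dx = u;
                *dy = v;
                hits++;
            }
        }
    }
    return hits == 1;
}
```

In hardware the two loops unroll into parallel comparators that all fire in the same clock cycle. In software they cost up to 225 memory reads per pixel, which is why the table-based global scheme is the better fit for a CPU.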

8 Performance Comparison

                          Processing on µC (SW)       Processing on FPGA (HW)
Bursting possible         no                          yes
Counter update required   yes                         no
Matching scheme           global                      local
Platform                  Intel Core 2 Duo            FPGA, 2 embedded PPCs
Frequency                 1.86 GHz                    100 MHz (FPGA), 300 MHz (PPC)
Per-stage times           1.87 ms, 38.68 ms           3.95 ms, 5.92 ms, 12.3 ms
Total time                40.55 ms*                   22.17 ms*
Power consumption         65 W (TDP)                  10 W (TDP, < 1 W target)

Stages timed: image smoothing, census transformation, finding matches, drawing motion vectors.
* Execution time for images with a resolution of 640x480 and approximately 17000 features detected

9 CensusEngine

10 MatchingEngine

11 AutoVision in a car
Special thanks to DFG, BMW, Xilinx, sensor-to-image

12 Object Recognition
- An approach used in computer vision to extract features and infer the contents of an image
- Enables the ARMAR robot to recognize objects and to carry out tasks such as object tracking and object grasping
- Consists of three main stages: Harris corner detection, SIFT feature extraction, and SIFT feature matching
Pipeline: CAM -> Frame Buffer -> Stage-1 (Harris Corner) -> Stage-2 (SIFT Feature Extraction) -> Stage-3 (SIFT Feature Matching)

13 Heterogeneous MPSoC
Tiled hardware architecture with:
- Tightly Coupled Processor Array (TCPA) for image processing
- Invasive-Core (i-core) with special instruction support (SI)
- Loosely coupled LEON3 cores for high-level algorithms
- Network-on-Chip (NoC)
- Tile Local Memory (TLM)
- Memory tile with interface to external DDR-II memory
- IO tile and Ethernet interface

14 Humanoid Robot ARMAR from KIT [Asfour et al]
Sensors: camera, microphone, accelerometer, pressure sensor, torque sensor, rotary encoder

15 Harris Corner Detection on TCPA
Tightly Coupled Processor Array [Teich et al]
- A TCPA consists of numerous lightweight processing elements (PEs)
- A TCPA benefits from instruction-level and loop-level parallelism and offers significant acceleration for image processing algorithms
- Direct PE-to-PE communication channels result in continuous streaming of data from the surrounding buffers through the array
[Figure: TCPA tile with the PE array surrounded by four reconfigurable buffers and IM, GC, and AG blocks; configuration manager, config memory, and config loader; IRQ controller, configuration and communication processor, and network adapter on the AMBA AHB bus; AHB/APB bridge to the AMBA APB bus.]

16 Mapping HCD on TCPA
- The TCPA prototype for HCD consists of two PEs
- Achieved a frame rate of 5 fps (640x480 pixels)
- The TCPA implementation is expected to consume less power due to its lightweight PE structure
[Figure: HCD (3x3) mapped onto the TCPA, with I/O buffers, IM, GC, and AG blocks, and configuration manager; configuration and communication processor (LEON3) and memory on the AMBA bus.]

17 SIFT Feature Matching on i-core [Henkel et al]
- i-core: an extension of a LEON3 processor with a reconfigurable fabric, which allows loading application-specific accelerators at runtime
- Flow: Start -> Harris Corner Detection -> SIFT Feature Extraction -> SIFT Feature Matching -> Visualize -> Stop
- Euclidean distance between p and q: $D(p, q) = \sum_k (p_k - q_k)^2$
- Inner loop of SIFT feature matching:
    for (k = 0; k < ndimension; k++) {
        v = pquery[k] - pdata[k];
        sum += v * v;
    }
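The loop on this slide is the kernel of a nearest-neighbour search over feature descriptors. A self-contained sketch under stated assumptions: descriptors are `float` arrays stored back to back, the search is a plain linear scan, and the function names are hypothetical. The deck does not say how the database is organised or which data type is used.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Sum of squared differences between two descriptors of length n,
 * i.e. the quantity accumulated by the loop on the slide. */
float squared_distance(const float *p, const float *q, size_t n)
{
    float sum = 0.0f;
    for (size_t k = 0; k < n; k++) {
        float v = p[k] - q[k];
        sum += v * v;
    }
    return sum;
}

/* Index of the database descriptor closest to `query`. `db` holds
 * `count` descriptors of length n, stored back to back. Returns
 * `count` (an invalid index) if the database is empty. */
size_t nearest_descriptor(const float *query, const float *db,
                          size_t count, size_t n)
{
    assert(n > 0);

    size_t best = count;
    float best_dist = FLT_MAX;
    for (size_t i = 0; i < count; i++) {
        float d = squared_distance(query, db + i * n, n);
        if (d < best_dist) {
            best_dist = d;
            best = i;
        }
    }
    return best;
}
```

Comparing squared distances avoids a square root per candidate and does not change which descriptor wins. The inner loop is a pure multiply-accumulate over two streams, which is the kind of kernel a special instruction with wide memory ports can speed up.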

18 SIFT Feature Matching on i-core
Two memory ports provide a high-bandwidth connection (2x128 bits) to the tile-local memory

19 Homogeneous vs Heterogeneous
- The three stages of the object recognition algorithm operate in a pipelined fashion
- Hardware variants used for comparison:
  - Homogeneous MPSoC: 2x3 tile design with four LEON3 PEs per tile
  - Heterogeneous MPSoC: 2x3 tile design with one TCPA tile and one i-core
Pipeline: CAM -> Frame Buffer -> Stage-1 (Harris Corner on TCPA) -> Stage-2 (SIFT Feature Extraction on LEON3) -> Stage-3 (SIFT Feature Matching on i-core)
[Figure: timeline of HCD-TCPA, SIFT-Extr-LEON3, and SIFT-Match-iCore on TCPA, i-core, and LEON3.]

20 Homogeneous vs Heterogeneous

Variant         Load LEON3   Load TCPA   Load i-core   Throughput   WOLT (msec)
Homogeneous     59%          0%          0%            97 frames    732
Heterogeneous   23%          62%         72%           97 frames    683

[Figure: timeline of HCD-TCPA, SIFT-Extr-LEON3, and SIFT-Match-iCore on TCPA, i-core, and LEON3.]

21 Additional Applications
Conventional task distribution
- App-1: audio filtering on TCPA
- App-2: matrix multiplication on i-core
- App-3: object recognition (CAM -> Frame Buffer -> Stage-1 Harris Corner on TCPA -> Stage-2 SIFT Feature Extraction on LEON3 -> Stage-3 SIFT Feature Matching on i-core)
[Figure: timeline of frames 1 and 2 on TCPA, i-core, and LEON3 with events (a) to (d); tasks: Audio-TCPA, MatrixMul-iCore, HCD-TCPA, HCD-LEON3, SIFT-Extr-LEON3, SIFT-Match-iCore, SIFT-Match-LEON3.]

22 Additional Applications
Conventional task distribution

Workload          Load LEON3   Load TCPA   Load i-core   Throughput   WOLT (msec)
Obj-Recog Only    23%          62%         72%           97 frames    683
All Three Apps    17%          96%         67%           72 frames    1400

[Figure: timeline of frames 1 and 2 on TCPA, i-core, and LEON3 with events (a) to (d); tasks: Audio-TCPA, MatrixMul-iCore, HCD-TCPA, HCD-LEON3, SIFT-Extr-LEON3, SIFT-Match-iCore, SIFT-Match-LEON3.]

23 Conventional vs Resource-aware
Resource-aware task distribution
- App-1: audio filtering on TCPA
- App-2: matrix multiplication on i-core
- App-3: object recognition (CAM -> Frame Buffer -> Stage-1 Harris Corner on TCPA -> Stage-2 SIFT Feature Extraction on LEON3 -> Stage-3 SIFT Feature Matching on i-core)
[Figure: timeline of frames 1 to 3 on TCPA, i-core, and LEON3 with events (e), (f), (g), (h), (x), (y); tasks: Audio-TCPA, MatrixMul-iCore, HCD-TCPA, HCD-LEON3, SIFT-Extr-LEON3, SIFT-Match-iCore, SIFT-Match-LEON3.]

24 Conventional vs Resource-aware
Resource-aware task distribution

Distribution      Load LEON3   Load TCPA   Load i-core   Throughput   WOLT (msec)
Conventional      17%          96%         67%           72 frames    1400
Resource-aware    38%          80%         75%           98 frames    705

[Figure: timeline of frames 1 to 3 on TCPA, i-core, and LEON3 with events (e), (f), (g), (h), (x), (y); tasks: Audio-TCPA, MatrixMul-iCore, HCD-TCPA, HCD-LEON3, SIFT-Extr-LEON3, SIFT-Match-iCore, SIFT-Match-LEON3.]
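Read as ratios, the throughput and WOLT figures reported for the conventional and the resource-aware distribution give a rough sense of the gain. The arithmetic below uses only the values from the slide; the rounding is ours.

```latex
\[
\frac{\text{throughput}_{\text{resource-aware}}}{\text{throughput}_{\text{conventional}}}
  = \frac{98}{72} \approx 1.36,
\qquad
\frac{\text{WOLT}_{\text{resource-aware}}}{\text{WOLT}_{\text{conventional}}}
  = \frac{705}{1400} \approx 0.50
\]
```

With all three applications running, resource-aware distribution therefore delivers about 36% more frames and roughly halves the worst observed latency, which brings both metrics back close to the object-recognition-only case (97 frames, 683 msec).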

25 Summary for Heterogeneous MPSoC
What we have seen so far:
- Resource-awareness helps task distribution on a heterogeneous MPSoC
- It improves throughput and WOLT (worst observed latency time)
Now, what if no more µC cores are available?
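The idea behind resource-aware task distribution can be illustrated with a deliberately small sketch: a task asks for its preferred accelerator and falls back to a LEON3 variant when that accelerator is already busy, mirroring the HCD-TCPA / HCD-LEON3 and SIFT-Match-iCore / SIFT-Match-LEON3 task variants named in the timelines. The enum, the load threshold, and `choose_resource` are hypothetical; this is not the API of the system presented in the deck.

```c
#include <assert.h>

/* Hypothetical resource classes of the tiled MPSoC. */
enum resource { RES_TCPA, RES_ICORE, RES_LEON3, RES_COUNT };

/* Returns the preferred accelerator if its current load (in percent)
 * is below `busy_limit`; otherwise falls back to a LEON3 core. */
enum resource choose_resource(enum resource preferred,
                              const int load[RES_COUNT], int busy_limit)
{
    assert((int)preferred < RES_COUNT);

    if (load[preferred] < busy_limit)
        return preferred;       /* accelerator has headroom: use it */
    return RES_LEON3;           /* otherwise run the software variant */
}
```

With the loads reported for the conventional distribution (TCPA 96%, i-core 67%, LEON3 17%) and an assumed limit of 90%, a Harris corner task would move to a LEON3 core while SIFT matching would stay on the i-core.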

26 HCD and Additional Load
[Figure: execution time per frame number (expected vs. observed) and core count per frame number (required vs. available).]
[Figure: layers Harris Corner Detection and additional load on top of the operating system and the many-core HW.]

27 Harris Corner Detection - Stages
The Harris corner detection algorithm consists of three main stages:
- Stage-1, Covariance: $I_x = p_1 - p_2$, $I_y = p_3 - p_4$, and
  $\begin{pmatrix} \sum_w w \times I_x^2 & \sum_w w \times I_x I_y \\ \sum_w w \times I_x I_y & \sum_w w \times I_y^2 \end{pmatrix} = \begin{pmatrix} a & b \\ b & c \end{pmatrix}$
- Stage-2, Harris Map: $R = ac - b^2 - k(a + c)^2$
- Stage-3, Threshold
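The covariance and Harris map stages can be sketched for a single window as follows. The sketch assumes uniform window weights (w = 1) and gradients that have already been computed; the function names are hypothetical, and the constant k is left to the caller (values around 0.04 to 0.06 are common in the literature).

```c
#include <assert.h>

/* Accumulates the covariance entries a, b, c over one window of n
 * gradient samples, using uniform weights (w = 1) for simplicity. */
void covariance_sums(const float *ix, const float *iy, int n,
                     float *a, float *b, float *c)
{
    assert(n > 0);

    *a = *b = *c = 0.0f;
    for (int i = 0; i < n; i++) {
        *a += ix[i] * ix[i];
        *b += ix[i] * iy[i];
        *c += iy[i] * iy[i];
    }
}

/* Harris response R = det(M) - k * trace(M)^2 for M = [a b; b c]. */
float harris_response(float a, float b, float c, float k)
{
    float det = a * c - b * b;
    float trace = a + c;
    return det - k * trace * trace;
}

/* Threshold stage: a pixel is a corner candidate if R exceeds
 * the given threshold. */
int is_corner(float a, float b, float c, float k, float threshold)
{
    return harris_response(a, b, c, k) > threshold;
}
```

A window with gradient energy in only one direction (an edge) has a determinant near zero and a positive trace, so R turns negative; only windows with strong gradients in both directions pass the threshold.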

28 Subsampling by Dropping Pixels
Advantages:
- Reduces the computational load by dropping pixels
- Dropping every alternate pixel horizontally reduces the computations by 50%
- Dropping every alternate pixel horizontally and vertically reduces the workload by 75%
Disadvantages:
- Regions with and without corners are treated equally
- Memory reads and writes remain almost the same
- The low ratio of computation to memory access leads to poor scalability

29 Masking Technique
- Threshold the covariance image to generate a mask unique to the input image, based on [Alkaabi 2004]
- Mask out regions with regular intensities and unmask the others
- Masked and unmasked regions appear in clusters, which makes the technique highly cache friendly
[Figure: HCD stages Covariance, Harris Map, and Threshold, with the mask applied between stages.]
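A minimal sketch of mask generation, assuming that the thresholded quantity is a per-pixel gradient energy value derived from the covariance stage. The slide only says that the covariance image is thresholded, so the `energy` input, the strict comparison, and the name `build_mask` are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Marks pixels whose energy exceeds `mask_threshold` as unmasked (1)
 * and all others as masked (0). Returns the number of unmasked
 * pixels, i.e. the pixels the later stages still have to process. */
size_t build_mask(const float *energy, size_t n, float mask_threshold,
                  uint8_t *mask)
{
    assert(energy != NULL && mask != NULL);

    size_t active = 0;
    for (size_t i = 0; i < n; i++) {
        mask[i] = energy[i] > mask_threshold;
        active += mask[i];
    }
    return active;
}
```

Raising the threshold masks more of the image and shrinks the remaining workload, which makes it a natural knob for trading accuracy against execution time.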

30 Masking Technique
- Self-adaptive HCD algorithm with variable masking
- Negligible loss in precision and recall up to a mask threshold of 8
- Execution time reduced by 60% with a mask threshold of 100, keeping precision and recall at 88-90%
$\text{precision} = 1 - \frac{\#\text{incorrect matches}}{\#\text{correct} + \#\text{incorrect}}$
$\text{recall} = \frac{\#\text{correct matches}}{\#\text{total possible matches}}$
[Figure: resource usage, precision, and recall for mask thresholds 0, 2, 4, 8, 16, 32, 64, and 128.]
[Figure: layers Harris Corner Detection, Operating System, Many-core HW.]
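The two quality metrics translate directly into code. The sketch below follows the definitions on the slide; the function names and the convention of returning 0 when there are no matches at all are assumptions.

```c
#include <assert.h>

/* precision = 1 - incorrect / (correct + incorrect).
 * Returns 0 by convention when no matches were reported. */
double precision(unsigned correct, unsigned incorrect)
{
    unsigned total = correct + incorrect;
    if (total == 0)
        return 0.0;
    return 1.0 - (double)incorrect / total;
}

/* recall = correct matches / total possible matches.
 * Returns 0 by convention when no matches are possible. */
double recall(unsigned correct, unsigned total_possible)
{
    assert(correct <= total_possible);

    if (total_possible == 0)
        return 0.0;
    return (double)correct / total_possible;
}
```

For example, 3 correct and 1 incorrect match out of 4 possible matches give a precision of 0.75 and a recall of 0.75.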

31 Results: Conventional vs Resource-aware HCD
[Figure: execution time profile, duration (milliseconds) per frame number, for resource-aware (RA) and conventional (CN) HCD.]
[Figure: accuracy, precision/recall rate per frame number, with the curves PR-RA, RE-RA, PR-CN, and RE-CN.]

32 Conclusion
- Reconfigurable hardware: hardware is not always fixed
- Optical flow in hardware and software: software is not simply mapped to hardware
- Heterogeneous MPSoC: static mapping is not sufficient
- Resource-aware computing: software adaptation is not limited to parameter tuning

33 References
- J A Colmenares, G Eads, S A Hofmeyr, S Bird, M Moretó, D Chou, B Gluzman, E Roman, D B Bartolini, N Mor et al: Tessellation: refactoring the OS around explicit resource containers with continuous adaptation, DAC 2013
- Henry Hoffmann, Jonathan Eastep, Marco D Santambrogio, Jason E Miller, Anant Agarwal: Application Heartbeats - A Generic Interface for Specifying Program Performance and Goals in Autonomous Computing Environments, ICAC 2010
- Henry Hoffmann, Stelios Sidiroglou, Michael Carbin, Sasa Misailovic, Anant Agarwal, Martin Rinard: Dynamic Knobs for Responsive Power-Aware Computing, ASPLOS 2012
- D B Bartolini, R Cattaneo, G C Durelli, M Maggio, M D Santambrogio, F Sironi: The autonomic operating system research project: achievements and future directions, DAC 2013
- Edoardo Paone, Davide Gadioli, Gianluca Palermo, Vittorio Zaccaria, Cristina Silvano: Evaluating Orthogonality between Application Auto-Tuning and Run-Time Resource Management for Adaptive OpenCL Applications, ASAP

34 References
- F Stein: Efficient Computation of Optical Flow Using the Census Transform, DAGM-Symposium, 2004
- S Alkaabi, F Deravi: Candidate pruning for fast corner detection, Electronics Letters, 2004
- [Teich et al]
- [Henkel et al]
- J Paul, W Stechele et al: Resource awareness on heterogeneous MPSoCs for image processing, Journal of Systems Architecture, Elsevier, 2015
- J Paul, W Stechele et al: Self-adaptive corner detection on MPSoCs through resource-aware programming, Journal of Systems Architecture, Elsevier, 2015
- C Claus, A Laika, Li Jia, W Stechele: High performance FPGA based optical flow calculation using the census transformation, Intelligent Vehicles Symposium,


Table 1: Example Implementation Statistics for Xilinx FPGAs logijpge Motion JPEG Encoder January 10 th, 2018 Data Sheet Version: v1.0 Xylon d.o.o. Fallerovo setaliste 22 10000 Zagreb, Croatia Phone: +385 1 368 00 26 Fax: +385 1 365 51 67 E-mail: support@logicbricks.com

More information

Broadening the Exploration of the Accelerator Design Space in Embedded Scalable Platforms

Broadening the Exploration of the Accelerator Design Space in Embedded Scalable Platforms IEEE High Performance Extreme Computing Conference (HPEC), 2017 Broadening the Exploration of the Design Space in Embedded Scalable Platforms Luca Piccolboni, Paolo Mantovani, Giuseppe Di Guglielmo, Luca

More information

SOFTWARE-DEFINED MEMORY HIERARCHIES: SCALABILITY AND QOS IN THOUSAND-CORE SYSTEMS

SOFTWARE-DEFINED MEMORY HIERARCHIES: SCALABILITY AND QOS IN THOUSAND-CORE SYSTEMS SOFTWARE-DEFINED MEMORY HIERARCHIES: SCALABILITY AND QOS IN THOUSAND-CORE SYSTEMS DANIEL SANCHEZ MIT CSAIL IAP MEETING MAY 21, 2013 Research Agenda Lack of technology progress Moore s Law still alive Power

More information

SoC Design. Prof. Dr. Christophe Bobda Institut für Informatik Lehrstuhl für Technische Informatik

SoC Design. Prof. Dr. Christophe Bobda Institut für Informatik Lehrstuhl für Technische Informatik SoC Design Prof. Dr. Christophe Bobda Institut für Informatik Lehrstuhl für Technische Informatik Chapter 5 On-Chip Communication Outline 1. Introduction 2. Shared media 3. Switched media 4. Network on

More information

ECE5775 High-Level Digital Design Automation, Fall 2018 School of Electrical Computer Engineering, Cornell University

ECE5775 High-Level Digital Design Automation, Fall 2018 School of Electrical Computer Engineering, Cornell University ECE5775 High-Level Digital Design Automation, Fall 2018 School of Electrical Computer Engineering, Cornell University Lab 4: Binarized Convolutional Neural Networks Due Wednesday, October 31, 2018, 11:59pm

More information

Yafit Snir Arindam Guha Cadence Design Systems, Inc. Accelerating System level Verification of SOC Designs with MIPI Interfaces

Yafit Snir Arindam Guha Cadence Design Systems, Inc. Accelerating System level Verification of SOC Designs with MIPI Interfaces Yafit Snir Arindam Guha, Inc. Accelerating System level Verification of SOC Designs with MIPI Interfaces Agenda Overview: MIPI Verification approaches and challenges Acceleration methodology overview and

More information

Tradeoff Analysis and Architecture Design of a Hybrid Hardware/Software Sorter

Tradeoff Analysis and Architecture Design of a Hybrid Hardware/Software Sorter Tradeoff Analysis and Architecture Design of a Hybrid Hardware/Software Sorter M. Bednara, O. Beyer, J. Teich, R. Wanka Paderborn University D-33095 Paderborn, Germany bednara,beyer,teich @date.upb.de,

More information

DRPM architecture overview

DRPM architecture overview DRPM architecture overview Jens Hagemeyer, Dirk Jungewelter, Dario Cozzi, Sebastian Korf, Mario Porrmann Center of Excellence Cognitive action Technology, Bielefeld University, Germany Project partners:

More information

Versal: AI Engine & Programming Environment

Versal: AI Engine & Programming Environment Engineering Director, Xilinx Silicon Architecture Group Versal: Engine & Programming Environment Presented By Ambrose Finnerty Xilinx DSP Technical Marketing Manager October 16, 2018 MEMORY MEMORY MEMORY

More information

Politecnico di Milano

Politecnico di Milano Politecnico di Milano Prototyping Pipelined Applications on a Heterogeneous FPGA Multiprocessor Virtual Platform Antonino Tumeo, Marco Branca, Lorenzo Camerini, Marco Ceriani, Gianluca Palermo, Fabrizio

More information

Model-based Visual Tracking:

Model-based Visual Tracking: Technische Universität München Model-based Visual Tracking: the OpenTL framework Giorgio Panin Technische Universität München Institut für Informatik Lehrstuhl für Echtzeitsysteme und Robotik (Prof. Alois

More information

Next Generation Multi-Purpose Microprocessor

Next Generation Multi-Purpose Microprocessor Next Generation Multi-Purpose Microprocessor Presentation at MPSA, 4 th of November 2009 www.aeroflex.com/gaisler OUTLINE NGMP key requirements Development schedule Architectural Overview LEON4FT features

More information

Intelligent Interconnect for Autonomous Vehicle SoCs. Sam Wong / Chi Peng, NetSpeed Systems

Intelligent Interconnect for Autonomous Vehicle SoCs. Sam Wong / Chi Peng, NetSpeed Systems Intelligent Interconnect for Autonomous Vehicle SoCs Sam Wong / Chi Peng, NetSpeed Systems Challenges Facing Autonomous Vehicles Exploding Performance Requirements Real-Time Processing of Sensors Ultra-High

More information

FlexTiles. Runtime mapping of hardware accelerators on 3D self-adaptive heterogeneous manycore

FlexTiles. Runtime mapping of hardware accelerators on 3D self-adaptive heterogeneous manycore FlexTiles www.flextiles.eu Runtime mapping of hardware accelerators on 3D self-adaptive heterogeneous manycore 21/5/2013 Christophe HURIAUX, Olivier SENTIEYS, Antoine COURTAY, Emmanuel CASSEAU, Quang Hoa

More information

Local features: detection and description. Local invariant features

Local features: detection and description. Local invariant features Local features: detection and description Local invariant features Detection of interest points Harris corner detection Scale invariant blob detection: LoG Description of local patches SIFT : Histograms

More information

FPGA: What? Why? Marco D. Santambrogio

FPGA: What? Why? Marco D. Santambrogio FPGA: What? Why? Marco D. Santambrogio marco.santambrogio@polimi.it 2 Reconfigurable Hardware Reconfigurable computing is intended to fill the gap between hardware and software, achieving potentially much

More information

CogniSight, image recognition engine

CogniSight, image recognition engine CogniSight, image recognition engine Making sense of video and images Generating insights, meta data and decision Applications 2 Inspect, Sort Identify, Track Detect, Count Search, Tag Match, Compare Find,

More information

A Bus-based SoC Architecture for Flexible Module Placement on Reconfigurable FPGAs

A Bus-based SoC Architecture for Flexible Module Placement on Reconfigurable FPGAs The work was published in Proceedings of International Conference on Field-Programmable Logic and Applications (FPL 10), pp. 234-239 A Bus-based SoC Architecture for Flexible Module Placement on Reconfigurable

More information

ECE/CS 757: Advanced Computer Architecture II Interconnects

ECE/CS 757: Advanced Computer Architecture II Interconnects ECE/CS 757: Advanced Computer Architecture II Interconnects Instructor:Mikko H Lipasti Spring 2017 University of Wisconsin-Madison Lecture notes created by Natalie Enright Jerger Lecture Outline Introduction

More information

Motion Estimation and Optical Flow Tracking

Motion Estimation and Optical Flow Tracking Image Matching Image Retrieval Object Recognition Motion Estimation and Optical Flow Tracking Example: Mosiacing (Panorama) M. Brown and D. G. Lowe. Recognising Panoramas. ICCV 2003 Example 3D Reconstruction

More information

Mapping applications into MPSoC

Mapping applications into MPSoC Mapping applications into MPSoC concurrency & communication Jos van Eijndhoven jos@vectorfabrics.com March 12, 2011 MPSoC mapping: exploiting concurrency 2 March 12, 2012 Computation on general purpose

More information

Interfacing a High Speed Crypto Accelerator to an Embedded CPU

Interfacing a High Speed Crypto Accelerator to an Embedded CPU Interfacing a High Speed Crypto Accelerator to an Embedded CPU Alireza Hodjat ahodjat @ee.ucla.edu Electrical Engineering Department University of California, Los Angeles Ingrid Verbauwhede ingrid @ee.ucla.edu

More information

Runtime Application Mapping Using Software Agents

Runtime Application Mapping Using Software Agents 1 Runtime Application Mapping Using Software Agents Mohammad Abdullah Al Faruque, Thomas Ebi, Jörg Henkel Chair for Embedded Systems (CES) Karlsruhe Institute of Technology Overview 2 Motivation Related

More information

CS 378: Autonomous Intelligent Robotics. Instructor: Jivko Sinapov

CS 378: Autonomous Intelligent Robotics. Instructor: Jivko Sinapov CS 378: Autonomous Intelligent Robotics Instructor: Jivko Sinapov http://www.cs.utexas.edu/~jsinapov/teaching/cs378/ Visual Registration and Recognition Announcements Homework 6 is out, due 4/5 4/7 Installing

More information

Adapted from: TRENDS AND ATTRIBUTES OF HORIZONTAL AND VERTICAL COMPUTING ARCHITECTURES

Adapted from: TRENDS AND ATTRIBUTES OF HORIZONTAL AND VERTICAL COMPUTING ARCHITECTURES Adapted from: TRENDS AND ATTRIBUTES OF HORIZONTAL AND VERTICAL COMPUTING ARCHITECTURES Tom Atwood Business Development Manager Sun Microsystems, Inc. Takeaways Understand the technical differences between

More information

Using Industry Standards to Exploit the Advantages and Resolve the Challenges of Multicore Technology

Using Industry Standards to Exploit the Advantages and Resolve the Challenges of Multicore Technology Using Industry Standards to Exploit the Advantages and Resolve the Challenges of Multicore Technology September 19, 2007 Markus Levy, EEMBC and Multicore Association Enabling the Multicore Ecosystem Multicore

More information

ELCT 912: Advanced Embedded Systems

ELCT 912: Advanced Embedded Systems ELCT 912: Advanced Embedded Systems Lecture 2-3: Embedded System Hardware Dr. Mohamed Abd El Ghany, Department of Electronics and Electrical Engineering Embedded System Hardware Used for processing of

More information

ECE 8823: GPU Architectures. Objectives

ECE 8823: GPU Architectures. Objectives ECE 8823: GPU Architectures Introduction 1 Objectives Distinguishing features of GPUs vs. CPUs Major drivers in the evolution of general purpose GPUs (GPGPUs) 2 1 Chapter 1 Chapter 2: 2.2, 2.3 Reading

More information

Giancarlo Vasta, Magneti Marelli, Lucia Lo Bello, University of Catania,

Giancarlo Vasta, Magneti Marelli, Lucia Lo Bello, University of Catania, An innovative traffic management scheme for deterministic/eventbased communications in automotive applications with a focus on Automated Driving Applications Giancarlo Vasta, Magneti Marelli, giancarlo.vasta@magnetimarelli.com

More information

Fast dynamic and partial reconfiguration Data Path

Fast dynamic and partial reconfiguration Data Path Fast dynamic and partial reconfiguration Data Path with low Michael Hübner 1, Diana Göhringer 2, Juanjo Noguera 3, Jürgen Becker 1 1 Karlsruhe Institute t of Technology (KIT), Germany 2 Fraunhofer IOSB,

More information

Runtime Reconfigurable Memory Hierarchy in Embedded Scalable Platforms

Runtime Reconfigurable Memory Hierarchy in Embedded Scalable Platforms Runtime Reconfigurable Memory Hierarchy in Embedded Scalable Platforms Davide Giri Columbia University New York, USA davide_giri@cs.columbia.edu ABSTRACT In heterogeneous systems-on-chip, the optimal choice

More information

Implementing Flexible Interconnect Topologies for Machine Learning Acceleration

Implementing Flexible Interconnect Topologies for Machine Learning Acceleration Implementing Flexible Interconnect for Machine Learning Acceleration A R M T E C H S Y M P O S I A O C T 2 0 1 8 WILLIAM TSENG Mem Controller 20 mm Mem Controller Machine Learning / AI SoC New Challenges

More information

Acceleration of Optical Flow Computations on Tightly-Coupled Processor Arrays

Acceleration of Optical Flow Computations on Tightly-Coupled Processor Arrays Acceleration of Optical Flow Computations on Tightly-Coupled Processor Arrays Éricles Rodrigues Sousa 1, Alexandru Tanase 1,VahidLari 1, Frank Hannig 1,Jürgen Teich 1, Johny Paul 2, Walter Stechele 2,

More information

02 - Distributed Systems

02 - Distributed Systems 02 - Distributed Systems Definition Coulouris 1 (Dis)advantages Coulouris 2 Challenges Saltzer_84.pdf Models Physical Architectural Fundamental 2/58 Definition Distributed Systems Distributed System is

More information

Copyright 2016 Xilinx

Copyright 2016 Xilinx Zynq Architecture Zynq Vivado 2015.4 Version This material exempt per Department of Commerce license exception TSU Objectives After completing this module, you will be able to: Identify the basic building

More information

100M Gate Designs in FPGAs

100M Gate Designs in FPGAs 100M Gate Designs in FPGAs Fact or Fiction? NMI FPGA Network 11 th October 2016 Jonathan Meadowcroft, Cadence Design Systems Why in the world, would I do that? ASIC replacement? Probably not! Cost prohibitive

More information

Optimizing ARM SoC s with Carbon Performance Analysis Kits. ARM Technical Symposia, Fall 2014 Andy Ladd

Optimizing ARM SoC s with Carbon Performance Analysis Kits. ARM Technical Symposia, Fall 2014 Andy Ladd Optimizing ARM SoC s with Carbon Performance Analysis Kits ARM Technical Symposia, Fall 2014 Andy Ladd Evolving System Requirements Processor Advances big.little Multicore Unicore DSP Cortex -R7 Block

More information

Enabling Flexible Network FPGA Clusters in a Heterogeneous Cloud Data Center

Enabling Flexible Network FPGA Clusters in a Heterogeneous Cloud Data Center Enabling Flexible Network FPGA Clusters in a Heterogeneous Cloud Data Center Naif Tarafdar, Thomas Lin, Eric Fukuda, Hadi Bannazadeh, Alberto Leon-Garcia, Paul Chow University of Toronto 1 Cloudy with

More information

SDA: Software-Defined Accelerator for Large- Scale DNN Systems

SDA: Software-Defined Accelerator for Large- Scale DNN Systems SDA: Software-Defined Accelerator for Large- Scale DNN Systems Jian Ouyang, 1 Shiding Lin, 1 Wei Qi, Yong Wang, Bo Yu, Song Jiang, 2 1 Baidu, Inc. 2 Wayne State University Introduction of Baidu A dominant

More information

Automatic Pruning of Autotuning Parameter Space for OpenCL Applications

Automatic Pruning of Autotuning Parameter Space for OpenCL Applications Automatic Pruning of Autotuning Parameter Space for OpenCL Applications Ahmet Erdem, Gianluca Palermo 6, and Cristina Silvano 6 Department of Electronics, Information and Bioengineering Politecnico di

More information

C-Based Hardware Design Platform for Dynamically Reconfigurable Processor

C-Based Hardware Design Platform for Dynamically Reconfigurable Processor C-Based Hardware Design Platform for Dynamically Reconfigurable Processor September 22 nd, 2005 IPFlex Inc. Agenda Merits of C-Based hardware design Hardware enabling C-Based hardware design DAPDNA-FW

More information