Co-Design of Many-Accelerator Heterogeneous Systems Exploiting Virtual Platforms. SAMOS XIV July 14-17,



2 Outline
- Introduction + Motivation
- Design requirements for many-accelerator SoCs
- Design problems
- Objective
- Proposed co-design methodology
  - Virtual Prototyping framework
  - VP simulation speedup
  - Co-design formalization
- Case study: H.264 decoding server
- Conclusions

3 Introduction
Need for performance and energy improvements: many-accelerator SoCs
- Massive parallelization
- Pipelining
- Energy-efficient hardware configuration

4-6 Introduction
HW/SW co-design for many-accelerator systems
Different architectural configurations are required in each HW core.
Example: surveillance server
[Figure: VIDEO 1, VIDEO 2, ..., VIDEO N are handled by Group 1, Group 2, ..., Group N, each with its own requirements (Requirements 1, Requirements 2, ..., Requirements N).]

7 Introduction
Typical co-design: common configuration
[Figure: design area against the maximum allowed area; labels: area violation, suboptimal design.]

8 Introduction
Using different configurations:
[Figure: design area against the maximum allowed area; labels: constraints met, design optimization.]

9 Design problems
Problem 1: Exponentially increased design space size:
$\prod_{i=1}^{N} \prod_{j=1}^{A} \prod_{p=1}^{P_{a_j}} V_p$
- N: number of accelerator groups
- A: number of accelerators per group
- $P_a$: architectural parameters of accelerator a
- $V_p$: value of architectural parameter p
An increased number of evaluations is needed.
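To make the growth in Problem 1 concrete, the following is a minimal sketch that counts configurations as a product over groups, accelerators and parameters. It assumes that each parameter contributes its number of candidate values to the product; the parameter names (`unroll_factor`, `clock_mhz`) and their value lists are hypothetical and do not come from the slides.

```python
from math import prod

def design_space_size(groups):
    """Count configurations: the product, over every group, accelerator and
    parameter, of the number of candidate values of that parameter."""
    return prod(
        len(values)
        for accelerators in groups        # N accelerator groups
        for params in accelerators        # A accelerators per group
        for values in params.values()     # P_a parameters per accelerator
    )

# Hypothetical accelerator with two parameters (illustrative names only).
accelerator = {"unroll_factor": [1, 2, 4, 8], "clock_mhz": [100, 200]}
one_group = [accelerator] * 4             # A = 4 accelerators per group

common = design_space_size([one_group])          # one shared configuration
per_group = design_space_size([one_group] * 8)   # N = 8 independently configured groups

print(common)      # 4096, i.e. 8**4
print(per_group)   # 2**96, i.e. 4096**8
```

Under these assumptions, configuring the groups independently raises the single-group count to the power N, which is why the number of required evaluations grows so quickly.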

10 Design problems
Problem 2: Slow system evaluation:
- Accurate evaluation implies slow simulation.
- Increased number of components.
- Non-productive simulation phases: phases out of the evaluation scope.
Large number of slow simulations: highly increased design time.

11 Goal of the proposed framework
VP framework for co-design of many-accelerator systems.
- Supports the use of different hardware core configurations, enabling optimal designs.
- Simulation time reduction, by avoiding non-productive simulation phases.
- The increased design time can be alleviated.

12-23 Proposed VP framework
SystemC/TLM-based Virtual Platform
[Figure: N platform instances (Instance 1, Instance 2, ..., Instance N) attached to the System (Global) Bus.]
- Software Part, per instance: RAM, ROM (Dataset), LOCAL BUS.
- Per instance: Sync module, Bridge, and Profiler producing Partial Metrics.
- Hardware Part, per instance: Accelerators, exposing the Area Metric through SystemC ports.
- Simulation Control Server: reads the Configuration and writes the Global Metrics (Area, Power, Throughput).
Simulation Control Server steps:
1. Create config. file
2. Simulation Start
3. Wait for metrics, for each instance
4. Write global metrics:
- Power: Sum of all HW
- Area: Sum of all HW
- Throughput: Min. from all HW
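The aggregation rule for the global metrics (sum of power, sum of area, minimum throughput) can be sketched as follows. This is only an illustration: the `PartialMetrics` type, its field names and units, and the numbers are assumptions, not part of the presented framework.

```python
from dataclasses import dataclass

@dataclass
class PartialMetrics:
    """Metrics published by one instance (hypothetical field names and units)."""
    area_mm2: float
    power_mw: float
    throughput_fps: float

def global_metrics(partials):
    """Combine per-instance metrics: area and power add up, while the
    slowest instance determines the system throughput."""
    return PartialMetrics(
        area_mm2=sum(p.area_mm2 for p in partials),
        power_mw=sum(p.power_mw for p in partials),
        throughput_fps=min(p.throughput_fps for p in partials),
    )

# Made-up values for three instances.
partials = [
    PartialMetrics(0.5, 10.0, 15.0),
    PartialMetrics(0.75, 12.0, 14.0),
    PartialMetrics(0.25, 8.0, 20.0),
]
total = global_metrics(partials)
print(total)  # area 1.5, power 30.0, throughput 14.0
```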

24-35 Proposed VP framework
Communication among components:
Software and accelerator:
1. Software: send input data. Accelerator: get input.
2. Accelerator: process, then set delay. While the accelerator is busy, the software reads Ready = '0'.
3. Accelerator: produce output; Ready = 1.
4. Software: get output data. Accelerator: send output.
5. More accelerators to manipulate? YES: repeat from step 1. NO: send # cycles to the Profiler.
Profiler: get cycles, combine with HW metrics, publish metrics, then synchronize.
Synchronization Module: synchronization point reached for current instance.
- No full sync: wait while the remaining instances are synchronized.
- Full sync: continue execution.
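The Ready-flag handshake between the software and each accelerator can be illustrated with a small cycle-stepped model. This is a sketch under stated assumptions, not the SystemC/TLM implementation: the `Accelerator` class, its method names and the one-tick-per-poll timing are hypothetical.

```python
class Accelerator:
    """Toy accelerator: output becomes available `delay` cycles after input.
    Assumes delay >= 1."""

    def __init__(self, kernel, delay):
        self.kernel = kernel
        self.delay = delay
        self.ready = 0
        self._remaining = 0
        self._input = None
        self._output = None

    def send_input(self, data):
        """Software side: send input data; the accelerator starts processing."""
        self._input = data
        self._remaining = self.delay
        self.ready = 0

    def tick(self):
        """Advance one cycle; raise Ready once the modelled delay has elapsed."""
        if self.ready == 0 and self._remaining > 0:
            self._remaining -= 1
            if self._remaining == 0:
                self._output = self.kernel(self._input)
                self.ready = 1

    def get_output(self):
        """Software side: read the output once Ready is 1."""
        assert self.ready == 1
        return self._output


def run_software(accelerators, inputs):
    """Drive each accelerator in turn, polling Ready, and count cycles."""
    cycles = 0
    outputs = []
    for acc, data in zip(accelerators, inputs):
        acc.send_input(data)
        while acc.ready == 0:      # Ready = 0: keep waiting
            acc.tick()
            cycles += 1
        outputs.append(acc.get_output())
    return outputs, cycles         # cycles would be sent to the profiler


outputs, cycles = run_software(
    [Accelerator(lambda x: x * 2, 3), Accelerator(lambda x: x + 1, 5)],
    [10, 20],
)
print(outputs, cycles)  # [20, 21] 8
```

The returned cycle count corresponds to the "send # cycles" step, after which the profiler would combine it with the hardware metrics.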

36 VP Simulation speedup
Process-based Reconfigurable SystemC Module (PRM):
Separates the virtual platform into two O/S processes:
- Static VP process: processors, memories, auxiliary peripherals. Constant during exploration of the design space.
- Hardware process: all hardware accelerators for a specific group. Changes during design space exploration.

37 VP Simulation speedup
Process-based Reconfigurable SystemC Module (PRM), cont'd:
Instead of restarting the whole simulation, the designer:
1. Pauses the simulation.
2. Restarts only the hardware process.
3. The simulation continues from the point at which it was paused.
Non-productive simulation phases are not repeated, which speeds up exploration.
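A toy cost model shows why skipping the repeated non-productive phases helps. It is a deliberately simplified sketch: the phase names follow the slides, but the durations are hypothetical time units, and the cost of restarting the hardware process itself is ignored.

```python
def exploration_time(n_configs, startup, sw_init, warmup, evaluation, use_prm):
    """Total simulation time for evaluating n_configs hardware configurations.

    Without PRM, every configuration repeats the non-productive phases.
    With PRM, they are paid once and only the evaluation phase is repeated.
    """
    non_productive = startup + sw_init + warmup
    if use_prm:
        return non_productive + n_configs * evaluation
    return n_configs * (non_productive + evaluation)

# Hypothetical durations in arbitrary time units.
full_restart = exploration_time(10, 2, 3, 5, 10, use_prm=False)
with_prm = exploration_time(10, 2, 3, 5, 10, use_prm=True)
print(full_restart, with_prm)  # 200 110
```

The saving grows with the share of time spent in startup, software initialization and warm-up relative to the evaluation phase.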

38-44 VP Simulation speedup
PRM structure:
[Figure: three O/S processes connected through Shared Memory.]
Virtual Platform (O/S Process 1):
- PRM Wrapper: Input 1 to Input 4 and Output 1 to Output 3, connected to the Remaining Platform.
- Data & Timing Forwarder and Shared Memory Forwarder.
- Connection to interface: Request pause, Check pause, Continue simulation.
- Pause Request, raised on the event "New HW configuration".
- Synchronization: check Pause Request && Sync == 1 using the synchronization signal. If YES: pause simulation, later continue simulation.
- Initializer, driven by the reset signal: Reset = 1: run without propagating delay. Reset = 0: normal execution.
Hardware Process (O/S Process 2):
- SystemC Accelerator 1, SystemC Accelerator 2, ..., SystemC Accelerator K, each with In and Out.
- Each SystemC accelerator: computationally intensive kernel, accelerator characterization (Delay/Area), input ports, output ports, metrics ports.
User Interface (O/S Process 3)
DESIGN TIME: a High-Level Synthesis tool extracts delay and area metrics from C or SystemC behavioural code.
RUNTIME: the processes above.
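Two small decisions in the PRM wrapper, the pause check and the initializer's handling of delay, can be written out as plain functions. This is one reading of the diagram rather than the actual wrapper code: the function names are hypothetical, and "run without propagating delay" is interpreted here as forwarding a delay of zero while Reset = 1.

```python
def should_pause(pause_request, sync):
    """Pause only when a pause was requested AND the synchronization
    signal is 1 (check: Pause Request && Sync == 1)."""
    return bool(pause_request) and sync == 1

def forwarded_delay(reset, accelerator_delay):
    """Initializer behaviour (assumed): while Reset = 1 the accelerator runs
    but its delay is not propagated; with Reset = 0 execution is normal."""
    return 0 if reset == 1 else accelerator_delay

print(should_pause(True, 0))    # False: requested, but not yet at a sync point
print(should_pause(True, 1))    # True
print(forwarded_delay(1, 120))  # 0
print(forwarded_delay(0, 120))  # 120
```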

45 Proposed VP framework + PRM
[Figure: the Virtual Platform runs as Unix process 1 and contains Instance 1, Instance 2, ..., Instance N on the System (Global) Bus.]
- Per instance: RAM, ROM (Dataset), LOCAL BUS, Sync, Bridge, Profiler producing Partial Metrics, and a PRM Wrapper with PRM Control exposing the Area Metric through SystemC ports.
- PRM HW Process Instance 1 (Unix process 2): HW 1, HW 2, HW 3, HW 4.
- PRM HW Process Instance 2 (Unix process 3): HW 1, HW 2, HW 3, HW 4.
- PRM HW Process Instance N (Unix process N+1): HW 1, HW 2, HW 3, HW 4.
- Simulation Control Server: reads the Configuration and writes the Global Metrics (Area, Power, Throughput).
Simulation Control Server steps:
1. Create config. file
2. (Re)start all HW Processes
3. Simulation Start / Continue
4. Wait for metrics, for each instance
5. Pause all PRMs
6. Write global metrics:
- Power: Sum of all HW
- Area: Sum of all HW
- Throughput: Min. from all HW

46 Co-design formalization
Global area constraint ($A_{max}$):
Typical: $N \cdot A \le A_{max}$
- N: number of instances
- A: accelerators area for each instance
Proposed: $\sum_{i=1}^{N} A_i \le A_{max}$
- N: number of instances
- $A_i$: accelerators area for instance i

47 Co-design formalization
Throughput constraint ($T_{min}$) per instance:
Typical: $T \ge T_{min}$
- T: throughput of each instance
- All instances have the same throughput, so T is the system throughput.
Proposed: $\forall i \in \{1, \dots, N\}: T_i \ge T_{min}$, or equivalently $\min_{i \in \{1, \dots, N\}} T_i \ge T_{min}$
- N: number of instances
- $T_i$: throughput for instance i
Exploiting the slacks induced by $\min_{i \in \{1, \dots, N\}} T_i$.

48 Co-design formalization
Optimization objective:
Let $T_i$ be the throughput of instance $i \in \{1, \dots, N\}$.
The instance with the minimum throughput defines the system throughput: $T_{system} = \min_{i \in \{1, \dots, N\}} T_i$
Co-design objective: maximization of $T_{system}$: $\max T_{system} \Leftrightarrow \max \min_{i \in \{1, \dots, N\}} T_i$
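The proposed constraints and the objective translate directly into two small functions. The sketch below is illustrative only: the function names are hypothetical, and the per-instance areas, throughputs and the limits passed in are made-up numbers.

```python
def system_throughput(throughputs):
    """T_system: the slowest instance defines the system throughput."""
    return min(throughputs)

def is_feasible(areas, throughputs, a_max, t_min):
    """Proposed constraints: total accelerator area within a_max, and
    every instance (equivalently the minimum) at or above t_min."""
    return sum(areas) <= a_max and system_throughput(throughputs) >= t_min

# Made-up per-instance values for N = 4 instances.
areas = [0.5, 1.0, 0.75, 0.25]
throughputs = [14.0, 18.0, 15.5, 12.0]

print(system_throughput(throughputs))            # 12.0
print(is_feasible(areas, throughputs, 4.0, 10))  # True
print(is_feasible(areas, throughputs, 2.0, 10))  # False: total area is 2.5
print(is_feasible(areas, throughputs, 4.0, 13))  # False: slowest instance is 12.0
```

Among feasible candidates, the co-design objective would then pick the one with the largest `system_throughput`.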

49 Co-design formalization
Configuration representation:
For a single instance $i \in \{1, \dots, N\}$: $V_i = (p_1^i, p_2^i, \dots, p_K^i)$
- K: number of architectural parameters (common to all instances)
- $p_j^i$: value of parameter $j \in \{1, 2, \dots, K\}$ for instance i
At least two instances are differently configured: $\exists i_1, i_2 \in \{1, 2, \dots, N\}: i_1 \ne i_2 \wedge V_{i_1} \ne V_{i_2}$
For the overall system: $\big((p_1^1, p_2^1, \dots, p_K^1), (p_1^2, p_2^2, \dots, p_K^2), \dots, (p_1^N, p_2^N, \dots, p_K^N)\big)$
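One possible encoding of this representation is a tuple of per-instance parameter tuples. The sketch is an assumption about data layout, not the framework's own format, and the parameter values are made up.

```python
def is_heterogeneous(system):
    """True if at least two instances are configured differently."""
    return len(set(system)) > 1

def flatten(system):
    """Concatenate all per-instance vectors into one N*K-long vector."""
    return tuple(p for instance in system for p in instance)

# N = 3 instances, K = 3 parameters each (made-up values).
system = ((1, 2, 4), (1, 2, 4), (2, 2, 8))

print(is_heterogeneous(system))                # True
print(is_heterogeneous(((1, 2), (1, 2))))      # False: common configuration
print(flatten(system))                         # (1, 2, 4, 1, 2, 4, 2, 2, 8)
```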

50 Case study: H.264 decoding server
Video decoding for surveillance server
- 8 instances
- 4 accelerators per instance:
  - Inverse cosine transformation
  - Motion compensation: 1 Luma, 2 Chroma
[Figure: surveillance server with several instances, each consisting of I/O, an H264 Decoder with accelerators HW 1 to HW 4, and Motion Detection.]

51 Case study: H.264 decoding server
Constraints:
- $A_{max} = 5.5\ \text{mm}^2$
- $T_{min} = 13.6$ frames per second
Exploration:
- 200 random evaluations
- Targeting an exploration time of up to 20 hours
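A constrained random exploration of this kind can be sketched as below. The constraint values and the evaluation budget are the ones stated in the case study, but everything else is an assumption for illustration: `toy_evaluate` is a made-up analytic model standing in for a virtual-platform simulation, and the single `unroll` parameter is hypothetical.

```python
import random

def random_exploration(evaluate, param_values, n_instances,
                       a_max, t_min, n_evals, seed=0):
    """Sample n_evals random system configurations and keep the feasible
    one with the highest system throughput. Returns
    (system, throughput, area), or None if no sample was feasible."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_evals):
        # One parameter vector per instance, drawn independently.
        system = tuple(
            tuple(rng.choice(values) for values in param_values)
            for _ in range(n_instances)
        )
        area, throughput = evaluate(system)
        if area <= a_max and throughput >= t_min:
            if best is None or throughput > best[1]:
                best = (system, throughput, area)
    return best

def toy_evaluate(system):
    """Stand-in for simulation (made-up model): area and throughput both
    scale with a single hypothetical 'unroll' parameter per instance."""
    areas = [0.2 * unroll for (unroll,) in system]
    throughputs = [8.0 * unroll for (unroll,) in system]
    return sum(areas), min(throughputs)

best = random_exploration(toy_evaluate, [(1, 2, 4)], n_instances=8,
                          a_max=5.5, t_min=13.6, n_evals=200)
print(best)
```

Whether a feasible design is found depends on the random samples drawn; a real exploration would replace `toy_evaluate` with metrics returned by the simulation.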

52 Case study: H.264 decoding server
Proposed vs. Software-only
Using a solution with the minimum possible throughput.
[Plot: throughput (fps) of the proposed and software-only solutions, against $T_{min}$.]

53 Case study: H.264 decoding server
Proposed vs. Overdesign
Overdesign: using the same configuration for all instances.
[Plot: area (mm^2) of Proposed, Overdesign (max. throughput) and Overdesign (5 accelerators), against $A_{max}$ = 5.5.]
[Plot: throughput (fps) of Proposed and Overdesign (5 accelerators), against $T_{min}$.]

54 Case study: H.264 decoding server
Simulation speedup (using PRM), by bypassing:
- VP startup (memory allocation etc.)
- Target software initialization
- Warm-up phase: the dataset is processed without obtaining metrics, in order to minimize cache misses.

55 Conclusions
VP co-design framework for many-accelerator systems
- Groups the HW accelerators
- Each group uses a different configuration, leading to optimal designs
- H.264 use case: 1.58x less area, similar throughput
Use of the Process-based Reconfigurable SystemC Module
- Simulation speedup by bypassing non-productive simulation phases
- Investing the time improvements in DSE quality
- H.264 use case: 40% less simulation time

56 Thank you. Questions?

Co-synthesis and Accelerator based Embedded System Design

Co-synthesis and Accelerator based Embedded System Design Co-synthesis and Accelerator based Embedded System Design COE838: Embedded Computer System http://www.ee.ryerson.ca/~courses/coe838/ Dr. Gul N. Khan http://www.ee.ryerson.ca/~gnkhan Electrical and Computer

More information

Hardware/Software Co-design

Hardware/Software Co-design Hardware/Software Co-design Zebo Peng, Department of Computer and Information Science (IDA) Linköping University Course page: http://www.ida.liu.se/~petel/codesign/ 1 of 52 Lecture 1/2: Outline : an Introduction

More information

Efficient Hardware Acceleration on SoC- FPGA using OpenCL

Efficient Hardware Acceleration on SoC- FPGA using OpenCL Efficient Hardware Acceleration on SoC- FPGA using OpenCL Advisor : Dr. Benjamin Carrion Schafer Susmitha Gogineni 30 th August 17 Presentation Overview 1.Objective & Motivation 2.Configurable SoC -FPGA

More information

Hardware Design and Simulation for Verification

Hardware Design and Simulation for Verification Hardware Design and Simulation for Verification by N. Bombieri, F. Fummi, and G. Pravadelli Universit`a di Verona, Italy (in M. Bernardo and A. Cimatti Eds., Formal Methods for Hardware Verification, Lecture

More information

Multimedia Decoder Using the Nios II Processor

Multimedia Decoder Using the Nios II Processor Multimedia Decoder Using the Nios II Processor Third Prize Multimedia Decoder Using the Nios II Processor Institution: Participants: Instructor: Indian Institute of Science Mythri Alle, Naresh K. V., Svatantra

More information

Mali GPU acceleration of HEVC and VP9 Decoder

Mali GPU acceleration of HEVC and VP9 Decoder Mali GPU acceleration of HEVC and VP9 Decoder 2 Web Video continues to grow!!! Video accounted for 50% of the mobile traffic in 2012 - Citrix ByteMobile's 4Q 2012 Analytics Report. Globally, IP video traffic

More information

Cadence SystemC Design and Verification. NMI FPGA Network Meeting Jan 21, 2015

Cadence SystemC Design and Verification. NMI FPGA Network Meeting Jan 21, 2015 Cadence SystemC Design and Verification NMI FPGA Network Meeting Jan 21, 2015 The High Level Synthesis Opportunity Raising Abstraction Improves Design & Verification Optimizes Power, Area and Timing for

More information

Lecture 7: Introduction to Co-synthesis Algorithms

Lecture 7: Introduction to Co-synthesis Algorithms Design & Co-design of Embedded Systems Lecture 7: Introduction to Co-synthesis Algorithms Sharif University of Technology Computer Engineering Dept. Winter-Spring 2008 Mehdi Modarressi Topics for today

More information

Hardware/Software Partitioning and Scheduling of Embedded Systems

Hardware/Software Partitioning and Scheduling of Embedded Systems Hardware/Software Partitioning and Scheduling of Embedded Systems Andrew Morton PhD Thesis Defence Electrical and Computer Engineering University of Waterloo January 13, 2005 Outline 1. Thesis Statement

More information

Native Simulation of Complex VLIW Instruction Sets Using Static Binary Translation and Hardware-Assisted Virtualization

Native Simulation of Complex VLIW Instruction Sets Using Static Binary Translation and Hardware-Assisted Virtualization Native Simulation of Complex VLIW Instruction Sets Using Static Binary Translation and Hardware-Assisted Virtualization Mian-Muhammad Hamayun, Frédéric Pétrot and Nicolas Fournel System Level Synthesis

More information

Design methodology for multi processor systems design on regular platforms

Design methodology for multi processor systems design on regular platforms Design methodology for multi processor systems design on regular platforms Ph.D in Electronics, Computer Science and Telecommunications Ph.D Student: Davide Rossi Ph.D Tutor: Prof. Roberto Guerrieri Outline

More information

Accelerated Machine Learning Algorithms in Python

Accelerated Machine Learning Algorithms in Python Accelerated Machine Learning Algorithms in Python Patrick Reilly, Leiming Yu, David Kaeli reilly.pa@husky.neu.edu Northeastern University Computer Architecture Research Lab Outline Motivation and Goals

More information

EE382V: System-on-a-Chip (SoC) Design

EE382V: System-on-a-Chip (SoC) Design EE382V: System-on-a-Chip (SoC) Design Lecture 10 Task Partitioning Sources: Prof. Margarida Jacome, UT Austin Prof. Lothar Thiele, ETH Zürich Andreas Gerstlauer Electrical and Computer Engineering University

More information

ECE 111 ECE 111. Advanced Digital Design. Advanced Digital Design Winter, Sujit Dey. Sujit Dey. ECE Department UC San Diego

ECE 111 ECE 111. Advanced Digital Design. Advanced Digital Design Winter, Sujit Dey. Sujit Dey. ECE Department UC San Diego Advanced Digital Winter, 2009 ECE Department UC San Diego dey@ece.ucsd.edu http://esdat.ucsd.edu Winter 2009 Advanced Digital Objective: of a hardware-software embedded system using advanced design methodologies

More information

Parallel Simulation Accelerates Embedded Software Development, Debug and Test

Parallel Simulation Accelerates Embedded Software Development, Debug and Test Parallel Simulation Accelerates Embedded Software Development, Debug and Test Larry Lapides Imperas Software Ltd. larryl@imperas.com Page 1 Modern SoCs Have Many Concurrent Processing Elements SMP cores

More information

Performance Verification for ESL Design Methodology from AADL Models

Performance Verification for ESL Design Methodology from AADL Models Performance Verification for ESL Design Methodology from AADL Models Hugues Jérome Institut Supérieur de l'aéronautique et de l'espace (ISAE-SUPAERO) Université de Toulouse 31055 TOULOUSE Cedex 4 Jerome.huges@isae.fr

More information

Optimization of Behavioral IPs in Multi-Processor System-on- Chips

Optimization of Behavioral IPs in Multi-Processor System-on- Chips Optimization of Behavioral IPs in Multi-Processor System-on- Chips Yidi Liu and Benjamin Carrion Schafer # Department of Electronic and Information Engineering b.carrionschafer@polyu.edu.hk # Outline High-Level

More information

MOJTABA MAHDAVI Mojtaba Mahdavi DSP Design Course, EIT Department, Lund University, Sweden

MOJTABA MAHDAVI Mojtaba Mahdavi DSP Design Course, EIT Department, Lund University, Sweden High Level Synthesis with Catapult MOJTABA MAHDAVI 1 Outline High Level Synthesis HLS Design Flow in Catapult Data Types Project Creation Design Setup Data Flow Analysis Resource Allocation Scheduling

More information

A novel way to efficiently simulate complex full systems incorporating hardware accelerators

A novel way to efficiently simulate complex full systems incorporating hardware accelerators ARM Research Summit 2017 Workshop A novel way to efficiently simulate complex full systems incorporating hardware accelerators Nikolaos Tampouratzis Technical University of Crete, Greece Motivation / The

More information

Runtime Adaptation of Application Execution under Thermal and Power Constraints in Massively Parallel Processor Arrays

Runtime Adaptation of Application Execution under Thermal and Power Constraints in Massively Parallel Processor Arrays Runtime Adaptation of Application Execution under Thermal and Power Constraints in Massively Parallel Processor Arrays Éricles Sousa 1, Frank Hannig 1, Jürgen Teich 1, Qingqing Chen 2, and Ulf Schlichtmann

More information

The Use Of Virtual Platforms In MP-SoC Design. Eshel Haritan, VP Engineering CoWare Inc. MPSoC 2006

The Use Of Virtual Platforms In MP-SoC Design. Eshel Haritan, VP Engineering CoWare Inc. MPSoC 2006 The Use Of Virtual Platforms In MP-SoC Design Eshel Haritan, VP Engineering CoWare Inc. MPSoC 2006 1 MPSoC Is MP SoC design happening? Why? Consumer Electronics Complexity Cost of ASIC Increased SW Content

More information

ESE Back End 2.0. D. Gajski, S. Abdi. (with contributions from H. Cho, D. Shin, A. Gerstlauer)

ESE Back End 2.0. D. Gajski, S. Abdi. (with contributions from H. Cho, D. Shin, A. Gerstlauer) ESE Back End 2.0 D. Gajski, S. Abdi (with contributions from H. Cho, D. Shin, A. Gerstlauer) Center for Embedded Computer Systems University of California, Irvine http://www.cecs.uci.edu 1 Technology advantages

More information

Early Performance-Cost Estimation of Application-Specific Data Path Pipelining

Early Performance-Cost Estimation of Application-Specific Data Path Pipelining Early Performance-Cost Estimation of Application-Specific Data Path Pipelining Jelena Trajkovic Computer Science Department École Polytechnique de Montréal, Canada Email: jelena.trajkovic@polymtl.ca Daniel

More information

Sequential Circuit Design: Principle

Sequential Circuit Design: Principle Sequential Circuit Design: Principle Chapter 8 1 Outline 1. Overview on sequential circuits 2. Synchronous circuits 3. Danger of synthesizing asynchronous circuit 4. Inference of basic memory elements

More information

Multi processor systems with configurable hardware acceleration

Multi processor systems with configurable hardware acceleration Multi processor systems with configurable hardware acceleration Ph.D in Electronics, Computer Science and Telecommunications Ph.D Student: Davide Rossi Ph.D Tutor: Prof. Roberto Guerrieri Outline Motivations

More information

Energy Efficient Computing Systems (EECS) Magnus Jahre Coordinator, EECS

Energy Efficient Computing Systems (EECS) Magnus Jahre Coordinator, EECS Energy Efficient Computing Systems (EECS) Magnus Jahre Coordinator, EECS Who am I? Education Master of Technology, NTNU, 2007 PhD, NTNU, 2010. Title: «Managing Shared Resources in Chip Multiprocessor Memory

More information

MPSOC Design examples

MPSOC Design examples MPSOC 2007 Eshel Haritan, VP Engineering, Inc. 1 MPSOC Design examples Freescale: ARM1136 + StarCore140e Broadcom: ARM11 + ARM9 + TeakLite + accelerators Qualcomm 4 processors + video, gps, wireless, audio

More information

QEMU and SystemC. Màrius Màrius Montón

QEMU and SystemC. Màrius Màrius Montón QEMU and SystemC March March 2011 2011 QUF'11 QUF'11 Grenoble Grenoble Màrius Màrius Montón Outline Introduction Objectives Virtual Platforms and SystemC Checkpointing for SystemC Conclusions 2 Introduction

More information

Embedded System Design

Embedded System Design Modeling, Synthesis, Verification Daniel D. Gajski, Samar Abdi, Andreas Gerstlauer, Gunar Schirner 9/29/2011 Outline System design trends Model-based synthesis Transaction level model generation Application

More information

A MDD Methodology for Specification of Embedded Systems and Automatic Generation of Fast Configurable and Executable Performance Models

A MDD Methodology for Specification of Embedded Systems and Automatic Generation of Fast Configurable and Executable Performance Models A MDD Methodology for Specification of Embedded Systems and Automatic Generation of Fast Configurable and Executable Performance Models Int. Conf. on HW/SW codesign and HW synthesis (CODES-ISSS 2012) Embedded

More information

FPGA. Agenda 11/05/2016. Scheduling tasks on Reconfigurable FPGA architectures. Definition. Overview. Characteristics of the CLB.

FPGA. Agenda 11/05/2016. Scheduling tasks on Reconfigurable FPGA architectures. Definition. Overview. Characteristics of the CLB. Agenda The topics that will be addressed are: Scheduling tasks on Reconfigurable FPGA architectures Mauro Marinoni ReTiS Lab, TeCIP Institute Scuola superiore Sant Anna - Pisa Overview on basic characteristics

More information

Design Space Exploration of Systems-on-Chip: DIPLODOCUS

Design Space Exploration of Systems-on-Chip: DIPLODOCUS Design Space Exploration of Systems-on-Chip: DIPLODOCUS Ludovic Apvrille Telecom ParisTech ludovic.apvrille@telecom-paristech.fr May, 2011 Outline Context Design Space Exploration Ludovic Apvrille DIPLODOCUS

More information

Efficient use of Virtual Prototypes in HW/SW Development and Verification

Efficient use of Virtual Prototypes in HW/SW Development and Verification Efficient use of Virtual Prototypes in HW/SW Development and Verification Rocco Jonack, MINRES Technologies GmbH Eyck Jentzsch, MINRES Technologies GmbH Accellera Systems Initiative 1 Virtual prototype

More information

COE 561 Digital System Design & Synthesis Introduction

COE 561 Digital System Design & Synthesis Introduction 1 COE 561 Digital System Design & Synthesis Introduction Dr. Aiman H. El-Maleh Computer Engineering Department King Fahd University of Petroleum & Minerals Outline Course Topics Microelectronics Design

More information

EEM870 Embedded System and Experiment Lecture 4: SoC Design Flow and Tools

EEM870 Embedded System and Experiment Lecture 4: SoC Design Flow and Tools EEM870 Embedded System and Experiment Lecture 4: SoC Design Flow and Tools Wen-Yen Lin, Ph.D. Department of Electrical Engineering Chang Gung University Email: wylin@mail.cgu.edu.tw March 2013 Agenda Introduction

More information

Hardware/Software Co-design for Hyperelliptic Curve Cryptography (HECC) on the 8051 µp

Hardware/Software Co-design for Hyperelliptic Curve Cryptography (HECC) on the 8051 µp Hardware/Software Co-design for Hyperelliptic Curve Cryptography (HECC) on the 8051 µp Lejla Batina, David Hwang, Alireza Hodjat, Bart Preneel and Ingrid Verbauwhede Outline Introduction and Motivation

More information

Venezia: a Scalable Multicore Subsystem for Multimedia Applications

Takashi Miyamori Toshiba Corporation Outline Background Venezia Hardware Architecture Venezia Software Architecture Evaluation Chip and

Software Quality is Directly Proportional to Simulation Speed

CDNLive! 11 March 2014 Larry Lapides Page 1 Software Quality is Directly Proportional to Test Speed Intuitively obvious (so my presentation

ReconOS: An RTOS Supporting Hardware and Software Threads

Enno Lübbers and Marco Platzner Computer Engineering Group University of Paderborn marco.platzner@computer.org Overview the ReconOS project programming

Building a Bridge: from Pre-Silicon Verification to Post-Silicon Validation

FMCAD, 2008 Moshe Levinger 26/11/2008 Talk Outline Simulation-Based Functional Verification Pre-Silicon Technologies Random Test

CUDA Optimizations WS Intelligent Robotics Seminar. Universität Hamburg WS Intelligent Robotics Seminar Praveen Kulkarni

CUDA Optimizations WS 2014-15 Intelligent Robotics Seminar. Table of content: 1 Background information 2 Optimizations 3 Summary

Discussion Week 8. TA: Kyle Dewey. Tuesday, November 15, 11

Discussion Week 8 TA: Kyle Dewey Overview Exams Interrupt priority Direct memory access (DMA) Different kinds of I/O calls Caching What I/O looks like Exams Interrupt Priority Process 1 makes an I/O request

Where is the Data? Why You Cannot Debate CPU vs. GPU Performance Without the Answer. University of Virginia Computer Engineering Labs

Where is the Data? Why you Cannot Debate CPU vs. GPU Performance Without the Answer Chris Gregg and Kim Hazelwood University of Virginia Computer Engineering Labs 1 GPUs and Data Transfer GPU computing

SCope: Efficient HdS simulation for MpSoC with NoC

Eugenio Villar Héctor Posadas University of Cantabria Marcos Martínez DS2 Motivation The microprocessor will be the NAND gate of the integrated systems

Fast and Accurate Source-Level Simulation Considering Target-Specific Compiler Optimizations

FZI Forschungszentrum Informatik at the University of Karlsruhe Fast and Accurate Source-Level Simulation Considering Target-Specific Compiler Optimizations Oliver Bringmann 1 RESEARCH ON YOUR BEHALF Outline

SYSTEMS ON CHIP (SOC) FOR EMBEDDED APPLICATIONS

Embedded System System Set of components needed to perform a function Hardware + software +. Embedded Main function not computing Usually not autonomous

Handling Challenges of Multi-Core Technology in Automotive Software Engineering

Model Based Development Tools for Embedded Multi-Core Systems Handling Challenges of Multi-Core Technology in Automotive Software Engineering VECTOR INDIA CONFERENCE 2017 Timing-Architects Embedded Systems

SoC Design for the New Millennium Daniel D. Gajski

Center for Embedded Computer Systems University of California, Irvine www.cecs.uci.edu/~gajski Outline System gap Design flow Model algebra System environment

Computer Hardware Requirements for Real-Time Applications

Lecture (4) Computer Hardware Requirements for Real-Time Applications Prof. Kasim M. Al-Aubidy Computer Engineering Department Philadelphia University Real-Time Systems, Prof. Kasim Al-Aubidy 1 Lecture

SDSoC: Session 1

ADAM@ADIUVOENGINEERING.COM What is SDSoC SDSoC is a system optimising compiler which allows us to optimise Zynq PS / PL Zynq MPSoC PS / PL MicroBlaze What does this mean? Following the

Fast dynamic and partial reconfiguration Data Path

Fast dynamic and partial reconfiguration Data Path with low Michael Hübner 1, Diana Göhringer 2, Juanjo Noguera 3, Jürgen Becker 1 1 Karlsruhe Institute of Technology (KIT), Germany 2 Fraunhofer IOSB,

Sequential Circuit Design: Principle

Chapter 8 1 Outline 1. Overview on sequential circuits 2. Synchronous circuits 3. Danger of synthesizing async circuit 4. Inference of basic memory elements 5. Simple

Parameterized System Design

Tony D. Givargis, Frank Vahid Department of Computer Science and Engineering University of California, Riverside, CA 92521 {givargis,vahid}@cs.ucr.edu Abstract Continued growth

FZI Forschungszentrum Informatik

Microelectronic System Design (SIM) Performance Analysis of Sequence Diagrams for SoC Design Alexander Viehl, Oliver Bringmann, Wolfgang Rosenstiel SIM UML for SoC Design

Virtual PLATFORMS for complex IP within system context

VP Modeling Engineer/Pre-Silicon Platform Acceleration Group (PPA) November, 12th, 2015 Rocco Jonack Legal Notice This presentation is for informational

Outline. SLD challenges Platform Based Design (PBD) Leveraging state of the art CAD Metropolis. Case study: Wireless Sensor Network

By Alberto Puggelli Outline SLD challenges Platform Based Design (PBD) Case study: Wireless Sensor Network Leveraging state of the art CAD Metropolis Case study: JPEG Encoder SLD Challenge Establish a

Transaction Level Modeling with SystemC. Thorsten Grötker Engineering Manager Synopsys, Inc.

Transaction Level Modeling with SystemC Thorsten Grötker Engineering Manager Synopsys, Inc. Outline Abstraction Levels SystemC Communication Mechanism Transaction Level Modeling of the AMBA AHB/APB Protocol

SoC Design Environment with Automated Configurable Bus Generation for Rapid Prototyping

Sang-Heon Lee, Jae-Gon Lee, Seonpil Kim, Woong Hwangbo, Chong-Min Kyung Electrical Engineering Department, KAIST,

Hardware Software Bring-Up Solutions for ARM v7/v8-based Designs. August 2015

Hardware Software Bring-Up Solutions for ARM v7/v8-based Designs August 2015 SPMI USB 2.0 SLIMbus RFFE LPDDR 2 LPDDR 3 emmc 4.5 UFS SD 3.0 SD 4.0 UFS Bare Metal Software DSP Software Bare Metal Software

Optimizing Cache Coherent Subsystem Architecture for Heterogeneous Multicore SoCs

Niu Feng Technical Specialist, ARM Tech Symposia 2016 Agenda Introduction Challenges: Optimizing cache coherent subsystem

Design and Verification of FPGA Applications

Giuseppe Ridinò Paola Vallauri MathWorks giuseppe.ridino@mathworks.it paola.vallauri@mathworks.it Torino, 19 May 2016, INAF 2016 The MathWorks, Inc. 1 Agenda

Using Alluxio to Improve the Performance and Consistency of HDFS Clusters

ARTICLE Using Alluxio to Improve the Performance and Consistency of HDFS Clusters Calvin Jia Software Engineer at Alluxio Learn how Alluxio is used in clusters with co-located compute and storage to improve

High-Performance Data Loading and Augmentation for Deep Neural Network Training

Trevor Gale tgale@ece.neu.edu Steven Eliuk steven.eliuk@gmail.com Cameron Upright c.upright@samsung.com Roadmap 1. The General-Purpose

Architectural-Level Synthesis. Giovanni De Micheli Integrated Systems Centre EPF Lausanne

Architectural-Level Synthesis Giovanni De Micheli Integrated Systems Centre EPF Lausanne This presentation can be used for non-commercial purposes as long as this note and the copyright footers are not

Transparent Throughput Elasticity for IaaS Cloud Storage Using Guest-Side Block-Level Caching

Bogdan Nicolae (IBM Research, Ireland) Pierre Riteau (University of Chicago, USA) Kate Keahey (Argonne National

Operating Systems CMPSCI 377 Spring Mark Corner University of Massachusetts Amherst

Operating Systems CMPSCI 377 Spring 2017 Mark Corner University of Massachusetts Amherst Last Class: Intro to OS An operating system is the interface between the user and the architecture. User-level Applications

A 1-GHz Configurable Processor Core MeP-h1

Takashi Miyamori, Takanori Tamai, and Masato Uchiyama SoC Research & Development Center, TOSHIBA Corporation Outline Background Pipeline Structure Bus Interface

Co-Design and Co-Verification using a Synchronous Language. Satnam Singh Xilinx Research Labs

Co-Design and Co-Verification using a Synchronous Language Satnam Singh Xilinx Research Labs Virtex-II PRO Device Array Size Logic Gates PPCs GBIOs BRAMs 2VP2 16 x 22 38K 0 4 12 2VP4 40 x 22 81K 1 4

Computer Architecture. R. Poss

Computer Architecture R. Poss 1 ca01-10 september 2015 Course & organization 2 ca01-10 september 2015 Aims of this course The aims of this course are: to highlight current trends to introduce the notion

Model homogenization for power estimation and design exploration

Model homogenization for power estimation and design exploration + Rabie Ben Atitallah, Associate Professor Université de Lille Nord de France Université de Valenciennes, LAMIH INRIA Lille, DaRT team rabie.benatitallah@univ-valenciennes.fr http://www.lifl.fr/~benatita/

More information

A Parallel Transaction-Level Model of H.264 Video Decoder

Center for Embedded Computer Systems University of California, Irvine A Parallel Transaction-Level Model of H.264 Video Decoder Xu Han, Weiwei Chen and Rainer Doemer Technical Report CECS-11-03 June 2,

Multi Agent Navigation on GPU. Avi Bleiweiss

Multi Agent Navigation on GPU Avi Bleiweiss Reasoning Explicit Implicit Script, storytelling State machine, serial Compute intensive Fits SIMT architecture well Navigation planning Collision avoidance

Last 2 Classes: Introduction to Operating Systems & C++ tutorial. Today: OS and Computer Architecture

Last 2 Classes: Introduction to Operating Systems & C++ tutorial User apps OS Virtual machine interface hardware physical machine interface An operating system is the interface between the user and the

Lab 1: Using the LegUp High-level Synthesis Framework

1 Introduction and Motivation This lab will give you an overview of how to use the LegUp high-level synthesis framework. In LegUp, you can compile

Automatic Instrumentation of Embedded Software for High Level Hardware/Software Co-Simulation

Aimen Bouchhima, Patrice Gerin and Frédéric Pétrot System-Level Synthesis Group TIMA Laboratory 46, Av Félix

Validation Strategies with pre-silicon platforms

Shantanu Ganguly Synopsys Inc April 10 2014 2014 Synopsys. All rights reserved. 1 Agenda Market Trends Emulation HW Considerations Emulation Scenarios Debug

Maximizing Server Efficiency from μarch to ML accelerators. Michael Ferdman

Maximizing Server Efficiency with ML accelerators Michael

Custom computing systems

difference engine: Charles Babbage 1832 - compute maths tables digital orrery: MIT 1985 - special-purpose engine, found Pluto motion chaotic Splash2: Supercomputing Research Center

A SystemC HDL Cosimulation Framework

Christian Bernard, CEA/LETI Nicolas Tribié, CEA/LETI Marcello Coppolla, ST/AST A systemc HDL cosimulation framework 1 Agenda Motivation Cosimulation usages Framework

Operating system integrated energy aware scratchpad allocation strategies for multiprocess applications

Operating system integrated energy aware scratchpad allocation strategies for multiprocess applications University of Dortmund Operating system integrated energy aware scratchpad allocation strategies for multiprocess applications Robert Pyka * Christoph Faßbach * Manish Verma + Heiko Falk * Peter Marwedel


Effective Memory Access Optimization by Memory Delay Modeling, Memory Allocation, and Slack Time Management

International Journal of Computer Theory and Engineering, Vol., No., December 01 Effective Memory Optimization by Memory Delay Modeling, Memory Allocation, and Slack Time Management Sultan Daud Khan, Member,

ECE 448 Lecture 15. Overview of Embedded SoC Systems

ECE 448 Lecture 15 Overview of Embedded SoC Systems ECE 448 FPGA and ASIC Design with VHDL George Mason University Required Reading P. Chu, FPGA Prototyping by VHDL Examples Chapter 8, Overview of Embedded

A Study of the Speedups and Competitiveness of FPGA Soft Processor Cores using Dynamic Hardware/Software Partitioning

By: Roman Lysecky and Frank Vahid Presented By: Anton Kiriwas Disclaimer This specific

Center Extreme Scale CS Research

Center for Compressible Multiphase Turbulence University of Florida Sanjay Ranka Herman Lam Outline 10^6 10^7 10^8 10^9 cores Parallelization and UQ of Rocfun and CMT-Nek

Efficient Usage of Concurrency Models in an Object-Oriented Co-design Framework

Piyush Garg Center for Embedded Computer Systems, University of California Irvine, CA 92697 pgarg@cecs.uci.edu Sandeep K.

ECE 8823: GPU Architectures. Objectives

ECE 8823: GPU Architectures Introduction 1 Objectives Distinguishing features of GPUs vs. CPUs Major drivers in the evolution of general purpose GPUs (GPGPUs) 2 1 Chapter 1 Chapter 2: 2.2, 2.3 Reading

CS654 Advanced Computer Architecture. Lec 2 - Introduction

CS654 Advanced Computer Architecture Lec 2 - Introduction Peter Kemper Adapted from the slides of EECS 252 by Prof. David Patterson Electrical Engineering and Computer Sciences University of California,

Energy Estimation Based on Hierarchical Bus Models for Power-Aware Smart Cards

U. Neffe, K. Rothbart, Ch. Steger, R. Weiss Graz University of Technology Inffeldgasse 16/1 8010 Graz, AUSTRIA {neffe, rothbart,

Exploring the Throughput-Fairness Trade-off on Asymmetric Multicore Systems

J.C. Sáez, A. Pousa, F. Castro, D. Chaver and M. Prieto Complutense University of Madrid, Universidad Nacional de la Plata-LIDI

Yafit Snir Arindam Guha Cadence Design Systems, Inc. Accelerating System level Verification of SOC Designs with MIPI Interfaces

Yafit Snir Arindam Guha, Inc. Accelerating System level Verification of SOC Designs with MIPI Interfaces Agenda Overview: MIPI Verification approaches and challenges Acceleration methodology overview and

CSI3131 Final Exam Review

Final Exam: When: April 24, 2015 2:00 PM Where: SMD 425 File Systems I/O Hard Drive Virtual Memory Swap Memory Storage and I/O Introduction CSI3131 Topics Process Computing Systems

Gaining Insights into Multicore Cache Partitioning: Bridging the Gap between Simulation and Real Systems

1 Presented by Hadeel Alabandi Introduction and Motivation 2 A serious issue to the effective utilization

COMPLEX EMBEDDED SYSTEMS

Embedded System Design and Architectures Summer Semester 2012 System and Software Engineering Prof. Dr.-Ing. Armin Zimmermann Contents System Design Phases Architecture of Embedded

Hardware Support for Priority Inheritance

Bilge E. S. Akgul +, Vincent J. Mooney +, Henrik Thane* and Pramote Kuacharoen + + Center for Research on Embedded Systems and Technology (CREST) + School of Electrical

Embedded Systems: Hardware Components (part I) Todor Stefanov

Leiden Embedded Research Center Leiden Institute of Advanced Computer Science Leiden University, The Netherlands Outline Generic Embedded System

Tutorial 11. Final Exam Review

Tutorial 11 Final Exam Review Introduction Instruction Set Architecture: contract between programmer and designers (e.g.: IA-32, IA-64, X86-64) Computer organization: describe the functional units, cache

Accelerating Pointer Chasing in 3D-Stacked Memory: Challenges, Mechanisms, Evaluation Kevin Hsieh

Accelerating Pointer Chasing in 3D-Stacked Memory: Challenges, Mechanisms, Evaluation Kevin Hsieh Samira Khan, Nandita Vijaykumar, Kevin K. Chang, Amirali Boroumand, Saugata Ghose, Onur Mutlu Executive Summary

DNNBuilder: an Automated Tool for Building High-Performance DNN Hardware Accelerators for FPGAs

IBM Research AI Systems Day DNNBuilder: an Automated Tool for Building High-Performance DNN Hardware Accelerators for FPGAs Xiaofan Zhang 1, Junsong Wang 2, Chao Zhu 2, Yonghua Lin 2, Jinjun Xiong 3, Wen-mei

Embedded System Design Modeling, Synthesis, Verification

Modeling, Synthesis, Verification Daniel D. Gajski, Samar Abdi, Andreas Gerstlauer, Gunar Schirner Chapter 4: System Synthesis Outline System design trends Model-based synthesis Transaction level model

FCUDA-SoC: Platform Integration for Field-Programmable SoC with the CUDA-to-FPGA

1 FCUDA-SoC: Platform Integration for Field-Programmable SoC with the CUDA-to-FPGA Compiler Tan Nguyen 1, Swathi Gurumani 1, Kyle Rupnow 1, Deming Chen 2 1 Advanced Digital Sciences Center, Singapore {tan.nguyen,