Co-Design of Many-Accelerator Heterogeneous Systems Exploiting Virtual Platforms. SAMOS XIV July 14-17,
2 Outline
- Introduction and motivation
  - Design requirements for many-accelerator SoCs
  - Design problems
  - Objective
- Proposed co-design methodology
  - Virtual prototyping framework
  - VP simulation speedup
  - Co-design formalization
- Case study: H.264 decoding server
- Conclusions
3 Introduction
- Need for performance and energy improvements
- Many-accelerator SoCs:
  - Massive parallelization
  - Pipelining
  - Energy-efficient hardware configuration
4-6 Introduction
HW/SW co-design for many-accelerator systems:
- Different architectural configurations are required in each HW core.
- Example: surveillance server. The figure maps the video streams VIDEO 1, VIDEO 2, ..., VIDEO N to the accelerator groups Group 1, Group 2, ..., Group N, each with its own requirements (Requirements 1, Requirements 2, ..., Requirements N).
7 Introduction
- Typical co-design: common configuration
  - Area violation
  - Suboptimal design
- (Figure label: maximum allowed area)
8 Introduction
- Using different configurations:
  - Constraints met
  - Design optimization
- (Figure label: maximum allowed area)
9 Design problems
Problem 1: Exponentially increased design space size:
$$\left( \prod_{j=1}^{A} \prod_{i=1}^{P_{a_j}} V_{p_i} \right)^{N}$$
- N: number of accelerator groups
- A: number of accelerators per group
- P_a: number of architectural parameters of accelerator a
- V_p: number of values of architectural parameter p
- An increased number of evaluations is needed.
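As a rough illustration of how quickly this grows, the sketch below evaluates the design space size, reading the slide's formula as the product of the value counts of every parameter of every accelerator in a group, raised to the number of groups N. The accelerator and parameter counts in the example are hypothetical and not taken from the talk.

```python
from math import prod

def design_space_size(values_per_param, num_groups):
    """Number of candidate system configurations.

    values_per_param: one list per accelerator in a group; each entry is the
    number of values one architectural parameter of that accelerator can take.
    num_groups: N, the number of independently configured accelerator groups.
    """
    per_group = prod(prod(counts) for counts in values_per_param)
    return per_group ** num_groups

# Hypothetical numbers: 4 accelerators per group, each with 2 parameters
# that can take 3 values.
accelerators = [[3, 3]] * 4
common_config = design_space_size(accelerators, 1)  # one configuration shared by all groups
per_group_config = design_space_size(accelerators, 8)  # 8 independently configured groups

print(common_config, per_group_config)
```

Under these assumed numbers, a shared configuration gives 9^4 candidates, while configuring 8 groups independently gives (9^4)^8, which is why exhaustive evaluation is not an option.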
10 Design problems
Problem 2: Slow system evaluation:
- Accurate evaluation implies slow simulation.
- Increased number of components.
- Non-productive simulation phases: phases outside the evaluation scope.
- A large number of slow simulations leads to a highly increased design time.
11 Goal of the proposed framework
- VP framework for co-design of many-accelerator systems.
  - Supports the use of different hardware core configurations, leading to optimal designs.
- Simulation time reduction.
  - Avoids non-productive simulation phases.
  - The increased design time can be alleviated.
12-23 Proposed VP framework
SystemC/TLM-based Virtual Platform (diagram built up over slides 12 to 23):
- Software part, one per instance (Instance 1, Instance 2, ..., Instance N): RAM, ROM (dataset), local bus, bridge, profiler (partial metrics) and sync module.
- System (global) bus shared by all instances.
- Hardware part, one per instance: accelerators, with an area metric exposed through SystemC ports.
- Simulation control server:
  1. Create config. file from the configuration.
  2. Simulation start.
  3. Wait for metrics, for each instance.
  4. Write global metrics (area, power, throughput):
     - Power: sum of all HW
     - Area: sum of all HW
     - Throughput: min. from all HW
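The last step of the simulation control server can be sketched as follows. This is only an illustration of the aggregation rule stated on the slide (sum for power and area, minimum for throughput); the `PartialMetrics` type, its field names and the units are assumptions, not part of the framework's actual interface.

```python
from dataclasses import dataclass

@dataclass
class PartialMetrics:
    """Metrics published by one instance (hypothetical field names and units)."""
    area_mm2: float
    power_mw: float
    throughput_fps: float

def global_metrics(partials):
    """Combine per-instance metrics into system-level metrics."""
    return PartialMetrics(
        area_mm2=sum(p.area_mm2 for p in partials),        # area: sum of all HW
        power_mw=sum(p.power_mw for p in partials),        # power: sum of all HW
        throughput_fps=min(p.throughput_fps for p in partials),  # throughput: slowest instance
    )

# Made-up partial metrics for three instances.
partials = [
    PartialMetrics(0.5, 10.0, 14.0),
    PartialMetrics(0.75, 20.0, 15.5),
    PartialMetrics(1.0, 30.0, 13.6),
]
total = global_metrics(partials)
print(total)
```

The minimum is used for throughput because the slowest instance bounds what the whole server can sustain.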
24-35 Proposed VP framework
Communication among components (flowchart built up over slides 24 to 35):
- Software:
  1. Send input data to the accelerator.
  2. While Ready = '0', keep waiting.
  3. When Ready = '1', get the output data.
  4. More accelerators to manipulate? YES: repeat from step 1. NO: send # cycles to the profiler, then synchronize.
- Accelerator: get input, process, set delay, produce output and set Ready = '1', send output.
- Profiler: get cycles, combine with HW metrics, publish metrics.
- Synchronization module: synchronization point reached for current instance.
  - No full sync: wait while the remaining instances are synchronized.
  - Full sync: continue execution.
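The synchronization step can be pictured with a barrier: every instance publishes its metrics, then blocks until all instances have reached the synchronization point. The sketch below uses Python threads and `threading.Barrier` purely as an analogy; the real platform is a SystemC model, and the function name, the log format and the per-accelerator delay are hypothetical.

```python
import threading

def run_instances(num_instances, accels_per_instance):
    """Run each instance in a thread and synchronize them at a barrier."""
    barrier = threading.Barrier(num_instances)  # "full sync" once all arrive
    lock = threading.Lock()
    log = []

    def instance(idx):
        cycles = 0
        for _ in range(accels_per_instance):
            # Stand-in for: send input, wait for Ready = '1', get output.
            cycles += 100 * (idx + 1)  # hypothetical accelerator delay
        with lock:
            log.append(("published", idx, cycles))  # profiler publishes metrics
        barrier.wait()  # wait while the remaining instances are synchronized
        with lock:
            log.append(("continued", idx))  # full sync reached: continue execution

    threads = [threading.Thread(target=instance, args=(i,)) for i in range(num_instances)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log

log = run_instances(3, 4)
for entry in log:
    print(entry)
```

Because no thread passes the barrier before every thread has published, all "published" entries precede all "continued" entries, which mirrors the "no full sync: wait" branch of the flowchart.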
36 VP Simulation speedup
Process-based Reconfigurable SystemC Module (PRM): separates the virtual platform into two O/S processes:
- Static VP process: [...]s, memories, auxiliary peripherals. Constant during design space exploration.
- Hardware process: all hardware accelerators for a specific group. Changes during design space exploration.
37 VP Simulation speedup
Process-based Reconfigurable SystemC Module (PRM), cont'd:
Instead of restarting the whole simulation, the designer:
1. Pauses the simulation.
2. Restarts only the hardware process.
3. The simulation continues from the point at which it was paused.
Non-productive simulation phases are not repeated, which speeds up the exploration.
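A back-of-the-envelope model makes the source of the speedup explicit. It is a simplification that does not appear in the slides: it assumes every evaluation consists of a fixed non-productive phase followed by a fixed productive phase, and that restarting the hardware process costs a fixed overhead. All numbers are hypothetical.

```python
def full_restart_time(evals, t_nonproductive, t_productive):
    """Total time when every evaluation restarts the whole simulation."""
    return evals * (t_nonproductive + t_productive)

def prm_time(evals, t_nonproductive, t_productive, t_hw_restart=0):
    """Total time when the non-productive phase runs once and only the
    hardware process is restarted for each evaluation."""
    return t_nonproductive + evals * (t_productive + t_hw_restart)

# Hypothetical durations in minutes.
baseline = full_restart_time(10, 4, 6)
with_prm = prm_time(10, 4, 6)
with_prm_overhead = prm_time(10, 4, 6, t_hw_restart=1)

print(baseline, with_prm, with_prm_overhead)
print("fraction saved:", 1 - with_prm / baseline)
```

In this model the saving approaches t_nonproductive / (t_nonproductive + t_productive) as the number of evaluations grows, so the benefit depends on how large the startup, initialization and warm-up phases are relative to the measured phase.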
38-44 VP Simulation speedup
PRM structure (diagram built up over slides 38 to 44):
- Virtual Platform (O/S process 1): the remaining platform connects to the PRM wrapper through its ports (Input 1 to Input 4, Output 1 to Output 3). The PRM wrapper contains:
  - Data & timing forwarder.
  - Synchronization: on a pause request, checks the synchronization signal (Pause Request && Sync == 1). If YES, it pauses the simulation and later continues it.
  - Initializer, driving the reset signal. Reset = 1: run without propagating delay. Reset = 0: normal execution.
- Hardware process (O/S process 2): shared memory forwarder and SystemC accelerators 1, 2, ..., K. Each accelerator has input ports, output ports and metrics ports, and combines a computationally intensive kernel with an accelerator characterization (delay/area).
- User interface (O/S process 3): connection to interface. On the event "new HW configuration" it can request a pause, check the pause and continue the simulation.
- The O/S processes communicate through shared memory.
- Design time vs. runtime: at design time, a high-level synthesis tool takes C or SystemC behavioural code and produces the extracted delay and area metrics, which feed the accelerator characterization used at runtime.
45 Proposed VP framework + PRM
- Virtual Platform (Unix process 1): Instance 1, Instance 2, ..., Instance N, each with RAM, ROM (dataset), local bus, bridge, profiler (partial metrics), sync module and a PRM wrapper with PRM control and area metric (SystemC ports). All instances share the system (global) bus.
- PRM HW processes: one per instance (Unix processes 2 to N+1), each containing HW 1, HW 2, HW 3 and HW 4.
- Simulation control server:
  1. Create config. file from the configuration.
  2. (Re)start all HW processes.
  3. Simulation start / continue.
  4. Wait for metrics, for each instance.
  5. Pause all PRMs.
  6. Write global metrics (area, power, throughput):
     - Power: sum of all HW
     - Area: sum of all HW
     - Throughput: min. from all HW
46 Co-design formalization
Global area constraint ($A_{max}$):
- Typical: $N \cdot A \le A_{max}$
  - N: number of instances
  - A: accelerators area for each instance
- Proposed: $\sum_{i=1}^{N} A_i \le A_{max}$
  - N: number of instances
  - $A_i$: accelerators area for instance i
47 Co-design formalization
Throughput constraint ($T_{min}$) per instance:
- Typical: $T \ge T_{min}$
  - T: throughput of each instance
  - All instances have the same throughput, so T is the system throughput.
- Proposed: $\forall i \in \{1, \dots, N\}: T_i \ge T_{min}$, or equivalently $\min_{i \in \{1, \dots, N\}} T_i \ge T_{min}$
  - N: number of instances
  - $T_i$: throughput for instance i
  - Exploits the slacks induced by $\min_{i \in \{1, \dots, N\}} T_i$.
48 Co-design formalization
Optimization objective:
- Let $T_i$ be the throughput of instance $i \in \{1, \dots, N\}$.
- The instance with the minimum throughput defines the system throughput: $T_{system} = \min_{i \in \{1, \dots, N\}} T_i$
- Co-design objective: maximization of $T_{system}$: $\max T_{system} \Leftrightarrow \max \min_{i \in \{1, \dots, N\}} T_i$
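The constraints and the objective can be checked numerically with a few lines of code. The sketch below assumes the per-instance areas and throughputs are already known (in the talk they come from the virtual platform); the function names and the sample values are hypothetical, while the bounds 5.5 mm^2 and 13.6 fps are the ones quoted for the case study.

```python
def system_throughput(throughputs):
    """T_system: the slowest instance defines the system throughput."""
    return min(throughputs)

def is_feasible(areas, throughputs, a_max, t_min):
    """Proposed formulation: total area within A_max, every T_i at least T_min."""
    return sum(areas) <= a_max and system_throughput(throughputs) >= t_min

A_MAX, T_MIN = 5.5, 13.6

# Hypothetical per-instance results for 8 differently configured instances.
areas = [0.5, 0.5, 0.5, 0.5, 0.75, 0.75, 0.75, 0.75]
throughputs = [14.0, 15.0, 13.8, 16.2, 14.5, 13.9, 15.1, 14.2]

# Typical formulation: the largest configuration replicated in all instances.
common_areas = [0.75] * 8

proposed_ok = is_feasible(areas, throughputs, A_MAX, T_MIN)
common_ok = is_feasible(common_areas, [16.2] * 8, A_MAX, T_MIN)
print(proposed_ok, common_ok, system_throughput(throughputs))
```

With these made-up values the heterogeneous system fits the area budget, whereas replicating the largest configuration everywhere violates it, which is the situation the typical vs. proposed comparison describes.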
49 Co-design formalization
Configuration representation:
- For a single instance $i \in \{1, \dots, N\}$: $V_i = (p_1^i, p_2^i, \dots, p_K^i)$
  - K: number of architectural parameters (common to all instances)
  - $p_j^i$: value of parameter $j \in \{1, 2, \dots, K\}$ for instance i
- At least two instances are differently configured: $\exists\, i_1, i_2 \in \{1, 2, \dots, N\}: i_1 \ne i_2 \wedge V_{i_1} \ne V_{i_2}$
- For the overall system: $\left( (p_1^1, p_2^1, \dots, p_K^1), (p_1^2, p_2^2, \dots, p_K^2), \dots, (p_1^N, p_2^N, \dots, p_K^N) \right)$
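One possible in-memory form of this representation is a tuple of per-instance tuples, as sketched below. The helper names and the meaning of the two example parameters are hypothetical; only the structure (N instances, K parameters each, at least two instances configured differently) follows the slide.

```python
def validate(system_config):
    """Check that every instance vector V_i has the same number K of parameters."""
    lengths = {len(v) for v in system_config}
    if len(lengths) != 1:
        raise ValueError("all instances must have the same number of parameters")
    return lengths.pop()

def is_heterogeneous(system_config):
    """True if at least two instances are configured differently."""
    return len(set(system_config)) > 1

def flatten(system_config):
    """Concatenate V_1, ..., V_N into one vector of N*K parameter values."""
    return tuple(p for v in system_config for p in v)

# Hypothetical example: N = 3 instances, K = 2 parameters per instance.
system = ((4, 2), (8, 1), (4, 2))
K = validate(system)
print(K, is_heterogeneous(system), flatten(system))
```

Keeping each V_i as a hashable tuple makes the heterogeneity check a simple set operation and lets whole configurations be used as dictionary keys when caching evaluation results.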
50 Case study: H.264 decoding server
- Video decoding for a surveillance server
- 8 instances
- 4 accelerators per instance:
  - Inverse cosine transformation
  - Motion compensation: 1 luma, 2 chroma
- (Diagram: surveillance server. Each instance comprises I/O, an H.264 decoder with accelerators HW 1 to HW 4, and motion detection.)
51 Case study: H.264 decoding server
- Constraints:
  - $A_{max}$ = 5.5 mm^2
  - $T_{min}$ = 13.6 frames per second
- Exploration: 200 random evaluations
  - Targeting an exploration time of up to 20 hours.
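A random exploration under these constraints could look like the sketch below. The `evaluate` function is a toy stand-in for one virtual-platform simulation, and the parameter value sets and cost coefficients are invented for illustration; only the bounds, the number of instances and the budget of 200 evaluations come from the slides.

```python
import random

A_MAX, T_MIN = 5.5, 13.6           # constraints from the case study
N_INSTANCES = 8                    # instances in the case study
PARAM_VALUES = [(1, 2, 4, 8), (1, 2)]  # hypothetical: K = 2 parameters per instance

def evaluate(config):
    """Toy cost model replacing a VP simulation (coefficients are invented)."""
    areas = [0.3 + 0.04 * u * m for (u, m) in config]
    throughputs = [12.0 + 1.0 * u * m for (u, m) in config]
    return sum(areas), min(throughputs)

def random_search(n_evals, seed):
    """Return the best feasible (throughput, area, config), or None."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_evals):
        config = tuple(
            tuple(rng.choice(values) for values in PARAM_VALUES)
            for _ in range(N_INSTANCES)
        )
        area, throughput = evaluate(config)
        feasible = area <= A_MAX and throughput >= T_MIN
        if feasible and (best is None or throughput > best[0]):
            best = (throughput, area, config)
    return best

best = random_search(200, seed=0)
print(best)
```

Each call to `evaluate` corresponds to one simulation run in the real flow, so the cost of a single evaluation decides how many configurations fit into the exploration time budget.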
52 Case study: H.264 decoding server
Proposed vs. software-only:
- Using a solution with the minimum possible throughput.
- (Plot: throughput (fps) of the proposed and software-only solutions, with the $T_{min}$ line.)
53 Case study: H.264 decoding server
Proposed vs. overdesign:
- Overdesign: using the same configuration for all instances.
- (Plot: area (mm^2) for Proposed, Overdesign (max. throughput) and Overdesign (5 accelerators), with the $A_{max}$ = 5.5 line.)
- (Plot: throughput (fps) for Proposed and Overdesign (5 accelerators), with the $T_{min}$ line.)
54 Case study: H.264 decoding server
Simulation speedup (using PRM), obtained by bypassing:
- VP startup (memory allocation etc.)
- Target software initialization
- Warm-up phase: the dataset is processed without obtaining metrics, in order to minimize cache misses.
55 Conclusions
- VP co-design framework for many-accelerator systems:
  - Groups the HW accelerators.
  - Each group uses a different configuration, leading to optimal designs.
  - H.264 use case: 1.58x less area, similar throughput.
- Use of the Process-based Reconfigurable SystemC Module:
  - Simulation speedup by bypassing non-productive simulation phases.
  - The time saved can be invested in DSE quality.
  - H.264 use case: 40% less simulation time.
56 Thank you. Questions?
More informationVirtual PLATFORMS for complex IP within system context
Virtual PLATFORMS for complex IP within system context VP Modeling Engineer/Pre-Silicon Platform Acceleration Group (PPA) November, 12th, 2015 Rocco Jonack Legal Notice This presentation is for informational
More informationOutline. SLD challenges Platform Based Design (PBD) Leveraging state of the art CAD Metropolis. Case study: Wireless Sensor Network
By Alberto Puggelli Outline SLD challenges Platform Based Design (PBD) Case study: Wireless Sensor Network Leveraging state of the art CAD Metropolis Case study: JPEG Encoder SLD Challenge Establish a
More informationTransaction Level Modeling with SystemC. Thorsten Grötker Engineering Manager Synopsys, Inc.
Transaction Level Modeling with SystemC Thorsten Grötker Engineering Manager Synopsys, Inc. Outline Abstraction Levels SystemC Communication Mechanism Transaction Level Modeling of the AMBA AHB/APB Protocol
More informationSoC Design Environment with Automated Configurable Bus Generation for Rapid Prototyping
SoC esign Environment with utomated Configurable Bus Generation for Rapid Prototyping Sang-Heon Lee, Jae-Gon Lee, Seonpil Kim, Woong Hwangbo, Chong-Min Kyung P PElectrical Engineering epartment, KIST,
More informationHardware Software Bring-Up Solutions for ARM v7/v8-based Designs. August 2015
Hardware Software Bring-Up Solutions for ARM v7/v8-based Designs August 2015 SPMI USB 2.0 SLIMbus RFFE LPDDR 2 LPDDR 3 emmc 4.5 UFS SD 3.0 SD 4.0 UFS Bare Metal Software DSP Software Bare Metal Software