Power Reduction through Software-Programmable Accelerators for ARM-based Subsystems
1 Power Reduction through Software-Programmable Accelerators for ARM-based Subsystems
Gert Goossens, CEO, Target Compiler Technologies
2 Low Power: Back to Basics
Dynamic power: $P_{dyn} = C \cdot (A \cdot f_{clock}) \cdot V_{dd}^2$
- Locality of reference
- Avoid switching
- f-scaling
- Concurrency: task-, data- and instruction-level
- Low-V technology
- V-scaling
Leakage power ["Moore's Law Meets Static Power", Computer, IEEE Comp. Soc., Dec. 2003]: $P_{leak} = I_{leak} \cdot V_{dd} \propto (a^{-V_t} \cdot N_{gates} \cdot W_{dev}) \cdot V_{dd}$
- Multi-threshold libraries
- Minimal logic
- Power gating
2012 Target Compiler Technologies
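To make the two power formulas concrete, here is a minimal C sketch that evaluates them. The function names and the parameter values in the usage note are illustrative assumptions, not data from the talk. It also shows why V-scaling pays off more than f-scaling alone: the dynamic energy per cycle, $C \cdot A \cdot V_{dd}^2$, does not depend on $f_{clock}$.

```c
/* Dynamic power from the slide: P_dyn = C * (A * f_clock) * Vdd^2.
   c_switched: switched capacitance [F], activity: activity factor A (0..1),
   f_clock: clock frequency [Hz], vdd: supply voltage [V]. Result in watts. */
double dynamic_power(double c_switched, double activity,
                     double f_clock, double vdd) {
    return c_switched * (activity * f_clock) * vdd * vdd;
}

/* Leakage power: P_leak = I_leak * Vdd (i_leak in amperes). */
double leakage_power(double i_leak, double vdd) {
    return i_leak * vdd;
}

/* Dynamic energy per clock cycle: C * A * Vdd^2. It has no f_clock term,
   so lowering the frequency alone cuts power but not the energy needed for
   a fixed amount of work; lowering Vdd cuts both. */
double dynamic_energy_per_cycle(double c_switched, double activity,
                                double vdd) {
    return c_switched * activity * vdd * vdd;
}
```

For example, dynamic_power(1e-9, 0.5, 100e6, 1.0) gives about 0.05 W; lowering V_dd from 1.0 V to 0.9 V at the same frequency scales that by 0.81.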
3 Heterogeneous Multicore SoC
- Dual- and quad-core ARM: concurrency in control processing (task-level parallelism)
- big.LITTLE: minimal logic for a given task; voltage and frequency scaling
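Voltage and frequency scaling can be sketched as choosing an operating point per task. The C below assumes a hypothetical table of frequency/voltage pairs (opp_t and select_opp are illustrative names, not an ARM or Linux API) and picks the lowest-voltage point that still meets a deadline, since dynamic energy per cycle scales with V_dd² while run time scales with 1/f_clock.

```c
#include <stddef.h>

/* Hypothetical DVFS operating point: a frequency/voltage pair.
   Real tables come from the SoC vendor; these fields are illustrative. */
typedef struct {
    double f_hz;   /* clock frequency [Hz] */
    double vdd;    /* supply voltage at that frequency [V] */
} opp_t;

/* Return the index of the lowest-voltage operating point that completes
   `cycles` clock cycles within `deadline_s` seconds, or -1 if none can.
   For fixed work, dynamic energy ~ cycles * C * A * Vdd^2, so the lowest
   feasible voltage gives the lowest dynamic energy. */
int select_opp(const opp_t *table, size_t n, double cycles, double deadline_s) {
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        double runtime = cycles / table[i].f_hz;
        if (runtime > deadline_s)
            continue;                       /* too slow for the deadline */
        if (best < 0 || table[i].vdd < table[best].vdd)
            best = (int)i;                  /* feasible and lower voltage */
    }
    return best;
}
```

With a table of {200 MHz, 0.8 V}, {400 MHz, 0.9 V}, {800 MHz, 1.1 V}, a 1-Mcycle task with a 4 ms deadline selects index 1 (400 MHz). The model ignores leakage and transition cost, which a real policy would also have to weigh.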
4 Heterogeneous Multicore SoC
Multiple ASIPs (Application-Specific Instruction-set Processors):
- Concurrency in data processing: task-, data- and instruction-level parallelism
- Minimal logic for a given task: architectural specialisation
- Locality of reference: local memories and interconnect
- Power gating of ASIPs when not in use
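Power gating an idle ASIP only saves energy if the idle period is long enough to amortise the cost of switching the power domain off and on again. A minimal C sketch of that break-even reasoning, assuming negligible leakage while gated and a fixed wake-up energy e_wake; all names and parameters are illustrative, not figures from the slides.

```c
/* Idle-interval energy model for power gating an ASIP.
   p_leak: leakage power while idle but powered [W]
   e_wake: fixed energy of one power-down/power-up cycle [J]
   Assumption: leakage while gated is negligible. */

/* Energy leaked over an idle interval if the ASIP stays powered. */
double idle_energy_ungated(double p_leak, double t_idle) {
    return p_leak * t_idle;
}

/* Energy spent if the ASIP is power-gated for the interval. */
double idle_energy_gated(double e_wake) {
    return e_wake;
}

/* Idle time at which both options cost the same: p_leak * t = e_wake. */
double break_even_time(double p_leak, double e_wake) {
    return e_wake / p_leak;
}

/* Gate only if the expected idle interval exceeds the break-even time. */
int should_power_gate(double p_leak, double e_wake, double t_idle) {
    return t_idle > break_even_time(p_leak, e_wake);
}
```

With 1 mW of leakage and a 1 µJ wake-up cost, the break-even time is 1 ms: a 10 ms idle period is worth gating, a 0.1 ms one is not.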
5 Heterogeneous Multicore SoC
Hardwired datapath: an endangered species?
- Except in a few cases, market dynamics require programmability
- ASIPs enable programmability and efficiency
- ARM and ASIPs offer a familiar software approach to SoC design: quick algorithm mapping from C/Matlab through SDK and debug tools
6 No MPSoC Design Without Tools
Tools at IP level (ASIP cores): IP Designer
- Architectural exploration
- SDK generation: C compiler, ISS, debugger
- RTL generation
Tools at IP subsystem level (multicore): MP Designer
- Code parallelisation
- Communication and synchronisation
- Multicore platform generation
7 IP Designer Tool Suite
8 Architectural Optimisation Space
ASIP architectural optimisation space:
Parallelism
- Instruction-level parallelism: orthogonal instruction set (VLIW), encoded instruction set
- Data-level parallelism: vector processing (SIMD)
- Task-level parallelism: multicore, multithreading
Specialisation
- App.-specific data types: integer, fractional, floating-point, bits, complex, vector
- App.-specific instructions
- Pipeline
- Connectivity and storage matching the application's data flow: distributed registers, sub-ranges; multiple memories, sub-ranges
- App.-specific memory addressing: direct, indirect, post-modification, indexed, stack-indirect
- App.-specific data processing: any exotic operator, single- or multi-cycle
- App.-specific control processing: jumps, subroutines, interrupts, HW do-loops, residual control, predication; relative or absolute, address range, delay slots
nML and IP Designer:
- Support a wide range of ASIP architectures
- Enable true architectural exploration
- Make ASIP design easy
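As an illustration of an application-specific data type and instruction, the C sketch below models a fractional (Q15 × Q15 → Q31) saturating multiply-accumulate. The Q15/Q31 formats and the name q15_mac are assumptions chosen for the example; the slide does not say which fractional formats a given nML model defines. In plain C on a general-purpose core this costs several operations, whereas an ASIP designer could make it a single instruction.

```c
#include <stdint.h>

/* Clamp a 64-bit intermediate to the signed 32-bit range. */
static int32_t sat32(int64_t x) {
    if (x > INT32_MAX) return INT32_MAX;
    if (x < INT32_MIN) return INT32_MIN;
    return (int32_t)x;
}

/* Fractional multiply-accumulate with saturation.
   a, b: Q15 operands (value = raw / 2^15); acc: Q31 accumulator.
   The Q15 x Q15 product is Q30; doubling it aligns it to Q31.
   Multiplying by 2 (not << 1) avoids undefined behaviour on negatives. */
int32_t q15_mac(int32_t acc, int16_t a, int16_t b) {
    int64_t prod = (int64_t)a * b * 2;      /* Q30 -> Q31 */
    return sat32((int64_t)acc + prod);
}
```

For example, q15_mac(0, 16384, 16384) returns 536870912, i.e. 0.5 × 0.5 = 0.25 in Q31, while (-1.0) × (-1.0) saturates to INT32_MAX because +1.0 is not representable in Q31.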
9 IP Designer's C Compiler
DSPstone benchmark on TI C55x: Target's compiler vs. TI's compiler, cycles and code size for each; gain = Target vs. TI.

Benchmark | C modification | Gain, cycles | Gain, code size
Small-scale examples
FIR | restrict | | -5%
Convolution | repeat | | -31%
LMS | original | | 13%
Matrix | repeat | | 13%
IIR, N=4 | restrict | | 18%
IIR, N=16 | restrict | | 18%
Small-scale examples, summary | | 22% | 4%
Large-scale examples
FFT bit reverse | original | | 0%
FFT butterfly | original | | 6%
ADPCM | original | | 9%
Large-scale examples, summary | | 7% | 5%

- Graph-based C compiler technology offers retargetability and efficiency at the same time
- A compilable subset of TI C55x was modelled in nML in 2.5 person-months
- Only a few C code modifications were made: repeat loop, restrict pointers
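The slide lists restrict pointers as one of the few source changes made. A minimal sketch of what such a change looks like, assuming a Q15 block FIR; fir_q15 is a hypothetical name and this is not the DSPstone source. The `restrict` qualifiers promise the compiler that the input, coefficient and output arrays do not overlap, which lets it keep values in registers and reorder loads and stores.

```c
#include <stddef.h>
#include <stdint.h>

/* Block FIR: y[n] = sum_{k=0}^{ntaps-1} h[k] * x[n + ntaps - 1 - k].
   x must hold nout + ntaps - 1 samples. Q15 in, Q15 out.
   `restrict` asserts that x, h and y never alias each other.
   No saturation: coefficients are assumed scaled so acc cannot overflow.
   acc >> 15 assumes an arithmetic right shift for negative values. */
void fir_q15(const int16_t *restrict x, const int16_t *restrict h,
             int16_t *restrict y, size_t nout, size_t ntaps) {
    for (size_t n = 0; n < nout; n++) {
        int32_t acc = 0;
        for (size_t k = 0; k < ntaps; k++)
            acc += (int32_t)h[k] * x[n + ntaps - 1 - k];
        y[n] = (int16_t)(acc >> 15);        /* Q30 accumulator -> Q15 */
    }
}
```

With x = {0, 2, 4, 6} and h = {16384, 16384} (two taps of 0.5), nout = 3 gives y = {1, 3, 5}.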
10 IP Designer's RTL Generator
Example: audio DSP (90 nm, 220 MHz clock, 0.9 V). [Chart: area (kgates) and power (µW/MHz) for configurations A to E]
IP Designer configuration options:
- A: standard RTL generation
- B: clock gating + operand isolation for functional units
- C: operand isolation for multiplexers
- D: latching of register addresses in the instruction decoder
- E: manual design by the customer
Results:
- Low-power optimisations yield 60% savings
- Low-power optimisations have a small area cost
- Area and power are within a few percent of the hand-optimised design
11 IP Designer Example: Wolverine Platform
- Ultra-low-power multicore platform, optimised for audio coding
- Used in hearing instruments and Bluetooth headsets
- 20-bit precision
- 4 micro-DSP VLIW ASIPs + 4 filter accelerators + 1 microprocessor core
- 0.04 mW/MMAC; 42 MIPS at f_clock = 2 MHz (0.13 µm CMOS)
Sound Design Technologies, reproduced with permission
12 IP Designer Example: Reed-Solomon Coding ASIP
FEC for the wireless link in personal health-care systems (IEEE a); supports RS(63, 55) encoding/decoding
Concurrency:
- Data level: 8-element SIMD, 6 bits per element
- Instruction level: 2 scalar + 2 vector issue slots
Specialisation:
- Data- and instruction-level parallelism
- Hardware for finite-field multiplication and addition
IMEC, reproduced with permission
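RS(63, 55) operates on 6-bit symbols, i.e. elements of GF(2^6). A plain-C model of the finite-field addition and multiplication that this ASIP implements in hardware might look as follows. The field polynomial x^6 + x + 1 is an assumption (a standard primitive polynomial for GF(64)); the slide does not state which polynomial the ASIP uses, and the function names are hypothetical. An 8-lane SIMD unit would apply such operations to eight symbols per instruction.

```c
#include <stdint.h>

/* Assumed field: GF(2^6) generated by the primitive polynomial x^6 + x + 1. */
#define GF6_POLY 0x43u   /* binary 1000011 = x^6 + x + 1 */
#define GF6_MASK 0x3Fu   /* 6-bit symbols */

/* Addition (and subtraction) in GF(2^m) is bitwise XOR. */
uint8_t gf64_add(uint8_t a, uint8_t b) {
    return (uint8_t)((a ^ b) & GF6_MASK);
}

/* Carry-less shift-and-add multiplication, reducing modulo GF6_POLY
   whenever a degree-6 term appears. */
uint8_t gf64_mul(uint8_t a, uint8_t b) {
    uint8_t result = 0;
    a &= GF6_MASK;
    b &= GF6_MASK;
    while (b) {
        if (b & 1u)
            result ^= a;          /* add the current multiple of a */
        b >>= 1;
        a <<= 1;                  /* a = a * x */
        if (a & 0x40u)            /* x^6 term present: reduce */
            a ^= GF6_POLY;
    }
    return result;
}
```

For example, gf64_mul(2, 0x20) returns 3, because x · x^5 = x^6 ≡ x + 1 under the assumed polynomial; repeatedly multiplying by 2 (the element x) returns to 1 only after 63 steps, as expected for a primitive element.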
13 IP Designer Example: Reed-Solomon Coding ASIP
[Charts: performance; gate count]
IMEC, reproduced with permission
14 IP Designer Example: Reed-Solomon Coding ASIP
[Chart: energy]
IMEC, reproduced with permission
15 IP Designer Market Adoption
Application domains: medical; audio; video and imaging; wireless; wireline; network processing; high-performance computing; automotive; crypto and identification
- Shown are publicly announced IP Designer customers
- Estimate: about 150 unique SoC products based on IP Designer are on the market today
16 No MPSoC Design Without Tools
17 MP Designer Tool Suite
18 User-Guided Parallelisation
Label C code blocks for parallelisation (excerpt from a JPEG encoder):

    int main(int argc, char *argv[]) {
        init_all();
        parsection: {
            jpg_open: {
                jpg_fopen(jpg_filename);
                writeword(0xffd8); // SOI
                write_app0info();
            }
            main_encoder(&in_img);
            jpg_close: {
                writeword(0xffd9); // EOI
                jpg_fclose();
            }
        }
        free(in_img.rgb_buffer);
        return 0;
    }

    void main_encoder(struct image* img) {
        vlc_init: {
            DCY = 0; DCCb = 0; DCCr = 0;
        }
        for (ypos = 0..height) {
            for (xpos = 0..width) {
                SBYTE DU[64];
                for (blk = 0..5) {
                    loading:
                    load_data_unit_from_rgb_buffer(img, xpos, ypos, blk, DU);
                    process_du(DU, blk);
                }
            }
        }
        vlc_fini: { // bit-alignment of EOI marker
            if (bytepos >= 0) {
                writebits((1 << (bytepos + 1)) - 1, bytepos + 1);
            }
        }
    }

    void process_du(sbyte* CDU, int blk) {
        SWORD DU[64];
        fdct_main: {
            int* int_fdtbl = blk_fdtbl[blk];
            fdct_and_quantization(CDU, int_fdtbl, DU);
        }
        vlc_main: { // encode ACs
            while (i <= end0pos) { ... }
            if (end0pos != 63) writebits(eob);
        }
    }
19 User-Guided Parallelisation
Parallelisation pragmas (sample parallelisation on a 3-core architecture):

    processor P0 type dlx
    processor P1 type dlx
    processor P2 type dlx
    parallel ParRegion lbl main::parsection
    task LOAD target P0
    include lbl main_encoder::loading
    task DCT target P1
    include lbl process_du::fdct_main
    task VLC target P2
    include lbl main::jpg_open
    include lbl main_encoder::vlc_init
    include lbl process_du::vlc_main
    include lbl main_encoder::vlc_fini
    include lbl main::jpg_close
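Once the encoder is split into LOAD, DCT and VLC tasks on separate cores, data units have to be passed between them. A minimal single-threaded sketch of a bounded FIFO of 64-sample data units, as might sit between the LOAD and DCT tasks in this 3-core mapping. The type and function names and the FIFO depth are hypothetical, and the sketch does not claim to show how MP Designer actually generates its communication and synchronisation; on real hardware a full or empty FIFO would stall the producer or consumer core.

```c
#include <stdint.h>
#include <string.h>

#define DU_SIZE    64   /* samples per data unit, as in DU[64] */
#define FIFO_DEPTH 4    /* assumed number of buffered data units */

/* Bounded ring buffer of data units between two pipeline tasks. */
typedef struct {
    int16_t  buf[FIFO_DEPTH][DU_SIZE];
    unsigned head, tail, count;
} du_fifo;

void fifo_init(du_fifo *f) { f->head = f->tail = f->count = 0; }

/* Producer side (e.g. LOAD task). Returns 0 if full (must stall), 1 if queued. */
int fifo_push(du_fifo *f, const int16_t du[DU_SIZE]) {
    if (f->count == FIFO_DEPTH) return 0;
    memcpy(f->buf[f->tail], du, sizeof f->buf[0]);
    f->tail = (f->tail + 1) % FIFO_DEPTH;
    f->count++;
    return 1;
}

/* Consumer side (e.g. DCT task). Returns 0 if empty (must stall), 1 if read. */
int fifo_pop(du_fifo *f, int16_t du[DU_SIZE]) {
    if (f->count == 0) return 0;
    memcpy(du, f->buf[f->head], sizeof f->buf[0]);
    f->head = (f->head + 1) % FIFO_DEPTH;
    f->count--;
    return 1;
}
```

The bounded depth is the design choice that matters: it lets LOAD run ahead of DCT by a few data units, which smooths out load imbalance, while keeping the buffer small enough for a local memory.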
20 Exploration
For each parallelisation, MP Designer shows the task graph with estimated processor loads.
Task graph: JPEG encoding on a 3-DLX architecture
- Task 0 "LOAD", Proc 0 "P0" (dlx): main_encoder::loading 35.8 %. IN: <none>. OUT: <none>. [NC dep T0 -> b20] DU (64) [FF]
- Task 1 "DCT", Proc 1 "P1" (dlx): process_dU::fdct_main 22.2 %, process_dU::quant_main 25.0 %; *TOTAL* 47.1 %. IN: in_img.height, in_img.width. OUT: <none>. [NC dep T1 -> b22] DU_ZZ (128) [FF], end0pos (4)
- Task 2 "VLC", Proc 2 "P2" (dlx): main::jpg_open 0.0 %, main_encoder::vlc_init 0.0 %, process_dU::vlc_main 16.9 %, main_encoder::vlc_fini 0.0 %, main::jpg_close 0.0 %; *TOTAL* 16.9 %. IN: JPG_filename, SOF0info.height, SOF0info.width, in_img.height, in_img.width. OUT: <none>
21 Exploration
Task graph for H.263 encoding on 8 cores (dlx cores; estimated load per function and per task):
- T0 "bs": bitstream_hdr 2.0 %, bitstream_cfs 8.2 %, idct_p_inter 8.8 %, idct_b 1.0 %, make_p_mv 0.2 %, make_b_mv 0.0 %; *TOTAL* 20.4 %
- T1 "recon": rec_b 3.1 %, rec_p 20.5 %; *TOTAL* 23.7 %. IN: framenum, has_startcode
- T2 "addblock": idct_p_intra 2.8 %, addblock_p_intra 1.2 %, addblock_p_inter 3.8 %, reconblock_b 6.7 %, addblock_b 0.5 %; *TOTAL* 14.9 %. IN: framenum, has_startcode, refidct
- T3 "store_bmb": store_mb_b 15.1 %, init_idct 0.1 %; *TOTAL* 15.3 %. IN: framenum, has_startcode, refidct
- T4 "out": store_mb_p 21.8 %, mb_rgb_copyin 1.6 %, padding_mb 2.5 %; *TOTAL* 26.0 %. IN: framenum, has_startcode, outputname
Global dependency analysis automatically ensures correct communication and synchronisation; manual design would be error-prone.
22 Exploration
JPEG encoding on a multi-DLX architecture. Reported per configuration: Mcycles* (sequential and parallel), speed-up, load per core P0 to P4 (%), efficiency (%).

Algorithm | # Cores | Parallelisation
Original | |
Original | 2 | ld+dct+q / vlc
Original | 3 | ld / dct+q / vlc
Original | 4 | ld / dct / q / vlc
Optimised | 2 | ld+dct / q+vlc
Optimised | 3 | ld / dct+q / vlc
Optimised | 3 | ld / dct / q+vlc
Optimised | 4 | ld / dct / q / vlc
Split quant | 3 | ld / dct+q0 / q1+vlc
Dual load | 5 | ld0 / ld1 / dct / q / vlc

The entire exploration took only days.
* Cycles for a 256x160-pixel image
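The speed-up and efficiency columns can be approximated from per-core loads with a simple pipeline model: if each load is a task's share of the sequential cycle count, the most heavily loaded core bounds the throughput. A sketch under that assumption, ignoring communication and synchronisation overhead; the function names are hypothetical and this is not how MP Designer computes its estimates.

```c
#include <stddef.h>

/* Idealised pipeline speed-up: each load_pct[i] is core i's share (%) of
   the sequential cycle count. With all cores working concurrently, the
   busiest core limits throughput, so speed-up ~= 100 / max load.
   Communication overhead is ignored, so this is an optimistic bound. */
double pipeline_speedup(const double *load_pct, size_t ncores) {
    double max_load = 0.0;
    for (size_t i = 0; i < ncores; i++)
        if (load_pct[i] > max_load)
            max_load = load_pct[i];
    return max_load > 0.0 ? 100.0 / max_load : 0.0;
}

/* Efficiency (%) = speed-up divided by the number of cores. */
double pipeline_efficiency_pct(const double *load_pct, size_t ncores) {
    return 100.0 * pipeline_speedup(load_pct, ncores) / (double)ncores;
}
```

Applied to the loads of the 3-DLX JPEG task graph (35.8 %, 47.1 %, 16.9 %), the model gives a speed-up of about 2.12 and an efficiency of about 70.8 %; the DCT task is the bottleneck, which is what configurations such as "Split quant" try to relieve.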
23 Multicore SDK
- Multicore simulation: ISS Core-1, ISS Core-2, ISS Core-3
- Multicore on-chip debugging: JTAG controller; debug controller HW for Core-1, Core-2 and Core-3
24 Conclusion
- ASIPs enable low power, acceleration and programmability in ARM-based multicore SoCs
- No (efficient) multicore SoC design without tools:
  - design and programming of individual ASIP cores
  - multicore parallelisation and platform generation
- Target can be your ASIP and multicore tools partner
More information: come to our booth; brochure in your conference bag.
Advanced d Instruction ti Level Parallelism Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu ILP Instruction-Level Parallelism (ILP) Pipelining:
More informationHardwired Control (4) Micro-programmed Control Ch 17. Micro-programmed Control (3) Machine Instructions vs. Micro-instructions
Micro-programmed Control Ch 17 Micro-instructions Micro-programmed Control Unit Sequencing Execution Characteristics Course Summary Hardwired Control (4) Complex Fast Difficult to design Difficult to modify
More informationDigital Signal Processor Core Technology
The World Leader in High Performance Signal Processing Solutions Digital Signal Processor Core Technology Abhijit Giri Satya Simha November 4th 2009 Outline Introduction to SHARC DSP ADSP21469 ADSP2146x
More informationHotChips An innovative HD video and digital image processor for low-cost digital entertainment products. Deepu Talla.
HotChips 2007 An innovative HD video and digital image processor for low-cost digital entertainment products Deepu Talla Texas Instruments 1 Salient features of the SoC HD video encode and decode using
More informationParallelism in Hardware
Parallelism in Hardware Minsoo Ryu Department of Computer Science and Engineering 2 1 Advent of Multicore Hardware 2 Multicore Processors 3 Amdahl s Law 4 Parallelism in Hardware 5 Q & A 2 3 Moore s Law
More informationDigital Signal Processors: fundamentals & system design. Lecture 1. Maria Elena Angoletta CERN
Digital Signal Processors: fundamentals & system design Lecture 1 Maria Elena Angoletta CERN Topical CAS/Digital Signal Processing Sigtuna, June 1-9, 2007 Lectures plan Lecture 1 (now!) introduction, evolution,
More informationPlatform-based Design
Platform-based Design The New System Design Paradigm IEEE1394 Software Content CPU Core DSP Core Glue Logic Memory Hardware BlueTooth I/O Block-Based Design Memory Orthogonalization of concerns: the separation
More informationENGN1640: Design of Computing Systems Topic 06: Advanced Processor Design
ENGN1640: Design of Computing Systems Topic 06: Advanced Processor Design Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown University
More informationCS425 Computer Systems Architecture
CS425 Computer Systems Architecture Fall 2017 Multiple Issue: Superscalar and VLIW CS425 - Vassilis Papaefstathiou 1 Example: Dynamic Scheduling in PowerPC 604 and Pentium Pro In-order Issue, Out-of-order
More informationConnX D2 DSP Engine. A Flexible 2-MAC DSP. Dual-MAC, 16-bit Fixed-Point Communications DSP PRODUCT BRIEF FEATURES BENEFITS. ConnX D2 DSP Engine
PRODUCT BRIEF ConnX D2 DSP Engine Dual-MAC, 16-bit Fixed-Point Communications DSP FEATURES BENEFITS Both SIMD and 2-way FLIX (parallel VLIW) operations Optimized, vectorizing XCC Compiler High-performance
More informationEmbedded Systems. 8. Hardware Components. Lothar Thiele. Computer Engineering and Networks Laboratory
Embedded Systems 8. Hardware Components Lothar Thiele Computer Engineering and Networks Laboratory Do you Remember? 8 2 8 3 High Level Physical View 8 4 High Level Physical View 8 5 Implementation Alternatives
More informationRAČUNALNIŠKEA COMPUTER ARCHITECTURE
RAČUNALNIŠKEA COMPUTER ARCHITECTURE 6 Central Processing Unit - CPU RA - 6 2018, Škraba, Rozman, FRI 6 Central Processing Unit - objectives 6 Central Processing Unit objectives and outcomes: A basic understanding
More informationAll MSEE students are required to take the following two core courses: Linear systems Probability and Random Processes
MSEE Curriculum All MSEE students are required to take the following two core courses: 3531-571 Linear systems 3531-507 Probability and Random Processes The course requirements for students majoring in
More informationQualcomm Hexagon DSP: An architecture optimized for mobile multimedia and communications
Lucian Codrescu Sr. Director, Technology Qualcomm Technologies, Inc. Qualcomm Hexagon DSP: An architecture optimized for mobile multimedia and communications 1 Hexagon DSP processors in Snapdragon products
More information55:132/22C:160, HPCA Spring 2011
55:132/22C:160, HPCA Spring 2011 Second Lecture Slide Set Instruction Set Architecture Instruction Set Architecture ISA, the boundary between software and hardware Specifies the logical machine that is
More informationComputer Architecture
Lecture 3: Pipelining Iakovos Mavroidis Computer Science Department University of Crete 1 Previous Lecture Measurements and metrics : Performance, Cost, Dependability, Power Guidelines and principles in
More informationMapping C code on MPSoC for Nomadic Embedded Systems
-1 - ARTIST2 Summer School 2008 in Europe Autrans (near Grenoble), France September 8-12, 8 2008 Mapping C code on MPSoC for Nomadic Embedded Systems http://www.artist-embedded.org/ Lecturer: Diederik
More informationClassification of Semiconductor LSI
Classification of Semiconductor LSI 1. Logic LSI: ASIC: Application Specific LSI (you have to develop. HIGH COST!) For only mass production. ASSP: Application Specific Standard Product (you can buy. Low
More informationTSEA 26 exam page 1 of Examination. Design of Embedded DSP Processors, TSEA26 Date 8-12, G34, G32, FOI hus G
TSEA 26 exam page 1 of 10 20171019 Examination Design of Embedded DSP Processors, TSEA26 Date 8-12, 2017-10-19 Room G34, G32, FOI hus G Time 08-12AM Course code TSEA26 Exam code TEN1 Design of Embedded
More informationIBM's POWER5 Micro Processor Design and Methodology
IBM's POWER5 Micro Processor Design and Methodology Ron Kalla IBM Systems Group Outline POWER5 Overview Design Process Power POWER Server Roadmap 2001 POWER4 2002-3 POWER4+ 2004* POWER5 2005* POWER5+ 2006*
More informationCSCI 402: Computer Architectures. Computer Abstractions and Technology (4) Fengguang Song Department of Computer & Information Science IUPUI.
CSCI 402: Computer Architectures Computer Abstractions and Technology (4) Fengguang Song Department of Computer & Information Science IUPUI Contents 1.7 - End of Chapter 1 Power wall The multicore era
More informationISA: The Hardware Software Interface
ISA: The Hardware Software Interface Instruction Set Architecture (ISA) is where software meets hardware In embedded systems, this boundary is often flexible Understanding of ISA design is therefore important
More information