Power Reduction through Software-Programmable Accelerators for ARM-based Subsystems


1 Power Reduction through Software-Programmable Accelerators for ARM-based Subsystems
Gert Goossens, CEO, Target Compiler Technologies

2 Low Power: Back to Basics
Dynamic power: $P_{dyn} = C \cdot (A \cdot f_{clock}) \cdot V_{dd}^2$
  Reduction techniques: locality of reference; avoid switching; f-scaling; concurrency (task-, data-, instr.-level); low-V technology; V-scaling
Leakage power: $P_{leak} = I_{leak} \cdot V_{dd} \sim (a^{-V_t} \cdot N_{gates} \cdot W_{dev}) \cdot V_{dd}$ ["Moore's Law Meets Static Power", Computer, IEEE Comp. Soc., Dec. 2003]
  Reduction techniques: multithreshold libraries; minimal logic; power gating
2012 Target Compiler Technologies
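The dynamic-power formula above rewards trading clock frequency for concurrency: work split over several units lets each unit run slower, and a slower clock tolerates a lower supply voltage, which enters the formula squared. The sketch below simply evaluates the two formulas in C. The function names and all numeric values in the usage note are illustrative assumptions, not figures from the slides.

```c
/* Dynamic power: P_dyn = C * (A * f_clock) * V_dd^2.
 * c_eff is the switched capacitance in farads, activity is the
 * switching factor (0..1), f_clock is in hertz, v_dd in volts.
 * The result is in watts. */
double dynamic_power(double c_eff, double activity, double f_clock, double v_dd)
{
    return c_eff * activity * f_clock * v_dd * v_dd;
}

/* Leakage power: P_leak = I_leak * V_dd (amperes times volts gives watts). */
double leakage_power(double i_leak, double v_dd)
{
    return i_leak * v_dd;
}
```

With hypothetical values, compare one core at 200 MHz and 1.0 V against two cores at 100 MHz and 0.7 V each: `2.0 * dynamic_power(1e-9, 0.2, 100e6, 0.7) / dynamic_power(1e-9, 0.2, 200e6, 1.0)` comes out at about 0.49, so the same throughput costs roughly half the dynamic power. The comparison ignores interconnect overhead and the extra leakage of the second core.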

3 Heterogeneous Multicore SoC
Dual and quad-core ARM
  Concurrency in control processing: task-level parallelism
  big.LITTLE: minimal logic for given task; voltage and frequency scaling

4 Heterogeneous Multicore SoC
Multiple ASIPs (Application-Specific Instruction-set Processors)
  Concurrency in data processing: task-, data-, instr.-level parallelism
  Minimal logic for given task: architectural specialization
  Locality of reference: local memories and interconnect
  Power gating of ASIPs when not in use

5 Heterogeneous Multicore SoC
Hardwired datapath: an endangered species?
  Except in a few cases, market dynamics require programmability
  ASIPs enable programmability and efficiency
ARM & ASIPs offer a familiar software approach to SoC design: quick algorithm mapping from C/Matlab through SDK/debug tools

6 No MPSoC Design Without Tools
Tools at IP level (ASIP cores): IP Designer
  Architectural exploration
  SDK generation: C compiler, ISS, debugger
  RTL generation
Tools at IP subsystem level (multicore): MP Designer
  Code parallelisation
  Communication and synchronization
  Multicore platform generation

7 IP Designer Tool Suite

8 Architectural optimisation space
ASIP architectural optimisation space
Parallelism
  Instruction-level parallelism: orthogonal instruction set (VLIW); encoded instruction set
  Data-level parallelism: vector processing (SIMD)
  Task-level parallelism: multicore; multithreading
Specialisation
  App.-specific data types: integer, fractional, floating-point, bits, complex, vector
  App.-specific instructions
    App.-spec. memory addressing: direct, indirect, post-modification, indexed, stack indirect
    App.-spec. data processing: any exotic operator; single or multi-cycle
    App.-spec. control processing: jumps, subroutines, interrupts, HW do-loops, residual control, predication; relative or absolute, address range, delay slots
  Pipeline
  Connectivity & storage matching application's data-flow: distributed regs, sub-ranges; multiple mem's, sub-ranges
nML and IP Designer
  Support a wide range of ASIP architectures
  Enable true architectural exploration
  Make ASIP design easy

9 IP Designer's C Compiler
DSPstone benchmark on TI C55x
Columns: cycles and code size for Target's compiler, for TI's compiler, and gain (Target vs TI)

Benchmark           C code     Code-size gain
Small-scale examples
FIR                 restrict   -5%
Convolution         repeat     -31%
LMS                 original   13%
Matrix              repeat     13%
IIR, N=4            restrict   18%
IIR, N=16           restrict   18%
Average gain: 22% (cycles), 4% (code size)
Large-scale examples
FFT bit reverse     original   0%
FFT butterfly       original   6%
ADPCM               original   9%
Average gain: 7% (cycles), 5% (code size)

Graph-based C compiler technology offers retargetability and efficiency at the same time
Compilable subset of TI C55x modelled in nML in 2.5 person-months
Only a few C code modifications made: repeat loop, restrict pointers
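The restrict-pointer modification mentioned above tells the compiler that the arrays involved do not overlap, which lets it keep values in registers and schedule loads and stores more freely. A minimal sketch of a FIR kernel written that way is shown below. The function name, signature and data types are assumptions made for illustration; this is not the DSPstone source.

```c
#include <stddef.h>

/* FIR filter with restrict-qualified pointers (C99).
 * Produces n_samples - n_taps + 1 outputs:
 *   y[i] = sum over k of h[k] * x[i + n_taps - 1 - k]
 * The caller guarantees that x, h and y do not overlap
 * and that 1 <= n_taps <= n_samples. */
void fir_restrict(const short *restrict x, const short *restrict h,
                  long *restrict y, size_t n_samples, size_t n_taps)
{
    size_t n_out = n_samples - n_taps + 1;
    for (size_t i = 0; i < n_out; i++) {
        long acc = 0; /* accumulator wider than the 16-bit operands */
        for (size_t k = 0; k < n_taps; k++) {
            acc += (long)h[k] * (long)x[i + n_taps - 1 - k];
        }
        y[i] = acc;
    }
}
```

Without `restrict`, the compiler has to assume that a store to `y[i]` might change `x` or `h`, and reload them accordingly. The qualifier is a promise by the programmer, so passing overlapping arrays is undefined behaviour.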

10 IP Designer's RTL Generator
Example: audio DSP (90 nm, clock 220 MHz, 0.9 V)
[Figure: area (kgates) and power (µW/MHz) for configurations A to E; power drops by 60%]
IP Designer configuration options
  A Standard RTL generation
  B Clock gating + operand isolation for functional units
  C Operand isolation for multiplexers
  D Latching of register addresses in instruction decoder
  E Manual design by customer
Low-power optimisations yield 60% savings
Low-power optimisations have small area cost
Area and power within percentages from hand-optimized design

11 IP Designer Example: Wolverine platform
Ultra-low power multi-core platform, optimised for audio coding
Used in hearing instruments and Bluetooth headsets
20-bit precision
4 micro-DSP VLIW-ASIPs + 4 filter accelerators + 1 micro-processor core
0.04 mW/M-MAC, 42 MIPS at f_clock = 2 MHz (0.13 µm CMOS)
Sound Design Technologies. Reproduced with permission.

12 IP Designer Example: Reed-Solomon coding ASIP
FEC for wireless link in personal health-care systems
IEEE a, supports RS(63, 55) encoding/decoding
Concurrency
  Data-level: 8 elements SIMD, 6-bit/element
  Instruction-level: 2 scalar + 2 vector issue slots
Specialisation
  Data and instruction-level parallelism
  Hardware for finite-field multiplication and addition
IMEC. Reproduced with permission.
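The finite-field operations named above can be sketched in plain C to show what the specialised hardware replaces. An RS(63, 55) code uses 6-bit symbols, that is, elements of GF(2^6). The slides do not state the field polynomial, so the sketch assumes x^6 + x + 1; the function names and the 8-lane helper, which mirrors the 8-element SIMD width, are hypothetical as well.

```c
#include <stdint.h>

#define GF64_POLY 0x43u /* x^6 + x + 1: an assumption, not from the slides */
#define GF64_MASK 0x3Fu /* symbols are 6 bits wide */

/* Addition in GF(2^6) is a bitwise XOR. */
uint8_t gf64_add(uint8_t a, uint8_t b)
{
    return (uint8_t)((a ^ b) & GF64_MASK);
}

/* Shift-and-add multiplication with reduction modulo GF64_POLY. */
uint8_t gf64_mul(uint8_t a, uint8_t b)
{
    uint8_t acc = 0;
    a &= GF64_MASK;
    b &= GF64_MASK;
    while (b != 0) {
        if (b & 1u) {
            acc ^= a;               /* add the current multiple of a */
        }
        b >>= 1;
        a = (uint8_t)(a << 1);      /* multiply a by x */
        if (a & 0x40u) {
            a ^= GF64_POLY;         /* reduce when degree reaches 6 */
        }
    }
    return acc;
}

/* Element-wise product of two 8-symbol vectors, one lane at a time. */
void gf64_vmul8(const uint8_t a[8], const uint8_t b[8], uint8_t out[8])
{
    for (int lane = 0; lane < 8; lane++) {
        out[lane] = gf64_mul(a[lane], b[lane]);
    }
}
```

On a general-purpose core each multiplication costs a data-dependent loop of up to six iterations. A dedicated finite-field multiplier produces the same result combinationally, and a vector unit does so for all eight lanes at once.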

13 IP Designer Example: Reed-Solomon coding ASIP
[Figures: performance; gate count]
IMEC. Reproduced with permission.

14 IP Designer Example: Reed-Solomon coding ASIP
[Figure: energy]
IMEC. Reproduced with permission.

15 IP Designer Market Adoption
Medical; Audio; Video & imaging; Wireless; Wireline; Network processing; High-perf. computing; Automotive; Crypto & identification
Shown are publicly announced IP Designer customers
An estimated 150 or so unique SoC products based on IP Designer are in the market today

16 No MPSoC Design Without Tools
Tools at IP level (ASIP cores): IP Designer
  Architectural exploration
  SDK generation: C compiler, ISS, debugger
  RTL generation
Tools at IP subsystem level (multicore): MP Designer
  Code parallelisation
  Communication and synchronization
  Multicore platform generation

17 MP Designer Tool Suite

18 User-Guided Parallelisation
Label C code blocks for parallelisation

    int main(int argc, char *argv[]) {
      init_all();
      parsection: {
        jpg_open: {
          jpg_fopen(jpg_filename);
          writeword(0xffd8); //SOI
          write_app0info();
        }
        main_encoder(&in_img);
        jpg_close: {
          writeword(0xffd9); //EOI
          jpg_fclose();
        }
      }
      free(in_img.rgb_buffer);
      return 0;
    }

    void main_encoder(struct image* img) {
      vlc_init: { DCY=0; DCCb=0; DCCr=0; }
      for (ypos=0..height) {
        for (xpos=0...width) {
          for (blk=0..5) {
            SBYTE DU[64];
            loading: load_data_unit_from_rgb_buffer(img, xpos, ypos, blk, du);
            process_du(du, blk);
          }
        }
      }
      vlc_fini: { // Bit-alignment of EOI marker
        if (bytepos>=0) {
          writebits((1<<(bytepos+1))-1, bytepos+1);
        }
      }
    }

    void process_du(sbyte* CDU, int blk) {
      SWORD DU[64];
      fdct_main: {
        int* int_fdtbl = blk_fdtbl[blk];
        fdct_and_quantization(cdu, int_fdtbl, du);
      }
      vlc_main: { //Encode ACs
        while (i<=end0pos) {... }
        if (end0pos!=63) writebits(eob);
      }
    }

19 User-Guided Parallelisation
Parallelisation pragmas: sample parallelisation on 3-core architecture

    processor P0 type dlx
    processor P1 type dlx
    processor P2 type dlx

    parallel ParRegion lbl main::parsection
      task LOAD target P0
        include lbl main_encoder::loading
      task DCT target P1
        include lbl process_du::fdct_main
      task VLC target P2
        include lbl main::jpg_open
        include lbl main_encoder::vlc_init
        include lbl process_du::vlc_main
        include lbl main_encoder::vlc_fini
        include lbl main::jpg_close

20 Exploration
For each parallelisation, MP Designer shows a task graph with estimated processor loads
[Figure: task graph for JPEG encoding on 3-DLX architecture]
Task 0 "LOAD", Proc 0 "P0" (dlx): main_encoder::loading 35.8 %
Task 1 "DCT", Proc 1 "P1" (dlx): process_dU::fdct_main 22.2 %, process_dU::quant_main 25.0 %, total 47.1 %
Task 2 "VLC", Proc 2 "P2" (dlx): main::jpg_open 0.0 %, main_encoder::vlc_init 0.0 %, process_dU::vlc_main 16.9 %, main_encoder::vlc_fini 0.0 %, main::jpg_close 0.0 %, total 16.9 %
Dependencies: [NC dep T0 -> b20] DU (64) [FF]; [NC dep T1 -> b22] DU_ZZ (128) [FF], end0pos (4)
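The per-processor loads in such a task graph give a quick estimate of what a parallelisation can achieve. Assuming that each load is that task's share of the sequential run time, and that the tasks run as a pipeline with no communication or synchronisation cost, throughput is limited by the busiest processor. The C sketch below computes that bound; the function name and both assumptions are illustrative and are not taken from MP Designer.

```c
#include <stddef.h>

/* Upper bound on pipeline speed-up from per-processor loads.
 * load_pct[i] is the share of sequential run time (in percent)
 * assigned to processor i. The bound is total work divided by
 * the work of the busiest processor. Returns 0 for empty input. */
double pipeline_speedup_bound(const double *load_pct, size_t n_procs)
{
    double total = 0.0;
    double busiest = 0.0;
    for (size_t i = 0; i < n_procs; i++) {
        total += load_pct[i];
        if (load_pct[i] > busiest) {
            busiest = load_pct[i];
        }
    }
    return busiest > 0.0 ? total / busiest : 0.0;
}
```

For the loads shown for the 3-DLX JPEG example (35.8, 47.1 and 16.9 percent) the bound is about 2.1 on three cores, with the DCT task as the bottleneck. That is the kind of imbalance that motivates trying other partitionings.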

21 Exploration
[Figure: task graph for H263 encoding on 8 cores]
T0 "bs" (dlx): bitstream_hdr 2.0 %, bitstream_cfs 8.2 %, idct_p_inter 8.8 %, idct_b 1.0 %, make_p_mv 0.2 %, make_b_mv 0.0 %, total 20.4 %
T1 "recon" (dlx): rec_b 3.1 %, rec_p 20.5 %, total 23.7 %
T2 "addblock" (dlx): idct_p_intra 2.8 %, addblock_p_intra 1.2 %, addblock_p_inter 3.8 %, reconblock_b 6.7 %, addblock_b 0.5 %, total 14.9 %
T3 "store_bmb" (dlx): store_mb_b 15.1 %, init_idct 0.1 %, total 15.3 %
T4 "out" (dlx): store_mb_p 21.8 %, mb_rgb_copyin 1.6 %, padding_mb 2.5 %, total 26.0 %
Global dependency analysis automatically ensures correct communication & synchronisation
Manual design would be error-prone

22 Exploration
JPEG encoding on multi-dlx architecture
Columns: Algorithm; # Cores; Parallelisation; Mcycles* (seq, par); Speed-up; Load (%) for P0 to P4; Efficiency (%)

Algorithm     # Cores   Parallelisation
Original
Original      2         ld+dct+q | vlc
Original      3         ld | dct+q | vlc
Original      4         ld | dct | q | vlc
Optimised     2         ld+dct | q+vlc
Optimised     3         ld | dct+q | vlc
Optimised     3         ld | dct | q+vlc
Optimised     4         ld | dct | q | vlc
Split quant   3         ld | dct+q0 | q1+vlc
Dual load     5         ld0 | ld1 | dct | q | vlc

Entire exploration in only days of time
* Cycles for 256x160-pixel image
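The speed-up and efficiency columns of such an exploration table can be derived from the two cycle counts. The C sketch below assumes the usual definitions: speed-up is sequential cycles divided by parallel cycles, and efficiency is speed-up divided by the number of cores. The slides do not spell out their definitions, so treat both formulas and the function names as assumptions.

```c
/* Speed-up of a parallel run over the sequential run.
 * Both arguments use the same unit (for example Mcycles).
 * Returns 0 if the parallel cycle count is not positive. */
double speedup(double seq_cycles, double par_cycles)
{
    return par_cycles > 0.0 ? seq_cycles / par_cycles : 0.0;
}

/* Parallel efficiency in percent: speed-up per core, times 100.
 * Returns 0 if the core count is not positive. */
double efficiency_pct(double seq_cycles, double par_cycles, int n_cores)
{
    if (n_cores <= 0) {
        return 0.0;
    }
    return 100.0 * speedup(seq_cycles, par_cycles) / (double)n_cores;
}
```

With hypothetical counts of 12 Mcycles sequential and 5 Mcycles on three cores, the speed-up is 2.4 and the efficiency 80 %. Adding a core pays off only while efficiency stays acceptable, which is why the table tracks both.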

23 Multicore SDK
Multicore simulation: ISS Core-1, ISS Core-2, ISS Core-3
Multicore on-chip debugging: JTAG controller; debug controllers for HW Core-1, HW Core-2, HW Core-3

24 Conclusion
ASIPs enable low-power, acceleration and programmability in ARM-based multicore SoCs
No (efficient) multicore SoC design without tools
  Design and programming of individual ASIP cores
  Multicore parallelisation and platform generation
Target can be your ASIP and multicore tools partner
More information
  Come to our booth
  Brochure in your conference bag


More information

Technology Trends Presentation For Power Symposium

Technology Trends Presentation For Power Symposium Technology Trends Presentation For Power Symposium 2006 8-23-06 Darryl Solie, Distinguished Engineer, Chief System Architect IBM Systems & Technology Group From Ingenuity to Impact Copyright IBM Corporation

More information

Putting MPSOC to Work in Multimedia

Putting MPSOC to Work in Multimedia Putting MPSOC to Work in Multimedia Six billion people want live multimedia entertainment and information anywhere and anytime at the lowest cost 1 1. Multimedia subsystems appear everywhere big market

More information

A 50Mvertices/s Graphics Processor with Fixed-Point Programmable Vertex Shader for Mobile Applications

A 50Mvertices/s Graphics Processor with Fixed-Point Programmable Vertex Shader for Mobile Applications A 50Mvertices/s Graphics Processor with Fixed-Point Programmable Vertex Shader for Mobile Applications Ju-Ho Sohn, Jeong-Ho Woo, Min-Wuk Lee, Hye-Jung Kim, Ramchan Woo, Hoi-Jun Yoo Semiconductor System

More information

ENHANCED TOOLS FOR RISC-V PROCESSOR DEVELOPMENT

ENHANCED TOOLS FOR RISC-V PROCESSOR DEVELOPMENT ENHANCED TOOLS FOR RISC-V PROCESSOR DEVELOPMENT THE FREE AND OPEN RISC INSTRUCTION SET ARCHITECTURE Codasip is the leading provider of RISC-V processor IP Codasip Bk: A portfolio of RISC-V processors Uniquely

More information

RISC-V as a basis for ASIP design A Quantum-Resistant IoT Security Implementation

RISC-V as a basis for ASIP design A Quantum-Resistant IoT Security Implementation RISC-V as a basis for ASIP design A Quantum-Resistant IoT Security Implementation Agenda Introductions RISC-V and ASIPs Implementation of Security Methods Performance results Codasip and SecureRF ASIP

More information

Computer Systems Architecture Spring 2016

Computer Systems Architecture Spring 2016 Computer Systems Architecture Spring 2016 Lecture 01: Introduction Shuai Wang Department of Computer Science and Technology Nanjing University [Adapted from Computer Architecture: A Quantitative Approach,

More information

MPSoC Design Space Exploration Framework

MPSoC Design Space Exploration Framework MPSoC Design Space Exploration Framework Gerd Ascheid RWTH Aachen University, Germany Outline Motivation: MPSoC requirements in wireless and multimedia MPSoC design space exploration framework Summary

More information

Simulink Design Environment

Simulink Design Environment EE219A Spring 2008 Special Topics in Circuits and Signal Processing Lecture 4 Simulink Design Environment Dejan Markovic dejan@ee.ucla.edu Announcements Class wiki Material being constantly updated Please

More information

Processor (IV) - advanced ILP. Hwansoo Han

Processor (IV) - advanced ILP. Hwansoo Han Processor (IV) - advanced ILP Hwansoo Han Instruction-Level Parallelism (ILP) Pipelining: executing multiple instructions in parallel To increase ILP Deeper pipeline Less work per stage shorter clock cycle

More information

Several Common Compiler Strategies. Instruction scheduling Loop unrolling Static Branch Prediction Software Pipelining

Several Common Compiler Strategies. Instruction scheduling Loop unrolling Static Branch Prediction Software Pipelining Several Common Compiler Strategies Instruction scheduling Loop unrolling Static Branch Prediction Software Pipelining Basic Instruction Scheduling Reschedule the order of the instructions to reduce the

More information

Better sharc data such as vliw format, number of kind of functional units

Better sharc data such as vliw format, number of kind of functional units Better sharc data such as vliw format, number of kind of functional units Pictures of pipe would help Build up zero overhead loop example better FIR inner loop in coldfire Mine more material from bsdi.com

More information

Ten Reasons to Optimize a Processor

Ten Reasons to Optimize a Processor By Neil Robinson SoC designs today require application-specific logic that meets exacting design requirements, yet is flexible enough to adjust to evolving industry standards. Optimizing your processor

More information

Venezia: a Scalable Multicore Subsystem for Multimedia Applications

Venezia: a Scalable Multicore Subsystem for Multimedia Applications Venezia: a Scalable Multicore Subsystem for Multimedia Applications Takashi Miyamori Toshiba Corporation Outline Background Venezia Hardware Architecture Venezia Software Architecture Evaluation Chip and

More information

Micro-programmed Control Ch 17

Micro-programmed Control Ch 17 Micro-programmed Control Ch 17 Micro-instructions Micro-programmed Control Unit Sequencing Execution Characteristics Course Summary 1 Hardwired Control (4) Complex Fast Difficult to design Difficult to

More information

Assembling and Debugging VPs of Complex Cycle Accurate Multicore Systems. July 2009

Assembling and Debugging VPs of Complex Cycle Accurate Multicore Systems. July 2009 Assembling and Debugging VPs of Complex Cycle Accurate Multicore Systems July 2009 Model Requirements in a Virtual Platform Control initialization, breakpoints, etc Visibility PV registers, memories, profiling

More information

Advanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University

Advanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University Advanced d Instruction ti Level Parallelism Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu ILP Instruction-Level Parallelism (ILP) Pipelining:

More information

Hardwired Control (4) Micro-programmed Control Ch 17. Micro-programmed Control (3) Machine Instructions vs. Micro-instructions

Hardwired Control (4) Micro-programmed Control Ch 17. Micro-programmed Control (3) Machine Instructions vs. Micro-instructions Micro-programmed Control Ch 17 Micro-instructions Micro-programmed Control Unit Sequencing Execution Characteristics Course Summary Hardwired Control (4) Complex Fast Difficult to design Difficult to modify

More information

Digital Signal Processor Core Technology

Digital Signal Processor Core Technology The World Leader in High Performance Signal Processing Solutions Digital Signal Processor Core Technology Abhijit Giri Satya Simha November 4th 2009 Outline Introduction to SHARC DSP ADSP21469 ADSP2146x

More information

HotChips An innovative HD video and digital image processor for low-cost digital entertainment products. Deepu Talla.

HotChips An innovative HD video and digital image processor for low-cost digital entertainment products. Deepu Talla. HotChips 2007 An innovative HD video and digital image processor for low-cost digital entertainment products Deepu Talla Texas Instruments 1 Salient features of the SoC HD video encode and decode using

More information

Parallelism in Hardware

Parallelism in Hardware Parallelism in Hardware Minsoo Ryu Department of Computer Science and Engineering 2 1 Advent of Multicore Hardware 2 Multicore Processors 3 Amdahl s Law 4 Parallelism in Hardware 5 Q & A 2 3 Moore s Law

More information

Digital Signal Processors: fundamentals & system design. Lecture 1. Maria Elena Angoletta CERN

Digital Signal Processors: fundamentals & system design. Lecture 1. Maria Elena Angoletta CERN Digital Signal Processors: fundamentals & system design Lecture 1 Maria Elena Angoletta CERN Topical CAS/Digital Signal Processing Sigtuna, June 1-9, 2007 Lectures plan Lecture 1 (now!) introduction, evolution,

More information

Platform-based Design

Platform-based Design Platform-based Design The New System Design Paradigm IEEE1394 Software Content CPU Core DSP Core Glue Logic Memory Hardware BlueTooth I/O Block-Based Design Memory Orthogonalization of concerns: the separation

More information

ENGN1640: Design of Computing Systems Topic 06: Advanced Processor Design

ENGN1640: Design of Computing Systems Topic 06: Advanced Processor Design ENGN1640: Design of Computing Systems Topic 06: Advanced Processor Design Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown University

More information

CS425 Computer Systems Architecture

CS425 Computer Systems Architecture CS425 Computer Systems Architecture Fall 2017 Multiple Issue: Superscalar and VLIW CS425 - Vassilis Papaefstathiou 1 Example: Dynamic Scheduling in PowerPC 604 and Pentium Pro In-order Issue, Out-of-order

More information

ConnX D2 DSP Engine. A Flexible 2-MAC DSP. Dual-MAC, 16-bit Fixed-Point Communications DSP PRODUCT BRIEF FEATURES BENEFITS. ConnX D2 DSP Engine

ConnX D2 DSP Engine. A Flexible 2-MAC DSP. Dual-MAC, 16-bit Fixed-Point Communications DSP PRODUCT BRIEF FEATURES BENEFITS. ConnX D2 DSP Engine PRODUCT BRIEF ConnX D2 DSP Engine Dual-MAC, 16-bit Fixed-Point Communications DSP FEATURES BENEFITS Both SIMD and 2-way FLIX (parallel VLIW) operations Optimized, vectorizing XCC Compiler High-performance

More information

Embedded Systems. 8. Hardware Components. Lothar Thiele. Computer Engineering and Networks Laboratory

Embedded Systems. 8. Hardware Components. Lothar Thiele. Computer Engineering and Networks Laboratory Embedded Systems 8. Hardware Components Lothar Thiele Computer Engineering and Networks Laboratory Do you Remember? 8 2 8 3 High Level Physical View 8 4 High Level Physical View 8 5 Implementation Alternatives

More information

RAČUNALNIŠKEA COMPUTER ARCHITECTURE

RAČUNALNIŠKEA COMPUTER ARCHITECTURE RAČUNALNIŠKEA COMPUTER ARCHITECTURE 6 Central Processing Unit - CPU RA - 6 2018, Škraba, Rozman, FRI 6 Central Processing Unit - objectives 6 Central Processing Unit objectives and outcomes: A basic understanding

More information

All MSEE students are required to take the following two core courses: Linear systems Probability and Random Processes

All MSEE students are required to take the following two core courses: Linear systems Probability and Random Processes MSEE Curriculum All MSEE students are required to take the following two core courses: 3531-571 Linear systems 3531-507 Probability and Random Processes The course requirements for students majoring in

More information

Qualcomm Hexagon DSP: An architecture optimized for mobile multimedia and communications

Qualcomm Hexagon DSP: An architecture optimized for mobile multimedia and communications Lucian Codrescu Sr. Director, Technology Qualcomm Technologies, Inc. Qualcomm Hexagon DSP: An architecture optimized for mobile multimedia and communications 1 Hexagon DSP processors in Snapdragon products

More information

55:132/22C:160, HPCA Spring 2011

55:132/22C:160, HPCA Spring 2011 55:132/22C:160, HPCA Spring 2011 Second Lecture Slide Set Instruction Set Architecture Instruction Set Architecture ISA, the boundary between software and hardware Specifies the logical machine that is

More information

Computer Architecture

Computer Architecture Lecture 3: Pipelining Iakovos Mavroidis Computer Science Department University of Crete 1 Previous Lecture Measurements and metrics : Performance, Cost, Dependability, Power Guidelines and principles in

More information

Mapping C code on MPSoC for Nomadic Embedded Systems

Mapping C code on MPSoC for Nomadic Embedded Systems -1 - ARTIST2 Summer School 2008 in Europe Autrans (near Grenoble), France September 8-12, 8 2008 Mapping C code on MPSoC for Nomadic Embedded Systems http://www.artist-embedded.org/ Lecturer: Diederik

More information

Classification of Semiconductor LSI

Classification of Semiconductor LSI Classification of Semiconductor LSI 1. Logic LSI: ASIC: Application Specific LSI (you have to develop. HIGH COST!) For only mass production. ASSP: Application Specific Standard Product (you can buy. Low

More information

TSEA 26 exam page 1 of Examination. Design of Embedded DSP Processors, TSEA26 Date 8-12, G34, G32, FOI hus G

TSEA 26 exam page 1 of Examination. Design of Embedded DSP Processors, TSEA26 Date 8-12, G34, G32, FOI hus G TSEA 26 exam page 1 of 10 20171019 Examination Design of Embedded DSP Processors, TSEA26 Date 8-12, 2017-10-19 Room G34, G32, FOI hus G Time 08-12AM Course code TSEA26 Exam code TEN1 Design of Embedded

More information

IBM's POWER5 Micro Processor Design and Methodology

IBM's POWER5 Micro Processor Design and Methodology IBM's POWER5 Micro Processor Design and Methodology Ron Kalla IBM Systems Group Outline POWER5 Overview Design Process Power POWER Server Roadmap 2001 POWER4 2002-3 POWER4+ 2004* POWER5 2005* POWER5+ 2006*

More information

CSCI 402: Computer Architectures. Computer Abstractions and Technology (4) Fengguang Song Department of Computer & Information Science IUPUI.

CSCI 402: Computer Architectures. Computer Abstractions and Technology (4) Fengguang Song Department of Computer & Information Science IUPUI. CSCI 402: Computer Architectures Computer Abstractions and Technology (4) Fengguang Song Department of Computer & Information Science IUPUI Contents 1.7 - End of Chapter 1 Power wall The multicore era

More information

ISA: The Hardware Software Interface

ISA: The Hardware Software Interface ISA: The Hardware Software Interface Instruction Set Architecture (ISA) is where software meets hardware In embedded systems, this boundary is often flexible Understanding of ISA design is therefore important

More information