
1 ECE 695R: SYSTEM-ON-CHIP DESIGN
Module 2: HW/SW Partitioning
Lecture 2.15: ASIP: Approaches to Design
Anand Raghunathan
ECE 695R: System-on-Chip Design, Fall 2014, ME 1052, T Th 12:00PM-1:15PM
© 2014 Anand Raghunathan

2 Approaches to ASIP Design
Design from scratch
  Application: design ISA and micro-architecture from scratch to obtain the ASIP
Design from a base architecture
  Application + Base Architecture: customize base architecture to obtain the ASIP
[Figure: block diagrams of the two resulting ASIPs. Blocks shown: Register File, LD/ST, ALU, MUL, FU, Memory, User-Defined Execution Units, User-Defined Register File]

3 Approaches to ASIP Design: Design from Scratch
Example: LISATek (TU Aachen / CoWare / Synopsys)
Key technology: Processor description language to specify ISA + microarchitecture

4 Approaches to ASIP Design: Design from Scratch
Example: LISATek (TU Aachen / CoWare)
Key technology: Automatic generation of HW (RTL)

5 Approaches to ASIP Design: Design from Scratch
Example: LISATek (TU Aachen / CoWare)
Key technology: Automatic generation of SW tool chain

6 Approaches to ASIP Design: Design from a Base Architecture
Micro-architecture configuration
  Tune various parameters that affect performance and hardware complexity, not the ISA
Instruction Set Sub-setting
  Use only the sub-set of the base ISA that a given application needs
Instruction-set Extension
  Extend the base ISA by adding custom instructions

7 Micro-architecture Configuration
Typical parameters: register file size, cache size & associativity, local memories, load/store buffers, interfaces, ...
Configuration example: Xtensa from Tensilica
[Figure: Xtensa block diagram. Blocks: Instruction Fetch / Decode; Base ISA Execution Pipeline (Register File, Base ALU, Optional Execution Units); Processor Controls (Trace/JTAG/OCD; Interrupts, Breakpoints, Timers); Local Instruction Memories; External Bus Interface; Processor Interface (PIF) to System Bus; Load/Store Unit #2; Vectra LX DSP Engine; Data Load/Store Unit; Local Data Memories; Xtensa Local Memory Interface. Legend: Base ISA Feature, Configurable Functions, Optional Function, Optional & Configurable]

8 ISA Sub-setting: Motivation
Applications do not use complete ISA!
[Figure: bar chart of Percent of ISA Used (0% to 100%) per benchmark: bubble_sort, crc, des, fft, fir, quant, iquant, turbo, vlc, bitcnts, CRC32, qsort, sha, stringsearch, FFT, dijkstra, patricia, gol, dct, dhry, AVERAGE]
Potential for hardware (area, power) reduction
Source: Yiannacouras et al., U. Toronto

9 ISA Sub-setting
Simplify processor HW by reducing the ISA
Automatically eliminate unused components and connections during HW generation
[Figure: bar chart of Fraction of Area per benchmark: bubble_sort, crc, des, fft, fir, quant, iquant, turbo, vlc, bitcnts, CRC32, qsort, sha, stringsearch, FFT_MI, dijkstra, patricia, gol, dct, dhry, AVERAGE]
Area reduced by 60% in some cases, 23% on average
Similar reductions for energy, small impact on performance
Source: Yiannacouras et al., U. Toronto
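
The measurement behind slides 8 and 9 can be sketched in a few lines of Python. This is an illustration only: the 14-instruction ISA, the opcode trace, and the function names isa_usage and unused_instructions are hypothetical and are not taken from the Yiannacouras et al. study.

```python
# Hypothetical base ISA; a real soft-processor ISA would be larger.
BASE_ISA = frozenset({
    "add", "sub", "mul", "div", "and", "or", "xor",
    "sll", "srl", "lw", "sw", "beq", "bne", "jal",
})


def isa_usage(program, isa=BASE_ISA):
    """Return the fraction of the ISA that the program actually uses."""
    used = set(program) & isa
    return len(used) / len(isa)


def unused_instructions(program, isa=BASE_ISA):
    """Return the instructions whose hardware is a candidate for removal."""
    return sorted(isa - set(program))


# Hypothetical opcode stream of a small copy-and-accumulate loop.
trace = ["lw", "lw", "add", "sw", "add", "bne", "lw", "add", "sw", "bne"]

print(f"ISA used: {isa_usage(trace):.0%}")
print("Candidates for removal:", unused_instructions(trace))
```

In a real flow the opcode set would come from the compiled binary rather than a hand-written trace, and the hardware generator would then prune the decode and datapath logic that only the unused instructions need.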

10 ISA Extension
Design custom instructions and corresponding HW that fits into processor pipeline
Configuration example: Xtensa from Tensilica
[Figure: Xtensa block diagram with instruction set extension. Designer Defined Features (TIE): User Defined Queues / Ports (up to 1M pins); designer-defined FLIX parallel execution pipelines, N wide; User Defined Execution Units, Register Files and Interfaces. Base blocks: Instruction Fetch / Decode; Base ISA Execution Pipeline (Register File, Base ALU, Optional Execution Units); Load/Store Unit #2; Vectra LX DSP Engine; Data Load/Store Unit; Processor Controls (Trace/JTAG/OCD; Interrupts, Breakpoints, Timers); Local Instruction Memories; External Bus Interface; Local Data Memories; Xtensa Local Memory Interface; Processor Interface (PIF) to System Bus. Legend: Base ISA Feature, Configurable Functions, Optional Function, Optional & Configurable]

11 Simple Example (Nios)
Original code:
  t1 = a * b;
  t2 = b * 0xf0;
  t3 = c * 0x12;
  t4 = t1 + t2;
  t5 = t2 + t3;
  t6 = t5 + t4;
With custom instructions:
  t1 = extop1(a, b, 0xf0);
  t2 = extop2(b, c, 0xf0, 0x12);
  t3 = t1 + t2;
[Figure: dataflow graph with inputs a, b, c and constants 0xf0, 0x12; the multiply and add nodes are grouped into extop1 and extop2]
*: 2 clock cycles, +: 1 clock cycle
Execution time: 9 clock cycles (original), 5 clock cycles (with custom instructions)
Speedup: 1.8
Extended Instruction Set: I ∪ {extop1, extop2}
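
The Nios example can be checked with a short Python model. This is a sketch under stated assumptions: the slide does not define extop1 and extop2, so their semantics below are inferred from which temporaries each one replaces, and the 2-cycle latency per custom instruction is an assumption chosen so that the totals reproduce the slide's 1.8 speedup.

```python
MUL_CYCLES = 2    # from the slide: '*' takes 2 clock cycles
ADD_CYCLES = 1    # from the slide: '+' takes 1 clock cycle
EXTOP_CYCLES = 2  # assumption: each custom instruction takes 2 cycles


def original(a, b, c):
    """Base-ISA version: three multiplies followed by three adds."""
    t1 = a * b
    t2 = b * 0xF0
    t3 = c * 0x12
    t4 = t1 + t2
    t5 = t2 + t3
    t6 = t5 + t4
    cycles = 3 * MUL_CYCLES + 3 * ADD_CYCLES
    return t6, cycles


def extop1(a, b, k):
    """Assumed semantics: a*b + b*k (replaces t1, t2 and t4)."""
    return a * b + b * k


def extop2(b, c, k1, k2):
    """Assumed semantics: b*k1 + c*k2 (replaces t2, t3 and t5)."""
    return b * k1 + c * k2


def extended(a, b, c):
    """Version using the two custom instructions plus one add."""
    t1 = extop1(a, b, 0xF0)
    t2 = extop2(b, c, 0xF0, 0x12)
    t3 = t1 + t2
    cycles = 2 * EXTOP_CYCLES + ADD_CYCLES
    return t3, cycles


result_base, cycles_base = original(3, 5, 7)
result_ext, cycles_ext = extended(3, 5, 7)
print(result_base == result_ext, cycles_base, cycles_ext, cycles_base / cycles_ext)
```

Note that b * 0xf0 is computed inside both custom instructions. Duplicating a cheap constant multiply in hardware is what lets each custom instruction stand alone.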

12 Complex Example
Collapse a subset of the instructions in a basic block into a single instruction
  Exploit the parallelism within the basic block
  Simplify operations with constant operands
  Optimize sequences of instructions (logic, arithmetic, etc.)
  Exploit limited precision
Source: K. Atasu, L. Pozzi, P. Ienne, DAC '03
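
One way to see why collapsing instructions exploits parallelism is to compare the sequential latency of a dataflow graph with its critical path. The sketch below reuses the six operations of the Nios example as the graph. It is not the selection algorithm of Atasu et al.: it assumes unlimited parallel hardware and the same per-operation latency in hardware as in software, so the critical path is only an optimistic bound.

```python
from functools import lru_cache

# Per-operation latency in cycles ('*' = 2, '+' = 1, as in the Nios example).
LATENCY = {"mul": 2, "add": 1}

# node -> (operation, predecessor nodes inside the basic block)
DFG = {
    "t1": ("mul", ()),
    "t2": ("mul", ()),
    "t3": ("mul", ()),
    "t4": ("add", ("t1", "t2")),
    "t5": ("add", ("t2", "t3")),
    "t6": ("add", ("t5", "t4")),
}


def sequential_cycles(dfg):
    """Latency when every operation issues one after another."""
    return sum(LATENCY[op] for op, _ in dfg.values())


def critical_path_cycles(dfg):
    """Latency when independent operations run in parallel."""

    @lru_cache(maxsize=None)
    def finish(node):
        op, preds = dfg[node]
        start = max((finish(p) for p in preds), default=0)
        return start + LATENCY[op]

    return max(finish(node) for node in dfg)


print(sequential_cycles(DFG), critical_path_cycles(DFG))
```

The gap between the two numbers is the most that parallelism alone can recover. Constant-operand simplification and limited precision can shorten the hardware path further, which this model does not capture.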

13 Benefits of Configurability and Extensibility
[Figure: EEMBC benchmark comparison in three domains: Consumer Electronics (ConsumerMarks/MHz), DSP (TeleMarks/MHz), Networking (NetMarks/MHz). Processors compared: Extensible optimized, Extensible out-of-box, MIPS64 20Kc, ARM1020E, MIPS64b (NEC VR5000), MIPS32b (NEC VR4122)]
Source: EEMBC Benchmark Consortium
Copyright 2009, Grant Martin

14 ASIP impact on energy consumption
[Table: columns are Application, Reference Processor, Optimized Processor, Energy Improvement (x). Applications: Dot-Product, AES, Viterbi, FFT. Rows per application: Area (mm²), Cycles (K), Power (mW/MHz), Energy (µJ)]
Copyright 2009, Grant Martin
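
The units in the energy table fit together as follows: 1 mW/MHz equals 1 nJ per cycle, so power in mW/MHz multiplied by the cycle count in thousands gives energy in µJ. The sketch below shows that arithmetic. The input figures are hypothetical placeholders, not values from the slide, and the function names are made up for this illustration.

```python
def energy_uj(power_mw_per_mhz, cycles_k):
    """Energy in microjoules.

    1 mW/MHz = 1e-3 W / 1e6 Hz = 1 nJ per cycle, and 1000 cycles at
    1 nJ per cycle is 1 uJ, so the two table units multiply directly.
    """
    return power_mw_per_mhz * cycles_k


def energy_improvement(reference, optimized):
    """Ratio of reference energy to optimized energy (the 'x' column)."""
    return energy_uj(*reference) / energy_uj(*optimized)


# Hypothetical (power in mW/MHz, cycles in K) pairs; NOT data from the slide.
reference = (0.25, 400.0)  # small base core, many cycles
optimized = (0.5, 20.0)    # larger customized core, far fewer cycles

print(energy_uj(*reference), energy_uj(*optimized),
      energy_improvement(reference, optimized))
```

With these placeholder numbers the optimized processor draws more power per MHz yet uses less energy, because the reduction in cycle count outweighs the increase in power.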

15 Extensible Processor Tool Flow
Tensilica Tool Flow
[Figure: tool flow diagram. Inputs: GUI-based specification of coarse-grained configuration parameters + PDL-based specification of instruction extensions. Center: Processor Generation Tools and Process. Outputs: Hardware, Software Tool Chain. Feedback: Estimation of QoR, followed by manual iteration]

16 Summary
ASIPs are a promising candidate to become a building block for future complex SoCs
  Replace hardwired processing units with programmable versions
  Sea of processors
Two basic approaches to ASIP design
  From scratch
    Key technology: Architectural Description Language + automatic generation of full processor HW + re-targetable SW tool-chain
  From base architecture
    ISA sub-setting or extension
    Key technology: Instruction description language + automatic generation of custom instruction HW + re-targetable SW tool-chain
    Easier to design and automate due to more constrained nature of changes


Recall from Pipelining Review. Lecture 16: Instruction Level Parallelism and Dynamic Execution #1: Ideas to Reduce Stalls

Recall from Pipelining Review. Lecture 16: Instruction Level Parallelism and Dynamic Execution #1: Ideas to Reduce Stalls CS252 Graduate Computer Architecture Recall from Pipelining Review Lecture 16: Instruction Level Parallelism and Dynamic Execution #1: March 16, 2001 Prof. David A. Patterson Computer Science 252 Spring

More information

asoc: : A Scalable On-Chip Communication Architecture

asoc: : A Scalable On-Chip Communication Architecture asoc: : A Scalable On-Chip Communication Architecture Russell Tessier, Jian Liang,, Andrew Laffely,, and Wayne Burleson University of Massachusetts, Amherst Reconfigurable Computing Group Supported by

More information

Vector Architectures Vs. Superscalar and VLIW for Embedded Media Benchmarks

Vector Architectures Vs. Superscalar and VLIW for Embedded Media Benchmarks Vector Architectures Vs. Superscalar and VLIW for Embedded Media Benchmarks Christos Kozyrakis Stanford University David Patterson U.C. Berkeley http://csl.stanford.edu/~christos Motivation Ideal processor

More information

HW/SW-Codesign Lab. Seminar 2 WS 2016/2017. chair. Vodafone Chair Mobile Communications Systems, Prof. Dr.-Ing. Dr. h.c. G.

HW/SW-Codesign Lab. Seminar 2 WS 2016/2017. chair. Vodafone Chair Mobile Communications Systems, Prof. Dr.-Ing. Dr. h.c. G. Vodafone Chair Mobile Communications Systems, Prof. Dr.-Ing. Dr. h.c. G. Fettweis HW/SW-Codesign Lab Seminar WS / TU Dresden, Slide CORE FEATURES TU Dresden HW/SW-Codesign Lab Slide corelx_hwswcd Xtensa

More information

Modeling and Simulation of System-on. Platorms. Politecnico di Milano. Donatella Sciuto. Piazza Leonardo da Vinci 32, 20131, Milano

Modeling and Simulation of System-on. Platorms. Politecnico di Milano. Donatella Sciuto. Piazza Leonardo da Vinci 32, 20131, Milano Modeling and Simulation of System-on on-chip Platorms Donatella Sciuto 10/01/2007 Politecnico di Milano Dipartimento di Elettronica e Informazione Piazza Leonardo da Vinci 32, 20131, Milano Key SoC Market

More information

16.1. Unit 16. Computer Organization Design of a Simple Processor

16.1. Unit 16. Computer Organization Design of a Simple Processor 6. Unit 6 Computer Organization Design of a Simple Processor HW SW 6.2 You Can Do That Cloud & Distributed Computing (CyberPhysical, Databases, Data Mining,etc.) Applications (AI, Robotics, Graphics, Mobile)

More information

ECE 471 Embedded Systems Lecture 2

ECE 471 Embedded Systems Lecture 2 ECE 471 Embedded Systems Lecture 2 Vince Weaver http://web.eece.maine.edu/~vweaver vincent.weaver@maine.edu 7 September 2018 Announcements Reminder: The class notes are posted to the website. HW#1 will

More information

ERCBench An Open-Source Benchmark Suite for Embedded and Reconfigurable Computing

ERCBench An Open-Source Benchmark Suite for Embedded and Reconfigurable Computing ERCBench An Open-Source Benchmark Suite for Embedded and Reconfigurable Computing Daniel Chang Chris Jenkins, Philip Garcia, Syed Gilani, Paula Aguilera, Aishwarya Nagarajan, Michael Anderson, Matthew

More information

Precise Exceptions and Out-of-Order Execution. Samira Khan

Precise Exceptions and Out-of-Order Execution. Samira Khan Precise Exceptions and Out-of-Order Execution Samira Khan Multi-Cycle Execution Not all instructions take the same amount of time for execution Idea: Have multiple different functional units that take

More information

HW/SW Co-Design Lab. Seminar 2 WS 2018/2019. chair. Vodafone Chair Mobile Communications Systems, Prof. Dr.-Ing. Dr. h.c. G.

HW/SW Co-Design Lab. Seminar 2 WS 2018/2019. chair. Vodafone Chair Mobile Communications Systems, Prof. Dr.-Ing. Dr. h.c. G. Vodafone Chair Mobile Communications Systems, Prof. Dr.-Ing. Dr. h.c. G. Fettweis HW/SW Co-Design Lab Seminar WS 8/9 TU Dresden, Slide CORE FEATURES Slide corelx_hwswcd Xtensa LX ALU -bit MUL Load/Store

More information

Page 1. Recall from Pipelining Review. Lecture 16: Instruction Level Parallelism and Dynamic Execution #1: Ideas to Reduce Stalls

Page 1. Recall from Pipelining Review. Lecture 16: Instruction Level Parallelism and Dynamic Execution #1: Ideas to Reduce Stalls CS252 Graduate Computer Architecture Recall from Pipelining Review Lecture 16: Instruction Level Parallelism and Dynamic Execution #1: March 16, 2001 Prof. David A. Patterson Computer Science 252 Spring

More information

Nios Soft Core Embedded Processor

Nios Soft Core Embedded Processor Nios Soft Core Embedded Processor June 2000, ver. 1 Data Sheet Features... Preliminary Information Part of Altera s Excalibur TM embedded processor solutions, the Nios TM soft core embedded processor is

More information

Computer Systems Architecture Spring 2016

Computer Systems Architecture Spring 2016 Computer Systems Architecture Spring 2016 Lecture 01: Introduction Shuai Wang Department of Computer Science and Technology Nanjing University [Adapted from Computer Architecture: A Quantitative Approach,

More information

EC 413 Computer Organization

EC 413 Computer Organization EC 413 Computer Organization Review I Prof. Michel A. Kinsy Computing: The Art of Abstraction Application Algorithm Programming Language Operating System/Virtual Machine Instruction Set Architecture (ISA)

More information

Design of Embedded DSP Processors Unit 7: Programming toolchain. 9/26/2017 Unit 7 of TSEA H1 1

Design of Embedded DSP Processors Unit 7: Programming toolchain. 9/26/2017 Unit 7 of TSEA H1 1 Design of Embedded DSP Processors Unit 7: Programming toolchain 9/26/2017 Unit 7 of TSEA26 2017 H1 1 Toolchain introduction There are two kinds of tools 1.The ASIP design tool for HW designers Frontend

More information

Page 1. Recall from Pipelining Review. Lecture 15: Instruction Level Parallelism and Dynamic Execution

Page 1. Recall from Pipelining Review. Lecture 15: Instruction Level Parallelism and Dynamic Execution CS252 Graduate Computer Architecture Recall from Pipelining Review Lecture 15: Instruction Level Parallelism and Dynamic Execution March 11, 2002 Prof. David E. Culler Computer Science 252 Spring 2002

More information

Compiler-Assisted Memory Encryption for Embedded Processors

Compiler-Assisted Memory Encryption for Embedded Processors Compiler-Assisted Memory Encryption for Embedded Processors Vijay Nagarajan, Rajiv Gupta, and Arvind Krishnaswamy University of Arizona, Dept. of Computer Science, Tucson, AZ 85721 {vijay,gupta,arvind}@cs.arizona.edu

More information

EE282 Computer Architecture. Lecture 1: What is Computer Architecture?

EE282 Computer Architecture. Lecture 1: What is Computer Architecture? EE282 Computer Architecture Lecture : What is Computer Architecture? September 27, 200 Marc Tremblay Computer Systems Laboratory Stanford University marctrem@csl.stanford.edu Goals Understand how computer

More information

Lecture 4: RISC Computers

Lecture 4: RISC Computers Lecture 4: RISC Computers Introduction Program execution features RISC characteristics RISC vs. CICS Zebo Peng, IDA, LiTH 1 Introduction Reduced Instruction Set Computer (RISC) is an important innovation

More information

Reminder: tutorials start next week!

Reminder: tutorials start next week! Previous lecture recap! Metrics of computer architecture! Fundamental ways of improving performance: parallelism, locality, focus on the common case! Amdahl s Law: speedup proportional only to the affected

More information

Resource Efficiency of Scalable Processor Architectures for SDR-based Applications

Resource Efficiency of Scalable Processor Architectures for SDR-based Applications Resource Efficiency of Scalable Processor Architectures for SDR-based Applications Thorsten Jungeblut 1, Johannes Ax 2, Gregor Sievers 2, Boris Hübener 2, Mario Porrmann 2, Ulrich Rückert 1 1 Cognitive

More information

A Unified HW/SW Interface Model to Remove Discontinuities between HW and SW Design

A Unified HW/SW Interface Model to Remove Discontinuities between HW and SW Design A Unified HW/SW Interface Model to Remove Discontinuities between HW and SW Design Ahmed Amine JERRAYA EPFL November 2005 TIMA Laboratory 46 Avenue Felix Viallet 38031 Grenoble CEDEX, France Email: Ahmed.Jerraya@imag.fr

More information