Qualcomm Hexagon DSP: An architecture optimized for mobile multimedia and communications

1 Qualcomm Hexagon DSP: An architecture optimized for mobile multimedia and communications
Lucian Codrescu, Sr. Director, Technology, Qualcomm Technologies, Inc.

2 Hexagon DSP processors in Snapdragon products
aDSP: Real-time media & sensor processing
mDSP: Dedicated modem processing
[Figure: Snapdragon 800 block diagram. Labels: Camera; Display; Adreno GPU; four Krait CPUs; 2MB L2; Audio; Sensors; JPEG; Video; Hexagon aDSP; Misc. Connectivity; Other; Multimedia Fabric; System Fabric; Fabric & Memory Controller; two LPDDR3 interfaces; Modem with Hexagon mDSP]

3 Expansion of Hexagon DSP use cases beyond audio
Hexagon DSP is evolving for use beyond voice and audio to computer vision, video and imaging features
[Figure: use cases by Hexagon generation. Labels: Hexagon V2/V3; Hexagon V4 based products; Hexagon V5 based products; Voice, Audio, Sensors; Video; Image Enhancement (Camera, Still, Video); Computer Vision & Augmented Reality]

4 The Hexagon DSP evolution
Generational improvements in performance and power efficiency driven by both architecture and implementation
[Figure: timeline of Hexagon versions]
V1    65nm   Oct 2006
V2    65nm   Dec 2007
V3M   45nm   June 2009
V3C   45nm   Aug 2009
V3L   45nm   Nov 2009
V4M   28nm   Dec 2010
V4C   28nm   Dec 2010
V4L   28nm   Apr 2011
V5A   28nm   Dec 2012
V5H   28nm   Dec 2012

5 Key characteristics of modem & multimedia applications
Requirements
- Require fixed real-time performance level (fps, Mbit/sec, etc.)
- Extremely aggressive power & area targets
Characteristics
- Mix of signal processing & control code
  - For modem, Qualcomm does not use a split CPU/DSP architecture. All processing is done on Hexagon DSP
  - Multimedia apps have significant control in the RTOS & frameworks
- Heavy L2$ misses
  - Multimedia is data intensive
  - Modem is code intensive

6 Hexagon DSP blends features targeted to modem & multimedia
VLIW
- Need multi-issue to meet performance
- Low complexity for Area & Power
Multi-Threading
- To reduce L2$ miss penalty without the need for a large L2
- Increases instructions/VLIW packet because compiler doesn't need to schedule latency
Innovate in ISA to maximize IPC
- More work/VLIW packet reduces energy/instruction
- Keep the pipelines full for MIPS/mm2
- Target both Signal Processing & Control code

7 VLIW: Area & power efficient multi-issue
- Variable sized instruction packets (1 to 4 instructions per packet)
- Dual 64-bit execution units
  - Standard 8/16/32/64-bit data types
  - SIMD vectorized MPY / ALU / SHIFT, Permute, BitOps
  - Up to 8 16b MAC/cycle
  - 2 SP FMA/cycle
- Dual 64-bit load/store units
  - Also 32-bit ALU
- Unified 32x32-bit General Register File is best for compiler. No separate Address or Accum Regs
- Register file is per-thread
[Figure: block diagram. Labels: Device DDR Memory; L2 Cache / TCM; Instruction Cache; Instruction Unit; two Data Units (Load/Store/ALU); two Execution Units (64-bit Vector); Data Cache; Register File per thread]

8 Maximizing the signal processing code work/packet
Example from inner loop of FFT: executing 29 simple RISC ops in 1 cycle

    { R17:16 = MEMD(R0++M1)
      MEMD(R6++M1) = R25:24
      R20 = CMPY(R20, R8):<<1:rnd:sat
      R11:10 = VADDH(R11:10, R13:12)
    }:endloop0

- 64-bit Load and 64-bit Store with post-update addressing
- Complex multiply with round and saturation
- Vector 4x16-bit Add
- Zero-overhead loops (dec count, compare, jump top)
[Figure: complex multiply datapath. Labels: Rs and Rt, each holding I and R halves; four multiplies; <<0-1 shifts; add and subtract with a rounding constant; Sat_32; high 16 bits of each 32-bit result written to the I and R halves of Rd]
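To make the two arithmetic instructions in the FFT packet concrete, here is a minimal Python model of what they compute. It is a sketch under stated assumptions, not a reference for the ISA: the function names are hypothetical, the imaginary part is assumed to sit in the high 16 bits of each 32-bit word (following the "I R" order on the slide), the rounding constant is assumed to be 0x8000 (half of the 16 bits that are discarded), and the vector add is assumed to wrap rather than saturate.

```python
def to_s16(x):
    """Interpret the low 16 bits of x as a signed 16-bit integer."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def sat32(x):
    """Clamp x to the signed 32-bit range."""
    return max(-2**31, min(2**31 - 1, x))


def cmpy_s1_rnd_sat(rs, rt):
    """Model of a fractional complex multiply with <<1, round, and saturate.

    Assumed layout: imaginary part in bits 31..16, real part in bits 15..0.
    """
    rs_i, rs_r = to_s16(rs >> 16), to_s16(rs)
    rt_i, rt_r = to_s16(rt >> 16), to_s16(rt)
    # Each product is shifted left by one, the rounding constant is added,
    # and the 32-bit sum is saturated before the high half is kept.
    imag = sat32(((rs_i * rt_r) << 1) + ((rs_r * rt_i) << 1) + 0x8000)
    real = sat32(((rs_r * rt_r) << 1) - ((rs_i * rt_i) << 1) + 0x8000)
    return (((imag >> 16) & 0xFFFF) << 16) | ((real >> 16) & 0xFFFF)


def vaddh(a, b):
    """Model of a 4x16-bit vector add on 64-bit values (assumed to wrap)."""
    out = 0
    for lane in range(4):
        shift = 16 * lane
        lane_sum = ((a >> shift) + (b >> shift)) & 0xFFFF
        out |= lane_sum << shift
    return out
```

With Q15 inputs, `cmpy_s1_rnd_sat(0x40004000, 0x40004000)` models (0.5 + 0.5j) squared and yields `0x40000000`, that is 0.5j. Squaring -1.0 (`0x00008000`) overflows the 32-bit intermediate, so the real part saturates to `0x7FFF`.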

9 Maximizing the control code work/packet
Hexagon DSP ISA improves control code efficiency over traditional VLIW

Example C code

    void example(int *ptr, int val) {
        if (ptr != 0) {
            *ptr = *ptr + val + 2;
        }
    }

Traditional VLIW Assembly Code

    { p0 = cmp.eq(r0,#0) }
    { if (!p0) r2=memw(r0)
      if (p0) jumpr:nt r31 }
    { r2 = add(r2,#2) }
    { r1 = add(r1,r2) }
    { memw(r0) = r1
      jumpr r31 }

Instr/Packet = 7 instr / 5 packets = 1.4

Hexagon DSP: Dot-New Predication

    { p0 = cmp.eq(r0,#0)
      if (!p0.new) r2=memw(r0)
      if (p0.new) jumpr:nt r31 }
    { r2 = add(r2,#2) }
    { r1 = add(r1,r2) }
    { memw(r0) = r1
      jumpr r31 }

Hexagon DSP: Compound ALU

    { p0 = cmp.eq(r0,#0)
      if (!p0.new) r2=memw(r0)
      if (p0.new) jumpr:nt r31 }
    { r1 = add(r1,add(r2,#2)) }
    { memw(r0) = r1
      jumpr r31 }

Hexagon DSP: New-Value Store

    { p0 = cmp.eq(r0,#0)
      if (!p0.new) r2=memw(r0)
      if (p0.new) jumpr:nt r31 }
    { r1 = add(r1,add(r2,#2))
      memw(r0) = r1.new
      jumpr r31 }

Instr/Packet = 7 instr / 2 packets = 3.5
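The two instructions-per-packet figures on this slide can be reproduced with a short count. The sketch below is illustrative only: the packet grouping is the one implied by the slide's packet counts (5 and 2), the helper names are hypothetical, and it assumes that a compound ALU instruction (the nested `add`) is counted as two simple operations, which is what makes the final sequence add up to the slide's 7 instructions.

```python
# Each inner list is one VLIW packet; instruction text follows the slide.
TRADITIONAL = [
    ["p0 = cmp.eq(r0,#0)"],
    ["if (!p0) r2=memw(r0)", "if (p0) jumpr:nt r31"],
    ["r2 = add(r2,#2)"],
    ["r1 = add(r1,r2)"],
    ["memw(r0) = r1", "jumpr r31"],
]

HEXAGON = [
    ["p0 = cmp.eq(r0,#0)",
     "if (!p0.new) r2=memw(r0)",
     "if (p0.new) jumpr:nt r31"],
    ["r1 = add(r1,add(r2,#2))",
     "memw(r0) = r1.new",
     "jumpr r31"],
]


def simple_ops(instr):
    """Count simple operations in one instruction.

    Assumption: a compound ALU instruction with two nested adds counts as
    two simple operations; every other instruction counts as one.
    """
    return max(1, instr.count("add("))


def ops_per_packet(packets):
    """Average number of simple operations per VLIW packet."""
    total = sum(simple_ops(instr) for packet in packets for instr in packet)
    return total / len(packets)


if __name__ == "__main__":
    print(ops_per_packet(TRADITIONAL))  # 7 ops over 5 packets
    print(ops_per_packet(HEXAGON))      # 7 ops over 2 packets
```

Dot-new predication removes the packet boundary between the compare and its consumers, the compound ALU instruction merges the two adds, and the new-value store lets the store share a packet with the instruction that produces its data.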

10 Average Instructions / VLIW Packet
High avg. instructions/packet for targeted use cases
[Figure: bar chart, values not recoverable. Categories: Computer Vision, Video, Imaging, Control, Audio. Chart note: "Compound instructions count as" (remainder missing)]
Source: Qualcomm internal measurements

11 Programmer's view of Hexagon DSP HW multi-threading
- Hexagon V5 includes three hardware threads
- Architected to look like a multi-core with communication through shared memory
[Figure: Thread 0, Thread 1 and Thread 2, each with its own Register File and D, D, X, X units, sharing the Instruction Cache, the Data Cache and the L2 Cache / TCM]

12 Hexagon DSP V1-V4: Interleaved multi-threading
- Simple round-robin thread scheduling
- Number of threads matches execution pipe depth (three threads, three execute stages)
- All instructions complete before next packet dispatch
- Compiler schedules for zero latency, which helps to increase instructions/VLIW packet
[Figure: pipeline diagram with Thread 0, Thread 1 and Thread 2 dispatching in turn]

    T0: { Ld Ld Add Cmp }
    T1: { St Ld Mpy Add }
    T2: { Ld Add Jump }
    T0: { Ld Ld Add Cmp }
    T1: { St Ld Mpy Add }
    T0: { Ld Ld Add Cmp }
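The round-robin rotation described on this slide can be sketched as a toy scheduler. This is an illustration, not a model of the actual hardware; the function names are hypothetical. It shows why the compiler can schedule for zero latency: with three threads in rotation, a thread's next packet dispatches three cycles after its previous one, which matches the three execute stages.

```python
from itertools import cycle, islice


def imt_schedule(num_threads, num_cycles):
    """Return the thread id that dispatches a packet on each cycle."""
    return list(islice(cycle(range(num_threads)), num_cycles))


def cycles_between_dispatches(schedule, thread):
    """Gaps, in cycles, between consecutive dispatches of one thread."""
    slots = [c for c, t in enumerate(schedule) if t == thread]
    return [later - earlier for earlier, later in zip(slots, slots[1:])]


if __name__ == "__main__":
    schedule = imt_schedule(num_threads=3, num_cycles=9)
    print(schedule)
    print(cycles_between_dispatches(schedule, thread=0))
```

In this model, a gap equal to the pipe depth means a packet never has to wait on a result from the same thread's previous packet.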

13 Hexagon DSP V5: Dynamic HW multi-threading
Recover some performance when threads are idle or stalled
- Remove a thread from IMT rotation
  - On L2 cache misses
  - When in wait-for-interrupt or off mode
- Additional forwarding to support 2-cycle packets
  - VLIW packets with dependencies between long latency instructions will stall
  - But many VLIW packets with simple instructions can complete in 2 processor clocks
[Figure: bar charts, values not recoverable. CoreMark/MHz and Dhrystone DMIPS/MHz, each comparing IMT and DMT]
Source: Qualcomm internal measurements
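A toy scheduler can illustrate the idea of removing a stalled thread from the rotation. The slides do not describe the actual arbitration logic, so everything below is an assumption made for illustration: the function name, the `stalled` map, and the policy of handing the slot to the next ready thread in round-robin order.

```python
def dmt_schedule(num_threads, num_cycles, stalled):
    """Pick the dispatching thread for each cycle, skipping stalled threads.

    stalled maps a cycle number to the set of thread ids that are out of
    rotation on that cycle (for example, waiting on an L2 miss).
    Returns a list with one thread id per cycle, or None for a cycle in
    which every thread is stalled.
    """
    order = []
    next_thread = 0
    for cyc in range(num_cycles):
        out_of_rotation = stalled.get(cyc, set())
        chosen = None
        # Walk the rotation starting at next_thread and take the first
        # thread that is not stalled.
        for offset in range(num_threads):
            candidate = (next_thread + offset) % num_threads
            if candidate not in out_of_rotation:
                chosen = candidate
                break
        order.append(chosen)
        if chosen is not None:
            next_thread = (chosen + 1) % num_threads
    return order


if __name__ == "__main__":
    # Thread 2 is stalled for the whole window.
    print(dmt_schedule(3, 6, {c: {2} for c in range(6)}))
```

With one thread out of rotation, the two remaining threads in this model dispatch every second cycle instead of every third, so issue slots are no longer wasted on the stalled thread. That tighter spacing is consistent with the slide's note that additional forwarding is needed to support 2-cycle packets.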

14 Hexagon DSP instructions per cycle
[Figure: bar chart, values not recoverable. Average Instructions / Cycle for multi-threaded apps and single-threaded apps, comparing IPC_DMT and IPC_IMT]
Source: Qualcomm internal measurements

15 Hexagon DSP V5: Efficient Architecture
Highly efficient mobile application processor designed for more performance per MHz
[Figure: charts, values not recoverable. DSP Performance per MHz (BDTImark2000/MHz), Clock Rate (MHz) and DSP Performance (BDTImark2000) for a Mobile Competitor, Qualcomm Hexagon V5 (1 thread) and Qualcomm Hexagon V5 (3 threads)]
* Projected best case score for 3 threads
Source: BDTI. For more detailed information see (link missing). All scores 2013 BDTI

16 Hexagon DSP Power Benefits

17 MP3 playback power for competitive smartphones
- Power measured at the battery for various phones
- Includes everything: DSP, CPU, memory, analog components, etc.
[Figure: bar chart, values not recoverable, lower is better. Power for Competitor A, Qualcomm / Hexagon-based, and Competitors B through G]
Source: Qualcomm internal measurements

18 Computer vision offload: ARM/NEON to Hexagon DSP
Augmented Reality Java app finding objects in image using FastCV Feature Detect
Comparison of Feature Detect run on:
- App CPU (ARM/NEON)
- App DSP (Hexagon)
[Figure: the Augmented Reality Java application calls Feature Detect through the FastCV Call Router, which dispatches to a FastCV Library Feature Detect function built for ARM only, for ARM/VeNum on the App CPU, or for Hexagon (QDSP6) on the App DSP]
Results when running on the Hexagon DSP:
- CPU Utilization: 52% less CPU
- Detection Time: 7% less time
- Total Device Power: 32% less power*
Source: Qualcomm internal measurements
* Power measured at the device battery

19 Hexagon DSP power for different thread utilizations
- Excellent near-linear power scalability (as threads go idle, power used by the thread is nearly eliminated)
- Achieved through optimized clock tree design & clock gating
[Figure: two charts, values not recoverable. "Dhrystone Power, IMT Mode" and "FIR Power, IMT Mode", each plotting Actual against Ideal on a 0% to 100% axis]
Source: Qualcomm internal measurements

20 Hexagon DSP Software Development

21 Independent Algorithm Developers on Hexagon DSP

22 Announcing the Hexagon DSP SDK
See the Hexagon DSP SDK in action at Uplinq2013. Visit (link missing) for more information.

23 Thank you
Qualcomm Technologies, Inc. Qualcomm and Hexagon are trademarks of QUALCOMM Incorporated, registered in the United States and other countries. All QUALCOMM Incorporated trademarks are used with permission. Other product and brand names may be trademarks or registered trademarks of their respective owners. Hexagon is a product of Qualcomm Technologies, Inc.


More information

Vector Architectures Vs. Superscalar and VLIW for Embedded Media Benchmarks

Vector Architectures Vs. Superscalar and VLIW for Embedded Media Benchmarks Vector Architectures Vs. Superscalar and VLIW for Embedded Media Benchmarks Christos Kozyrakis Stanford University David Patterson U.C. Berkeley http://csl.stanford.edu/~christos Motivation Ideal processor

More information

CS 152 Computer Architecture and Engineering

CS 152 Computer Architecture and Engineering CS 152 Computer Architecture and Engineering Lecture 18 Advanced Processors II 2006-10-31 John Lazzaro (www.cs.berkeley.edu/~lazzaro) Thanks to Krste Asanovic... TAs: Udam Saini and Jue Sun www-inst.eecs.berkeley.edu/~cs152/

More information

Embedded HW/SW Co-Development

Embedded HW/SW Co-Development Embedded HW/SW Co-Development It May be Driven by the Hardware Stupid! Frank Schirrmeister EDPS 2013 Monterey April 18th SPMI USB 2.0 SLIMbus RFFE LPDDR 2 LPDDR 3 emmc 4.5 UFS SD 3.0 SD 4.0 UFS Bare Metal

More information

4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3. Emil Sekerinski, McMaster University, Fall Term 2015/16

4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3. Emil Sekerinski, McMaster University, Fall Term 2015/16 4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3 Emil Sekerinski, McMaster University, Fall Term 2015/16 Instruction Execution Consider simplified MIPS: lw/sw rt, offset(rs) add/sub/and/or/slt

More information

Putting it all Together: Modern Computer Architecture

Putting it all Together: Modern Computer Architecture Putting it all Together: Modern Computer Architecture Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. May 10, 2018 L23-1 Administrivia Quiz 3 tonight on room 50-340 (Walker Gym) Quiz

More information

The World Leader in High Performance Signal Processing Solutions. DSP Processors

The World Leader in High Performance Signal Processing Solutions. DSP Processors The World Leader in High Performance Signal Processing Solutions DSP Processors NDA required until November 11, 2008 Analog Devices Processors Broad Choice of DSPs Blackfin Media Enabled, 16/32- bit fixed

More information

Determined by ISA and compiler. We will examine two MIPS implementations. A simplified version A more realistic pipelined version

Determined by ISA and compiler. We will examine two MIPS implementations. A simplified version A more realistic pipelined version MIPS Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

Adapted from instructor s. Organization and Design, 4th Edition, Patterson & Hennessy, 2008, MK]

Adapted from instructor s. Organization and Design, 4th Edition, Patterson & Hennessy, 2008, MK] Review and Advanced d Concepts Adapted from instructor s supplementary material from Computer Organization and Design, 4th Edition, Patterson & Hennessy, 2008, MK] Pipelining Review PC IF/ID ID/EX EX/M

More information

Lecture 9: More ILP. Today: limits of ILP, case studies, boosting ILP (Sections )

Lecture 9: More ILP. Today: limits of ILP, case studies, boosting ILP (Sections ) Lecture 9: More ILP Today: limits of ILP, case studies, boosting ILP (Sections 3.8-3.14) 1 ILP Limits The perfect processor: Infinite registers (no WAW or WAR hazards) Perfect branch direction and target

More information

Developing the Bifrost GPU architecture for mainstream graphics

Developing the Bifrost GPU architecture for mainstream graphics Developing the Bifrost GPU architecture for mainstream graphics Anand Patel Senior Product Manager, Media Processing Group ARM Tech Symposia India December 7 th 2016 Graphics processing drivers Virtual

More information

ConnX D2 DSP Engine. A Flexible 2-MAC DSP. Dual-MAC, 16-bit Fixed-Point Communications DSP PRODUCT BRIEF FEATURES BENEFITS. ConnX D2 DSP Engine

ConnX D2 DSP Engine. A Flexible 2-MAC DSP. Dual-MAC, 16-bit Fixed-Point Communications DSP PRODUCT BRIEF FEATURES BENEFITS. ConnX D2 DSP Engine PRODUCT BRIEF ConnX D2 DSP Engine Dual-MAC, 16-bit Fixed-Point Communications DSP FEATURES BENEFITS Both SIMD and 2-way FLIX (parallel VLIW) operations Optimized, vectorizing XCC Compiler High-performance

More information

CS 310 Embedded Computer Systems CPUS. Seungryoul Maeng

CS 310 Embedded Computer Systems CPUS. Seungryoul Maeng 1 EMBEDDED SYSTEM HW CPUS Seungryoul Maeng 2 CPUs Types of Processors CPU Performance Instruction Sets Processors used in ES 3 Processors used in ES 4 Processors used in Embedded Systems RISC type ARM

More information

RISC, CISC, and ISA Variations

RISC, CISC, and ISA Variations RISC, CISC, and ISA Variations CS 3410 Computer System Organization & Programming These slides are the product of many rounds of teaching CS 3410 by Professors Weatherspoon, Bala, Bracy, and Sirer. iclicker

More information

MIPS Technologies MIPS32 M4K Synthesizable Processor Core By the staff of

MIPS Technologies MIPS32 M4K Synthesizable Processor Core By the staff of An Independent Analysis of the: MIPS Technologies MIPS32 M4K Synthesizable Processor Core By the staff of Berkeley Design Technology, Inc. OVERVIEW MIPS Technologies, Inc. is an Intellectual Property (IP)

More information

Department of Computer and IT Engineering University of Kurdistan. Computer Architecture Pipelining. By: Dr. Alireza Abdollahpouri

Department of Computer and IT Engineering University of Kurdistan. Computer Architecture Pipelining. By: Dr. Alireza Abdollahpouri Department of Computer and IT Engineering University of Kurdistan Computer Architecture Pipelining By: Dr. Alireza Abdollahpouri Pipelined MIPS processor Any instruction set can be implemented in many

More information

CS425 Computer Systems Architecture

CS425 Computer Systems Architecture CS425 Computer Systems Architecture Fall 2017 Thread Level Parallelism (TLP) CS425 - Vassilis Papaefstathiou 1 Multiple Issue CPI = CPI IDEAL + Stalls STRUC + Stalls RAW + Stalls WAR + Stalls WAW + Stalls

More information

Making always-on vision a reality. Dr. Evgeni Gousev Sr. Director, Engineering Qualcomm Technologies, Inc. September 22,

Making always-on vision a reality. Dr. Evgeni Gousev Sr. Director, Engineering Qualcomm Technologies, Inc. September 22, Making always-on vision a reality Dr. Evgeni Gousev Sr. Director, Engineering Qualcomm Technologies, Inc. September 22, 2017 @qualcomm Outline 1. Problem statement Challenges to develop always-on vision

More information

Lecture 9: Multiple Issue (Superscalar and VLIW)

Lecture 9: Multiple Issue (Superscalar and VLIW) Lecture 9: Multiple Issue (Superscalar and VLIW) Iakovos Mavroidis Computer Science Department University of Crete Example: Dynamic Scheduling in PowerPC 604 and Pentium Pro In-order Issue, Out-of-order

More information

CISC 662 Graduate Computer Architecture Lecture 13 - Limits of ILP

CISC 662 Graduate Computer Architecture Lecture 13 - Limits of ILP CISC 662 Graduate Computer Architecture Lecture 13 - Limits of ILP Michela Taufer http://www.cis.udel.edu/~taufer/teaching/cis662f07 Powerpoint Lecture Notes from John Hennessy and David Patterson s: Computer

More information

Making Mobile 5G a Commercial Reality. Peter Carson Senior Director Product Marketing Qualcomm Technologies, Inc.

Making Mobile 5G a Commercial Reality. Peter Carson Senior Director Product Marketing Qualcomm Technologies, Inc. Making Mobile 5G a Commercial Reality Peter Carson Senior Director Product Marketing Qualcomm Technologies, Inc. Insatiable global data demand First phase of 5G NR will focus on enhanced MBB Enhanced mobile

More information

Computer Systems Architecture I. CSE 560M Lecture 19 Prof. Patrick Crowley

Computer Systems Architecture I. CSE 560M Lecture 19 Prof. Patrick Crowley Computer Systems Architecture I CSE 560M Lecture 19 Prof. Patrick Crowley Plan for Today Announcement No lecture next Wednesday (Thanksgiving holiday) Take Home Final Exam Available Dec 7 Due via email

More information

EITF20: Computer Architecture Part2.2.1: Pipeline-1

EITF20: Computer Architecture Part2.2.1: Pipeline-1 EITF20: Computer Architecture Part2.2.1: Pipeline-1 Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Pipelining Harzards Structural hazards Data hazards Control hazards Implementation issues Multi-cycle

More information

Modern Computer Architecture

Modern Computer Architecture Modern Computer Architecture Lecture2 Pipelining: Basic and Intermediate Concepts Hongbin Sun 国家集成电路人才培养基地 Xi an Jiaotong University Pipelining: Its Natural! Laundry Example Ann, Brian, Cathy, Dave each

More information

Chapter 4 The Processor 1. Chapter 4D. The Processor

Chapter 4 The Processor 1. Chapter 4D. The Processor Chapter 4 The Processor 1 Chapter 4D The Processor Chapter 4 The Processor 2 Instruction-Level Parallelism (ILP) Pipelining: executing multiple instructions in parallel To increase ILP Deeper pipeline

More information

Chapter 4 The Processor (Part 4)

Chapter 4 The Processor (Part 4) Department of Electr rical Eng ineering, Chapter 4 The Processor (Part 4) 王振傑 (Chen-Chieh Wang) ccwang@mail.ee.ncku.edu.tw ncku edu Depar rtment of Electr rical Engineering, Feng-Chia Unive ersity Outline

More information

VLIW and the MC Layer

VLIW and the MC Layer VLW and the MC Layer Presented by: Mario Guerra Qualcomm nnovation Center, nc PAGE 1 ntroduction What is VLW? Very Long nstruction Word architecture Hardware designed to execute multiple instructions in

More information

Multithreaded Processors. Department of Electrical Engineering Stanford University

Multithreaded Processors. Department of Electrical Engineering Stanford University Lecture 12: Multithreaded Processors Department of Electrical Engineering Stanford University http://eeclass.stanford.edu/ee382a Lecture 12-1 The Big Picture Previous lectures: Core design for single-thread

More information

INSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing

INSTITUTO SUPERIOR TÉCNICO. Architectures for Embedded Computing UNIVERSIDADE TÉCNICA DE LISBOA INSTITUTO SUPERIOR TÉCNICO Departamento de Engenharia Informática for Embedded Computing MEIC-A, MEIC-T, MERC Lecture Slides Version 3.0 - English Lecture 22 Title: and Extended

More information

COPROCESSOR APPROACH TO ACCELERATING MULTIMEDIA APPLICATION [CLAUDIO BRUNELLI, JARI NURMI ] Processor Design

COPROCESSOR APPROACH TO ACCELERATING MULTIMEDIA APPLICATION [CLAUDIO BRUNELLI, JARI NURMI ] Processor Design COPROCESSOR APPROACH TO ACCELERATING MULTIMEDIA APPLICATION [CLAUDIO BRUNELLI, JARI NURMI ] Processor Design Lecture Objectives Background Need for Accelerator Accelerators and different type of parallelizm

More information