CS 310 Embedded Computer Systems CPUS. Seungryoul Maeng


1 EMBEDDED SYSTEM HW: CPUs. Seungryoul Maeng

2 CPUs: Types of Processors; CPU Performance; Instruction Sets; Processors used in ES

3 Processors: Single Purpose (Hardware); General Purpose (Software); Application Specific (Software)

4 Custom single-purpose processors: Hardware. Read chapters 2 and 3 in Embedded System Design: A Unified Hardware/Software Introduction, Frank Vahid and Tony Givargis.

5 Introduction. Processor: a digital circuit that performs computation tasks, built from a controller and a datapath. General-purpose: a variety of computation tasks. Standard single-purpose: one particular common task; off-the-shelf, a.k.a. peripherals. Custom single-purpose: a non-standard task. A custom single-purpose processor may be fast, small, and low power, but has high NRE cost, longer time-to-market, and less flexibility. [Figure: digital camera (lens, CCD) and its chip: A2D, CCD preprocessor, pixel coprocessor, D2A, JPEG codec, microcontroller, multiplier/accumulator, DMA controller, display controller, memory controller, ISA bus interface, UART, LCD controller.]

6 Custom single-purpose processor: basic model. [Figure, left: controller and datapath. The controller has external control inputs and outputs; the datapath has external data inputs and outputs; datapath control inputs and datapath control outputs connect the two. Figure, right: a view inside the controller (state register, next-state and control logic) and the datapath (registers, functional units).]

7 Custom single-purpose processors. Can be built to execute algorithms. Typically start with an FSMD. CAD tools can be of great assistance. Custom vs. standard.

8 General-Purpose Processors: Software

9 Introduction. General-purpose processor: a processor designed for a variety of computation tasks, a.k.a. microprocessor; "micro" came into use when processors were implemented on one or a few chips rather than filling entire rooms. Low unit cost, in part because the manufacturer spreads NRE cost over large numbers of units: Motorola sold half a billion 68HC05 microcontrollers in 1996 alone; ARM processors: 1.5 billion processors. Carefully designed, since higher NRE is acceptable; can yield good performance, size, and power.

10 Basic Architecture. Control unit and datapath; note the similarity to a single-purpose processor. Key differences: the datapath is general; the control unit doesn't store the algorithm, since the algorithm is programmed into the memory. [Figure: processor containing a control unit (controller, PC, IR) and a datapath (ALU, registers) linked by control/status signals, connected to I/O and memory.]

11 Two Memory Architectures. Princeton: one memory holds both program and data; fewer memory wires. Harvard: separate program memory and data memory; simultaneous program and data memory access. Advantage, disadvantage?

12 Princeton vs. Harvard. Harvard can't use self-modifying code. Harvard allows two simultaneous memory fetches. Most DSPs use Harvard architecture for streaming data: greater memory bandwidth. Most high-performance processors use Harvard architecture at the cache memory level.

13 Cache Memory. Memory access may be slow. A cache is a small but fast memory close to the processor that holds a copy of part of memory. Hits and misses. Cache: fast/expensive technology, usually on the same chip as the processor. Memory: slower/cheaper technology, usually on a different chip.

14 Why use microprocessors? Alternatives: field-programmable gate arrays (FPGAs), custom logic, etc. (custom single-purpose processor or HW logic). Microprocessors are often very efficient: low NRE cost and short time-to-market, since the user just writes software and does no processor design; high flexibility, since the same logic can perform many different functions. Microprocessors simplify the design of families of products.

15 The performance paradox. Microprocessors use much more logic to implement a function than custom logic does, but microprocessors are often very fast. Performance increases over time (performance doubles every … months), and you can easily take advantage of this.

16 The performance paradox. Carefully designed, since higher NRE is acceptable: heavily pipelined; large design teams; aggressive VLSI technology. Performance doubles every … months: clock frequency, deeper pipelines, IPC (instructions per cycle).

17 Pipelining: Increasing Instruction Throughput. [Figure: non-pipelined vs. pipelined dish cleaning (wash, dry) over time; pipelined instruction execution over time with stages fetch instruction, decode, fetch operands, execute, store result.]

18 Power. Custom logic is a clear winner for low-power devices. Modern microprocessors offer features to help control power consumption. Software design techniques can help reduce power consumption.

19 Application-Specific Processors (ASPs)

20 Microprocessor varieties. Desktop vs. embedded processors. Embedded processors include CPU core(s), memory, peripherals, I/O devices, networks, etc.: SoC processors. Example: Netsilicon NET+ARM embedded processor.

21 Embedded processor varieties. General-purpose vs. application-specific processors. Digital signal processor (DSP): microprocessor optimized for digital signal processing. Application-Specific Instruction-set Processors (ASIPs). Microcontrollers and microprocessors: a microcontroller includes I/O devices and on-board memory and is usually used in control applications. Typical embedded word sizes: 8-bit, 16-bit, 32-bit.

22 Many Types of Programmable Processors. Past: microprocessor, microcontroller, DSP, graphics processor. Now/Future: network processor, sensor processor, cryptoprocessor, game processor, wearable processor, mobile processor.

23 Application-Specific Processors (ASPs). General-purpose processors are sometimes too general to be effective in a demanding application; e.g., video processing requires huge video buffers and operations on large arrays of data, which are inefficient on a GPP. But single-purpose processors have high NRE and are not programmable. ASPs are targeted to a particular domain and contain architectural features specific to that domain, e.g., embedded control, digital signal processing, video processing, network processing, telecommunications, etc. They are still programmable.

24 A Common ASP: Microcontroller. For embedded control applications: reading sensors, setting actuators. Mostly dealing with events (bits): data is present, but not in huge amounts; e.g., VCR, disk drive, digital camera (assuming an SPP for image compression), washing machine, microwave oven. Microcontroller features: on-chip peripherals (timers, analog-digital converters, serial communication, etc.), tightly integrated for the programmer and typically part of the register space; on-chip program and data memory; direct programmer access to many of the chip's pins; specialized instructions for bit manipulation and other low-level operations.

25 Another Common ASP: Digital Signal Processors (DSP). For signal processing applications: large amounts of digitized data, often streaming; data transformations must be applied fast; e.g., cell-phone voice filter, digital TV, music synthesizer. DSP features: several instruction execution units; single-cycle multiply-accumulate instruction, other instructions; efficient vector operations, e.g., add two arrays; vector ALUs, loop buffers, etc.

26 Trend: Even More Customized ASPs. In the past, microprocessors were acquired as chips. Today, we increasingly acquire a processor as Intellectual Property (IP), e.g., a synthesizable VHDL model. Customizable processors (ASIPs): opportunity to add custom datapath hardware and a few custom instructions, or delete a few instructions; this can have significant performance, power, and size impacts. Problem: need a compiler/debugger for the customized ASIP; remember, most development uses structured languages. One solution: automatic compiler/debugger generation, e.g.,

27 Selecting a Microprocessor. Issues. Technical: speed, power, size, cost. Other: development environment, prior expertise, licensing, etc. Speed: how to evaluate a processor's speed? Clock speed: but instructions per cycle may differ. Instructions per second: but work per instruction may differ. Dhrystone: synthetic benchmark, developed in …; results in Dhrystones/sec. MIPS: 1 MIPS = 1757 Dhrystones per second (based on Digital's VAX 11/780), a.k.a. Dhrystone MIPS; commonly used today. So, 750 MIPS = 750*1757 = 1,317,750 Dhrystones per second. SPEC: set of more realistic benchmarks, but oriented to desktops. EEMBC (EDN Embedded Benchmark Consortium): suites of benchmarks for automotive, consumer electronics, networking, office automation, telecommunications.
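The Dhrystone MIPS arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the 1757 constant is the VAX 11/780 figure quoted on the slide.

```python
# Dhrystones/second of the reference VAX 11/780, which defines 1 MIPS
# (figure taken from the slide; the names below are illustrative only).
VAX_DHRYSTONES_PER_SECOND = 1757


def dhrystones_per_second(mips):
    """Convert a Dhrystone MIPS rating to raw Dhrystones per second."""
    return mips * VAX_DHRYSTONES_PER_SECOND


def dhrystone_mips(dhrystones):
    """Convert raw Dhrystones per second to a Dhrystone MIPS rating."""
    return dhrystones / VAX_DHRYSTONES_PER_SECOND


print(dhrystones_per_second(750))  # 1317750, matching the slide
print(dhrystone_mips(1_317_750))   # 750.0
```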

28 Processor comparison (processor: clock speed; peripherals; bus width; MIPS; power; transistors; price)
General Purpose Processors
Intel PIII: 1 GHz; 2x16 K L1, 256K L2, MMX; 32; ~900; 97W; ~7M; $900
IBM PowerPC 750X: 550 MHz; 2x32 K L1, 256K L2; 32/64; ~1300; 5W; ~7M; $900
MIPS R…: 250 MHz; 2x32 K, … way set assoc.; 32/64; NA; NA; 3.6M; NA
StrongARM SA-110: 233 MHz; None; …; …; … W; 2.1M; NA
Microcontroller
Intel …: 12 MHz; 4K ROM, 128 RAM, … I/O, Timer, UART; 8; ~1; ~0.2W; ~10K; $…
Motorola 68HC…: 3 MHz; 4K ROM, 192 RAM, … I/O, Timer, WDT, SPI; 8; ~.5; ~0.1W; ~10K; $5
Digital Signal Processors
TI C…: … MHz; 128K SRAM, 3 T1 Ports, DMA, 13 ADC, 9 DAC; 16/32; ~600; NA; NA; $34
Lucent DSP32C: 80 MHz; 16K Inst., 2K Data, Serial Ports, DMA; …; …; NA; NA; $75
Sources: Intel, Motorola, MIPS, ARM, TI, and IBM Website/Datasheet; Embedded Systems Programming, Nov. 1998

29 Summary. General-purpose processors: good performance, low NRE, flexible; controller, datapath, and memory; structured languages prevail, but some assembly-level programming is still necessary; many tools available, including instruction-set simulators and in-circuit emulators. ASPs: microcontrollers, DSPs, network processors, more customized ASIPs. Choosing processors is an important step.

30 CPU Performance

31 Elements of CPU performance. Cycle time. Process technologies: transistor size. CPU pipeline. Instruction-level parallelism. Number of transistors per die. Types: superscalar, VLIW, multi-threading. Memory system.

32 Pipelining. Several instructions are executed simultaneously at different stages of completion. Performance measures: latency, throughput. Various conditions can cause pipeline bubbles that reduce utilization: branches, memory system delays, etc.

33 Pipeline structures. ARM7 has a 3-stage pipeline: fetch instruction from memory; decode opcode and operands; execute. ARM9 has a 5-stage pipeline: instruction fetch; decode; execute; data memory access; register write.

34 ARM7 pipeline execution
add r0,r1,#5    fetch    decode   execute
sub r2,r3,r6             fetch    decode   execute
cmp r2,#3                         fetch    decode   execute
                                                    time ->
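The staggered fetch/decode/execute pattern of the ARM7 slide can be generated programmatically. The sketch below assumes an idealized pipeline in which every stage takes exactly one cycle and nothing stalls; `pipeline_chart` is a hypothetical helper, not part of any ARM tool.

```python
STAGES = ["fetch", "decode", "execute"]  # ARM7 3-stage pipeline


def pipeline_chart(instructions, stages=STAGES):
    """Return (instruction, row) pairs; row[c] names the stage occupied in cycle c.

    Assumes one cycle per stage and no stalls (an idealization).
    """
    total_cycles = len(instructions) + len(stages) - 1
    rows = []
    for i, instr in enumerate(instructions):
        row = [""] * total_cycles
        for s, stage in enumerate(stages):
            row[i + s] = stage  # instruction i enters the pipe in cycle i
        rows.append((instr, row))
    return rows


program = ["add r0,r1,#5", "sub r2,r3,r6", "cmp r2,#3"]
for instr, row in pipeline_chart(program):
    print(f"{instr:<14}" + "".join(f"{cell:<9}" for cell in row))
```

Three instructions finish in five cycles here, instead of the nine a non-overlapped execution would need.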

35 ARM9 core instruction pipeline

36 Performance measures. Latency: time it takes for an instruction to get through the pipeline. Throughput: number of instructions executed per time period. Pipelining increases throughput without reducing latency.
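A rough way to see the latency/throughput distinction is to count cycles for an idealized pipeline. The sketch assumes every stage takes one cycle and there are no stalls; the function names are illustrative.

```python
def non_pipelined_cycles(n_instructions, n_stages):
    """Each instruction runs all stages before the next one starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions, n_stages):
    """The first instruction fills the pipe; each later one completes a cycle apart."""
    return n_stages + (n_instructions - 1)


def throughput(n_instructions, cycles):
    """Instructions completed per cycle."""
    return n_instructions / cycles


n, k = 100, 5  # 100 instructions on a 5-stage pipeline (ARM9-like depth)
print(non_pipelined_cycles(n, k))             # 500
print(pipelined_cycles(n, k))                 # 104
print(throughput(n, pipelined_cycles(n, k)))  # close to 1 instruction/cycle
print(pipelined_cycles(1, k))                 # 5: single-instruction latency is unchanged
```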

37 Pipeline stalls. If every step cannot be completed in the same amount of time, the pipeline stalls. Bubbles introduced by a stall increase latency and reduce throughput.

38 ARM multi-cycle LDMIA instruction
ldmia r0,{r2,r3}   fetch   decode   ex ld r2   ex ld r3
sub r2,r3,r6               fetch    decode                ex sub
cmp r2,#3                           fetch                 decode   ex cmp
                                                                   time ->

39 Control stalls. Branches often introduce stalls (branch penalty). Stall time may depend on whether the branch is taken. May have to squash instructions that have already started executing. Don't know what to fetch until the condition is evaluated.

40 ARM pipelined branch
bne foo            fetch   decode   ex bne   ex bne   ex bne
sub r2,r3,r6               fetch    decode
foo add r0,r1,r2                             fetch    decode   ex add
                                                               time ->

41 Example: ARM7 execution time
Determine execution time of FIR filter:
        for (i=0; i<n; i++) f = f + c[i]*x[i];

;loop initiation code (7 cycles)
        MOV r0,#0       ;use r0 for i, set to 0
        MOV r8,#0       ;use separate index for arrays
        ADR r2,n        ;get address for N
        LDR r1,[r2]     ;get value of N
        MOV r2,#0       ;use r2 for f, set to 0
        ADR r3,c        ;load r3 with the address of base of c array
        ADR r5,x        ;load r5 with the address of base of x array
;loop body (4 cycles)
loop    LDR r4,[r3,r8]  ;get value of c[i]
        LDR r6,[r5,r8]  ;get value of x[i]
        MUL r4,r4,r6
        ADD r2,r2,r4    ;add into running sum f
;update loop counter and array index (2 cycles)
        ADD r8,r8,#4    ;add one word offset to array index
        ADD r0,r0,#1    ;add 1 to i
;test for exit (2 or 4 cycles)
        CMP r0,r1
        BLT loop        ;if i<n, continue loop
loopend ...

42 ARM7 execution time (2). Only the branch in the loop test may take more than one cycle: BLT loop takes 1 cycle best case, 3 worst case. $t_{loop} = t_{init} + N(t_{body} + t_{update}) + (N-1)\,t_{test,worst} + t_{test,best}$. Branch penalty: delayed branch, branch prediction, branch folding.
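The loop-time formula can be evaluated directly. The sketch below assumes the per-block cycle counts that appear as annotations on the FIR listing (7 for initiation, 4 for the body, 2 for the update, and 2 or 4 for the test, i.e. CMP plus a 1-cycle or 3-cycle BLT); the function name and defaults are illustrative, not measured values.

```python
def arm7_loop_cycles(n, t_init=7, t_body=4, t_update=2,
                     t_test_worst=4, t_test_best=2):
    """Cycle estimate for a loop that runs n >= 1 iterations.

    The branch is taken (worst case) on the first n-1 tests and
    falls through (best case) on the last one.
    """
    if n < 1:
        raise ValueError("formula assumes at least one iteration")
    return (t_init
            + n * (t_body + t_update)
            + (n - 1) * t_test_worst
            + t_test_best)


print(arm7_loop_cycles(1))   # 7 + 6 + 0 + 2 = 15
print(arm7_loop_cycles(10))  # 7 + 60 + 36 + 2 = 105
```

For large N the estimate approaches 10 cycles per iteration, of which 4 come from the loop test, which is why the branch penalty techniques listed on the slide matter.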

43 Delayed branch
To increase pipeline efficiency, the delayed branch mechanism requires that the n instructions after a branch are always executed, whether the branch is taken or not.

Delay slots filled with NOPs:
;loop initiation code ...
;loop body
loop    LDR r4,[r3,r8]  ;get value of c[i]
        LDR r6,[r5,r8]  ;get value of x[i]
        MUL r4,r4,r6
        ADD r2,r2,r4    ;add into running sum f
;update loop counter and array index
        ADD r8,r8,#4    ;add one word offset to array index
        ADD r0,r0,#1    ;add 1 to i
;test for exit
        CMP r0,r1
        BLT loop        ;if i<n, continue loop
        NOP
        NOP
loopend ...

Delay slots filled with useful instructions:
;loop initiation code ...
;loop body
loop    LDR r4,[r3,r8]  ;get value of c[i]
        LDR r6,[r5,r8]  ;get value of x[i]
        MUL r4,r4,r6
;update loop counter and array index
        ADD r0,r0,#1    ;add 1 to i
;test for exit
        CMP r0,r1
        BLT loop        ;if i<n, continue loop
        ADD r2,r2,r4    ;add into running sum f
        ADD r8,r8,#4    ;add one word offset to array index
loopend ...

44 ARM10 processor execution time. It is impossible to describe briefly the exact behavior of all instructions in all circumstances; it depends on: branch prediction; prefetch buffer; branch folding; the independent load/store unit; data alignment; how many accesses hit in the cache and TLB.

45 ARM10 integer core [figure; label: 3 instrs]

46 Branch Folding

47 Branch Folding (2)

48 Integer core. Prefetch unit: fetches instructions from the I-cache or external memory; predicts the outcome of branches whenever it can. Integer unit: decode; barrel shifter, ALU, multiplier; main instruction sequencer. Load/store unit: loads or stores two registers (64 bits) per cycle; decouples from the integer unit after the first access of an LDM or STM instruction; supports Hit-Under-Miss (HUM) operation.

49 Pipeline. Fetch: I-cache access, branch prediction. Issue: initial instruction decode. Decode: final instruction decode, register read for ALU op, forwarding, and initial interlock resolution. Execute: data address calculation, shift, flag setting, CC check, branch mispredict detection, and store data register read. Memory: data cache access. Write: register writes, instruction retirement.

50 Typical operations

51 Load or store operation

52 LDR operation that misses

53 Interlocks. The integer core uses forwarding to resolve data dependencies between instructions. Pipeline interlocks. Data dependency interlocks: instructions that have a source register that is loaded from memory by the previous instruction. Hardware dependency interlocks: a new load waiting for the LSU to finish an existing LDM or STM; a load that misses when the HUM slot is already occupied; a new multiply waiting for a previous multiply to free up the first stage of the multiplier.

54 Pipeline forwarding paths

55 Example of interlocking and forwarding
Execute-to-execute:
        mov r0, #1
        add r1, r0, #1
Memory-to-execute:
        ldr r0, [r5]
        sub r1, r2, #2
        add r2, r0, #1

56 Example of interlocking and forwarding, cont'd
Single-cycle interlock:
        ldr r0, [r1, r2]
        str r3, [r0, r4]
ldr: fetch, issue, decode, execute (r1+r2), memory (r0 read), write
str: fetch, issue, decode, execute (r0+r4), memory (r3 write), write
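The data dependency interlock in these examples (an instruction reading a register that the immediately preceding load writes) can be detected with a simple scan. This is a simplified model and an assumption, not the ARM10 logic: it charges one stall cycle per load-use pair and ignores the LSU, HUM, and multiplier interlocks. Instructions are written as hypothetical `(opcode, destination, sources)` tuples.

```python
def count_load_use_stalls(program):
    """Count adjacent pairs where a load's destination is read by the next instruction."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_sources = cur
        if prev_op == "ldr" and prev_dest in cur_sources:
            stalls += 1  # the loaded value is not available in time to forward
    return stalls


# ldr r0,[r1,r2] ; str r3,[r0,r4]  -> str needs r0 right after the load
single_cycle_interlock = [
    ("ldr", "r0", ["r1", "r2"]),
    ("str", None, ["r3", "r0", "r4"]),
]

# ldr r0,[r5] ; sub r1,r2,#2 ; add r2,r0,#1  -> one instruction in between
memory_to_execute = [
    ("ldr", "r0", ["r5"]),
    ("sub", "r1", ["r2"]),
    ("add", "r2", ["r0"]),
]

print(count_load_use_stalls(single_cycle_interlock))  # 1
print(count_load_use_stalls(memory_to_execute))       # 0
```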

57 Instruction Level Parallelism. Instructions may be performed in parallel, subject to data dependencies, control dependencies, and resource dependencies. Dependency analysis: at compile time or at run time.

58 Data and Control dependencies. Execution time depends on operands, not just opcode. Speculative execution: assume a branch direction and execute; unwind if wrong.
Data dependency:
        add r2,r0,r1
        add r3,r2,r5
[Figure: data-flow graph; r0 and r1 produce r2, and r2 and r5 produce r3.]
Control dependency:
a1:     cmp r0,r1
a2:     blt b1
a3:     add r1,r2,r3
b1:     sub r1,r2,r3
[Figure: control-flow graph over a1, a2, a3, b1.]
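The data dependency between the two `add` instructions is a read-after-write (RAW) dependency on r2. A minimal sketch of how such dependencies could be found, whether by a compiler or by issue logic, is shown below; the `(destination, sources)` tuple encoding is an assumption made for illustration.

```python
def raw_dependencies(program):
    """Return (producer_index, consumer_index, register) for each RAW dependency."""
    last_writer = {}  # register name -> index of the most recent writer
    edges = []
    for idx, (dest, sources) in enumerate(program):
        for src in sources:
            if src in last_writer:
                edges.append((last_writer[src], idx, src))
        if dest is not None:
            last_writer[dest] = idx
    return edges


program = [
    ("r2", ["r0", "r1"]),  # add r2,r0,r1
    ("r3", ["r2", "r5"]),  # add r3,r2,r5
]
print(raw_dependencies(program))  # [(0, 1, 'r2')]
```

Instructions with no edge between them are candidates for parallel execution, subject to the control and resource dependencies the slides also list.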

59 Parallelism extraction. Static: use the compiler to analyze programs; simpler CPU control; can make use of high-level language constructs; can't depend on data values; VLIW. Dynamic: use hardware to identify opportunities; more complex CPU; can make use of data values; superscalar.

60 Superscalar and VLIW Architectures. Performance can be improved by: a faster clock (but there's a limit); pipelining (slice up an instruction into stages, overlap stages); multiple ALUs to support more than one instruction stream. Superscalar (scalar: non-vector operations): fetches instructions in batches and executes as many as possible; may require extensive hardware to detect independent instructions. VLIW: each word in memory has multiple independent instructions; relies on the compiler to detect and schedule instructions; currently growing in popularity.

61 Superscalar execution. A superscalar processor can execute several instructions per cycle, using multiple pipelined data paths. Programs execute faster, but it is harder to determine how much faster. A superscalar CPU checks data dependencies dynamically.

62 VLIW processors. Parallelism extraction: compile time. Parallel operations are encoded in one long word (instruction bundle). [Figure: instruction bundle of instruction 1 to instruction 4, feeding an FP unit, two integer units, and a memory unit.] Slot utilization: static scheduling, trace scheduling, multi-threading.

63 Memory system performance. Caches introduce indeterminacy in execution time: it depends on the order of execution. Cache miss penalty: added time due to a cache miss. Several reasons for a miss: compulsory, conflict, capacity.
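A toy direct-mapped cache shows how the same accesses hit or miss depending on their addresses and order. This sketch is a simplification: the cache geometry (4 lines of 16 bytes) is an arbitrary assumption, and it separates only compulsory misses (first touch of a block) from all other misses rather than fully distinguishing conflict from capacity misses.

```python
def simulate_direct_mapped(addresses, num_lines=4, block_size=16):
    """Classify each access as a hit, a compulsory miss, or another kind of miss."""
    lines = [None] * num_lines  # block number currently held by each line
    seen_blocks = set()         # blocks referenced at least once
    stats = {"hit": 0, "compulsory": 0, "other_miss": 0}
    for addr in addresses:
        block = addr // block_size
        index = block % num_lines
        if lines[index] == block:
            stats["hit"] += 1
        elif block not in seen_blocks:
            stats["compulsory"] += 1
        else:
            stats["other_miss"] += 1  # block was evicted earlier
        lines[index] = block
        seen_blocks.add(block)
    return stats


# Addresses 0 and 16 map to different lines: the repeats hit.
print(simulate_direct_mapped([0, 16, 0, 16]))
# Addresses 0 and 64 map to the same line and keep evicting each other.
print(simulate_direct_mapped([0, 64, 0, 64]))
```

The second trace never hits even though the cache is mostly empty, which is the conflict-miss behavior that makes execution time hard to predict.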

64 Instruction Sets

65 RISC vs. CISC. Complex instruction set computer (CISC): many addressing modes; many operations. Reduced instruction set computer (RISC): load/store; pipelinable instructions.

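66 CISC processors. Types and history of Intel-family microprocessors (year; processor name; transistor count; features)
…; …; …,250; Intel's first microprocessor, used in the Busicom calculator
…; …; …,500; used in the Mark-8, the first home computer
…; …; …,000; used in the Altair
…/…; …; …,000; used in the IBM-PC XT; Intel grows into a large company
…; …; …,000; used in the IBM-PC AT; 15 million units sold in 6 years
…; …; …; …-bit multitasking support
…; …; …,180,000; built-in numeric coprocessor
1993; Pentium; 3,100,…; enhanced voice and image processing
1995; Pentium Pro; 5,500,000; adopts the Dynamic Execution architecture
1997; Pentium 2; 7,500,000; MMX technology support
1999; Pentium 3; 24,000,000; SIMD support, 12-stage pipeline
2001; Itanium; 25,000,000; 64-bit, Explicitly Parallel Instruction Computing (EPIC)
2002; Pentium 4; 55,000,…; …-stage hyper-pipeline, Hyper-Threading
2003; Itanium 2; 410,000,000; Machine Check Architecture, EPIC, 6MB L3 cache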

67 CISC - History: evolution of packaging technology

68 CISC - History

69 Instruction set characteristics. Fixed vs. variable length. Addressing modes. Number of operands. Types of operands.

70 ARM data processing Instruction Formats (RISC)
Data processing, immediate shift:  cond | 000 | opcode | S | Rn | Rd | shift amount | shift | 0 | Rm
Data processing, register shift:   cond | 000 | opcode | S | Rn | Rd | Rs | 0 | shift | 1 | Rm
Data processing, 32-bit immediate: cond | 001 | opcode | S | Rn | Rd | rotate | immediate-8
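The 32-bit immediate format can be assembled by shifting each field into place. The sketch below assumes the standard ARM field widths (cond 4 bits, opcode 4, S 1, Rn 4, Rd 4, rotate 4, immediate 8), which the slide does not spell out; `encode_dp_immediate` is a hypothetical helper, not an assembler API.

```python
def encode_dp_immediate(cond, opcode, s, rn, rd, rotate, imm8):
    """Pack an ARM data-processing instruction with a rotated 8-bit immediate."""
    fields = [("cond", cond, 4), ("opcode", opcode, 4), ("s", s, 1),
              ("rn", rn, 4), ("rd", rd, 4), ("rotate", rotate, 4),
              ("imm8", imm8, 8)]
    for name, value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name} does not fit in {bits} bits")
    return ((cond << 28) | (0b001 << 25) | (opcode << 21) | (s << 20)
            | (rn << 16) | (rd << 12) | (rotate << 8) | imm8)


COND_ALWAYS = 0b1110
OP_ADD = 0b0100
OP_MOV = 0b1101

# add r0,r1,#5
print(hex(encode_dp_immediate(COND_ALWAYS, OP_ADD, 0, 1, 0, 0, 5)))  # 0xe2810005
# mov r0,#0
print(hex(encode_dp_immediate(COND_ALWAYS, OP_MOV, 0, 0, 0, 0, 0)))  # 0xe3a00000
```

Every instruction occupies exactly one 32-bit word with fields in fixed positions, which is the property that makes RISC instructions easy to decode and pipeline.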

71 Nios II processor Instruction Formats (RISC). Instruction formats: I-type, R-type, J-type.

72 Intel IA-32 Instruction Format (CISC)

73 Programming model. Programming model: registers visible to the programmer. Some registers are not visible (IR).

74 Multiple implementations. Successful architectures have several implementations: varying clock speeds; different bus widths; different cache sizes; etc.

Elements of CPU performance

Elements of CPU performance Elements of CPU performance Cycle time. CPU pipeline. Superscalar design. Memory system. Texec = instructions ( )( program cycles instruction seconds )( ) cycle ARM7TDM CPU Core ARM Cortex A-9 Microarchitecture

More information

Memory management units

Memory management units Memory management units Memory management unit (MMU) translates addresses: CPU logical address memory management unit physical address main memory Computers as Components 1 Access time comparison Media

More information

Architectures & instruction sets R_B_T_C_. von Neumann architecture. Computer architecture taxonomy. Assembly language.

Architectures & instruction sets R_B_T_C_. von Neumann architecture. Computer architecture taxonomy. Assembly language. Architectures & instruction sets Computer architecture taxonomy. Assembly language. R_B_T_C_ 1. E E C E 2. I E U W 3. I S O O 4. E P O I von Neumann architecture Memory holds data and instructions. Central

More information

CPUs. Caching: The Basic Idea. Cache : MainMemory :: Window : Caches. Memory management. CPU performance. 1. Door 2. Bigger Door 3. The Great Outdoors

CPUs. Caching: The Basic Idea. Cache : MainMemory :: Window : Caches. Memory management. CPU performance. 1. Door 2. Bigger Door 3. The Great Outdoors CPUs Caches. Memory management. CPU performance. Cache : MainMemory :: Window : 1. Door 2. Bigger Door 3. The Great Outdoors 4. Horizontal Blinds 18% 9% 64% 9% Door Bigger Door The Great Outdoors Horizontal

More information

Basic Computer Architecture

Basic Computer Architecture Basic Computer Architecture CSCE 496/896: Embedded Systems Witawas Srisa-an Review of Computer Architecture Credit: Most of the slides are made by Prof. Wayne Wolf who is the author of the textbook. I

More information

CS 310 Embedded Computer Systems CPUS. Seungryoul Maeng

CS 310 Embedded Computer Systems CPUS. Seungryoul Maeng 1 EMBEDDED SYSTEM HW CPUS Seungryoul Maeng 2 CPUs Types of Processors CPU Performance Instruction Sets Processors used in ES 3 Processors used in ES 4 Processors used in Embedded Systems RISC type ARM

More information

Advanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University

Advanced d Instruction Level Parallelism. Computer Systems Laboratory Sungkyunkwan University Advanced d Instruction ti Level Parallelism Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu ILP Instruction-Level Parallelism (ILP) Pipelining:

More information

The Processor: Instruction-Level Parallelism

The Processor: Instruction-Level Parallelism The Processor: Instruction-Level Parallelism Computer Organization Architectures for Embedded Computing Tuesday 21 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy

More information

Processor (IV) - advanced ILP. Hwansoo Han

Processor (IV) - advanced ILP. Hwansoo Han Processor (IV) - advanced ILP Hwansoo Han Instruction-Level Parallelism (ILP) Pipelining: executing multiple instructions in parallel To increase ILP Deeper pipeline Less work per stage shorter clock cycle

More information

Embedded Systems and Software

Embedded Systems and Software Embedded Systems and Software Lecture 1: Introduction Artist's concept of Mars Exploration Rover. Courtesy NASA Lecture 1-1 Organizational Class Website (be sure to check it often): http://siihr64.iihr.uiowa.edu/myweb/teaching/ece_55036_2013/in

More information

Hi Hsiao-Lung Chan, Ph.D. Dept Electrical Engineering Chang Gung University, Taiwan

Hi Hsiao-Lung Chan, Ph.D. Dept Electrical Engineering Chang Gung University, Taiwan Processors Hi Hsiao-Lung Chan, Ph.D. Dept Electrical Engineering Chang Gung University, Taiwan chanhl@maili.cgu.edu.twcgu General-purpose p processor Control unit Controllerr Control/ status Datapath ALU

More information

COMPUTER ORGANIZATION AND DESI

COMPUTER ORGANIZATION AND DESI COMPUTER ORGANIZATION AND DESIGN 5 Edition th The Hardware/Software Interface Chapter 4 The Processor 4.1 Introduction Introduction CPU performance factors Instruction count Determined by ISA and compiler

More information

Chapter 4. The Processor

Chapter 4. The Processor Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware We will examine two MIPS implementations A simplified

More information

Real Processors. Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University

Real Processors. Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University Real Processors Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University Instruction-Level Parallelism (ILP) Pipelining: executing multiple instructions in parallel

More information

Introduction. Definition. What is an embedded system? What are embedded systems? Challenges in embedded computing system design. Design methodologies.

Introduction. Definition. What is an embedded system? What are embedded systems? Challenges in embedded computing system design. Design methodologies. Introduction What are embedded systems? Challenges in embedded computing system design. Design methodologies. What is an embedded system? Communication Avionics Automobile Consumer Electronics Office Equipment

More information

Latches. IT 3123 Hardware and Software Concepts. Registers. The Little Man has Registers. Data Registers. Program Counter

Latches. IT 3123 Hardware and Software Concepts. Registers. The Little Man has Registers. Data Registers. Program Counter IT 3123 Hardware and Software Concepts Notice: This session is being recorded. CPU and Memory June 11 Copyright 2005 by Bob Brown Latches Can store one bit of data Can be ganged together to store more

More information

Advanced Computer Architecture

Advanced Computer Architecture Advanced Computer Architecture Chapter 1 Introduction into the Sequential and Pipeline Instruction Execution Martin Milata What is a Processors Architecture Instruction Set Architecture (ISA) Describes

More information

ELC4438: Embedded System Design Embedded Processor

ELC4438: Embedded System Design Embedded Processor ELC4438: Embedded System Design Embedded Processor Liang Dong Electrical and Computer Engineering Baylor University 1. Processor Architecture General PC Von Neumann Architecture a.k.a. Princeton Architecture

More information

UNIT 8 1. Explain in detail the hardware support for preserving exception behavior during Speculation.

UNIT 8 1. Explain in detail the hardware support for preserving exception behavior during Speculation. UNIT 8 1. Explain in detail the hardware support for preserving exception behavior during Speculation. July 14) (June 2013) (June 2015)(Jan 2016)(June 2016) H/W Support : Conditional Execution Also known

More information

CPI < 1? How? What if dynamic branch prediction is wrong? Multiple issue processors: Speculative Tomasulo Processor

CPI < 1? How? What if dynamic branch prediction is wrong? Multiple issue processors: Speculative Tomasulo Processor 1 CPI < 1? How? From Single-Issue to: AKS Scalar Processors Multiple issue processors: VLIW (Very Long Instruction Word) Superscalar processors No ISA Support Needed ISA Support Needed 2 What if dynamic

More information

Intel released new technology call P6P

Intel released new technology call P6P P6 and IA-64 8086 released on 1978 Pentium release on 1993 8086 has upgrade by Pipeline, Super scalar, Clock frequency, Cache and so on But 8086 has limit, Hard to improve efficiency Intel released new

More information

4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3. Emil Sekerinski, McMaster University, Fall Term 2015/16

4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3. Emil Sekerinski, McMaster University, Fall Term 2015/16 4. The Processor Computer Architecture COMP SCI 2GA3 / SFWR ENG 2GA3 Emil Sekerinski, McMaster University, Fall Term 2015/16 Instruction Execution Consider simplified MIPS: lw/sw rt, offset(rs) add/sub/and/or/slt

More information

Advanced processor designs

Advanced processor designs Advanced processor designs We ve only scratched the surface of CPU design. Today we ll briefly introduce some of the big ideas and big words behind modern processors by looking at two example CPUs. The

More information

Multiple Instruction Issue. Superscalars

Multiple Instruction Issue. Superscalars Multiple Instruction Issue Multiple instructions issued each cycle better performance increase instruction throughput decrease in CPI (below 1) greater hardware complexity, potentially longer wire lengths

More information

Processing Unit CS206T

Processing Unit CS206T Processing Unit CS206T Microprocessors The density of elements on processor chips continued to rise More and more elements were placed on each chip so that fewer and fewer chips were needed to construct

More information

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface 5 th Edition Chapter 4 The Processor Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle

More information

INSTRUCTION LEVEL PARALLELISM

INSTRUCTION LEVEL PARALLELISM INSTRUCTION LEVEL PARALLELISM Slides by: Pedro Tomás Additional reading: Computer Architecture: A Quantitative Approach, 5th edition, Chapter 2 and Appendix H, John L. Hennessy and David A. Patterson,

More information

CPI IPC. 1 - One At Best 1 - One At best. Multiple issue processors: VLIW (Very Long Instruction Word) Speculative Tomasulo Processor

CPI IPC. 1 - One At Best 1 - One At best. Multiple issue processors: VLIW (Very Long Instruction Word) Speculative Tomasulo Processor Single-Issue Processor (AKA Scalar Processor) CPI IPC 1 - One At Best 1 - One At best 1 From Single-Issue to: AKS Scalar Processors CPI < 1? How? Multiple issue processors: VLIW (Very Long Instruction

More information

MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14

MIPS Pipelining. Computer Organization Architectures for Embedded Computing. Wednesday 8 October 14 MIPS Pipelining Computer Organization Architectures for Embedded Computing Wednesday 8 October 14 Many slides adapted from: Computer Organization and Design, Patterson & Hennessy 4th Edition, 2011, MK

More information

EE 4980 Modern Electronic Systems. Processor Advanced

EE 4980 Modern Electronic Systems. Processor Advanced EE 4980 Modern Electronic Systems Processor Advanced Architecture General Purpose Processor User Programmable Intended to run end user selected programs Application Independent PowerPoint, Chrome, Twitter,

More information
