ECE331: Hardware Organization and Design


ECE331: Hardware Organization and Design
Lecture 35: Final Exam Review
Adapted from Computer Organization and Design, Patterson & Hennessy, UCB

Material from Earlier in the Semester
- Throughput and latency of a circuit
- Processor pipeline and removing hazards
- Multiplier and multiply/divide instructions
- Basic MIPS instructions and knowledge
- Single-cycle datapath
- Be familiar with homeworks, projects, and midterms 1 and 2
ECE232: Final Exam Review 2

Example Machine Organization
- Workstation design target
  - 25% of cost: processor
  - 25% of cost: memory (minimum memory size)
  - Rest: I/O devices, power supplies, box
[Figure: computer organization. Processor (CPU) with Control and Datapath; Memory; Devices: Input (Keyboard, Mouse), Output (Display, Printer), Disk]

PC Motherboard Closeup
[Figure: PC motherboard. Courtesy: www.tigerdirect.com]

Inside the Processor
[Figure: AMD Barcelona, 4 processor cores]

System Layers
- Application software
  - Written in high-level language
- System software
  - Compiler: translates high-level language code to machine code
  - Operating System: service code
    - Handling input/output
    - Managing memory and storage
    - Scheduling tasks & sharing resources
- Hardware
  - Processor, memory, I/O controllers

Levels of Program Code
- High-level language
  - Level of abstraction closer to problem domain
  - Provides for productivity and portability
- Assembly language
  - Textual representation of instructions
- Hardware representation
  - Binary digits (bits)
  - Encoded instructions and data

Datapath I/O
- A wire (or by extension, a bus) can be driven by only one tri-state at a time
  - If InPass is active, AluPass must be inactive
  - If AluPass is active, InPass must be inactive
[Figure: datapath with registers X and Y (LoadX, LoadY), an ALU (Function), and tri-state controls InPass, OutPass, AluPass]

Program View of Memory
- Memory viewed as a large, single-dimension array, with an address
- A memory address is an index into the array
- The index points to a byte of memory: "byte addressing"
- A 32-bit machine addresses memory by a 32-bit address
- Access bytes (8 bits), words (32 bits) or half-words
[Figure: Processor (CPU) with Control and Datapath, Memory, and Input/Output devices; memory drawn as an array with addresses 0, 1, 2, 3, 4, 5, 6, ... each holding 8 bits of data]

MIPS Instruction Types
- Arithmetic & Logical: manipulate data in registers
    add $s1, $s2, $s3      $s1 = $s2 + $s3
    or  $s3, $s4, $s5      $s3 = $s4 OR $s5
- Data Transfer: move register data to/from memory
    lw  $s1, 100($s2)      $s1 = Memory[$s2 + 100]
    sw  $s1, 100($s2)      Memory[$s2 + 100] = $s1
- Branch: alter program flow
    beq $s1, $s2, 25       if ($s1 == $s2) PC = PC + 4 + 4*25
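The register-transfer semantics listed on this slide can be sketched as a tiny interpreter. This is only an illustration: the tuple encoding of instructions, the dictionary-based register file and memory, and the `step` helper are assumptions made for the example, not part of the lecture.

```python
# Minimal sketch of the five instruction semantics from the slide.
# The tuple format and the step() helper are hypothetical, for illustration only.

def step(instr, regs, mem, pc):
    """Execute one instruction and return the address of the next one."""
    op = instr[0]
    if op == "add":                      # add rd, rs, rt
        _, rd, rs, rt = instr
        regs[rd] = (regs[rs] + regs[rt]) & 0xFFFFFFFF  # keep 32 bits
    elif op == "or":                     # or rd, rs, rt
        _, rd, rs, rt = instr
        regs[rd] = regs[rs] | regs[rt]
    elif op == "lw":                     # lw rt, offset(base)
        _, rt, offset, base = instr
        regs[rt] = mem.get(regs[base] + offset, 0)
    elif op == "sw":                     # sw rt, offset(base)
        _, rt, offset, base = instr
        mem[regs[base] + offset] = regs[rt]
    elif op == "beq":                    # beq rs, rt, offset (in words)
        _, rs, rt, offset = instr
        if regs[rs] == regs[rt]:
            return pc + 4 + 4 * offset   # taken: PC-relative target
    else:
        raise ValueError(f"unsupported instruction: {op}")
    return pc + 4                        # default: next sequential instruction
```

Note that the branch offset counts instructions (words), which is why it is multiplied by 4 before being added to PC + 4.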

Registers vs. Memory
- Registers in a register file are faster to access than memory
- Operating on memory data requires loads and stores
  - More instructions to be executed
- Compiler must use registers for variables as much as possible
  - Only spill to memory for less frequently used variables
  - Register optimization is important!
- Registers are a fixed resource

MIPS Registers and Usage

MIPS Instructions
- All instructions exactly 32 bits wide
- Different formats for different purposes
- Similarities in formats ease implementation

R-Format:  op (6 bits) | rs (5 bits) | rt (5 bits) | rd (5 bits) | shamt (5 bits) | funct (6 bits)
I-Format:  op (6 bits) | rs (5 bits) | rt (5 bits) | offset (16 bits)
J-Format:  op (6 bits) | address (26 bits)
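As a hedged illustration of the three formats, the field widths above translate directly into shifts and masks. The function names are made up for this sketch; the opcode, funct, and register numbers used in the comments are the standard MIPS values (add: op 0, funct 0x20; lw: op 0x23; $s1 = 17, $s2 = 18, $s3 = 19).

```python
# Sketch: pack instruction fields into a 32-bit word, one helper per format.
# Helper names are hypothetical; field positions follow the slide's layout.

def encode_r(op, rs, rt, rd, shamt, funct):
    """R-Format: op(6) rs(5) rt(5) rd(5) shamt(5) funct(6)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

def encode_i(op, rs, rt, offset):
    """I-Format: op(6) rs(5) rt(5) offset(16); offset is masked to 16 bits."""
    return (op << 26) | (rs << 21) | (rt << 16) | (offset & 0xFFFF)

def encode_j(op, address):
    """J-Format: op(6) address(26); address is masked to 26 bits."""
    return (op << 26) | (address & 0x3FFFFFF)

# add $s1, $s2, $s3  ->  rd = 17, rs = 18, rt = 19
add_word = encode_r(0, 18, 19, 17, 0, 0x20)   # 0x02538820
# lw $s1, 100($s2)   ->  rt = 17, rs = 18, offset = 100
lw_word = encode_i(0x23, 18, 17, 100)         # 0x8E510064
```

Because op, rs, and rt sit in the same bit positions in the R and I formats, decoding hardware can read the registers before it knows which format it is looking at.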

Procedure Calling
Steps required:
1. The calling program places parameters in registers.
2. The calling program transfers control to the procedure (callee).
3. The called procedure acquires the storage that it needs from memory.
4. The called procedure executes its operations.
5. The called procedure places results in registers for the calling program to retrieve.
6. The called procedure reverts the appropriate MIPS registers to their original or correct state.
7. The called procedure returns control to the next word in memory from which it was called.
8. The calling program proceeds with its calculations.

What values are saved?
[Figure: memory layout from high to low addresses: Stack ($sp at 0x7fff fffc), Dynamic Data, Static Data, Text (pc at 0x0040 0000), Reserved (starting at 0)]

IEEE Floating-Point Format
Fields: S (1 bit) | Exponent (single: 8 bits, double: 11 bits) | Fraction (single: 23 bits, double: 52 bits)

\[ x = (-1)^{S} \times (1 + \text{Fraction}) \times 2^{(\text{Exponent} - \text{Bias})} \]

- S: sign bit (0 = non-negative, 1 = negative)
- Normalize significand: 1.0 <= |significand| < 2.0
  - Always has a leading pre-binary-point 1 bit, so no need to represent it explicitly (hidden bit)
  - Significand is Fraction with the "1." restored
- Exponent: excess representation: actual exponent + Bias
  - Ensures exponent is unsigned
  - Single: Bias = 127; Double: Bias = 1023
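A short sketch of the single-precision layout, using Python's standard `struct` module to get at the raw bits. The helper names `decode_single` and `value_from_fields` are assumptions for this example, and the second helper only handles normalized numbers (Exponent between 1 and 254).

```python
import struct

def decode_single(x):
    """Split a number, stored as IEEE single precision, into (S, Exponent, Fraction)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]  # raw 32-bit pattern
    sign = bits >> 31                  # 1 bit
    exponent = (bits >> 23) & 0xFF     # 8 bits, biased by 127
    fraction = bits & 0x7FFFFF         # 23 bits, hidden leading 1 not stored
    return sign, exponent, fraction

def value_from_fields(sign, exponent, fraction, bias=127, frac_bits=23):
    """Apply x = (-1)^S * (1 + Fraction) * 2^(Exponent - Bias) for normalized values."""
    return (-1) ** sign * (1 + fraction / 2 ** frac_bits) * 2.0 ** (exponent - bias)

# -0.21875 = -1.110 (binary) * 2^-3, so S = 1, Exponent = -3 + 127 = 124
fields = decode_single(-0.21875)       # (1, 124, 0x600000)
```

Passing `bias=1023, frac_bits=52` to `value_from_fields` applies the same formula to double-precision fields.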

Floating-Point Multiplication
Now consider a 4-digit binary example: \( 1.000_2 \times 2^{-1} \times -1.110_2 \times 2^{-2} \) (0.5 × -0.4375)
1. Add exponents
   - Unbiased: -1 + -2 = -3
   - Biased: (-1 + 127) + (-2 + 127) = -3 + 254 - 127 = -3 + 127
2. Multiply significands
   - \( 1.000_2 \times 1.110_2 = 1.110_2 \), giving \( 1.110_2 \times 2^{-3} \)
3. Normalize result and check for over/underflow
   - \( 1.110_2 \times 2^{-3} \) (no change) with no over/underflow
4. Round and renormalize if necessary
   - \( 1.110_2 \times 2^{-3} \) (no change)
5. Determine sign: +value × -value gives -value
   - \( -1.110_2 \times 2^{-3} = -0.21875 \)
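The five steps can be traced in a few lines of code. This is a simplified sketch, not the hardware algorithm: operands are hypothetical `(sign, unbiased exponent, significand)` tuples with the significand held as an integer scaled by 2^3, rounding is replaced by truncation, and the overflow/underflow check is omitted.

```python
def fp_multiply(a, b, frac_bits=3):
    """Multiply two (sign, unbiased_exponent, scaled_significand) operands."""
    sign_a, exp_a, sig_a = a
    sign_b, exp_b, sig_b = b
    exponent = exp_a + exp_b                 # step 1: add (unbiased) exponents
    product = sig_a * sig_b                  # step 2: scaled by 2**(2 * frac_bits)
    if product >= (2 << (2 * frac_bits)):    # step 3: product >= 2.0, renormalize
        product >>= 1
        exponent += 1
    significand = product >> frac_bits       # step 4: truncate (simplified rounding)
    sign = sign_a ^ sign_b                   # step 5: signs differ -> negative
    return sign, exponent, significand

def to_float(value, frac_bits=3):
    """Convert the tuple form back to a Python float for checking."""
    sign, exponent, significand = value
    return (-1) ** sign * (significand / 2 ** frac_bits) * 2.0 ** exponent

# The slide's example: 1.000 * 2^-1  times  -1.110 * 2^-2
result = fp_multiply((0, -1, 0b1000), (1, -2, 0b1110))  # (1, -3, 0b1110)
```

Here `to_float(result)` evaluates to -0.21875, matching step 5 above.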

Floating Point Special Representations

\[ F = (-1)^{S} \times 1.f \times 2^{E - 127}, \qquad 1 \le 1.f < 2, \qquad 1 \le E \le 254 \]

Single Precision          Double Precision          Object represented
Exponent   Fraction       Exponent   Fraction
0          0              0          0              0
0          nonzero        0          nonzero        ± denormalized number
1-254      anything       1-2046     anything       ± floating point number
255        0              2047       0              ± infinity
255        nonzero        2047       nonzero        NaN (not a number)
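The table reads naturally as a classification on the two stored fields. The sketch below assumes a hypothetical helper `classify` that takes the raw Exponent and Fraction fields plus the exponent width (8 for single, 11 for double).

```python
def classify(exponent, fraction, exp_bits=8):
    """Name the object represented by raw Exponent/Fraction fields."""
    max_exponent = (1 << exp_bits) - 1     # 255 for single, 2047 for double
    if exponent == 0:
        return "zero" if fraction == 0 else "denormalized"
    if exponent == max_exponent:
        return "infinity" if fraction == 0 else "NaN"
    return "normalized"                    # 1..254 or 1..2046, any fraction
```

The sign bit is not needed for the classification; it only selects between the positive and negative variant of each object.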

FP Adder Hardware
[Figure: floating-point adder datapath, annotated with Step 1, Step 2, Step 3, Step 4]

Instruction Execution Steps
1. Instruction Fetch: read IM[PC]
2. Decode, Inc PC and Read Registers: instruction decode, PC = PC + 4, register read
3. ALU Operation, Branch address: ALU operation, branch address computation
4. Data Memory operation: LW/STORE in data memory
5. Write Back: register write

Datapath Step 1: Any Instruction
[Figure: the clocked PC drives the Address input of Instruction Memory (IMem), which outputs the Instruction; an adder computes PC + 4]
- 32-bit adder, or ALU wired only for add
- Once program is loaded, IMem is read-only

Single cycle data path
[Figure: single-cycle datapath]
- System clock affects primarily the Program Counter

MIPS Datapath
- Datapath contains 5 stages: Instruction fetch (IF), Decode (ID), Execute (EX), Memory (Mem), Writeback (W)
[Figure: PC and Instruction Memory in Stage 1 (IF), Registers in Stage 2 (ID), ALU in Stage 3 (EX), Data Memory in Stage 4 (Mem), Stage 5 (W)]
- Can I pipeline the MIPS stages?

Sequential Laundry
[Figure: four loads (A, B, C, D) in task order on a timeline from 6 PM to 2 AM, divided into sixteen 30-minute steps]
- Sequential laundry takes 8 hours for 4 loads
- If they learned pipelining, how long would laundry take?

Pipelining Lessons
[Figure: the four loads (A, B, C, D) overlapped on a timeline starting at 6 PM, seven 30-minute steps]
- Pipelining doesn't help latency of a single task; it helps throughput of the entire workload
- Multiple tasks operate simultaneously using different resources
- Potential speedup = number of pipe stages
- Pipeline rate is limited by the slowest pipeline stage
- Unbalanced lengths of pipe stages reduce speedup
- Time to fill the pipeline and time to drain it reduce speedup
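The laundry numbers can be reproduced with a simple timing model. The four 30-minute stages are an assumption inferred from the figures (8 hours for 4 loads sequentially, seven 30-minute steps when pipelined), and the function names are made up for this sketch. The pipelined formula assumes identical tasks that advance at the pace of the slowest stage.

```python
def sequential_time(tasks, stage_times):
    """Each task runs all stages to completion before the next one starts."""
    return tasks * sum(stage_times)

def pipelined_time(tasks, stage_times):
    """First task fills the pipeline; each later task finishes one slowest-stage time after the previous."""
    return sum(stage_times) + (tasks - 1) * max(stage_times)

def speedup(tasks, stage_times):
    return sequential_time(tasks, stage_times) / pipelined_time(tasks, stage_times)

laundry = [30, 30, 30, 30]                     # assumed: four 30-minute stages
seq_minutes = sequential_time(4, laundry)      # 480 minutes = 8 hours
pipe_minutes = pipelined_time(4, laundry)      # 210 minutes = 3.5 hours
```

With only 4 loads the speedup is 480 / 210, about 2.3, well short of the 4 stages, because fill and drain time dominate. As the number of loads grows the speedup approaches the stage count, and an unbalanced list such as `[30, 40, 20]` shows the slowest stage setting the rate.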

MIPS Pipelined Datapath
- State registers between pipeline stages to isolate them
- Stages: IF (IFetch), ID (Dec), EX (Execute), MEM (MemAccess), WB (WriteBack)
[Figure: pipelined datapath with Inst 1 to Inst 5 in flight; PC, +4 adder and Instruction Memory; register IFetch/Dec; Register File and 16-to-32-bit Sign Extend; register Dec/Exec; Shift left 2, branch adder and ALU; register Exec/Mem; Data Memory; register Mem/WB; all driven by the System Clock]

Pipeline Hazards
- Data hazards: an instruction uses the result of a previous instruction (RAW)
    ADD R1, R2, R3          SW R1, 4(R2)
    SUB R4, R1, R5    or    LW R3, 4(R2)
- Control hazards: the address of the next instruction to be executed depends on a previous instruction
    BEQ R1, R2, CONT
    SUB R6, R7, R8
    CONT: ADD R3, R4, R5
- Structural hazards: two instructions need access to the same resource, e.g., a single memory shared for instruction fetch and load/store
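Register RAW hazards like the ADD/SUB pair can be found by scanning for a source register that a recent instruction writes. The sketch below is an illustration under assumptions: instructions are hand-annotated tuples of (text, destination, sources), the function name is made up, and the default window of 2 following instructions reflects a five-stage pipeline without forwarding whose register file is written before it is read within a cycle. It tracks registers only, so the SW/LW memory dependence is out of its scope.

```python
def raw_hazards(program, window=2):
    """Return (producer_index, consumer_index, register) for each RAW hazard."""
    hazards = []
    for i, (_, dest, _) in enumerate(program):
        if dest is None:                       # e.g. sw or beq write no register
            continue
        for j in range(i + 1, min(i + 1 + window, len(program))):
            _, later_dest, sources = program[j]
            if dest in sources:
                hazards.append((i, j, dest))
            if later_dest == dest:             # overwritten: later readers depend on j
                break
    return hazards

program = [
    ("ADD R1, R2, R3", "R1", ["R2", "R3"]),
    ("SUB R4, R1, R5", "R4", ["R1", "R5"]),
]
found = raw_hazards(program)                   # [(0, 1, "R1")]
```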

Forwarding with Load-use Data Hazards
[Figure: pipeline diagram over time (IM, Reg, ALU, DM, Reg for each instruction) for the sequence below, in instruction order]
    lw  $1, 4($2)
    sub $4, $1, $5
    and $6, $1, $7
    or  $8, $1, $9
    xor $4, $1, $5
- sub needs to stall
- Will still need one stall cycle even with forwarding
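With forwarding in place, the only remaining data stall in this sequence is the load-use case: an instruction that reads a register loaded by the instruction immediately before it. A minimal sketch, assuming hand-annotated (op, destination, sources) tuples, made-up function names, and the usual five-stage cycle count of 5 + (n - 1) + stalls:

```python
def load_use_stalls(program):
    """Count one stall for each instruction that reads the result of the lw just before it."""
    stalls = 0
    for previous, current in zip(program, program[1:]):
        op, dest, _ = previous
        if op == "lw" and dest in current[2]:
            stalls += 1
    return stalls

def total_cycles(program, stages=5):
    """Cycles to run the program on an ideal pipeline plus load-use stalls."""
    return stages + (len(program) - 1) + load_use_stalls(program)

sequence = [
    ("lw",  "$1", ["$2"]),
    ("sub", "$4", ["$1", "$5"]),
    ("and", "$6", ["$1", "$7"]),
    ("or",  "$8", ["$1", "$9"]),
    ("xor", "$4", ["$1", "$5"]),
]
stalls = load_use_stalls(sequence)   # 1: only sub immediately follows the lw
```

The later readers of $1 (and, or, xor) need no stall because the loaded value is available for forwarding by the time they reach the ALU.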

Datapath with Forwarding Hardware
[Figure: pipelined datapath with pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB; Control and PCSrc; PC, +4 adder, Instruction Memory; Register File; Sign Extend (16 to 32 bits); Shift left 2 and branch adder; ALU with ALU control; Data Memory; and a Forward Unit feeding the ALU inputs]

Branch Instructions Cause Control Hazards
[Figure: pipeline diagram in instruction order for beq, lw, Inst 3, Inst 4, each passing through IM, Reg, ALU, DM, Reg; a second diagram labeled jr shows stages IF, ID, EX, DM, WB]

One Way to Fix a Control Hazard
[Figure: pipeline diagram in instruction order: beq, three stall slots, then lw and Inst 3]
- Fix branch hazard by waiting: introduce stalls

Reducing branch penalty through HW design

Branch Prediction
- Easiest: static prediction
  - Always taken, always not taken
  - Opcode based
  - Displacement based (forward not taken, backward taken)
  - Compiler directed (branch likely, branch not likely)
- Dynamic prediction: prediction per branch in program
- 1-bit predictor: remember last taken/not taken per branch
  - Use a branch-history table (BHT) with 1-bit entries
  - Use part of the PC (low-order bits) to index the table. Why?
  - Multiple branches may share the same bit
  - Invert the bit if prediction is wrong
[Figure: the branch PC indexes the BHT, whose entries are Predictor 0, Predictor 1, ..., Predictor 127]
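A 1-bit predictor with a 128-entry table, as in the figure, fits in a few lines. This is a sketch under assumptions: the class name, the initial "not taken" state, and the choice to drop the two always-zero low PC bits before indexing are illustrative choices, not details given on the slide.

```python
class OneBitPredictor:
    """Branch-history table with one bit per entry: True means predict taken."""

    def __init__(self, entries=128):
        self.entries = entries
        self.table = [False] * entries        # assumed initial state: not taken

    def _index(self, pc):
        # Low-order PC bits select the entry; instructions are word aligned,
        # so the two lowest bits are skipped (an assumption of this sketch).
        return (pc >> 2) % self.entries

    def predict(self, pc):
        return self.table[self._index(pc)]

    def update(self, pc, taken):
        i = self._index(pc)
        if self.table[i] != taken:            # wrong prediction: invert the bit
            self.table[i] = not self.table[i]

def count_mispredictions(predictor, pc, outcomes):
    """Run a sequence of actual outcomes for one branch and count wrong predictions."""
    wrong = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            wrong += 1
        predictor.update(pc, taken)
    return wrong

# A loop branch taken 9 times and then not taken, executed twice
loop_outcomes = ([True] * 9 + [False]) * 2
misses = count_mispredictions(OneBitPredictor(), 0x00400000, loop_outcomes)
```

The loop branch is mispredicted twice per pass (on entry and on exit), which is the well-known weakness of a single history bit. Two branches whose addresses differ by a multiple of 128 words map to the same entry, which is the sharing noted above.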