COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. Chapter 4. The Processor. Part A Datapath Design

COMPUTER ORGANIZATION AND DESIGN: The Hardware/Software Interface, 5th Edition. Chapter 4: The Processor. Part A: Datapath Design.

Introduction
CPU performance factors:
Instruction count: determined by the ISA and the compiler.
CPI and cycle time: determined by the CPU hardware.
We will examine two MIPS implementations: a simplified single-cycle version and a more realistic pipelined version.
First, we will use a simple subset of instructions that shows most aspects of a basic CPU:
Memory access: lw, sw
Basic math: add, sub, and, or, slt
Branch and jump: beq, j

Cptr35 Chapter 4 The Processor - Datapath

Stored Program Execution
Get the instruction. Decide what kind of instruction it is. Get any necessary data. Execute the instruction. Store the result. Repeat forever.
The cycle: Instruction Fetch, Instruction Decode, Operand Fetch, Execute, Result Store, Next Instruction.

MIPS Fetch-Execute
[Figure: processor architecture showing the Program Counter (PC), Instruction Register, and Control.]
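
The fetch-execute cycle above can be sketched as a short loop. This is only an illustration: the tuple instruction format and the two opcodes are invented for the example and are not MIPS encodings.

```python
def run(program, registers, max_steps=1000):
    """Run a toy stored program. Each instruction is a hypothetical
    (op, dst, src1, src2) tuple of register indices, not a real MIPS word."""
    pc = 0                                    # Program Counter
    steps = 0
    while pc < len(program) and steps < max_steps:
        instr = program[pc]                   # instruction fetch
        op, dst, src1, src2 = instr           # instruction decode
        a, b = registers[src1], registers[src2]   # operand fetch
        if op == "add":                       # execute
            result = a + b
        elif op == "sub":
            result = a - b
        else:
            raise ValueError(f"unknown op: {op}")
        registers[dst] = result               # result store
        pc += 1                               # next instruction
        steps += 1
    return registers


regs = run([("add", 2, 0, 1), ("sub", 3, 2, 0)], [5, 7, 0, 0])
print(regs)  # [5, 7, 12, 7]
```

A real processor performs these steps in hardware, but the order of the steps is the same.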

[Figure sequence: the same processor diagram (Program Counter (PC), Instruction Register, Control) with one step highlighted per slide.]
Initialize the Program Counter ⇒ first instruction.
Activate Control.
Route to ... (the rest of this slide title is missing in the source).
Route the instruction to the Instruction Register (IR).
Select from the Register File.
Route to the Arithmetic Unit (ALU).
Do the computation.
Store the result.
Increment the PC ⇒ point to the next instruction.
Execute the next instruction.

View from 30,000 Feet
Note: we haven't bothered showing multiplexers.
What is the role of the Add units?
Explain the inputs to the data memory unit.
Explain the inputs to the ALU.
Explain the inputs to the register unit.

Clocking Methodologies
Which of the above units need a clock? What is being saved (latched) on the rising edge of the clock? Keep in mind that the latched value remains there for an entire cycle.

Building a Datapath (Section 4.3)
Datapath: the elements that process data and addresses in the CPU, such as registers, ALUs, muxes, and memories.
We will build a MIPS datapath incrementally by refining the overview design shown earlier.

Instruction Fetch
[Figure: the PC is a 32-bit register; an adder increments it by 4 for the next instruction.]

Executing R Format Operations
R format operations: add, sub, slt, and, or.
R-type fields: op [31:26], rs [25:21], rt [20:16], rd [15:11], shamt [10:6], funct [5:0].
Perform the operation (op and funct) on the values in rs and rt. Store the result back into the Register File at location rd.
[Figure: the instruction supplies Read Addr 1, Read Addr 2, and Write Addr to the Register File; Read Data 1 and Read Data 2 feed the ALU, which is steered by ALU control and produces overflow and zero outputs; the result returns as Write Data under the RegWrite control signal.]
Note that the Register File is not written by every instruction (e.g., sw), so we need an explicit write control signal for the Register File.
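
As a minimal sketch of how the R-type fields are pulled out of a 32-bit instruction word, the function below uses shifts and masks. The function name and the dictionary layout are illustrative choices, not part of the slides.

```python
def decode_r_type(word):
    """Split a 32-bit MIPS instruction word into the six R-type fields."""
    return {
        "op":    (word >> 26) & 0x3F,   # bits 31:26
        "rs":    (word >> 21) & 0x1F,   # bits 25:21
        "rt":    (word >> 16) & 0x1F,   # bits 20:16
        "rd":    (word >> 11) & 0x1F,   # bits 15:11
        "shamt": (word >> 6) & 0x1F,    # bits 10:6
        "funct": word & 0x3F,           # bits 5:0
    }


# add $t0, $s1, $s2 encodes as 0x02324020 ($s1 = 17, $s2 = 18, $t0 = 8).
fields = decode_r_type(0x02324020)
print(fields)
```

In hardware this "decoding" is just wiring: each field is a fixed slice of the instruction bus, which is why the register addresses can be sent to the Register File before the opcode has been interpreted.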

Executing Load and Store Operations
Load and store operations involve computing a memory address by adding the base register to the 16-bit sign-extended offset field in the instruction.
I-type fields: op, rs, rt, address offset.
Store: the value is read from the Register File and written into the Data Memory.
Load: the value is read from the Data Memory and written into the Register File.
[Figure: the R-type datapath extended with a 16-to-32-bit Sign Extend unit and a Data Memory (Address, Write Data, Read Data), with the RegWrite, ALU control, MemWrite, and MemRead control signals.]

Executing Branch Operations
Branch operations involve comparing the operands read from the Register File for equality (the ALU zero output) and then computing the branch target address by adding the updated PC to the 16-bit sign-extended offset field contained in the instruction, shifted left by 2 bits.
I-type fields: op, rs, rt, address offset.
[Figure: one adder forms the updated PC; a second adder adds it to the sign-extended offset after Shift left 2 to give the branch target address; the ALU zero output goes to the branch control logic.]
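
The address arithmetic for loads, stores, and branches can be sketched as follows. The helper names are made up for this example, and the branch helper assumes the updated PC is PC + 4, as in the instruction-fetch slide.

```python
def sign_extend16(imm):
    """Interpret the low 16 bits of imm as a two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def effective_address(base, offset16):
    """lw/sw address: base register plus sign-extended offset (32-bit wrap)."""
    return (base + sign_extend16(offset16)) & 0xFFFFFFFF


def branch_target(pc, offset16):
    """beq target: updated PC (PC + 4) plus the sign-extended offset << 2."""
    return (pc + 4 + (sign_extend16(offset16) << 2)) & 0xFFFFFFFF


print(hex(effective_address(0x1000, 0xFFFC)))  # 0xffc (offset is -4)
print(hex(branch_target(0x400000, 3)))         # 0x400010
```

The shift by 2 reflects that the branch offset counts words while the PC holds a byte address.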

Executing Jump Operations
A jump operation involves replacing the lower 28 bits of the PC with the lower 26 bits of the fetched instruction shifted left by 2 bits.
J-type fields: op [31:26], target address [25:0].
[Figure: PC, adder, and Instruction Memory; the low 26 bits of the instruction pass through Shift left 2 to form the 28-bit jump address.]

View from 5 Feet
[Figure.] The picture above is referred to as the datapath.
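
A minimal sketch of the jump address computation, assuming the upper 4 bits come from PC + 4 (the datapath figure for the jump operation labels them PC+4[31-28]). The function name is illustrative.

```python
def jump_target(pc, instr):
    """Form the 32-bit jump address from the current PC and a J-type word."""
    target26 = instr & 0x03FFFFFF            # low 26 bits of the instruction
    upper4 = (pc + 4) & 0xF0000000           # top 4 bits of the updated PC
    return upper4 | (target26 << 2)          # 4 + 28 = 32 bits


# j with a 26-bit target field of 0x0100000, fetched at PC = 0x00400000
print(hex(jump_target(0x00400000, 0x08100000)))  # 0x400000
```

Because only 28 bits are replaced, a jump can reach any word within the current 256 MB region of the address space.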

View from 2 Feet
[Figure.]

Creating a Single Datapath from the Parts
Single-cycle design: fetch, decode, and execute each instruction in one (and only one) clock cycle.
No datapath resource can be used more than once per instruction, so some must be duplicated (e.g., separate Instruction Memory and Data Memory, several adders).
Multiplexers are needed at the input of shared elements, with control lines to do the selection.
Write signals control writing to the Register File and Data Memory.
Cycle time is determined by the length of the longest path, known as the critical path.

The Main Control Unit
Control signals must be derived from the instruction.

Type         [31:26]     [25:21]   [20:16]   [15:11]   [10:6]   [5:0]
R-type       0           rs        rt        rd        shamt    funct
Load/Store   35 or 43    rs        rt        address [15:0]
Branch       4           rs        rt        address [15:0]

Notes on the fields: bits 31:26 are the opcode; rs is always read; rt is read, except for load; the destination register is written for R-type and load; the address field is sign-extended and added.

Adding the Control
The purpose of the controller is to control the flow of data. The controller determines which control signals to activate and when to activate them. The signals needed depend on the operation to be performed (register, branch or jump, or memory read/write).

R-type fields: op [31:26], rs [25:21], rt [20:16], rd [15:11], shamt [10:6], funct [5:0].
I-type fields: op [31:26], rs [25:21], rt [20:16], address offset [15:0].
J-type fields: op [31:26], target address [25:0].

Observations
The op field is always in bits 31-26.
The addresses of the registers to be read are always specified by the rs (bits 25-21) and rt (bits 20-16) fields; for lw and sw, rs is the base register.
The address of the register to be written is in one of two places: in rt (bits 20-16) for lw, or in rd (bits 15-11) for R-type instructions.
The offset for beq, lw, and sw is always found in bits 15-0.

R-type Instruction Data/Control Flow
[Figure: single-cycle datapath with control, highlighting the R-type path. The Control Unit decodes Instr[31-26] and drives RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, and RegWrite. Instr[25-21] and Instr[20-16] select the registers to read, Instr[15-11] is the R-type write address, Instr[15-0] feeds the 16-to-32-bit Sign Extend unit, and Instr[5-0] feeds ALU control. PCSrc selects between the incremented PC and the branch target.]

Load Word Instruction Data/Control Flow
[Figure: the single-cycle datapath with control, highlighting the lw path.]

Branch Instruction Data/Control Flow
[Figure: the single-cycle datapath with control, highlighting the beq path.]

Adding the Jump Operation
[Figure: the datapath extended for jumps. Instr[25-0] (26 bits) is shifted left by 2 to give 28 bits, which are combined with PC+4[31-28] to form the 32-bit jump address; a Jump control signal from the Control Unit selects it as the next PC.]
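
The main control unit is essentially a lookup from the opcode to a set of control signals. The sketch below uses the signal names from the figures; the specific values are the usual textbook settings for this MIPS subset and are an assumption here, since the slides name the signals but do not tabulate them. `None` marks a don't-care.

```python
X = None  # don't-care value

# Keyed by opcode (bits 31:26): 0 = R-type, 35 = lw, 43 = sw, 4 = beq, 2 = j.
CONTROL = {
    0:  dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1, MemRead=0,
             MemWrite=0, Branch=0, Jump=0, ALUOp=0b10),
    35: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1, MemRead=1,
             MemWrite=0, Branch=0, Jump=0, ALUOp=0b00),
    43: dict(RegDst=X, ALUSrc=1, MemtoReg=X, RegWrite=0, MemRead=0,
             MemWrite=1, Branch=0, Jump=0, ALUOp=0b00),
    4:  dict(RegDst=X, ALUSrc=0, MemtoReg=X, RegWrite=0, MemRead=0,
             MemWrite=0, Branch=1, Jump=0, ALUOp=0b01),
    2:  dict(RegDst=X, ALUSrc=X, MemtoReg=X, RegWrite=0, MemRead=0,
             MemWrite=0, Branch=X, Jump=1, ALUOp=X),
}


def main_control(instr):
    """Return the control signals for a 32-bit instruction word."""
    opcode = (instr >> 26) & 0x3F
    if opcode not in CONTROL:
        raise ValueError(f"opcode {opcode} is outside the supported subset")
    return CONTROL[opcode]


print(main_control(0x8E280004))  # lw $t0, 4($s1)
```

Note that the write enables (RegWrite, MemWrite) are never don't-cares: an instruction that must not change state needs them held at 0.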

Instruction Critical Paths
Calculate the clock cycle time assuming negligible delays for multiplexers, control unit, sign extend, PC access, shift left 2, wires, and setup and hold times, except:
Instruction and Data Memory (200 ps)
Register File access, reads or writes (100 ps)
ALU and adders (200 ps)

All times in ps; "-" means the unit is not on that instruction's path.

Instr.   I Mem   Reg Rd   ALU Op   D Mem   Reg Wr   Total
R-type   200     100      200      -       100      600
load     200     100      200      200     100      800
store    200     100      200      200     -        700
beq      200     100      200      -       -        500
jump     200     -        -        -       -        200

Single-Cycle Disadvantages & Advantages
Uses the clock cycle inefficiently: the clock cycle must be timed to accommodate the slowest instruction. This would be especially problematic for more complex instructions like floating point multiply.
[Figure: clock waveform. Cycle 1 runs lw; Cycle 2 runs sw, and the unused remainder of that cycle is labeled Waste.]
May be wasteful of area: some functional units (e.g., adders, memory) must be duplicated since they cannot be shared during a clock cycle.
However, the single-cycle implementation is simple and easy to understand.
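
The critical-path calculation can be reproduced in a few lines. This sketch assumes the unit delays read as 200 ps for the memories, ALU, and adders and 100 ps for Register File access; the dictionary keys are names chosen for the example.

```python
# Assumed unit delays in picoseconds.
DELAY_PS = {"imem": 200, "reg_read": 100, "alu": 200, "dmem": 200, "reg_write": 100}

# Units on each instruction class's path through the single-cycle datapath.
PATHS = {
    "R-type": ["imem", "reg_read", "alu", "reg_write"],
    "load":   ["imem", "reg_read", "alu", "dmem", "reg_write"],
    "store":  ["imem", "reg_read", "alu", "dmem"],
    "beq":    ["imem", "reg_read", "alu"],
    "jump":   ["imem"],
}

totals = {name: sum(DELAY_PS[u] for u in units) for name, units in PATHS.items()}
cycle_time_ps = max(totals.values())   # the slowest instruction sets the clock

print(totals)          # {'R-type': 600, 'load': 800, 'store': 700, 'beq': 500, 'jump': 200}
print(cycle_time_ps)   # 800
```

Every instruction then pays the 800 ps of the load, which is the inefficiency the single-cycle design is criticized for.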

How Can We Make the Datapath Faster?
Fetch (and execute) more than one instruction at a time. This is called superscalar processing, covered later in this chapter.
Start fetching and executing the next instruction before the current one has completed. This is pipelining; (all?) modern processors are pipelined for performance.
Remember the performance equation: CPU time = CPI * CC * IC
Under ideal conditions and with a large number of instructions, the speedup from pipelining is approximately equal to the number of pipe stages. A five-stage pipeline is nearly five times faster because the CC (clock cycle time) can be nearly five times shorter.

The Five Stages of the Load Instruction
Cycle 1 through Cycle 5, lw: IFetch, Dec, Exec, Mem, WB.
IFetch: Instruction Fetch and Update PC
Dec: Instruction Decode and Register Fetch
Exec: Execute R-type; calculate memory address
Mem: Read/write the data from/to the Data Memory
WB: Write the result into the register file
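
As a worked instance of the performance equation, the sketch below compares a single-cycle design with an ideal pipeline. The cycle times (800 ps and 200 ps) are taken from the pipeline performance slide; the instruction count and the ideal CPI of 1 are assumptions for the example.

```python
def cpu_time_ps(instruction_count, cpi, cycle_time_ps):
    """CPU time = IC * CPI * CC, with the clock cycle given in picoseconds."""
    return instruction_count * cpi * cycle_time_ps


IC = 1_000_000                                   # assumed instruction count
single_cycle = cpu_time_ps(IC, cpi=1, cycle_time_ps=800)
pipelined = cpu_time_ps(IC, cpi=1, cycle_time_ps=200)   # ideal: no stalls, fill time ignored

speedup = single_cycle / pipelined
print(speedup)  # 4.0
```

The speedup here is 4 rather than 5 because the five stages are not balanced: the pipelined clock must fit the slowest 200 ps stage, while 800 / 5 would be 160 ps.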

A Pipelined MIPS Processor
Start the next instruction before the current one has completed.
Improves throughput: the total amount of work done in a given time.
Instruction latency (time from the start of an instruction to its completion) is not reduced. It is often increased due to imbalances between stages.
[Figure: cycles 1 to 8. lw: IFetch, Dec, Exec, Mem, WB. sw, started one cycle later: IFetch, Dec, Exec, Mem. R-type, started one cycle after sw: IFetch, Dec, Exec, WB.]
Clock cycle (pipeline stage time) is limited by the slowest stage.
For some stages we don't need the whole clock cycle (e.g., WB).
For some instructions, some stages are wasted cycles (i.e., nothing is done during that cycle for that instruction).

Pipeline Performance
[Figure: single-cycle execution (cycle time = 800 ps) compared with pipelined execution (cycle time = 200 ps).]

Pipeline Speedup
If all stages are balanced, i.e., all take the same time:
Time between instructions (pipelined) = Time between instructions (nonpipelined) / Number of stages
If the stages are not balanced, the speedup is less.
The speedup is due to increased throughput. Latency (time for each instruction) does not decrease.

Pipelining the MIPS ISA
What makes it easy:
All instructions are the same length (32 bits), so we can fetch in the 1st stage and decode in the 2nd stage.
Few instruction formats (three) with symmetry across formats, so we can begin reading the register file in the 2nd stage.
Memory operations occur only in loads and stores, so we can use the execute stage to calculate memory addresses.
Each instruction writes at most one result (i.e., changes the machine state) and does it in the last pipeline stages (MEM or WB).
Operands must be aligned in memory, so a single data transfer takes only one data memory access.

Graphically Representing the MIPS Pipeline
[Figure: one instruction drawn as the stage sequence IM, Reg, ALU, DM, Reg.]
Can help with answering questions like:
How many cycles does it take to execute this code?
What is the ALU doing during a given cycle?
Is there a hazard (whatever that is), why does it occur, and how can it be fixed?

Five Instruction Sequence
[Figure: time (clock cycles) runs left to right and instruction order top to bottom. Inst 0 through Inst 4 are each drawn as IM, Reg, ALU, DM, Reg, each offset by one cycle from the one above; the time to fill the pipeline is marked.]
Once the pipeline is full, one instruction is completed every cycle, so CPI = 1.
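
The "how many cycles" question has a simple answer when there are no stalls: the first instruction takes one cycle per stage, and each later instruction finishes one cycle after its predecessor. A minimal sketch, with function names chosen for the example:

```python
def pipeline_cycles(n_instructions, n_stages=5):
    """Cycles to run n instructions on an ideal pipeline (no hazards or stalls)."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)    # fill time, then one per cycle


def effective_cpi(n_instructions, n_stages=5):
    """Average cycles per instruction, including the time to fill the pipeline."""
    return pipeline_cycles(n_instructions, n_stages) / n_instructions


print(pipeline_cycles(5))    # 9 cycles for the five-instruction sequence
print(effective_cpi(5))      # 1.8
print(effective_cpi(1000))   # close to 1 once the fill time is amortized
```

This shows why CPI = 1 is a steady-state figure: for short code runs the fill time is a noticeable fraction of the total.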

Can Pipelining Get Us Into Trouble?
Yes: pipeline hazards.
Structural hazards: an attempt by two different instructions to use the same resource at the same time.
Data hazards: an attempt to use data before it is ready. An instruction's source operand(s) are produced by a prior instruction still in the pipeline.
Control hazards: an attempt to make a decision about program control flow before the condition has been evaluated and the new PC target address calculated. Caused by branch and jump instructions and by exceptions.
Hazards can usually be resolved by waiting. Pipeline control must detect the hazard and take action to resolve it.

Other Pipeline Structures Are Possible
What about a slow multiply instruction? Make the clock twice as slow, or let it take two cycles (since it doesn't use the DM stage).
[Figure: pipeline with a MUL stage.]
What if the data memory access is twice as slow as the instruction memory? Make the clock twice as slow, or let the data memory access take two cycles (and keep the same clock rate).
[Figure: pipeline IM, Reg, ALU, DM1, DM2, Reg.]
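
Detecting a data hazard amounts to comparing a producer's destination register with the source registers of the instructions that follow it closely. The sketch below is a simplified illustration, not the hardware's hazard unit: the (dest, sources) representation is invented for the example, and the window of 2 assumes a five-stage pipeline without forwarding in which a register written in WB can be read by an instruction in Dec during the same cycle.

```python
def find_raw_hazards(instrs, window=2):
    """Find read-after-write hazards.

    instrs is a list of (dest, sources) pairs, where dest is a register
    name or None and sources is a list of register names.
    Returns (producer_index, consumer_index, register) triples.
    """
    hazards = []
    for i, (dest, _) in enumerate(instrs):
        if dest is None:                      # e.g. sw or beq writes no register
            continue
        for j in range(i + 1, min(i + 1 + window, len(instrs))):
            next_dest, sources = instrs[j]
            if dest in sources:
                hazards.append((i, j, dest))
            if next_dest == dest:             # overwritten: later reads use j's value
                break
    return hazards


program = [
    ("$1", ["$2", "$3"]),   # add $1, $2, $3
    ("$4", ["$1", "$5"]),   # sub $4, $1, $5
    ("$6", ["$1", "$7"]),   # and $6, $1, $7
    ("$8", ["$1", "$9"]),   # or  $8, $1, $9
]
print(find_raw_hazards(program))  # [(0, 1, '$1'), (0, 2, '$1')]
```

Under these assumptions the or instruction is far enough behind the add to read the updated register without waiting.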

Other Pipeline Architectures
[Figure: ARM7 pipeline with stages IM, Reg, and EX. Activities labeled: PC update, IM access, decode, reg access, ALU op, DM access, shift/rotate, commit result (write back).]
[Figure: XScale pipeline with stages IM1, IM2, Reg, SHFT, ALU, DM1, DM2. Activities labeled: PC update, BTB access, start IM access, IM access, decode, reg 1 access, shift/rotate, reg 2 access, ALU op, start DM access, exception, DM write, reg write.]

Summary
All modern-day processors use pipelining.
Pipelining doesn't help the latency of a single task; it helps the throughput of the entire workload.
Potential speedup: a CPI of 1 and a faster clock cycle.
The pipeline rate is limited by the slowest pipeline stage. Unbalanced pipe stages make for inefficiencies.
The time to fill the pipeline and the time to drain it can impact speedup for deep pipelines and short code runs.
Hazards must be detected and resolved. Stalling negatively affects CPI.