Hardware Design I Chap. 10 Design of microprocessor

Similar documents
COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN. 5 th Edition. The Hardware/Software Interface. Chapter 4. The Processor

Chapter 4. Instruction Execution. Introduction. CPU Overview. Multiplexers. Chapter 4 The Processor 1. The Processor.

Chapter 4. The Processor

EE 3170 Microcontroller Applications

Chapter 4. The Processor

Mark Redekopp and Gandhi Puvvada, All rights reserved. EE 357 Unit 15. Single-Cycle CPU Datapath and Control

Chapter 4 The Processor 1. Chapter 4A. The Processor

Lecture 3: The Processor (Chapter 4 of textbook) Chapter 4.1

Chapter 4. The Processor

The Processor. Z. Jerry Shi Department of Computer Science and Engineering University of Connecticut. CSE3666: Introduction to Computer Architecture

The Processor (1) Jinkyu Jeong Computer Systems Laboratory Sungkyunkwan University

Chapter 4. The Processor

COMPUTER ORGANIZATION AND DESIGN

The Processor: Datapath and Control. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

ELEC 5200/6200 Computer Architecture and Design Spring 2017 Lecture 4: Datapath and Control

Processor (I) - datapath & control. Hwansoo Han

Chapter 4. The Processor. Computer Architecture and IC Design Lab

Chapter 4. The Processor. Instruction count Determined by ISA and compiler. We will examine two MIPS implementations

The MIPS Processor Datapath

CS 31: Intro to Systems Digital Logic. Kevin Webb Swarthmore College February 2, 2016

ECE260: Fundamentals of Computer Engineering

1 Hazards COMP2611 Fall 2015 Pipelined Processor

Chapter 4. The Processor Designing the datapath

are Softw Instruction Set Architecture Microarchitecture are rdw

CS3350B Computer Architecture Quiz 3 March 15, 2018

These actions may use different parts of the CPU. Pipelining is when the parts run simultaneously on different instructions.

ECE260: Fundamentals of Computer Engineering

CSEE 3827: Fundamentals of Computer Systems

von Neumann Architecture Basic Computer System Early Computers Microprocessor Reading Assignment An Introduction to Computer Architecture

Basic Computer System. von Neumann Architecture. Reading Assignment. An Introduction to Computer Architecture. EEL 4744C: Microprocessor Applications

Pipelining. CSC Friday, November 6, 2015

CS 31: Intro to Systems Digital Logic. Kevin Webb Swarthmore College February 3, 2015

3/12/2014. Single Cycle (Review) CSE 2021: Computer Organization. Single Cycle with Jump. Multi-Cycle Implementation. Why Multi-Cycle?

CSE 141L Computer Architecture Lab Fall Lecture 3

Systems Architecture

CS3350B Computer Architecture Winter 2015

Lecture 7 Pipelining. Peng Liu.

Laboratory 05. Single-Cycle MIPS CPU Design smaller: 16-bits version One clock cycle per instruction

Computer Architecture

Review. N-bit adder-subtractor done using N 1- bit adders with XOR gates on input. Lecture #19 Designing a Single-Cycle CPU

LECTURE 3: THE PROCESSOR

Lecture 01: Basic Structure of Computers

SIDDHARTH GROUP OF INSTITUTIONS :: PUTTUR Siddharth Nagar, Narayanavanam Road QUESTION BANK (DESCRIPTIVE) UNIT-I

Introduction. Datapath Basics

Outline. EEL-4713 Computer Architecture Designing a Single Cycle Datapath

Determined by ISA and compiler. We will examine two MIPS implementations. A simplified version A more realistic pipelined version

Laboratory Single-Cycle MIPS CPU Design (4): 16-bits version One clock cycle per instruction

Review: Abstract Implementation View

361 datapath.1. Computer Architecture EECS 361 Lecture 8: Designing a Single Cycle Datapath

Comprehensive Exams COMPUTER ARCHITECTURE. Spring April 3, 2006

Lecture Topics. Announcements. Today: The MIPS ISA (P&H ) Next: continued. Milestone #1 (due 1/26) Milestone #2 (due 2/2)

Processing Unit CS206T

CS 61C: Great Ideas in Computer Architecture Datapath. Instructors: John Wawrzynek & Vladimir Stojanovic

Inf2C - Computer Systems Lecture Processor Design Single Cycle

Improving Performance: Pipelining

CSE140: Components and Design Techniques for Digital Systems


Computers and Microprocessors. Lecture 34 PHYS3360/AEP3630

CPU Structure and Function

Computer Architecture Computer Science & Engineering. Chapter 4. The Processor BK TP.HCM

CISC / RISC. Complex / Reduced Instruction Set Computers

Computer Science 324 Computer Architecture Mount Holyoke College Fall Topic Notes: MIPS Instruction Set Architecture

Processor. Han Wang CS3410, Spring 2012 Computer Science Cornell University. See P&H Chapter , 4.1 4

CENG 3420 Computer Organization and Design. Lecture 06: MIPS Processor - I. Bei Yu

Chapter 4. The Processor

CS 61C: Great Ideas in Computer Architecture. MIPS CPU Datapath, Control Introduction

CPE300: Digital System Architecture and Design

CPE 335 Computer Organization. Basic MIPS Architecture Part I

COMPUTER ORGANIZATION AND DESIGN The Hardware/Software Interface. 5 th. Edition. Chapter 4. The Processor

LECTURE 10. Pipelining: Advanced ILP

EITF20: Computer Architecture Part2.2.1: Pipeline-1

TDT4255 Computer Design. Lecture 4. Magnus Jahre. TDT4255 Computer Design

ASSEMBLY LANGUAGE MACHINE ORGANIZATION

Computer Systems. Binary Representation. Binary Representation. Logical Computation: Boolean Algebra

Chapter 4. The Processor

CENG 3420 Lecture 06: Datapath

ECE232: Hardware Organization and Design

CMSC411 Fall 2013 Midterm 1

Learning Outcomes. Spiral 3-3. Sorting: Software Implementation REVIEW

The Pipelined RiSC-16

RISC Processor Design

Data paths for MIPS instructions

Computer Fundamentals and Operating System Theory. By Neil Bloomberg Spring 2017

Computer Hardware Engineering

CS311 Lecture: CPU Implementation: The Register Transfer Level, the Register Set, Data Paths, and the ALU

COMPUTER ORGANIZATION AND DESIGN

Lecture 10: Simple Data Path

Anne Bracy CS 3410 Computer Science Cornell University. See P&H Chapter: , , Appendix B

ENGN 2910A Homework 03 (140 points) Due Date: Oct 3rd 2013

CpE242 Computer Architecture and Engineering Designing a Single Cycle Datapath

CS61C : Machine Structures

Microprocessors and Microcontrollers. Assignment 1:

Laboratory Pipeline MIPS CPU Design (2): 16-bits version

Slide Set 7. for ENCM 501 in Winter Term, Steve Norman, PhD, PEng

Advanced Computer Architecture

Lecture 15: Pipelining. Spring 2018 Jason Tang

Chapter 1. Microprocessor architecture ECE Dr. Mohamed Mahmoud.

ECE331: Hardware Organization and Design

SAE5C Computer Organization and Architecture. Unit : I - V

Transcription:

Hardware Design I Chap. 0 Design of microprocessor E-mail: shimada@is.naist.jp Outline What is microprocessor? Microprocessor from sequential machine viewpoint Microprocessor and Neumann computer Memory hierarchy Instruction set architecture Microarchitecture of the microprocessor Microarchitecture with sequential processing Microarchitecture with pipelined processing Hardware Design I (Chap. 0) 2

What is microprocessor? LSI for processing data at the center of the computer Also, called processor There are several type of microprocessors Central Processing Unit (CPU) Microcontroller Graphic accelerator Other several accelerators Hardware Design I (Chap. 0) 3 Central Processing Unit (CPU) A nucleus of Neumann computer Detail will be taught in later slide Sometimes, the word microprocessor denotes this By combining CPU with memories, disks, I/Os, we can create or server Examples: Intel core i7, Fujitsu SPARC64 VII, AMD Opteron,... Hardware Design I (Chap. 0) 4 2

Microcontroller A processor used for control of electric devices Optimized for those use e.g. give high current drive ability to output pin to directly drive LED Many of them can organize computer with one chip Implement hierarchy into them Example: Renesus H8, Atmel AT9, Zilog Z80,... Too many companies provide them Hardware Design I (Chap. 0) 5 Graphic accelerator A processor used for processing graphic Also called Graphic Processing Unit (GPU) Implement too many to utilize parallelism In graphic processing, usually, we can process each pixel independently It also utilized for high parallelism arithmetic Examples: NVIDIA GeForce, AMD(ATI) Radeon,... Hardware Design I (Chap. 0) 6 3

Other accelerators There s several processor to accelerate data processing which is not suitable to process with CPU or GPU But recently, GPU intrudes to this area Usually, it implements much to supply high arithmetic performance Example: ClearSpeed CSX, Ageia PhysiX,... Hardware Design I (Chap. 0) 7 Outline What is microprocessor? Microprocessor from sequential machine viewpoint Microprocessor and Neumann computer Memory hierarchy Instruction set architecture Microarchitecture of the microprocessor Microarchitecture with sequential processing Microarchitecture with pipelined processing Hardware Design I (Chap. 0) 8 4

Microprocessor from sequential circuit viewpoint We can abstract microprocessor with following sequential machine Inputs Programs (= instructions) Data for processing Outputs: Processed data State: Register (and register file) Programs (=instructions) data for processing Combinational logic circuit Register (and register file) Clock Processed data Hardware Design I (Chap. 0) 9 Neumann computer Neumann computer is current major organization Point of Neumann computer Instructions and data are placed in main Instructions manipulate data of flip-flop and memories We can apply different manipulation by different instruction Neumann computer Non-Neman computer Instructions Data before processing Data after processing A hardware which apply operation notated in instruction to data Data before processing Data after processing A hardware which apply fixed operation to data Hardware Design I (Chap. 0) 0 5

Advantages and disadvantages of Neumann computer Advantages We don t have to modify hardware between different data processing EDSAC(949) is one of the early Neumann computer ENIAC have to change wire connection if it change processing We can execute complicated processing with multiple instructions Disadvantages (Neumann bottleneck) Communication between processor and increases Slow drags down processor performance How about non-neumann computer? It remains in some specific use (e.g. movie codec) It begins to reposition with reconfigurable hardware ->Chap. 9 Hardware Design I (Chap. 0) Neumann computer and microprocessor Instructions Data before processing Data after processing A hardware which apply operation notated in instruction to data This hardware is a microprocessor Combinational logic circuit Register (and register file) Microprocessor is a hardware which operate data processing in Neman computer Also called processor or Central Processing Unit (CPU) It includes a part of main (see hierarchy) Hardware organization of the microprocessor differs between its purpose For server, for, for high performance embedded, for embedded,... Hardware Design I (Chap. 0) 2 6

States in the microprocessor How we define states of sequential machine in the processor? Usually, we call it register There are many types of registers Special purpose registers (SPR) Program counter (): Denotes position of instruction which is executing Flag register: Denotes carry generation, overflow,... Global purpose registers (GPR) Used for hold data before/after processing (work as a part of main ) Also, used for intermediate data under arithmetic The organization of register differs between instruction set architectures Relationship between GPR and hierarchy is shown in later Hardware Design I (Chap. 0) 3 Inputs for the microprocessor There are two inputs of sequential machine in the processor Instruction: must be defined if we design sequential machine Data: don t have to define them What s instruction? Series of bits: e.g. 000000000000000000000000000 Usually, we use assembly language to represent it A programming language which has one to one relationship to instruction It defines operation relationship between registers and main (in basic) e.g. add R8, R4, R5 (GPR #8 = GPR #4 GPR #5) <-> 000000000000000000000000000 Introduce how to define it efficiency in later slide Hardware Design I (Chap. 0) 4 7

GPR and hierarchy (/2) In recent processors, the GPR becomes a part of main Firstly the processor moves data from main to register Processor apply operation to the data in the register After operation, it write back data to main This organization effectively reduces workload for main Assuming that we apply multiple operation to data Internal processor Register Data transmission Data transmission Disk FF or SRAM DRAM HDD or SSD Hardware Design I (Chap. 0) 5 GPR and hierarchy (2/2) We call hierarchy for those hierarchical Including disk It also reduces performance degradation from slow device Recently, number of hierarchy increasing because the speed difference between devices is increasing Traditional hierarchy Recent hierarchy Register Register Data transmission SRAM L cache SRAM L2 cache Data transmission Disk Disk Hardware Design I (Chap. 0) 6 8

Instruction set architecture (ISA) To create sequential machine, we have to define format of inputs and internal state Internal state: denoted by registers (for internal state) Inputs: instructions We usually call this definition as Instruction Set Architecture (ISA) Including systematic instruction construction method By defining ISA carefully you can reduce States (registers) Combinational logics Hardware Design I (Chap. 0) 7 Instruction encoding Instruction is encoded to chunk of binary under ISA definition e.g. add R8, R4, R5 (GPR #8 = GPR #4 GPR #5) <-> 000000000000000000000000000 In usual encoding, we give meaning into some chunk of bits R type op rs rt rd shift func 3 26 2 6 6 0 I type op rs rt addr 3 26 2 6 Hardware Design I (Chap. 0) 8 0 9

Example of instruction encoding (/3) Example: Instruction encoding of MIPS Total length is 32-bit It has meaning in several chunk of bits op: Type of operation (arithmetic, load, store, branch,...) rs: Source operand for arithmetic rt: Source operand 2 for arithmetic rd: Destination operand for arithmetic (store arithmetic result) shift: Amount of shift func: Type of arithmetic (supplemental for op) addr: Immediate value for arithmetic R type op rs rt rd shift func 3 26 2 6 6 0 I type op rs rt addr 3 26 2 6 0 Hardware Design I (Chap. 0) 9 Example of instruction encoding (2/3) add R8, R4, R5 000000 0000 000 0000 no use 00000 3 26 2 6 6 0 Operation: R8 = R4 R5 (add: addition) sub R8, R4, R5 000000 0000 000 0000 no use 0000 3 26 2 6 6 0 Operation: R8 = R4 - R5 (sub: subtract) This difference indicates different arithmetic Hardware Design I (Chap. 0) 20 0

Example of instruction encoding (3/3) lw R8, 8(R4) 000 0000 0000 000000000000000 3 26 2 6 Operation: Load value in (R4 8) position on main to R8 (lw: load word) bne R4, R5, -5 0 0000 0000 000 0 3 26 2 6 0 Operation: if R4!= R5, back to 5 prior instruction (bne: branch not equal) Hardware Design I (Chap. 0) 2 Short Exercise Let s translate following assembly to instruction notated by binary Refer R-type instruction notation in slides add R0, R3, R4 Hardware Design I (Chap. 0) 22

Answer Let s translate following assembly to instruction notated by binary Refer R-type instruction notation in slides add R0, R3, R4 (no use) 000000 00 00 000 00000 00000 3 26 2 6 6 0 Hardware Design I (Chap. 0) 23 Outline What is microprocessor? Microprocessor from sequential machine viewpoint Microprocessor and Neumann computer Memory hierarchy Instruction set architecture Microarchitecture of the microprocessor Microarchitecture with sequential processing Microarchitecture with pipelined processing Hardware Design I (Chap. 0) 24 2

What s Microarchitecture? An implementation of processor on the hardware We can choose several possible microarchitecture in same ISA e.g. Intel Core i7, Intel Atom, AMD Phenon It can execute same program (e.g. Windows) because ISA is the same Usually, we choose microarchitecture for the purpose of the computer e.g. Choose low power consumption microarchitecture for notebook Hardware Design I (Chap. 0) 25 One organization of microprocessor (/3) Combinational logics : execute add, sub, logical arithmetic, shift,... Multiplexers: construct data path from instructions and values in register Adder after : increment to indicate next instruction Adder beside : calculate branch target in branch instruction Address bus Hardware Design I (Chap. 0) 26 3

One organization of microprocessor (2/3) Buses for main Address bus: send address value which indicate read/write position in main Data Bus Send data value which is read from main Send data value which will be written into main Address bus Hardware Design I (Chap. 0) 27 One organization of microprocessor (3/3) Registers Register file (): chunk of GPR Number is differ between architectures (e.g. 32 in MIPS) Program counter () Instruction register (): hold instruction comes from main Address bus Hardware Design I (Chap. 0) 28 4

How to understand operation of prior chunk of hardware? It seems that it s hard to understand operation of prior large hardware -> True How can we understand it easily? -> Decompose hardware to 5 part and understand those operation This 5 part decomposition has importance in operation Operate 5 part sequentially: 5 phase operation processor Finish one instruction with 5 clock pulse Operate 5 part simultaneously: 5 stage pipelined processor Finish one instruction with clock pulse (in general case) Hardware Design I (Chap. 0) 29 Decomposition to 5 part. Instruction fetch (IF) 2. Instruction decode (ID), register read 3. Execution (EX) 4. Memory access (MA) 5. Write back to register (WB), and commit (5.). 2.(5.) 3. 4. 5. Hardware Design I (Chap. 0) 30 5

Operation of IF (/2) Manages updating Update with incremented value If instruction is not branch instructions (or not taken branch) Update with branch target address If instruction is (take conditional) branch instruction Read instruction which is indicated by value Hardware Design I (Chap. 0) 3 Operation of IF (2/2) Manages updating Read instruction which is indicated by value Send content of to address bus Capture instruction (to ) comes from data bus add R4, R5, R6 Hardware Design I (Chap. 0) 32 6

Operation of ID Read registers by rs or rt bits in instruction Usually, is consist of RAM so that rs or rt becomes address for RAM Decode instructions Generate several control signals from instruction e.g. signal for multiplexer before Hardware Design I (Chap. 0) 33 Operation of EX Arithmetic Memory address generation on access is also operated Detailed arithmetic is indicated by func part of instruction Calculate branch target address Add and immediate value comes from instruction Hardware Design I (Chap. 0) 34 7

Operation of MA Memory access Send generated address to address bus If access instruction is load, capture data in data bus If access instruction is store, the processor send store data into data bus Other instruction: no operation Hardware Design I (Chap. 0) 35 Operation of WB Writeback operation result to In instructions which creates result If branch is taken, write branch target address to Coordinated to IF part largely -> See IF part slide Hardware Design I (Chap. 0) 36 8

Example execution of add R8, R4, R5. Send to main and read instruction into 2. Read register by a part of instruction and decode instruction and generate signals 3. Apply arithmetic denoted by a part of instruction 4. Do nothing 5. Writeback arithmetic result to and increment Hardware Design I (Chap. 0) 37 Example execution of lw R8, 8(R4). Send to main and read instruction into 2. Read register by a part of instruction, decode instruction and generate signals, and apply sign extension to immediate value 3. Create address by adding register and immediate values 4. Send address to and read data 5. Writeback read data to and increment Hardware Design I (Chap. 0) 38 9

Example execution of bne R4, R5, -5. Send to main and read instruction into 2. Read register by a part of instruction, decode instruction and generate signals, and apply sign extension to immediate value 3. Check branch condition by arithmetic result of and generate target address with adder 4. Do nothing 5. If condition is taken, writeback target address to. Otherwise increment Hardware Design I (Chap. 0) 39 Pipelined processing on processor Prior example is sequential execution of 5 parts Are there any way to work them simultaneously? Process consecutive instructions in each parts Called pipelined processing -> Prof. Yao s lecture on Jan. 26 Imaging creating product on belt conveyer in factory Decoding lw R7, 8(R3) Fetching bne R2, R6, -5 Executing add R8, R4, R5 Hardware Design I (Chap. 0) 40 20

Outlined notation of pipeline We usually utilize operation (e.g. IF, ID,...) denoted in box to represent parallel execution Horizontal axis denotes time (by clock cycles) Vertical axis denotes instruction order (by program order) add R8, R4, R5 IF ID EX MEM WB Time (by clock cycles) lw R7, 8(R3) Sequential IF ID EX MEM add R8, R4, R5 IF ID EX MEM WB lw R7, 8(R3) IF ID EX MEM WB bne R2, R6, -5 IF ID EX MEM WB Pipelined Hardware Design I (Chap. 0) 4 Additional hardware for pipelined processing We have to prepare additional FF to keep different instructions Prepare FF between each part called pipeline register It keeps not only instructions but also related informations (read register value, arithmetic result,...) Each part is called pipeline stage or stage Pipeline register Hold add R8, R4, R5 and related informations Hold lw R7, 8(R3) and related informations Hardware Design I (Chap. 0) 42 2

Pipelined processing from sequential circuit viewpoint It becomes anomalistic sequential circuit Updated state is written into next FF It gives additional constraint to state Caused by relationship between instructions Current state Next state Current state Next state Hold add R8, R4, R5 and related informations Hold lw R7, 8(R3) and related informations Hardware Design I (Chap. 0) 43 Pipeline hazard caused by data hazard Let s consider following instructions add R8, R4, R5 sub R6, R8, R7 IF ID ID ID ID EX MEM WB Data dependency R8 is not written in this point!!! Pipeline continuously try read R8 and stop it at ID stage Called pipeline stall Called pipeline hazard : pipeline processing stops with several reasons This is a pipeline hazard caused by data dependency Called data hazard Data dependency: a relationship that later instruction utilize the result of prior instruction IF ID EX MEM WB Read R8 in this point Hardware Design I (Chap. 0) 44 22

Data hazard avoidance with result forwarding Pipeline stall is achieved by additional constraint on sequential machine Are there any way to avoid it? -> Result forwarding: passing value without Additional data path is required add R8, R4, R5 sub R6, R8, R7 IF ID IF EX ID MEM R8 WB EX MEM WB Result forwarding Additional data path for result forwarding Hardware Design I (Chap. 0) 45 Pipeline and length of logic (/2) Length of (combinational) logic seems the same in outlined figure between stages But in practical, it differs IF ID EX MEM WB IF ID EX MEM WB Hardware Design I (Chap. 0) 46 23

Pipeline and depth of logic (2/2) How do we operate those logics? We have to operate them with latest one (which has longest logic) because they works with same clock pulse add R8, R4, R5 IF ID EX MEM WB sub R6, R8, R7 IF ID EX MEM WB We have to average them as far as possible If we consider sequential organization, there s no problem IF ID EX MEM WB IF ID EX MEM WB Hardware Design I (Chap. 0) 47 24