The Evolution of Microprocessors. Per Stenström

[Figure: a microprocessor chip with three processors (cores), an L1 cache per core and an L2 cache, connected to memory.]

[Figure: timeline of the evolution of microprocessors from 1970 to 2020: multicycle, pipelined, superscalar, multicore and beyond.]

Multicycle Computers

1. Fetch next instruction
2. Decode/fetch data
3. Execute
4. Store result
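The four steps above can be sketched in Python. This is only an illustration under a stated assumption: each step takes exactly one clock cycle and instructions never overlap (the slide does not give step latencies). The names `STEPS` and `multicycle_schedule` are hypothetical.

```python
# The four steps of a multicycle computer, as named on the slide.
STEPS = ["fetch next instruction", "decode/fetch data", "execute", "store result"]


def multicycle_schedule(program):
    """Yield (cycle, instruction, step) with one step per cycle and no overlap.

    Assumption: every step costs one cycle, so each instruction costs
    len(STEPS) cycles before the next instruction may start.
    """
    cycle = 1
    for instruction in program:
        for step in STEPS:
            yield cycle, instruction, step
            cycle += 1


program = ["ADD R1,R2,R3", "ADDI R1,R1,#3"]
schedule = list(multicycle_schedule(program))
for cycle, instruction, step in schedule:
    print(f"cycle {cycle}: {instruction:<14} {step}")
```

Under this assumption the second instruction cannot begin its fetch until cycle 5, which is the cost that pipelining later removes.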

Opcode | Assembly code | Meaning | Comments
ADD, SUB | ADD R1,R2,R3 | R1=(R2)+(R3) | Integer operations
ADDI, SUBI | ADDI R1,R2,#3 | R1=(R2)+3 | Immediate
AND, OR, XOR | AND R1,R2,R3 | R1=(R2).AND.(R3) | Bit-wise logical AND, OR, XOR
SLT | SLT R1,R2,R3 | If (R2)<(R3) then R1=1 else R1=0 | Test R2, R3; outcome in R1
ADD.D, SUB.D, MUL.D, DIV.D | ADD.D F1,F2,F3 | F1=(F2)+(F3) | Floating-point operations

Opcode | Assembly code | Meaning | Comments
LB, LH, LW, LD | LW R1,#20(R2) | R1=MEM[20+(R2)] | For bytes, half-words, words and double words
SB, SH, SW, SD | SW R1,#20(R2) | MEM[20+(R2)]=R1 | (as above)
L.S, L.D | L.S F0,#20(R2) | F0=MEM[20+(R2)] | Single/double float load
S.S, S.D | S.S F0,#20(R2) | MEM[20+(R2)]=F0 | Single/double float store

Opcode | Assembly code | Meaning | Comments
BEQZ, BNEZ | BEQZ R1,Label | If (R1)=0 then PC=Label | Conditional branch: equal 0/not equal 0
BEQ, BNE | BEQ R1,R2,Label | If (R1)=(R2) then PC=Label | Conditional branch: equal/not equal
J | J Target | PC=Target | Target is an immediate field
JR | JR R1 | PC=(R1) | Target is in register
JAL | JAL Target | R31 = PC + 4; PC = Target | Jump to target and save return address in R31

[Figure sequence: the datapath is built up step by step for three instruction classes: ALU, memory transfer and PC control.]
[Figure: instruction fetch. The PC supplies the read address to the instruction memory, which delivers the instruction [31..0].]
[Figure: operand fetch. Instruction fields [25..21] and [20..16] supply the Rs and Rt addresses to the registers, which deliver Rs data and Rt data; fields [15..11] and [15..0] are also extracted.]
[Figure: ALU operations. The ALU operates on Rs data and Rt data.]
[Figure: memory transfers. Field [15..0] is sign-extended and the ALU computes the effective address, displacement + (Rs).]
[Figure: memory transfers. The data memory (Wr addr, Rd data) is attached after the ALU.]
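The field extraction and effective-address computation in the figures above can be sketched as follows. This is a Python illustration, not the slide's hardware: it assumes the conventional MIPS-style layout in which bits [25..21] name Rs, [20..16] name Rt, [15..11] name the third register field, and [15..0] hold a 16-bit displacement. The function names are hypothetical.

```python
def decode_fields(instruction):
    """Split a 32-bit instruction word into the fields shown in the figures."""
    return {
        "rs": (instruction >> 21) & 0x1F,   # bits [25..21]
        "rt": (instruction >> 16) & 0x1F,   # bits [20..16]
        "rd": (instruction >> 11) & 0x1F,   # bits [15..11]
        "imm": instruction & 0xFFFF,        # bits [15..0]
    }


def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def effective_address(rs_value, instruction):
    """Effective address = sign-extended displacement + (Rs), wrapped to 32 bits."""
    displacement = sign_extend16(instruction & 0xFFFF)
    return (rs_value + displacement) & 0xFFFFFFFF


# A word with Rs = 2, Rt = 1 and displacement 20 (opcode bits left at 0).
word = (2 << 21) | (1 << 16) | 20
print(decode_fields(word))
print(hex(effective_address(0x1000, word)))
```

Sign extension matters for negative displacements: a field of 0xFFFC means -4, so the address moves backwards from the base register.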

[Figure: fetching the next instruction. An adder computes PC + 4.]
[Figure: BEQ R1,R2,offset. Field [15..0] is sign-extended, shifted left by 2 and fed to a second adder; the ALU produces a ZERO signal.]
[Figure: the complete datapath: PC, +4 adder, instruction memory, registers, sign extend, shift left 2, branch adder, ALU and data memory.]
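A short Python sketch of the next-PC selection drawn in the figures above. It assumes the usual arrangement for this kind of datapath, which the flattened figure suggests but does not spell out: the branch target is PC + 4 plus the sign-extended offset shifted left by 2, and the branch is taken when the ALU's ZERO output is set. The function names are hypothetical.

```python
def sign_extend16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def alu_zero(rs_value, rt_value):
    """ZERO output of the ALU when it subtracts the two operands (BEQ compare)."""
    return ((rs_value - rt_value) & 0xFFFFFFFF) == 0


def next_pc(pc, offset16, is_branch, zero):
    """Choose between PC + 4 and the branch target."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    # "Shift Left 2" turns a word offset into a byte offset.
    target = (pc_plus_4 + (sign_extend16(offset16) << 2)) & 0xFFFFFFFF
    return target if (is_branch and zero) else pc_plus_4


print(hex(next_pc(0x100, 3, True, alu_zero(7, 7))))       # taken, forward
print(hex(next_pc(0x100, 0xFFFD, True, alu_zero(7, 7))))  # taken, backward
print(hex(next_pc(0x100, 3, True, alu_zero(7, 8))))       # not taken
```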

[Figure: the datapath with its control signals: RegDst, RegWrite, ALUSrc, ALUOp, MemRead, MemWrite, MemToReg and PCSrc.]
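The slide names the control signals but does not list their values. The Python sketch below fills them in the way a textbook single-cycle MIPS-style design usually does; the values, the four instruction classes and the helper flag `Branch` are assumptions, not taken from the slides. `None` marks a "don't care".

```python
# Assumed per-class control settings (textbook style, not from the slides).
# RegDst=1 selects field [15..11] as the write register, RegDst=0 selects [20..16].
# ALUSrc=1 feeds the sign-extended field [15..0] to the ALU instead of Rt data.
# MemToReg=1 writes data-memory output to the register instead of the ALU result.
CONTROL = {
    "R-type": dict(RegDst=1, ALUSrc=0, MemToReg=0, RegWrite=1,
                   MemRead=0, MemWrite=0, Branch=0, ALUOp="funct"),
    "load": dict(RegDst=0, ALUSrc=1, MemToReg=1, RegWrite=1,
                 MemRead=1, MemWrite=0, Branch=0, ALUOp="add"),
    "store": dict(RegDst=None, ALUSrc=1, MemToReg=None, RegWrite=0,
                  MemRead=0, MemWrite=1, Branch=0, ALUOp="add"),
    "beq": dict(RegDst=None, ALUSrc=0, MemToReg=None, RegWrite=0,
                MemRead=0, MemWrite=0, Branch=1, ALUOp="sub"),
}


def control_signals(kind, zero=False):
    """Return the eight signals named on the slide for one instruction class.

    PCSrc is derived: the branch target is selected only for a branch
    whose ALU comparison raised ZERO.
    """
    signals = dict(CONTROL[kind])
    branch = signals.pop("Branch")  # hypothetical internal flag
    signals["PCSrc"] = int(bool(branch and zero))
    return signals


for kind in CONTROL:
    print(kind, control_signals(kind, zero=True))
```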

[Figure: the datapath divided into five stages: instruction fetch, decode/operand fetch, execute, memory access and write back.]

Pipelined Single-Cycle Computer

The Conveyor Belt

Example loop:

LABEL: LD   R2, 0(R10)
       LD   R3, 0(R11)
       ADD  R1,R2,R3
       SD   0(R12), R1
       SUBI R4,R4,#1
       ADDI R10,R10,#4
       ADDI R11,R11,#4
       ADDI R12,R12,#4
       BNEZ R4, LABEL

[Figure: the datapath holding five instructions at once, one per stage. From fetch to write back: SUBI R4,R4,#1; SD 0(R12),R1; ADD R1,R2,R3; LD R3,0(R11); LD R2,0(R10).]
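The conveyor-belt picture can be reproduced with a few lines of Python. The sketch assumes an ideal five-stage pipeline with no hazards, where one instruction enters the fetch stage every cycle; the stage abbreviations and the function name `occupancy` are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

LOOP = [
    "LD R2,0(R10)",
    "LD R3,0(R11)",
    "ADD R1,R2,R3",
    "SD 0(R12),R1",
    "SUBI R4,R4,#1",
    "ADDI R10,R10,#4",
    "ADDI R11,R11,#4",
    "ADDI R12,R12,#4",
    "BNEZ R4,LABEL",
]


def occupancy(program, cycle):
    """Return {stage: instruction} for a 1-based cycle in an ideal pipeline.

    Instruction i (0-based) is fetched in cycle i + 1 and then moves one
    stage to the right every cycle. Empty stages map to None.
    """
    contents = {}
    for depth, stage in enumerate(STAGES):
        index = cycle - 1 - depth
        contents[stage] = program[index] if 0 <= index < len(program) else None
    return contents


snapshot = occupancy(LOOP, cycle=5)
for stage in STAGES:
    print(f"{stage:<3} {snapshot[stage]}")
```

In cycle 5 the five stages hold SUBI, SD, ADD and the two LD instructions, the same contents as the figure.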

Three kinds of hazards: structural hazards, data hazards and control hazards.

[Figure: three consecutive snapshots of the pipeline, each listed from fetch to write back.]
Snapshot 1: SUBI R4,R4,#1; SD 0(R12),R1; ADD R1,R2,R3; LD R3,0(R11); LD R2,0(R10)
Snapshot 2: SUBI R4,R4,#1; SD 0(R12),R1; UNUSED; ADD R1,R2,R3; LD R3,0(R11)
Snapshot 3: SUBI R4,R4,#1; SD 0(R12),R1; UNUSED; UNUSED; ADD R1,R2,R3
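The pipeline snapshots above, where SD stays in place while ADD moves on and UNUSED slots appear between them, can be modelled as a stall in the decode stage. The Python sketch below is one way to express that; it assumes a five-entry list ordered from fetch to write back, and the names `step` and `BUBBLE` are hypothetical.

```python
BUBBLE = "UNUSED"


def step(pipeline, fetched, stall_decode=False):
    """Advance a [IF, ID, EX, MEM, WB] pipeline by one cycle.

    With stall_decode=True the instructions in IF and ID hold their
    position, a bubble enters EX, and nothing new is fetched.
    """
    fetch, decode, execute, memory, _write_back = pipeline
    if stall_decode:
        return [fetch, decode, BUBBLE, execute, memory]
    return [fetched, fetch, decode, execute, memory]


cycle_a = ["SUBI R4,R4,#1", "SD 0(R12),R1", "ADD R1,R2,R3",
           "LD R3,0(R11)", "LD R2,0(R10)"]
# SD reads R1, which ADD has not yet written, so SD waits in decode.
cycle_b = step(cycle_a, fetched=None, stall_decode=True)
cycle_c = step(cycle_b, fetched=None, stall_decode=True)
# Once the stall ends, fetching resumes with the next loop instruction.
cycle_d = step(cycle_c, fetched="ADDI R10,R10,#4")

for snapshot in (cycle_a, cycle_b, cycle_c, cycle_d):
    print(snapshot)
```

How many cycles the stall lasts depends on when the register value becomes available, which the slides do not state; two stall cycles are used here only because the snapshots show two UNUSED slots.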

[Figure: three consecutive snapshots of the pipeline behind a branch, each listed from fetch onward.]
Snapshot 1: UNUSED; BEQ R1,offset
Snapshot 2: UNUSED; UNUSED; BEQ R1,offset
Snapshot 3: NEXT INST.; UNUSED; UNUSED; BEQ R1,offset
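The cost of the UNUSED slots behind a branch can be estimated with a simple cycle count. The Python sketch below assumes, based only on the two UNUSED slots visible in the snapshots above, that every branch costs two lost fetch slots; it also ignores data and structural hazards. The function names and default values are hypothetical.

```python
def total_cycles(n_instructions, n_branches, stages=5, bubbles_per_branch=2):
    """Cycles to run n_instructions on a pipeline that stalls after branches.

    stages - 1 cycles fill the pipeline, then one instruction completes per
    cycle, plus the lost fetch slots behind every branch.
    """
    fill = stages - 1
    return fill + n_instructions + bubbles_per_branch * n_branches


def cpi(branch_fraction, bubbles_per_branch=2):
    """Steady-state cycles per instruction when only branches cause stalls."""
    return 1 + branch_fraction * bubbles_per_branch


# One iteration of a nine-instruction loop that ends in a single branch.
print(total_cycles(n_instructions=9, n_branches=1))
print(round(cpi(branch_fraction=1 / 9), 3))
```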

Superscalar Microprocessors

IF: Fetch
ID: Decode/Dispatch
IS: Issue/Read Operands
EX: Execute
WB: Write Back

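[Figure: IF and ID feed N issue queues (IQ 1 to IQ N), each in front of its own execution unit (EX 1 to EX N); all execution units lead to WB. Stage labels: Fetch, Decode/Dispatch, Read Operands, Execute, Write Back.]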

Example: a loop running on a superscalar pipeline with three issue queues and execution units: integer (I-Q, INT), floating point (FP-Q, FP) and memory (M-Q, MEM).

Loop: L.S   F0,0(R1)
      L.S   F1,0(R2)
      ADD.S F2,F1,F0
      S.S   F2,0(R1)
      ADDI  R1,R1,#4
      ADDI  R2,R2,#4
      SUBI  R3,R3,#1
      BNEZ  R3,Loop

[Figure sequence: pipeline contents in cycles 1 to 12.]
Cycle 1: IF: L.S F0,0(R1)
Cycle 2: IF: L.S F1,0(R2); ID: L.S F0,0(R1)
Cycle 3: IF: ADD.S F2,F1,F0; ID: L.S F1,0(R2); M-Q: L.S F0,0(R1)
Cycle 4: IF: S.S F2,0(R1); ID: ADD.S F2,F1,F0; M-Q: L.S F1,0(R2); MEM: L.S F0,0(R1)
Cycle 5: IF: ADDI R1,R1,#4; ID: S.S F2,0(R1); FP-Q: ADD.S F2,F1,F0; MEM: L.S F1,0(R2); WB: L.S F0,0(R1)
Cycle 6: IF: ADDI R2,R2,#4; ID: ADDI R1,R1,#4; M-Q: S.S F2,0(R1); FP: ADD.S F2,F1,F0; WB: L.S F1,0(R2)
Cycle 7: IF: SUBI R3,R3,#1; ID: ADDI R2,R2,#4; I-Q: ADDI R1,R1,#4; M-Q: S.S F2,0(R1); FP: ADD.S F2,F1,F0
Cycle 8: IF: BNEZ R3,Loop; ID: SUBI R3,R3,#1; I-Q: ADDI R2,R2,#4; INT: ADDI R1,R1,#4; M-Q: S.S F2,0(R1); FP: ADD.S F2,F1,F0
The ADDI R1,R1,#4 is being executed out of program order with respect to the S.S F2,0(R1)!
Cycle 9: IF: L.S F0,0(R1); ID: BNEZ R3,Loop; I-Q: SUBI R3,R3,#1; INT: ADDI R2,R2,#4; M-Q: S.S F2,0(R1); FP: ADD.S F2,F1,F0; WB: ADDI R1,R1,#4
Cycle 10: IF: L.S F1,0(R2); ID: L.S F0,0(R1); I-Q: BNEZ R3,Loop; INT: SUBI R3,R3,#1; M-Q: S.S F2,0(R1); FP: ADD.S F2,F1,F0; WB: ADDI R2,R2,#4
Cycle 11: IF: L.S F1,0(R2); ID: L.S F0,0(R1); INT: BNEZ R3,Loop; M-Q: S.S F2,0(R1); FP: ADD.S F2,F1,F0; WB: SUBI R3,R3,#1
Cycle 12: IF: ADD.S F2,F1,F0; ID: L.S F1,0(R2); INT: BNEZ R3,Loop; M-Q: L.S F0,0(R1); MEM: S.S F2,0(R1); WB: ADD.S F2,F1,F0
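The remark that ADDI R1,R1,#4 executes out of program order with respect to S.S F2,0(R1) is about a register both instructions touch. The Python sketch below classifies such register relationships between an earlier and a later instruction. The terms RAW, WAR and WAW (read-after-write, write-after-read, write-after-write) are standard terminology rather than wording from the slides, and the parser handles only the operand patterns used in this loop.

```python
import re


def reads_writes(instruction):
    """Return (registers read, registers written) for the loop's opcodes."""
    opcode, operands = instruction.split(None, 1)
    registers = re.findall(r"[RF]\d+", operands)
    if opcode.startswith("S.") or opcode.startswith("B"):
        # Stores and branches only read registers.
        return set(registers), set()
    # Otherwise the first operand is the destination, the rest are sources.
    return set(registers[1:]), {registers[0]}


def dependences(earlier, later):
    """List the register dependences from an earlier to a later instruction."""
    reads_1, writes_1 = reads_writes(earlier)
    reads_2, writes_2 = reads_writes(later)
    found = []
    if writes_1 & reads_2:
        found.append("RAW")
    if reads_1 & writes_2:
        found.append("WAR")
    if writes_1 & writes_2:
        found.append("WAW")
    return found


print(dependences("S.S F2,0(R1)", "ADDI R1,R1,#4"))
print(dependences("ADD.S F2,F1,F0", "S.S F2,0(R1)"))
```

The store reads R1 and the later ADDI writes it, so letting the ADDI run ahead is only safe if the store still sees the old value of R1.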

Multicore Computers

[Figure: a microprocessor chip with three processors (cores), an L1 cache per core and an L2 cache, connected to memory.]

Blended Learning
Lectures on YouTube
9 interactive sessions, flipped-classroom style
5 problem solution sessions
1 design exploration project
Textbook: Parallel Computer Organization and Design (Dubois, Annavaram, Stenström)