Processor Architecture

Processor Architecture
Jinkyu Jeong (jinkyu@skku.edu)
Computer Systems Laboratory, Sungkyunkwan University
http://csl.skku.edu
SSE2030: Introduction to Computer Systems, Spring 2018

Moore's Law
- Gordon Moore @ Intel (1965)

Computer Architecture Trends (1)
- Pre-WWII: Mechanical calculating machines
- WWII-50s: Technology improvement
  - Relays -> vacuum tubes
  - High-level languages
- 60s: Miniaturization/packaging
  - Transistors
  - Integrated circuits (ICs)
- 70s: Semantic gap
  - Complex instruction set
  - Large support in hardware
  - Microcoding

Computer Architecture Trends (2)
- 80s: Keep it simple
  - RISC (Reduced Instruction Set Computer)
  - Shift complexity to software
- 90s: What to do with all these transistors?
  - Large on-chip cache
  - Prefetching hardware
  - Speculative execution
  - Special-purpose instructions and hardware
- 2000s: Multi-cores
  - Power wall
  - Parallel computing

CISC
- Complex Instruction Set Computer
  - Dominant style through mid-80s
  - Add instructions to perform typical programming tasks
- Stack-oriented instruction set
  - Use stack to pass arguments, save program counter
  - Explicit push and pop instructions
- Arithmetic instructions can access memory
  - addl %eax, 12(%ebx, %ecx, 4)
  - Requires memory read/write and complex address calculation
- Condition codes
  - Set as side effect of arithmetic and logical instructions
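
The memory operand in the `addl` example packs a displacement, a base register, an index register, and a scale factor into one instruction. A minimal Python sketch of that address calculation and of the read-modify-write it implies follows; the register and memory contents are hypothetical values chosen only for illustration.

```python
def effective_address(disp, base, index, scale):
    """x86 memory operand disp(base, index, scale) -> disp + base + index * scale."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 scale factor must be 1, 2, 4, or 8")
    return (disp + base + index * scale) & 0xFFFFFFFF


def addl_to_memory(regs, mem, src, disp, base, index, scale):
    """Model `addl %src, disp(%base, %index, scale)`: read memory, add, write back."""
    addr = effective_address(disp, regs[base], regs[index], scale)
    mem[addr] = (mem.get(addr, 0) + regs[src]) & 0xFFFFFFFF
    return addr


# Hypothetical register and memory contents (not from the slides).
regs = {"eax": 5, "ebx": 0x1000, "ecx": 3}
mem = {0x1018: 37}
addr = addl_to_memory(regs, mem, "eax", 12, "ebx", "ecx", 4)
print(hex(addr), mem[addr])
```

One instruction here performs an address computation, a memory read, an addition, and a memory write, which is the kind of work a RISC design splits across separate load, add, and store instructions.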

RISC
- Reduced Instruction Set Computer
- Fewer, simpler instructions
  - Might take more instructions to get a given task done
  - Can be decoded easily
  - Can execute them with small and fast hardware
- Register-oriented instruction set
  - Many more (typically 32+) registers
  - Use for arguments, return pointer, temporaries
- Only load and store instructions can access memory
  - Single addressing mode: base register + displacement
- No condition codes
  - Test instructions return 0/1 in register

MIPS Instruction Formats
- R-type: op | rs | rt | rd | shamt | funct
    add   $3, $2, $1      ; R[3] = R[2] + R[1]
- I-type: op | rs | rt | 16-bit address
    addiu $3, $2, 1234    ; R[3] = R[2] + 1234
    lw    $3, $2, 12      ; R[3] = Mem[R[2] + 12]
    beqz  $3, dest        ; if (R[3] == 0) PC = PC + 4 + dest
- J-type: op | 26-bit address
    j     dest            ; PC = dest
    jal   dest            ; R[31] = PC + 4, PC = dest
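
To make the three field layouts concrete, here is a small Python sketch that packs fields into 32-bit words. The field widths (6/5/5/5/5/6, 6/5/5/16, 6/26) and the opcode and funct values are those of standard MIPS32; the slides use a simplified S-MIPS and write the load as `lw $3, $2, 12`, so treat the exact encodings as an assumption. The jump target value is hypothetical.

```python
def encode_r(op, rs, rt, rd, shamt, funct):
    """R-type: op(6) rs(5) rt(5) rd(5) shamt(5) funct(6)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def encode_i(op, rs, rt, imm):
    """I-type: op(6) rs(5) rt(5) immediate(16); the immediate is truncated to 16 bits."""
    return (op << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def encode_j(op, target):
    """J-type: op(6) target(26)."""
    return (op << 26) | (target & 0x3FFFFFF)


# add $3, $2, $1: rd=3, rs=2, rt=1, opcode 0, funct 0x20 (standard MIPS32)
add_word = encode_r(0, 2, 1, 3, 0, 0x20)
# load R[3] = Mem[R[2] + 12]: rs=2, rt=3, opcode 0x23 (standard MIPS32 lw)
lw_word = encode_i(0x23, 2, 3, 12)
# jump with a hypothetical 26-bit target field of 0x100, opcode 0x02
j_word = encode_j(0x02, 0x100)
print(f"{add_word:#010x} {lw_word:#010x} {j_word:#010x}")
```

Because every format keeps `op` in the top six bits and `rs`/`rt` in the same positions, a decoder can read the register file before it knows which format it is looking at.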

MIPS Registers
 #   Name    Usage
 0   $zero   The constant value 0
 1   $at     Assembler temporary
 2   $v0     Values for results and expression evaluation
 3   $v1
 4   $a0     Arguments
 5   $a1
 6   $a2
 7   $a3
 8   $t0     Temporaries (caller-save registers)
 9   $t1
 10  $t2
 11  $t3
 12  $t4
 13  $t5
 14  $t6
 15  $t7
 16  $s0     Saved temporaries (callee-save registers)
 17  $s1
 18  $s2
 19  $s3
 20  $s4
 21  $s5
 22  $s6
 23  $s7
 24  $t8     More temporaries (caller-save registers)
 25  $t9
 26  $k0     Reserved for OS kernel
 27  $k1
 28  $gp     Global pointer
 29  $sp     Stack pointer
 30  $fp     Frame pointer
 31  $ra     Return address

S-MIPS Datapath (5 stages)
[Figure: five-stage datapath, IF -> ID -> EX -> MEM -> WB. Labeled components: PC, +4 adder, Instr. Memory, Register File, sign extender (16 -> 32 bits), Zero? test, Output, Data Memory, LMD, SMD.]

S-MIPS Datapath Example (1)
- Instruction: add $r3, $r2, $r1 ; R[3] <- R[2] + R[1]
[Figure: the five-stage datapath with this instruction traced through it. The register file reads $r1 and $r2, the EX stage produces $r1+$r2, and the result is written to $r3.]

S-MIPS Datapath Example (2)
- LOAD instruction: lw $r3, $r2, 12 ; R[3] <- Mem[R[2] + 12]
[Figure: the five-stage datapath with this instruction traced through it. The register file reads $r2, the immediate 12 is sign-extended from 16 to 32 bits (0x0000000c), and LMD receives Mem[R[2]+12], which is written to $r3.]
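
The sign-extension step in the load example can be sketched in a few lines of Python. The function names and the base register value 0x2000 are hypothetical; the 16-to-32-bit widths come from the slide.

```python
def sign_extend16(value):
    """Sign-extend a 16-bit field to a 32-bit two's-complement word."""
    value &= 0xFFFF
    if value & 0x8000:          # bit 15 set: the field is negative
        value |= 0xFFFF0000     # copy the sign bit into the upper 16 bits
    return value


def load_address(base, imm16):
    """Effective address of lw: base register plus sign-extended displacement."""
    return (base + sign_extend16(imm16)) & 0xFFFFFFFF


print(f"{sign_extend16(12):#010x}")        # displacement 12, as on the slide
print(f"{sign_extend16(0xFFF4):#010x}")    # a negative displacement (-12)
print(hex(load_address(0x2000, 12)))       # hypothetical R[2] = 0x2000
```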

S-MIPS Datapath Example (3)
- Branch instruction: beqz $r3, 16 ; PC += (R[3] == 0) ? 4+16 : 4
[Figure: the five-stage datapath with this instruction traced through it. The register file reads $r3 for the Zero? test, and the immediate 16 is sign-extended from 16 to 32 bits (0x00000010).]
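
The PC update for the branch example can be written directly from the slide's formula. This is a sketch of the simplified S-MIPS semantics only; real MIPS additionally scales the branch offset by 4, and the starting PC of 100 is a hypothetical value.

```python
def next_pc_beqz(pc, reg_value, offset):
    """PC update for beqz as written on the slide: PC += (R == 0) ? 4 + offset : 4."""
    return pc + 4 + offset if reg_value == 0 else pc + 4


print(next_pc_beqz(100, 0, 16))   # branch taken: R[3] == 0
print(next_pc_beqz(100, 7, 16))   # branch not taken: fall through to PC + 4
```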

Pipelining in Real World

Pipelining
- Pipelining doesn't help the latency of a single task; it helps the throughput of the entire workload
- Pipeline rate is limited by the slowest pipeline stage
- Multiple tasks operate simultaneously
- Potential speedup = number of pipeline stages
- Unbalanced lengths of pipe stages reduce speedup
- Time to fill the pipeline and time to drain it reduce speedup
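
The points above can be checked with a little arithmetic. The sketch below assumes an idealized pipeline with no stalls whose clock period equals the slowest stage; the stage times are hypothetical unit values.

```python
def unpipelined_time(stage_times, n_tasks):
    """Each task runs all stages back to back before the next task starts."""
    return n_tasks * sum(stage_times)


def pipelined_time(stage_times, n_tasks):
    """Clock is set by the slowest stage; k - 1 extra cycles fill and drain the pipe."""
    cycle = max(stage_times)
    return (len(stage_times) + n_tasks - 1) * cycle


def speedup(stage_times, n_tasks):
    return unpipelined_time(stage_times, n_tasks) / pipelined_time(stage_times, n_tasks)


balanced = [1, 1, 1, 1, 1]      # hypothetical: five equal stages
unbalanced = [1, 1, 2, 1, 1]    # hypothetical: one stage twice as slow
for n in (1, 5, 1000):
    print(n, round(speedup(balanced, n), 2), round(speedup(unbalanced, n), 2))
```

With a single task there is no gain at all, and with the unbalanced stages the single task actually takes longer. Only as the number of tasks grows does the balanced case approach the stage count of 5, while the unbalanced case stays well below it.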

Instruction Pipelining
- Why pipelining?
  - Execute billions of instructions, so throughput is what matters
- What is desirable in instruction sets for pipelining?
  - Variable-length instructions vs. all instructions the same length?
  - Memory operands as part of any operation vs. memory operands only in loads and stores?
  - Register operands in many places in the instruction format vs. registers located in the same place?

Pipelined S-MIPS Datapath
[Figure: the five-stage datapath (IF, ID, EX, MEM, WB) with pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB inserted between the stages.]

Visualizing Pipeline
[Figure: pipeline diagram. Horizontal axis: time (clock cycles), CC1 to CC9. Vertical axis: instruction order. The first cycles are marked as filling the pipeline and the last cycles as draining it.]
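
A diagram of this kind can be regenerated as text. The sketch below assumes five instructions with no stalls, which is what yields the nine clock cycles CC1 to CC9; the row labels I1 to I5 are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(n_instructions):
    """Return one row per instruction; instruction i enters IF in cycle i + 1."""
    n_cycles = len(STAGES) + n_instructions - 1
    rows = []
    for i in range(n_instructions):
        row = [""] * n_cycles
        for s, name in enumerate(STAGES):
            row[i + s] = name
        rows.append(row)
    return rows


def render(rows):
    """Format the rows as a fixed-width text table with a CCn header."""
    header = "     " + "".join(f"{'CC' + str(c + 1):<5}" for c in range(len(rows[0])))
    lines = [header]
    for i, row in enumerate(rows):
        lines.append(f"I{i + 1:<4}" + "".join(f"{cell:<5}" for cell in row))
    return "\n".join(lines)


rows = pipeline_diagram(5)
print(render(rows))
```

Only the middle column (CC5) has all five stages busy; the columns before it are the filling phase and the columns after it are the draining phase.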

Pipeline Stall
- Hazards prevent the next instruction from executing during its designated clock cycle
  - Structural hazards: HW cannot support the combination of instructions due to lack of HW capacity
  - Data hazards: Instruction depends on the result of a prior instruction still in the pipeline
  - Control hazards: Caused by the delay between the fetching of instructions and decisions about changes in control flow
- Common solution is to stall the pipeline until the hazard is resolved, inserting one or more bubbles (idle clock cycles) in the pipeline

Structural Hazard
[Figure: pipeline diagram, time (clock cycles) CC1 to CC9 vs. instruction order.]
- Access memory from two instructions in the same cycle
- Use dual-port memory to support two simultaneous accesses
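
A minimal sketch of how such a conflict arises, assuming a single memory shared by instruction fetch and data access, a five-stage pipeline with MEM as the fourth stage, and no stalls. The instruction sequence is hypothetical.

```python
def memory_conflicts(instructions, ports=1):
    """Return the clock cycles in which memory accesses exceed the number of ports."""
    accesses = {}
    for i, op in enumerate(instructions):
        fetch_cycle = i + 1                      # instruction i is in IF in cycle i + 1
        accesses[fetch_cycle] = accesses.get(fetch_cycle, 0) + 1
        if op in ("lw", "sw"):                   # only loads and stores use MEM
            mem_cycle = i + 4                    # MEM is the fourth stage
            accesses[mem_cycle] = accesses.get(mem_cycle, 0) + 1
    return sorted(c for c, n in accesses.items() if n > ports)


program = ["lw", "add", "sub", "and", "or"]      # hypothetical sequence
print(memory_conflicts(program, ports=1))
print(memory_conflicts(program, ports=2))
```

With one port, the load's data access collides with the fetch of the fourth instruction; with two ports the same program has no conflicting cycle.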

Data Hazard
[Figure: pipeline diagram, time (clock cycles) CC1 to CC9, for the sequence below. An inset shows one clock cycle in which the register file first stores into Ri and then reads from Ri.]
    ADD R1,R2,R3
    SUB R4,R1,R3
    AND R6,R1,R7
    OR  R8,R1,R9
    XOR R10,R11,R1
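
The read-after-write dependences in this sequence can be found mechanically. The sketch below assumes no forwarding and a register file that is written in the first half of a cycle and read in the second half, so a consumer three or more instructions behind its producer is safe (the producer's WB falls in the same cycle as the consumer's ID). The helper names are hypothetical.

```python
def parse(instr):
    """Split 'OP Rd,Rs,Rt' into (op, destination, [sources])."""
    op, operands = instr.split(maxsplit=1)
    regs = [r.strip() for r in operands.split(",")]
    return op, regs[0], regs[1:]


def raw_hazards(program, min_safe_distance=3):
    """List (producer, consumer, distance) for reads that arrive too early."""
    hazards = []
    for i, producer in enumerate(program):
        _, dest, _ = parse(producer)
        for j in range(i + 1, len(program)):
            _, later_dest, sources = parse(program[j])
            distance = j - i
            if dest in sources and distance < min_safe_distance:
                hazards.append((producer, program[j], distance))
            if later_dest == dest:   # register overwritten; later reads see the new value
                break
    return hazards


program = [
    "ADD R1,R2,R3",
    "SUB R4,R1,R3",
    "AND R6,R1,R7",
    "OR R8,R1,R9",
    "XOR R10,R11,R1",
]
for producer, consumer, distance in raw_hazards(program):
    print(f"{consumer} reads R1 {distance} instruction(s) after {producer}")
```

Under these assumptions SUB and AND read R1 before ADD has written it, while OR and XOR are far enough behind to see the new value.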

Control Hazard
- Three-stage stall on branches
    10: beq r1,r3,22
    14: and r2,r3,r5
    18: or  r6,r1,r7
    22: add r8,r1,r9
    36: xor r10,r1,r11
- Figure annotations, in time order:
  - We don't know yet that the instruction being executed is a branch. Fetch the branch successor.
  - Now we know the instruction being executed is a branch, but stall until the branch target address is known.
  - Now the target address is available.
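
The cost of a three-cycle branch stall can be estimated with the usual stall-cycle accounting: effective CPI is the ideal CPI plus stall cycles per instruction. The sketch assumes an ideal CPI of 1 and no other hazards; the branch fraction of 0.2 is a hypothetical workload figure, not one taken from the slides.

```python
def cpi_with_branch_stalls(branch_fraction, stall_cycles=3, base_cpi=1.0):
    """Effective CPI when every branch inserts a fixed number of stall cycles."""
    return base_cpi + branch_fraction * stall_cycles


def pipeline_speedup(depth, branch_fraction, stall_cycles=3):
    """Speedup over an unpipelined design, assuming an ideal speedup equal to the depth."""
    return depth / cpi_with_branch_stalls(branch_fraction, stall_cycles)


print(cpi_with_branch_stalls(0.2))
print(round(pipeline_speedup(5, 0.2), 3))
```

Even a modest share of branches pulls a five-stage pipeline well below its potential speedup of 5, which is why control hazards get so much attention.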

Summary
- CISC vs. RISC
  - CISC: easy for compiler, fewer code bytes
  - RISC: better for optimizing compilers, can make run fast with simple chip design
- CISC vs. RISC: Current status
  - For desktop processors, choice of ISA is not a technical issue
    - With enough hardware, can make anything run fast
    - Code compatibility is more important
  - For embedded processors, RISC makes sense
    - Smaller, cheaper, less power

Summary (cont'd)
- Pipelining
  - Improved throughput
- Problems in pipelining
  - Structural hazards
  - Data hazards
  - Control hazards
- Instruction set design affects the complexity of pipeline implementation