Computer Architecture Review Jo, Heeseung

Computer Abstractions and Technology Jo, Heeseung

Below Your Program
Application software
- Written in high-level language
System software
- Compiler: translates HLL code to machine code
- Operating System: service code
  - Handling input/output
  - Managing memory and storage
  - Scheduling tasks & sharing resources
Hardware
- Processor, memory, I/O controllers

Levels of Program Code
High-level language
- Level of abstraction closer to the problem domain
- Provides for productivity and portability
Assembly language
- Textual representation of instructions
Hardware representation
- Binary digits (bits)
- Encoded instructions and data

Components of a Computer
Same components for all kinds of computer
- Desktop, server, embedded
Input/output includes
- User-interface devices: display, keyboard, mouse
- Storage devices: hard disk, CD/DVD, flash
- Network adapters: for communicating with other computers

Arithmetic for Computers Jo, Heeseung

Integer Addition
Example: 7 + 6
Overflow if result out of range
- Adding +ve and -ve operands: no overflow
- Adding two +ve operands: overflow if result sign is 1
- Adding two -ve operands: overflow if result sign is 0

Integer Subtraction
Add negation of second operand
Example: 7 - 6 = 7 + (-6)
  +7: 0000 0000 0000 0111
  -6: 1111 1111 1111 1010
  +1: 0000 0000 0000 0001
Overflow if result out of range
- Subtracting two +ve or two -ve operands: no overflow
- Subtracting +ve from -ve operand: overflow if result sign is 0
- Subtracting -ve from +ve operand: overflow if result sign is 1
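These sign rules translate directly into code. A minimal C sketch (the function names are my own; unsigned arithmetic is used only to sidestep C's undefined behavior on signed overflow):

#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

/* Adding +ve and -ve operands never overflows; two operands of the
   same sign overflow exactly when the result's sign flips. */
bool add_overflows(int32_t a, int32_t b) {
    int32_t sum = (int32_t)((uint32_t)a + (uint32_t)b);
    return (a >= 0) == (b >= 0) && (sum >= 0) != (a >= 0);
}

/* a - b is computed as a + (-b), negating by "invert and add 1".
   Overflow only when the operand signs differ and the result's sign
   matches the second operand's. */
bool sub_overflows(int32_t a, int32_t b) {
    int32_t diff = (int32_t)((uint32_t)a + ~(uint32_t)b + 1u);
    return (a >= 0) != (b >= 0) && (diff >= 0) == (b >= 0);
}

int main(void) {
    printf("%d\n", add_overflows(7, 6));          /* 0: 7 + 6 fits */
    printf("%d\n", add_overflows(INT32_MAX, 1));  /* 1: two +ve, result sign 1 */
    printf("%d\n", sub_overflows(INT32_MIN, 1));  /* 1: +ve from -ve, result sign 0 */
    return 0;
}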

Representation of Negative Numbers (figure)

Floating Point
Representation for non-integral numbers
- Including very small and very large numbers
Like scientific notation
- -2.34 × 10^56 (normalized)
- +0.002 × 10^-4 (not normalized)
- +987.02 × 10^9 (not normalized)
In binary
- ±1.xxxxxxx_2 × 2^yyyy
Types float and double in C

Floating Point Standard
Defined by IEEE Std 754-1985
- Developed in response to divergence of representations: portability issues for scientific code
- Now almost universally adopted
Two representations
- Single precision (32-bit)
- Double precision (64-bit)

IEEE Floating-Point Format
Layout: S | Exponent | Fraction
- Exponent: 8 bits (single), 11 bits (double)
- Fraction: 23 bits (single), 52 bits (double)

x = (-1)^S × (1 + Fraction) × 2^(Exponent - Bias)

S: sign bit (0 = non-negative, 1 = negative)
Normalized significand: 1.0 ≤ significand < 2.0
- Always has a leading pre-binary-point 1 bit, so no need to represent it explicitly (hidden bit)
- Significand is the Fraction with the "1." restored
Exponent: excess representation: actual exponent + Bias
- Ensures the exponent field is unsigned
- Single: Bias = 127; Double: Bias = 1023

Floating-Point Example
Represent -0.75
- -0.75 = (-1)^1 × 1.1_2 × 2^(-1)
- S = 1
- Fraction = 1000 0...0_2
- Exponent = -1 + Bias
  - Single: -1 + 127 = 126 = 0111 1110_2
  - Double: -1 + 1023 = 1022 = 011 1111 1110_2
Single: 1011 1111 0100 0...0
Double: 1011 1111 1110 1000 0...0

Floating-Point Example
What number is represented by the single-precision float 1 10000001 01000...0?
- S = 1
- Fraction = 01000...0_2
- Exponent = 10000001_2 = 129

x = (-1)^1 × (1 + 0.01_2) × 2^(129 - 127) = -1.25 × 2^2 = -5.0
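Both examples can be checked mechanically. A small C sketch, assuming the platform's float is IEEE 754 single precision (true on essentially all current hardware); the decode helper is illustrative:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Print the S / Exponent / Fraction fields of
   x = (-1)^S x (1 + Fraction) x 2^(Exponent - Bias), Bias = 127. */
static void decode(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);     /* reinterpret the raw 32 bits */
    uint32_t s    = bits >> 31;
    uint32_t exp  = (bits >> 23) & 0xFF;
    uint32_t frac = bits & 0x7FFFFF;
    printf("%g: S=%u Exponent=%u (actual %d) Fraction=0x%06X\n",
           f, s, exp, (int)exp - 127, frac);
}

int main(void) {
    decode(-0.75f);  /* S=1, Exponent=126, Fraction=0x400000 (1000 0...0) */
    decode(-5.0f);   /* S=1, Exponent=129, Fraction=0x200000 (0100 0...0) */
    return 0;
}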

Denormal Numbers
Exponent = 000...0 → hidden bit is 0

x = (-1)^S × (0 + Fraction) × 2^(1 - Bias)

- Smaller than normal numbers: allows gradual underflow, with diminishing precision
Denormal with Fraction = 000...0:

x = (-1)^S × (0 + 0) × 2^(1 - Bias) = ±0.0

- Two representations of 0.0!

Infinities and NaNs
Exponent = 111...1, Fraction = 000...0
- ±Infinity
- Can be used in subsequent calculations, avoiding the need for overflow checks
Exponent = 111...1, Fraction ≠ 000...0
- Not-a-Number (NaN)
- Indicates an illegal or undefined result, e.g., 0.0 / 0.0
- Can be used in subsequent calculations
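A short C sketch makes the special cases visible (assuming IEEE 754 single precision; bits_of is an illustrative helper, and the volatile zero only stops the compiler from folding the divisions away):

#include <stdio.h>
#include <stdint.h>
#include <string.h>

static uint32_t bits_of(float f) {
    uint32_t b;
    memcpy(&b, &f, sizeof b);
    return b;
}

int main(void) {
    /* Two representations of 0.0: equal as values, different bit patterns. */
    printf("+0.0 == -0.0: %d (bits 0x%08X vs 0x%08X)\n",
           0.0f == -0.0f, bits_of(0.0f), bits_of(-0.0f));

    /* Smallest denormal: exponent 000...0, fraction 000...01 = 2^-149. */
    float denorm;
    uint32_t one = 1;
    memcpy(&denorm, &one, sizeof denorm);
    printf("smallest denormal = %g\n", denorm);

    /* Infinity and NaN can flow through later calculations. */
    volatile float zero = 0.0f;
    float inf = 1.0f / zero;   /* exponent 111...1, fraction = 0 */
    float nan = zero / zero;   /* exponent 111...1, fraction != 0 */
    printf("inf = %g, nan = %g, nan == nan: %d\n", inf, nan, nan == nan);
    return 0;
}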

The Processor Jo, Heeseung

Instruction Set
The repertoire of instructions of a computer
Different computers have different instruction sets
- But with many aspects in common
Early computers had very simple instruction sets
- Simplified implementation
Many modern computers also have simple instruction sets

Translation and Startup
Many compilers produce object modules directly
Static linking

CPU Overview (figure)

Pipelining Analogy
Pipelined laundry: overlapping execution
- Parallelism improves performance
Four loads, each taking 2 hours through four half-hour stages:
- Sequential: 4 × 2 = 8 hours; pipelined: the last load finishes at 3.5 hours
- Speedup = 8/3.5 = 2.3
Non-stop (a long stream of loads):
- Speedup approaches the number of stages

Hazards
Situations that prevent starting the next instruction in the next cycle
Structure hazards
- A required resource is busy
Data hazards
- Need to wait for a previous instruction to complete its data read/write
Control hazards
- Deciding on a control action depends on a previous instruction

Structure Hazards
Conflict for use of a resource
In a MIPS pipeline with a single memory
- Load/store requires data access
- Instruction fetch would have to stall for that cycle (would cause a pipeline "bubble")
Hence, pipelined datapaths require separate instruction/data memories
- Or separate instruction/data caches

Data Hazards
An instruction depends on completion of a data access by a previous instruction

  add $s0, $t0, $t1
  sub $t2, $s0, $t3

Forwarding (aka Bypassing)
Use the result as soon as it is computed
- Don't wait for it to be stored in a register
- Requires extra connections in the datapath

Load-Use Data Hazard
Can't always avoid stalls by forwarding
- If the value is not computed when needed
- Can't forward backward in time!

Code Scheduling to Avoid Stalls
Reorder code to avoid use of a load result in the next instruction
C code for A = B + E; C = B + F;

Original order (13 cycles; each add uses a value loaded by the instruction just before it):
  lw  $t1, 0($t0)
  lw  $t2, 4($t0)
  add $t3, $t1, $t2    <- stall
  sw  $t3, 12($t0)
  lw  $t4, 8($t0)
  add $t5, $t1, $t4    <- stall
  sw  $t5, 16($t0)

Reordered (11 cycles):
  lw  $t1, 0($t0)
  lw  $t2, 4($t0)
  lw  $t4, 8($t0)
  add $t3, $t1, $t2
  sw  $t3, 12($t0)
  add $t5, $t1, $t4
  sw  $t5, 16($t0)

Control Hazards
Branch determines flow of control
- Fetching the next instruction depends on the branch outcome
- Pipeline can't always fetch the correct instruction (still working on the ID stage of the branch)
In the MIPS pipeline
- Need to compare registers and compute the target early in the pipeline
- Add hardware to do it in the ID stage

Stall on Branch
Wait until the branch outcome is determined before fetching the next instruction

Branch Prediction
Longer pipelines can't readily determine the branch outcome early
- Stall penalty becomes unacceptable
Predict the outcome of the branch
- Only stall if the prediction is wrong
In the MIPS pipeline
- Can predict branches not taken
- Fetch the instruction after the branch, with no delay

MIPS with Predict Not Taken
Figure: pipeline timing when the prediction is correct vs. when it is incorrect

More-Realistic Branch Prediction
Static branch prediction
- Based on typical branch behavior
- Example: loop and if-statement branches
  - Predict backward branches taken
  - Predict forward branches not taken
Dynamic branch prediction
- Hardware measures actual branch behavior (e.g., records the recent history of each branch)
- Assume future behavior will continue the trend
  - When wrong, stall while re-fetching, and update the history
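As a concrete illustration of recording branch history, here is a minimal sketch of a classic 2-bit saturating-counter predictor (not from the slides; table size and indexing are assumptions). It only flips its prediction after being wrong twice in a row, so a loop branch that is almost always taken stays predicted taken:

#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#define PRED_ENTRIES 1024   /* illustrative table size */

/* One 2-bit saturating counter per entry:
   0-1 = predict not taken, 2-3 = predict taken. */
static uint8_t counters[PRED_ENTRIES];

static unsigned index_of(uint32_t pc) {
    return (pc >> 2) % PRED_ENTRIES;   /* low PC bits select a counter */
}

bool predict_taken(uint32_t pc) {
    return counters[index_of(pc)] >= 2;
}

/* Called when the branch resolves: nudge the recorded history
   toward the actual outcome, saturating at 0 and 3. */
void train(uint32_t pc, bool taken) {
    uint8_t *c = &counters[index_of(pc)];
    if (taken  && *c < 3) (*c)++;
    if (!taken && *c > 0) (*c)--;
}

int main(void) {
    for (int trip = 0; trip < 10; trip++)
        train(0x400100, trip < 9);     /* loop branch: taken 9 of 10 times */
    printf("predict taken? %d\n", predict_taken(0x400100));   /* 1 */
    return 0;
}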

Large and Fast: Exploiting Memory Hierarchy Jo, Heeseung

Memory Technology
Static RAM (SRAM): 0.5ns - 2.5ns, $2000 - $5000 per GB
Dynamic RAM (DRAM): 50ns - 70ns, $20 - $75 per GB
Magnetic disk: 5ms - 20ms, $0.20 - $2 per GB
Ideal memory
- Access time of SRAM
- Capacity and cost/GB of disk

Registers vs. Memory
Qureshi et al. (IBM Research), "Scalable High Performance Main Memory System Using Phase-Change Memory Technology," ISCA 2009.

Principle of Locality
Programs access a small proportion of their address space at any time
Temporal locality
- Items accessed recently are likely to be accessed again soon
- e.g., instructions in a loop, induction variables
Spatial locality
- Items near those accessed recently are likely to be accessed soon
- e.g., sequential instruction access, array data
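Both kinds of locality appear in ordinary code. A C sketch (array name and size are illustrative): the row-major loop walks consecutive addresses, while the column-major loop strides a whole row between accesses.

#include <stdio.h>

#define N 1024

static double a[N][N];   /* C stores this row by row */

/* Spatial locality: consecutive 8-byte steps through memory.
   Temporal locality: sum and the indices are reused every iteration. */
double sum_row_major(void) {
    double sum = 0.0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += a[i][j];
    return sum;
}

/* Same arithmetic, but each access is N * 8 bytes from the last,
   so it gets far less benefit from spatial locality. */
double sum_col_major(void) {
    double sum = 0.0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            sum += a[i][j];
    return sum;
}

int main(void) {
    printf("%g %g\n", sum_row_major(), sum_col_major());
    return 0;
}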

Taking Advantage of Locality
Memory hierarchy
- Store everything on disk
- Copy recently accessed (and nearby) items from disk to smaller DRAM memory: main memory
- Copy more recently accessed (and nearby) items from DRAM to smaller SRAM memory: cache memory attached to the CPU

Memory Hierarchy Levels
Block (aka line): the unit of copying
- May be multiple words
If accessed data is present in the upper level
- Hit: access satisfied by the upper level
- Hit ratio: hits/accesses
If accessed data is absent
- Miss: block copied from the lower level
  - Time taken: miss penalty
  - Miss ratio: misses/accesses = 1 - hit ratio
- Then accessed data is supplied from the upper level
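A toy simulation makes the hit/miss bookkeeping concrete. This sketch assumes one specific placement policy (direct-mapped, chosen only for brevity; sizes and names are illustrative, not from the slides):

#include <stdio.h>
#include <stdint.h>

#define NUM_BLOCKS 64    /* blocks held in the upper level (illustrative) */
#define BLOCK_SIZE 16    /* bytes per block; may be multiple words */

static uint32_t tags[NUM_BLOCKS];
static int      present[NUM_BLOCKS];
static long     hits, accesses;

/* One access: a hit if the block holding addr is already in the upper
   level, otherwise a miss that copies the block from the lower level. */
void cache_access(uint32_t addr) {
    uint32_t block = addr / BLOCK_SIZE;
    uint32_t slot  = block % NUM_BLOCKS;
    accesses++;
    if (present[slot] && tags[slot] == block) {
        hits++;                       /* satisfied by the upper level */
    } else {
        present[slot] = 1;            /* miss penalty paid here */
        tags[slot]    = block;
    }
}

int main(void) {
    for (int pass = 0; pass < 2; pass++)        /* re-reading the same data */
        for (uint32_t a = 0; a < 1024; a += 4)  /* sequential word accesses */
            cache_access(a);
    double hit_ratio = (double)hits / accesses;
    printf("hit ratio = %.3f, miss ratio = %.3f\n", hit_ratio, 1 - hit_ratio);
    return 0;
}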

Virtual Memory
Use main memory as a "cache" for secondary (disk) storage
- Managed jointly by CPU hardware and the operating system (OS)
Programs share main memory
- Each gets a private virtual address space holding its frequently used code and data
- Protected from other programs
CPU and OS translate virtual addresses to physical addresses
- A VM "block" is called a page
- A VM translation "miss" is called a page fault

Address Translation
Fixed-size pages (e.g., 4K)
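With fixed-size 4 KiB pages, translation replaces only the upper address bits; the 12-bit page offset passes through unchanged. A minimal sketch (the 32-bit address width and sample value are assumptions):

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t va     = 0x12345678;   /* a sample 32-bit virtual address */
    uint32_t vpn    = va >> 12;     /* virtual page number (upper 20 bits) */
    uint32_t offset = va & 0xFFF;   /* byte offset within the 4 KiB page */
    printf("VPN = 0x%05X, offset = 0x%03X\n", vpn, offset);
    return 0;
}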

Page Fault Penalty
On a page fault, the page must be fetched from disk
- Takes millions of clock cycles
- Handled by OS code
Try to minimize the page fault rate
- Fully associative placement
- Smart replacement algorithms

Page Tables
Stores placement information
- Array of page table entries (PTEs), indexed by virtual page number
- Page table register in the CPU points to the page table in physical memory
If the page is present in memory
- PTE stores the physical page number
- Plus other status bits (referenced, dirty, ...)
If the page is not present
- PTE can refer to a location in swap space on disk
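In code form, a flat page table is exactly that array of PTEs indexed by virtual page number. A hedged C sketch (the single-level layout, field names, and sizes are illustrative simplifications of real hardware/OS structures):

#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#define PAGE_SHIFT 12                /* 4 KiB pages */
#define NUM_PAGES  (1u << 20)        /* 32-bit VA / 4 KiB pages */

/* Hypothetical PTE: placement info plus status bits. */
typedef struct {
    bool     valid;                  /* page present in physical memory? */
    bool     referenced, dirty;      /* status bits */
    uint32_t ppn;                    /* physical page number */
} pte_t;

static pte_t page_table[NUM_PAGES];  /* indexed by virtual page number */

/* Returns false on a page fault; the OS would then locate the page
   in swap space, fetch it, update the PTE, and retry. */
bool translate(uint32_t va, uint32_t *pa) {
    pte_t *pte = &page_table[va >> PAGE_SHIFT];
    if (!pte->valid)
        return false;                /* page fault */
    pte->referenced = true;
    *pa = (pte->ppn << PAGE_SHIFT) | (va & ((1u << PAGE_SHIFT) - 1));
    return true;
}

int main(void) {
    page_table[0x12345] = (pte_t){ .valid = true, .ppn = 0x00042 };
    uint32_t pa;
    if (translate(0x12345678, &pa))
        printf("0x12345678 -> 0x%08X\n", pa);   /* 0x00042678 */
    return 0;
}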

Translation Using a Page Table (figure)

Mapping Pages to Storage (figure)

Fast Translation Using a TLB
Address translation would appear to require extra memory references
- One to access the PTE
- Then the actual memory access
But access to page tables has good locality
- So use a fast cache of PTEs within the CPU
- Called a Translation Look-aside Buffer (TLB)
- Typical: 16-512 PTEs, 0.5-1 cycle for a hit, 10-100 cycles for a miss, 0.01%-1% miss rate
- Misses can be handled by hardware or software
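The TLB is then a small cache consulted before the page-table walk. A hedged sketch (fully associative, with a deliberately naive refill policy; the identity-mapped translate stub stands in for the page-table walk above, and all names are illustrative):

#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#define TLB_ENTRIES 16   /* within the slide's typical 16-512 range */
#define PAGE_SHIFT  12

typedef struct {
    bool     valid;
    uint32_t vpn;        /* virtual page number: the tag matched on */
    uint32_t ppn;        /* physical page number cached from the PTE */
} tlb_entry_t;

static tlb_entry_t tlb[TLB_ENTRIES];

/* Stand-in for the full page-table walk; identity mapping purely so
   this example runs on its own. */
static bool translate(uint32_t va, uint32_t *pa) {
    *pa = va;
    return true;
}

bool tlb_translate(uint32_t va, uint32_t *pa) {
    uint32_t vpn = va >> PAGE_SHIFT;
    uint32_t off = va & ((1u << PAGE_SHIFT) - 1);
    for (int i = 0; i < TLB_ENTRIES; i++)        /* TLB hit: ~0.5-1 cycle */
        if (tlb[i].valid && tlb[i].vpn == vpn) {
            *pa = (tlb[i].ppn << PAGE_SHIFT) | off;
            return true;
        }
    if (!translate(va, pa))                      /* TLB miss: 10-100 cycles */
        return false;                            /* page fault */
    tlb[0] = (tlb_entry_t){ true, vpn, *pa >> PAGE_SHIFT };   /* naive refill */
    return true;
}

int main(void) {
    uint32_t pa;
    tlb_translate(0x12345678, &pa);   /* miss: walks the table, refills */
    tlb_translate(0x12345ABC, &pa);   /* same page: TLB hit */
    printf("pa = 0x%08X\n", pa);
    return 0;
}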

Fast Translation Using a TLB (figure)