EN2910A: Advanced Computer Architecture Topic 02: Review of classical concepts
1 EN2910A: Advanced Computer Architecture Topic 02: Review of classical concepts Prof. Sherief Reda School of Engineering Brown University S. Reda EN2910A FALL'15 1
2 Classical concepts (prerequisite)
1. Instruction set architecture (ISA)
2. Pipelining
3. Cache memory
4. Virtual memory
5. DRAM
This set of lectures is a refresher only. You should already know this material; it is a prerequisite for enrolling in this class.
3 Steps of program execution
Execution cycle: fetch instruction (at PC) → decode instruction → fetch operands → execute instruction → store result → update PC
ISA design choices:
- What is the instruction format and size? How is it decoded?
- Where are the operands located? What are their sizes?
- What is the size of the register file?
- How is memory accessed?
- What operations are supported?
- How is the successor instruction determined?
4 Instruction types
1. Memory transfer: loads and stores.
2. Arithmetic and logic: arithmetic (e.g., add, mult), which can be integer or floating point; logic (e.g., or, and).
3. Control instructions: jumps, conditional branches, jump and link.
CISC versus RISC instruction sets: CISC example: ADD [R1], [R2], [R3]. How many RISC instructions are needed to implement this CISC instruction?
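One plausible answer to the question above, sketched in Python. It assumes the CISC instruction means MEM[R1] <= MEM[R2] + MEM[R3] (the slide does not define the semantics) and that R4 and R5 are free scratch registers; both are assumptions made for illustration. Under those assumptions a load/store ISA needs two loads, one add, and one store.

```python
# Hypothetical expansion of ADD [R1], [R2], [R3], read as
# MEM[R1] <= MEM[R2] + MEM[R3]. R4 and R5 are assumed scratch registers.
RISC_SEQUENCE = [
    ("LW", "R4", 0, "R2"),       # R4 <= MEM[(R2) + 0]
    ("LW", "R5", 0, "R3"),       # R5 <= MEM[(R3) + 0]
    ("ADD", "R4", "R4", "R5"),   # R4 <= (R4) + (R5)
    ("SW", "R4", 0, "R1"),       # MEM[(R1) + 0] <= (R4)
]

def run(program, regs, mem):
    """Interpret the tiny three-opcode subset used above."""
    for op, a, b, c in program:
        if op == "LW":
            regs[a] = mem[regs[c] + b]
        elif op == "SW":
            mem[regs[c] + b] = regs[a]
        elif op == "ADD":
            regs[a] = regs[b] + regs[c]
        else:
            raise ValueError(f"unsupported opcode: {op}")
    return regs, mem

regs = {"R1": 100, "R2": 104, "R3": 108, "R4": 0, "R5": 0}
mem = {100: 0, 104: 7, 108: 35}
run(RISC_SEQUENCE, regs, mem)
print(len(RISC_SEQUENCE), "RISC instructions; MEM[100] =", mem[100])
```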
5 Examples of ISA operations
Types | Opcode | Assembly code | Meaning | Comments
Data transfers | LB, LH, LW, LD | LW R1,#20(R2) | R1<=MEM[(R2)+20] | for bytes, half-words, words, and double words
Data transfers | SB, SH, SW, SD | SW R1,#20(R2) | MEM[(R2)+20]<=(R1) | for bytes, half-words, words, and double words
Data transfers | L.S, L.D | L.S F0,#20(R2) | F0<=MEM[(R2)+20] | single/double float load
Data transfers | S.S, S.D | S.S F0,#20(R2) | MEM[(R2)+20]<=(F0) | single/double float store
ALU operations | ADD, SUB, ADDU, SUBU | ADD R1,R2,R3 | R1<=(R2)+(R3) | add/sub signed or unsigned
ALU operations | ADDI, SUBI, ADDIU, SUBIU | ADDI R1,R2,#3 | R1<=(R2)+3 | add/sub immediate signed or unsigned
ALU operations | AND, OR, XOR | AND R1,R2,R3 | R1<=(R2).AND.(R3) | bitwise logical AND, OR, XOR
ALU operations | ANDI, ORI, XORI | ANDI R1,R2,#4 | R1<=(R2).AND.4 | bitwise AND, OR, XOR immediate
ALU operations | SLT, SLTU | SLT R1,R2,R3 | R1<=1 if R2<R3 else R1<=0 | test on R2,R3; outcome in R1; signed or unsigned comparison
ALU operations | SLTI, SLTUI | SLTI R1,R2,#4 | R1<=1 if R2<4 else R1<=0 | test on R2; outcome in R1; signed or unsigned comparison
6 Examples of ISA operations
Types | Opcode | Assembly code | Meaning | Comments
Branches/Jumps | BEQZ, BNEZ | BEQZ R1,label | PC<=label if (R1)=0 | conditional branch: equal 0 / not equal 0
Branches/Jumps | BEQ, BNE | BNE R1,R2,label | PC<=label if (R1)!=(R2) | conditional branch: equal / not equal
Branches/Jumps | J | J target | PC<=target | target is an immediate field
Branches/Jumps | JR | JR R1 | PC<=(R1) | target is in register
Branches/Jumps | JAL | JAL target | R31<=(PC)+4; PC<=target | jump to target after saving the return address in R31
Floating point | ADD.S, SUB.S, MUL.S, DIV.S | ADD.S F1,F2,F3 | F1<=(F2)+(F3) | float arithmetic single precision
Floating point | ADD.D, SUB.D, MUL.D, DIV.D | ADD.D F0,F2,F4 | F0<=(F2)+(F4) | float arithmetic double precision
7 Memory addressing modes
MODE | EXAMPLE | MEANING
REGISTER | ADD R4,R3 | reg[R4] <- reg[R4] + reg[R3]
IMMEDIATE | ADD R4,#3 | reg[R4] <- reg[R4] + 3
DISPLACEMENT | ADD R4,100(R1) | reg[R4] <- reg[R4] + Mem[100 + reg[R1]]
REGISTER INDIRECT | ADD R4,(R1) | reg[R4] <- reg[R4] + Mem[reg[R1]]
INDEXED | ADD R3,(R1+R2) | reg[R3] <- reg[R3] + Mem[reg[R1] + reg[R2]]
DIRECT OR ABSOLUTE | ADD R1,(1001) | reg[R1] <- reg[R1] + Mem[1001]
MEMORY INDIRECT | ADD | reg[R1] <- reg[R1] + Mem[Mem[reg[R3]]]
POST INCREMENT | ADD R1,(R2)+ | ADD R1,(R2) then R2 <- R2+d
PREDECREMENT | ADD R1,-(R2) | R2 <- R2-d then ADD R1,(R2)
PC-RELATIVE | BEZ R1,100 | if R1==0, PC <- PC+100
PC-RELATIVE | JUMP 200 | Concatenate bits of PC and offset
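The memory-operand modes in the table differ only in how the effective address is formed. A minimal Python sketch of that calculation follows; the function name, the mode strings, and the dictionary-based register file and memory are illustrative choices, not part of the slides.

```python
def effective_address(mode, regs, mem, base=None, index=None, disp=0):
    """Return the memory address an operand refers to under the given mode."""
    if mode == "displacement":       # 100(R1)  -> 100 + reg[R1]
        return disp + regs[base]
    if mode == "register_indirect":  # (R1)     -> reg[R1]
        return regs[base]
    if mode == "indexed":            # (R1+R2)  -> reg[R1] + reg[R2]
        return regs[base] + regs[index]
    if mode == "absolute":           # (1001)   -> 1001
        return disp
    if mode == "memory_indirect":    # address is itself read from Mem[reg[R3]]
        return mem[regs[base]]
    raise ValueError(f"unknown addressing mode: {mode}")

regs = {"R1": 200, "R2": 8, "R3": 16}
mem = {16: 300}
print(effective_address("displacement", regs, mem, base="R1", disp=100))
print(effective_address("indexed", regs, mem, base="R1", index="R2"))
print(effective_address("memory_indirect", regs, mem, base="R3"))
```

Register and immediate modes are left out because they do not touch memory, and the auto-increment/decrement modes are register-indirect accesses with a side effect on the base register.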
8 Example of ISA (MIPS) encoding
LW Rt, displacement(Rs)
SW Rt, displacement(Rs)
ADDI Rt, Rs, immediate
BEQ Rt, Rs, offset
ADD Rd, Rt, Rs
J target
JAL target
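The encoding figure from this slide did not survive transcription. As a hedged illustration, the sketch below packs instructions using the standard MIPS32 field layouts (R-type: opcode, rs, rt, rd, shamt, funct; I-type: opcode, rs, rt, 16-bit immediate; J-type: opcode, 26-bit target). That the slide used exactly these layouts is an assumption; the helper names are made up for this example.

```python
# Standard MIPS32 opcodes for the instructions listed on the slide.
OPCODES = {"LW": 0x23, "SW": 0x2B, "ADDI": 0x08, "BEQ": 0x04, "J": 0x02, "JAL": 0x03}
ADD_FUNCT = 0x20  # R-type ADD uses opcode 0 and funct 0x20

def encode_r(rs, rt, rd, funct, shamt=0):
    """R-type: opcode(6)=0 | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6)."""
    return (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

def encode_i(op, rs, rt, imm):
    """I-type: opcode(6) | rs(5) | rt(5) | immediate(16, two's complement)."""
    return (OPCODES[op] << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)

def encode_j(op, target):
    """J-type: opcode(6) | target(26)."""
    return (OPCODES[op] << 26) | (target & 0x03FFFFFF)

print(hex(encode_r(rs=9, rt=10, rd=8, funct=ADD_FUNCT)))  # ADD R8, R9, R10
print(hex(encode_i("LW", rs=29, rt=8, imm=4)))            # LW R8, 4(R29)
```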
9 Architectural state
Determines everything about a processor:
- PC
- 32 registers
- Memory
[Figure: clocked PC register (PC', PC), instruction memory (A, RD), register file (A1, A2, A3, WD3, WE3, RD1), and data memory (A, RD, WD, WE), with 32-bit buses]
10 Typical datapath (MIPS)
11 Pipelining (IF/ID/EX/MEM/WB)
- Pipelining is a form of temporal parallelism.
- Shortening the critical path enables a faster clock → ideal speedup is achieved when the pipeline is balanced.
- Ideal CPI is 1; however, stalls, branches, and cache misses increase CPI beyond 1.
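A small Python sketch of why balance matters. It assumes the unpipelined design completes one instruction per long cycle (the sum of all stage delays) and that the pipelined clock is set by the slowest stage; the stage delays are invented for illustration and do not come from the slides.

```python
def pipeline_speedup(stage_delays_ps, cpi_pipelined=1.0, register_overhead_ps=0):
    """Speedup of a pipelined design over a single-cycle design.

    Assumes the single-cycle design has CPI = 1 with a cycle equal to the
    sum of the stage delays, and the pipelined cycle equals the slowest
    stage plus any pipeline-register overhead.
    """
    unpipelined_cycle = sum(stage_delays_ps)
    pipelined_cycle = max(stage_delays_ps) + register_overhead_ps
    return unpipelined_cycle / (pipelined_cycle * cpi_pipelined)

balanced = [200, 200, 200, 200, 200]     # IF, ID, EX, MEM, WB (illustrative)
unbalanced = [200, 100, 200, 200, 100]

print(pipeline_speedup(balanced))                        # equals the stage count
print(pipeline_speedup(unbalanced))                      # lower: clock set by slowest stage
print(pipeline_speedup(unbalanced, cpi_pipelined=1.25))  # stalls reduce it further
```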
12 Pipeline operation abstraction
13 Pipeline hazards
Structural
- Problem: not enough read/write data ports for the register file or caches.
- Solution: stall or add more hardware resources.
Data
- Problem: a data dependency where the input of one instruction depends on a preceding instruction that has not yet written its result; a.k.a. read-after-write (RAW) hazard.
- Solution: reorder instructions in the compiler, stall, or forward.
Control
- Problem: deciding which instruction to fetch next depends on the results of preceding instructions still in the pipeline.
- Solution: stall, or branch prediction + speculative execution.
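The RAW condition can be checked mechanically. The Python sketch below scans a straight-line instruction list for RAW hazards. The tuple format (opcode, destination, sources) and the default window of two instructions are assumptions: the window matches a five-stage pipeline without forwarding in which the register file is written in the first half of a cycle and read in the second.

```python
def raw_hazards(program, window=2):
    """Return (producer_index, consumer_index, register) for each RAW hazard.

    program: list of (opcode, dest_register_or_None, source_registers).
    window:  how many following instructions can still read a stale value.
    """
    hazards = []
    for i, (_, dest, _) in enumerate(program):
        if dest is None:
            continue
        for j in range(i + 1, min(i + 1 + window, len(program))):
            _, later_dest, later_srcs = program[j]
            if dest in later_srcs:
                hazards.append((i, j, dest))
            if later_dest == dest:
                break  # register redefined; later readers depend on instruction j
    return hazards

program = [
    ("ADD", "R1", ("R2", "R3")),
    ("SUB", "R4", ("R1", "R5")),  # reads R1 one instruction after it is produced
    ("AND", "R6", ("R1", "R7")),  # reads R1 two instructions after
    ("OR",  "R8", ("R1", "R9")),  # far enough away to read the register file safely
]
print(raw_hazards(program))
```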
14 Resolving data hazards by forwarding
[Figure: pipeline diagram marking the dependencies and the forward paths]
The register file is written in the 1st half of the cycle and read in the 2nd half.
15 Forwarding might not be possible all the time
[Figure annotation: forwarding not possible here]
16 Hazard avoidance by stalling and forwarding
How to stall the pipeline?
17 Circuit for forwarding
Condition for forwarding: forward from the EX/MEM or MEM/WB pipeline register if the destination register held in that pipeline register matches one of the ALU's source registers.
Hazard detection and forwarding logic can reduce the clock frequency.
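The forwarding condition can be written as a small selection function. In the Python sketch below, the priority of EX/MEM over MEM/WB (the more recent result wins), the RegWrite qualifier, and the exclusion of register 0 follow the usual textbook MIPS design and are assumptions beyond what the slide states; the function and return labels are illustrative.

```python
def forward_select(src, ex_mem_rd, ex_mem_regwrite, mem_wb_rd, mem_wb_regwrite):
    """Choose where one ALU source operand should come from.

    src is the register number the ALU wants; *_rd is the destination
    register held in that pipeline register; *_regwrite says whether the
    instruction there actually writes a register.
    """
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == src:
        return "EX/MEM"    # result of the immediately preceding instruction
    if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == src:
        return "MEM/WB"    # result of the instruction two ahead
    return "REGFILE"       # no hazard: use the value read in decode

# The ALU needs R1; both older instructions wrote R1, so the newest value wins.
print(forward_select(src=1, ex_mem_rd=1, ex_mem_regwrite=True,
                     mem_wb_rd=1, mem_wb_regwrite=True))
```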
18 Control hazards
Results from branch evaluation are available at the end of cycle 3. Which instruction should be fetched in the second cycle?
Solution: either stall, or predict not taken and flush if necessary.
More on branch prediction + speculative execution later in class.
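A rough Python sketch of the cost difference between the two solutions. It assumes a two-cycle penalty (the instructions fetched in cycles 2 and 3 are wasted when the branch outcome arrives at the end of cycle 3) and uses made-up branch statistics; neither number comes from the slides.

```python
def cpi_with_branches(branch_fraction, penalty_cycles, paying_fraction=1.0, base_cpi=1.0):
    """CPI when a fraction of branches pays a fixed penalty.

    paying_fraction = 1.0 models stalling on every branch; under
    predict-not-taken only the taken branches pay, so paying_fraction
    is the taken rate.
    """
    return base_cpi + branch_fraction * paying_fraction * penalty_cycles

BRANCH_FRACTION = 0.20  # illustrative: 20% of instructions are branches
TAKEN_RATE = 0.60       # illustrative: 60% of branches are taken
PENALTY = 2             # assumed from "results available at the end of cycle 3"

print("always stall:     ", cpi_with_branches(BRANCH_FRACTION, PENALTY))
print("predict not taken:", cpi_with_branches(BRANCH_FRACTION, PENALTY, TAKEN_RATE))
```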
19 Memory hierarchy
Level | Technology | Cost / GB | Access time
Cache | SRAM | ~ $10,000 | ~ 1 ns
Main memory | DRAM | ~ $100 | ~ 100 ns
Virtual memory | Hard disk | ~ $1 | ~ 10,000,000 ns
[Figure: hierarchy pyramid; speed increases toward the cache, size increases toward virtual memory]
Ideal memory: access time of SRAM with the capacity and cost/GB of disk.
Exploit locality to make memory accesses fast:
- Temporal locality: if data was used recently, it is likely to be used again soon.
- Spatial locality: if data was used recently, nearby data is likely to be used soon.
20 Cache memory
- The level of the memory hierarchy closest to the CPU.
- Fast (typically ~ 1 cycle access time).
- Made out of 6T SRAM cells.
- If data is present in the cache → hit; otherwise → miss → data must be copied in blocks (i.e., possibly multiple words) from main memory or lower cache levels.
Design goal: maximize the cache hit ratio subject to latency and area constraints.
Design issues:
- Total size: #blocks and block size
- Designs: direct-mapped, fully associative, and N-way associative
- Write policies
21 Direct-mapped cache memory
Memory address fields: Tag (27 bits) | Set (3 bits) | Byte offset (2 bits, 00)
[Figure: 8-entry x ( )-bit SRAM with V, Tag, and Data fields; a tag comparator (=) produces Hit, and the Data field is output]
- Location determined by address.
- Direct mapped: only one choice: (block address) modulo (#blocks in cache).
- #blocks is a power of 2.
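A minimal Python model of the lookup, using the slide's 27/3/2 split of a 32-bit address. The class and method names are illustrative, and only tags are stored (no data) since the point is hit/miss behavior.

```python
class DirectMappedCache:
    def __init__(self, set_bits=3, offset_bits=2):
        self.set_bits = set_bits
        self.offset_bits = offset_bits
        self.tags = [None] * (1 << set_bits)  # None means the valid bit is clear

    def split(self, addr):
        """Split an address into (tag, set index, byte offset)."""
        offset = addr & ((1 << self.offset_bits) - 1)
        index = (addr >> self.offset_bits) & ((1 << self.set_bits) - 1)
        tag = addr >> (self.offset_bits + self.set_bits)
        return tag, index, offset

    def access(self, addr):
        """Return True on a hit; on a miss, install the block and return False."""
        tag, index, _ = self.split(addr)
        if self.tags[index] == tag:
            return True
        self.tags[index] = tag  # only one possible location, so no choice to make
        return False

cache = DirectMappedCache()
# 0x00 and 0x20 share set 0 with different tags, so they evict each other.
trace = [0x00, 0x04, 0x00, 0x20, 0x00]
print([cache.access(a) for a in trace])
```

The last access misses even though the cache is mostly empty, which is the conflict-miss behavior that associativity is meant to reduce.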
22 Fully associative cache
[Figure: eight V/Tag/Data entries, each with its own tag comparator; the hit signals drive the selection lines of an 8x1 MUX]
- Any block can be placed anywhere → no conflict misses.
- Requires many tag comparators → expensive to build.
23 N-way associative cache
Memory address fields: Tag (28 bits) | Set (2 bits) | Byte offset (2 bits, 00)
[Figure: two ways (Way 1, Way 0), each with V, Tag, and Data fields; one tag comparator (=) per way produces the per-way hit signals, which select the data and combine into Hit]
Aim: strike a balance between the hardware simplicity of a direct-mapped cache and the flexibility of a fully associative cache.
24 Cache memory issues
Write policies:
- Write through
- Write back
Replacement policies for associative cache designs:
- Random
- Least recently used (LRU)
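A sketch of LRU replacement in a 2-way set-associative cache, written in Python. It reuses the 28/2/2 address split from the N-way slide; tracking recency with an ordered dictionary is a software convenience chosen for this example, not a description of how hardware implements LRU.

```python
from collections import OrderedDict

class SetAssociativeCache:
    def __init__(self, ways=2, set_bits=2, offset_bits=2):
        self.ways = ways
        self.set_bits = set_bits
        self.offset_bits = offset_bits
        # One ordered dict per set: keys are tags, oldest (LRU) entry first.
        self.sets = [OrderedDict() for _ in range(1 << set_bits)]

    def access(self, addr):
        """Return True on a hit; on a miss, install the block (evicting LRU if full)."""
        index = (addr >> self.offset_bits) & ((1 << self.set_bits) - 1)
        tag = addr >> (self.offset_bits + self.set_bits)
        ways = self.sets[index]
        if tag in ways:
            ways.move_to_end(tag)      # mark as most recently used
            return True
        if len(ways) == self.ways:
            ways.popitem(last=False)   # evict the least recently used tag
        ways[tag] = True
        return False

cache = SetAssociativeCache()
# 0x00, 0x10 and 0x20 all map to set 0 with tags 0, 1 and 2.
trace = [0x00, 0x10, 0x00, 0x20, 0x00, 0x10]
print([cache.access(a) for a in trace])
```

In this trace the block at 0x10 is evicted when 0x20 arrives, because 0x00 was touched more recently.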
25 Multi-level caches
- Primary cache attached to CPU
  - Small, but fast
- Level-2 cache services misses from primary cache
  - Larger, slower, but still faster than main memory
- Main memory services L-2 cache misses
- Some high-end systems include L-3 cache
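The benefit of a second level can be estimated with the usual average memory access time (AMAT) formula. The Python sketch below uses invented latencies and miss rates purely for illustration; the L2 miss rate is a local miss rate (misses per L2 access).

```python
def amat(l1_hit_time, l1_miss_rate, miss_penalty):
    """Average memory access time for one cache level, in cycles."""
    return l1_hit_time + l1_miss_rate * miss_penalty

def amat_two_level(l1_hit_time, l1_miss_rate, l2_hit_time, l2_miss_rate, mem_time):
    """AMAT when L1 misses are serviced by L2, and L2 misses by main memory."""
    l1_miss_penalty = l2_hit_time + l2_miss_rate * mem_time
    return amat(l1_hit_time, l1_miss_rate, l1_miss_penalty)

# Illustrative numbers only: 1-cycle L1 with 5% misses, 10-cycle L2 with
# 20% local misses, 100-cycle main memory.
print("L1 only:", amat(1, 0.05, 100))
print("L1 + L2:", amat_two_level(1, 0.05, 10, 0.20, 100))
```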
26 Virtual memory
- Each program uses virtual addresses.
  - The entire virtual address space is stored on a hard disk.
  - A subset of the virtual address data is in DRAM.
  - The CPU translates virtual addresses into physical addresses.
  - Data not in DRAM is fetched from the hard disk.
- Each program has its own virtual-to-physical mapping.
  - Two programs can use the same virtual address for different data.
  - Programs don't need to be aware that others are running.
  - One program (or virus) can't corrupt the memory used by another.
  - This is called memory protection.
27 Virtual to physical address translation
- Each application has its own page table, the address of which is stored in the page table register.
- A page can be in physical memory or on disk.
- If a page is accessed while it is on disk, an exception (page fault) is raised; an OS handler transfers the page to physical memory and updates the page table.
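The translation steps can be sketched in a few lines of Python. The 12-bit page offset matches the 4 KB pages used elsewhere in these slides; the dictionary-based page table, the PageFault exception, the handler, and the example page numbers are hypothetical stand-ins for what the hardware and OS actually do.

```python
PAGE_OFFSET_BITS = 12  # 4 KB pages

class PageFault(Exception):
    """Raised when the accessed page is not in physical memory."""

def translate(vaddr, page_table):
    """Translate a virtual address using a {vpn: {"valid", "ppn"}} page table."""
    vpn = vaddr >> PAGE_OFFSET_BITS
    offset = vaddr & ((1 << PAGE_OFFSET_BITS) - 1)
    entry = page_table.get(vpn)
    if entry is None or not entry["valid"]:
        raise PageFault(vpn)
    return (entry["ppn"] << PAGE_OFFSET_BITS) | offset

def handle_page_fault(vpn, page_table, free_ppn):
    """Stand-in for the OS handler: 'load' the page and update the page table."""
    page_table[vpn] = {"valid": True, "ppn": free_ppn}

page_table = {0x00002: {"valid": True, "ppn": 0x7FFF}}
print(hex(translate(0x0000247C, page_table)))  # page is resident

try:
    translate(0x00005010, page_table)          # page 5 is not resident
except PageFault as fault:
    handle_page_fault(fault.args[0], page_table, free_ppn=0x0042)
print(hex(translate(0x00005010, page_table)))  # retried access now succeeds
```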
28 Translation lookaside buffer (TLB)
- TLB: small cache (access time 1 cycle) of the most recent translations
  - Small: accessed in < 1 cycle
  - Typically entries
  - Fully associative
  - > 99% hit rates typical
  - Reduces memory access cycles for most loads and stores from 2 to 1
[Figure: two-entry TLB (Entry 1, Entry 0), each entry holding V, virtual page number, and physical page number; the virtual page number of the virtual address is compared (=) against both entries to produce the per-entry hit signals and Hit; the selected physical page number is joined with the 12-bit page offset 0x47C to form physical address 0x7FFF 47C]
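A Python sketch of how a TLB short-circuits the page-table access. The two-entry capacity mirrors the figure; LRU replacement, the flat {vpn: ppn} page table, and the example page numbers are assumptions made for illustration, and page faults are ignored.

```python
from collections import OrderedDict

class TLB:
    def __init__(self, capacity=2, offset_bits=12):
        self.capacity = capacity
        self.offset_bits = offset_bits
        self.entries = OrderedDict()  # vpn -> ppn, least recently used first
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr, page_table):
        """Translate vaddr, consulting the page table only on a TLB miss."""
        vpn = vaddr >> self.offset_bits
        offset = vaddr & ((1 << self.offset_bits) - 1)
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)
        else:
            self.misses += 1                      # costs an extra memory access
            if len(self.entries) == self.capacity:
                self.entries.popitem(last=False)  # evict least recently used
            self.entries[vpn] = page_table[vpn]
        return (self.entries[vpn] << self.offset_bits) | offset

page_table = {0x00002: 0x7FFF, 0x00003: 0x0001}   # illustrative mappings
tlb = TLB()
print(hex(tlb.translate(0x0000247C, page_table)))  # miss: page table consulted
print(hex(tlb.translate(0x00002480, page_table)))  # hit: same page
print(tlb.hits, tlb.misses)
```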
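29 Virtual memory + cache chaining
- Virtual address: 20-bit virtual page number + 12-bit page offset (4 KB page size)
- TLB translates the 20-bit virtual page number into a 20-bit physical page number; the 12-bit page offset is unchanged
- Cache view of the physical address: 18-bit tag + 12-bit index + 2-bit byte offset
- Cache: 16 KB cache size, 4 bytes / block, direct mapped, 4096 blocks
- Cache outputs: hit, data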
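The field widths on this slide follow from the cache and page parameters. The Python sketch below recomputes them, assuming 32-bit addresses (implied by 20 + 12 bits) and power-of-two sizes; the function name is illustrative.

```python
def cache_fields(addr_bits, cache_bytes, block_bytes, ways=1):
    """Return (blocks, tag_bits, index_bits, offset_bits) for a cache.

    Assumes cache_bytes, block_bytes and ways are powers of two.
    """
    blocks = cache_bytes // block_bytes
    sets = blocks // ways
    offset_bits = block_bytes.bit_length() - 1  # log2 for powers of two
    index_bits = sets.bit_length() - 1
    tag_bits = addr_bits - index_bits - offset_bits
    return blocks, tag_bits, index_bits, offset_bits

PAGE_BYTES = 4 * 1024
page_offset_bits = PAGE_BYTES.bit_length() - 1
vpn_bits = 32 - page_offset_bits

print("page offset bits:", page_offset_bits, "VPN bits:", vpn_bits)
print("blocks, tag, index, offset:", cache_fields(32, 16 * 1024, 4))
```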
30 Main memory: DRAM
DRAM is usually a shared resource among multiple processors, the GPU, and I/O devices → a controller (the Northbridge in x86 systems) is needed to coordinate access.
31 DRAM organization
- 1T bit cells → compact and few steps to fabricate → enables cheap, large memory.
- Reads are destructive; content must be restored after reading.
- Capacitors are leaky, so they must be periodically refreshed → contributes to the slow access of DRAMs.
- Board buses from the processor to the DRAM are slow.
32 Summary of background
- ISA design: instruction types, memory access modes, encoding choices.
- Pipelining: provides speedup; complications: structural, data, and control hazards; solutions: hazard detection with forwarding and/or stalling.
- Cache memory: size; designs (direct mapped, fully associative, N-way).
- Virtual memory: advantages; translation from virtual to physical.
- DRAM: slow → latency must be hidden by the cache hierarchy.
Chapter 4 The Processor 1 Chapter 4A The Processor Chapter 4 The Processor 2 Introduction CPU performance factors Instruction count Determined by ISA and compiler CPI and Cycle time Determined by CPU hardware
More informationCPU Pipelining Issues
CPU Pipelining Issues What have you been beating your head against? This pipe stuff makes my head hurt! L17 Pipeline Issues & Memory 1 Pipelining Improve performance by increasing instruction throughput
More informationMIPS ISA. 1. Data and Address Size 8-, 16-, 32-, 64-bit 2. Which instructions does the processor support
Components of an ISA EE 357 Unit 11 MIPS ISA 1. Data and Address Size 8-, 16-, 32-, 64-bit 2. Which instructions does the processor support SUBtract instruc. vs. NEGate + ADD instrucs. 3. Registers accessible
More informationChapter 5. Memory Technology
Chapter 5 Large and Fast: Exploiting Memory Hierarchy Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic disk 5ms 20ms, $0.20 $2 per
More informationECE331: Hardware Organization and Design
ECE331: Hardware Organization and Design Lecture 27: Midterm2 review Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Midterm 2 Review Midterm will cover Section 1.6: Processor
More informationCOMPSCI 313 S Computer Organization. 7 MIPS Instruction Set
COMPSCI 313 S2 2018 Computer Organization 7 MIPS Instruction Set Agenda & Reading MIPS instruction set MIPS I-format instructions MIPS R-format instructions 2 7.1 MIPS Instruction Set MIPS Instruction
More informationChapter 2. Instructions: Language of the Computer. Adapted by Paulo Lopes
Chapter 2 Instructions: Language of the Computer Adapted by Paulo Lopes Instruction Set The repertoire of instructions of a computer Different computers have different instruction sets But with many aspects
More information101 Assembly. ENGR 3410 Computer Architecture Mark L. Chang Fall 2009
101 Assembly ENGR 3410 Computer Architecture Mark L. Chang Fall 2009 What is assembly? 79 Why are we learning assembly now? 80 Assembly Language Readings: Chapter 2 (2.1-2.6, 2.8, 2.9, 2.13, 2.15), Appendix
More informationLecture 4: Instruction Set Architecture
Lecture 4: Instruction Set Architecture ISA types, register usage, memory addressing, endian and alignment, quantitative evaluation Reading: Textbook (5 th edition) Appendix A Appendix B (4 th edition)
More informationExamples of branch instructions
Examples of branch instructions Beq rs,rt,target #go to target if rs = rt Beqz rs, target #go to target if rs = 0 Bne rs,rt,target #go to target if rs!= rt Bltz rs, target #go to target if rs < 0 etc.
More informationIntroduction to the MIPS. Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University
Introduction to the MIPS Lecture for CPSC 5155 Edward Bosworth, Ph.D. Computer Science Department Columbus State University Introduction to the MIPS The Microprocessor without Interlocked Pipeline Stages
More informationThe Processor: Datapath and Control. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University
The Processor: Datapath and Control Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Introduction CPU performance factors Instruction count Determined
More informationCache Architectures Design of Digital Circuits 217 Srdjan Capkun Onur Mutlu http://www.syssec.ethz.ch/education/digitaltechnik_17 Adapted from Digital Design and Computer Architecture, David Money Harris
More informationCS 61C: Great Ideas in Computer Architecture Datapath. Instructors: John Wawrzynek & Vladimir Stojanovic
CS 61C: Great Ideas in Computer Architecture Datapath Instructors: John Wawrzynek & Vladimir Stojanovic http://inst.eecs.berkeley.edu/~cs61c/fa15 1 Components of a Computer Processor Control Enable? Read/Write
More informationCycle Time for Non-pipelined & Pipelined processors
Cycle Time for Non-pipelined & Pipelined processors Fetch Decode Execute Memory Writeback 250ps 350ps 150ps 300ps 200ps For a non-pipelined processor, the clock cycle is the sum of the latencies of all
More informationChapter 4. The Processor Designing the datapath
Chapter 4 The Processor Designing the datapath Introduction CPU performance determined by Instruction Count Clock Cycles per Instruction (CPI) and Cycle time Determined by Instruction Set Architecure (ISA)
More informationComputer Architecture. MIPS Instruction Set Architecture
Computer Architecture MIPS Instruction Set Architecture Instruction Set Architecture An Abstract Data Type Objects Registers & Memory Operations Instructions Goal of Instruction Set Architecture Design
More informationCPU Architecture and Instruction Sets Chapter 1
CPU Architecture and Instruction Sets Chapter 1 1 Is CPU Architecture Relevant for DBMS? CPU design focuses on speed resulting in a 55%/year improvement since 1987: If CPU performance in database code
More information