
ADVANCED COMPUTER ARCHITECTURES: 088949 Prof. C. SILVANO Written exam 11 July 2011

SURNAME NAME ID EMAIL SIGNATURE

EX1 (3) EX2 (3) EX3 (3) EX4 (5) EX5 (5) EX6 (4) EX7 (5) EX8 (3+2) TOTAL (33)

EXERCISE 1 PIPELINE BASIC (3 points)

Given the following loop expressed in a high-level language:

    do {
        BASEC[i] = BASEA[i] + BASEB[i] + BASEC[i];
        i++;
    } while (i != N);

The program has been compiled into MIPS assembly code assuming that registers $t6 and $t7 have been initialized with the values 0 and N respectively. The symbols BASEA, BASEB and BASEC are 16-bit constants. The processor clock frequency is 1 GHz. Consider the loop executed by a 5-stage pipelined MIPS processor without any optimization in the pipeline.

1. Identify the RAW (Read After Write) data hazards (marked in RED on the original paper) and the control hazards (marked in BLUE).
2. Identify the number of stalls to be inserted before each instruction (i.e., between the IF and ID stages of each instruction) necessary to solve the hazards.

The solution table below is reconstructed from the transcript; the per-cycle placement follows from the stated stall counts. The 3 control stalls charged to the first lw are the branch penalty of the previous iteration's bne, which is resolved only in its MEM stage.

Stalls | Instruction           | Pipeline schedule                                     | Hazard type
3      | DO: lw $t2,BASEA($t6) | IF(C1) ID(C2) EX(C3) M(C4) WB(C5)                     | CONTROL
-      | lw $t3,BASEB($t6)     | IF(C2) ID(C3) EX(C4) M(C5) WB(C6)                     |
-      | lw $t4,BASEC($t6)     | IF(C3) ID(C4) EX(C5) M(C6) WB(C7)                     |
2      | add $t3,$t2,$t3       | IF(C4) stall(C5-C6) ID(C7) EX(C8) M(C9) WB(C10)       | RAW $t2, RAW $t3
3      | add $t4,$t4,$t3       | IF(C7) stall(C8-C10) ID(C11) EX(C12) M(C13) WB(C14)   | RAW $t3, RAW $t4
3      | sw $t4,BASEC($t6)     | IF(C11) stall(C12-C14) ID(C15) EX(C16) M(C17) WB(C18) | RAW $t4
-      | addi $t6,$t6,4        | IF(C15) ID(C16) EX(C17) M(C18) WB(C19)                |
3      | bne $t6,$t7,DO        | IF(C16) stall(C17-C19) ID(C20) EX(C21) M(C22) WB(C23) | RAW $t6

Express the formula, then calculate the following metrics:

Instruction Count (IC): 8
Number of stalls per iteration: 14
CPI per iteration: CPI = #cycles / IC = (IC + #stalls + 4) / IC = (8 + 14 + 4) / 8 = 3.25
Throughput (in MIPS) per iteration: MIPS = f_clock / (CPI * 10^6) = 10^9 / (3.25 * 10^6) = 10^3 / 3.25 ≈ 308
Asymptotic CPI (N → ∞ iterations): CPI_AS = (IC + #stalls) / IC = (8 + 14) / 8 = 2.75
Asymptotic throughput (in MIPS): MIPS_AS = f_clock / (CPI_AS * 10^6) = 10^9 / (2.75 * 10^6) = 10^3 / 2.75 ≈ 364
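As a quick arithmetic check (ours, not part of the original exam), here is a minimal Python sketch that reproduces these metrics from the instruction count, stall count and clock frequency; the helper name pipeline_metrics and its drain parameter are our own.

    def pipeline_metrics(ic, stalls, f_hz=1e9, drain=4):
        """CPI and MIPS figures for one loop iteration of a stalled pipeline.

        ic     -- instructions per iteration
        stalls -- stall cycles per iteration
        drain  -- extra cycles to drain the 5-stage pipeline
                  (counted in the single-iteration CPI only)
        """
        cpi = (ic + stalls + drain) / ic   # CPI over one full iteration
        cpi_as = (ic + stalls) / ic        # asymptotic CPI (N -> infinity)
        mips = f_hz / (cpi * 1e6)          # throughput in MIPS
        mips_as = f_hz / (cpi_as * 1e6)
        return cpi, mips, cpi_as, mips_as

    # Exercise 1: IC = 8, 14 stalls, 1 GHz clock
    print(pipeline_metrics(8, 14))         # (3.25, ~307.7, 2.75, ~363.6)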

EXERCISE 2 PIPELINE OPTIMIZATIONS (3 points)

Assume the following optimizations in the pipeline:
- the Register File can be written and read at the same address in the same clock cycle;
- forwarding;
- computation of the PC and of the TARGET ADDRESS for branch & jump instructions anticipated in the ID stage.

1. Identify the RAW (Read After Write) data hazards and the control hazards.
2. Identify the number of stalls to be inserted before each instruction (i.e., between the IF and ID stages of each instruction) necessary to solve the hazards.
3. Identify in the last column the forwarding path used.

The single control stall charged to the first lw is the branch penalty of the previous iteration's bne, now resolved in the ID stage.

Stalls | Instruction           | Pipeline schedule                               | Hazard type      | Forwarding path
1      | DO: lw $t2,BASEA($t6) | IF(C1) ID(C2) EX(C3) M(C4) WB(C5)               | CONTROL          |
-      | lw $t3,BASEB($t6)     | IF(C2) ID(C3) EX(C4) M(C5) WB(C6)               |                  |
-      | lw $t4,BASEC($t6)     | IF(C3) ID(C4) EX(C5) M(C6) WB(C7)               |                  |
-      | add $t3,$t2,$t3       | IF(C4) ID(C5) EX(C6) M(C7) WB(C8)               | RAW $t2, RAW $t3 | MEM-EX
-      | add $t4,$t4,$t3       | IF(C5) ID(C6) EX(C7) M(C8) WB(C9)               | RAW $t3, RAW $t4 | MEM-EX
-      | sw $t4,BASEC($t6)     | IF(C6) ID(C7) EX(C8) M(C9) WB(C10)              | RAW $t4          | MEM-MEM
-      | addi $t6,$t6,4        | IF(C7) ID(C8) EX(C9) M(C10) WB(C11)             |                  |
1      | bne $t6,$t7,DO        | IF(C8) stall(C9) ID(C10) EX(C11) M(C12) WB(C13) | RAW $t6          |

Express the formula, then calculate the following metrics:

Instruction Count (IC): 8
Number of stalls per iteration: 2
Asymptotic CPI (N → ∞ iterations): CPI_AS = (IC + #stalls) / IC = (8 + 2) / 8 = 1.25
Asymptotic throughput (in MIPS): MIPS_AS = f_clock / (CPI_AS * 10^6) = 10^9 / (1.25 * 10^6) = 10^3 / 1.25 = 800

Calculate the speedup with respect to the previous case (EX. 1): Speedup = CPI_AS1 / CPI_AS2 = 2.75 / 1.25 = 2.2
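Reusing the pipeline_metrics sketch from Exercise 1, the same arithmetic gives the Exercise 2 figures and the speedup:

    # Exercise 2: forwarding and early branch resolution leave only 2 stalls
    cpi2, mips2, cpi2_as, mips2_as = pipeline_metrics(8, 2)
    print(cpi2_as, mips2_as)      # 1.25 800.0

    # Speedup over Exercise 1 (ratio of asymptotic CPIs, same IC and clock)
    print(2.75 / cpi2_as)         # 2.2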

EXERCISE 3 PIPELINE WITH DATA CACHE MISSES (3 points)

Assume that, in the previously scheduled and optimized program, each DATA READ access in the MEM phase generates a DATA CACHE MISS requiring 2 stalls to access the memory. Draw the pipeline scheme by inserting the stalls due to the memory accesses and to the data and control hazards still remaining in the code.

In the reconstructed schedule below, each lw occupies the M stage for 3 cycles because of the 2-cycle read-miss penalty; the sw writes to the data cache without a read miss, so its M stage takes a single cycle. The leading stall of the first lw is the control stall of the previous iteration's bne.

Instruction           | Pipeline schedule
DO: lw $t2,BASEA($t6) | stall(C1) IF(C2) ID(C3) EX(C4) M(C5-C7) WB(C8)
lw $t3,BASEB($t6)     | IF(C3) ID(C4) EX(C5) stall(C6-C7) M(C8-C10) WB(C11)
lw $t4,BASEC($t6)     | IF(C4) ID(C5) stall(C6-C7) EX(C8) stall(C9-C10) M(C11-C13) WB(C14)
add $t3,$t2,$t3       | IF(C5) stall(C6-C7) ID(C8) stall(C9-C10) EX(C11) stall(C12-C13) M(C14) WB(C15)
add $t4,$t4,$t3       | stall(C6-C7) IF(C8) stall(C9-C10) ID(C11) stall(C12-C13) EX(C14) M(C15) WB(C16)
sw $t4,BASEC($t6)     | stall(C9-C10) IF(C11) stall(C12-C13) ID(C14) EX(C15) M(C16) WB(C17)
addi $t6,$t6,4        | stall(C12-C13) IF(C14) ID(C15) EX(C16) M(C17) WB(C18)
bne $t6,$t7,DO        | IF(C15) stall(C16) ID(C17) EX(C18) M(C19) WB(C20)

Express the formula, then calculate the following metrics:

Instruction Count (IC): 8
Number of stalls per iteration: 8
Asymptotic CPI (N → ∞ iterations): CPI_AS = (IC + #stalls) / IC = (8 + 8) / 8 = 2
Asymptotic throughput (in MIPS): MIPS_AS = f_clock / (CPI_AS * 10^6) = 10^9 / (2 * 10^6) = 500

Calculate the performance lost with respect to the previous case (EX. 2): CPI_AS3 / CPI_AS2 = 2 / 1.25 = 1.6
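And once more with the same helper, for the miss-penalty case:

    # Exercise 3: the 2-cycle read-miss penalties raise the stall count to 8
    _, _, cpi3_as, mips3_as = pipeline_metrics(8, 8)
    print(cpi3_as, mips3_as)      # 2.0 500.0
    print(cpi3_as / 1.25)         # 1.6x slowdown vs. Exercise 2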

EXERCISE 4: SCOREBOARD (5 points)

Assume the program is executed by a CPU with dynamic scheduling based on a SCOREBOARD with:
- 2 LOAD/STORE units (LDU1, LDU2) with latency 5;
- 2 ALU/BR/J units (ALU1, ALU2) with latency 2;
- structural hazards checked in the ISSUE phase;
- RAW hazards checked and RF READ performed in the READ OPERANDS phase;
- WAR and WAW hazards checked and RF WRITE performed in the WRITE BACK phase;
- forwarding;
- static branch prediction BTFNT (BACKWARD TAKEN, FORWARD NOT TAKEN) with a Branch Target Buffer.

Complete the SCOREBOARD table assuming all cache accesses HIT and considering ONE iteration:

Instruction           | Pred. T/NT | Issue | Read Operands | Execution Complete | Write Back | Hazard type / Forwarding           | Unit
DO: lw $t2,BASEA($t6) | -          | 1     | 2             | 7                  | 8          |                                    | LDU1
lw $t3,BASEB($t6)     | -          | 2     | 3             | 8                  | 9          |                                    | LDU2
lw $t4,BASEC($t6)     | -          | 9     | 10            | 15                 | 16         | STRUCT LDU1                        | LDU1
add $t3,$t2,$t3       | -          | 10    | 11            | 13                 | 14         |                                    | ALU1
add $t4,$t4,$t3       | -          | 11    | 16            | 18                 | 19         | RAW $t3, RAW $t4; Forw $t3, $t4    | ALU2
sw $t4,BASEC($t6)     | -          | 12    | 19            | 24                 | 25 (*)     | RAW $t4; Forw $t4                  | LDU2
addi $t6,$t6,4        | -          | 15    | 17            | 19                 | 20         | STRUCT ALU1 + RF READ (WAR $t6 OK) | ALU1
bne $t6,$t7,DO        | T          | 20    | 21            | 23                 | 24         | STRUCT ALU2                        | ALU2

(*) RF WRITE does not occur because the sw writes to the data cache.

Express the formula, then calculate the following metrics:

CPI = #cycles / IC = 25 / 8 = 3.125
IPC = 1 / CPI = 0.32

To avoid structural hazards, 4 LOAD/STORE units and 4 ALU/BR/J units are needed.
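A sanity check on the unit counts (ours, not part of the exam): one iteration contains four memory instructions and four ALU/branch/jump instructions, so four units of each class would let the whole iteration issue without structural hazards:

    # Tally the instruction mix of one loop iteration
    loop = ["lw", "lw", "lw", "add", "add", "sw", "addi", "bne"]
    mem_ops = sum(op in ("lw", "sw") for op in loop)   # load/store instructions
    alu_ops = len(loop) - mem_ops                      # ALU/branch/jump instructions
    print(mem_ops, alu_ops)                            # 4 4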

EXERCISE 5 TOMASULO (5 points)

Assume the original program is executed on a CPU with dynamic scheduling based on the TOMASULO algorithm with:
- 2 RESERVATION STATIONS (RS1, RS2) + 2 LOAD/STORE units (LDU1, LDU2) with latency 5;
- 2 RESERVATION STATIONS (RS3, RS4) + 2 ALU/BR FUs (ALU1, ALU2) with latency 2;
- structural hazards on the RSs checked in the ISSUE phase;
- RAW hazards and structural hazards on the FUs checked in the START EXECUTE phase;
- WRITE RESULT to the RSs and to the RF;
- static branch prediction BTFNT (BACKWARD TAKEN, FORWARD NOT TAKEN) with a Branch Target Buffer.

Complete the TOMASULO table assuming all cache accesses HIT and considering ONE iteration:

Instruction           | Issue | Start Exec | Write Result | Hazard type       | RS  | Unit
DO: lw $t2,BASEA($t6) | 1     | 2          | 7            |                   | RS1 | LDU1
lw $t3,BASEB($t6)     | 2     | 3          | 8            |                   | RS2 | LDU2
lw $t4,BASEC($t6)     | 8     | 9          | 14           | STRUCT RS1        | RS1 | LDU1
add $t3,$t2,$t3       | 9     | 10         | 12           |                   | RS3 | ALU1
add $t4,$t4,$t3       | 10    | 15         | 17           | RAW $t3 + RAW $t4 | RS4 | ALU2
sw $t4,BASEC($t6)     | 11    | 18         | 23           | RAW $t4           | RS2 | LDU2
addi $t6,$t6,4        | 13    | 14         | 16           | STRUCT RS3        | RS3 | ALU1
bne $t6,$t7,DO        | 17    | 18         | 20           | STRUCT RS3        | RS3 | ALU1

Express the formula, then calculate the following metrics:

CPI = #cycles / IC = 23 / 8 = 2.875
IPC = 1 / CPI ≈ 0.35

Calculate the speedup with respect to the Scoreboard version (EX. 4): Speedup = (Exec. time Scoreboard) / (Exec. time Tomasulo) = 25 / 23 ≈ 1.09

EXERCISE 6: PERFORMANCE OF THE MEMORY HIERARCHY (4 points)

Write the formula for the average memory access time:

Write the formula for the average memory access time for L1 and L2 caches:

Provide the definitions of Local Miss Rate and Global Miss Rate:
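The transcript leaves these answers blank. For reference, the standard textbook formulas (as in Hennessy & Patterson; not the official answer key) can be written as:

    \[
    \text{AMAT} = \text{Hit time} + \text{Miss rate} \times \text{Miss penalty}
    \]
    \[
    \text{AMAT} = \text{Hit time}_{L1} + \text{Miss rate}_{L1}
      \times \left( \text{Hit time}_{L2} + \text{Miss rate}_{L2}^{\mathrm{local}}
      \times \text{Miss penalty}_{L2} \right)
    \]

The local miss rate of a cache is its misses divided by the accesses that reach that cache (for L2: L2 misses / L1 misses); the global miss rate is its misses divided by all CPU memory accesses (for L2: Miss rate_L1 × local Miss rate_L2).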

EXERCISE 7: STATIC SCHEDULING (5 points)

Describe the main concepts of static scheduling to manage ILP (Instruction-Level Parallelism):

Draw and briefly describe the architecture of a 2-issue VLIW processor:

Explain the main advantages of VLIW architectures:

What are the main disadvantages of VLIW architectures?

EXERCISE 8: MEMORY HIERARCHY (3 points)

Explain the main advantages of introducing a SECOND-LEVEL CACHE (L2 CACHE):

OPTIONAL PART (2 points)

Explain the concepts and benefits of introducing a VICTIM CACHE: