Architectures for Multimedia Systems - Code: (073335) Prof. C. SILVANO 5th July 2010


Architectures for Multimedia Systems - Code: 078650 (073335)
Prof. C. SILVANO
5th July 2010

SURNAME          NAME          STUDENT ID (MATRICOLA)          EMAIL

EXERCISE 1 - PIPELINE

Given the following loop expressed in a high-level language:

    do {
        VETTB[i] = VETTA[i];
        VETTC[i] = VETTA[i];
        if (VETTA[i] >= 0) {
            VETTD[i] = VETTA[i];
        }
        i++;
    } while (i != N);

The program has been compiled into the MIPS assembly code shown in the following table. Assume that registers $t6 and $t7 have been initialized to the values 0 and N respectively. The symbols VETTA, VETTB, VETTC and VETTD are predefined 16-bit constants. The processor clock frequency is 1 GHz. Consider a generic iteration of the loop executed by the MIPS processor in 5-stage pipelined mode.
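As a cross-check on what the loop computes, here is a minimal Python sketch of its semantics. It reads the stray assignment to VETTD as the body of the `if`, which is what the compiled assembly does. The function name `run_loop`, the use of Python lists, and the zero-initialised outputs are assumptions made for illustration; they are not part of the exam text.

```python
def run_loop(vetta):
    """Reference semantics of the do-while loop; assumes len(vetta) >= 1."""
    n = len(vetta)
    vettb = [0] * n
    vettc = [0] * n
    vettd = [0] * n  # assumption: entries that are never written stay 0
    i = 0
    while True:  # do { ... } while (i != N): the body runs at least once
        vettb[i] = vetta[i]
        vettc[i] = vetta[i]
        if vetta[i] >= 0:
            # Only non-negative elements are copied into VETTD.
            vettd[i] = vetta[i]
        i += 1
        if i == n:
            break
    return vettb, vettc, vettd
```

For example, `run_loop([3, -1, 0])` copies every element into the first two outputs and only the non-negative ones into the third.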

a) Assuming there are no optimisations in the pipeline:

1. Identify the RAW (Read After Write) data hazards and the control hazards.
2. Identify the number of stalls to be inserted before each instruction (or between the IF and ID stages of each instruction) needed to resolve the hazards.

Num. stalls   INSTRUCTION                     C1 C2 ... C17      Hazard Type
              DO:   lw   $t2, VETTA($t6)      IF ID EX ME WB
                    sw   $t2, VETTB($t6)      IF ID EX ME WB
                    sw   $t2, VETTC($t6)      IF ID EX ME WB
                    slt  $t0, $t2, $0         IF ID EX ME WB
                    bne  $t0, $0, INC         IF ID EX ME WB
                    sw   $t2, VETTD($t6)      IF ID EX ME WB
              INC:  addi $t6, $t6, 4          IF ID EX ME WB
                    bne  $t6, $t7, DO         IF ID EX ME WB
              END:                            IF ID EX ME WB

NOTE: slt $t0,$t2,$0   # if $t2 < $0 then set $t0 = 1, otherwise $t0 = 0

Assuming that (VETTA[i] >= 0) in 50% of the cases:

Average Instruction Count: IC_AVE =
Average Number of Stalls: STALL_AVE =
Asymptotic CPI (N → ∞): CPI_AS =
Asymptotic Throughput expressed in MIPS (N → ∞): MIPS_AS =
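Step 1 can be approached mechanically by listing, for each instruction, the nearest earlier instruction that writes one of its source registers. The sketch below does this for a single iteration on the path where the inner branch is not taken, so all eight instructions execute. The encoding of the listing, the function name `raw_dependences`, and the `window` parameter are assumptions made for illustration. A window of 3 corresponds to a 5-stage pipeline with neither forwarding nor same-cycle register-file read/write. Loop-carried dependences (for example, from `addi` to the next iteration's `lw`) are deliberately ignored, and the sketch does not compute stall counts.

```python
# One loop iteration, inner branch not taken (VETTA[i] >= 0).
# Each entry: (text, destination register or None, source registers).
LOOP = [
    ("lw   $t2, VETTA($t6)", "$t2", ["$t6"]),
    ("sw   $t2, VETTB($t6)", None, ["$t2", "$t6"]),
    ("sw   $t2, VETTC($t6)", None, ["$t2", "$t6"]),
    ("slt  $t0, $t2, $0", "$t0", ["$t2", "$0"]),
    ("bne  $t0, $0, INC", None, ["$t0", "$0"]),
    ("sw   $t2, VETTD($t6)", None, ["$t2", "$t6"]),
    ("addi $t6, $t6, 4", "$t6", ["$t6"]),
    ("bne  $t6, $t7, DO", None, ["$t6", "$t7"]),
]


def raw_dependences(instrs, window):
    """Return (producer, consumer, register) triples within `window` instructions."""
    deps = []
    for j, (_, _, sources) in enumerate(instrs):
        for reg in sources:
            if reg == "$0":
                continue  # $0 is hard-wired to zero and never written
            # Walk backwards to the nearest earlier writer of this register.
            for i in range(j - 1, -1, -1):
                if instrs[i][1] == reg:
                    if j - i <= window:
                        deps.append((i, j, reg))
                    break
    return deps


for producer, consumer, reg in raw_dependences(LOOP, window=3):
    print(f"{LOOP[producer][0]}  ->  {LOOP[consumer][0]}  (RAW on {reg})")
```

Shrinking `window` to 2 models a register file that can be written and read in the same cycle, which removes the `lw` to `slt` entry from the list.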

b) Assuming there are the following optimisations in the pipeline:

- In the Register File it is possible to read and write the same address in the same clock cycle;
- Forwarding;
- Computation of the PC and of the TARGET ADDRESS for branch and jump instructions anticipated to the ID stage.

1. Identify the RAW (Read After Write) data hazards and the control hazards.
2. Identify the number of stalls to be inserted before each instruction (or between the IF and ID stages of each instruction) needed to resolve the hazards.
3. Indicate in the last column the forwarding path used.

Num. stalls   INSTRUCTION                     C1 C2 ... C17      Hazard Type   FORWARDING PATH
              DO:   lw   $t2, VETTA($t6)      IF ID EX ME WB
                    sw   $t2, VETTB($t6)      IF ID EX ME WB
                    sw   $t2, VETTC($t6)      IF ID EX ME WB
                    slt  $t0, $t2, $0         IF ID EX ME WB
                    bne  $t0, $0, INC         IF ID EX ME WB
                    sw   $t2, VETTD($t6)      IF ID EX ME WB
              INC:  addi $t6, $t6, 4          IF ID EX ME WB
                    bne  $t6, $t7, DO         IF ID EX ME WB
              END:                            IF ID EX ME WB

NOTE: slt $t0,$t2,$0   # if $t2 < $0 then set $t0 = 1, otherwise $t0 = 0

Assuming that (VETTA[i] >= 0) in 50% of the cases:

Average Instruction Count: IC_AVE =
Average Number of Stalls: STALL_AVE =
Asymptotic CPI (N → ∞): CPI_AS =
Asymptotic Throughput expressed in MIPS (N → ∞): MIPS_AS =
Asymptotic SpeedUp with respect to the first case: SpeedUp_AS =
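The requested metrics follow from the usual relations: asymptotic CPI is (instructions + stalls) / instructions, throughput in MIPS is clock frequency / (CPI × 10^6), and speedup at equal clock and instruction count is the ratio of the two CPIs. The sketch below encodes these relations. The function names are hypothetical, and the stall counts in the usage line are placeholders rather than the exam's answers. The average instruction count comes directly from the listing: eight instructions execute when VETTA[i] >= 0 and seven otherwise.

```python
def average_instruction_count(p_nonneg=0.5):
    """Instructions per iteration: 8 if VETTA[i] >= 0 (store to VETTD runs), else 7."""
    return p_nonneg * 8 + (1 - p_nonneg) * 7


def asymptotic_cpi(ic_ave, stall_ave):
    """CPI for N -> infinity: pipeline fill time is amortised away."""
    return (ic_ave + stall_ave) / ic_ave


def asymptotic_mips(f_clock_hz, cpi):
    """Throughput in millions of instructions per second."""
    return f_clock_hz / (cpi * 1e6)


def speedup(cpi_base, cpi_new):
    """Speedup at equal clock frequency and instruction count."""
    return cpi_base / cpi_new


ic = average_instruction_count()                 # 7.5 instructions per iteration
cpi = asymptotic_cpi(ic, stall_ave=7.5)          # placeholder stall count, not the solution
print(ic, cpi, asymptotic_mips(1e9, cpi))
```

Only the stall averages need to be filled in from the completed pipeline diagrams; the rest is arithmetic.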

c) Assuming there are the previous optimisations in the pipeline, with static branch prediction BTFNT (BACKWARD TAKEN FORWARD NOT TAKEN) and a BRANCH TARGET BUFFER:

1. Identify the RAW (Read After Write) data hazards and the control hazards.
2. Identify the number of stalls to be inserted before each instruction (or between the IF and ID stages of each instruction) needed to resolve the hazards.
3. Indicate the static branch prediction (Taken/Not Taken).

Num. stalls   INSTRUCTION                     PRED. T/NT   C1 C2 ... C17      Hazard Type
              DO:   lw   $t2, VETTA($t6)      -            IF ID EX ME WB
                    sw   $t2, VETTB($t6)      -            IF ID EX ME WB
                    sw   $t2, VETTC($t6)      -            IF ID EX ME WB
                    slt  $t0, $t2, $0         -            IF ID EX ME WB
                    bne  $t0, $0, INC                      IF ID EX ME WB
                    sw   $t2, VETTD($t6)      -            IF ID EX ME WB
              INC:  addi $t6, $t6, 4          -            IF ID EX ME WB
                    bne  $t6, $t7, DO                      IF ID EX ME WB
              END:                                         IF ID EX ME WB

NOTE: slt $t0,$t2,$0   # if $t2 < $0 then set $t0 = 1, otherwise $t0 = 0

Assuming that (VETTA[i] >= 0) in 50% of the cases:

Average Instruction Count: IC_AVE =
Average Number of Stalls: STALL_AVE =
Asymptotic CPI (N → ∞): CPI_AS =
Asymptotic Throughput expressed in MIPS (N → ∞): MIPS_AS =
Asymptotic SpeedUp with respect to the first case: SpeedUp_AS =
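BTFNT is a purely static rule: a branch whose target lies at or before the branch itself is predicted taken, and a branch whose target lies after it is predicted not taken. The sketch below applies that rule using instruction positions in the listing as stand-ins for addresses. The function name `btfnt_predict` and the position numbering (DO at 0, INC at 6) are assumptions made for illustration.

```python
def btfnt_predict(branch_pos, target_pos):
    """Backward Taken, Forward Not Taken: decided by branch direction only."""
    return "T" if target_pos <= branch_pos else "NT"


# Positions within the listing: DO = 0, inner bne = 4, INC = 6, loop bne = 7.
inner_branch = btfnt_predict(branch_pos=4, target_pos=6)  # bne $t0, $0, INC (forward)
loop_branch = btfnt_predict(branch_pos=7, target_pos=0)   # bne $t6, $t7, DO (backward)
print(inner_branch, loop_branch)
```

Whether a prediction costs stalls then depends on how often the actual outcome matches it, which is where the 50% assumption on VETTA[i] >= 0 enters.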

EXERCISE 2: SCOREBOARD

Assume the program is executed by a CPU with dynamic scheduling based on a SCOREBOARD with:

- 2 LOAD/STORE units (LDU1, LDU2) with latency 4
- 2 ALU/BR/J units (ALU1, ALU2) with latency 2
- Check structural hazards in the ISSUE phase
- Check RAW hazards in the READ OPERANDS phase
- Check WAR and WAW hazards in the WRITE BACK phase
- Forwarding
- Static Branch Prediction BTFNT (BACKWARD TAKEN FORWARD NOT TAKEN) with Branch Target Buffer

a) Assuming the case (VETTA[i] >= 0), fill in the following table:

INSTRUCTION                  PRED.   ISSUE   READ       EXECUTION   WRITE   HAZARDS                   UNIT
                             T/NT            OPERANDS   COMPLETE    BACK
DO:   lw   $t2, VETTA($t6)   -       1       2          6           7                                 LDU1
      sw   $t2, VETTB($t6)   -       2       7          11          12      RAW $t2 solved with FW    LDU2
      sw   $t2, VETTC($t6)   -
      slt  $t0, $t2, $0      -
      bne  $t0, $0, INC
      sw   $t2, VETTD($t6)   -
INC:  addi $t6, $t6, 4       -
      bne  $t6, $t7, DO
END:
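The two pre-filled rows suggest a simple timing pattern: operands are read no earlier than the cycle after issue and no earlier than the cycle in which the producer writes back (with forwarding), execution completes `latency` cycles later, and write back follows one cycle after that. The sketch below encodes this reading and reproduces both rows. It is an inference from the given numbers, not the exam's official rule set. The function name `scoreboard_row` is hypothetical, and structural, WAR and WAW stalls are assumed absent.

```python
def scoreboard_row(issue, operands_ready, latency):
    """Return (issue, read operands, execution complete, write back) cycles.

    Assumes no structural, WAR or WAW stall delays any phase.
    `operands_ready` is the cycle in which the last source value is available.
    """
    read_operands = max(issue + 1, operands_ready)
    exec_complete = read_operands + latency
    write_back = exec_complete + 1
    return issue, read_operands, exec_complete, write_back


lw_row = scoreboard_row(issue=1, operands_ready=0, latency=4)
# The store waits for $t2, forwarded in the cycle in which lw writes back.
sw_row = scoreboard_row(issue=2, operands_ready=lw_row[3], latency=4)
print(lw_row, sw_row)
```

The remaining rows additionally need the unit-availability check at issue, since only two load/store units exist and each stays busy until its instruction writes back.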

EXERCISE 3 - TOMASULO

Assume the program is executed by a CPU with dynamic scheduling based on the TOMASULO algorithm with:

- 2 RESERVATION STATIONS (RS1, RS2) + 2 LOAD/STORE UNITS (LDU1, LDU2) with latency 4
- 2 RESERVATION STATIONS (RS3, RS4) + 2 ALU/BR/J UNITS (ALU1, ALU2) with latency 2
- Check structural hazards for RS in the ISSUE phase
- Check RAW hazards and structural hazards for FUs in the START EXECUTE phase
- WRITE RESULT in RS and RF
- Forwarding
- Static Branch Prediction BTFNT (BACKWARD TAKEN FORWARD NOT TAKEN) with Branch Target Buffer

a) Assuming the case (VETTA[i] >= 0), fill in the following table:

INSTRUCTION                  PRED   ISSUE   START   WRITE     HAZARDS                   RS    UNIT
                             T/NT           EXEC    RESULTS
DO:   lw   $t2, VETTA($t6)   -      1       2       6                                   RS1   LDU1
      sw   $t2, VETTB($t6)   -      2       6       10        RAW $t2 solved with FW    RS2   LDU2
      sw   $t2, VETTC($t6)   -
      slt  $t0, $t2, $0      -
      bne  $t0, $0, INC
      sw   $t2, VETTD($t6)   -
INC:  addi $t6, $t6, 4       -
      bne  $t6, $t7, DO
END:

EXERCISE 4 - TOMASULO

Assume the program is executed by a CPU with dynamic scheduling based on the TOMASULO algorithm with:

- 4 RESERVATION STATIONS (RS1, RS2, RS3, RS4) + 2 LOAD/STORE UNITS (LDU1, LDU2) with latency 4
- 4 RESERVATION STATIONS (RS5, RS6, RS7, RS8) + 2 ALU/BR/J UNITS (ALU1, ALU2) with latency 2
- Check structural hazards for RS in the ISSUE phase
- Check RAW hazards and structural hazards for FUs in the START EXECUTE phase
- WRITE RESULT in RS and RF
- Forwarding
- Static Branch Prediction BTFNT (BACKWARD TAKEN FORWARD NOT TAKEN) with Branch Target Buffer

b) Assuming the case (VETTA[i] >= 0), fill in the following table:

INSTRUCTION                  PRED   ISSUE   START   WRITE     HAZARDS                   RS    UNIT
                             T/NT           EXEC    RESULTS
DO:   lw   $t2, VETTA($t6)   -      1       2       6                                   RS1   LDU1
      sw   $t2, VETTB($t6)   -      2       6       10        RAW $t2 solved with FW    RS2   LDU2
      sw   $t2, VETTC($t6)   -
      slt  $t0, $t2, $0      -
      bne  $t0, $0, INC
      sw   $t2, VETTD($t6)   -
INC:  addi $t6, $t6, 4       -
      bne  $t6, $t7, DO
END:

EXERCISE 5 - VLIW and TRACE SCHEDULING

1. Define the term basic block in the context of assembly code:

2. Consider the following pseudo-MIPS code sequence:

    ...
    load $r2, 0($r1)        /* Instruction A */
    add  $r4, $r5, $r6      /* Instruction B, $r4 is destination */
    add  $r8, $r6, $r4      /* Instruction C, $r8 is destination */
    blt  $r2, #1000, L1     /* Instruction D, branch */
    load $r3, 0($r2)        /* Instruction E */
    ...

2.1 Given that instructions A...D are not the target of any branch in the program, we can safely say that there are two basic blocks. Indicate, in the following control flow graph, the content of the two basic blocks, using the labels defined for each instruction (A..E).
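Basic blocks are commonly found with the "leader" rule: the first instruction is a leader, every branch target is a leader, and every instruction that follows a branch is a leader. Each block then runs from one leader up to the next. The sketch below applies that rule to the labels A..E. The function name `basic_blocks` and its argument layout are assumptions made for illustration, and the label L1 is treated as lying outside the listed sequence.

```python
def basic_blocks(labels, branches, targets=()):
    """Split an instruction sequence into basic blocks using the leader rule."""
    labels = list(labels)
    leaders = {labels[0]}                      # rule 1: first instruction
    leaders.update(t for t in targets if t in labels)  # rule 2: branch targets
    for k, label in enumerate(labels):
        if label in branches and k + 1 < len(labels):
            leaders.add(labels[k + 1])         # rule 3: instruction after a branch
    blocks = []
    for label in labels:
        if label in leaders:
            blocks.append([])                  # a leader opens a new block
        blocks[-1].append(label)
    return blocks


# D is the only branch; its target L1 is not one of A..E.
print(basic_blocks("ABCDE", branches={"D"}))
```

With D as the only branch, the rule yields exactly two blocks, in line with the statement in 2.1.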

2.2 Let us consider a VLIW machine with only two slots. The first slot is dedicated to load/store instructions, while the second slot is dedicated to branch/ALU instructions. Ideally, the L/S functional units are occupied for 2 cycles, while the ALU functional unit takes only 1 cycle. Branch instructions can be considered as occupying the branch unit for 2 cycles.

2.2.1 Draw the dependency graph of the instructions in basic block 1 (use directed arrows to indicate RAW or control dependencies among instructions):

2.2.2 Indicate the length of the critical path of basic block 1 in terms of clock cycles:
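One way to check a critical-path answer is to compute the longest latency-weighted path through the dependency graph. The sketch below does this for instructions A..D, taking the latencies stated above (2 cycles for load and branch, 1 for ALU) and the RAW edges read off the register names: D reads $r2 written by A, and C reads $r4 written by B. The names `LATENCY`, `RAW_EDGES` and `critical_path_length` are hypothetical, and the model ignores slot conflicts, which belong to the scheduling step rather than to the critical path.

```python
# Latencies from the machine description: load 2, ALU 1, branch 2.
LATENCY = {"A": 2, "B": 1, "C": 1, "D": 2}
# RAW edges (producer, consumer) read off the register names.
RAW_EDGES = [("A", "D"), ("B", "C")]


def critical_path_length(latency, edges):
    """Longest latency-weighted path in a dependency DAG, in cycles."""
    finish = {}

    def finish_time(node):
        if node not in finish:
            preds = [u for (u, v) in edges if v == node]
            # A node starts once all of its producers have finished.
            start = max((finish_time(u) for u in preds), default=0)
            finish[node] = start + latency[node]
        return finish[node]

    return max(finish_time(node) for node in latency)


print(critical_path_length(LATENCY, RAW_EDGES))
```

Under these assumptions the chain through the load and the branch dominates the chain through the two additions.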

2.2.3 Fill in the resource reservation table and the VLIW instruction schedule for the instructions of basic block 1 (as always, use the instruction labels defined before):

Resource Reservation Table                    VLIW Code Schedule
Cycle   L/S   Arith U.   Branch U.            Cycle   L/S Slot   ALU/Branch Slot
1                                             1
2                                             2
3                                             3
4                                             4
5                                             5
6                                             6
7                                             7

2.2.3 How many cycles does it take to complete the sequence of instructions A...E?

2.2.4 Assume that the code sequence is the most likely sequence found in an execution trace of the program, and indicate which trace-scheduling move can be made to improve the execution time of instructions A...E:

2.2.5 Is it a safe move?

2.2.6 Compute the speedup of the completion of the code sequence by considering the above move:

2.2.7 Indicate whether you need any additional architectural support to enable the above move: