Practice Assignment 1

German University in Cairo
Dr. Haytham El Miligi
Ahmed Hesham, Mohamed Khaled, Lydia Sidhom

1 Performance Metrics

1.1 IPC and CPI

1.1.1 Assume that in a given program:

1. 15% of instructions are loads, and 20% of the instructions following a load depend on its result and are stalled for 1 cycle.
2. 20% of instructions are branches, with 60% being taken. The penalty is 2 cycles if the branch is not taken and 3 cycles if the branch is taken.

What are the IPC and CPI?

$CPI_{total} = 1 + \sum CPI_{stall}$

$CPI_{load} = 0.15 \times 0.20 \times 1 = 0.03$

$CPI_{branch} = 0.20 \times (0.60 \times 3 + 0.40 \times 2) = 0.52$

$CPI = 1 + 0.03 + 0.52 = 1.55$

$IPC = \frac{1}{CPI} = \frac{1}{1.55} \approx 0.645$
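The CPI calculation above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `effective_cpi` is hypothetical, and the 15% load fraction is an assumption inferred from the stated result $CPI_{load} = 0.03$.

```python
def effective_cpi(base_cpi, stall_sources):
    """Return the base CPI plus the stall cycles added per instruction.

    Each stall source is a pair:
    (fraction of all instructions affected, average penalty in cycles).
    """
    return base_cpi + sum(frac * penalty for frac, penalty in stall_sources)


# ASSUMPTION: 15% loads, inferred from CPI_load = 0.03 in the worked answer.
load_stalls = (0.15 * 0.20, 1)                 # dependent instruction after a load
branch_stalls = (0.20, 0.60 * 3 + 0.40 * 2)    # average branch penalty in cycles

cpi = effective_cpi(1.0, [load_stalls, branch_stalls])
ipc = 1.0 / cpi
print(f"CPI = {cpi:.2f}, IPC = {ipc:.3f}")
```

Treating each stall source as a (fraction, penalty) pair keeps the formula $CPI_{total} = 1 + \sum CPI_{stall}$ visible in the code and makes it easy to add further stall sources.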

1.1.2 A machine M runs at 3 GHz. When running a program P, it has a CPI of 1.5. How many instructions will be executed during 1 second while running program P?

$\text{frequency} = \text{Clock Rate} = 3 \times 10^9$ cycles/s

$IPC = \frac{1}{CPI} = \frac{1}{1.5} \approx 0.667$

$CPI = \frac{\text{CPU clock cycles for program}}{\text{Instruction Count}}$

$\text{Instruction Count} = \frac{\text{CPU clock cycles for program}}{CPI} = \frac{3 \times 10^9}{1.5} = 2 \times 10^9$ instructions

1.2 RISC processors and pipelining

Suppose that there is a benchmarking program on a RISC processor and the average instruction mix is as follows:

Instruction             Average Frequency
Load                    26%
Store                   9%
Arithmetic              14%
Compare                 13%
Conditional Branch      16%
Unconditional Branch    1%
Call/Return             2%
Shift                   4%
Logical                 9%
Other                   6%

The CPI for the instruction types is as follows:

Instruction Type              Average CPI
All ALU                       1
Load/Store                    1.4
Taken Conditional Branch      2
Untaken Conditional Branch    1.5
Jumps                         1.2

Assume that 60% of the conditional branches are taken and that all instructions in the "Other" category are ALU instructions. What are the CPI and IPC of the benchmark on this RISC machine?

$CPI = \sum_i (\text{frequency}_i \times CPI_i)$

$IPC = \frac{1}{CPI}$
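The weighted-CPI sum for this mix can be sketched as follows. The grouping of instruction classes is an assumption, not something the assignment states: Compare, Shift, Logical and Other are counted as ALU instructions, and Unconditional Branch and Call/Return are counted as jumps. The dictionary keys are illustrative names.

```python
# Instruction mix from the table above, as fractions of all instructions.
mix = {
    "load": 0.26, "store": 0.09, "arithmetic": 0.14, "compare": 0.13,
    "cond_branch": 0.16, "uncond_branch": 0.01, "call_return": 0.02,
    "shift": 0.04, "logical": 0.09, "other": 0.06,
}
TAKEN = 0.60  # fraction of conditional branches that are taken

# ASSUMPTION: compare/shift/logical/other are ALU; call/return are jumps.
alu = sum(mix[k] for k in ("arithmetic", "compare", "shift", "logical", "other"))
mem = mix["load"] + mix["store"]
jumps = mix["uncond_branch"] + mix["call_return"]

cpi = (
    alu * 1.0
    + mem * 1.4
    + mix["cond_branch"] * (TAKEN * 2.0 + (1 - TAKEN) * 1.5)
    + jumps * 1.2
)
ipc = 1.0 / cpi
print(f"CPI = {cpi:.3f}, IPC = {ipc:.3f}")
```

Under a different grouping (for example, treating Call/Return as ALU instructions) the result changes slightly, so the classification should be checked against the course's intended solution.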

1.3 Pipeline Throughput

A processor M-5 has a five-stage pipeline and a clock cycle time of 10 ns. A newer implementation of the same instruction set in a processor M-7 uses a seven-stage pipeline and a cycle time of 7.5 ns.

Define the maximum throughput of both processors and identify which processor has the higher possible throughput.

Maximum throughput occurs when $IPC = 1$, therefore:

$M5 = \frac{1}{10\,\text{ns}} = 0.1$ instructions/ns

$M7 = \frac{1}{7.5\,\text{ns}} \approx 0.133$ instructions/ns

Thus, M-7 has a higher throughput.

Consider a loop of five instructions (four arithmetic instructions followed by a branch to the beginning of the loop). There is a dependency between two consecutive arithmetic instructions in the loop. This dependency induces a 1-cycle stall in M-5 and a 2-cycle stall in M-7. The branch induces a 2-cycle stall on M-5 and a 4-cycle stall on M-7. Which processor executes the loop faster in steady state (i.e. the pipeline is not executing for the first time, it has already been executing for a while)?

$EX_{M5} = (5 + 1 + 2) \times 10\,\text{ns} = 80\,\text{ns}$

$EX_{M7} = (5 + 2 + 4) \times 7.5\,\text{ns} = 82.5\,\text{ns}$

M-5 is better.

Assume now that the loop is unrolled once, that is, instead of n iterations of four instructions and a branch, we now have n/2 iterations of eight instructions and a branch. Instead of one data dependency per loop, we now have two data dependencies per unrolled loop. Which processor executes the unrolled loop faster?

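$EX_{M5} = (9 + 2 \times 1 + 2) \times 10\,\text{ns} = 130\,\text{ns}$

$EX_{M7} = (9 + 2 \times 2 + 4) \times 7.5\,\text{ns} = 127.5\,\text{ns}$

M-7 is better.

1.4 SpeedUp

With sequential execution occurring 15% of the time:

What is the maximum speedup with an infinite number of processors?

$SpeedUp = \frac{s + p}{s + \frac{p}{n}}$

Note: $\lim_{n \to \infty} \frac{1}{n} = 0$

$s = 0.15$, $p = 1 - s = 0.85$

$SpeedUp = \frac{0.15 + 0.85}{0.15} \approx 6.67$

How many processors are required to be within 20% of the maximum speedup?

$SpeedUp_n = \frac{s + p}{s + \frac{p}{n}} = (1 - 0.2) \times 6.67 \approx 5.33$

Solving for n gives $n \approx 22.7$, so $n = 23$.

How many processors are required to be within 2% of the maximum speedup?

$SpeedUp_n = \frac{s + p}{s + \frac{p}{n}} = (1 - 0.02) \times 6.67 \approx 6.53$

Solving for n gives $n \approx 277.7$, so $n = 278$.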
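The speedup questions above can be checked with a short sketch of Amdahl's law. The function names are hypothetical, and the code assumes the serial and parallel fractions sum to 1, as in the exercise.

```python
import math


def amdahl_speedup(serial_fraction, n):
    """Speedup on n processors when serial_fraction of the work is sequential."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)


def max_speedup(serial_fraction):
    """Limit of the speedup as the number of processors goes to infinity."""
    return 1.0 / serial_fraction


def processors_for_fraction(serial_fraction, fraction_of_max):
    """Smallest n whose speedup reaches fraction_of_max times the maximum."""
    target = fraction_of_max * max_speedup(serial_fraction)
    # Solve 1 / (s + (1 - s) / n) = target for n, then round up.
    n = (1.0 - serial_fraction) / (1.0 / target - serial_fraction)
    return math.ceil(n)


s = 0.15
print(f"max speedup = {max_speedup(s):.2f}")
print("within 20%:", processors_for_fraction(s, 0.80), "processors")
print("within 2%: ", processors_for_fraction(s, 0.98), "processors")
```

Rounding up with `math.ceil` reflects that a fractional processor count is not meaningful; the next whole processor is the first one that meets the target.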

2 Extra Questions

2.1 A compiler designer is trying to decide between two code sequences for a particular computer. The hardware designers have supplied the fact that the computer has three different classes of instructions:

Class A, which requires 1 cycle
Class B, which requires 2 cycles
Class C, which requires 3 cycles

For a particular high-level language statement, the compiler writer is considering two code sequences that require the following instruction counts:

Sequence 1                      Sequence 2
1) 2x Class A instructions      1) 4x Class A instructions
2) 1x Class B instruction       2) 1x Class B instruction
3) 2x Class C instructions      3) 1x Class C instruction

Which code sequence executes the most instructions?

Sequence 1: $2 + 1 + 2 = 5$ instructions
Sequence 2: $4 + 1 + 1 = 6$ instructions

Which code sequence will be faster?

Cycles 1: $(2 \times 1) + (1 \times 2) + (2 \times 3) = 10$ cycles
Cycles 2: $(4 \times 1) + (1 \times 2) + (1 \times 3) = 9$ cycles

Thus sequence 2 is faster.

What is the CPI for each sequence?

$CPI_1 = \frac{\text{CPU cycles}}{\text{\# of instructions}} = \frac{10}{5} = 2$

$CPI_2 = \frac{9}{6} = 1.5$

2.2 A program runs in 10 seconds on computer A, which has a 2 GHz clock. We are trying to help a computer designer build a computer B, which will run this program in 6 seconds. The designer has determined that a substantial increase in the clock rate is possible, but this increase will affect the rest of the CPU design, causing computer B to require 1.2 times as many clock cycles as computer A for this program. What clock rate should we tell the designer to target?

$\text{CPU time} = \frac{\text{CPU cycles}}{\text{Clock Rate}}$

CPU A: $\text{CPU cycles}_A = 10\,\text{s} \times 2 \times 10^9\,\text{Hz} = 20 \times 10^9$ cycles

CPU B: $\frac{1.2 \times \text{CPU cycles}_A}{\text{Clock Rate}_B} = 6\,\text{s}$, so $\text{Clock Rate}_B = \frac{1.2 \times 20 \times 10^9}{6} = 4$ GHz

2.3 Suppose we have two implementations of the same ISA:

Implementation 1: Clock cycle time of 250 ps and CPI of 2
Implementation 2: Clock cycle time of 500 ps and CPI of 1.2

Which implementation is faster for this program and by how much?

Let C be the instruction count of the program, and call the two implementations A and B respectively.

CPU A time: $2 \times C \times 250\,\text{ps} = 500\,C$ ps

CPU B time: $1.2 \times C \times 500\,\text{ps} = 600\,C$ ps

Gain: $\frac{\text{Performance}_A}{\text{Performance}_B} = \frac{\text{Execution Time}_B}{\text{Execution Time}_A} = \frac{600\,C}{500\,C} = 1.2$

Thus implementation 1 is 1.2 times faster.
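Both of the last two answers follow from CPU time = cycles / clock rate, which can be sketched as below. The function names are hypothetical helpers introduced for illustration only.

```python
def required_clock_rate(time_a_s, clock_a_hz, cycle_ratio, target_time_s):
    """Clock rate machine B needs to finish in target_time_s.

    cycle_ratio is how many times more cycles B needs than A.
    """
    cycles_a = time_a_s * clock_a_hz
    return cycle_ratio * cycles_a / target_time_s


def time_per_instruction_ps(cpi, cycle_time_ps):
    """Average execution time per instruction in picoseconds."""
    return cpi * cycle_time_ps


# Exercise 2.2: 10 s at 2 GHz, 1.2x the cycles, 6 s target.
clock_b = required_clock_rate(10, 2e9, 1.2, 6)
print(f"target clock rate = {clock_b / 1e9:.1f} GHz")

# Exercise 2.3: the instruction count cancels, so compare time per instruction.
t1 = time_per_instruction_ps(2.0, 250)
t2 = time_per_instruction_ps(1.2, 500)
print(f"implementation 1 is {t2 / t1:.1f}x faster than implementation 2")
```

Comparing time per instruction rather than total time works here because both implementations run the same program on the same ISA, so the instruction count is identical.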

2.4 Suppose we developed a new, simpler processor that has 85% of the capacitive load of the more complex older processor. Further, assume that it has adjustable voltage so that it can reduce voltage by 15% compared to the older processor, which results in a 15% reduction in frequency. What is the impact on dynamic power?

$\frac{Power_{new}}{Power_{old}} = \frac{(0.85\,C) \times (0.85\,V)^2 \times (0.85\,F)}{C \times V^2 \times F} = 0.85^4 \approx 0.52$

Thus the newer processor uses about 52% of the power of the older one, i.e. roughly half.

2.5 Calculate the dynamic energy consumed by a program when executed on a multicore microprocessor. Assume that the program instructions are equally divided between the cores and that the average CPI calculated for every core is the same.

$Power_{dynamic} = \text{Capacitive Load} \times \text{Voltage}^2 \times \text{Frequency}$

$Energy_{dynamic} = Power_{dynamic} \times \text{CPU Time}_{program} \times n$, where n is the number of cores.

With C the total instruction count of the program, each core executes $C_{core} = C / n$ instructions:

$\text{CPU Time}_{program} = \text{CPU Time}_{core} = \frac{CPI_{core} \times C_{core}}{\text{Clock Frequency}} = \frac{CPI_{core} \times C}{n \times \text{Clock Frequency}}$

$Energy_{dynamic} = \text{Capacitive Load} \times \text{Voltage}^2 \times \text{Frequency} \times \frac{CPI_{core} \times C}{n \times \text{Clock Frequency}} \times n$

$Energy_{dynamic} = \text{Capacitive Load} \times \text{Voltage}^2 \times CPI_{core} \times C$

In this case, the dynamic energy consumed by the program is independent of the clock frequency and the number of cores.
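The independence claim can be checked numerically with a small sketch. The function name and all numeric inputs below are hypothetical values chosen only for illustration; the model is the simplified dynamic power formula used in this assignment, with instructions split evenly across cores.

```python
import math


def dynamic_energy(cap_load, voltage, freq_hz, cpi_core, instr_count, n_cores):
    """Dynamic energy of a program split evenly across n_cores identical cores."""
    power_per_core = cap_load * voltage ** 2 * freq_hz
    cpu_time = cpi_core * (instr_count / n_cores) / freq_hz
    return power_per_core * cpu_time * n_cores


# HYPOTHETICAL inputs: 1 nF capacitive load, 1.0 V, CPI 1.5, 1e9 instructions.
base = dynamic_energy(1e-9, 1.0, 2e9, 1.5, 1e9, n_cores=1)
more_cores = dynamic_energy(1e-9, 1.0, 2e9, 1.5, 1e9, n_cores=8)
faster_clock = dynamic_energy(1e-9, 1.0, 4e9, 1.5, 1e9, n_cores=1)

print(math.isclose(base, more_cores), math.isclose(base, faster_clock))

# Dynamic power ratio for exercise 2.4: load, voltage and frequency all at 85%.
power_ratio = 0.85 * 0.85 ** 2 * 0.85
print(f"new / old dynamic power = {power_ratio:.2f}")
```

The comparison uses `math.isclose` rather than `==` because the frequency and core count cancel algebraically but not necessarily bit-for-bit in floating point.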
