Homework 2 (r1.0) Due: Mar 27, 2018, 11:55pm


Second Semester

Instruction: Submit your answers electronically through Moodle.

There are 3 major parts in this homework. Part A includes questions that aim to help you understand the lecture materials. They resemble the kind of questions you will encounter in quizzes and the final exam. This part is an individual portion and you should complete it by yourself.

Parts B and C are group portions. You may work in groups of up to 3 for these parts. Part B asks you to examine cache performance in real-world processors. You should try to run it on as many different processors as you can find and analyze their cache behaviors. Part C contains an open-ended project, meaning there are no right or wrong answers.

The following table summarizes the 3 parts.

Part  Type               Indv/Grp         Grading
A     Basic problem set  Individual       Graded on correctness
B     Hands-on           Group of 2 to 3  Graded on effort
C     Mini-project       Group of 2 to 3  Graded on effort

In all cases, you are encouraged to discuss the homework problems offline or online using Piazza. However, you should not ask for or give out solutions directly, as that defeats the purpose of having homework exercises. Giving out answers or copying answers directly will likely constitute an act of plagiarism.

Part A: Problem Set

A.1 Cache Access

Consider the following sequence of memory accesses to the main memory in a 32-bit processor:

Address (hex): ABCD1234 ABCD11D4 ABCD122C 1BCD1234 ABCD1220 ABCD15D8 1BCD11A4 AB0D C45A3C ABCD15C4 77C45A20 ABCD163C ABCD11A4 77C45A38 AB0D1228
Type: W W W W W W

A.1.1 Assume the following data cache organization:
  Capacity: 1 KiB
  Line size: 8 words
  Organization: direct mapped
  Policy: write back, write allocate

Trace through the above memory accesses and answer the following:
(i) For each access, is it a hit or a miss?
(ii) Show the final content of the cache, including the tag. For the sake of simplicity, assume the content of a memory address is the same as its address, i.e., mem[X] = X.

A.1.2 Repeat A.1.1 but with a different cache:
  Capacity: 1 KiB
  Line size: 8 words
  Organization: 2-way set associative
  Policy: true LRU, write back, write allocate
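When tracing the accesses by hand, it can help to split each address into tag, index, and offset first. The following is a minimal sketch, assuming 4-byte words (32-bit processor) and power-of-two cache parameters; the function name split_address and its parameter names are illustrative and not part of the assignment.

```python
WORD_BYTES = 4  # assumption: a 32-bit processor uses 4-byte words


def split_address(addr, capacity_bytes=1024, line_words=8, ways=1):
    """Return (tag, index, offset) for a byte address.

    Assumes capacity, line size, and associativity are powers of two.
    """
    line_bytes = line_words * WORD_BYTES
    num_sets = capacity_bytes // (line_bytes * ways)
    offset_bits = line_bytes.bit_length() - 1   # log2(line_bytes)
    index_bits = num_sets.bit_length() - 1      # log2(num_sets)
    offset = addr & (line_bytes - 1)
    index = (addr >> offset_bits) & (num_sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


# First address of the trace with the direct-mapped cache of A.1.1
tag, index, offset = split_address(0xABCD1234)
print(hex(tag), index, offset)
```

Passing ways=2 halves the number of sets, which gives the split for the 2-way cache of A.1.2.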

A.1.3 Repeat A.1.1 but with a different cache:
  Capacity: 1 KiB
  Line size: 8 words
  Organization: direct mapped
  Policy: write through, no write allocate

A.1.4 Given the above sequence of memory accesses, assume you can change the cache line size and cache organization. What is the minimum cache capacity needed to achieve the minimum number of misses? What are the minimum number of misses, the line size, and the resulting cache organization? Explain your answer.

A.2 Cache Performance

You are evaluating the performance of the cache subsystem of a processor. The initial design of the processor has the following cache:
  Separate instruction and data caches
  Cache hit time is 1 cycle
  Cache miss penalty is 300 cycles

A.2.1 Focus on the instruction cache in this part. The instruction cache has a miss rate of 5 %. What is the average memory access time (AMAT) of the instruction cache?

A.2.2 The data cache has a miss rate of 15 %. What is the AMAT of the data cache?

A.2.3 After profiling a program B, you find the following percentages of dynamic instructions:

ALU    Jump/Branch  Load/Store
40 %   20 %         40 %

Assume the CPI of all branch/jump instructions is 3 and the CPI of ALU instructions is 1. Also assume for this part that the instruction cache is perfect, i.e., its miss rate is 0 %. What is the overall CPI of program B?

A.2.4 Now, consider the realistic instruction cache with a 5 % miss rate as mentioned in A.2.1. Assume that when the I-cache misses, the entire processor pipeline is stalled while the memory subsystem fetches data from main memory. Since there is only 1 external DRAM, if both the I-cache and the D-cache miss in the same cycle, the 2 memory accesses have to take place sequentially. With these assumptions, what is the average CPI of program B, taking into account the possibility of both I-cache and D-cache misses?
A.2.5 Upon investigation, you realize that 90 % of the load/store instructions (i.e., 90 % × 40 % of the original dynamic instructions) are accessing memory data that can be recalculated within the CPU. Specifically, each of these load/store instructions can be replaced by 100 ALU instructions plus 20 Jump/Branch instructions. If all of these load/store instructions are converted into ALU and Jump/Branch instructions, would it improve the performance of B? If so, how much performance improvement can be achieved? If not, is there a maximum number of ALU + Jump/Branch instructions that can be used to replace these recalculatable load/store instructions and still make the program faster?
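The questions in A.2 rest on two standard relations: AMAT = hit time + miss rate × miss penalty, and overall CPI as the sum of each instruction class's CPI weighted by its share of dynamic instructions. The sketch below shows both; the helper names and the numbers are hypothetical and deliberately differ from the values in the questions.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty


def weighted_cpi(mix):
    """mix: iterable of (fraction_of_dynamic_instructions, cpi) pairs."""
    return sum(fraction * cpi for fraction, cpi in mix)


# Hypothetical cache: 2-cycle hit, 10 % miss rate, 100-cycle miss penalty
example_amat = amat(hit_time=2, miss_rate=0.10, miss_penalty=100)

# Hypothetical mix: half the instructions at CPI 1, half at CPI 4
example_cpi = weighted_cpi([(0.5, 1), (0.5, 4)])

print(example_amat, example_cpi)
```

For a load/store class, the CPI passed to weighted_cpi would itself come from an AMAT-style calculation for the data cache.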

A.3 Instruction Cache (adapted from final exam 2015)

You are designing the instruction cache for a new 32-bit processor. Because of other hardware constraints, your cache design must meet the following criteria:
  Virtual address space is 32 bits wide.
  Word size is 32 bits.
  Cache must be indexed and tagged with virtual addresses.
  Line size must be 8 words.
  The number of bits to index the cache must be 8 bits.
  Page size is 4 KiB.
  Translation from virtual to physical address is controlled by the OS and the page mapping is pseudo-random.
  Process ID is 4 bits wide.
  All compiled programs start running from address 0x
  The OS performs a context switch every 256 instructions.

A.3.1 In order to differentiate data from different processes, you have decided to concatenate the process ID to the virtual address tag in the cache. Assuming you are using a direct-mapped cache, what is the minimum width of the combined tag, in bits?

A.3.2 Assume you are using a direct-mapped cache organization. What is the maximum possible capacity you can use in this processor given the above constraints? At this maximum capacity, what is the total number of bits required for tag storage in the cache?

A.3.3 Your machine starts with an empty instruction cache. The following program is run as Process 1 (with process ID = 1):

  0x _start: addi a1, a1, 1   # instruction 0
  0x addi a3, zero, x000401FC bne a1, a3, _start   # instruction 127: NOT taken on 4th time

Process 1 executes 256 instructions and is then stopped. There is no branch instruction in the first 128 instructions except for the last bne. The bne instruction takes the branch the first 3 times it is encountered. The branch condition is false the 4th time it is executed. After Process 1 has been stopped by the OS on a context switch (after it has executed 256 instructions), how many hits and misses have occurred in the instruction cache? Briefly explain your answer.
A.3.4 Following up from the previous part, after Process 1 is stopped, the OS switches in Process 2. Process 2 is the exact same program as Process 1 except that it is started by a different user. Therefore, it executes the exact same code as in the previous part, but with process ID = 2. Process 2 is again stopped after 256 instructions. How many hits and misses will have occurred in the instruction cache due to running Process 2? Briefly explain your answer.

A.3.5 Since there are only 2 processes running, the switching between Process 1 and Process 2 continues: Process 1 is switched in again after Process 2 and continues its execution for another 256 instructions. By now, Process 1 has completed 512 instructions since it started. Process 2 is then switched in place of Process 1 and runs for 256 instructions. When the two processes run on the processor for the second time, does the instruction hit rate change compared to your answers in A.3.3 and A.3.4? If the hit rate has changed, explain how it is different. If the hit rate remains the same, explain why that is the case.

A.3.6 In order to improve the performance of the instruction cache, you now have the following proposals:

Proposal  Description
A         Change to 2-way set associative organization while keeping the same capacity. Virtually tagged with Process ID.
B         Change to 4-way set associative organization while keeping the same capacity. Virtually tagged with Process ID.
C         Change to a physically tagged cache; direct mapped; flush the cache on context switch.
D         Change to a physically tagged cache; direct mapped; DO NOT flush the cache on context switch; the OS ensures processes are never mapped to the same physical location.
E         Change to a physically tagged cache; direct mapped; DO NOT flush the cache on context switch; also, the OS maps the instructions to the same physical page. That is, for example, address 0x of both Process 1 and Process 2 will be mapped to the same physical location 0xA

For each of these cases, explain whether the proposal will result in better performance given the 2 processes above. Explain why or why not.
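The effect of concatenating the process ID to a virtual tag can be explored with a small model. This is a minimal sketch, assuming a direct-mapped cache with 8-word (32-byte) lines and 8 index bits as listed in the constraints; the class name PidTaggedCache and the address 0x1000 are hypothetical and not taken from the assignment.

```python
OFFSET_BITS = 5   # 8 words x 4 bytes = 32-byte line
INDEX_BITS = 8    # index width required by the constraints


class PidTaggedCache:
    """Direct-mapped, virtually indexed cache whose tag includes the PID."""

    def __init__(self):
        self.lines = [None] * (1 << INDEX_BITS)

    def access(self, pid, vaddr):
        """Return True on a hit; on a miss, fill the line and return False."""
        index = (vaddr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
        tag = (pid, vaddr >> (OFFSET_BITS + INDEX_BITS))
        hit = self.lines[index] == tag
        self.lines[index] = tag
        return hit


cache = PidTaggedCache()
addr = 0x1000  # hypothetical virtual address
results = [
    cache.access(1, addr),  # cold miss
    cache.access(1, addr),  # hit: same PID, same line
    cache.access(2, addr),  # miss: same index, different PID in the tag
    cache.access(1, addr),  # miss: the line was replaced by PID 2
]
print(results)  # [False, True, False, False]
```

The same model can be driven with the instruction addresses of each process to count hits and misses per time slice.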

A.4 Branch Predictor & Branch Target Buffer

As the chief architect of a new processor, you are considering the use of a branch predictor (BP) together with a branch history table (BHT) and a branch target buffer (BTB). You will use the following important benchmark program, conv, to evaluate your processor. Comments are pseudo code describing the function of the instruction. Each line begins with its address in hex on the left. The code is in RISC-V assembly language, and the macros %hi and %lo return the corresponding part of the base address of their parameter (imgout and imgin in this case).

  A00 conv: lui s4,%hi(imgout)
  A04       addi s4,s4,%lo(imgout)
  A08       li s2,0
  A0C       li s3,256
  A10 .L10: mv s1,s4
  A14       li s0,0
  A18 .L11: mv a1,s0
  A1C       mv a0,s2
  A20       jal ra, k
  A24       sw a0,0(s1)
  A28       addi s0,s0,1
  A2C       addi s1,s1,4
  A30       bne s0,s3,.L11
  A34       addi s2,s2,1
  A38       addi s4,s4,1024
  A3C       bne s2,s0,.L10
  A40       li a0,0
  A44       exit               # program ends here
  A48 k:    mv a5,a0
  A4C       beqz a0,.L3
  A50       li a4,255
  A54       beq a0,a4,.L3
  A58       beqz a1,.L3
  A5C       beq a1,a4,.L3
  A60       slli a5,a0,8
  A64       add a5,a5,a1
  A68       lui a4,%hi(imgin)
  A6C       addi a4,a4,%lo(imgin)
  A70       slli a5,a5,2
  A74       add a5,a5,a4
  A78       lw a0,0(a5)
  A7C       slli a0,a0,1
  A80       jr ra
  A84 .L3:  li a0,0
  A88       jr ra

The following shows a C code equivalent of conv. The function conv performs a 2D convolution on the input image imgin. The convolution kernel, implemented in k, simply returns the value of the input image at position (r, c). A more realistic kernel will be used in Part C of this homework.

  // 2D convolution with simple kernel
  #define N 256
  int imgin[N][N];
  int imgout[N][N];

  int k(int r, int c) {
      if (r == 0 || r == (N-1) || c == 0 || c == (N-1)) {
          return 0;
      }
      return imgin[r][c] * 2;
  }

  int f() {
      int c = 0;
      int r = 0;
      for (r = 0; r < N; r++) {
          for (c = 0; c < N; c++) {
              imgout[r][c] = k(r,c);
          }
      }
      return 0;
  }

A.4.1 In the following table, trace the first 15 outcomes of the branch/jump instructions. Time flows from left to right. Each column represents the outcome of one branch/jump event. The first few columns have been filled in for you as an example. Mark Y if a branch is taken, N if a branch is not taken. Mark Y for a jump as well.

  0xA20 (jal)
  0xA30 (bne)
  0xA3C (bne)
  0xA4C (beqz)
  0xA54 (beq)
  0xA58 (beqz)
  0xA5C (beq)
  0xA80 (jr)
  0xA88 (jr)

A.4.2 Based on your result from A.4.1, assume you implement a branch predictor that predicts branches are always taken. What is the branch misprediction rate (the fraction of predictions that are wrong) for the branch instructions?

A.4.3 If the 2-bit predictor shown in class is used for each branch location, and it starts in a state that predicts branch not taken, what is the misprediction rate for the bne instruction at 0xA3C?

A.4.4 If you refer back to the C code, you will see that the value N corresponds to the size of the input/output image. If N increases by 128 times, how would that change the misprediction rates of the always-predict-taken predictor and the 2-bit predictor compared to your answers above? Which predictor is better?
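Misprediction counts for a given outcome sequence can be checked with a short simulation. The sketch below assumes the common 2-bit saturating-counter scheme (states 0 and 1 predict not taken, states 2 and 3 predict taken), starting in state 0; the predictor shown in class may use a different state machine, in which case the update rule needs to be adapted. The function names and the outcome pattern are illustrative.

```python
def count_mispredictions(outcomes, state=0):
    """Count wrong predictions of a 2-bit saturating counter.

    outcomes: iterable of booleans, True meaning the branch was taken.
    state: initial counter value, 0 (strongly not taken) to 3 (strongly taken).
    """
    wrong = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken != taken:
            wrong += 1
        # Saturating update: move towards 3 when taken, towards 0 otherwise
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return wrong


def always_taken_mispredictions(outcomes):
    """An always-taken predictor is wrong exactly when a branch falls through."""
    return sum(1 for taken in outcomes if not taken)


# Hypothetical loop branch: taken three times, then falls through
pattern = [True, True, True, False]
print(count_mispredictions(pattern), always_taken_mispredictions(pattern))  # 3 1
```

Dividing either count by the number of outcomes gives the misprediction rate.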

A.4.5 Assume you have a perfect BTB, which includes one entry for each branch/jump instruction. Fill in the target address for each entry after executing conv.

  Instruction Address    Target Address
  ..

A.4.6 Due to hardware constraints, you now have only 4 entries in the BTB. As a result, you have decided to use the lower 2 bits of the word address (i.e., bit 3 and bit 2 of the instruction address) to index this BTB. Show the final content of the 4 entries in the BTB.

  Index    Instruction Address    Target Address
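The indexing scheme of A.4.6 can be written down directly: bits 3 and 2 of the instruction address select one of the 4 entries, and a later branch that maps to the same entry replaces the earlier one. This is a minimal sketch; the class name BranchTargetBuffer and the example addresses are hypothetical, and storing the full instruction address as the tag is an assumption.

```python
BTB_ENTRIES = 4


def btb_index(pc):
    """Use bits [3:2] of the instruction address as the BTB index."""
    return (pc >> 2) & (BTB_ENTRIES - 1)


class BranchTargetBuffer:
    """Direct-mapped BTB; each entry holds (instruction address, target)."""

    def __init__(self):
        self.entries = [None] * BTB_ENTRIES

    def update(self, pc, target):
        self.entries[btb_index(pc)] = (pc, target)

    def lookup(self, pc):
        entry = self.entries[btb_index(pc)]
        if entry is not None and entry[0] == pc:
            return entry[1]
        return None


btb = BranchTargetBuffer()
btb.update(0x100, 0x200)  # hypothetical branch at 0x100 with target 0x200
btb.update(0x110, 0x300)  # same index, so this replaces the entry above
print(btb.lookup(0x100), hex(btb.lookup(0x110)))
```

Replaying the taken branches and jumps of a program in execution order through update leaves the final BTB content in btb.entries.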

Part B: Hands-on Exercise

B.1 Micro-Benchmarking Cache Performance

In this exercise, you will micro-benchmark the cache system of real machines. The core of the benchmarking program is the following loop:

  for (stride = STRIDE_MIN; stride <= STRIDE_MAX; stride = stride << 1) {
      for (i = 0; i < asize; i += stride) {
          array[i] = array[i];
      }
  }

By carefully examining the time it takes to access elements of an array with different strides, it is possible to deduce information about the cache system such as its cache size, associativity, block size, etc.

B.1.1 Obtain the File

On tux-1, the file is included in the archive for homework 2 (see Part C). To test on another machine, you can also download the file from:

B.1.2 Compilation

On a Linux machine, such as tux-1.eee.hku.hk, compile the program as follows:

  gcc -o mbench -O0 mbench.c -lrt

On an OSX machine (you will need Xcode), you can compile the program as follows:

  gcc -o mbench -O0 mbench.c

In both cases, note that the switch in the middle of the line is minus, capital-oh, zero.

B.1.3 Execution

Execute the compiled program by issuing the command:

  > ./mbench

By default, the results will be printed to the screen. If you want to save the results to a file while seeing the output at the same time, you can pipe the output to the tee command as follows:

  > ./mbench | tee output.log

With the above command, the output will be printed to the screen while also being saved to the file output.log.

B.1.4 Output Comma Separated Values

By default, the output of mbench is in human-readable form. To make plotting the results easier, run the program with the -c switch as follows:

  > ./mbench -c

B.1.5 Plotting Results

To analyze the micro-benchmark, it is easiest to plot the results. You can plot the CSV file using gnuplot, Excel or Matlab. Your results should look similar to the ones below:

[Figure: (a) Full Range, (b) Small Arrays Only]

Your plots should have the stride size on the x-axis and the access time on the y-axis. Each series in the plot represents the results of one array size. Your x-axis should use a base-2 log scale, as the stride size increases exponentially in powers of 2.

Hint: it may be useful to produce separate plots for very large and very small array sizes so you can clearly observe the pattern in cache access time.

B.1.6 Submission

Submit your plot(s) for tux-1.eee.hku.hk. If you have access to other Linux or OSX machines, you may compile mbench there and submit results from different processors as well. The results are a lot clearer on simple/older processors than on modern multi-core processors. If you have access to an embedded system such as a Raspberry Pi, you will be able to obtain much better results.

B.1.7 Analysis

From your plots, there is a lot you can learn about the memory hierarchy. By analyzing the plots, try to deduce the following:
  How many levels of cache does your processor have?
  For each level of cache, what is its block size, capacity, and associativity?
  What is the hit time of each level of cache?
  What is the page fault time?

You may not be able to deduce all the information. In fact, modern processors have many advanced features that may obscure your analysis above. If you have access to a machine with an older processor, the effect will be easier to see. See the hints below.

B.1.8 Hints on analyzing the results

Consider a simple example to begin understanding the effect of stride size and array size on the number of cache hits/misses. Recall that all elements of your array are being accessed in strides, and the process is repeated many times. As a result, the number of cache hits/misses is not going to be dominated by compulsory misses.

Now, consider a cache with a capacity of 16 words, then ask yourself the following questions:
  If the array size is smaller than 16 words, how many hits/misses will you get?
  Consider an array with 32 words. Start with a direct-mapped cache with a block size of 1 word. How many hits/misses do you get when you increase the stride size from 1 word to 16 words?
  Now, if the cache is a 2-way set associative cache, how many hits/misses do you get when you increase the stride size from 1 word to 16 words?
  What if the cache is 4-way set associative? Can you observe a pattern in the change in the number of hits/misses when the associativity increases?
  Now repeat the above exercise with block sizes of 2 words and 4 words. When does a change in block size have an effect on the number of hits/misses you experience?

With the above observations, go back and analyze your plot results. Do you observe a similar change in hit/miss time? From these results, you can deduce certain features of the cache.

B.1.9 Submission

Submit your analysis of the results for at least 2 systems, including tux-1.eee.hku.hk. If you have tried the same exercise on additional processors, you may submit their analysis too. Compare your results with information you can find about the processor from online resources and note any differences.
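The thought experiment in B.1.8 can also be run as a small simulation. The sketch below counts misses for repeated strided passes over an array on a tiny cache, assuming LRU replacement and word-granularity addresses; the function name count_misses, its parameters, and the choice of 2 passes are illustrative and not part of mbench.

```python
def count_misses(array_words, stride, capacity_words=16, block_words=1,
                 ways=1, passes=2):
    """Count misses of strided passes over an array on a small LRU cache."""
    num_sets = capacity_words // (block_words * ways)
    # Each set is a list of resident block numbers, ordered LRU -> MRU
    sets = [[] for _ in range(num_sets)]
    misses = 0
    for _ in range(passes):
        for i in range(0, array_words, stride):
            block = i // block_words
            lines = sets[block % num_sets]
            if block in lines:
                lines.remove(block)      # hit: re-appended as MRU below
            else:
                misses += 1
                if len(lines) == ways:
                    lines.pop(0)         # evict the least recently used block
            lines.append(block)
    return misses


# 32-word array on a 16-word cache, for three associativities
for ways in (1, 2, 4):
    print(ways, [count_misses(32, s, ways=ways) for s in (1, 2, 4, 8, 16)])
```

Changing block_words to 2 or 4 covers the last question in the list; increasing passes makes the steady-state behaviour dominate the compulsory misses of the first pass.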

Part C: Open-ended Project

C.1 Adapting Cache Characteristics

In this exercise, you will investigate and try to improve the cache performance of a set of benchmarks. To evaluate the benchmark performance, you will be using a RISC-V ISA simulator called spike. The spike simulator simulates the behavior of the RISC-V ISA with limited hardware implementation details. It features a built-in cache simulator that captures every memory access generated by the processor and collects statistics accordingly.

C.1.1 The ISA simulator is already installed on tux-1.eee.hku.hk. If you prefer to run the simulator on your own machine, you need to get the latest RISC-V toolchain source code from:

To obtain the files for homework 2, perform the following on tux-1.eee.hku.hk:

  tux-1$ cd ~
  tux-1$ tar xzvf ~elec3441/elec3441hw2.tar.gz
  tux-1$ cd elec3441hw2
  tux-1$ export HW2ROOT=$PWD

In the extracted archive you will find different benchmark programs, each located in its own directory.

C.1.2 Compiling Benchmark Programs

The ISA simulator can execute any valid RISC-V program. If you examine the source code in each subdirectory, you will notice that the benchmarks are no different from any other normal program. Feel free to write your own benchmark if you are curious.

You must set up your environment correctly to make use of the RISC-V toolchain. On tux-1, you can use the following command:

  tux-1$ . ~elec3441/elec3441hw2.bashrc

To compile the provided benchmarks, you may either run make in each directory, or make use of the top-level makefile that is provided for you:

  tux-1$ cd ${HW2ROOT}/benchmarks
  tux-1$ make

C.1.3 Running the Simulator

You may now execute your compiled program using the spike ISA simulator. Execute the target binary with an L1 instruction cache as follows:

  tux-1$ cd ${HW2ROOT}/benchmarks/kmean
  tux-1$ spike --ic=128:2:64 pk kmean
  Veirification passed!
  I$ Bytes Read:
  I$ Bytes Written: 0
  I$ Read Accesses:
  I$ Write Accesses: 0
  I$ Read Misses: 489
  I$ Write Misses: 0
  I$ Writebacks: 0
  I$ Miss Rate: 0.001%
  tux-1$

In the above command, the last argument kmean specifies the RISC-V binary that you are simulating using spike. The argument pk before that stands for proxy kernel, and it tells spike to use the native Linux kernel to handle any system calls. Finally, the argument --ic=128:2:64 tells spike to simulate an instruction cache that has:
  128 entries
  2-way set associativity
  64-byte blocks

If you multiply the three parameters, you get the capacity of the instruction cache as 128 × 2 × 64 B = 16 KiB.

You can also specify the use of a data cache with the argument --dc=<s>:<w>:<b> and a unified L2 cache with the argument --l2=<s>:<w>:<b>. For example:

  tux-1$ spike --ic=128:2:64 pk...
  tux-1$ spike --ic=128:2:64 --dc=128:2:64 pk...
  tux-1$ spike --ic=128:2:64 --dc=128:2:64 --l2=1024:4:64 pk...

C.1.4 Cache Evaluation

Now, run all the benchmark programs with the following 2 memory hierarchies, one without and the other with an L2 cache:

  --ic=128:2:64 --dc=128:2:64
  --ic=128:2:64 --dc=128:2:64 --l2=1024:4:64

Collect statistics about the memory hierarchies from the output and answer the following questions.

1. For each benchmark, what is the miss rate for L1 I$, L1 D$ and L2$? Which benchmark has the best and which one has the worst cache performance? Hint: Consider automating the process, as you will probably need to regenerate a lot of similar statistics in the rest of this homework.

2. What is the cache access time for the 3 caches according to Table C.1?

3. Based on your answer above, what is the cycle time of the pipeline? Assume that on a cache hit, data should be returned in 1 cycle. Also assume that the critical path of all non-memory pipeline stages is 600 ps. In other words, your cycle time is limited by the cache if the cache access time is longer than 600 ps. Otherwise, cache access is not the bottleneck.

4. Calculate the average CPI for the benchmarks without the L2 cache. You can use the following formula to calculate the average CPI, where MP = Miss Penalty and CT = Cycle Time. Assume the backside of the L1 caches is connected to a single DRAM with 100 ns access time. Use CPI_base = 1.2.

\[ \mathrm{CPI} = \mathrm{CPI}_{base} + \frac{\#\text{L1\_I\$ misses} + \#\text{L1\_D\$ misses}}{\#\text{instructions}} \times \frac{MP}{CT} \]

Note that the number of instructions is the I$ Read Accesses number in the output of spike.

5. What is the AMAT (in ns) of the L2$ for all benchmarks? Use the following formula to calculate AMAT (HT = Hit Time, MR = Miss Rate, MP = Miss Penalty). Assume the backside of the L2$ is connected to a DRAM with 100 ns access time. Note that the L2 AMAT is the time calculated after the L1 access. So HT is the time to access the L2 cache after L1, MR is the local L2 miss rate, etc.

\[ \mathrm{AMAT}_{L2} = HT_{L2} + MR_{L2} \times MP_{L2} \]

6. Calculate the average CPI for the benchmarks with the L2 cache. Use the following formula to calculate CPI, assuming the L2$ is running asynchronously in its own clock domain. Use the same base CPI as above.

\[ \mathrm{CPI} = \mathrm{CPI}_{base} + \frac{\#\text{L1\_I\$ misses} + \#\text{L1\_D\$ misses}}{\#\text{instructions}} \times \frac{\mathrm{AMAT}_{L2}}{CT} \]

7. Based on your answers above, does the L2$ help with performance?

C.1.5 Optimal L1 Data Cache Configuration

Find the optimal L1 D$ configuration that maximizes the performance of the benchmarks. You can assume the L1 I$ is perfect and does not affect processor performance. There is no L2$. Pick one configuration from the following design space:
  Total capacity: up to 1 MiB
  Associativity: 1, 2, 4, 8, 16, 32, 64
  Cache line size: 32 B, 64 B

Note the following:
  Your cache organization may affect your processor cycle time. Your architecture requires that the L1 cache returns data in 1 cycle.
  Use the geometric mean of the normalized performance of the benchmark programs as the overall performance metric.

Hint: You should write a program or use a spreadsheet program such as Excel to help you find the optimal L1 configuration. Remember, you need to compute the geometric mean of

C.1.6 Optional: Improved Benchmarks

Now, given your optimal L1 cache obtained from above, try to implement a version of the slowest benchmark program that is faster than the provided implementation. You may make any modifications to the data structures or code, as long as the same calculation is still performed at runtime, i.e., you cannot simply hard-code the answer at compile time. Evaluate the improvements you make using the geometric mean of the speedups that you achieve over the baseline that you determined in the previous question.

C.1.7 Submission

Submit your answers to C.1.4, C.1.5, and the optional part in C.1.6. Make your answer concise, but make sure you have data to support your analysis.
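The calculations in C.1.4 and C.1.5 lend themselves to a small script. The following is a minimal sketch of three helpers: the capacity implied by a spike-style cache argument, the CPI formula from question 4 (question 6 uses the same form with the L2 AMAT in place of the miss penalty), and the geometric mean used as the overall metric. The function names, the miss counts, and the 1 ns cycle time are hypothetical and are not real spike output.

```python
import math


def cache_capacity_bytes(config):
    """config is a spike-style '<s>:<w>:<b>' string, e.g. '128:2:64'."""
    sets, ways, block_bytes = (int(field) for field in config.split(":"))
    return sets * ways * block_bytes


def cpi_with_misses(cpi_base, l1i_misses, l1d_misses, instructions,
                    penalty_ns, cycle_time_ns):
    """CPI = CPI_base + (L1 misses / instructions) * (penalty / cycle time).

    penalty_ns is the DRAM access time without an L2 cache, or the L2 AMAT
    when an L2 cache is present.
    """
    misses_per_instruction = (l1i_misses + l1d_misses) / instructions
    return cpi_base + misses_per_instruction * (penalty_ns / cycle_time_ns)


def geometric_mean(values):
    """Geometric mean of positive values, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


print(cache_capacity_bytes("128:2:64"))  # 16384 bytes, i.e. 16 KiB
# Hypothetical counts: 1000 I$ misses, 4000 D$ misses, 1,000,000 instructions
print(cpi_with_misses(1.2, 1000, 4000, 1_000_000, 100.0, 1.0))
print(geometric_mean([1.0, 4.0]))
```

Looping these helpers over every candidate configuration and benchmark is one way to search the design space of C.1.5.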

Table C.1: Cache access time (in ns) for various cache configurations in 45 nm technology. Data obtained from CACTI.

(a) cache line size = 32 B
assoc \ size  8KB  16KB  32KB  64KB  128KB  256KB  512KB  1MB
N/A N/A N/A N/A N/A N/A

(b) cache line size = 64 B
assoc \ size  8KB  16KB  32KB  64KB  128KB  256KB  512KB  1MB
N/A N/A N/A N/A N/A N/A N/A N/A N/A N/A


More information

Computer Architecture EE 4720 Final Examination

Computer Architecture EE 4720 Final Examination Name Computer Architecture EE 4720 Final Examination Primary: 6 December 1999, Alternate: 7 December 1999, 10:00 12:00 CST 15:00 17:00 CST Alias Problem 1 Problem 2 Problem 3 Problem 4 Exam Total (25 pts)

More information

Exam-2 Scope. 3. Shared memory architecture, distributed memory architecture, SMP, Distributed Shared Memory and Directory based coherence

Exam-2 Scope. 3. Shared memory architecture, distributed memory architecture, SMP, Distributed Shared Memory and Directory based coherence Exam-2 Scope 1. Memory Hierarchy Design (Cache, Virtual memory) Chapter-2 slides memory-basics.ppt Optimizations of Cache Performance Memory technology and optimizations Virtual memory 2. SIMD, MIMD, Vector,

More information

ECE/CS 552: Introduction to Computer Architecture ASSIGNMENT #1 Due Date: At the beginning of lecture, September 22 nd, 2010

ECE/CS 552: Introduction to Computer Architecture ASSIGNMENT #1 Due Date: At the beginning of lecture, September 22 nd, 2010 ECE/CS 552: Introduction to Computer Architecture ASSIGNMENT #1 Due Date: At the beginning of lecture, September 22 nd, 2010 This homework is to be done individually. Total 9 Questions, 100 points 1. (8

More information

CS 351 Exam 2 Mon. 11/2/2015

CS 351 Exam 2 Mon. 11/2/2015 CS 351 Exam 2 Mon. 11/2/2015 Name: Rules and Hints The MIPS cheat sheet and datapath diagram are attached at the end of this exam for your reference. You may use one handwritten 8.5 11 cheat sheet (front

More information

Complex Pipelines and Branch Prediction

Complex Pipelines and Branch Prediction Complex Pipelines and Branch Prediction Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. L22-1 Processor Performance Time Program Instructions Program Cycles Instruction CPI Time Cycle
