Homework 3 (r1.1) Due: Part (A) -- Apr 29, 2016, 11:55pm Part (B) -- Apr 29, 2016, 11:55pm Part (C) -- Apr 29, 2016, 11:55pm


Second Semester, 2015-16

Instruction: Submit your answers electronically through Moodle. There are 3 major parts in this homework.

Part A includes questions that aim to help you understand the lecture materials. They resemble the kind of questions you will encounter in quizzes and the final exam. Your answers to this part will be graded on effort.

Part B consists of hands-on exercises that require you to design and evaluate processor systems using various software and hardware tools, including Chisel and the RISC-V compilation tool chain. They are designed to help you understand real-world processor design and the use of various tools along the way. This part of the homework will be graded on correctness.

Part C contains open-ended mini-project ideas. They are open-ended by nature, meaning there are no right or wrong answers. You must choose to attempt one of the several available topics. You may work individually or in groups of up to 3 for this part. If you work in a group, each member must submit an independent report on the project.

The following table summarizes the 3 parts:

    Part  Type               Indv/Grp                     Grading
    A     Basic problem set  Individual                   Graded on effort
    B     Hands-on           Individual or group of 2-3   Graded on correctness
    C     Mini-project       Individual or group of 2-3   Graded on effort

In all cases, you are encouraged to discuss the homework problems offline or online using Piazza. However, you should not ask for or give out solutions directly, as that defeats the purpose of having homework exercises. Giving out answers or copying answers directly will likely constitute an act of plagiarism.

Part A: Problem Set

A.1 Cache Access

Consider the following sequence of memory accesses to the main memory in a 16-bit processor:

    Address (hex)  Type
    A000           R
    B000           R
    A380           R
    A004           W
    580C           W
    A108           R
    5800           R
    A10C           W
    A39C           W
    A3AC           R
    A1AC           R
    A006           R
    5804           R

A.1.1 Assume the following data cache organization:

- Capacity: 4 KiB
- Line size: 8 words
- Organization: direct-mapped
- Policy: write back, write allocate

Trace through the above memory accesses and answer the following:

(i) For each access, is it a hit or a miss?
(ii) Show the final content of the cache, including the tags.

For the sake of simplicity, assume the content of a memory address is the same as its address, i.e., mem[x] = x.

A.1.2 Repeat A.1.1 but with a different cache:

- Capacity: 4 KiB
- Line size: 8 words
- Organization: 2-way set associative
- Policy: LRU, write back, write allocate

A.1.3 Repeat A.1.1 but with a different cache:

- Capacity: 4 KiB
- Line size: 8 words
- Organization: direct-mapped

r1.1 Page 2 of 17

- Policy: write through, no write allocate

A.1.4 Repeat A.1.1 but with a different cache:

- Capacity: 4 KiB
- Line size: 8 words
- Organization: 2-way set associative
- Policy: LRU, write through, no write allocate

A.2 Cache Performance

You are evaluating the performance of the cache subsystem of a processor. The initial design of the processor has the following cache:

- Separate instruction and data caches
- Cache hit time is 1 cycle
- Cache miss penalty is 300 cycles

A.2.1 Focusing on the instruction cache in this part: the instruction cache has a miss rate of 5 %. What is the average memory access time (AMAT) of the instruction cache?

A.2.2 The data cache has a miss rate of 20 %. What is the AMAT of the data cache?

A.2.3 After profiling a program B, you find the following percentages of dynamic instructions:

    ALU    Jump/Branch    Load/Store
    50 %   15 %           35 %

Assume the CPI of all branch/jump instructions is 2 and the CPI of ALU instructions is 1. Also assume for this part that the instruction cache is perfect, i.e., its miss rate is 0 %. What is the overall CPI of program B?

A.2.4 Based on the above D-cache calculation, which of the following changes will improve the CPI of program B the most? Explain your answer.

(i) Change the D-cache into a fully associative cache
(ii) Increase the clock speed of the processor
(iii) Rewrite B to increase data reuse in the cache

A.2.5 Now, consider the realistic instruction cache with the 5 % miss rate mentioned in A.2.1. Assume that when the I-cache misses, the entire processor pipeline is stalled while the memory subsystem fetches data from main memory. Since there is only 1 external DRAM, if both the I-cache and the D-cache miss in the same cycle, the 2 memory accesses have to take place sequentially. With these assumptions, what is the average CPI of program B, taking into account the possibility of both I-cache and D-cache misses?

A.2.6 As an attempt to improve the CPI of the program, you are considering the addition of a new dual-port DRAM that allows 2 concurrent memory accesses.
With the new dual-port memory, if both the I-cache and the D-cache miss in the same cycle, both of them may fetch from this memory at the same time. With the dual-port DRAM, however, the miss penalty increases to 310 cycles. Assume 2 % of the I-cache misses overlap with D-cache misses. What is the overall CPI of program B? Is the overall performance improved?
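The AMAT and CPI questions above all reduce to two formulas: AMAT = hit time + miss rate x miss penalty, and overall CPI = base CPI + average memory stall cycles per instruction. A minimal sketch of both, using hypothetical numbers rather than the ones asked for in the questions:

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time in cycles."""
    return hit_time + miss_rate * miss_penalty

def overall_cpi(base_cpi, mem_ref_frac, miss_rate, miss_penalty):
    """Base CPI plus the average stall cycles contributed per instruction
    by memory references that miss in the cache."""
    return base_cpi + mem_ref_frac * miss_rate * miss_penalty

# Hypothetical example: 1-cycle hit, 10 % miss rate, 100-cycle penalty.
print(amat(1, 0.10, 100))                  # 11.0
# Hypothetical example: base CPI 1.5, half the instructions access memory.
print(overall_cpi(1.5, 0.5, 0.10, 100))    # 6.5
```

The same two helpers, with the parameters from the questions substituted in, cover A.2.1 through A.2.6; the dual-port variant of A.2.6 only changes how overlapping misses contribute to the stall term.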

A.3 Instruction Cache

Adapted from the 2015 final exam. You are designing the instruction cache for a new 32-bit processor. Because of other hardware constraints, your cache design must meet the following criteria:

- Virtual address space is 32-bit wide.
- Word size is 32 bits.
- The cache must be indexed and tagged with virtual addresses.
- Line size must be 4 words.
- The number of bits used to index the cache must be 8.
- Page size is 8 KiB.
- Translation from virtual to physical addresses is controlled by the OS, and the page mapping is pseudo-random.
- Process IDs are 4-bit wide.
- All compiled programs start running from address 0x
- The OS performs a context switch every 256 instructions.

A.3.1 In order to differentiate data from different processes, you have decided to concatenate the process ID to the virtual address tag in the cache. Assuming you are using a direct-mapped cache, what is the minimum width of the combined tag, in bits?

A.3.2 Assuming you are using a direct-mapped cache with the maximum possible size given the above constraints, what is the total number of bits required for tag storage in the cache?
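Questions like A.3.1 and A.3.2 come down to splitting the address into tag, index, and offset fields. A small helper for that split, shown with hypothetical parameters rather than the ones specified above (line size is assumed to be a power of two):

```python
def cache_fields(addr_bits, index_bits, line_words, word_bytes=4):
    """Return (tag_bits, index_bits, offset_bits) for a virtually
    indexed and tagged cache. The offset covers byte-in-line addressing."""
    offset_bits = (line_words * word_bytes).bit_length() - 1  # log2 of line bytes
    tag_bits = addr_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits

# Hypothetical example: 16-bit addresses, 4 index bits, 2-word lines.
print(cache_fields(16, 4, 2))  # (9, 4, 3)
```

Concatenating a process ID, as in A.3.1, simply widens the stored tag by the ID width; the index and offset fields are unaffected.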

A.3.3 Your machine starts with an empty instruction cache. The following program is run as Process 1 (with process ID = 1):

    0x          _start: addi a1, a1, 1      # instruction 0
    0x                  addi a3, zero,
    0x000401FC          bne a1, a3, _start  # instruction 127: NOT taken on 4th time

Process 1 begins execution for 256 instructions, then it is stopped. There is no branch instruction among the first 128 instructions except for the final bne. The bne instruction takes the branch the first 3 times it is encountered; the branch condition is false the 4th time it is executed. After Process 1 has stopped (after it has executed 256 instructions), how many hits and misses have occurred in the instruction cache? Briefly explain your answer.

A.3.4 Following up from the previous part: after Process 1 is stopped, the OS switches in Process 2. Process 2 is the exact same program as Process 1, except that it is started by a different user. Therefore, it executes the exact same code as in the above part, but with process ID = 2. Process 2 is again stopped after 256 instructions. How many hits and misses will have occurred in the instruction cache due to running Process 2? Briefly explain your answer.

A.3.5 Since there are only 2 processes running, the switching between Process 1 and Process 2 continues:

- Process 1 is switched in again after Process 2. It continues its execution for another 256 instructions. By now, Process 1 has completed 512 instructions since it started.
- Process 2 is switched in in place of Process 1 and runs for 256 instructions.

When the two processes run on the processor for the second time, is the instruction hit rate changed when compared to your answers in A.3.3 and A.3.4? If the hit rate has changed, explain how it is different. If the hit rate remains the same, explain why that is the case.

A.3.6 As an attempt to improve the performance of the instruction cache, you are given the option to change the instruction cache organization to 2-way set associative while keeping the cache capacity constant. However, as the tag capacity is limited, you can no longer store the process IDs in the cache. As a result, you need to flush the cache on every context switch. Considering the 2-process scenario above, how would the change to a 2-way set associative cache with cache flushing affect the overall instruction hit rate after both Process 1 and Process 2 have finished running 512 instructions? That is: the processor has just finished executing Process 1 (256

instructions), Process 2 (256 instructions), Process 1 (256 instructions), Process 2 (256 instructions).

A.3.7 Your teammate suggests that you should keep the process ID in the cache tag to avoid flushing the cache during context switches. However, since you are limited in hardware resources, your teammate suggests that the capacity of the cache should be reduced by half as a tradeoff. Briefly discuss whether this scheme, using a 2-way set associative cache with reduced cache capacity and no flushing, may improve performance over the original direct-mapped cache.

A.4 Streaming Cache Performance

You are investigating the cache performance of your processor regarding the following code segment:

    // int i, n, a;
    // int y[], x[];
    for (i = 0; i < n; i++) {
        y[i] = a*x[i] + y[i];
    }
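Hit/miss traces like those asked for in A.1, and for the loop above, can be cross-checked mechanically with a small cache model. The sketch below implements a direct-mapped, write-back, write-allocate cache with hypothetical parameters (a 64-byte cache with 8-byte lines, not any of the configurations in the questions):

```python
class DirectMappedCache:
    """Tiny direct-mapped, write-back, write-allocate cache model that
    classifies each access as a read/write hit or miss."""

    def __init__(self, capacity_bytes, line_bytes):
        self.line_bytes = line_bytes
        self.num_lines = capacity_bytes // line_bytes
        self.tags = [None] * self.num_lines  # one tag per line; None = invalid

    def access(self, addr, is_write):
        line_no = addr // self.line_bytes
        index = line_no % self.num_lines
        tag = line_no // self.num_lines
        hit = self.tags[index] == tag
        if not hit:                  # write allocate: fill the line on any miss
            self.tags[index] = tag
        return ("W" if is_write else "R") + ("H" if hit else "M")

# Hypothetical 64-byte cache with 8-byte lines, tracing a few accesses.
cache = DirectMappedCache(64, 8)
trace = [(0x00, False), (0x04, False), (0x40, True), (0x00, False)]
print([cache.access(a, w) for a, w in trace])  # ['RM', 'RH', 'WM', 'RM']
```

Extending the model with a dirty bit (to count write-backs) or a second way with LRU replacement follows the same pattern, which is enough to check the A.1.2 through A.1.4 variants as well.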

A.4.1 Cache Hit or Miss

Assume that a, i, and n are stored in registers. Array x is stored in memory starting at address 0xA while array y is stored at memory address 0xB The processor data cache is initially empty. Below are the details of the data cache:

- Capacity: 1 MiB
- Organization: 2-way set associative
- Line size: 4 words
- Replacement policy: true LRU
- Write policy: write through, no write allocate

A write buffer is available, such that the processor can resume running immediately after writing data to the buffer.

Let n = 2^20. Trace through the above code, then show and explain the sequence of cache hits and misses that will occur. Use the notation RH for read hit, WH for write hit, RM for read miss, and WM for write miss.

A.4.2 Based on your result above, what is the overall data cache miss rate of the above code when executed on the main CPU? Consider BOTH read and write accesses. Assume the write and read miss penalties are both 300 cycles, and the hit time is 1 cycle. What is the average memory access time (AMAT) for this code?

A.4.3 Now the above C code is compiled into the following RISC-V instructions:

    # a0 = 2^20
    # a1 is the base address of x[]
    # a2 is the base address of y[]
    # the constant a is stored in register a3
    00: loop: addi a0, a0, -1
    04:       lw   t1, 0(a1)
    08:       lw   t2, 0(a2)
    0C:       mult t0, t1, a3
    10:       add  t0, t0, t2
    14:       sw   t0, 0(a2)
    18:       addi a1, a1, 4
    1C:       addi a2, a2, 4
    20:       bne  a0, zero, loop

Let n = 2^20. Assume add, addi and bne take 1 cycle; mult takes 4 cycles; and the performance of lw and sw depends on the cache performance. What is the total run time of the above code in cycles? You may leave the variable n in your answer.

A.4.4 Write Back Cache

If the write policy of the cache is changed to write back, how would the following be affected?

(i) Read miss penalty
(ii) Overall performance of the above code

Explain your answer by tracing through the execution of the above code, highlighting any differences in the required cache content handling with the use of a write back cache.
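A run-time estimate like the one asked for in A.4.3 is usually organized as (cycles per iteration) x (number of iterations), with the lw/sw cycles expressed as hit time plus a miss-rate-weighted penalty. A sketch of that structure, with hypothetical cycle counts and miss rates (deliberately not the values from the question):

```python
def loop_runtime(n, alu_ops=5, alu_cpi=1, mult_cpi=4,
                 mem_ops=3, hit_time=1, miss_rate=0.25, miss_penalty=100):
    """Total cycles for n iterations of a loop containing alu_ops
    single-cycle-class instructions, one multiply, and mem_ops
    loads/stores whose average cost is hit_time + miss_rate * miss_penalty."""
    mem_time = hit_time + miss_rate * miss_penalty
    per_iter = alu_ops * alu_cpi + mult_cpi + mem_ops * mem_time
    return n * per_iter

# Hypothetical example: 1000 iterations.
print(loop_runtime(1000))  # 87000.0
```

The defaults loosely mirror the shape of the loop above (3 addi + add + bne, one mult, two lw and one sw); substituting the miss behaviour derived in A.4.1 and A.4.2 gives the answer the question asks for.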

A.5 Page Table & TLB

In this exercise you will experiment with the interaction between the TLB and the page table in a VM system. Assume the following system configuration:

- 32-bit architecture
- 8 KiB page size
- 4-entry, direct-mapped TLB

Initially, the TLB and the page table contain the following entries:

    TLB:        Valid  VPN (dec)  PPN (dec)
    Page table: Index  Valid      PPN or Disk
                              disk
                              disk
                              disk

A.5.1 The following sequence of memory accesses is issued:

    0x0000F234, 0x0000A008, 0x , 0x0000F098, 0x0000A09C, 0x0000F00C,
    0x , 0x C, 0x , 0x

Assume an invalid page table entry indicates that the page has not been allocated to a user process, and that all entries in the page table not shown above are invalid.

For each memory access, indicate whether it generates a TLB hit or miss, and whether it generates a page fault. Also, show the final state of the TLB and the page table after the above accesses.

A.5.2 What are some of the advantages and disadvantages of a larger page size?

A.5.3 As TLB access is on the processor's critical path, you are experimenting with whether a direct-mapped TLB could improve performance. Assume the capacity of the TLB remains unchanged, and consider the impact on TLB performance for the instructions of a program P. The instructions of P begin at memory location 0x , and during run time it allocates and accesses only 1 data page starting at address 0xD In each of the following scenarios, explain how well the TLB would perform with a direct-mapped organization when compared to the original fully associative organization:

(i) Only 1 copy of P is running in the system.
(ii) 2 users are both running P at the same time. The OS flushes the TLB when swapping between processes.
(iii) 2 users are both running P at the same time. The OS does not flush the TLB. Instead, the process ID is prepended to the virtual page number, i.e., the VPN tag becomes [process id VPN].
(iv) 2 users are both running P at the same time. The OS does not flush the TLB. Instead, the process ID is appended to the virtual page number, i.e., the VPN tag becomes [VPN process id].
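With the 8 KiB pages specified above, the low 13 bits of a virtual address are the page offset and the remaining bits are the VPN; a direct-mapped TLB then selects an entry with the low bits of the VPN. A minimal sketch of the address arithmetic (the page size and the 4-entry TLB are from the question; the traced address is one of the examples in the access list):

```python
PAGE_OFFSET_BITS = 13   # 8 KiB pages: 2^13 bytes per page
TLB_ENTRIES = 4         # 4-entry, direct-mapped TLB

def vpn_and_offset(addr):
    """Split a virtual address into (VPN, page offset)."""
    return addr >> PAGE_OFFSET_BITS, addr & ((1 << PAGE_OFFSET_BITS) - 1)

def tlb_slot(vpn):
    """Index of the direct-mapped TLB entry this VPN maps to."""
    return vpn % TLB_ENTRIES

print(vpn_and_offset(0x0000F234))  # (7, 4660); the offset 4660 is 0x1234
print(tlb_slot(7))                 # 3
```

For the A.5.3 scenarios, note that prepending versus appending the process ID changes which bits of the combined tag land in the slot-index position, which is exactly what makes cases (iii) and (iv) behave differently in a direct-mapped TLB.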

Part B: Hands-on Exercise

B.1 Micro-Benchmarking Cache Performance

In this exercise, you will perform micro-benchmarking of the cache system of real machines. The core of the benchmarking program is the following loop:

    for (stride = STRIDE_MIN; stride <= STRIDE_MAX; stride = stride << 1) {
        for (i = 0; i < asize; i += stride) {
            array[i] = array[i];
        }
    }

By carefully examining the time it takes to access elements of an array with different strides, it is possible to deduce information about the cache system, such as its cache size, associativity, block size, etc.

B.1.1 Obtain the File

Download the file from:

B.1.2 Compilation

On a Linux machine, such as tux-1.eee.hku.hk, compile the program as follows:

    gcc -o mbench -O0 mbench.c -lrt

If you want to, on an OS X machine, you can compile the program as follows:

    gcc -o mbench -O0 mbench.c

In both cases, note that the switch in the middle of the line is minus capital-oh zero.

B.1.3 Execute the compiled program by issuing the command:

    > ./mbench

By default, the results will be printed to the screen. If you want to save the results to a file while seeing the output at the same time, you can pipe the output to the tee command as follows:

    > ./mbench | tee output.log

With the above command, the output will be printed to the screen while also being saved to the file output.log.

B.1.4 Output Comma Separated Values

By default, the output of mbench is in human-readable form. To make plotting the results easier, run the program with the -c switch as follows:

    > ./mbench -c

B.1.5 Plotting Results

To analyze the micro-benchmark, it is easiest to plot the results. You can plot the CSV file using gnuplot, Excel or Matlab. Your results should look similar to the ones below:

    [Figure: (a) Full Range  (b) Small Arrays Only]

Your plots should have the stride size on the x-axis and the access time on the y-axis. Each series of the plot represents the results for one array size. Your x-axis should have a base-2 log scale, as your stride size increases exponentially in powers of 2. Hint: it may be useful to produce separate plots for very large and very small array sizes so you can clearly observe the pattern in cache access time.

B.1.6 Submission

Submit your plot(s) for tux-1.eee.hku.hk. If you have access to other Linux or OS X machines, you may compile mbench there and submit results from different processors as well.

B.1.7 Analysis

From your plots, there is a lot you can learn about the memory hierarchy. By analyzing the plots, try to deduce the following:

- How many levels of cache does your processor have?
- For each level of cache, what is its block size, capacity, and associativity?
- What is the hit time of each level of cache?
- What is the page fault time?

You may not be able to deduce all the information. In fact, modern processors have many advanced features that may obscure your analysis. If you have access to a machine with an older processor, the effects will be easier to see. See the hints below.
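The measurement loop at the heart of mbench can also be prototyped in an interpreted language before digging into the C version. A rough sketch (the array sizes and strides are illustrative, and Python timings are far noisier than the compiled benchmark, so treat this only as a way to understand the loop's structure):

```python
import time

def time_strided_access(asize, stride, repeats=10):
    """Average seconds per pass over an array, touching every stride-th
    element with a read followed by a write, as in the C loop."""
    array = list(range(asize))
    start = time.perf_counter()
    for _ in range(repeats):
        for i in range(0, asize, stride):
            array[i] = array[i]
    return (time.perf_counter() - start) / repeats

# Illustrative sweep: doubling strides over a small array.
for stride in (1, 2, 4, 8):
    t = time_strided_access(1 << 14, stride)
    print(f"stride={stride:2d}  time={t:.6f}s")
```

The real benchmark normalizes by the number of accesses per pass; without that normalization, larger strides trivially take less time simply because fewer elements are touched.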

B.1.8 Hints on Analyzing the Results

Consider a simple example to begin understanding the effect of stride size and array size on the number of cache hits/misses. Recall that all elements of your array are accessed in strides, and the process is repeated many times. As a result, the cache misses you observe are not compulsory misses. Now, consider a cache with a 16-word capacity, and ask yourself the following questions:

- If the array size is smaller than 16 words, how many hits/misses will you get?
- Consider an array of 32 words. Starting with a direct-mapped cache with a 1-word block size, how many hits/misses do you get as you increase the stride size from 1 word to 16 words?
- Now, if the cache is 2-way set associative, how many hits/misses do you get as you increase the stride size from 1 word to 16 words? What if the cache is 4-way set associative? Can you observe a pattern in the change in the number of hits/misses as the associativity increases?
- Now repeat the above exercise with block sizes of 2 words and 4 words. When does a change in block size have an effect on the number of hits/misses you experience?

With the above observations, go back and analyze your plot results. Do you observe similar changes in hit/miss time? From these results, you can deduce certain features of the cache.

B.1.9 Submission

Submit your analysis of the results for at least tux-1.eee.hku.hk. If you have tried the same exercise on different processors, you may submit additional analyses. Feel free to compare your results with information you can find about the processor from online resources.
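The thought experiment in the hints above can be played out with a tiny set-associative model. The sketch below counts misses for the hint's hypothetical 16-word cache while sweeping the stride; it uses word-granularity addresses and LRU replacement, matching the setup of the questions:

```python
from collections import OrderedDict

def count_misses(asize, stride, capacity_words=16, ways=1,
                 block_words=1, passes=4):
    """Count misses for repeated strided passes over an array, using an
    LRU set-associative cache model with word addresses."""
    num_sets = capacity_words // (ways * block_words)
    sets = [OrderedDict() for _ in range(num_sets)]  # per-set LRU of block tags
    misses = 0
    for _ in range(passes):
        for addr in range(0, asize, stride):
            block = addr // block_words
            s, tag = block % num_sets, block // num_sets
            if tag in sets[s]:
                sets[s].move_to_end(tag)         # refresh LRU position
            else:
                misses += 1
                sets[s][tag] = True
                if len(sets[s]) > ways:
                    sets[s].popitem(last=False)  # evict least recently used
    return misses

# 32-word array on a 16-word direct-mapped cache: every access conflicts.
print(count_misses(32, 1, ways=1))   # 128  (4 passes x 32 accesses, all miss)
# Stride 16 touches only words 0 and 16, which map to the same set.
print(count_misses(32, 16, ways=1))  # 8
# 2-way associativity lets both conflicting words co-reside: 2 cold misses.
print(count_misses(32, 16, ways=2))  # 2
```

Sweeping stride, ways, and block_words with this model reproduces the patterns the hints ask you to look for in your measured plots.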

Part C: Open-ended Project

C.1 Breadth First Search

In homework 1, you experimented with different ways to implement the breadth-first search (BFS) algorithm to maximize performance on your single-cycle processor. In this homework, you will continue with the same idea, but with a focus on memory hierarchy performance. To evaluate your BFS performance, you will be using a RISC-V ISA simulator called spike. The spike simulator simulates the behavior of the RISC-V ISA with limited hardware implementation details. It features a built-in cache simulator that captures every memory access generated by the processor and collects statistics accordingly.

C.1.1 The ISA simulator is already installed on tux-1.eee.hku.hk. If you prefer to run the simulator on your own machine, you need to get the latest RISC-V toolchain source code from:

To obtain the files for homework 3, perform the following on tux-1.eee.hku.hk:

    tux-1$ cd ~
    tux-1$ tar xzvf ~elec3441/elec3441hw3.tar.gz
    tux-1$ cd hw3
    tux-1$ export HW3ROOT=$PWD

In the downloaded file you will find different benchmark programs, each located in its own directory.

C.1.2 Compiling Benchmark Programs

The ISA simulator may execute any valid RISC-V program. If you examine the source code in each subdirectory, you will notice that the benchmarks are no different from any other normal program. Feel free to write your own benchmark if you are curious.

You must set up your environment correctly to make use of the RISC-V toolchain. On tux-1, you can use the following command:

    tux-1$ . ~elec3441/elec3441.bashrc

To compile the provided benchmarks, you may either run a make command in each directory, or make use of the top-level makefile that is provided for you:

    tux-1$ cd ${HW3ROOT}/benchmarks
    tux-1$ make

C.1.3 Running the Simulator

You may now execute your compiled program using the spike ISA simulator. Execute the target binary with an L1 instruction cache as follows:

    tux-1$ cd ${HW3ROOT}/benchmarks/kmean
    tux-1$ spike --ic=128:2:64 pk kmean
    Verification passed!
    I$ Bytes Read:
    I$ Bytes Written:   0
    I$ Read Accesses:
    I$ Write Accesses:  0
    I$ Read Misses:     163
    I$ Write Misses:    0
    I$ Writebacks:      0
    I$ Miss Rate:       0.000%
    tux-1$

In the above command, the last argument, kmean, specifies the RISC-V binary that you are simulating using spike. The argument pk before it stands for proxy kernel, and it tells spike to use the native Linux kernel to handle any system calls. Finally, the argument --ic=128:2:64 tells spike to simulate an instruction cache that is:

- 128 entries
- 2-way set associative
- 64-byte blocks

If you multiply the three parameters together, you get the capacity of the instruction cache: 128 x 2 x 64 B = 16 KiB. You can also specify the use of a data cache with the argument --dc=<s>:<w>:<b> and a unified L2 cache with the argument --l2=<s>:<w>:<b>. For example:

    tux-1$ spike --ic=128:2:64 pk ...
    tux-1$ spike --ic=128:2:64 --dc=128:2:64 pk ...
    tux-1$ spike --ic=128:2:64 --dc=128:2:64 --l2=1024:4:64 pk ...

C.1.4 Warmup Exercise

Now, run all the benchmark programs with the following 2 memory hierarchies, one without and one with an L2 cache:

    --ic=128:2:64 --dc=128:2:64
    --ic=128:2:64 --dc=128:2:64 --l2=1024:4:64

You can collect statistics about the memory hierarchies from the output. Make sure you understand what the miss rates for the L1 I$, L1 D$ and L2$ are in each case. Which benchmark has the best, and which one has the worst, cache performance? Hint: consider automating the process, as you will probably need to regenerate a lot of similar statistics for the next part.
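Following the hint above, the sweep over benchmarks and cache configurations can be scripted. A sketch, assuming spike, pk, and the compiled benchmarks are reachable as in the transcript (the benchmark list and helper names here are illustrative, not part of the provided files):

```python
import subprocess

BENCHMARKS = ["kmean"]   # illustrative; list your benchmark binaries here
CONFIGS = [
    ["--ic=128:2:64", "--dc=128:2:64"],
    ["--ic=128:2:64", "--dc=128:2:64", "--l2=1024:4:64"],
]

def build_cmd(config, binary):
    """Assemble the spike command line for one benchmark/config pair."""
    return ["spike"] + config + ["pk", binary]

def run_all():
    """Run every benchmark under every cache configuration and show the
    tail of each run's output, where the cache statistics appear."""
    for binary in BENCHMARKS:
        for config in CONFIGS:
            result = subprocess.run(build_cmd(config, binary),
                                    capture_output=True, text=True)
            print(binary, config, "->", result.stdout.splitlines()[-3:])

print(build_cmd(CONFIGS[1], "kmean"))
# ['spike', '--ic=128:2:64', '--dc=128:2:64', '--l2=1024:4:64', 'pk', 'kmean']
```

Parsing the "Miss Rate" lines out of result.stdout into a CSV makes it easy to compare benchmarks side by side for C.1.5.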
C.1.5 Basic BFS Implementation Evaluation

Now, consider a processor with the following memory hierarchy:

                     L1 I$     L1 D$     L2$
    Capacity         64 KiB    64 KiB    4 MiB
    Associativity
    Line Size        64 B      64 B      128 B

Targeting this processor, evaluate the performance of the 2 versions of BFS that were experimented with in homework 1: one version utilizes a linked-list data structure while the other one utilizes

an adjacency list data structure. Evaluate the programs with the graph of input size Which version results in better cache performance?

C.1.6 Improved BFS

Now, given your understanding from the above, try to implement a faster version of BFS than the 2 provided samples. You may use either data structure, or implement your own version of BFS if you like. You may also tune the data/instruction caches to values that benefit your code the most.

C.1.7 Submission

Submit your implementation of BFS, together with a report on how you optimized the code for the targeted cache system.
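For reference, the textbook adjacency-list BFS that both provided samples elaborate on looks like the sketch below (in Python for brevity; the homework versions are compiled RISC-V programs). How the frontier queue and the adjacency arrays are laid out in memory is exactly what determines the cache behavior you are asked to measure and improve:

```python
from collections import deque

def bfs(adj, source):
    """Standard BFS over an adjacency-list graph; returns hop distances
    from source, with -1 for unreachable vertices."""
    dist = [-1] * len(adj)
    dist[source] = 0
    frontier = deque([source])
    while frontier:
        u = frontier.popleft()
        for v in adj[u]:          # sequential scan of u's neighbor array
            if dist[v] == -1:     # data-dependent access into dist[]
                dist[v] = dist[u] + 1
                frontier.append(v)
    return dist

# Small example graph with edges 0-1, 0-2, 1-3.
print(bfs([[1, 2], [0, 3], [0], [1]], 0))  # [0, 1, 1, 2]
```

The neighbor scans are streaming (cache-friendly) accesses, while the dist[] lookups are effectively random; locality-oriented BFS optimizations for C.1.6 typically target the latter.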


More information

CSCE 513 Computer Architecture, Fall 2018, Assignment #2, due 10/08/2018, 11:55PM

CSCE 513 Computer Architecture, Fall 2018, Assignment #2, due 10/08/2018, 11:55PM CSCE 513 Computer Architecture, Fall 2018, Assignment #2, due 10/08/2018, 11:55PM Covered topics: 1) pipeline, hazards, and instruction scheduling. 2) pipeline implementation. 3) Cache Organization and

More information

c. What are the machine cycle times (in nanoseconds) of the non-pipelined and the pipelined implementations?

c. What are the machine cycle times (in nanoseconds) of the non-pipelined and the pipelined implementations? Brown University School of Engineering ENGN 164 Design of Computing Systems Professor Sherief Reda Homework 07. 140 points. Due Date: Monday May 12th in B&H 349 1. [30 points] Consider the non-pipelined

More information

LECTURE 4: LARGE AND FAST: EXPLOITING MEMORY HIERARCHY

LECTURE 4: LARGE AND FAST: EXPLOITING MEMORY HIERARCHY LECTURE 4: LARGE AND FAST: EXPLOITING MEMORY HIERARCHY Abridged version of Patterson & Hennessy (2013):Ch.5 Principle of Locality Programs access a small proportion of their address space at any time Temporal

More information

Computer Architecture EE 4720 Final Examination

Computer Architecture EE 4720 Final Examination Name Computer Architecture EE 4720 Final Examination Primary: 6 December 1999, Alternate: 7 December 1999, 10:00 12:00 CST 15:00 17:00 CST Alias Problem 1 Problem 2 Problem 3 Problem 4 Exam Total (25 pts)

More information

CS 251, Winter 2019, Assignment % of course mark

CS 251, Winter 2019, Assignment % of course mark CS 251, Winter 2019, Assignment 5.1.1 3% of course mark Due Wednesday, March 27th, 5:30PM Lates accepted until 1:00pm March 28th with a 15% penalty 1. (10 points) The code sequence below executes on a

More information

CS 537: Introduction to Operating Systems Fall 2015: Midterm Exam #1

CS 537: Introduction to Operating Systems Fall 2015: Midterm Exam #1 CS 537: Introduction to Operating Systems Fall 2015: Midterm Exam #1 This exam is closed book, closed notes. All cell phones must be turned off. No calculators may be used. You have two hours to complete

More information

Final Exam Fall 2007

Final Exam Fall 2007 ICS 233 - Computer Architecture & Assembly Language Final Exam Fall 2007 Wednesday, January 23, 2007 7:30 am 10:00 am Computer Engineering Department College of Computer Sciences & Engineering King Fahd

More information

Cache Architectures Design of Digital Circuits 217 Srdjan Capkun Onur Mutlu http://www.syssec.ethz.ch/education/digitaltechnik_17 Adapted from Digital Design and Computer Architecture, David Money Harris

More information

Exam-2 Scope. 3. Shared memory architecture, distributed memory architecture, SMP, Distributed Shared Memory and Directory based coherence

Exam-2 Scope. 3. Shared memory architecture, distributed memory architecture, SMP, Distributed Shared Memory and Directory based coherence Exam-2 Scope 1. Memory Hierarchy Design (Cache, Virtual memory) Chapter-2 slides memory-basics.ppt Optimizations of Cache Performance Memory technology and optimizations Virtual memory 2. SIMD, MIMD, Vector,

More information

Virtual Memory. Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. November 15, MIT Fall 2018 L20-1

Virtual Memory. Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. November 15, MIT Fall 2018 L20-1 Virtual Memory Daniel Sanchez Computer Science & Artificial Intelligence Lab M.I.T. L20-1 Reminder: Operating Systems Goals of OS: Protection and privacy: Processes cannot access each other s data Abstraction:

More information

SE-292 High Performance Computing. Memory Hierarchy. R. Govindarajan

SE-292 High Performance Computing. Memory Hierarchy. R. Govindarajan SE-292 High Performance Computing Memory Hierarchy R. Govindarajan govind@serc Reality Check Question 1: Are real caches built to work on virtual addresses or physical addresses? Question 2: What about

More information

CS152 Computer Architecture and Engineering CS252 Graduate Computer Architecture. VLIW, Vector, and Multithreaded Machines

CS152 Computer Architecture and Engineering CS252 Graduate Computer Architecture. VLIW, Vector, and Multithreaded Machines CS152 Computer Architecture and Engineering CS252 Graduate Computer Architecture VLIW, Vector, and Multithreaded Machines Assigned 3/24/2019 Problem Set #4 Due 4/5/2019 http://inst.eecs.berkeley.edu/~cs152/sp19

More information

CSE 141 Computer Architecture Spring Lectures 17 Virtual Memory. Announcements Office Hour

CSE 141 Computer Architecture Spring Lectures 17 Virtual Memory. Announcements Office Hour CSE 4 Computer Architecture Spring 25 Lectures 7 Virtual Memory Pramod V. Argade May 25, 25 Announcements Office Hour Monday, June 6th: 6:3-8 PM, AP&M 528 Instead of regular Monday office hour 5-6 PM Reading

More information

ECE 463/521: Spring 2005 Project 1: Data-Cache System Design Due: Wednesday, February 23, 2005, 11:00 PM

ECE 463/521: Spring 2005 Project 1: Data-Cache System Design Due: Wednesday, February 23, 2005, 11:00 PM ECE 463/521: Spring 2005 Project 1: Data-Cache System Design Due: Wednesday, February 23, 2005, 11:00 PM Project rules 1. All students are encouraged to work in teams of two, using pair programming. Pair

More information

CS161 Design and Architecture of Computer Systems. Cache $$$$$

CS161 Design and Architecture of Computer Systems. Cache $$$$$ CS161 Design and Architecture of Computer Systems Cache $$$$$ Memory Systems! How can we supply the CPU with enough data to keep it busy?! We will focus on memory issues,! which are frequently bottlenecks

More information

CS152 Computer Architecture and Engineering March 13, 2008 Out of Order Execution and Branch Prediction Assigned March 13 Problem Set #4 Due March 25

CS152 Computer Architecture and Engineering March 13, 2008 Out of Order Execution and Branch Prediction Assigned March 13 Problem Set #4 Due March 25 CS152 Computer Architecture and Engineering March 13, 2008 Out of Order Execution and Branch Prediction Assigned March 13 Problem Set #4 Due March 25 http://inst.eecs.berkeley.edu/~cs152/sp08 The problem

More information

CS3350B Computer Architecture

CS3350B Computer Architecture CS335B Computer Architecture Winter 25 Lecture 32: Exploiting Memory Hierarchy: How? Marc Moreno Maza wwwcsduwoca/courses/cs335b [Adapted from lectures on Computer Organization and Design, Patterson &

More information

Sarah L. Harris and David Money Harris. Digital Design and Computer Architecture: ARM Edition Chapter 8 <1>

Sarah L. Harris and David Money Harris. Digital Design and Computer Architecture: ARM Edition Chapter 8 <1> Chapter 8 Digital Design and Computer Architecture: ARM Edition Sarah L. Harris and David Money Harris Digital Design and Computer Architecture: ARM Edition 215 Chapter 8 Chapter 8 :: Topics Introduction

More information

Module 5: "MIPS R10000: A Case Study" Lecture 9: "MIPS R10000: A Case Study" MIPS R A case study in modern microarchitecture.

Module 5: MIPS R10000: A Case Study Lecture 9: MIPS R10000: A Case Study MIPS R A case study in modern microarchitecture. Module 5: "MIPS R10000: A Case Study" Lecture 9: "MIPS R10000: A Case Study" MIPS R10000 A case study in modern microarchitecture Overview Stage 1: Fetch Stage 2: Decode/Rename Branch prediction Branch

More information

LRU. Pseudo LRU A B C D E F G H A B C D E F G H H H C. Copyright 2012, Elsevier Inc. All rights reserved.

LRU. Pseudo LRU A B C D E F G H A B C D E F G H H H C. Copyright 2012, Elsevier Inc. All rights reserved. LRU A list to keep track of the order of access to every block in the set. The least recently used block is replaced (if needed). How many bits we need for that? 27 Pseudo LRU A B C D E F G H A B C D E

More information

Final Exam Fall 2008

Final Exam Fall 2008 COE 308 Computer Architecture Final Exam Fall 2008 page 1 of 8 Saturday, February 7, 2009 7:30 10:00 AM Computer Engineering Department College of Computer Sciences & Engineering King Fahd University of

More information

Pipelined processors and Hazards

Pipelined processors and Hazards Pipelined processors and Hazards Two options Processor HLL Compiler ALU LU Output Program Control unit 1. Either the control unit can be smart, i,e. it can delay instruction phases to avoid hazards. Processor

More information

ECE331: Hardware Organization and Design

ECE331: Hardware Organization and Design ECE331: Hardware Organization and Design Lecture 24: Cache Performance Analysis Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Overview Last time: Associative caches How do we

More information

Memory Hierarchies &

Memory Hierarchies & Memory Hierarchies & Cache Memory CSE 410, Spring 2009 Computer Systems http://www.cs.washington.edu/410 4/26/2009 cse410-13-cache 2006-09 Perkins, DW Johnson and University of Washington 1 Reading and

More information

I, J A[I][J] / /4 8000/ I, J A(J, I) Chapter 5 Solutions S-3.

I, J A[I][J] / /4 8000/ I, J A(J, I) Chapter 5 Solutions S-3. 5 Solutions Chapter 5 Solutions S-3 5.1 5.1.1 4 5.1.2 I, J 5.1.3 A[I][J] 5.1.4 3596 8 800/4 2 8 8/4 8000/4 5.1.5 I, J 5.1.6 A(J, I) 5.2 5.2.1 Word Address Binary Address Tag Index Hit/Miss 5.2.2 3 0000

More information

ECE 411 Exam 1. This exam has 5 problems. Make sure you have a complete exam before you begin.

ECE 411 Exam 1. This exam has 5 problems. Make sure you have a complete exam before you begin. This exam has 5 problems. Make sure you have a complete exam before you begin. Write your name on every page in case pages become separated during grading. You will have three hours to complete this exam.

More information

Memory. Principle of Locality. It is impossible to have memory that is both. We create an illusion for the programmer. Employ memory hierarchy

Memory. Principle of Locality. It is impossible to have memory that is both. We create an illusion for the programmer. Employ memory hierarchy Datorarkitektur och operativsystem Lecture 7 Memory It is impossible to have memory that is both Unlimited (large in capacity) And fast 5.1 Intr roduction We create an illusion for the programmer Before

More information

Tutorial 11. Final Exam Review

Tutorial 11. Final Exam Review Tutorial 11 Final Exam Review Introduction Instruction Set Architecture: contract between programmer and designers (e.g.: IA-32, IA-64, X86-64) Computer organization: describe the functional units, cache

More information

HY225 Lecture 12: DRAM and Virtual Memory

HY225 Lecture 12: DRAM and Virtual Memory HY225 Lecture 12: DRAM and irtual Memory Dimitrios S. Nikolopoulos University of Crete and FORTH-ICS May 16, 2011 Dimitrios S. Nikolopoulos Lecture 12: DRAM and irtual Memory 1 / 36 DRAM Fundamentals Random-access

More information

Pipelining Exercises, Continued

Pipelining Exercises, Continued Pipelining Exercises, Continued. Spot all data dependencies (including ones that do not lead to stalls). Draw arrows from the stages where data is made available, directed to where it is needed. Circle

More information

ECE 411 Exam 1 Practice Problems

ECE 411 Exam 1 Practice Problems ECE 411 Exam 1 Practice Problems Topics Single-Cycle vs Multi-Cycle ISA Tradeoffs Performance Memory Hierarchy Caches (including interactions with VM) 1.) Suppose a single cycle design uses a clock period

More information

Question 1: (20 points) For this question, refer to the following pipeline architecture.

Question 1: (20 points) For this question, refer to the following pipeline architecture. This is the Mid Term exam given in Fall 2018. Note that Question 2(a) was a homework problem this term (was not a homework problem in Fall 2018). Also, Questions 6, 7 and half of 5 are from Chapter 5,

More information

CS 61C: Great Ideas in Computer Architecture. Cache Performance, Set Associative Caches

CS 61C: Great Ideas in Computer Architecture. Cache Performance, Set Associative Caches CS 61C: Great Ideas in Computer Architecture Cache Performance, Set Associative Caches Instructor: Justin Hsia 7/09/2012 Summer 2012 Lecture #12 1 Great Idea #3: Principle of Locality/ Memory Hierarchy

More information

Reducing Hit Times. Critical Influence on cycle-time or CPI. small is always faster and can be put on chip

Reducing Hit Times. Critical Influence on cycle-time or CPI. small is always faster and can be put on chip Reducing Hit Times Critical Influence on cycle-time or CPI Keep L1 small and simple small is always faster and can be put on chip interesting compromise is to keep the tags on chip and the block data off

More information

ENGN 2910A Homework 03 (140 points) Due Date: Oct 3rd 2013

ENGN 2910A Homework 03 (140 points) Due Date: Oct 3rd 2013 ENGN 2910A Homework 03 (140 points) Due Date: Oct 3rd 2013 Professor: Sherief Reda School of Engineering, Brown University 1. [from Debois et al. 30 points] Consider the non-pipelined implementation of

More information

Computer Architecture: Optional Homework Set

Computer Architecture: Optional Homework Set Computer Architecture: Optional Homework Set Black Board due date: Hard Copy due date: Monday April 27 th, at Midnight. Tuesday April 28 th, during Class. Exercise 1: (50 Points) Patterson and Hennessy

More information

Winter 2009 FINAL EXAMINATION Location: Engineering A Block, Room 201 Saturday, April 25 noon to 3:00pm

Winter 2009 FINAL EXAMINATION Location: Engineering A Block, Room 201 Saturday, April 25 noon to 3:00pm University of Calgary Department of Electrical and Computer Engineering ENCM 369: Computer Organization Lecture Instructors: S. A. Norman (L01), N. R. Bartley (L02) Winter 2009 FINAL EXAMINATION Location:

More information

Chapter 5 (Part II) Large and Fast: Exploiting Memory Hierarchy. Baback Izadi Division of Engineering Programs

Chapter 5 (Part II) Large and Fast: Exploiting Memory Hierarchy. Baback Izadi Division of Engineering Programs Chapter 5 (Part II) Baback Izadi Division of Engineering Programs bai@engr.newpaltz.edu Virtual Machines Host computer emulates guest operating system and machine resources Improved isolation of multiple

More information

ECE331: Hardware Organization and Design

ECE331: Hardware Organization and Design ECE331: Hardware Organization and Design Lecture 27: Midterm2 review Adapted from Computer Organization and Design, Patterson & Hennessy, UCB Midterm 2 Review Midterm will cover Section 1.6: Processor

More information

CS152 Computer Architecture and Engineering CS252 Graduate Computer Architecture Spring Caches and the Memory Hierarchy

CS152 Computer Architecture and Engineering CS252 Graduate Computer Architecture Spring Caches and the Memory Hierarchy CS152 Computer Architecture and Engineering CS252 Graduate Computer Architecture Spring 2019 Caches and the Memory Hierarchy Assigned February 13 Problem Set #2 Due Wed, February 27 http://inst.eecs.berkeley.edu/~cs152/sp19

More information

EN1640: Design of Computing Systems Topic 06: Memory System

EN1640: Design of Computing Systems Topic 06: Memory System EN164: Design of Computing Systems Topic 6: Memory System Professor Sherief Reda http://scale.engin.brown.edu Electrical Sciences and Computer Engineering School of Engineering Brown University Spring

More information

CSE Computer Architecture I Fall 2011 Homework 07 Memory Hierarchies Assigned: November 8, 2011, Due: November 22, 2011, Total Points: 100

CSE Computer Architecture I Fall 2011 Homework 07 Memory Hierarchies Assigned: November 8, 2011, Due: November 22, 2011, Total Points: 100 CSE 30321 Computer Architecture I Fall 2011 Homework 07 Memory Hierarchies Assigned: November 8, 2011, Due: November 22, 2011, Total Points: 100 Problem 1: (30 points) Background: One possible organization

More information

ELE 375 Final Exam Fall, 2000 Prof. Martonosi

ELE 375 Final Exam Fall, 2000 Prof. Martonosi ELE 375 Final Exam Fall, 2000 Prof. Martonosi Question Score 1 /10 2 /20 3 /15 4 /15 5 /10 6 /20 7 /20 8 /25 9 /30 10 /30 11 /30 12 /15 13 /10 Total / 250 Please write your answers clearly in the space

More information

6.004 Tutorial Problems L20 Virtual Memory

6.004 Tutorial Problems L20 Virtual Memory 6.004 Tutorial Problems L20 Virtual Memory Page Table (v + p) bits in virtual address (m + p) bits in physical address 2 v number of virtual pages 2 m number of physical pages 2 p bytes per physical page

More information

Page 1. Memory Hierarchies (Part 2)

Page 1. Memory Hierarchies (Part 2) Memory Hierarchies (Part ) Outline of Lectures on Memory Systems Memory Hierarchies Cache Memory 3 Virtual Memory 4 The future Increasing distance from the processor in access time Review: The Memory Hierarchy

More information

The University of Michigan - Department of EECS EECS 370 Introduction to Computer Architecture Midterm Exam 2 solutions April 5, 2011

The University of Michigan - Department of EECS EECS 370 Introduction to Computer Architecture Midterm Exam 2 solutions April 5, 2011 1. Performance Principles [5 pts] The University of Michigan - Department of EECS EECS 370 Introduction to Computer Architecture Midterm Exam 2 solutions April 5, 2011 For each of the following comparisons,

More information

Improving Cache Performance and Memory Management: From Absolute Addresses to Demand Paging. Highly-Associative Caches

Improving Cache Performance and Memory Management: From Absolute Addresses to Demand Paging. Highly-Associative Caches Improving Cache Performance and Memory Management: From Absolute Addresses to Demand Paging 6.823, L8--1 Asanovic Laboratory for Computer Science M.I.T. http://www.csg.lcs.mit.edu/6.823 Highly-Associative

More information

Faculty of Science FINAL EXAMINATION

Faculty of Science FINAL EXAMINATION Faculty of Science FINAL EXAMINATION COMPUTER SCIENCE COMP 273 INTRODUCTION TO COMPUTER SYSTEMS Examiner: Prof. Michael Langer April 18, 2012 Associate Examiner: Mr. Joseph Vybihal 2 P.M. 5 P.M. STUDENT

More information

University of Western Ontario, Computer Science Department CS3350B, Computer Architecture Quiz 1 (30 minutes) January 21, 2015

University of Western Ontario, Computer Science Department CS3350B, Computer Architecture Quiz 1 (30 minutes) January 21, 2015 University of Western Ontario, Computer Science Department CS3350B, Computer Architecture Quiz (30 minutes) January 2, 205 Student ID number: Student Last Name: Exercise. [ 20 marks] To capture the fact

More information

Computer Architecture Computer Science & Engineering. Chapter 5. Memory Hierachy BK TP.HCM

Computer Architecture Computer Science & Engineering. Chapter 5. Memory Hierachy BK TP.HCM Computer Architecture Computer Science & Engineering Chapter 5 Memory Hierachy Memory Technology Static RAM (SRAM) 0.5ns 2.5ns, $2000 $5000 per GB Dynamic RAM (DRAM) 50ns 70ns, $20 $75 per GB Magnetic

More information

CS/CoE 1541 Mid Term Exam (Fall 2018).

CS/CoE 1541 Mid Term Exam (Fall 2018). CS/CoE 1541 Mid Term Exam (Fall 2018). Name: Question 1: (6+3+3+4+4=20 points) For this question, refer to the following pipeline architecture. a) Consider the execution of the following code (5 instructions)

More information

CS 2410 Mid term (fall 2015) Indicate which of the following statements is true and which is false.

CS 2410 Mid term (fall 2015) Indicate which of the following statements is true and which is false. CS 2410 Mid term (fall 2015) Name: Question 1 (10 points) Indicate which of the following statements is true and which is false. (1) SMT architectures reduces the thread context switch time by saving in

More information

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2018 Lecture 24

CS24: INTRODUCTION TO COMPUTING SYSTEMS. Spring 2018 Lecture 24 CS24: INTRODUCTION TO COMPUTING SYSTEMS Spring 2018 Lecture 24 LAST TIME Extended virtual memory concept to be a cache of memory stored on disk DRAM becomes L4 cache of data stored on L5 disk Extend page

More information

Cache Performance (H&P 5.3; 5.5; 5.6)

Cache Performance (H&P 5.3; 5.5; 5.6) Cache Performance (H&P 5.3; 5.5; 5.6) Memory system and processor performance: CPU time = IC x CPI x Clock time CPU performance eqn. CPI = CPI ld/st x IC ld/st IC + CPI others x IC others IC CPI ld/st

More information

Cache Performance and Memory Management: From Absolute Addresses to Demand Paging. Cache Performance

Cache Performance and Memory Management: From Absolute Addresses to Demand Paging. Cache Performance 6.823, L11--1 Cache Performance and Memory Management: From Absolute Addresses to Demand Paging Asanovic Laboratory for Computer Science M.I.T. http://www.csg.lcs.mit.edu/6.823 Cache Performance 6.823,

More information

Lecture 16: Memory Hierarchy Misses, 3 Cs and 7 Ways to Reduce Misses. Professor Randy H. Katz Computer Science 252 Fall 1995

Lecture 16: Memory Hierarchy Misses, 3 Cs and 7 Ways to Reduce Misses. Professor Randy H. Katz Computer Science 252 Fall 1995 Lecture 16: Memory Hierarchy Misses, 3 Cs and 7 Ways to Reduce Misses Professor Randy H. Katz Computer Science 252 Fall 1995 Review: Who Cares About the Memory Hierarchy? Processor Only Thus Far in Course:

More information

Cache Structure. Replacement policies Overhead Implementation Handling writes Cache simulations. Comp 411. L15-Cache Structure 1

Cache Structure. Replacement policies Overhead Implementation Handling writes Cache simulations. Comp 411. L15-Cache Structure 1 Cache Structure Replacement policies Overhead Implementation Handling writes Cache simulations L15-Cache Structure 1 Tag A CPU Data Mem[A] Basic Caching Algorithm ON REFERENCE TO Mem[X]: Look for X among

More information

COSC3330 Computer Architecture Lecture 20. Virtual Memory

COSC3330 Computer Architecture Lecture 20. Virtual Memory COSC3330 Computer Architecture Lecture 20. Virtual Memory Instructor: Weidong Shi (Larry), PhD Computer Science Department University of Houston Virtual Memory Topics Reducing Cache Miss Penalty (#2) Use

More information

Computer System Architecture Midterm Examination Spring 2002

Computer System Architecture Midterm Examination Spring 2002 Computer System Architecture 6.823 Midterm Examination Spring 2002 Name: This is an open book, open notes exam. 110 Minutes 1 Pages Notes: Not all questions are of equal difficulty, so look over the entire

More information

Inside out of your computer memories (III) Hung-Wei Tseng

Inside out of your computer memories (III) Hung-Wei Tseng Inside out of your computer memories (III) Hung-Wei Tseng Why memory hierarchy? CPU main memory lw $t2, 0($a0) add $t3, $t2, $a1 addi $a0, $a0, 4 subi $a1, $a1, 1 bne $a1, LOOP lw $t2, 0($a0) add $t3,

More information

CHAPTER 4 MEMORY HIERARCHIES TYPICAL MEMORY HIERARCHY TYPICAL MEMORY HIERARCHY: THE PYRAMID CACHE PERFORMANCE MEMORY HIERARCHIES CACHE DESIGN

CHAPTER 4 MEMORY HIERARCHIES TYPICAL MEMORY HIERARCHY TYPICAL MEMORY HIERARCHY: THE PYRAMID CACHE PERFORMANCE MEMORY HIERARCHIES CACHE DESIGN CHAPTER 4 TYPICAL MEMORY HIERARCHY MEMORY HIERARCHIES MEMORY HIERARCHIES CACHE DESIGN TECHNIQUES TO IMPROVE CACHE PERFORMANCE VIRTUAL MEMORY SUPPORT PRINCIPLE OF LOCALITY: A PROGRAM ACCESSES A RELATIVELY

More information

EITF20: Computer Architecture Part 5.1.1: Virtual Memory

EITF20: Computer Architecture Part 5.1.1: Virtual Memory EITF20: Computer Architecture Part 5.1.1: Virtual Memory Liang Liu liang.liu@eit.lth.se 1 Outline Reiteration Cache optimization Virtual memory Case study AMD Opteron Summary 2 Memory hierarchy 3 Cache

More information

ADMIN. SI232 Set #18: Caching Finale and Virtual Reality (Chapter 7) Down the home stretch. Split Caches. Final Exam Monday May 1 (first exam day)

ADMIN. SI232 Set #18: Caching Finale and Virtual Reality (Chapter 7) Down the home stretch. Split Caches. Final Exam Monday May 1 (first exam day) ADMIN SI232 Set #8: Caching Finale and Virtual Reality (Chapter 7) Ethics Discussion & Reading Quiz Wed April 2 Reading posted online Reading finish Chapter 7 Sections 7.4 (skip 53-536), 7.5, 7.7, 7.8

More information

ECE550 PRACTICE Final

ECE550 PRACTICE Final ECE550 PRACTICE Final This is a full length practice midterm exam. If you want to take it at exam pace, give yourself 175 minutes to take the entire test. Just like the real exam, each question has a point

More information

198:231 Intro to Computer Organization. 198:231 Introduction to Computer Organization Lecture 14

198:231 Intro to Computer Organization. 198:231 Introduction to Computer Organization Lecture 14 98:23 Intro to Computer Organization Lecture 4 Virtual Memory 98:23 Introduction to Computer Organization Lecture 4 Instructor: Nicole Hynes nicole.hynes@rutgers.edu Credits: Several slides courtesy of

More information

The University of Alabama in Huntsville Electrical & Computer Engineering Department CPE Test II November 14, 2000

The University of Alabama in Huntsville Electrical & Computer Engineering Department CPE Test II November 14, 2000 The University of Alabama in Huntsville Electrical & Computer Engineering Department CPE 513 01 Test II November 14, 2000 Name: 1. (5 points) For an eight-stage pipeline, how many cycles does it take to

More information

CS252 Graduate Computer Architecture Midterm 1 Solutions

CS252 Graduate Computer Architecture Midterm 1 Solutions CS252 Graduate Computer Architecture Midterm 1 Solutions Part A: Branch Prediction (22 Points) Consider a fetch pipeline based on the UltraSparc-III processor (as seen in Lecture 5). In this part, we evaluate

More information

CSE 351. Virtual Memory

CSE 351. Virtual Memory CSE 351 Virtual Memory Virtual Memory Very powerful layer of indirection on top of physical memory addressing We never actually use physical addresses when writing programs Every address, pointer, etc

More information

Write only as much as necessary. Be brief!

Write only as much as necessary. Be brief! 1 CIS371 Computer Organization and Design Final Exam Prof. Martin Wednesday, May 2nd, 2012 This exam is an individual-work exam. Write your answers on these pages. Additional pages may be attached (with

More information

6.004 Tutorial Problems L14 Cache Implementation

6.004 Tutorial Problems L14 Cache Implementation 6.004 Tutorial Problems L14 Cache Implementation Cache Miss Types Compulsory Miss: Starting with an empty cache, a cache line is first referenced (invalid) Capacity Miss: The cache is not big enough to

More information

University of Western Ontario, Computer Science Department CS3350B, Computer Architecture Quiz 1 (30 minutes) January 21, 2015

University of Western Ontario, Computer Science Department CS3350B, Computer Architecture Quiz 1 (30 minutes) January 21, 2015 University of Western Ontario, Computer Science Department CS3350B, Computer Architecture Quiz (30 minutes) January 2, 205 Student ID number: Student Last Name: Exercise. In the following list of performance

More information