Final Exam, Introduction to Computer Systems, May 10, 2007. Model Solution fp


15-213 Introduction to Computer Systems
Final Exam
May 10, 2007

Name:                Model Solution
Andrew User ID:      fp
Recitation Section:

This is an open-book exam. Notes and calculators are permitted, but not computers. Write your answer legibly in the space provided. You have 180 minutes for this exam.

    Problem                    Max      Score
    1  Floating Point          20       20
    2  Assembly Language       20       20
    3  Optimization            20       20
    4  Cache Memory            20       20
    5  Signals                 20       20
    6  Garbage Collection      20       20
    7  Threads                 20       20
    8  Synchronization         20       20
    Total                      150+10   160

1. Floating Point (20 points)

In this problem we consider properties of floating point operations. For each property, state whether it is true or false. If false, give a counterexample as a (possibly negative) power of 2 within the range of precision for the variables. We assume that the variables on an x86-64 architecture are declared as follows

    float x, y, z;
    double d, e;

and initialized to some unknown value different from NaN, +∞, and -∞. We have given the first answer as an example.

    Property                                     Answer   Counterexample
    (x + y) + z == x + (y + z)                   false    x = 1, y = 2^127, z = -2^127
    If x > 0 then x / 2 > 0                      false    x = 2^-149
    (x + y) * z == x * z + y * z                 false    x = 2^127, y = -2^127, z = 2^127
    If x >= y and z <= 0 then x * z <= y * z     true
    If x > y then (double)x > (double)y          true
    If d > e then (float)d > (float)e            false    d = 2^129, e = 2^128
    x + 1 > x                                    false    x = 2^127
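The counterexamples above can be checked mechanically. The following is a minimal sketch, not part of the exam: the function names are made up for illustration, and it assumes an IEEE 754 target with round-to-nearest-even (as on x86-64 with SSE), where an out-of-range double-to-float conversion yields infinity. The volatile qualifiers are only there to keep the compiler from folding the arithmetic and to force each result to be rounded to its declared type.

```c
#include <stdbool.h>

/* Each function returns true when the counterexample behaves as claimed.
   Assumption: IEEE 754 arithmetic with round-to-nearest-even. */

/* (x + y) + z != x + (y + z) for x = 1, y = 2^127, z = -2^127 */
bool assoc_counterexample(void) {
    volatile float x = 1.0f, y = 0x1p127f, z = -0x1p127f;
    volatile float lhs = (x + y) + z;   /* 1 is absorbed, so 2^127 - 2^127 = 0 */
    volatile float rhs = x + (y + z);   /* 1 + 0 = 1 */
    return lhs != rhs;
}

/* x > 0 but x / 2 == 0 for x = 2^-149 */
bool halving_counterexample(void) {
    volatile float x = 0x1p-149f;       /* smallest positive denormal */
    volatile float half = x / 2;        /* tie between 0 and 2^-149 rounds to 0 */
    return x > 0 && !(half > 0);
}

/* (x + y) * z != x * z + y * z for x = 2^127, y = -2^127, z = 2^127 */
bool distrib_counterexample(void) {
    volatile float x = 0x1p127f, y = -0x1p127f, z = 0x1p127f;
    volatile float lhs = (x + y) * z;   /* 0 * 2^127 = 0 */
    volatile float xz = x * z;          /* overflows to +infinity */
    volatile float yz = y * z;          /* overflows to -infinity */
    volatile float rhs = xz + yz;       /* infinity - infinity = NaN */
    return lhs != rhs;                  /* NaN compares unequal to everything */
}

/* d > e but (float)d == (float)e for d = 2^129, e = 2^128 */
bool narrowing_counterexample(void) {
    volatile double d = 0x1p129, e = 0x1p128;
    volatile float fd = (float)d;       /* +infinity on IEEE 754 targets */
    volatile float fe = (float)e;       /* +infinity as well */
    return d > e && !(fd > fe);
}

/* x + 1 == x for x = 2^127 */
bool increment_counterexample(void) {
    volatile float x = 0x1p127f;
    volatile float sum = x + 1;         /* 1 is far below one ulp of 2^127 */
    return !(sum > x);
}
```

Calling each function from a small main and printing the results is enough to confirm the five "false" rows on a given machine.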

2. Assembly Language (20 points)

In this problem we consider an illustrative program for multiplication of two unsigned ints, returning an unsigned long int holding the product.

    unsigned long mult (unsigned i, unsigned k) {
      unsigned long p = 0;
      unsigned long q = k;
      while (i != 0) {
        if (i & 1)
          p = p + q;
        q = q << 1;
        i = i >> 1;
      }
      return p;
    }

The following is the resulting machine code when compiled on an x86-64 machine with gcc -O2, omitting two instructions.

    mult:
            xorl    %ecx, %ecx
            mov     %esi, %edx
            testl   %edi, %edi
            jmp     .L8
    .L10:
            leaq    (%rcx,%rdx), %rax
            testb   $1, %dil
            ____________________    # missing conditional move
            addq    %rdx, %rdx
            shrl    %edi
    .L8:
            jne     .L10
            ____________________    # missing move
            ret

1. (5 pts) For each register, give the value it holds during the iteration, expressed in terms of the C program.

    Register   C expression
    %rcx       p
    %rdx       q
    %rax       p+q
    %edi       i
    %dil       (char)i

2. (5 pts) Fill in the missing two instructions in the code.

    cmovne %rax, %rcx   and   movq %rcx, %rax

3. (4 pts) Rewrite the loop to use a conditional jump instead of a conditional move.

See one solution below; there are many others. The jump skips the update of p when the low bit of i is clear.

            testb   $1, %dil
            je      .L9
            movq    %rax, %rcx
    .L9:
            addq    %rdx, %rdx

4. (3 pts) Explain briefly why the compiler preferred to use a conditional move instruction.

Because the branch misprediction penalty would make the loop slower, especially since the outcome of the test depends on the bits of i and is therefore difficult to predict accurately.

5. (3 pts) Assume we declared and initialized

    int i, k;
    long m;

and called

    m = (long)mult((unsigned)i, (unsigned)k);

using the above definition of mult. Will m hold the correct value of the signed product of i and k? Circle the correct answer. Briefly explain your answer.

    yes    no

Answer: no. For example, when multiplying -1 times 1, the -1 will actually be interpreted as 2^32 - 1 and the result will also be 2^32 - 1 instead of -1. However, the answer will be correct modulo 2^32, because in two's-complement representation signed and unsigned addition and multiplication are identical: they operate in the ring of integers modulo 2^w for the word size w (= 32, in this case).
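The answer to question 5 can be reproduced with a short sketch. It restates mult from the problem using fixed-width types (uint32_t and uint64_t stand in for the unsigned and unsigned long of x86-64); the wrapper name signed_via_mult is hypothetical and only mirrors the call in the question.

```c
#include <stdint.h>

/* Shift-and-add multiply from the problem statement, written with
   fixed-width types so the widths do not depend on the platform. */
uint64_t mult(uint32_t i, uint32_t k) {
    uint64_t p = 0;
    uint64_t q = k;
    while (i != 0) {
        if (i & 1)
            p = p + q;      /* add the current shifted copy of k */
        q = q << 1;
        i = i >> 1;
    }
    return p;
}

/* Mirrors m = (long)mult((unsigned)i, (unsigned)k) from question 5.
   (Hypothetical helper name.) */
int64_t signed_via_mult(int32_t i, int32_t k) {
    return (int64_t)mult((uint32_t)i, (uint32_t)k);
}
```

With these definitions, signed_via_mult(-1, 1) evaluates to 4294967295 (that is, 2^32 - 1) rather than -1, while truncating the result to 32 bits gives the same bit pattern as -1, which is the "correct modulo 2^32" part of the answer.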

3. Optimization (20 points)

Consider the following code for calculating the dot product of two vectors of double precision floating point numbers.

    double dot_prod(double A[], double B[], int n) {
      int i;
      double r = 0;
      for (i = 0; i < n; i++)
        r = r + A[i] * B[i];
      return r;
    }

Assume that multiplication has a latency of 12 cycles, addition a latency of 7 cycles, and load a latency of 4 cycles. Also assume that there are an unlimited number of functional units. [Hint: Under this assumption, theoretically optimal performance is dominated by the critical data dependency path.]

1. (5 points) What is the theoretically optimal CPE for this loop?

7 CPE, since the addition into r constitutes the critical path: the loads and multiplications of different iterations are independent of each other, but each addition must wait for the previous one.

2. (10 points) Show the code for the loop unrolled by 2. You may apply associativity and commutativity of multiplication and addition, assuming that rounding errors are insignificant.

    double dot_prod2(double A[], double B[], int n) {
      int i;
      double r = 0;
      for (i = 0; i < n-1; i+=2)
        r = r + (A[i] * B[i] + A[i+1] * B[i+1]);
      for (; i < n; i++)
        r = r + A[i] * B[i];
      return r;
    }

3. (5 points) What is the theoretically optimal CPE for this loop?

7/2 = 3.5 CPE, since the critical path is still the addition into r, but now two elements are added in each iteration.
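As a point of comparison, the same 3.5 CPE bound can also be reached by splitting the sum across two accumulators instead of reassociating. The sketch below is an alternative that is not part of the model solution; the name dot_prod_2acc is made up for illustration. Each accumulator carries one dependent addition per pair of elements, so under the stated latencies the critical path is again 7 cycles per 2 elements.

```c
/* Reference loop from the problem statement. */
double dot_prod(const double A[], const double B[], int n) {
    double r = 0;
    for (int i = 0; i < n; i++)
        r = r + A[i] * B[i];
    return r;
}

/* Hypothetical variant: unrolled by 2 with two independent accumulators.
   The additions into r0 and r1 do not depend on each other, so each
   dependency chain sees one 7-cycle addition per two elements. */
double dot_prod_2acc(const double A[], const double B[], int n) {
    double r0 = 0, r1 = 0;
    int i;
    for (i = 0; i < n - 1; i += 2) {
        r0 = r0 + A[i] * B[i];
        r1 = r1 + A[i + 1] * B[i + 1];
    }
    for (; i < n; i++)          /* leftover element when n is odd */
        r0 = r0 + A[i] * B[i];
    return r0 + r1;
}
```

Like dot_prod2, this changes the order of the floating point additions, so the result may differ from dot_prod in the last bits for general inputs; for small integer-valued data the two agree exactly.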

4. Cache Memory (20 points)

In this problem we explore the operation of a basic TLB as a cache. Assume the following:

    - Virtual addresses are 32 bits.
    - The virtual page number (VPN) is 24 bits.
    - The physical page number (PPN) is 32 bits.
    - The TLB is 2-way set associative containing a total of 512 lines.

1. (6 points) Please fill in the following blanks by giving a bit range, such as 0-15.

    (a) The VPO of a virtual address consists of bits 0-7 of the VA.
    (b) The VPN of a virtual address consists of bits 8-31 of the VA.
    (c) The PPO of a physical address consists of bits 0-7 of the PA.
    (d) The PPN of a physical address consists of bits 8-39 of the PA.
    (e) The TLB index (TLBI) consists of bits 0-7 of the VPN.
    (f) The TLB tag (TLBT) consists of bits 8-23 of the VPN.

We show a part of the TLB relevant to the next two questions.

    Index   Valid?   Tag      Entry
    3D      1        0x083F   0x0913ABDE
            1        0x083E   0xAB18ED24
    3E      0        0xF3E9   0x0913ABDE
            1        0x083F   0xAB18ED24
    3F      1        0x409A   0x0913ABDE
            1        0x083F   0xAB18ED24
    40      0        0x083E   0x0913ABDE
            1        0x3E40   0xAB18ED24

2. (7 points) Assume the virtual address is 0x083F3E9A. Fill in the following table in hexadecimal notation. Write U for any value that is unknown, that is, not determined from the parameters and the table above.

    Parameter            Value
    VPN                  0x083F3E
    VPO                  0x9A
    TLBI                 0x3E
    TLBT                 0x083F
    Cache Hit? (Y/N/U)   Y
    PPN                  0xAB18ED24
    PA                   0xAB18ED249A

3. (7 points) Assume the virtual address is 0x083E409B. Fill in the following table in hexadecimal notation. Write U for any value that is unknown, that is, not determined from the parameters and the table above.

    Parameter            Value
    VPN                  0x083E40
    VPO                  0x9B
    TLBI                 0x40
    TLBT                 0x083E
    Cache Hit? (Y/N/U)   N
    PPN                  U
    PA                   U
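The field extraction and lookup used in the two tables can be written out as a short sketch. The type and function names (va_fields, split_va, translate) are hypothetical, and the TLB array holds only the four sets shown in the excerpt above, so any other index is reported as unknown rather than as a miss.

```c
#include <stdint.h>

enum { VPO_BITS = 8, TLBI_BITS = 8, FIRST_SET = 0x3D, NUM_SETS = 4, WAYS = 2 };

typedef struct { uint32_t vpn, vpo, tlbi, tlbt; } va_fields;

/* Split a 32-bit virtual address into the fields from question 1. */
va_fields split_va(uint32_t va) {
    va_fields f;
    f.vpo  = va & ((1u << VPO_BITS) - 1);       /* bits 0-7 of the VA   */
    f.vpn  = va >> VPO_BITS;                    /* bits 8-31 of the VA  */
    f.tlbi = f.vpn & ((1u << TLBI_BITS) - 1);   /* bits 0-7 of the VPN  */
    f.tlbt = f.vpn >> TLBI_BITS;                /* bits 8-23 of the VPN */
    return f;
}

typedef struct { int valid; uint32_t tag; uint32_t ppn; } tlb_line;

/* The part of the TLB shown in the exam: sets 0x3D through 0x40. */
static const tlb_line tlb_part[NUM_SETS][WAYS] = {
    { {1, 0x083F, 0x0913ABDE}, {1, 0x083E, 0xAB18ED24} },   /* 3D */
    { {0, 0xF3E9, 0x0913ABDE}, {1, 0x083F, 0xAB18ED24} },   /* 3E */
    { {1, 0x409A, 0x0913ABDE}, {1, 0x083F, 0xAB18ED24} },   /* 3F */
    { {0, 0x083E, 0x0913ABDE}, {1, 0x3E40, 0xAB18ED24} },   /* 40 */
};

/* Returns 1 on a TLB hit (and stores the physical address in *pa),
   0 on a miss, and -1 if the set is not part of the excerpt. */
int translate(uint32_t va, uint64_t *pa) {
    va_fields f = split_va(va);
    if (f.tlbi < FIRST_SET || f.tlbi >= FIRST_SET + NUM_SETS)
        return -1;
    const tlb_line *set = tlb_part[f.tlbi - FIRST_SET];
    for (int w = 0; w < WAYS; w++) {
        if (set[w].valid && set[w].tag == f.tlbt) {
            *pa = ((uint64_t)set[w].ppn << VPO_BITS) | f.vpo;   /* PPN : PPO */
            return 1;
        }
    }
    return 0;
}
```

For 0x083F3E9A the lookup hits in set 0x3E and produces 0xAB18ED249A; for 0x083E409B the only matching tag in set 0x40 is invalid, so the lookup misses and the PPN and PA remain unknown.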

5. Signals (20 points)

Consider the following program.

    int counter = 0;

    void handler (int sig) {
      counter++;
    }

    int main() {
      signal(SIGUSR1, handler);
      signal(SIGUSR2, handler);
      int parent = getpid();
      int child = fork();
      if (child == 0) {
        /* insert code here */
        exit(0);
      }
      sleep(1);
      waitpid(child, NULL, 0);
      printf("received %d USR{1,2} signals\n", counter);
      return 0;
    }

For each of the following four versions of the above code, list the possible outputs of this program, assuming that all function and system calls succeed and exit without error. You may also assume no externally issued signals are sent to either process.

1. (5 pts)

1, 2: If the second SIGUSR1 is sent before the first one is received, it will be dropped.

2. (5 pts)

1, 2, 3: The second and third SIGUSR1 may be sent before the first one is received.

3. (5 pts)

    kill(parent, SIGUSR2);

1, 2: Because of a race condition when SIGUSR2 is received while SIGUSR1 is handled, one increment may be dropped.

4. (5 pts)

    kill(parent, SIGUSR2);
    kill(parent, SIGUSR2);

1, 2, 3, 4: Two consecutive occurrences as in the answer to the previous question can lead to answers 1+1, 1+2, 2+1 or 2+2. And the race condition from the previous question can lead to the answer 1 if the first three signals are sent before any are received.
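The "dropped signal" effect behind these answers can be made deterministic in a single process. The sketch below is not the exam program: it uses a hypothetical helper, deliveries_after_blocked_sends, which blocks SIGUSR1, sends it several times to the calling process, and then unblocks it. It assumes a single-threaded process on a system such as Linux, where standard signals are not queued, so any number of sends while one is already pending collapses into one delivery.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

static volatile sig_atomic_t counter = 0;

static void handler(int sig) {
    (void)sig;
    counter++;
}

/* Hypothetical helper: send SIGUSR1 to ourselves n times while it is
   blocked, unblock it, and report how many times the handler ran.
   Assumption: single-threaded process, standard (non-queued) signals. */
int deliveries_after_blocked_sends(int n) {
    struct sigaction sa;
    sigset_t block, old;

    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGUSR1, &sa, NULL);

    counter = 0;
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    sigprocmask(SIG_BLOCK, &block, &old);     /* SIGUSR1 now stays pending */

    for (int i = 0; i < n; i++)
        kill(getpid(), SIGUSR1);              /* only one pending bit exists */

    sigprocmask(SIG_SETMASK, &old, NULL);     /* pending signal delivered here */
    return counter;
}
```

Under these assumptions the helper returns 1 for any n of at least 1, which corresponds to the smallest output in each version above: all signals of one kind arrived before the first was received.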

6. Garbage Collection (20 points)

In this problem we consider a tiny list processing machine in which each memory word consists of two bytes: the first byte is a pointer to the tail of the list and the second byte is a data element. The end of a list is marked by a pointer of 0x00. We assume that the data element is never a pointer.

We start with the memory state below, where the range 0x10-0x1F is the from-space and the range 0x20-0x2F is the to-space. All addresses and values in the diagram are in hexadecimal.

Write in the state of memory after a copying collector is called with root pointers 0x10 and 0x12, in this order. You may leave cells that remain unchanged blank. Please be sure to use the proper breadth-first traversal algorithm covered in lecture.

    Before GC
    Addr  Ptr  Data
    10    14   A2
    12    1A   1F
    14    1E   02
    16    1E   20
    18    00   33
    1A    18   BC
    1C    12   DF
    1E    10   8F

    After GC
    From-space              To-space
    Addr  Ptr  Data         Addr  Ptr  Data
    10    20                20    24   A2
    12    22                22    26   1F
    14    24                24    28   02
    16                      26    2A   BC
    18    2A                28    20   8F
    1A    26                2A    00   33
    1C                      2C
    1E    28                2E

After garbage collection, free space starts at address 2C.
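The "After GC" table can be reproduced by simulating a breadth-first (Cheney-style) copying collector on this two-byte-word machine. The following is a sketch under stated assumptions rather than the algorithm exactly as presented in lecture: the names heap, load_exam_heap, forward and collect are made up, a from-space cell whose pointer byte lies in the to-space range is treated as already forwarded, and the live data is assumed to fit in the to-space.

```c
#include <stdint.h>

enum { TO_LO = 0x20, MEM_SIZE = 0x30 };

typedef struct {
    uint8_t mem[MEM_SIZE];   /* mem[a] = pointer byte, mem[a + 1] = data byte */
    uint8_t free;            /* next free to-space address */
} heap;

/* Fill in the "Before GC" state from the problem. */
void load_exam_heap(heap *h) {
    static const uint8_t before[8][3] = {
        {0x10, 0x14, 0xA2}, {0x12, 0x1A, 0x1F}, {0x14, 0x1E, 0x02},
        {0x16, 0x1E, 0x20}, {0x18, 0x00, 0x33}, {0x1A, 0x18, 0xBC},
        {0x1C, 0x12, 0xDF}, {0x1E, 0x10, 0x8F},
    };
    for (int a = 0; a < MEM_SIZE; a++)
        h->mem[a] = 0;
    for (int r = 0; r < 8; r++) {
        h->mem[before[r][0]]     = before[r][1];
        h->mem[before[r][0] + 1] = before[r][2];
    }
    h->free = TO_LO;
}

/* Copy the from-space cell at p into to-space unless that was already
   done, and return its to-space address. */
static uint8_t forward(heap *h, uint8_t p) {
    if (p == 0x00)
        return 0x00;                       /* end-of-list marker */
    if (h->mem[p] >= TO_LO)
        return h->mem[p];                  /* forwarding pointer already set */
    uint8_t new_addr = h->free;
    h->mem[new_addr]     = h->mem[p];      /* pointer byte, fixed up on scan */
    h->mem[new_addr + 1] = h->mem[p + 1];  /* data byte */
    h->mem[p] = new_addr;                  /* leave forwarding pointer behind */
    h->free += 2;
    return new_addr;
}

/* Breadth-first copy: forward the roots in order, then scan to-space. */
void collect(heap *h, uint8_t *roots, int nroots) {
    uint8_t scan = TO_LO;
    h->free = TO_LO;
    for (int i = 0; i < nroots; i++)
        roots[i] = forward(h, roots[i]);
    while (scan < h->free) {
        h->mem[scan] = forward(h, h->mem[scan]);
        scan += 2;
    }
}
```

Running collect on the loaded heap with roots 0x10 and 0x12 copies the cells in the order 10, 12, 14, 1A, 1E, 18, which is the breadth-first order visible in the to-space column, and leaves the free pointer at 0x2C. The cells at 16 and 1C are unreachable and stay untouched.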

7. Threads (20 points)

Consider three concurrently executing threads in the same process using two semaphores s1 and s2. Assume s1 has been initialized to 1, while s2 has been initialized to 0. What are the possible values of the global variable x, initialized to 0, after all three threads have terminated?

    /* thread A */
    P(&s2);
    P(&s1);
    x = x*2;
    V(&s1);

    /* thread B */
    P(&s1);
    x = x*x;
    V(&s1);

    /* thread C */
    P(&s1);
    x = x+3;
    V(&s2);
    V(&s1);

The possible sequences are B,C,A (x = 6) or C,A,B (x = 36) or C,B,A (x = 18). Thread A can only pass P(&s2) after thread C has executed V(&s2), so C must always run before A.
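Because s1 makes each update of x atomic, the possible outcomes can be enumerated by replaying the three critical sections in every order and discarding the orders in which A would have to pass P(&s2) before C has posted it. The sketch below does exactly that; run_order is a hypothetical helper, and it models only the ordering constraint, not real threads.

```c
/* Replay the critical sections of threads A, B and C in the given order
   (a 3-character string such as "BCA"). Returns the final value of x,
   or -1 if the order is infeasible because A would block on s2. */
int run_order(const char *order) {
    int x = 0;
    int s2 = 0;                     /* semaphore s2, initially 0 */
    for (int i = 0; i < 3; i++) {
        switch (order[i]) {
        case 'A':
            if (s2 == 0)
                return -1;          /* P(&s2) cannot succeed yet */
            s2--;
            x = x * 2;
            break;
        case 'B':
            x = x * x;
            break;
        case 'C':
            x = x + 3;
            s2++;                   /* V(&s2) lets A proceed */
            break;
        default:
            return -1;              /* not a thread name */
        }
    }
    return x;
}
```

Of the six permutations, only "BCA", "CAB" and "CBA" are feasible, and they yield 6, 36 and 18 respectively.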

8. Synchronization (20 points)

We explore the so-called barbershop problem. A barbershop consists of n waiting chairs and the barber chair. If there are no customers, the barber waits. If a customer enters and all the waiting chairs are occupied, then the customer leaves the shop. If the barber is busy but waiting chairs are available, then the customer sits in one of the free chairs.

Here is the skeleton of the code, without synchronization.

    extern int N;  /* initialized elsewhere to value > 0 */
    int customers = 0;

    void* customer() {
      if (customers > N) {
        return NULL;
      }
      customers += 1;
      gethaircut();
      customers -= 1;
      return NULL;
    }

    void* barber() {
      while(1) {
        cuthair();
      }
    }

For the solution, we use three binary semaphores:

    - mutex to control access to the global variable customers.
    - customer to signal a customer is in the shop.
    - barber to signal the barber is busy.

1. (5 points) Indicate the initial values for the three semaphores.

    mutex ______    customer ______    barber ______

2. (15 points) Complete the code above, filling in as many copies of the following commands as you need, but no other code.

    P(&mutex);
    V(&mutex);
    P(&customer);
    V(&customer);
    P(&barber);
    V(&barber);

Solution: There are a number of solutions; below is one. Be careful to release the mutex before leaving. For this solution, initial values are mutex = 1 (variable customers may be accessed), customer = 0 (no customers) and barber = 0 (barber is not busy).

    void* customer() {
      P(&mutex);
      if (customers > N) {
        V(&mutex);
        return NULL;
      }
      customers += 1;
      V(&mutex);
      V(&customer);
      P(&barber);
      gethaircut();
      P(&mutex);
      customers -= 1;
      V(&mutex);
      return NULL;
    }

    void* barber() {
      while(1) {
        P(&customer);
        V(&barber);
        cuthair();
      }
    }