CSE-160 (Winter 2017, Kesden) Practice Midterm Exam

Full PID:

1. Threads, Concurrency

Consider the code below:

    #include <pthread.h>
    #include <stdio.h>

    volatile int count = 0; // volatile just keeps count in mem vs register

    // Thread body; named counter so it does not collide with the variable count
    void *counter(void *arg) {
      for (int i=0; i<100000; i++)
        count++;
      return NULL;
    }

    int main () {
      pthread_t tid1, tid2;

      pthread_create(&tid1, NULL, counter, NULL);
      pthread_create(&tid2, NULL, counter, NULL);

      pthread_join (tid1, NULL);
      pthread_join (tid2, NULL);

      printf ("count: %d\n", count);
      return 0;
    }

The above code is incorrect. It has a data race.

A. What could be the symptom(s)?

B. What is the critical resource?

C. Protect the critical section by adding a simple mutex, including declaration, initialization, etc.

D. Protect the critical section by adding a lock_guard, including declaration, initialization, etc.
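For reference when checking an answer to part (C), here is a minimal sketch of one way a pthread mutex could guard the increment. It is not the exam's official solution. The names `shared_count`, `count_lock`, `locked_increment`, `run_two_counters`, and `ITERATIONS` are illustrative assumptions, and the static initializer `PTHREAD_MUTEX_INITIALIZER` is used in place of an explicit `pthread_mutex_init()` call. Part (D) asks for `lock_guard`, which is a C++ facility and is not covered by this C sketch.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define ITERATIONS 100000 /* assumed to match the loop bound in the question */

/* Shared state: the counter and the mutex that guards it (hypothetical names). */
static int shared_count = 0;
static pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;

/* Thread body: every increment happens while holding the mutex. */
static void *locked_increment(void *arg) {
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&count_lock);
        shared_count++; /* critical section */
        pthread_mutex_unlock(&count_lock);
    }
    return NULL;
}

/* Runs two incrementing threads and returns the final counter value. */
int run_two_counters(void) {
    pthread_t tid1, tid2;
    int rc;

    shared_count = 0;
    rc = pthread_create(&tid1, NULL, locked_increment, NULL);
    assert(rc == 0);
    rc = pthread_create(&tid2, NULL, locked_increment, NULL);
    assert(rc == 0);
    pthread_join(tid1, NULL);
    pthread_join(tid2, NULL);
    return shared_count;
}
```

With the lock in place, each read-modify-write of the counter is serialized, so a call to `run_two_counters()` should return `2 * ITERATIONS`. Locking once per increment is simple but slow; holding the lock around the whole loop, or accumulating into a local variable and adding it once, would be alternative designs.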

2. Synchronization Primitives

A. Under what circumstances should one use a barrier instead of a mutex?

B. Please write a short code segment that illustrates your answer to part (A) above.

C. Under what circumstances should one use a condition variable instead of just a mutex?

D. Please write a short code segment that illustrates your answer to part (C) above.

E. Why must the condition variable's wait() operation accept a mutex? What does it protect?

F. Why must the condition variable's signal() operation accept a mutex? What purpose does it serve?

3. Parallel Speedup

A. Assuming that a program's code is 75% parallelizable and 25% necessarily-serial, what is the maximum speedup that can be achieved by adding threads on a 4 core system? Show your work.

B. Assuming a program is well designed and well written and has the following running times, what percentage of it is parallelizable? Show your work.

    1-thread/1-core:   16 seconds
    2-threads/2-cores: 12 seconds
    4-threads/4-cores: 10 seconds

C. What is the maximum speedup that can be achieved in a program for which 25% of the code is parallelizable? Show your work.

D. If parallelizing an algorithm results in a super-linear speedup, what does this suggest?

E. Consider Gustafson's observation. How can increasing the amount of data allow us to defeat Amdahl's Law?
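The speedup questions above rest on Amdahl's Law, which bounds speedup on n cores at 1 / ((1 - p) + p / n) when a fraction p of the run time is parallelizable. The sketch below simply evaluates that formula so hand-worked answers can be checked. The function names `amdahl_speedup` and `amdahl_limit` are illustrative assumptions, not part of the exam.

```c
#include <assert.h>

/* Amdahl's Law: speedup on n cores when a fraction p (0..1) of the
 * single-core run time is parallelizable. */
double amdahl_speedup(double p, int n) {
    assert(p >= 0.0 && p <= 1.0);
    assert(n >= 1);
    return 1.0 / ((1.0 - p) + p / (double)n);
}

/* Upper bound on speedup as the core count grows without limit.
 * Requires p < 1 so the serial fraction is non-zero. */
double amdahl_limit(double p) {
    assert(p >= 0.0 && p < 1.0);
    return 1.0 / (1.0 - p);
}
```

For example, `amdahl_speedup(0.75, 4)` evaluates 1 / (0.25 + 0.1875), and `amdahl_limit(0.25)` evaluates 1 / 0.75. The formula assumes perfect load balance and no synchronization overhead, which is why measured speedups usually fall below it.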

4. Working Sets and Locality

A. For each type of cache miss, please define it and explain how it can be mitigated, if possible.

    a. Cold/Compulsory
    b. Conflict
    c. Capacity

B. Write a simple for-loop that exhibits good spatial locality, but not good temporal locality.

C. Does the following for-loop exhibit good spatial locality, temporal locality, neither, or both? Why?

    // ints a and b are declared and initialized elsewhere
    // int[16] array is declared and initialized elsewhere
    for (int index=0; index < 100; index++)
      array[index] += (a + b);

D. Assuming that the values shown below are ints, and that an int is 4 bytes, what is the size of the working set for the loop above? Explain.

5. Memory Hierarchy

Assume the following memory access times:

    Registers: 1 cycle,    0.5ns
    L1 Cache:  4 cycles,   2ns
    L2 Cache:  8 cycles,   4ns
    Main Mem:  160 cycles, 80ns

Consider a system where 1 in 50 variable accesses require fetching from memory into registers, a 95% hit rate at L1 and a 99% hit rate at L2, and in which memory cache accesses are not performed in parallel.

A. What is the effective memory access time of this system? (Just set up the equation, no need to evaluate. It can be in cycles or seconds.)

6. OpenMP

A. You are reading code parallelized with OpenMP #pragmas. Please explain the relationship between the scope in which a variable is declared and whether or not it is shared. Include both loop and non-loop cases.

B. Under what circumstances is it safe to remove a nowait on the first of two back-to-back loops?

C. Why might a nowait be inappropriate for the last of two loops?

D. Consider the clause, schedule(dynamic, 2) operating upon a loop with 4 threads and 16 iterations. Which threads will perform each iteration?

E. Consider the clause, schedule(static, 1) operating upon a loop with 4 threads and 16 iterations. What are the potential advantages and disadvantages of this configuration as compared to the one described in (D) above?
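As background for question 6(E): under `schedule(static, chunk)`, OpenMP divides the iterations into chunks of the given size and assigns them to threads round-robin in thread-number order, so the mapping is fixed before the loop runs. The sketch below computes that mapping in plain C without requiring an OpenMP compiler. The function name `static_schedule_owner` is a hypothetical helper, and iterations are assumed to be numbered from 0. No comparable function exists for `schedule(dynamic, 2)`, because there the assignment depends on which thread requests the next chunk at run time.

```c
#include <assert.h>

/* Thread number that executes logical iteration `iteration` (counted
 * from 0) under schedule(static, chunk): chunks of `chunk` iterations
 * are handed out round-robin in thread-number order. */
int static_schedule_owner(int iteration, int chunk, int num_threads) {
    assert(iteration >= 0);
    assert(chunk >= 1);
    assert(num_threads >= 1);
    return (iteration / chunk) % num_threads;
}
```

With 4 threads and a chunk size of 1, `static_schedule_owner(i, 1, 4)` reduces to `i % 4`, so consecutive iterations land on different threads.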

7. Caching #1 (Credit: CMU)

Consider the following matrix transpose function:

    typedef int array[2][2];

    void transpose(array dst, array src) {
      int i, j;

      for (j = 0; j < 2; j++) {
        for (i = 0; i < 2; i++) {
          dst[i][j] = src[j][i];
        }
      }
    }

Running on a hypothetical machine with the following properties:

    sizeof(int) == 4.
    The src array starts at address 0 and the dst array starts at address 16 (decimal).
    There is a single L1 data cache that is direct mapped and write-allocate, with a block size of 8 bytes.
    Accesses to the src and dst arrays are the only sources of read and write accesses to the cache, respectively.

Suppose the cache has a total size of 16 data bytes (i.e., the block size times the number of sets is 16 bytes) and that the cache is initially empty.

A. How many bits are used for each of the following?

    Index:
    Offset:

B. For each row and col, indicate whether each access to src[row][col] and dst[row][col] is a hit (h) or a miss (m). For example, reading src[0][0] is a miss and writing dst[0][0] is also a miss.

    src array                   dst array
           col 0   col 1               col 0   col 1
    row 0    m                  row 0    m
    row 1                       row 1

C. Repeat part B for a cache with a total size of 32 data bytes.

    src array                   dst array
           col 0   col 1               col 0   col 1
    row 0    m                  row 0    m
    row 1                       row 1
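The hit/miss tables above depend on which set each address maps to. In a direct-mapped cache the offset field has log2(block size) bits, the index field has log2(number of sets) bits, and an address maps to set (address / block size) mod (number of sets). The sketch below encodes those two rules so a hand trace can be checked. The helper names `bits_for` and `set_index` are illustrative assumptions, and both block size and cache size are assumed to be powers of two.

```c
#include <assert.h>

/* Number of address bits needed to select one of `n` items.
 * `n` must be a power of two. */
unsigned bits_for(unsigned n) {
    assert(n != 0 && (n & (n - 1)) == 0);
    unsigned bits = 0;
    while (n > 1) {
        n >>= 1;
        bits++;
    }
    return bits;
}

/* Set that byte address `addr` maps to in a direct-mapped cache of
 * `cache_size` data bytes with `block_size`-byte blocks. */
unsigned set_index(unsigned addr, unsigned block_size, unsigned cache_size) {
    assert(block_size != 0 && cache_size >= block_size);
    unsigned num_sets = cache_size / block_size;
    return (addr / block_size) % num_sets;
}
```

For instance, `set_index(0, 8, 16)` and `set_index(16, 8, 16)` give the sets used by `src[0][0]` and `dst[0][0]` in the 16-byte configuration, and passing 32 as the cache size gives the mapping for the larger cache.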

8. Caching #2 (Credit: 15 CMU)

Consider a computer with an 8-bit address space and a direct-mapped 64-byte data cache with byte cache blocks.

A. The boxes below represent the bit-format of an address. In each box, indicate which field that bit represents (it is possible that a field does not exist) by labeling them as follows:

    B: Block Offset
    S: Set Index
    T: Cache Tag

B. The table below shows a trace of load addresses accessed in the data cache. Assume the cache is initially empty. For each row in the table, please complete the two rightmost columns, indicating (i) the set number (in decimal notation) for that particular load, and (ii) whether that load hits (H) or misses (M) in the cache (circle either H or M accordingly).

    Load No. | Hex Address | Binary Address | Set Number? (in Decimal) | Hit or Miss? (Circle one)

    H M 2 b H M H M 4 f H M 5 b H M H M 7 d H M 8 b H M H M H M

8. Caching #2, cont.

C. For the trace of load addresses shown in Part B, below is a list of possible final states for the cache, showing the hex value of the tag for each cache block in each set. Assume that initially all cache blocks are invalid (represented by X).

    (a) (b) (c) (d) (e) (f) (g)
    0 X 2 1 X X X 1 X 2 X 0 X 3 X 1 X 2 3 X 0 X X 1 X X 0 X 1 X X X 2 1 X X 4 1 X 2 X 1 X 2 3

Which of the choices above is the correct final state of the cache?

9. Snooping Caches

A. Consider a configuration with a per-core L1 cache and a shared L2 cache, configured such that the L1 caches are write-allocate snooping caches. Should the L1 caches be write-back, or write-through? Explain.

B. Consider the caching configuration described in part (A) above. What are the relative advantages and disadvantages of configuring the L1 caches as a write-update cache vs a write-invalidate cache?

10. False Sharing

A. Consider the code segments below running on two threads, tid=0 and tid=1. Which is most likely to result in false sharing? Explain.

    A. for (int index=tid; index<array_length; index += 2)
         array[index] = 2 * array[index];

    B. for (int index=tid*array_length/2; index<array_length/(2-tid); index++)
         array[index] = 2 * array[index];

CS , Fall 2001 Exam 2

CS , Fall 2001 Exam 2 Andrew login ID: Full Name: CS 15-213, Fall 2001 Exam 2 November 13, 2001 Instructions: Make sure that your exam is not missing any sheets, then write your full name and Andrew login ID on the front. Write

More information

CS , Fall 2001 Exam 2

CS , Fall 2001 Exam 2 Andrew login ID: Full Name: CS 15-213, Fall 2001 Exam 2 November 13, 2001 Instructions: Make sure that your exam is not missing any sheets, then write your full name and Andrew login ID on the front. Write

More information

Name: PID: CSE 160 Final Exam SAMPLE Winter 2017 (Kesden)

Name: PID:   CSE 160 Final Exam SAMPLE Winter 2017 (Kesden) Name: PID: Email: CSE 160 Final Exam SAMPLE Winter 2017 (Kesden) Cache Performance (Questions from 15-213 @ CMU. Thanks!) 1. This problem requires you to analyze the cache behavior of a function that sums

More information

Parallel Programming in C with MPI and OpenMP

Parallel Programming in C with MPI and OpenMP Parallel Programming in C with MPI and OpenMP Michael J. Quinn Chapter 17 Shared-memory Programming 1 Outline n OpenMP n Shared-memory model n Parallel for loops n Declaring private variables n Critical

More information

5.12 EXERCISES Exercises 263

5.12 EXERCISES Exercises 263 5.12 Exercises 263 5.12 EXERCISES 5.1. If it s defined, the OPENMP macro is a decimal int. Write a program that prints its value. What is the significance of the value? 5.2. Download omp trap 1.c from

More information

CS/CoE 1541 Final exam (Fall 2017). This is the cumulative final exam given in the Fall of Question 1 (12 points): was on Chapter 4

CS/CoE 1541 Final exam (Fall 2017). This is the cumulative final exam given in the Fall of Question 1 (12 points): was on Chapter 4 CS/CoE 1541 Final exam (Fall 2017). Name: This is the cumulative final exam given in the Fall of 2017. Question 1 (12 points): was on Chapter 4 Question 2 (13 points): was on Chapter 4 For Exam 2, you

More information

Parallel Programming in C with MPI and OpenMP

Parallel Programming in C with MPI and OpenMP Parallel Programming in C with MPI and OpenMP Michael J. Quinn Chapter 17 Shared-memory Programming Outline OpenMP Shared-memory model Parallel for loops Declaring private variables Critical sections Reductions

More information

CPSC 261 Midterm 2 Thursday March 17 th, 2016

CPSC 261 Midterm 2 Thursday March 17 th, 2016 CPSC 261 Midterm 2 Thursday March 17 th, 2016 [9] 1. Multiple choices [5] (a) Among the following terms, circle all of those that refer to a responsibility of a thread scheduler: Solution : Avoiding deadlocks

More information

CS Operating system

CS Operating system Name / ID (please PRINT) Seq#: Seat Number CS 3733.001 -- Operating system Spring 2017 -- Midterm II -- April 13, 2017 You have 75 min. Good Luck! This is a closed book/note examination. But You can use

More information

15-213/18-213, Fall 2012 Final Exam

15-213/18-213, Fall 2012 Final Exam Andrew ID (print clearly!): Full Name: 15-213/18-213, Fall 2012 Final Exam Monday, December 10, 2012 Instructions: Make sure that your exam is not missing any sheets, then write your Andrew ID and full

More information

Recitation 14: Proxy Lab Part 2

Recitation 14: Proxy Lab Part 2 Recitation 14: Proxy Lab Part 2 Instructor: TA(s) 1 Outline Proxylab Threading Threads and Synchronization 2 ProxyLab ProxyLab is due in 1 week. No grace days Late days allowed (-15%) Make sure to submit

More information

Parallel Programming in C with MPI and OpenMP

Parallel Programming in C with MPI and OpenMP Parallel Programming in C with MPI and OpenMP Michael J. Quinn Chapter 17 Shared-memory Programming 1 Outline n OpenMP n Shared-memory model n Parallel for loops n Declaring private variables n Critical

More information

COMP 524 Spring 2018 Midterm Thursday, March 1

COMP 524 Spring 2018 Midterm Thursday, March 1 Name PID COMP 524 Spring 2018 Midterm Thursday, March 1 This exam is open note, open book and open computer. It is not open people. You are to submit this exam through gradescope. Resubmissions have been

More information

CSE 306/506 Operating Systems Threads. YoungMin Kwon

CSE 306/506 Operating Systems Threads. YoungMin Kwon CSE 306/506 Operating Systems Threads YoungMin Kwon Processes and Threads Two characteristics of a process Resource ownership Virtual address space (program, data, stack, PCB ) Main memory, I/O devices,

More information

Parallel Programming. OpenMP Parallel programming for multiprocessors for loops

Parallel Programming. OpenMP Parallel programming for multiprocessors for loops Parallel Programming OpenMP Parallel programming for multiprocessors for loops OpenMP OpenMP An application programming interface (API) for parallel programming on multiprocessors Assumes shared memory

More information

Parallel Programming using OpenMP

Parallel Programming using OpenMP 1 OpenMP Multithreaded Programming 2 Parallel Programming using OpenMP OpenMP stands for Open Multi-Processing OpenMP is a multi-vendor (see next page) standard to perform shared-memory multithreading

More information

Shared Memory Programming Model

Shared Memory Programming Model Shared Memory Programming Model Ahmed El-Mahdy and Waleed Lotfy What is a shared memory system? Activity! Consider the board as a shared memory Consider a sheet of paper in front of you as a local cache

More information

Parallel Programming using OpenMP

Parallel Programming using OpenMP 1 Parallel Programming using OpenMP Mike Bailey mjb@cs.oregonstate.edu openmp.pptx OpenMP Multithreaded Programming 2 OpenMP stands for Open Multi-Processing OpenMP is a multi-vendor (see next page) standard

More information

Lecture 10 Midterm review

Lecture 10 Midterm review Lecture 10 Midterm review Announcements The midterm is on Tue Feb 9 th in class 4Bring photo ID 4You may bring a single sheet of notebook sized paper 8x10 inches with notes on both sides (A4 OK) 4You may

More information

This exam paper contains 8 questions (12 pages) Total 100 points. Please put your official name and NOT your assumed name. First Name: Last Name:

This exam paper contains 8 questions (12 pages) Total 100 points. Please put your official name and NOT your assumed name. First Name: Last Name: CSci 4061: Introduction to Operating Systems (Spring 2013) Final Exam May 14, 2013 (4:00 6:00 pm) Open Book and Lecture Notes (Bring Your U Photo Id to the Exam) This exam paper contains 8 questions (12

More information

Q1: /8 Q2: /30 Q3: /30 Q4: /32. Total: /100

Q1: /8 Q2: /30 Q3: /30 Q4: /32. Total: /100 ECE 2035(A) Programming for Hardware/Software Systems Fall 2013 Exam Three November 20 th 2013 Name: Q1: /8 Q2: /30 Q3: /30 Q4: /32 Total: /100 1/10 For functional call related questions, let s assume

More information

UW CSE 351, Summer 2013 Final Exam

UW CSE 351, Summer 2013 Final Exam Name Instructions: UW CSE 351, Summer 2013 Final Exam 9:40am - 10:40am, Friday, 23 August 2013 Make sure that your exam is not missing any of the 11 pages, then write your full name and UW student ID on

More information

CPSC 3300 Spring 2016 Final Exam Version A No Calculators

CPSC 3300 Spring 2016 Final Exam Version A No Calculators CPSC 3300 Spring 2016 Final Exam Version A No Calculators Name: 1. Find the execution time of a program that executes 8 billion instructions on a processor with an average CPI of 2 and a clock frequency

More information

Recitation 14: Proxy Lab Part 2

Recitation 14: Proxy Lab Part 2 Recitation 14: Proxy Lab Part 2 Instructor: TA(s) 1 Outline Proxylab Threading Threads and Synchronization PXYDRIVE Demo 2 ProxyLab Checkpoint is worth 1%, due Thursday, Nov. 29 th Final is worth 7%, due

More information

Memory hierarchies: caches and their impact on the running time

Memory hierarchies: caches and their impact on the running time Memory hierarchies: caches and their impact on the running time Irene Finocchi Dept. of Computer and Science Sapienza University of Rome A happy coincidence A fundamental property of hardware Different

More information

CS/CoE 1541 Exam 2 (Spring 2019).

CS/CoE 1541 Exam 2 (Spring 2019). CS/CoE 1541 Exam 2 (Spring 2019) Name: Question 1 (5+5+5=15 points): Show the content of each of the caches shown below after the two memory references 35, 44 Use the notation [tag, M(address),] to describe

More information

CSC 1600: Chapter 6. Synchronizing Threads. Semaphores " Review: Multi-Threaded Processes"

CSC 1600: Chapter 6. Synchronizing Threads. Semaphores  Review: Multi-Threaded Processes CSC 1600: Chapter 6 Synchronizing Threads with Semaphores " Review: Multi-Threaded Processes" 1 badcnt.c: An Incorrect Program" #define NITERS 1000000 unsigned int cnt = 0; /* shared */ int main() pthread_t

More information

A common scenario... Most of us have probably been here. Where did my performance go? It disappeared into overheads...

A common scenario... Most of us have probably been here. Where did my performance go? It disappeared into overheads... OPENMP PERFORMANCE 2 A common scenario... So I wrote my OpenMP program, and I checked it gave the right answers, so I ran some timing tests, and the speedup was, well, a bit disappointing really. Now what?.

More information

CS 61C: Great Ideas in Computer Architecture. Amdahl s Law, Thread Level Parallelism

CS 61C: Great Ideas in Computer Architecture. Amdahl s Law, Thread Level Parallelism CS 61C: Great Ideas in Computer Architecture Amdahl s Law, Thread Level Parallelism Instructor: Alan Christopher 07/17/2014 Summer 2014 -- Lecture #15 1 Review of Last Lecture Flynn Taxonomy of Parallel

More information

Martin Kruliš, v

Martin Kruliš, v Martin Kruliš 1 Optimizations in General Code And Compilation Memory Considerations Parallelism Profiling And Optimization Examples 2 Premature optimization is the root of all evil. -- D. Knuth Our goal

More information

CS 433 Homework 5. Assigned on 11/7/2017 Due in class on 11/30/2017

CS 433 Homework 5. Assigned on 11/7/2017 Due in class on 11/30/2017 CS 433 Homework 5 Assigned on 11/7/2017 Due in class on 11/30/2017 Instructions: 1. Please write your name and NetID clearly on the first page. 2. Refer to the course fact sheet for policies on collaboration.

More information

Cache and Virtual Memory Simulations

Cache and Virtual Memory Simulations Cache and Virtual Memory Simulations Does it really matter if you pull a USB out before it safely ejects? Data structure: Cache struct Cache { }; Set *sets; int set_count; int line_count; int block_size;

More information

Final Exam Fall 2008

Final Exam Fall 2008 COE 308 Computer Architecture Final Exam Fall 2008 page 1 of 8 Saturday, February 7, 2009 7:30 10:00 AM Computer Engineering Department College of Computer Sciences & Engineering King Fahd University of

More information

A common scenario... Most of us have probably been here. Where did my performance go? It disappeared into overheads...

A common scenario... Most of us have probably been here. Where did my performance go? It disappeared into overheads... OPENMP PERFORMANCE 2 A common scenario... So I wrote my OpenMP program, and I checked it gave the right answers, so I ran some timing tests, and the speedup was, well, a bit disappointing really. Now what?.

More information

Caches. Cache Memory. memory hierarchy. CPU memory request presented to first-level cache first

Caches. Cache Memory. memory hierarchy. CPU memory request presented to first-level cache first Cache Memory memory hierarchy CPU memory request presented to first-level cache first if data NOT in cache, request sent to next level in hierarchy and so on CS3021/3421 2017 jones@tcd.ie School of Computer

More information

Com S 321 Problem Set 3

Com S 321 Problem Set 3 Com S 321 Problem Set 3 1. A computer has a main memory of size 8M words and a cache size of 64K words. (a) Give the address format for a direct mapped cache with a block size of 32 words. (b) Give the

More information

Parallel and Distributed Computing

Parallel and Distributed Computing Concurrent Programming with OpenMP Rodrigo Miragaia Rodrigues MSc in Information Systems and Computer Engineering DEA in Computational Engineering CS Department (DEI) Instituto Superior Técnico October

More information

Review: Creating a Parallel Program. Programming for Performance

Review: Creating a Parallel Program. Programming for Performance Review: Creating a Parallel Program Can be done by programmer, compiler, run-time system or OS Steps for creating parallel program Decomposition Assignment of tasks to processes Orchestration Mapping (C)

More information

Concurrent Programming with OpenMP

Concurrent Programming with OpenMP Concurrent Programming with OpenMP Parallel and Distributed Computing Department of Computer Science and Engineering (DEI) Instituto Superior Técnico March 7, 2016 CPD (DEI / IST) Parallel and Distributed

More information

UW CSE 351, Winter 2013 Final Exam

UW CSE 351, Winter 2013 Final Exam Full Name: Student ID #: UW CSE 351, Winter 2013 Final Exam March 20, 2013 2:30pm - 4:20pm Instructions: Write your full name and UW student ID number on the front of the exam. When the exam begins, make

More information

CSE 410 Final Exam 6/09/09. Suppose we have a memory and a direct-mapped cache with the following characteristics.

CSE 410 Final Exam 6/09/09. Suppose we have a memory and a direct-mapped cache with the following characteristics. Question 1. (10 points) (Caches) Suppose we have a memory and a direct-mapped cache with the following characteristics. Memory is byte addressable Memory addresses are 16 bits (i.e., the total memory size

More information

CS 311 Data Structures and Algorithms, Spring 2009 Midterm Exam Solutions. The Midterm Exam was given in class on Wednesday, March 18, 2009.

CS 311 Data Structures and Algorithms, Spring 2009 Midterm Exam Solutions. The Midterm Exam was given in class on Wednesday, March 18, 2009. CS 311 Data Structures and Algorithms, Spring 2009 Midterm Exam Solutions The Midterm Exam was given in class on Wednesday, March 18, 2009. 1. [4 pts] Parameter Passing in C++. In the table below, the

More information

Module 10: Open Multi-Processing Lecture 19: What is Parallelization? The Lecture Contains: What is Parallelization? Perfectly Load-Balanced Program

Module 10: Open Multi-Processing Lecture 19: What is Parallelization? The Lecture Contains: What is Parallelization? Perfectly Load-Balanced Program The Lecture Contains: What is Parallelization? Perfectly Load-Balanced Program Amdahl's Law About Data What is Data Race? Overview to OpenMP Components of OpenMP OpenMP Programming Model OpenMP Directives

More information

ECE 341 Final Exam Solution

ECE 341 Final Exam Solution ECE 341 Final Exam Solution Time allowed: 110 minutes Total Points: 100 Points Scored: Name: Problem No. 1 (10 points) For each of the following statements, indicate whether the statement is TRUE or FALSE.

More information

Recitation 15: Final Exam Preparation

Recitation 15: Final Exam Preparation 15-213 Recitation 15: Final Exam Preparation 25 April 2016 Ralf Brown and the 15-213 staff 1 Agenda Reminders Final Exam Review Fall 2012 exam 2 Reminders Proxy lab is due tomorrow! NO GRACE DAYS Penalty

More information

Computer Architecture CS372 Exam 3

Computer Architecture CS372 Exam 3 Name: Computer Architecture CS372 Exam 3 This exam has 7 pages. Please make sure you have all of them. Write your name on this page and initials on every other page now. You may only use the green card

More information

CSE 240A Midterm Exam

CSE 240A Midterm Exam Student ID Page 1 of 7 2011 Fall Professor Steven Swanson CSE 240A Midterm Exam Please write your name at the top of each page This is a close book, closed notes exam. No outside material may be used.

More information

Caches III. CSE 351 Autumn Instructor: Justin Hsia

Caches III. CSE 351 Autumn Instructor: Justin Hsia Caches III CSE 351 Autumn 2017 Instructor: Justin Hsia Teaching Assistants: Lucas Wotton Michael Zhang Parker DeWilde Ryan Wong Sam Gehman Sam Wolfson Savanna Yee Vinny Palaniappan https://what if.xkcd.com/111/

More information

CS516 Programming Languages and Compilers II

CS516 Programming Languages and Compilers II CS516 Programming Languages and Compilers II Zheng Zhang Spring 2015 Mar 12 Parallelism and Shared Memory Hierarchy I Rutgers University Review: Classical Three-pass Compiler Front End IR Middle End IR

More information

ECE 454 Computer Systems Programming

ECE 454 Computer Systems Programming ECE 454 Computer Systems Programming The Edward S. Rogers Sr. Department of Electrical and Computer Engineering Final Examination Fall 2011 Name Student # Professor Greg Steffan Answer all questions. Write

More information

15-213, Fall 2007 Midterm Exam

15-213, Fall 2007 Midterm Exam Andrew login ID: Full Name: 15-213, Fall 2007 Midterm Exam October 17, 2007, 1:00pm-2:20pm Instructions: Make sure that your exam is not missing any sheets, then write your full name and Andrew login ID

More information

ITCS 4/5145 Parallel Computing Test 1 5:00 pm - 6:15 pm, Wednesday February 17, 2016 Solutions Name:...

ITCS 4/5145 Parallel Computing Test 1 5:00 pm - 6:15 pm, Wednesday February 17, 2016 Solutions Name:... ITCS 4/5145 Parallel Computing Test 1 5:00 pm - 6:15 pm, Wednesday February 17, 016 Solutions Name:... Answer questions in space provided below questions. Use additional paper if necessary but make sure

More information

6.24 Estimate the average time (in ms) to access a sector on the following disk:

6.24 Estimate the average time (in ms) to access a sector on the following disk: Homework Problems 631 There is a large body of literature on building and using disk storage Many storage researchers look for ways to aggregate individual disks into larger, more robust, and more secure

More information

Parallel Programming. Exploring local computational resources OpenMP Parallel programming for multiprocessors for loops

Parallel Programming. Exploring local computational resources OpenMP Parallel programming for multiprocessors for loops Parallel Programming Exploring local computational resources OpenMP Parallel programming for multiprocessors for loops Single computers nowadays Several CPUs (cores) 4 to 8 cores on a single chip Hyper-threading

More information

CS 450 Exam 2 Mon. 4/11/2016

CS 450 Exam 2 Mon. 4/11/2016 CS 450 Exam 2 Mon. 4/11/2016 Name: Rules and Hints You may use one handwritten 8.5 11 cheat sheet (front and back). This is the only additional resource you may consult during this exam. No calculators.

More information

Introduction to Threads

Introduction to Threads Computer Systems Introduction to Threads Race Conditions Single- vs. Multi-Threaded Processes Process Process Thread Thread Thread Thread Memory Memory Heap Stack Heap Stack Stack Stack Data Data Code

More information

Threading Language and Support. CS528 Multithreading: Programming with Threads. Programming with Threads

Threading Language and Support. CS528 Multithreading: Programming with Threads. Programming with Threads Threading Language and Support CS528 Multithreading: Programming with Threads A Sahu Dept of CSE, IIT Guwahati Pthread: POSIX thread Popular, Initial and Basic one Improved Constructs for threading c++

More information

CS4961 Parallel Programming. Lecture 12: Advanced Synchronization (Pthreads) 10/4/11. Administrative. Mary Hall October 4, 2011

CS4961 Parallel Programming. Lecture 12: Advanced Synchronization (Pthreads) 10/4/11. Administrative. Mary Hall October 4, 2011 CS4961 Parallel Programming Lecture 12: Advanced Synchronization (Pthreads) Mary Hall October 4, 2011 Administrative Thursday s class Meet in WEB L130 to go over programming assignment Midterm on Thursday

More information

ECE Spring 2017 Exam 2

ECE Spring 2017 Exam 2 ECE 56300 Spring 2017 Exam 2 All questions are worth 5 points. For isoefficiency questions, do not worry about breaking costs down to t c, t w and t s. Question 1. Innovative Big Machines has developed

More information

Caches III CSE 351 Spring

Caches III CSE 351 Spring Caches III CSE 351 Spring 2018 https://what-if.xkcd.com/111/ Making memory accesses fast! Cache basics Principle of locality Memory hierarchies Cache organization Direct-mapped (sets; index + tag) Associativity

More information

Exam-2 Scope. 3. Shared memory architecture, distributed memory architecture, SMP, Distributed Shared Memory and Directory based coherence

Exam-2 Scope. 3. Shared memory architecture, distributed memory architecture, SMP, Distributed Shared Memory and Directory based coherence Exam-2 Scope 1. Memory Hierarchy Design (Cache, Virtual memory) Chapter-2 slides memory-basics.ppt Optimizations of Cache Performance Memory technology and optimizations Virtual memory 2. SIMD, MIMD, Vector,

More information

THE AUSTRALIAN NATIONAL UNIVERSITY First Semester Examination June COMP3320/6464/HONS High Performance Scientific Computing

THE AUSTRALIAN NATIONAL UNIVERSITY First Semester Examination June COMP3320/6464/HONS High Performance Scientific Computing THE AUSTRALIAN NATIONAL UNIVERSITY First Semester Examination June 2014 COMP3320/6464/HONS High Performance Scientific Computing Study Period: 15 minutes Time Allowed: 3 hours Permitted Materials: Non-Programmable

More information

ENCM 501 Winter 2019 Assignment 9

ENCM 501 Winter 2019 Assignment 9 page 1 of 6 ENCM 501 Winter 2019 Assignment 9 Steve Norman Department of Electrical & Computer Engineering University of Calgary April 2019 Assignment instructions and other documents for ENCM 501 can

More information

Cache Impact on Program Performance. T. Yang. UCSB CS240A. 2017

Cache Impact on Program Performance. T. Yang. UCSB CS240A. 2017 Cache Impact on Program Performance T. Yang. UCSB CS240A. 2017 Multi-level cache in computer systems Topics Performance analysis for multi-level cache Cache performance optimization through program transformation

More information

Multithreading Programming II

Multithreading Programming II Multithreading Programming II Content Review Multithreading programming Race conditions Semaphores Thread safety Deadlock Review: Resource Sharing Access to shared resources need to be controlled to ensure

More information

ECE 411 Exam 1 Practice Problems

ECE 411 Exam 1 Practice Problems ECE 411 Exam 1 Practice Problems Topics Single-Cycle vs Multi-Cycle ISA Tradeoffs Performance Memory Hierarchy Caches (including interactions with VM) 1.) Suppose a single cycle design uses a clock period

More information
