CSE-160 (Winter 2017, Kesden) Practice Midterm Exam
Full PID:

1. Threads, Concurrency

Consider the code below:

    #include <pthread.h>
    #include <stdio.h>

    volatile int count = 0; // volatile just keeps count in mem vs register

    void *count_up(void *arg) {
        for (int i = 0; i < 100000; i++)
            count++;
        return NULL;
    }

    int main() {
        pthread_t tid1, tid2;
        pthread_create(&tid1, NULL, count_up, NULL);
        pthread_create(&tid2, NULL, count_up, NULL);
        pthread_join(tid1, NULL);
        pthread_join(tid2, NULL);
        printf("count: %d\n", count);
    }

The above code is incorrect. It has a data race.

A. What could be the symptom(s)?
B. What is the critical resource?
C. Protect the critical section by adding a simple mutex, including declaration, initialization, etc.
D. Protect the critical section by adding a lock_guard, including declaration, initialization, etc.
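As a reference point for parts C and D, the following is a minimal sketch of one way to guard the counter, assuming C++11 `std::thread`, `std::mutex`, and `std::lock_guard` rather than the pthread calls used in the question. The names `shared_count`, `count_mutex`, `count_up`, and `run_two_counters` are hypothetical and do not come from the exam.

```cpp
#include <mutex>
#include <thread>

// Hypothetical names chosen for this sketch; not taken from the exam.
static int shared_count = 0;    // the shared counter both threads update
static std::mutex count_mutex;  // guards every access to shared_count

// Each thread increments the shared counter 100000 times.
void count_up() {
    for (int i = 0; i < 100000; i++) {
        // lock_guard locks the mutex here and unlocks it at the end of
        // each loop iteration, when guard goes out of scope.
        std::lock_guard<std::mutex> guard(count_mutex);
        shared_count++;
    }
}

// Runs two counting threads to completion and returns the final count.
int run_two_counters() {
    shared_count = 0;
    std::thread t1(count_up);
    std::thread t2(count_up);
    t1.join();
    t2.join();
    return shared_count;
}
```

Because every increment happens while the mutex is held, the two threads can no longer interleave their read-modify-write sequences. Locking once per increment is the simplest placement, not necessarily the fastest.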
2. Synchronization Primitives

A. Under what circumstances should one use a barrier instead of a mutex?
B. Please write a short code segment that illustrates your answer to part (A) above.
C. Under what circumstances should one use a condition variable instead of just a mutex?
D. Please write a short code segment that illustrates your answer to part (C) above.
E. Why must the condition variable's wait() operation accept a mutex? What does it protect?
F. Why must the condition variable's signal() operation accept a mutex? What purpose does it serve?
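To make the wait/signal pattern behind parts C through F concrete, here is a small sketch, assuming C++11 `std::condition_variable` instead of whichever API the course used. All names (`ready_mutex`, `ready_cv`, `ready`, `payload`, `hand_off_value`) are hypothetical. One thread sleeps until another has published a value, which a mutex alone cannot express without busy-waiting.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical names chosen for this sketch; not taken from the exam.
static std::mutex ready_mutex;            // protects ready and payload
static std::condition_variable ready_cv;  // signals that ready became true
static bool ready = false;
static int payload = 0;

// Passes value from the calling thread to a consumer thread and
// returns what the consumer observed.
int hand_off_value(int value) {
    ready = false;  // safe: no other thread is running yet
    int received = 0;

    std::thread consumer([&received] {
        std::unique_lock<std::mutex> lock(ready_mutex);
        // wait() releases the mutex while sleeping and re-acquires it
        // before returning; the predicate handles spurious wakeups.
        ready_cv.wait(lock, [] { return ready; });
        received = payload;
    });

    {
        // The producer changes the shared state under the same mutex.
        std::lock_guard<std::mutex> lock(ready_mutex);
        payload = value;
        ready = true;
    }
    ready_cv.notify_one();

    consumer.join();
    return received;
}
```

In this sketch the mutex protects the predicate (`ready`) and the data it describes (`payload`), so the check and the decision to sleep cannot race with the producer's update.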
3. Parallel Speedup

A. Assuming that a program's code is 75% parallelizable and 25% necessarily-serial, what is the maximum speedup that can be achieved by adding threads on a 4 core system? Show your work.
B. Assuming a program is well designed and well written and has the following running times, what percentage of it is parallelizable? Show your work.

    1-thread/1-core:    16 seconds
    2-threads/2-cores:  12 seconds
    4-threads/4-cores:  10 seconds

C. What is the maximum speedup that can be achieved in a program for which 25% of the code is parallelizable? Show your work.
D. If parallelizing an algorithm results in a super-linear speedup, what does this suggest?
E. Consider Gustafson's observation. How can increasing the amount of data allow us to defeat Amdahl's Law?
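The arithmetic in parts A through C follows from Amdahl's Law, S(n) = 1 / ((1 - p) + p / n), where p is the parallelizable fraction and n is the number of cores. The sketch below encodes that formula and its inversion for checking hand calculations. The function names are hypothetical, and the inversion assumes the running time follows T(n) = T(1) * ((1 - p) + p / n) with no other overheads.

```cpp
// Amdahl's Law: speedup on n cores when a fraction p of the work
// is parallelizable and the rest is serial.
double amdahl_speedup(double p, double n) {
    return 1.0 / ((1.0 - p) + p / n);
}

// Solves T(n) = T(1) * ((1 - p) + p / n) for p, given the
// single-core time t1 and the time tn measured on n cores.
double parallel_fraction(double t1, double tn, double n) {
    return (t1 - tn) / (t1 * (1.0 - 1.0 / n));
}
```

Letting n grow without bound in `amdahl_speedup` gives the limiting speedup 1 / (1 - p), which is the quantity part C asks about.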
4. Working Sets and Locality

A. For each type of cache miss, please define it and explain how it can be mitigated, if possible.
   a. Cold/Compulsory
   b. Conflict
   c. Capacity
B. Write a simple for-loop that exhibits good spatial locality, but not good temporal locality.
C. Does the following for-loop exhibit good spatial locality, temporal locality, neither, or both? Why?

    // ints a and b are declared and initialized elsewhere
    // int[16] array is declared and initialized elsewhere
    for (int index = 0; index < 100; index++)
        array[index] += (a + b);

D. Assuming that the values shown below are ints, and that an int is 4 bytes, what is the size of the working set for the loop above? Explain.
5. Memory Hierarchy

Assume the following memory access times:

    Registers:   1 cycle,    0.5ns
    L1 Cache:    4 cycles,   2ns
    L2 Cache:    8 cycles,   4ns
    Main Mem:    160 cycles, 80ns

Consider a system where 1 in 50 variable accesses require fetching from memory into registers, a 95% hit rate at L1 and a 99% hit rate at L2, and in which memory cache accesses are not performed in parallel.

A. What is the effective memory access time of this system? (Just set up the equation, no need to evaluate. It can be in cycles or seconds.)

6. OpenMP

A. You are reading code parallelized with OpenMP #pragmas. Please explain how the scope in which a variable is declared affects whether or not it is shared. Include both loop and non-loop cases.
B. Under what circumstances is it safe to remove a nowait on the first of two back-to-back loops?
C. Why might a nowait be inappropriate for the last of two loops?
D. Consider the clause, schedule(dynamic, 2) operating upon a loop with 4 threads and 16 iterations. Which threads will perform each iteration?
E. Consider the clause, schedule(static, 1) operating upon a loop with 4 threads and 16 iterations. What are the potential advantages and disadvantages of this configuration as compared to the one described in (D) above?
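For question 6, parts D and E, it can help to see that a static schedule fixes the iteration-to-thread mapping in advance. Under OpenMP's `schedule(static, chunk)`, chunks of `chunk` consecutive iterations are dealt to threads round-robin in thread-number order. The helper below is a sketch of that rule in plain C++ with no OpenMP runtime involved; the name `static_schedule_owner` is hypothetical.

```cpp
// Returns the thread number that runs iteration i (counting from 0)
// under schedule(static, chunk) with num_threads threads: chunks are
// handed out round-robin in thread-number order.
int static_schedule_owner(int i, int chunk, int num_threads) {
    return (i / chunk) % num_threads;
}
```

No such closed-form function can be written for `schedule(dynamic, 2)`, because each chunk goes to whichever thread asks for work next at run time.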
7. Caching #1 (Credit: CMU)

Consider the following matrix transpose function:

    typedef int array[2][2];

    void transpose(array dst, array src) {
        int i, j;
        for (j = 0; j < 2; j++) {
            for (i = 0; i < 2; i++) {
                dst[i][j] = src[j][i];
            }
        }
    }

Running on a hypothetical machine with the following properties:

- sizeof(int) == 4.
- The src array starts at address 0 and the dst array starts at address 16 (decimal).
- There is a single L1 data cache that is direct mapped and write-allocate, with a block size of 8 bytes.
- Accesses to the src and dst arrays are the only sources of read and write accesses to the cache, respectively.

Suppose the cache has a total size of 16 data bytes (i.e., the block size times the number of sets is 16 bytes) and that the cache is initially empty.

A. How many bits are used for each of the following?

    Index:
    Offset:

B. For each row and col, indicate whether each access to src[row][col] and dst[row][col] is a hit (h) or a miss (m). For example, reading src[0][0] is a miss and writing dst[0][0] is also a miss.

            src array                    dst array
            col 0   col 1                col 0   col 1
    row 0   m                    row 0   m
    row 1                        row 1

C. Repeat part B for a cache with a total size of 32 data bytes.

            src array                    dst array
            col 0   col 1                col 0   col 1
    row 0   m                    row 0   m
    row 1                        row 1
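One way to check hand-worked hit/miss tables for this question is a small simulation. The sketch below models a direct-mapped, write-allocate cache in which an address maps to set `(addr / block_bytes) % num_sets`, then replays the transpose's accesses in program order, reading `src[j][i]` before writing `dst[i][j]`. The function name and structure are hypothetical; the base addresses 0 and 16 and the 4-byte int come from the problem statement.

```cpp
#include <vector>

// Counts cache misses for the 2x2 transpose on a direct-mapped,
// write-allocate cache of cache_bytes total with block_bytes per block.
int count_transpose_misses(int cache_bytes, int block_bytes) {
    const int num_sets = cache_bytes / block_bytes;
    // Block number currently held by each set; -1 marks an invalid line.
    std::vector<long> resident_block(num_sets, -1);
    int misses = 0;

    // Reads and writes behave alike here: a miss loads the block.
    auto access = [&](long addr) {
        long block = addr / block_bytes;
        int set = static_cast<int>(block % num_sets);
        if (resident_block[set] != block) {
            misses++;
            resident_block[set] = block;
        }
    };

    const long src_base = 0;   // src starts at address 0
    const long dst_base = 16;  // dst starts at address 16
    for (int j = 0; j < 2; j++) {
        for (int i = 0; i < 2; i++) {
            access(src_base + 4 * (2 * j + i));  // read  src[j][i]
            access(dst_base + 4 * (2 * i + j));  // write dst[i][j]
        }
    }
    return misses;
}
```

Calling it with `(16, 8)` and `(32, 8)` corresponds to the two cache sizes in parts B and C. The sketch returns only a total, so the per-cell table still has to be traced by hand.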
8. Caching #2 (Credit: 15 CMU)

Consider a computer with an 8-bit address space and a direct-mapped 64-byte data cache with byte cache blocks.

A. The boxes below represent the bit-format of an address. In each box, indicate which field that bit represents (it is possible that a field does not exist) by labeling them as follows:

    B: Block Offset
    S: Set Index
    T: Cache Tag

B. The table below shows a trace of load addresses accessed in the data cache. Assume the cache is initially empty. For each row in the table, please complete the two rightmost columns, indicating (i) the set number (in decimal notation) for that particular load, and (ii) whether that load hits (H) or misses (M) in the cache (circle either H or M accordingly).

    Load No.   Hex Address   Binary Address   Set Number? (in Decimal)   Hit or Miss? (Circle one)
    1                                                                    H  M
    2          b                                                         H  M
    3                                                                    H  M
    4          f                                                         H  M
    5          b                                                         H  M
    6                                                                    H  M
    7          d                                                         H  M
    8          b                                                         H  M
    9                                                                    H  M
    10                                                                   H  M
8. Caching #2, cont.

C. For the trace of load addresses shown in Part B, below is a list of possible final states for the cache, showing the hex value of the tag for each cache block in each set. Assume that initially all cache blocks are invalid (represented by X).

    (a) (b) (c) (d) (e) (f) (g)
    0 X 2 1 X X X 1 X 2 X 0 X 3 X 1 X 2 3 X 0 X X 1 X X 0 X 1 X X X 2 1 X X 4 1 X 2 X 1 X 2 3

Which of the choices above is the correct final state of the cache?

9. Snooping Caches

A. Consider a configuration with a per-core L1 cache and a shared L2 cache, configured such that the L1 caches are write-allocate snooping caches. Should the L1 caches be write-back, or write-through? Explain.
B. Consider the caching configuration described in part (A) above. What are the relative advantages and disadvantages of configuring the L1 caches as a write-update cache vs a write-invalidate cache?

10. False Sharing

A. Consider the code segments below running on two threads, tid=0 and tid=1. Which is most likely to result in false sharing? Explain.

    A.  for (int index=tid; index<array_length; index += 2)
            array[index] = 2 * array[index];

    B.  for (int index=tid*array_length/2; index<array_length/(2-tid); index++)
            array[index] = 2 * array[index];
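For question 10, one way to reason about the two segments is to count how many cache lines both threads end up writing. The sketch below replays each segment's index pattern for tid 0 and tid 1 and counts the lines touched by both. It assumes the array starts on a cache-line boundary and takes the number of ints per line as a parameter, for example 16 for 64-byte lines and 4-byte ints. Those sizes, and the function name, are assumptions and are not given in the exam.

```cpp
#include <set>

// Returns how many cache lines are written by both threads.
// segment 'A' replays the interleaved loop (index += 2);
// any other value replays the blocked loop from segment B.
int lines_written_by_both_threads(char segment, int array_length,
                                  int ints_per_line) {
    std::set<int> lines[2];  // cache lines touched by tid 0 and tid 1
    for (int tid = 0; tid < 2; tid++) {
        if (segment == 'A') {
            for (int index = tid; index < array_length; index += 2)
                lines[tid].insert(index / ints_per_line);
        } else {
            for (int index = tid * array_length / 2;
                 index < array_length / (2 - tid); index++)
                lines[tid].insert(index / ints_per_line);
        }
    }
    int shared = 0;
    for (int line : lines[0])
        shared += static_cast<int>(lines[1].count(line));
    return shared;
}
```

A count above zero means the two threads write distinct elements that live on the same cache line, which is the situation in which false sharing can occur.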
More informationSpring CS 170 Exercise Set 1 (Updated with Part III)
Spring 2015. CS 170 Exercise Set 1 (Updated with Part III) Due on May 5 Tuesday 12:30pm. Submit to the CS170 homework box or bring to the classroom. Additional problems will be added as we cover more topics
More informationIntroduction to Computer Systems. Final Exam. May 3, Notes and calculators are permitted, but not computers. Caching. Signals.
15-213 Introduction to Computer Systems Final Exam May 3, 2006 Name: Andrew User ID: Recitation Section: This is an open-book exam. Notes and calculators are permitted, but not computers. Write your answers
More informationThreads. studykorner.org
Threads Thread Subpart of a process Basic unit of CPU utilization Smallest set of programmed instructions, can be managed independently by OS No independent existence (process dependent) Light Weight Process
More informationCaches III. CSE 351 Winter Instructor: Mark Wyse
Caches III CSE 351 Winter 2018 Instructor: Mark Wyse Teaching Assistants: Kevin Bi Parker DeWilde Emily Furst Sarah House Waylon Huang Vinny Palaniappan https://what-if.xkcd.com/111/ Administrative Midterm
More informationCS4411 Intro. to Operating Systems Final Fall points 12 pages
CS44 Intro. to Operating Systems Final Exam Fall 5 CS44 Intro. to Operating Systems Final Fall 5 points pages Name: Most of the following questions only require very short answers. Usually a few sentences
More informationCSE Computer Architecture I Fall 2009 Homework 08 Pipelined Processors and Multi-core Programming Assigned: Due: Problem 1: (10 points)
CSE 30321 Computer Architecture I Fall 2009 Homework 08 Pipelined Processors and Multi-core Programming Assigned: November 17, 2009 Due: December 1, 2009 This assignment can be done in groups of 1, 2,
More information