EE/CSCI 451 Spring 2017 Homework 3 solution Total Points: 100
|
|
- Melina Ellis
- 5 years ago
- Views:
Transcription
1 EE/CSCI 451 Spring 2017 Homework 3 solution Total Points: [10 points] 1. Task parallelism: The computations in a parallel algorithm can be split into a set of tasks for concurrent execution. Task parallelism exploits the parallelism by distributing the execution of different tasks across different parallel processing elements. 2. Race condition: When the output of a parallel program for a given input is nondeterministic as it depends upon the rate at which the various threads are executing, the program has a race condition. 3. PRAM: PRAM is a shared memory programming model which consists of p (p > 1) processors connected to a shared memory executing in synchronous manner (using a common clock). Each computation and each access to memory take 1 unit of time. 4. Shared memory programming model: Shared memory programming model provides a globally shared data space that is accessible to all the threads. Threads can also have their own private data. Programmer is responsible for synchronizing access globally shared data to ensure correctness of the program. 5. Asynchronous execution: Asynchronous execution has no global clock to coordinate execution. The order of execution of instructions depends on input data, scheduling algorithm, speed of the processors, and speed of communication network. 1
2 2 [25 points] 1. For simplicity, let us assume the number of threads w is a power of 2 and denoted as 2 m, m k. We evenly divide the input vector p into w sub-vectors, each sub-vector with length 2 k m. These sub-vectors are denoted as p sub0,...,p subw 1. Similarly, we can obtain q sub0,...,q subw 1. Then, we use T hread i (0 i < w) to compute the dot product of p subi and q subi. After each thread obtains a partial dot product, we sum up these partial dot products following the algorithm in Lecture 7 (Title: Adding in PRAM) to obtain the final result. The time complexity for the serial execution is O(2 k )+O(2 k 1) = O(2 k ). The time complexity for the parallel execution is O(2 k m )+O(log w) = O(2 k m )+O(m) = O(2 k m ) = O(2 k O(2 /w). Therefore, the speedup is ) =O(w) scalable solution. O(2 k /w) 2. 1 /* Pseudo code executed by the thread with index id */ 2 Partial_dot_product[id]=0; // Partial_dot_product is a shared array to store partial dot products; the final result will be output as Partial_dot_product[0] by the thread with index 0 3 for (i = id*2 k m ; i < (id+1)*2 k m ; i++) 4 Partial_dot_product[id]+ = p i q i ; 5 end for 6 barrier; // A barrier is needed here to synchronize threads 7 for (i = 0; i < m; i++) 8 if(id mod 2 i+1 = 0) then 9 Partial_dot_product[id]+ =Partial_dot_product[id+2 i ]; 10 end if 11 barrier; 12 end for 2
3 3 [30 points] 1. The shared variables include At least one vertex has update and the s array which records the shortest path lengths /* Pseudo code executed by Thread(i,j) */ 2 for (k = 0; k < # of vertices; k++) 3 if At_least_one_vertex_has_update = true then 4 At_least_one_vertex_has_update = false; 5 barrier; 6 Lock(s(i), s(j)); 7 if s(i)+w(i,j)<s(j) then 8 s(j) = s(i)+w(i,j); 9 At_least_one_vertex_has_update = true; 10 end if 11 Unlock(s(i), s(j)); 12 else then 13 Return; 14 end if 15 barrier; 16 end for 3
4 4 [25 points] The Pthreads program discussed in class when converted into PRAM will cause multiple writes to same location C(i, k) in k-th iteration by threads (i, 1 : n). We can have another for loop within each of the k iterations to serialize the thread accesses by threads (i, 1 : n). The final program will take n 2 clocks. A better approach is to rearrange the data accesses by threads to the elements in C. The program can be implemented in n clocks as shown by the program below. Thread(i,j) { Do k from 1 to n index = (k + j - 1) % (n+1) + floor((k + j - 1)/(n+1)) C(i, index) = C(i, index) + A(i,j)*B(j, index) End } 4
5 5 [10 points] No, the parallel execution will not produce the same output A. This is because the loop we are parallelizing has dependence within the loop (i.e., loop dependence). For example, in the serial version, A[i][j 1] is always computed before A[i][j]; but in the parallel version, it is likely that A[i][j 1] is computed after A[i][j] due to the scheduling of threads; this is problematic because the computation of A[i][j] depends on A[i][j 1]. 5
EE/CSCI 451 Midterm 1
EE/CSCI 451 Midterm 1 Spring 2018 Instructor: Xuehai Qian Friday: 02/26/2018 Problem # Topic Points Score 1 Definitions 20 2 Memory System Performance 10 3 Cache Performance 10 4 Shared Memory Programming
More informationEE/CSCI 451 Spring 2018 Homework 8 Total Points: [10 points] Explain the following terms: EREW PRAM CRCW PRAM. Brent s Theorem.
EE/CSCI 451 Spring 2018 Homework 8 Total Points: 100 1 [10 points] Explain the following terms: EREW PRAM CRCW PRAM Brent s Theorem BSP model 1 2 [15 points] Assume two sorted sequences of size n can be
More informationEE/CSCI 451: Parallel and Distributed Computation
EE/CSCI 451: Parallel and Distributed Computation Lecture #15 3/7/2017 Xuehai Qian Xuehai.qian@usc.edu http://alchem.usc.edu/portal/xuehaiq.html University of Southern California 1 From last class Outline
More informationEE/CSCI 451: Parallel and Distributed Computation
EE/CSCI 451: Parallel and Distributed Computation Lecture #12 2/21/2017 Xuehai Qian Xuehai.qian@usc.edu http://alchem.usc.edu/portal/xuehaiq.html University of Southern California 1 Last class Outline
More informationEE/CSCI 451: Parallel and Distributed Computation
EE/CSCI 451: Parallel and Distributed Computation Lecture #7 2/5/2017 Xuehai Qian Xuehai.qian@usc.edu http://alchem.usc.edu/portal/xuehaiq.html University of Southern California 1 Outline From last class
More informationEE/CSCI 451: Parallel and Distributed Computation
EE/CSCI 451: Parallel and Distributed Computation Lecture #11 2/21/2017 Xuehai Qian Xuehai.qian@usc.edu http://alchem.usc.edu/portal/xuehaiq.html University of Southern California 1 Outline Midterm 1:
More informationLesson 1 1 Introduction
Lesson 1 1 Introduction The Multithreaded DAG Model DAG = Directed Acyclic Graph : a collection of vertices and directed edges (lines with arrows). Each edge connects two vertices. The final result of
More informationEE/CSCI 451: Parallel and Distributed Computation
EE/CSCI 451: Parallel and Distributed Computation Lecture #4 1/24/2018 Xuehai Qian xuehai.qian@usc.edu http://alchem.usc.edu/portal/xuehaiq.html University of Southern California 1 Announcements PA #1
More informationImplementation of Parallel Path Finding in a Shared Memory Architecture
Implementation of Parallel Path Finding in a Shared Memory Architecture David Cohen and Matthew Dallas Department of Computer Science Rensselaer Polytechnic Institute Troy, NY 12180 Email: {cohend4, dallam}
More informationLecture 8 Parallel Algorithms II
Lecture 8 Parallel Algorithms II Dr. Wilson Rivera ICOM 6025: High Performance Computing Electrical and Computer Engineering Department University of Puerto Rico Original slides from Introduction to Parallel
More informationParallel Systems Course: Chapter VIII. Sorting Algorithms. Kumar Chapter 9. Jan Lemeire ETRO Dept. November Parallel Sorting
Parallel Systems Course: Chapter VIII Sorting Algorithms Kumar Chapter 9 Jan Lemeire ETRO Dept. November 2014 Overview 1. Parallel sort distributed memory 2. Parallel sort shared memory 3. Sorting Networks
More informationCS4961 Parallel Programming. Lecture 4: Data and Task Parallelism 9/3/09. Administrative. Mary Hall September 3, Going over Homework 1
CS4961 Parallel Programming Lecture 4: Data and Task Parallelism Administrative Homework 2 posted, due September 10 before class - Use the handin program on the CADE machines - Use the following command:
More informationParallel Systems Course: Chapter VIII. Sorting Algorithms. Kumar Chapter 9. Jan Lemeire ETRO Dept. Fall Parallel Sorting
Parallel Systems Course: Chapter VIII Sorting Algorithms Kumar Chapter 9 Jan Lemeire ETRO Dept. Fall 2017 Overview 1. Parallel sort distributed memory 2. Parallel sort shared memory 3. Sorting Networks
More informationThe PRAM model. A. V. Gerbessiotis CIS 485/Spring 1999 Handout 2 Week 2
The PRAM model A. V. Gerbessiotis CIS 485/Spring 1999 Handout 2 Week 2 Introduction The Parallel Random Access Machine (PRAM) is one of the simplest ways to model a parallel computer. A PRAM consists of
More informationCS4230 Parallel Programming. Lecture 12: More Task Parallelism 10/5/12
CS4230 Parallel Programming Lecture 12: More Task Parallelism Mary Hall October 4, 2012 1! Homework 3: Due Before Class, Thurs. Oct. 18 handin cs4230 hw3 Problem 1 (Amdahl s Law): (i) Assuming a
More informationand 6.855J. The Successive Shortest Path Algorithm and the Capacity Scaling Algorithm for the Minimum Cost Flow Problem
15.082 and 6.855J The Successive Shortest Path Algorithm and the Capacity Scaling Algorithm for the Minimum Cost Flow Problem 1 Pseudo-Flows A pseudo-flow is a "flow" vector x such that 0 x u. Let e(i)
More informationParallel Programming Patterns Overview CS 472 Concurrent & Parallel Programming University of Evansville
Parallel Programming Patterns Overview CS 472 Concurrent & Parallel Programming of Evansville Selection of slides from CIS 410/510 Introduction to Parallel Computing Department of Computer and Information
More informationParallel Random-Access Machines
Parallel Random-Access Machines Marc Moreno Maza University of Western Ontario, London, Ontario (Canada) CS3101 (Moreno Maza) Parallel Random-Access Machines CS3101 1 / 69 Plan 1 The PRAM Model 2 Performance
More informationECE 574 Cluster Computing Lecture 10
ECE 574 Cluster Computing Lecture 10 Vince Weaver http://www.eece.maine.edu/~vweaver vincent.weaver@maine.edu 1 October 2015 Announcements Homework #4 will be posted eventually 1 HW#4 Notes How granular
More informationParallel Programming with OpenMP. CS240A, T. Yang, 2013 Modified from Demmel/Yelick s and Mary Hall s Slides
Parallel Programming with OpenMP CS240A, T. Yang, 203 Modified from Demmel/Yelick s and Mary Hall s Slides Introduction to OpenMP What is OpenMP? Open specification for Multi-Processing Standard API for
More informationCSE 5095 Topics in Big Data Analytics Spring 2014; Homework 1 Solutions
CSE 5095 Topics in Big Data Analytics Spring 2014; Homework 1 Solutions Note: Solutions to problems 4, 5, and 6 are due to Marius Nicolae. 1. Consider the following algorithm: for i := 1 to α n log e n
More informationParallel Random Access Machine (PRAM)
PRAM Algorithms Parallel Random Access Machine (PRAM) Collection of numbered processors Access shared memory Each processor could have local memory (registers) Each processor can access any shared memory
More informationData Structure and Algorithm, Spring 2013 Midterm Examination 120 points Time: 2:20pm-5:20pm (180 minutes), Tuesday, April 16, 2013
Data Structure and Algorithm, Spring 2013 Midterm Examination 120 points Time: 2:20pm-5:20pm (180 minutes), Tuesday, April 16, 2013 Problem 1. In each of the following question, please specify if the statement
More informationParallelization of an Example Program
Parallelization of an Example Program [ 2.3] In this lecture, we will consider a parallelization of the kernel of the Ocean application. Goals: Illustrate parallel programming in a low-level parallel language.
More informationChapter 2 Abstract Machine Models. Lectured by: Phạm Trần Vũ Prepared by: Thoại Nam
Chapter 2 Abstract Machine Models Lectured by: Phạm Trần Vũ Prepared by: Thoại Nam Parallel Computer Models (1) A parallel machine model (also known as programming model, type architecture, conceptual
More informationExploring Parallelism At Different Levels
Exploring Parallelism At Different Levels Balanced composition and customization of optimizations 7/9/2014 DragonStar 2014 - Qing Yi 1 Exploring Parallelism Focus on Parallelism at different granularities
More informationAn Introduction to Parallel Programming
An Introduction to Parallel Programming Ing. Andrea Marongiu (a.marongiu@unibo.it) Includes slides from Multicore Programming Primer course at Massachusetts Institute of Technology (MIT) by Prof. SamanAmarasinghe
More informationCMSC Computer Architecture Lecture 12: Multi-Core. Prof. Yanjing Li University of Chicago
CMSC 22200 Computer Architecture Lecture 12: Multi-Core Prof. Yanjing Li University of Chicago Administrative Stuff! Lab 4 " Due: 11:49pm, Saturday " Two late days with penalty! Exam I " Grades out on
More informationSimulating ocean currents
Simulating ocean currents We will study a parallel application that simulates ocean currents. Goal: Simulate the motion of water currents in the ocean. Important to climate modeling. Motion depends on
More informationCopyright 2010, Elsevier Inc. All rights Reserved
An Introduction to Parallel Programming Peter Pacheco Chapter 6 Parallel Program Development 1 Roadmap Solving non-trivial problems. The n-body problem. The traveling salesman problem. Applying Foster
More informationLesson 1 4. Prefix Sum Definitions. Scans. Parallel Scans. A Naive Parallel Scans
Lesson 1 4 Prefix Sum Definitions Prefix sum given an array...the prefix sum is the sum of all the elements in the array from the beginning to the position, including the value at the position. The sequential
More informationEE/CSCI 451 Spring 2018 Homework 2 Assigned: February 7, 2018 Due: February 14, 2018, before 11:59 pm Total Points: 100
EE/CSCI 45 Spring 08 Homework Assigned: February 7, 08 Due: February 4, 08, before :59 pm Total Points: 00 [0 points] Explain the following terms:. Diameter of a network. Bisection width of a network.
More informationContents. Preface xvii Acknowledgments. CHAPTER 1 Introduction to Parallel Computing 1. CHAPTER 2 Parallel Programming Platforms 11
Preface xvii Acknowledgments xix CHAPTER 1 Introduction to Parallel Computing 1 1.1 Motivating Parallelism 2 1.1.1 The Computational Power Argument from Transistors to FLOPS 2 1.1.2 The Memory/Disk Speed
More informationTree Search for Travel Salesperson Problem Pacheco Text Book Chapt 6 T. Yang, UCSB CS140, Spring 2014
Tree Search for Travel Salesperson Problem Pacheco Text Book Chapt 6 T. Yang, UCSB CS140, Spring 2014 Outline Tree search for travel salesman problem. Recursive code Nonrecusive code Parallelization with
More informationL21: Putting it together: Tree Search (Ch. 6)!
Administrative CUDA project due Wednesday, Nov. 28 L21: Putting it together: Tree Search (Ch. 6)! Poster dry run on Dec. 4, final presentations on Dec. 6 Optional final report (4-6 pages) due on Dec. 14
More informationCOMP Parallel Computing. SMM (2) OpenMP Programming Model
COMP 633 - Parallel Computing Lecture 7 September 12, 2017 SMM (2) OpenMP Programming Model Reading for next time look through sections 7-9 of the Open MP tutorial Topics OpenMP shared-memory parallel
More informationCSE 332 Winter 2018 Final Exam (closed book, closed notes, no calculators)
Name: Sample Solution Email address (UWNetID): CSE 332 Winter 2018 Final Exam (closed book, closed notes, no calculators) Instructions: Read the directions for each question carefully before answering.
More informationFlat Parallelization. V. Aksenov, ITMO University P. Kuznetsov, ParisTech. July 4, / 53
Flat Parallelization V. Aksenov, ITMO University P. Kuznetsov, ParisTech July 4, 2017 1 / 53 Outline Flat-combining PRAM and Flat parallelization PRAM binary heap with Flat parallelization ExtractMin Insert
More informationOpenMP Programming. Prof. Thomas Sterling. High Performance Computing: Concepts, Methods & Means
High Performance Computing: Concepts, Methods & Means OpenMP Programming Prof. Thomas Sterling Department of Computer Science Louisiana State University February 8 th, 2007 Topics Introduction Overview
More informationCS307: Operating Systems
CS307: Operating Systems Chentao Wu 吴晨涛 Associate Professor Dept. of Computer Science and Engineering Shanghai Jiao Tong University SEIEE Building 3-513 wuct@cs.sjtu.edu.cn Download Lectures ftp://public.sjtu.edu.cn
More informationParallel Programming Multicore systems
FYS3240 PC-based instrumentation and microcontrollers Parallel Programming Multicore systems Spring 2011 Lecture #9 Bekkeng, 4.4.2011 Introduction Until recently, innovations in processor technology have
More informationCSE332 Summer 2010: Final Exam
CSE332 Summer 2010: Final Exam Closed notes, closed book; calculator ok. Read the instructions for each problem carefully before answering. Problems vary in point-values, difficulty and length, so you
More informationCS4961 Parallel Programming. Lecture 2: Introduction to Parallel Algorithms 8/31/10. Mary Hall August 26, Homework 1, cont.
Parallel Programming Lecture 2: Introduction to Parallel Algorithms Mary Hall August 26, 2010 1 Homework 1 Due 10:00 PM, Wed., Sept. 1 To submit your homework: - Submit a PDF file - Use the handin program
More informationL20: Putting it together: Tree Search (Ch. 6)!
Administrative L20: Putting it together: Tree Search (Ch. 6)! November 29, 2011! Next homework, CUDA, MPI (Ch. 3) and Apps (Ch. 6) - Goal is to prepare you for final - We ll discuss it in class on Thursday
More informationSection. Announcements
Lecture 7 Section Announcements Have you been to section; why or why not? A. I have class and cannot make either time B. I have work and cannot make either time C. I went and found section helpful D. I
More informationParallel Programming in C with MPI and OpenMP
Parallel Programming in C with MPI and OpenMP Michael J. Quinn Chapter 17 Shared-memory Programming 1 Outline n OpenMP n Shared-memory model n Parallel for loops n Declaring private variables n Critical
More informationLecture 16: Recapitulations. Lecture 16: Recapitulations p. 1
Lecture 16: Recapitulations Lecture 16: Recapitulations p. 1 Parallel computing and programming in general Parallel computing a form of parallel processing by utilizing multiple computing units concurrently
More informationMultithreading in C with OpenMP
Multithreading in C with OpenMP ICS432 - Spring 2017 Concurrent and High-Performance Programming Henri Casanova (henric@hawaii.edu) Pthreads are good and bad! Multi-threaded programming in C with Pthreads
More informationIntroduction to Parallel & Distributed Computing Parallel Graph Algorithms
Introduction to Parallel & Distributed Computing Parallel Graph Algorithms Lecture 16, Spring 2014 Instructor: 罗国杰 gluo@pku.edu.cn In This Lecture Parallel formulations of some important and fundamental
More informationCSC 447: Parallel Programming for Multi- Core and Cluster Systems
CSC 447: Parallel Programming for Multi- Core and Cluster Systems Parallel Sorting Algorithms Instructor: Haidar M. Harmanani Spring 2016 Topic Overview Issues in Sorting on Parallel Computers Sorting
More informationVerifying Concurrent Programs
Verifying Concurrent Programs Daniel Kroening 8 May 1 June 01 Outline Shared-Variable Concurrency Predicate Abstraction for Concurrent Programs Boolean Programs with Bounded Replication Boolean Programs
More informationParallel Programs. EECC756 - Shaaban. Parallel Random-Access Machine (PRAM) Example: Asynchronous Matrix Vector Product on a Ring
Parallel Programs Conditions of Parallelism: Data Dependence Control Dependence Resource Dependence Bernstein s Conditions Asymptotic Notations for Algorithm Analysis Parallel Random-Access Machine (PRAM)
More informationAgenda Process Concept Process Scheduling Operations on Processes Interprocess Communication 3.2
Lecture 3: Processes Agenda Process Concept Process Scheduling Operations on Processes Interprocess Communication 3.2 Process in General 3.3 Process Concept Process is an active program in execution; process
More informationParallel Programming in C with MPI and OpenMP
Parallel Programming in C with MPI and OpenMP Michael J. Quinn Chapter 17 Shared-memory Programming 1 Outline n OpenMP n Shared-memory model n Parallel for loops n Declaring private variables n Critical
More informationNikos Anastopoulos, Konstantinos Nikas, Georgios Goumas and Nectarios Koziris
Early Experiences on Accelerating Dijkstra s Algorithm Using Transactional Memory Nikos Anastopoulos, Konstantinos Nikas, Georgios Goumas and Nectarios Koziris Computing Systems Laboratory School of Electrical
More informationParallel Computer Architecture and Programming Written Assignment 3
Parallel Computer Architecture and Programming Written Assignment 3 50 points total. Due Monday, July 17 at the start of class. Problem 1: Message Passing (6 pts) A. (3 pts) You and your friend liked the
More informationThreads. What is a thread? Motivation. Single and Multithreaded Processes. Benefits
CS307 What is a thread? Threads A thread is a basic unit of CPU utilization contains a thread ID, a program counter, a register set, and a stack shares with other threads belonging to the same process
More informationParallelization Principles. Sathish Vadhiyar
Parallelization Principles Sathish Vadhiyar Parallel Programming and Challenges Recall the advantages and motivation of parallelism But parallel programs incur overheads not seen in sequential programs
More informationParallel Computing: Parallel Algorithm Design Examples Jin, Hai
Parallel Computing: Parallel Algorithm Design Examples Jin, Hai School of Computer Science and Technology Huazhong University of Science and Technology ! Given associative operator!! a 0! a 1! a 2!! a
More informationCSE613: Parallel Programming, Spring 2012 Date: March 31. Homework #2. ( Due: April 14 )
CSE613: Parallel Programming, Spring 2012 Date: March 31 Homework #2 ( Due: April 14 ) Serial-BFS( G, s, d ) (Inputs are an unweighted directed graph G with vertex set G[V ], and a source vertex s G[V
More informationBreadth First Search. cse2011 section 13.3 of textbook
Breadth irst Search cse section. of textbook Graph raversal (.) Application example Given a graph representation and a vertex s in the graph, find all paths from s to the other vertices. wo common graph
More informationThe Cilk part is a small set of linguistic extensions to C/C++ to support fork-join parallelism. (The Plus part supports vector parallelism.
Cilk Plus The Cilk part is a small set of linguistic extensions to C/C++ to support fork-join parallelism. (The Plus part supports vector parallelism.) Developed originally by Cilk Arts, an MIT spinoff,
More information1. (a) O(log n) algorithm for finding the logical AND of n bits with n processors
1. (a) O(log n) algorithm for finding the logical AND of n bits with n processors on an EREW PRAM: See solution for the next problem. Omit the step where each processor sequentially computes the AND of
More informationEfficient Data Race Detection for Unified Parallel C
P A R A L L E L C O M P U T I N G L A B O R A T O R Y Efficient Data Race Detection for Unified Parallel C ParLab Winter Retreat 1/14/2011" Costin Iancu, LBL" Nick Jalbert, UC Berkeley" Chang-Seo Park,
More informationAMath 483/583 Lecture 16 May 2, Notes: Notes: Fine vs. coarse grain parallelism. Solution of independent ODEs by Euler s method.
AMath 483/583 Lecture 16 May 2, 2011 Today: Fine grain vs. coarse grain parallelism Manually splitting do loops among threads Wednesday: Adaptive quadrature, recursive functions Start MPI? Read: Class
More informationDynamic Programming II
Lecture 11 Dynamic Programming II 11.1 Overview In this lecture we continue our discussion of dynamic programming, focusing on using it for a variety of path-finding problems in graphs. Topics in this
More informationChapter 4: Threads. Operating System Concepts. Silberschatz, Galvin and Gagne
Chapter 4: Threads Silberschatz, Galvin and Gagne Chapter 4: Threads Overview Multithreading Models Thread Libraries Threading Issues Operating System Examples Linux Threads 4.2 Silberschatz, Galvin and
More informationChap. 6 Part 3. CIS*3090 Fall Fall 2016 CIS*3090 Parallel Programming 1
Chap. 6 Part 3 CIS*3090 Fall 2016 Fall 2016 CIS*3090 Parallel Programming 1 OpenMP popular for decade Compiler-based technique Start with plain old C, C++, or Fortran Insert #pragmas into source file You
More informationCSE 490/590 Computer Architecture Homework 2
CSE 490/590 Computer Architecture Homework 2 1. Suppose that you have the following out-of-order datapath with 1-cycle ALU, 2-cycle Mem, 3-cycle Fadd, 5-cycle Fmul, no branch prediction, and in-order fetch
More informationCS4961 Parallel Programming. Lecture 9: Task Parallelism in OpenMP 9/22/09. Administrative. Mary Hall September 22, 2009.
Parallel Programming Lecture 9: Task Parallelism in OpenMP Administrative Programming assignment 1 is posted (after class) Due, Tuesday, September 22 before class - Use the handin program on the CADE machines
More informationHigh Performance Computing: Tools and Applications
High Performance Computing: Tools and Applications Edmond Chow School of Computational Science and Engineering Georgia Institute of Technology Lecture 15 Numerically solve a 2D boundary value problem Example:
More informationParallel Programming in C with MPI and OpenMP
Parallel Programming in C with MPI and OpenMP Michael J. Quinn Chapter 17 Shared-memory Programming Outline OpenMP Shared-memory model Parallel for loops Declaring private variables Critical sections Reductions
More informationThreaded Programming. Lecture 1: Concepts
Threaded Programming Lecture 1: Concepts Overview Shared memory systems Basic Concepts in Threaded Programming 2 Shared memory systems Threaded programming is most often used on shared memory parallel
More informationLecture 4: OpenMP Open Multi-Processing
CS 4230: Parallel Programming Lecture 4: OpenMP Open Multi-Processing January 23, 2017 01/23/2017 CS4230 1 Outline OpenMP another approach for thread parallel programming Fork-Join execution model OpenMP
More informationCS 475. Process = Address space + one thread of control Concurrent program = multiple threads of control
Processes & Threads Concurrent Programs Process = Address space + one thread of control Concurrent program = multiple threads of control Multiple single-threaded processes Multi-threaded process 2 1 Concurrent
More informationBasic Communication Operations (Chapter 4)
Basic Communication Operations (Chapter 4) Vivek Sarkar Department of Computer Science Rice University vsarkar@cs.rice.edu COMP 422 Lecture 17 13 March 2008 Review of Midterm Exam Outline MPI Example Program:
More informationSorting Algorithms. Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar
Sorting Algorithms Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar To accompany the text ``Introduction to Parallel Computing'', Addison Wesley, 2003. Topic Overview Issues in Sorting on Parallel
More informationLectures 11 & 12: Synchronous Sequential Circuits Minimization
Lectures & 2: Synchronous Sequential Circuits Minimization. This week I noted that our seven-state edge detector machine on the left side below could be simplified to a five-state machine on the right.
More informationDiscrete Mathematics, Spring 2004 Homework 8 Sample Solutions
Discrete Mathematics, Spring 4 Homework 8 Sample Solutions 6.4 #. Find the length of a shortest path and a shortest path between the vertices h and d in the following graph: b c d a 7 6 7 4 f 4 6 e g 4
More informationIntroduction to OpenMP. OpenMP basics OpenMP directives, clauses, and library routines
Introduction to OpenMP Introduction OpenMP basics OpenMP directives, clauses, and library routines What is OpenMP? What does OpenMP stands for? What does OpenMP stands for? Open specifications for Multi
More informationParallelizing The Matrix Multiplication. 6/10/2013 LONI Parallel Programming Workshop
Parallelizing The Matrix Multiplication 6/10/2013 LONI Parallel Programming Workshop 2013 1 Serial version 6/10/2013 LONI Parallel Programming Workshop 2013 2 X = A md x B dn = C mn d c i,j = a i,k b k,j
More information1 On the reduce implementation
1 On the reduce implementation Definition 1 (Reduce Operator). Given an associative operator and a vector A R M, we define the a second order reduce operator as y = reduce(a, ) = A 1 A 2... A M (1) If
More informationFork / Join Parallelism
Fork / Join Parallelism Image courtesy of http://www.llnl.gov/computing/tutorials/openmp/ Speedup limited by linear portion Amdahl s Law, Speedup = 1 / [(1- F) + F/S] Synchronization wait time OpenMP:
More informationComp2310 & Comp6310 Systems, Networks and Concurrency
The Australian National University Final Examination November 2017 Comp2310 & Comp6310 Systems, Networks and Concurrency Study period: 15 minutes Writing time: 3 hours (after study period) Total marks:
More informationRouting algorithms. Jan Lönnberg, 51101M. October 2, Based on G. Tel: Introduction to Distributed Algorithms, chapter 4.
Routing algorithms Jan Lönnberg, 51101M October 2, 2002 Based on G. Tel: Introduction to Distributed Algorithms, chapter 4. 1 Contents Introduction Destination-based routing Floyd-Warshall (single processor)
More informationIntroduction Single-source shortest paths All-pairs shortest paths. Shortest paths in graphs
Shortest paths in graphs Remarks from previous lectures: Path length in unweighted graph equals to edge count on the path Oriented distance (δ(u, v)) between vertices u, v equals to the length of the shortest
More informationCMSC 714 Lecture 14 Lamport Clocks and Eraser
Notes CMSC 714 Lecture 14 Lamport Clocks and Eraser Midterm exam on April 16 sample exam questions posted Research project questions? Alan Sussman (with thanks to Chris Ackermann) 2 Lamport Clocks Distributed
More informationCOSC 6385 Computer Architecture - Pipelining (II)
COSC 6385 Computer Architecture - Pipelining (II) Edgar Gabriel Spring 2018 Performance evaluation of pipelines (I) General Speedup Formula: Time Speedup Time IC IC ClockCycle ClockClycle CPI CPI For a
More informationProcessor speed. Concurrency Structure and Interpretation of Computer Programs. Multiple processors. Processor speed. Mike Phillips <mpp>
Processor speed 6.037 - Structure and Interpretation of Computer Programs Mike Phillips Massachusetts Institute of Technology http://en.wikipedia.org/wiki/file:transistor_count_and_moore%27s_law_-
More informationLecture 7. OpenMP: Reduction, Synchronization, Scheduling & Applications
Lecture 7 OpenMP: Reduction, Synchronization, Scheduling & Applications Announcements Section and Lecture will be switched on Thursday and Friday Thursday: section and Q2 Friday: Lecture 2010 Scott B.
More informationGeneration of parallel synchronization-free tiled code
Computing (2018) 100:277 302 https://doi.org/10.1007/s00607-017-0576-3 Generation of parallel synchronization-free tiled code Wlodzimierz Bielecki 1 Marek Palkowski 1 Piotr Skotnicki 1 Received: 22 August
More informationCSE : PARALLEL SOFTWARE TOOLS
CSE 4392-601: PARALLEL SOFTWARE TOOLS (Summer 2002: T R 1:00-2:50, Nedderman 110) Instructor: Bob Weems, Associate Professor Office: 344 Nedderman, 817/272-2337, weems@uta.edu Hours: T R 3:00-5:30 GTA:
More informationLecture 16/17: Distributed Shared Memory. CSC 469H1F Fall 2006 Angela Demke Brown
Lecture 16/17: Distributed Shared Memory CSC 469H1F Fall 2006 Angela Demke Brown Outline Review distributed system basics What is distributed shared memory? Design issues and tradeoffs Distributed System
More informationLecture 9: Load Balancing & Resource Allocation
Lecture 9: Load Balancing & Resource Allocation Introduction Moler s law, Sullivan s theorem give upper bounds on the speed-up that can be achieved using multiple processors. But to get these need to efficiently
More informationSciDAC CScADS Summer Workshop on Libraries and Algorithms for Petascale Applications
Parallel Tiled Algorithms for Multicore Architectures Alfredo Buttari, Jack Dongarra, Jakub Kurzak and Julien Langou SciDAC CScADS Summer Workshop on Libraries and Algorithms for Petascale Applications
More informationC09: Process Synchronization
CISC 7310X C09: Process Synchronization Hui Chen Department of Computer & Information Science CUNY Brooklyn College 3/29/2018 CUNY Brooklyn College 1 Outline Race condition and critical regions The bounded
More informationOpenMP. A parallel language standard that support both data and functional Parallelism on a shared memory system
OpenMP A parallel language standard that support both data and functional Parallelism on a shared memory system Use by system programmers more than application programmers Considered a low level primitives
More informationTotal Points: 60. Duration: 1hr
CS5800 : Algorithms Fall 015 Nov, 015 Quiz Practice Total Points: 0. Duration: 1hr 1. (7,8) points Binary Heap. (a) The following is a sequence of elements presented to you (in order from left to right):
More informationEI 338: Computer Systems Engineering (Operating Systems & Computer Architecture)
EI 338: Computer Systems Engineering (Operating Systems & Computer Architecture) Dept. of Computer Science & Engineering Chentao Wu wuct@cs.sjtu.edu.cn Download lectures ftp://public.sjtu.edu.cn User:
More informationDynamic-Programming algorithms for shortest path problems: Bellman-Ford (for singlesource) and Floyd-Warshall (for all-pairs).
Lecture 13 Graph Algorithms I 13.1 Overview This is the first of several lectures on graph algorithms. We will see how simple algorithms like depth-first-search can be used in clever ways (for a problem
More information