PARALLEL AND DISTRIBUTED COMPUTING
2010/2011 1st Semester
Recovery Exam
February 2, 2011
Duration: 2h00

- No extra material allowed. This includes notes, scratch paper, calculator, etc.
- Give your answers in the available space after each question. You can use either Portuguese or English.
- Be sure to write your name and number on all pages; non-identified pages will not be graded!
- Justify all your answers.
- Don't hurry, you should have plenty of time to finish this test. Skip questions that you find less comfortable with and come back to them later on.

I. (1,5 + 0,5 + 1 + 1 + 1 = 5 val.)

1. Discuss the advantages and disadvantages of using dynamic loop scheduling versus static loop scheduling in SMP systems. In this context, argue about the reason for the guided scheduling option of the parallel for directive of OpenMP.
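For reference, the sketch below shows the three scheduling policies the question refers to; the work function and the chunk size are only illustrative, not part of the exam statement.

#include <omp.h>

/* Hypothetical work function, only to give the loops a body. */
static void process(int i) { (void)i; }

static void run(int n)
{
    /* static: iterations split into fixed chunks, assigned before the loop runs */
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++) process(i);

    /* dynamic: chunks of 4 iterations handed out as threads finish previous ones */
    #pragma omp parallel for schedule(dynamic, 4)
    for (int i = 0; i < n; i++) process(i);

    /* guided: dynamic assignment with chunk sizes that start large and shrink */
    #pragma omp parallel for schedule(guided)
    for (int i = 0; i < n; i++) process(i);
}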

2. Consider a three-processor shared-memory multiprocessor (with processors named A, B, and C) with a snooping cache coherence system using an invalidate protocol. Suppose that processor A has block X (in its cache) in the exclusive state.

a) What will be the state of block X in processors B and C?

b) Describe the sequence of actions (bus activity, state transitions, etc.) that will be performed by the snooping cache coherence protocol when processor B issues a write to block X.

3. The following program was working correctly and, in order to parallelize it, the pragma directive shown in the program was added.

#define N 1000

int i, a = 1;

int main(int argc, char **argv)
{
    int found = 0;
    int b = 3;

    #pragma omp parallel for private(b)
    for (i = 0; i < N; i++) {
        a = i * i * i;
        if (mult(a, b) % 42 == 0)
            found++;
    }
    printf("result: %i\n", found);
}

int mult(int x, int y)
{
    int z;
    z = x * y;
    return z;
}

a) For all the variables in this program, state which are private and which are shared in the parallel region.

b) Is this parallel implementation working correctly? Justify. If not, suggest how to correct it.
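For comparison only (this is not part of the exam statement, and not necessarily the expected answer): one way such a loop is commonly made race-free is sketched below, under the assumption that the goal is just to count multiples of 42.

/* Sketch only: a per-iteration variable replaces the shared a, and found is
 * combined with a reduction instead of being incremented concurrently.
 * b is only read inside the loop, so leaving it shared is safe. */
#pragma omp parallel for reduction(+:found)
for (i = 0; i < N; i++) {
    int cube = i * i * i;            /* private to each iteration */
    if (mult(cube, b) % 42 == 0)
        found++;
}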

II. (1 + 0,75 + 0,75 + 0,5 + 1 + 1 = 5 val.)

1. What does Non-Uniform Memory Architecture (NUMA) mean?

2. In a distributed application, the computation of the elements of an array was evenly divided among the nodes of the system. It is now necessary to make the complete array available to all the nodes.

a) State the best way to perform this operation in MPI. You don't need to know the exact syntax of the MPI functions, but be sure to indicate the name of the functions and the essential parameters they require.
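For reference only, and not as the expected written answer: the collective below is one standard MPI way to assemble equally sized blocks of an array on every node. The buffer names and the block size are hypothetical.

/* Hypothetical names: each rank holds block_size = n / p doubles in
 * local_block; after the call every rank has the whole array in full_array. */
MPI_Allgather(local_block, block_size, MPI_DOUBLE,
              full_array, block_size, MPI_DOUBLE,
              MPI_COMM_WORLD);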

b) Analyze the asymptotic complexity of this procedure, as a function of the number of nodes p and the size of the array n. Justify.

c) How would you modify the previous solution if, for some reason, the array distribution was unbalanced among nodes?

3. Consider the following MPI program, which is executed by all nodes:

MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
MPI_Comm_rank(MPI_COMM_WORLD, &me);
while (1) {
    MPI_Send(A, SIZE, MPI_DOUBLE, (me + 1) % nprocs, MTAG, MPI_COMM_WORLD);
    MPI_Recv(B, SIZE, MPI_DOUBLE, (nprocs + me - 1) % nprocs, MTAG,
             MPI_COMM_WORLD, &status);
    A = update_state(A, B);
}

a) Ignoring any possible programming errors, briefly describe the intended general workings of this program.

b) Is there the possibility of a deadlock situation? Justify. If so, propose a solution that resolves this problem.
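As background rather than a prescribed fix: MPI's combined send/receive call expresses the same ring exchange while letting the library pair the transfers internally, which is one common way to avoid relying on MPI_Send buffering. The sketch reuses the variables of the program above.

/* Sketch only: send A to the next rank and receive B from the previous
 * rank in a single call, so no process blocks waiting for a matching send. */
MPI_Sendrecv(A, SIZE, MPI_DOUBLE, (me + 1) % nprocs, MTAG,
             B, SIZE, MPI_DOUBLE, (nprocs + me - 1) % nprocs, MTAG,
             MPI_COMM_WORLD, &status);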

III. (1,5 + 1,5 + 2 = 5 val.)

1. Why is the parallel programming community much more fond of Gustafson's Law than Amdahl's Law? What is the reasoning behind Gustafson's Law?

2. The following execution times T were obtained for a parallel program with a varying number of processors p:

    p   1     2      4      8
    T   1     0.75   0.625  0.5625

Explain how you can use this information to improve your program.
(note: 0.75 = 6/8; 0.625 = 5/8; 0.5625 = 9/16)
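As a starting point only (not part of the exam statement), the speedups implied by the table follow directly from the usual definition S(p) = T(1)/T(p):

S(2) = \frac{1}{0.75} \approx 1.33, \qquad
S(4) = \frac{1}{0.625} = 1.60, \qquad
S(8) = \frac{1}{0.5625} \approx 1.78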

3. Suppose that the serial cost is O(n), the parallel computation cost O(n/p) and memory requirements O(n^2). Derive the maximum allowable parallel overhead h(p, n) for an application to scale indefinitely.

IV. (1,5 + 1,5 + 1 + 1 = 5 val.)

1. Describe what you understand as Foster's design methodology. Give a brief description of each step, indicating its main objective and how to achieve it.

2. Give an example of a piece of code in C (no more than 4 lines) that is easily ported to an efficient implementation on a GPU, and a different example for which this porting is particularly difficult. Justify.
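Purely as an illustration of the kind of contrast the question is after (not a model answer): the first loop below is data-parallel with independent iterations, while the second carries a dependence between consecutive iterations.

/* Illustrative only: element-wise work, each iteration can run in its own
 * GPU thread. */
for (int i = 0; i < n; i++)
    c[i] = a[i] + b[i];

/* Illustrative only: a recurrence, each iteration needs the result of the
 * previous one, so it cannot be mapped directly onto parallel threads. */
for (int i = 1; i < n; i++)
    x[i] = x[i - 1] * alpha + y[i];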

3. Consider a problem that is being solved through a 2-dimensional finite difference method, and which has been discretized into an n × n matrix, where n is the problem size. A parallel implementation is being run on a cluster with p nodes. If

- λ is the fixed cost of setting up a message
- β is a measure of the network bandwidth between processors

compute an estimate of the amortized communication cost, per iteration and per processor, as a function of n, p, λ and β, for both a row-wise decomposition and a checkerboard decomposition, if:

a) 1 level of ghost points is used.

b) 2 levels of ghost points are used (caution: think carefully about this one; a drawing may help).
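One conventional cost model consistent with these symbols (assumed here, since the exam statement does not spell it out) charges a message of m words as:

t_{\mathrm{msg}} = \lambda + \frac{m}{\beta}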