PARALLEL AND DISTRIBUTED COMPUTING
2013/2014 1st Semester, 2nd Exam
January 29, 2014. Duration: 2h00

- No extra material allowed. This includes notes, scratch paper, calculators, etc.
- Give your answers in the available space after each question. You can use either Portuguese or English.
- Be sure to write your name and number on all pages; non-identified pages will not be graded!
- Justify all your answers.
- Do not hurry; you should have plenty of time to finish this exam. Skip the questions you are less comfortable with and come back to them later.

I. (1 + 1 + 1.5 + 1 + 0.5 = 5 val.)

1. Define functional and data parallelism. What are the typical reasons that limit the scalability of functional parallelism? And of data parallelism?
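A minimal OpenMP sketch contrasting the two notions (task_a and task_b are hypothetical placeholders, not part of the exam):

    /* Functional parallelism runs different tasks concurrently; its
       degree is bounded by the number of distinct tasks. Data
       parallelism runs the same operation on disjoint parts of the
       data; its degree grows with n. */
    #include <omp.h>

    void task_a(void);   /* hypothetical independent tasks */
    void task_b(void);

    void example(double *x, int n) {
        #pragma omp parallel sections   /* functional: at most 2-way here */
        {
            #pragma omp section
            task_a();
            #pragma omp section
            task_b();
        }

        #pragma omp parallel for        /* data: one iteration per element */
        for (int i = 0; i < n; i++)
            x[i] *= 2.0;
    }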

2. Consider the following program:

 1  unsigned int row, column, t, i;
 2  unsigned int m[N*N];  /* assume m is initialized with random integers */
 3  unsigned int v[5] = {0,0,0,0,0};
 4  register unsigned int tmp = 0;  /* unused for now */
 5
 6  for (row = 0; row < N; row++) {
 7      m[row*N] = tmp = 0;
 8      for (column = 1; column < N; column++) {
 9          if (m[row*N+column] == 7)
10              m[row*N]++;  /* accumulate in the first column */
11      }
12      /* empty line */
13  }
14  for (row = 0; row < N; row++)
15      for (t = 0; t < 5; t++)
16          if (m[row*N] > v[t]) {
17              for (i = 3; i > t; i--)
18                  v[i+1] = v[i];
19              v[t] = m[row*N];
20              break;
21          }

a) Explain in detail what the content of v is at the end of the program.

b) Parallelize the program using OpenMP.
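A minimal sketch of one possible answer to b), assuming N is a compile-time constant (the per-row local counter already anticipates the improvement discussed in c) and d) below; it is one reasonable parallelization, not the official solution):

    #include <stdio.h>
    #include <stdlib.h>
    #include <omp.h>

    #define N 1000

    unsigned int m[N*N];               /* global: too large for the stack */
    unsigned int v[5] = {0,0,0,0,0};

    int main(void) {
        unsigned int row, column, t, i;

        for (i = 0; i < N*N; i++)      /* stand-in for the assumed random init */
            m[i] = rand() % 10;

        /* Rows are independent, so the counting loop parallelizes cleanly;
           a per-row local counter avoids repeated writes to shared memory. */
        #pragma omp parallel for private(column)
        for (row = 0; row < N; row++) {
            unsigned int count = 0;
            for (column = 1; column < N; column++)
                if (m[row*N+column] == 7)
                    count++;
            m[row*N] = count;          /* accumulate in the first column */
        }

        /* Top-5 selection is O(N) with a tiny constant; left sequential. */
        for (row = 0; row < N; row++)
            for (t = 0; t < 5; t++)
                if (m[row*N] > v[t]) {
                    for (i = 3; i > t; i--)
                        v[i+1] = v[i];
                    v[t] = m[row*N];
                    break;
                }

        printf("%u %u %u %u %u\n", v[0], v[1], v[2], v[3], v[4]);
        return 0;
    }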

c) Suggest an improvement to the program to improve parallelization.

d) Assume N = 1,000,000. Does replacing lines 10 to 12 with the lines below improve the program? Why?

10              tmp++;
11      }
12      m[row*N] = tmp;  /* accumulate in the first column */

II. (1.5 + 1.5 + 2 = 5 val.)

1. Write a simple MPI program to estimate the bandwidth of the network. Discuss the limitations of the solution you propose.
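One common shape for such a program is a ping-pong benchmark between two ranks; the sketch below times repeated round trips (message size and repetition count are illustrative choices, and it must be run with at least two ranks):

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define MSG_BYTES (1 << 20)   /* 1 MiB per message */
    #define REPS      100

    int main(int argc, char **argv) {
        int rank;
        char *buf = malloc(MSG_BYTES);
        memset(buf, 0, MSG_BYTES);

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_Barrier(MPI_COMM_WORLD);           /* start timing together */
        double t0 = MPI_Wtime();
        for (int i = 0; i < REPS; i++) {
            if (rank == 0) {
                MPI_Send(buf, MSG_BYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, MSG_BYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, MSG_BYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, MSG_BYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double t1 = MPI_Wtime();

        if (rank == 0)                          /* 2 messages per round trip */
            printf("estimated bandwidth: %.1f MB/s\n",
                   2.0 * REPS * MSG_BYTES / (t1 - t0) / 1e6);

        free(buf);
        MPI_Finalize();
        return 0;
    }

Typical limitations a full answer would discuss: only one pair of ranks (one link) is exercised, latency is folded into the estimate for small messages, and a single fixed message size may not saturate the network.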

2. What is the best network topology to use in the MPI_Alltoall function?

3. Explain what the MPI_Barrier function does. Give an example of an algorithm where it is required.
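A minimal sketch of MPI_Barrier in one of its typical roles, fencing a timed phase so that no rank starts the measurement before all ranks are ready (the sleep stands in for uneven preceding work and is purely illustrative):

    #include <mpi.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(int argc, char **argv) {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        sleep(rank % 3);                 /* ranks arrive at different times */

        MPI_Barrier(MPI_COMM_WORLD);     /* no rank proceeds until all arrive */
        double t0 = MPI_Wtime();
        /* ... the phase being timed would run here ... */
        MPI_Barrier(MPI_COMM_WORLD);     /* all ranks finished the phase */

        if (rank == 0)
            printf("phase time: %.3f s\n", MPI_Wtime() - t0);

        MPI_Finalize();
        return 0;
    }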

III. (1.5 + 1.5 + 2 = 5 val.)

1. Amdahl's Law is given by ψ(n, p) ≤ 1 / (f + (1 - f)/p) and the Gustafson-Barsis Law is given by ψ(n, p) ≤ p + (1 - p)s. Explain clearly what f and s represent. How do f and s change as the size of the problem, n, grows?

2. Explain how the Karp-Flatt metric, the experimentally determined serial fraction, can be used to optimize a parallel program. What experimental measurements does it use?
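For reference, the metric is computed from the measured speedup ψ on p processors (the numeric instance below is illustrative):

    e = (1/ψ - 1/p) / (1 - 1/p)

For example, a measured speedup of ψ = 4 on p = 8 processors gives e = (0.25 - 0.125) / (1 - 0.125) ≈ 0.14. If e stays roughly constant as p grows, the inherently serial fraction is what limits speedup; if e grows with p, parallel overhead is the dominant limit.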

3. Consider a problem with a sequential algorithm that runs in Θ(n² log n) and with a parallel implementation whose overhead (communication + redundant computation) per processor is given by Θ(log n). If the required memory grows with n³, compute the scalability function for this parallel algorithm. Discuss the result obtained. What does it mean?
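One way to set up the computation, as a sketch of the isoefficiency method (C is a generic constant; this is an outline, not the official solution):

    T(n, 1)   = Θ(n² log n)
    T_o(n, p) = p · Θ(log n) = Θ(p log n)

    Isoefficiency: n² log n ≥ C p log n  ⇒  n² ≥ C p  ⇒  n = Θ(√(Cp))

    With M(n) = n³, the scalability function is
    M(√(Cp)) / p = (Cp)^(3/2) / p = Θ(√p)

Read this way, memory per processor must grow as √p to hold efficiency constant, so the algorithm is scalable, but not perfectly scalable.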

IV. (1 + 0.5 + 0.5 + 1.5 + 1.5 = 5 val.)

1. Algorithms for Branch-and-Bound Search use a priority queue to allow the selection of the most promising node to explore next.

a) Discuss why parallel implementations of this algorithm use multiple priority queues.

b) State the potential inefficiency for the parallel algorithm created by these multiple queues.

c) How can this inefficiency be minimized? What is the tradeoff?
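A toy illustration of the situation behind a) and b) (everything here is hypothetical: node "bounds" are plain integers, and each thread owns one local min-heap):

    #include <stdio.h>
    #include <omp.h>

    #define CAP 64

    typedef struct { int a[CAP]; int n; } MinHeap;

    static void push(MinHeap *h, int v) {
        int i = h->n++;
        h->a[i] = v;
        while (i > 0 && h->a[(i-1)/2] > h->a[i]) {   /* sift up */
            int t = h->a[i]; h->a[i] = h->a[(i-1)/2]; h->a[(i-1)/2] = t;
            i = (i-1)/2;
        }
    }

    static int pop(MinHeap *h) {                     /* remove smallest bound */
        int best = h->a[0], i = 0;
        h->a[0] = h->a[--h->n];
        for (;;) {                                   /* sift down */
            int l = 2*i+1, r = 2*i+2, s = i;
            if (l < h->n && h->a[l] < h->a[s]) s = l;
            if (r < h->n && h->a[r] < h->a[s]) s = r;
            if (s == i) break;
            int t = h->a[i]; h->a[i] = h->a[s]; h->a[s] = t;
            i = s;
        }
        return best;
    }

    int main(void) {
        int bounds[6] = {5, 1, 8, 2, 9, 3};          /* node lower bounds */
        MinHeap q[2] = {{{0}, 0}, {{0}, 0}};

        for (int i = 0; i < 6; i++)                  /* round-robin distribution */
            push(&q[i % 2], bounds[i]);

        #pragma omp parallel num_threads(2)
        {
            int id = omp_get_thread_num();
            while (q[id].n > 0)                      /* each thread works locally */
                printf("thread %d expands node with bound %d\n",
                       id, pop(&q[id]));
        }
        return 0;
    }

With no shared queue there is no contention, but thread 0's first expansion here is the node with bound 5 while the globally best node (bound 1) sits in thread 1's queue, so work is not expanded in global best-first order.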

2. The scalability function of the Parallel Sorting by Regular Sampling (PSRS) algorithm that we derived in class was p^(C-1), and the scalability function of the Odd-Even Sort algorithm was C. Yet, in practice, PSRS is much more widely used. Discuss why this is so.

3. Give the main advantage and main disadvantage of Cache-Coherent NUMA (aka ccNUMA) systems versus Message-Passing systems.
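A hint for reading question 2 (this framing assumes the in-class derivations followed the usual isoefficiency approach): a scalability function describes how memory per processor must grow to hold efficiency constant, and it is relative to the sequential algorithm used in the derivation. Odd-Even Sort's constant C is obtained against a Θ(n²) sequential sort, whereas PSRS's p^(C-1) is obtained against a Θ(n log n) one, so perfect scalability does not by itself imply faster absolute running times.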