Parallel algorithms at ENS Lyon


1 Parallel algorithms at ENS Lyon
Yves Robert
Ecole Normale Supérieure de Lyon & Institut Universitaire de France
TCPP Workshop, February 2010

2 Outline
  1 Scope
  2
Yves.Robert@ens-lyon.fr February 2010 Parallel algorithms 2/10

3 Scope
- Follow-on to the classic CLRS-based algorithms course
- Objective: understand the complexity of parallel algorithms
- Focus is on models and algorithms
- Provides a sound basis for parallel programming
- Not an HPC course

4 Organization
- 16 weeks
- Each week = 2h class + 2h supervised exercises (or programming sessions)
- MPI project
- Midterm and final exam

5 Outline
  1 Scope
  2

6 Models (4 weeks)
- Sorting networks
  - Odd-even merge sort, 0-1 principle
  - Odd-even transposition sort
  - Odd-even sorting on a 1D network (work optimal)
- PRAM
  - Models (EREW, CREW, CRCW)
  - Pointer jumping (list ranking, prefix, Euler tour)
- Performance evaluation
  - Cost, work, speedup and efficiency, Brent's theorem
- Comparison of PRAM models
  - Model separation, simulation theorem
- Sorting machine
  - Merge, sorting trees, complexity and correctness
- Relevance of the PRAM model
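As an illustration of the odd-even transposition sort and the 0-1 principle listed on this slide, here is a minimal Python sketch. The slides give no code, so the function name and the sequential simulation of the parallel rounds are assumptions made for illustration only.

```python
from itertools import product


def odd_even_transposition_sort(values):
    """Sort by n rounds of compare-exchange between neighbouring cells.

    Hypothetical helper: each round is simulated sequentially, but the
    compare-exchanges inside one round touch disjoint pairs, so on a 1D
    network of n cells they could all run at the same time.
    """
    a = list(values)
    n = len(a)
    for round_index in range(n):
        # Even rounds compare pairs (0,1), (2,3), ...; odd rounds (1,2), (3,4), ...
        start = round_index % 2
        for i in range(start, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a


def sorts_all_zero_one_inputs(n):
    """0-1 principle check: test the network on every 0-1 input of length n."""
    return all(
        odd_even_transposition_sort(bits) == sorted(bits)
        for bits in product([0, 1], repeat=n)
    )
```

By the 0-1 principle, a comparison network that sorts every 0-1 sequence of length n sorts every sequence of length n, so `sorts_all_zero_one_inputs(n)` is a complete (if exponential) correctness check for small n.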

7 Networking (3 weeks)
- Interconnection networks
  - Static and dynamic topologies
- Communication models
  - Point-to-point communication protocols
- Case study: the unidirectional ring
  - Broadcast, scatter, all-to-all, pipelined broadcast
- Case study: the hypercube
  - Labeling vertices, paths and routing
  - Embedding rings and grids
  - Collective communications
- Peer-to-peer computing
  - Distributed hash tables and structured overlay networks
  - Chord, Plaxton's routing algorithm
  - Multi-casting in a distributed hash table
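The hypercube case study (vertex labels, routing, ring embedding) can be sketched in a few lines of Python. This is an illustrative sketch, not taken from the slides: the function names are hypothetical, vertices are assumed to be labeled by d-bit integers, and the ring embedding uses the standard reflected Gray code.

```python
def gray_code(i):
    """Return the i-th reflected Gray code word."""
    return i ^ (i >> 1)


def ring_embedding(d):
    """List the 2**d hypercube vertices in an order that forms a ring.

    Consecutive labels (including last-to-first) differ in exactly one bit,
    so each ring link maps to one hypercube link.
    """
    return [gray_code(i) for i in range(2 ** d)]


def route(src, dst):
    """Route from src to dst by correcting differing bits, lowest bit first."""
    path = [src]
    current = src
    diff = src ^ dst
    bit = 0
    while diff:
        if diff & 1:
            current ^= 1 << bit  # cross the hypercube link in dimension `bit`
            path.append(current)
        diff >>= 1
        bit += 1
    return path
```

The path length returned by `route` equals the Hamming distance between the two labels, which is the shortest possible in a hypercube.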

8 Algorithms on a processor ring (2 weeks)
- Matrix-vector multiplication
- Matrix-matrix multiplication
- First look at stencil applications
- LU factorization
  - Basic version, pipelining on the ring, look-ahead algorithm
- Second look at stencil applications
  - Granularity, overlap, mapping, dependencies
- Implementing logical topologies
- Distributed vs. centralized implementations
- Summary of algorithmic principles
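To make the first item concrete, the following Python sketch simulates matrix-vector multiplication on a unidirectional ring of p processors. It is an assumption-laden illustration rather than the course's own code: the function name is hypothetical, n is assumed to be divisible by p, and the message passing is simulated by rotating a list instead of using MPI.

```python
def ring_matvec(A, x, p):
    """Simulate y = A x on a ring of p processors with block-row distribution."""
    n = len(x)
    assert n % p == 0, "sketch assumes p divides n"
    r = n // p
    # Processor q owns rows q*r .. (q+1)*r - 1 of A and initially block q of x.
    # held[q] = (index of the x block currently on processor q, the block itself)
    held = [(q, x[q * r:(q + 1) * r]) for q in range(p)]
    y = [0] * n
    for _ in range(p):
        # Compute phase: every processor multiplies its rows by the block it holds.
        for q in range(p):
            owner, block = held[q]
            for i in range(q * r, (q + 1) * r):
                for k, xv in enumerate(block):
                    y[i] += A[i][owner * r + k] * xv
        # Communication phase: processor q sends its block to processor (q+1) mod p.
        held = [held[(q - 1) % p] for q in range(p)]
    return y
```

After p compute-and-shift steps every processor has seen every block of x exactly once, so each owned entry of y is complete.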

9 Processor grids and load balancing (3 weeks)
- Logical 2-D grid topologies
- Matrix multiplication on processor grids
  - Outer-product algorithm
  - Grid vs. ring?
  - Three matrix multiplication algorithms
- 2-D block cyclic data distribution
- Load balancing for heterogeneous platforms
- Load balancing for 1-D data distributions
  - Static vs. incremental allocation algorithm
  - Application to stencils and LU factorization
- Load balancing for 2-D data distributions
  - Matrix multiplication on a heterogeneous grid
  - Hardness of the 2-D data partitioning problem
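The incremental allocation idea for 1-D load balancing can be sketched as a greedy loop. The slides do not spell out the algorithm, so this is one common formulation stated as an assumption: processor i needs `cycle_times[i]` time units per task, and each new task goes to the processor that would finish it earliest. The function name and the example cycle times are hypothetical.

```python
def incremental_allocation(cycle_times, n_tasks):
    """Greedily assign identical tasks to processors of different speeds.

    Returns (counts, order): counts[i] is the number of tasks given to
    processor i, and order lists the chosen processor for each task in turn.
    """
    counts = [0] * len(cycle_times)
    order = []
    for _ in range(n_tasks):
        # Completion time of processor i if it received one more task.
        best = min(
            range(len(cycle_times)),
            key=lambda i: (counts[i] + 1) * cycle_times[i],
        )
        counts[best] += 1
        order.append(best)
    return counts, order
```

Because the allocation for k tasks is a prefix of the allocation for k + 1 tasks, the same `order` list can be reused when the amount of work shrinks step by step, as it does in LU factorization.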

10 Scheduling and loop parallelization (4 weeks)
- Where do task graphs come from?
- Solving Pb(∞)
- Solving Pb(p)
  - NP-completeness of Pb(p), list schedules, Graham's bound and critical paths
  - Approximation algorithms for independent tasks
- Taking communication costs into account
  - NP-completeness of Pb(∞), guaranteed heuristics
  - List heuristics for Pb(p)
  - HEFT (extension to heterogeneous platforms)
- Scheduling at compile-time
  - Dependence levels and Kennedy-Allen algorithm
  - Dependence vectors and Lamport's hyperplane method
  - Uniform loop nests and unimodular space-time transformations
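A small Python sketch can illustrate list scheduling and Graham's bound in the simplest setting, independent tasks on p identical processors. The slide also covers task graphs with dependences, which this sketch deliberately leaves out; the function names are hypothetical.

```python
import heapq


def list_schedule(durations, p):
    """Give each task, in list order, to the processor that becomes free first."""
    # Heap of (current load, processor index).
    loads = [(0, q) for q in range(p)]
    heapq.heapify(loads)
    assignment = []
    for d in durations:
        load, q = heapq.heappop(loads)
        assignment.append(q)
        heapq.heappush(loads, (load + d, q))
    makespan = max(load for load, _ in loads)
    return makespan, assignment


def graham_upper_bound(durations, p):
    """Upper bound on any list schedule: sum/p + (1 - 1/p) * longest task.

    Since the optimal makespan is at least max(sum/p, longest task),
    this bound is at most (2 - 1/p) times the optimum.
    """
    return sum(durations) / p + (1 - 1 / p) * max(durations)
```

For durations [3, 3, 2, 2, 2] on 2 processors, the list schedule has makespan 7 while the optimum is 6 (tasks {3, 3} and {2, 2, 2}), which stays within the (2 - 1/p) factor.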

Fundamentals of. Parallel Computing. Sanjay Razdan. Alpha Science International Ltd. Oxford, U.K.

Fundamentals of. Parallel Computing. Sanjay Razdan. Alpha Science International Ltd. Oxford, U.K. Fundamentals of Parallel Computing Sanjay Razdan Alpha Science International Ltd. Oxford, U.K. CONTENTS Preface Acknowledgements vii ix 1. Introduction to Parallel Computing 1.1-1.37 1.1 Parallel Computing

More information

Parallel Algorithms for (PRAM) Computers & Some Parallel Algorithms. Reference : Horowitz, Sahni and Rajasekaran, Computer Algorithms

Parallel Algorithms for (PRAM) Computers & Some Parallel Algorithms. Reference : Horowitz, Sahni and Rajasekaran, Computer Algorithms Parallel Algorithms for (PRAM) Computers & Some Parallel Algorithms Reference : Horowitz, Sahni and Rajasekaran, Computer Algorithms Part 2 1 3 Maximum Selection Problem : Given n numbers, x 1, x 2,, x

More information

Contents. Preface xvii Acknowledgments. CHAPTER 1 Introduction to Parallel Computing 1. CHAPTER 2 Parallel Programming Platforms 11

Contents. Preface xvii Acknowledgments. CHAPTER 1 Introduction to Parallel Computing 1. CHAPTER 2 Parallel Programming Platforms 11 Preface xvii Acknowledgments xix CHAPTER 1 Introduction to Parallel Computing 1 1.1 Motivating Parallelism 2 1.1.1 The Computational Power Argument from Transistors to FLOPS 2 1.1.2 The Memory/Disk Speed

More information

Algorithms and Applications

Algorithms and Applications Algorithms and Applications 1 Areas done in textbook: Sorting Algorithms Numerical Algorithms Image Processing Searching and Optimization 2 Chapter 10 Sorting Algorithms - rearranging a list of numbers

More information

EE/CSCI 451 Spring 2018 Homework 8 Total Points: [10 points] Explain the following terms: EREW PRAM CRCW PRAM. Brent s Theorem.

EE/CSCI 451 Spring 2018 Homework 8 Total Points: [10 points] Explain the following terms: EREW PRAM CRCW PRAM. Brent s Theorem. EE/CSCI 451 Spring 2018 Homework 8 Total Points: 100 1 [10 points] Explain the following terms: EREW PRAM CRCW PRAM Brent s Theorem BSP model 1 2 [15 points] Assume two sorted sequences of size n can be

More information

: Parallel Algorithms Exercises, Batch 1. Exercise Day, Tuesday 18.11, 10:00. Hand-in before or at Exercise Day

: Parallel Algorithms Exercises, Batch 1. Exercise Day, Tuesday 18.11, 10:00. Hand-in before or at Exercise Day 184.727: Parallel Algorithms Exercises, Batch 1. Exercise Day, Tuesday 18.11, 10:00. Hand-in before or at Exercise Day Jesper Larsson Träff, Francesco Versaci Parallel Computing Group TU Wien October 16,

More information

CS 598: Communication Cost Analysis of Algorithms Lecture 15: Communication-optimal sorting and tree-based algorithms

CS 598: Communication Cost Analysis of Algorithms Lecture 15: Communication-optimal sorting and tree-based algorithms CS 598: Communication Cost Analysis of Algorithms Lecture 15: Communication-optimal sorting and tree-based algorithms Edgar Solomonik University of Illinois at Urbana-Champaign October 12, 2016 Defining

More information

Introduction to Parallel Computing

Introduction to Parallel Computing Introduction to Parallel Computing George Karypis Sorting Outline Background Sorting Networks Quicksort Bucket-Sort & Sample-Sort Background Input Specification Each processor has n/p elements A ordering

More information

Mapping pipeline skeletons onto heterogeneous platforms

Mapping pipeline skeletons onto heterogeneous platforms Mapping pipeline skeletons onto heterogeneous platforms Anne Benoit and Yves Robert GRAAL team, LIP École Normale Supérieure de Lyon January 2007 Yves.Robert@ens-lyon.fr January 2007 Mapping pipeline skeletons

More information

Basic Communication Ops

Basic Communication Ops CS 575 Parallel Processing Lecture 5: Ch 4 (GGKK) Sanjay Rajopadhye Colorado State University Basic Communication Ops n PRAM, final thoughts n Quiz 3 n Collective Communication n Broadcast & Reduction

More information

Basic Communication Operations (Chapter 4)

Basic Communication Operations (Chapter 4) Basic Communication Operations (Chapter 4) Vivek Sarkar Department of Computer Science Rice University vsarkar@cs.rice.edu COMP 422 Lecture 17 13 March 2008 Review of Midterm Exam Outline MPI Example Program:

More information

Lecture 3: Sorting 1

Lecture 3: Sorting 1 Lecture 3: Sorting 1 Sorting Arranging an unordered collection of elements into monotonically increasing (or decreasing) order. S = a sequence of n elements in arbitrary order After sorting:

More information

CS256 Applied Theory of Computation

CS256 Applied Theory of Computation CS256 Applied Theory of Computation Parallel Computation IV John E Savage Overview PRAM Work-time framework for parallel algorithms Prefix computations Finding roots of trees in a forest Parallel merging

More information

Scheduling Tasks Sharing Files from Distributed Repositories

Scheduling Tasks Sharing Files from Distributed Repositories from Distributed Repositories Arnaud Giersch 1, Yves Robert 2 and Frédéric Vivien 2 1 ICPS/LSIIT, University Louis Pasteur, Strasbourg, France 2 École normale supérieure de Lyon, France September 1, 2004

More information

Parallel Systems Course: Chapter VIII. Sorting Algorithms. Kumar Chapter 9. Jan Lemeire ETRO Dept. November Parallel Sorting

Parallel Systems Course: Chapter VIII. Sorting Algorithms. Kumar Chapter 9. Jan Lemeire ETRO Dept. November Parallel Sorting Parallel Systems Course: Chapter VIII Sorting Algorithms Kumar Chapter 9 Jan Lemeire ETRO Dept. November 2014 Overview 1. Parallel sort distributed memory 2. Parallel sort shared memory 3. Sorting Networks

More information

Sorting (Chapter 9) Alexandre David B2-206

Sorting (Chapter 9) Alexandre David B2-206 Sorting (Chapter 9) Alexandre David B2-206 1 Sorting Problem Arrange an unordered collection of elements into monotonically increasing (or decreasing) order. Let S = . Sort S into S =

More information

Parallel Systems Course: Chapter VIII. Sorting Algorithms. Kumar Chapter 9. Jan Lemeire ETRO Dept. Fall Parallel Sorting

Parallel Systems Course: Chapter VIII. Sorting Algorithms. Kumar Chapter 9. Jan Lemeire ETRO Dept. Fall Parallel Sorting Parallel Systems Course: Chapter VIII Sorting Algorithms Kumar Chapter 9 Jan Lemeire ETRO Dept. Fall 2017 Overview 1. Parallel sort distributed memory 2. Parallel sort shared memory 3. Sorting Networks

More information

CSE Introduction to Parallel Processing. Chapter 5. PRAM and Basic Algorithms

CSE Introduction to Parallel Processing. Chapter 5. PRAM and Basic Algorithms Dr Izadi CSE-40533 Introduction to Parallel Processing Chapter 5 PRAM and Basic Algorithms Define PRAM and its various submodels Show PRAM to be a natural extension of the sequential computer (RAM) Develop

More information

Sorting (Chapter 9) Alexandre David B2-206

Sorting (Chapter 9) Alexandre David B2-206 Sorting (Chapter 9) Alexandre David B2-206 Sorting Problem Arrange an unordered collection of elements into monotonically increasing (or decreasing) order. Let S = . Sort S into S =

More information

Sorting Algorithms. - rearranging a list of numbers into increasing (or decreasing) order. Potential Speedup

Sorting Algorithms. - rearranging a list of numbers into increasing (or decreasing) order. Potential Speedup Sorting Algorithms - rearranging a list of numbers into increasing (or decreasing) order. Potential Speedup The worst-case time complexity of mergesort and the average time complexity of quicksort are

More information

Complexity and Advanced Algorithms Monsoon Parallel Algorithms Lecture 2

Complexity and Advanced Algorithms Monsoon Parallel Algorithms Lecture 2 Complexity and Advanced Algorithms Monsoon 2011 Parallel Algorithms Lecture 2 Trivia ISRO has a new supercomputer rated at 220 Tflops Can be extended to Pflops. Consumes only 150 KW of power. LINPACK is

More information

Principle Of Parallel Algorithm Design (cont.) Alexandre David B2-206

Principle Of Parallel Algorithm Design (cont.) Alexandre David B2-206 Principle Of Parallel Algorithm Design (cont.) Alexandre David B2-206 1 Today Characteristics of Tasks and Interactions (3.3). Mapping Techniques for Load Balancing (3.4). Methods for Containing Interaction

More information

Complexity results for throughput and latency optimization of replicated and data-parallel workflows

Complexity results for throughput and latency optimization of replicated and data-parallel workflows Complexity results for throughput and latency optimization of replicated and data-parallel workflows Anne Benoit and Yves Robert GRAAL team, LIP École Normale Supérieure de Lyon June 2007 Anne.Benoit@ens-lyon.fr

More information

A New Parallel Matrix Multiplication Algorithm on Tree-Hypercube Network using IMAN1 Supercomputer

A New Parallel Matrix Multiplication Algorithm on Tree-Hypercube Network using IMAN1 Supercomputer A New Parallel Matrix Multiplication Algorithm on Tree-Hypercube Network using IMAN1 Supercomputer Orieb AbuAlghanam, Mohammad Qatawneh Computer Science Department University of Jordan Hussein A. al Ofeishat

More information

COMP Parallel Computing. PRAM (4) PRAM models and complexity

COMP Parallel Computing. PRAM (4) PRAM models and complexity COMP 633 - Parallel Computing Lecture 5 September 4, 2018 PRAM models and complexity Reading for Thursday Memory hierarchy and cache-based systems Topics Comparison of PRAM models relative performance

More information

The PRAM Model. Alexandre David

The PRAM Model. Alexandre David The PRAM Model Alexandre David 1.2.05 1 Outline Introduction to Parallel Algorithms (Sven Skyum) PRAM model Optimality Examples 11-02-2008 Alexandre David, MVP'08 2 2 Standard RAM Model Standard Random

More information

Parallel Models. Hypercube Butterfly Fully Connected Other Networks Shared Memory v.s. Distributed Memory SIMD v.s. MIMD

Parallel Models. Hypercube Butterfly Fully Connected Other Networks Shared Memory v.s. Distributed Memory SIMD v.s. MIMD Parallel Algorithms Parallel Models Hypercube Butterfly Fully Connected Other Networks Shared Memory v.s. Distributed Memory SIMD v.s. MIMD The PRAM Model Parallel Random Access Machine All processors

More information

CSE : PARALLEL SOFTWARE TOOLS

CSE : PARALLEL SOFTWARE TOOLS CSE 4392-601: PARALLEL SOFTWARE TOOLS (Summer 2002: T R 1:00-2:50, Nedderman 110) Instructor: Bob Weems, Associate Professor Office: 344 Nedderman, 817/272-2337, weems@uta.edu Hours: T R 3:00-5:30 GTA:

More information

Lecture 8 Parallel Algorithms II

Lecture 8 Parallel Algorithms II Lecture 8 Parallel Algorithms II Dr. Wilson Rivera ICOM 6025: High Performance Computing Electrical and Computer Engineering Department University of Puerto Rico Original slides from Introduction to Parallel

More information

Complexity Results for Throughput and Latency Optimization of Replicated and Data-parallel Workflows

Complexity Results for Throughput and Latency Optimization of Replicated and Data-parallel Workflows Complexity Results for Throughput and Latency Optimization of Replicated and Data-parallel Workflows Anne Benoit and Yves Robert GRAAL team, LIP École Normale Supérieure de Lyon September 2007 Anne.Benoit@ens-lyon.fr

More information

Advanced Computer Architecture. The Architecture of Parallel Computers

Advanced Computer Architecture. The Architecture of Parallel Computers Advanced Computer Architecture The Architecture of Parallel Computers Computer Systems No Component Can be Treated In Isolation From the Others Application Software Operating System Hardware Architecture

More information

Lecture outline. Graph coloring Examples Applications Algorithms

Lecture outline. Graph coloring Examples Applications Algorithms Lecture outline Graph coloring Examples Applications Algorithms Graph coloring Adjacent nodes must have different colors. How many colors do we need? Graph coloring Neighbors must have different colors

More information

Dynamo. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Motivation System Architecture Evaluation

Dynamo. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Motivation System Architecture Evaluation Dynamo Smruti R. Sarangi Department of Computer Science Indian Institute of Technology New Delhi, India Smruti R. Sarangi Leader Election 1/20 Outline Motivation 1 Motivation 2 3 Smruti R. Sarangi Leader

More information

Parallel Longest Increasing Subsequences in Scalable Time and Memory

Parallel Longest Increasing Subsequences in Scalable Time and Memory Parallel Longest Increasing Subsequences in Scalable Time and Memory Peter Krusche Alexander Tiskin Department of Computer Science University of Warwick, Coventry, CV4 7AL, UK PPAM 2009 What is in this

More information

Parallel Sorting Algorithms

Parallel Sorting Algorithms Parallel Sorting Algorithms Ricardo Rocha and Fernando Silva Computer Science Department Faculty of Sciences University of Porto Parallel Computing 2016/2017 (Slides based on the book Parallel Programming:

More information

CSCE 750, Spring 2001 Notes 3 Page Symmetric Multi Processors (SMPs) (e.g., Cray vector machines, Sun Enterprise with caveats) Many processors

CSCE 750, Spring 2001 Notes 3 Page Symmetric Multi Processors (SMPs) (e.g., Cray vector machines, Sun Enterprise with caveats) Many processors CSCE 750, Spring 2001 Notes 3 Page 1 5 Parallel Algorithms 5.1 Basic Concepts With ordinary computers and serial (=sequential) algorithms, we have one processor and one memory. We count the number of operations

More information

Principles of Parallel Algorithm Design: Concurrency and Mapping

Principles of Parallel Algorithm Design: Concurrency and Mapping Principles of Parallel Algorithm Design: Concurrency and Mapping John Mellor-Crummey Department of Computer Science Rice University johnmc@rice.edu COMP 422/534 Lecture 3 17 January 2017 Last Thursday

More information

Design of Parallel Algorithms. Models of Parallel Computation

Design of Parallel Algorithms. Models of Parallel Computation + Design of Parallel Algorithms Models of Parallel Computation + Chapter Overview: Algorithms and Concurrency n Introduction to Parallel Algorithms n Tasks and Decomposition n Processes and Mapping n Processes

More information

Algorithms & Data Structures 2

Algorithms & Data Structures 2 Algorithms & Data Structures 2 PRAM Algorithms WS2017 B. Anzengruber-Tanase (Institute for Pervasive Computing, JKU Linz) (Institute for Pervasive Computing, JKU Linz) RAM MODELL (AHO/HOPCROFT/ULLMANN

More information

Scheduling on clusters and grids

Scheduling on clusters and grids Some basics on scheduling theory Grégory Mounié, Yves Robert et Denis Trystram ID-IMAG 6 mars 2006 Some basics on scheduling theory 1 Some basics on scheduling theory Notations and Definitions List scheduling

More information

What is Parallel Computing?

What is Parallel Computing? What is Parallel Computing? Parallel Computing is several processing elements working simultaneously to solve a problem faster. 1/33 What is Parallel Computing? Parallel Computing is several processing

More information

Compilation for Heterogeneous Platforms

Compilation for Heterogeneous Platforms Compilation for Heterogeneous Platforms Grid in a Box and on a Chip Ken Kennedy Rice University http://www.cs.rice.edu/~ken/presentations/heterogeneous.pdf Senior Researchers Ken Kennedy John Mellor-Crummey

More information

Thomas H. Cormen Charles E. Leiserson Ronald L. Rivest. Introduction to Algorithms

Thomas H. Cormen Charles E. Leiserson Ronald L. Rivest. Introduction to Algorithms Thomas H. Cormen Charles E. Leiserson Ronald L. Rivest Introduction to Algorithms Preface xiii 1 Introduction 1 1.1 Algorithms 1 1.2 Analyzing algorithms 6 1.3 Designing algorithms 1 1 1.4 Summary 1 6

More information

Parallel Computing. Slides credit: M. Quinn book (chapter 3 slides), A Grama book (chapter 3 slides)

Parallel Computing. Slides credit: M. Quinn book (chapter 3 slides), A Grama book (chapter 3 slides) Parallel Computing 2012 Slides credit: M. Quinn book (chapter 3 slides), A Grama book (chapter 3 slides) Parallel Algorithm Design Outline Computational Model Design Methodology Partitioning Communication

More information

Automated Mapping of Regular Communication Graphs on Mesh Interconnects

Automated Mapping of Regular Communication Graphs on Mesh Interconnects Automated Mapping of Regular Communication Graphs on Mesh Interconnects Abhinav Bhatele, Gagan Gupta, Laxmikant V. Kale and I-Hsin Chung Motivation Running a parallel application on a linear array of processors:

More information

Mapping Linear Workflows with Computation/Communication Overlap

Mapping Linear Workflows with Computation/Communication Overlap Mapping Linear Workflows with Computation/Communication Overlap 1 Kunal Agrawal 1, Anne Benoit,2 and Yves Robert 2 1 CSAIL, Massachusetts Institute of Technology, USA 2 LIP, École Normale Supérieure de

More information

Lecture 4: Principles of Parallel Algorithm Design (part 4)

Lecture 4: Principles of Parallel Algorithm Design (part 4) Lecture 4: Principles of Parallel Algorithm Design (part 4) 1 Mapping Technique for Load Balancing Minimize execution time Reduce overheads of execution Sources of overheads: Inter-process interaction

More information

A Parallel Algorithm for Relational Coarsest Partition Problems and Its Implementation

A Parallel Algorithm for Relational Coarsest Partition Problems and Its Implementation A Parallel Algorithm for Relational Coarsest Partition Problems and Its Implementation Insup Lee and S. Rajasekaran Department of Computer and Information Science University of Pennsylvania Philadelphia,

More information

About this exam review

About this exam review Final Exam Review About this exam review I ve prepared an outline of the material covered in class May not be totally complete! Exam may ask about things that were covered in class but not in this review

More information

Sorting on Linear Arrays. Xuan Guo

Sorting on Linear Arrays. Xuan Guo Sorting on Linear Arrays Xuan Guo 1 Outline Motivation & Models Sorting algorithms on linear array Sorting by Comparison Exchange Sorting by Merging Paper Reference 2 Motivation Linear array is the simplest

More information

CSIT5300: Advanced Database Systems

CSIT5300: Advanced Database Systems CSIT5300: Advanced Database Systems E10: Exercises on Query Processing Dr. Kenneth LEUNG Department of Computer Science and Engineering The Hong Kong University of Science and Technology Hong Kong SAR,

More information

Peter Pacheco. Chapter 3. Distributed Memory Programming with MPI. Copyright 2010, Elsevier Inc. All rights Reserved

Peter Pacheco. Chapter 3. Distributed Memory Programming with MPI. Copyright 2010, Elsevier Inc. All rights Reserved An Introduction to Parallel Programming Peter Pacheco Chapter 3 Distributed Memory Programming with MPI 1 Roadmap Writing your first MPI program. Using the common MPI functions. The Trapezoidal Rule in

More information

Workloads Programmierung Paralleler und Verteilter Systeme (PPV)

Workloads Programmierung Paralleler und Verteilter Systeme (PPV) Workloads Programmierung Paralleler und Verteilter Systeme (PPV) Sommer 2015 Frank Feinbube, M.Sc., Felix Eberhardt, M.Sc., Prof. Dr. Andreas Polze Workloads 2 Hardware / software execution environment

More information

Distributed Memory Programming with MPI. Copyright 2010, Elsevier Inc. All rights Reserved

Distributed Memory Programming with MPI. Copyright 2010, Elsevier Inc. All rights Reserved An Introduction to Parallel Programming Peter Pacheco Chapter 3 Distributed Memory Programming with MPI 1 Roadmap Writing your first MPI program. Using the common MPI functions. The Trapezoidal Rule in

More information

Multicast Communications. Tarik Čičić, 4. March. 2016

Multicast Communications. Tarik Čičić, 4. March. 2016 Multicast Communications Tarik Čičić, 4. March. 06 Overview One-to-many communication, why and how Algorithmic approach: Steiner trees Practical algorithms Multicast tree types Basic concepts in multicast

More information

DPHPC: Performance Recitation session

DPHPC: Performance Recitation session SALVATORE DI GIROLAMO DPHPC: Performance Recitation session spcl.inf.ethz.ch Administrativia Reminder: Project presentations next Monday 9min/team (7min talk + 2min questions) Presentations

More information

Parallel Programming in C with MPI and OpenMP

Parallel Programming in C with MPI and OpenMP Parallel Programming in C with MPI and OpenMP Michael J. Quinn Chapter 9 Document Classification Chapter Objectives Complete introduction of MPI functions Show how to implement manager-worker programs

More information

Principles of Parallel Algorithm Design: Concurrency and Mapping

Principles of Parallel Algorithm Design: Concurrency and Mapping Principles of Parallel Algorithm Design: Concurrency and Mapping John Mellor-Crummey Department of Computer Science Rice University johnmc@rice.edu COMP 422/534 Lecture 3 28 August 2018 Last Thursday Introduction

More information

IE 495 Lecture 3. Septermber 5, 2000

IE 495 Lecture 3. Septermber 5, 2000 IE 495 Lecture 3 Septermber 5, 2000 Reading for this lecture Primary Miller and Boxer, Chapter 1 Aho, Hopcroft, and Ullman, Chapter 1 Secondary Parberry, Chapters 3 and 4 Cosnard and Trystram, Chapter

More information

Document Classification Problem

Document Classification Problem Document Classification Problem Search directories, subdirectories for documents (look for.html,.txt,.tex, etc.) Using a dictionary of key words, create a profile vector for each document Store profile

More information

L3S Research Center, University of Hannover

L3S Research Center, University of Hannover , University of Hannover Dynamics of Wolf-Tilo Balke and Wolf Siberski 21.11.2007 *Original slides provided by S. Rieche, H. Niedermayer, S. Götz, K. Wehrle (University of Tübingen) and A. Datta, K. Aberer

More information

The PRAM (Parallel Random Access Memory) model. All processors operate synchronously under the control of a common CPU.

The PRAM (Parallel Random Access Memory) model. All processors operate synchronously under the control of a common CPU. The PRAM (Parallel Random Access Memory) model All processors operate synchronously under the control of a common CPU. The PRAM (Parallel Random Access Memory) model All processors operate synchronously

More information

The PRAM model. A. V. Gerbessiotis CIS 485/Spring 1999 Handout 2 Week 2

The PRAM model. A. V. Gerbessiotis CIS 485/Spring 1999 Handout 2 Week 2 The PRAM model A. V. Gerbessiotis CIS 485/Spring 1999 Handout 2 Week 2 Introduction The Parallel Random Access Machine (PRAM) is one of the simplest ways to model a parallel computer. A PRAM consists of

More information

Contents. Preface. About the Authors BASIC TECHNIQUES CHAPTER 1 PARALLEL COMPUTERS. l. 1 The Demand for Computational Speed 3

Contents. Preface. About the Authors BASIC TECHNIQUES CHAPTER 1 PARALLEL COMPUTERS. l. 1 The Demand for Computational Speed 3 Preface About the Authors PARTI BASIC TECHNIQUES CHAPTER 1 PARALLEL COMPUTERS l. 1 The Demand for Computational Speed 3 1.2 Potential for Increased Computational Speed 6 Speedup Factor 6 What Is the Maximum

More information

An Introduction to Parallel Programming

An Introduction to Parallel Programming F 'C 3 R'"'C,_,. HO!.-IJJ () An Introduction to Parallel Programming Peter S. Pacheco University of San Francisco ELSEVIER AMSTERDAM BOSTON HEIDELBERG LONDON NEW YORK OXFORD PARIS SAN DIEGO SAN FRANCISCO

More information

Algorithm Engineering with PRAM Algorithms

Algorithm Engineering with PRAM Algorithms Algorithm Engineering with PRAM Algorithms Bernard M.E. Moret moret@cs.unm.edu Department of Computer Science University of New Mexico Albuquerque, NM 87131 Rome School on Alg. Eng. p.1/29 Measuring and

More information

Non-Uniform Memory Access (NUMA) Architecture and Multicomputers

Non-Uniform Memory Access (NUMA) Architecture and Multicomputers Non-Uniform Memory Access (NUMA) Architecture and Multicomputers Parallel and Distributed Computing Department of Computer Science and Engineering (DEI) Instituto Superior Técnico February 29, 2016 CPD

More information

Parallel Programs. EECC756 - Shaaban. Parallel Random-Access Machine (PRAM) Example: Asynchronous Matrix Vector Product on a Ring

Parallel Programs. EECC756 - Shaaban. Parallel Random-Access Machine (PRAM) Example: Asynchronous Matrix Vector Product on a Ring Parallel Programs Conditions of Parallelism: Data Dependence Control Dependence Resource Dependence Bernstein s Conditions Asymptotic Notations for Algorithm Analysis Parallel Random-Access Machine (PRAM)

More information

14 More Graphs: Euler Tours and Hamilton Cycles

14 More Graphs: Euler Tours and Hamilton Cycles 14 More Graphs: Euler Tours and Hamilton Cycles 14.1 Degrees The degree of a vertex is the number of edges coming out of it. The following is sometimes called the First Theorem of Graph Theory : Lemma

More information

CS691/SC791: Parallel & Distributed Computing

CS691/SC791: Parallel & Distributed Computing CS691/SC791: Parallel & Distributed Computing Introduction to OpenMP Part 2 1 OPENMP: SORTING 1 Bubble Sort Serial Odd-Even Transposition Sort 2 Serial Odd-Even Transposition Sort First OpenMP Odd-Even

More information

ECE 574 Cluster Computing Lecture 13

ECE 574 Cluster Computing Lecture 13 ECE 574 Cluster Computing Lecture 13 Vince Weaver http://web.eece.maine.edu/~vweaver vincent.weaver@maine.edu 21 March 2017 Announcements HW#5 Finally Graded Had right idea, but often result not an *exact*

More information

EE/CSCI 451: Parallel and Distributed Computation

EE/CSCI 451: Parallel and Distributed Computation EE/CSCI 451: Parallel and Distributed Computation Lecture #11 2/21/2017 Xuehai Qian Xuehai.qian@usc.edu http://alchem.usc.edu/portal/xuehaiq.html University of Southern California 1 Outline Midterm 1:

More information

Lecture Topics. Announcements. Today: Advanced Scheduling (Stallings, chapter ) Next: Deadlock (Stallings, chapter

Lecture Topics. Announcements. Today: Advanced Scheduling (Stallings, chapter ) Next: Deadlock (Stallings, chapter Lecture Topics Today: Advanced Scheduling (Stallings, chapter 10.1-10.4) Next: Deadlock (Stallings, chapter 6.1-6.6) 1 Announcements Exam #2 returned today Self-Study Exercise #10 Project #8 (due 11/16)

More information

Implementation and evaluation of 3D FFT parallel algorithms based on software component model

Implementation and evaluation of 3D FFT parallel algorithms based on software component model Master 2 - Visualisation Image Performance University of Orléans (2013-2014) Implementation and evaluation of 3D FFT parallel algorithms based on software component model Jérôme RICHARD October 7th 2014

More information

Fundamental Algorithms

Fundamental Algorithms Fundamental Algorithms Chapter 6: Parallel Algorithms The PRAM Model Jan Křetínský Winter 2017/18 Chapter 6: Parallel Algorithms The PRAM Model, Winter 2017/18 1 Example: Parallel Sorting Definition Sorting

More information

Exam Design and Analysis of Algorithms for Parallel Computer Systems 9 15 at ÖP3

Exam Design and Analysis of Algorithms for Parallel Computer Systems 9 15 at ÖP3 UMEÅ UNIVERSITET Institutionen för datavetenskap Lars Karlsson, Bo Kågström och Mikael Rännar Design and Analysis of Algorithms for Parallel Computer Systems VT2009 June 2, 2009 Exam Design and Analysis

More information

Sorting Algorithms. Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar

Sorting Algorithms. Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar Sorting Algorithms Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar To accompany the text ``Introduction to Parallel Computing'', Addison Wesley, 2003. Topic Overview Issues in Sorting on Parallel

More information

2, 3, 5, 7, 11, 17, 19, 23, 29, 31

2, 3, 5, 7, 11, 17, 19, 23, 29, 31 148 Chapter 12 Indexing and Hashing implementation may be by linking together fixed size buckets using overflow chains. Deletion is difficult with open hashing as all the buckets may have to inspected

More information
