Parallel algorithms at ENS Lyon

Similar documents
Fundamentals of. Parallel Computing. Sanjay Razdan. Alpha Science International Ltd. Oxford, U.K.

Parallel Algorithms for (PRAM) Computers & Some Parallel Algorithms. Reference : Horowitz, Sahni and Rajasekaran, Computer Algorithms

Contents. Preface xvii Acknowledgments. CHAPTER 1 Introduction to Parallel Computing 1. CHAPTER 2 Parallel Programming Platforms 11

Algorithms and Applications

EE/CSCI 451 Spring 2018 Homework 8 Total Points: [10 points] Explain the following terms: EREW PRAM CRCW PRAM. Brent s Theorem.

: Parallel Algorithms Exercises, Batch 1. Exercise Day, Tuesday 18.11, 10:00. Hand-in before or at Exercise Day

CS 598: Communication Cost Analysis of Algorithms Lecture 15: Communication-optimal sorting and tree-based algorithms

Introduction to Parallel Computing

Mapping pipeline skeletons onto heterogeneous platforms

Basic Communication Ops

Basic Communication Operations (Chapter 4)

Lecture 3: Sorting 1

CS256 Applied Theory of Computation

Scheduling Tasks Sharing Files from Distributed Repositories

Parallel Systems Course: Chapter VIII. Sorting Algorithms. Kumar Chapter 9. Jan Lemeire ETRO Dept. November Parallel Sorting

Sorting (Chapter 9) Alexandre David B2-206

Parallel Systems Course: Chapter VIII. Sorting Algorithms. Kumar Chapter 9. Jan Lemeire ETRO Dept. Fall Parallel Sorting

CSE Introduction to Parallel Processing. Chapter 5. PRAM and Basic Algorithms

Sorting (Chapter 9) Alexandre David B2-206

Sorting Algorithms. - rearranging a list of numbers into increasing (or decreasing) order. Potential Speedup

Complexity and Advanced Algorithms Monsoon Parallel Algorithms Lecture 2

Principle Of Parallel Algorithm Design (cont.) Alexandre David B2-206

Complexity results for throughput and latency optimization of replicated and data-parallel workflows

A New Parallel Matrix Multiplication Algorithm on Tree-Hypercube Network using IMAN1 Supercomputer

COMP Parallel Computing. PRAM (4) PRAM models and complexity

The PRAM Model. Alexandre David

Parallel Models. Hypercube Butterfly Fully Connected Other Networks Shared Memory v.s. Distributed Memory SIMD v.s. MIMD

CSE : PARALLEL SOFTWARE TOOLS

Lecture 8 Parallel Algorithms II

Complexity Results for Throughput and Latency Optimization of Replicated and Data-parallel Workflows

Advanced Computer Architecture. The Architecture of Parallel Computers

Lecture outline. Graph coloring Examples Applications Algorithms

Dynamo. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Motivation System Architecture Evaluation

Parallel Longest Increasing Subsequences in Scalable Time and Memory

Parallel Sorting Algorithms

CSCE 750, Spring 2001 Notes 3 Page Symmetric Multi Processors (SMPs) (e.g., Cray vector machines, Sun Enterprise with caveats) Many processors

Principles of Parallel Algorithm Design: Concurrency and Mapping

Design of Parallel Algorithms. Models of Parallel Computation

Algorithms & Data Structures 2

Scheduling on clusters and grids

What is Parallel Computing?

Compilation for Heterogeneous Platforms

Thomas H. Cormen Charles E. Leiserson Ronald L. Rivest. Introduction to Algorithms

Parallel Computing. Slides credit: M. Quinn book (chapter 3 slides), A Grama book (chapter 3 slides)

Automated Mapping of Regular Communication Graphs on Mesh Interconnects

Mapping Linear Workflows with Computation/Communication Overlap

Lecture 4: Principles of Parallel Algorithm Design (part 4)

A Parallel Algorithm for Relational Coarsest Partition Problems and Its Implementation

About this exam review

Sorting on Linear Arrays. Xuan Guo

CSIT5300: Advanced Database Systems

Peter Pacheco. Chapter 3. Distributed Memory Programming with MPI. Copyright 2010, Elsevier Inc. All rights Reserved

Workloads Programmierung Paralleler und Verteilter Systeme (PPV)

Distributed Memory Programming with MPI. Copyright 2010, Elsevier Inc. All rights Reserved

Multicast Communications. Tarik Čičić, 4. March. 2016

DPHPC: Performance Recitation session

Parallel Programming in C with MPI and OpenMP

Principles of Parallel Algorithm Design: Concurrency and Mapping

IE 495 Lecture 3. Septermber 5, 2000

Document Classification Problem

L3S Research Center, University of Hannover

The PRAM (Parallel Random Access Memory) model. All processors operate synchronously under the control of a common CPU.

The PRAM model. A. V. Gerbessiotis CIS 485/Spring 1999 Handout 2 Week 2

Contents. Preface. About the Authors BASIC TECHNIQUES CHAPTER 1 PARALLEL COMPUTERS. l. 1 The Demand for Computational Speed 3

An Introduction to Parallel Programming

Algorithm Engineering with PRAM Algorithms

Non-Uniform Memory Access (NUMA) Architecture and Multicomputers

Parallel Programs. EECC756 - Shaaban. Parallel Random-Access Machine (PRAM) Example: Asynchronous Matrix Vector Product on a Ring

14 More Graphs: Euler Tours and Hamilton Cycles

CS691/SC791: Parallel & Distributed Computing

ECE 574 Cluster Computing Lecture 13

EE/CSCI 451: Parallel and Distributed Computation

Lecture Topics. Announcements. Today: Advanced Scheduling (Stallings, chapter ) Next: Deadlock (Stallings, chapter

Implementation and evaluation of 3D FFT parallel algorithms based on software component model

Fundamental Algorithms

Exam Design and Analysis of Algorithms for Parallel Computer Systems 9 15 at ÖP3

Sorting Algorithms. Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar

2, 3, 5, 7, 11, 17, 19, 23, 29, 31

Centralized versus distributed schedulers for multiple bag-of-task applications

A Level-wise Priority Based Task Scheduling for Heterogeneous Systems

LooPo: Automatic Loop Parallelization

The Power of Streams on the SRC MAP. Wim Bohm Colorado State University. RSS!2006 Copyright 2006 SRC Computers, Inc. ALL RIGHTS RESERVED.

INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR Stamp / Signature of the Invigilator

Parallel Models RAM. Parallel RAM aka PRAM. Variants of CRCW PRAM. Advanced Algorithms

Parallel Random-Access Machines

Lecture 6: Overlay Networks. CS 598: Advanced Internetworking Matthew Caesar February 15, 2011

Matrix multiplication

15-750: Parallel Algorithms

Parallel Numerics, WT 2013/ Introduction

OBJECT ORIENTED DATA STRUCTURE & ALGORITHMS

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Copyright The McGraw-Hill Companies, Inc. Permission required for reproduction or display. Chapter 9

CSC630/CSC730 Parallel & Distributed Computing

CSC 447: Parallel Programming for Multi- Core and Cluster Systems

Scalable Algorithmic Techniques Decompositions & Mapping. Alexandre David

Enhancing Parallelism

Load-Balancing Iterative Computations on Heterogeneous Clusters with Shared Communication Links

Parallel Programming. Functional Decomposition (Document Classification)

High-Performance Parallel Database Processing and Grid Databases

Porting Scientific Research Codes to GPUs with CUDA Fortran: Incompressible Fluid Dynamics using the Immersed Boundary Method

Transcription:

Parallel algorithms at ENS Lyon Yves Robert Ecole Normale Supérieure de Lyon & Institut Universitaire de France TCPP Workshop February 2010 Yves.Robert@ens-lyon.fr February 2010 Parallel algorithms 1/ 10

Outline 1 Scope 2 Yves.Robert@ens-lyon.fr February 2010 Parallel algorithms 2/ 10

Scope Follow-on of classic CLRS-based algorithm Objective: Apprehend the complexity of parallel algorithms Focus is on models and algorithms Provides a sound basis for parallel programming Not a HPC course Yves.Robert@ens-lyon.fr February 2010 Parallel algorithms 3/ 10

Organization 16 weeks Each week = 2h class + 2h supervised exercises (or programming sessions) MPI project Midterm and final exam Yves.Robert@ens-lyon.fr February 2010 Parallel algorithms 4/ 10

Outline 1 Scope 2 Yves.Robert@ens-lyon.fr February 2010 Parallel algorithms 5/ 10

Models (4 weeks) Sorting networks Odd-even merge sort, 0-1 principle Odd-even transposition sort Odd-even sorting on a 1D network (work optimal) PRAM Models (EREW, CREW, CRCW) Pointer jumping (list ranking, prefix, Euler tour) Performance evaluation Cost, work, speedup and efficiency, Brent s theorem Comparison of PRAM models Model separation, simulation theorem Sorting machine Merge, sorting trees, complexity and correctness Relevance of the PRAM model Yves.Robert@ens-lyon.fr February 2010 Parallel algorithms 6/ 10

Networking (3 weeks) Interconnection networks Static and dynamic topologies Communication models Point-to-point communication protocols Case study: the unidirectional ring Broadcast, scatter, all-to-all, pipelined broadcast Case study: the hypercube Labeling vertices, paths and routing Embedding rings and grids Collective communications Peer-to-peer computing Distributed hash tables and structured overlay networks Chord, Plaxton s routing algorithm Multi-casting in a distributed hash table Yves.Robert@ens-lyon.fr February 2010 Parallel algorithms 7/ 10

Algorithms on a processor ring (2 weeks) Matrix-vector multiplication Matrix-matrix multiplication First look at stencil applications LU factorization Basic version, pipelining on the ring, look-ahead algorithm Second look at stencil applications Granularity, overlap, mapping, dependencies Implementing logical topologies Distributed vs. centralized implementations Summary of algorithmic principles Yves.Robert@ens-lyon.fr February 2010 Parallel algorithms 8/ 10

Processor grids and load balancing (3 weeks) Logical 2-D grid topologies Matrix multiplication on processor grids Outer-product algorithm Grid vs. ring? Three matrix multiplication algorithms 2-D block cyclic data distribution Load balancing for heterogeneous platforms Load balancing for 1-D data distributions Static vs. incremental allocation algorithm Application to stencils and LU factorization Load balancing for 2-D data distributions Matrix multiplication on a heterogeneous grid Hardness of the 2-D data partitioning problem Yves.Robert@ens-lyon.fr February 2010 Parallel algorithms 9/ 10

Scheduling and loop parallelization (4 weeks) Where do task graphs come from? Solving Pb( ) Solving Pb(p) NP-completeness of Pb(p), list schedules, Graham s bound and critical paths Approximation algorithms for independent tasks Taking Communication Costs Into Account NP-completeness of Pb( ), guaranteed heuristics List heuristics for Pb(p) HEFT (extension to heterogeneous platforms) Scheduling at Compile-Time Dependence levels and Kennedy-Allen algorithm Dependence vectors and Lamport s hyperplane method Uniform loop nests and unimodular space-time transformations Yves.Robert@ens-lyon.fr February 2010 Parallel algorithms 10/ 10