Parallel Models. Hypercube Butterfly Fully Connected Other Networks Shared Memory v.s. Distributed Memory SIMD v.s. MIMD


Parallel Algorithms

Parallel Models
- Hypercube
- Butterfly
- Fully connected
- Other networks
- Shared memory vs. distributed memory
- SIMD vs. MIMD

The PRAM Model (Parallel Random Access Machine)
- All processors act in lock-step.
- The number of processors is not limited.
- Each processor has its own local memory.
- One global memory is accessible to all processors.
- Processors share data only by reading and writing global memory.

A PRAM Algorithm
Every processor knows its own index (usually held in the variable i).
Vector sum:
  Read M[i] into x;
  Read M[i+n] into y;
  x := x + y;
  Write x into M[i];
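To make the lock-step semantics concrete, here is a minimal Python simulation of the vector sum. It is a sketch, not part of the original notes: the name `pram_vector_sum` is hypothetical, and indices are 0-based, so Python processor `i` plays the role of processor `i+1` on the slide. All reads of a step happen before any write, which is what synchronous execution guarantees.

```python
def pram_vector_sum(M: list[int], n: int) -> list[int]:
    """Simulate n processors adding the vectors stored in M[0:n] and M[n:2n]."""
    assert len(M) == 2 * n
    # Read phase: every processor reads before anyone writes (lock-step).
    xs = [M[i] for i in range(n)]
    ys = [M[i + n] for i in range(n)]
    # Compute and write phase: processor i writes only its own cell M[i].
    new_M = list(M)
    for i in range(n):
        new_M[i] = xs[i] + ys[i]
    return new_M


print(pram_vector_sum([1, 2, 3, 10, 20, 30], 3))  # [11, 22, 33, 10, 20, 30]
```

Each processor touches distinct cells, so this step is legal even on an EREW PRAM.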

Binary Fan-In
  Read M[i] into Largest;
  Write M[i] into M[i+n];
  Delta := 1;
  For k := 1 to lg n
    Read M[i+Delta] into x;
    Largest := Maximum(x, Largest);
    Write Largest into M[i];
    Delta := Delta * 2;
  End For
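A minimal Python simulation of the binary fan-in, under stated assumptions: `pram_fan_in_max` is a hypothetical name, indices are 0-based, and the loop runs ⌈lg n⌉ rounds so that n need not be a power of two. Each round first collects every processor's read and then applies all writes, mirroring lock-step execution. After the last round, the first cell (M[1] on the slide) holds the maximum.

```python
def pram_fan_in_max(values: list[int]) -> int:
    """Simulate the binary fan-in maximum with n = len(values) processors."""
    n = len(values)
    # Global memory of 2n cells; the upper half is a copy of the input, so
    # reads that run past index n-1 still see genuine input values.
    M = list(values) + list(values)
    largest = list(values)  # local register "Largest" of each processor
    delta = 1
    for _ in range((n - 1).bit_length()):  # ceil(lg n) rounds
        # Read phase: all processors read before any processor writes.
        xs = [M[i + delta] for i in range(n)]
        # Compute and write phase.
        for i in range(n):
            largest[i] = max(xs[i], largest[i])
            M[i] = largest[i]
        delta *= 2
    return M[0]


print(pram_fan_in_max([3, 9, 2, 7, 5]))  # 9
```

Copying the input into the upper half (rather than zeros) is harmless for the maximum, because duplicates of input values can never exceed the true maximum.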

Parallel Addition
  Read M[i] into Total;
  Write 0 into M[i+n];
  Delta := 1;
  For k := 1 to lg n
    Read M[i+Delta] into x;
    Total := x + Total;
    Write Total into M[i];
    Delta := Delta * 2;
  End For
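The same doubling pattern, sketched in Python for addition (hypothetical name `pram_parallel_sum`, 0-based indices, ⌈lg n⌉ rounds). Here the upper half is zero-filled, so reads past the end contribute nothing. As a consequence, each processor finishes with the sum of its own suffix of the input, and the first cell holds the total.

```python
def pram_parallel_sum(values: list[int]) -> list[int]:
    """Simulate parallel addition; returns M[0:n] after the last round."""
    n = len(values)
    # Upper half zero-filled so that reads past the input add nothing.
    M = list(values) + [0] * n
    total = list(values)  # local register "Total" of each processor
    delta = 1
    for _ in range((n - 1).bit_length()):  # ceil(lg n) rounds
        xs = [M[i + delta] for i in range(n)]  # read phase
        for i in range(n):                     # compute and write phase
            total[i] = xs[i] + total[i]
            M[i] = total[i]
        delta *= 2
    return M[:n]


print(pram_parallel_sum([1, 2, 3, 4, 5]))  # [15, 14, 12, 9, 5]
```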

Pointer Jumping
  Read M[i] into Total;
  For k := 1 to lg n
    Read Next[i] into Ptr;
    If Ptr ≠ 0 Then
      Read M[Ptr] into x;
      Total := Total + x;
      Write Total into M[i];
      Read Next[Ptr] into NewPtr;
      Write NewPtr into Next[i];
    End If
  End For

Initialization of Next[i]
  If i = n Then
    Write 0 into Next[i];
  Else
    Write i+1 into Next[i];
  End If
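A minimal Python sketch of pointer jumping together with the initialization of `Next`. It keeps the slides' 1-based indexing (slot 0 of each array is unused) and uses 0 as the nil pointer; the function name is hypothetical. Reads in each round use a snapshot of the previous round, so the simulation matches lock-step execution. After ⌈lg n⌉ rounds, node i holds the sum of the values from i to the end of the list.

```python
def pram_list_suffix_sums(values: list[int]) -> list[int]:
    """Pointer jumping on the list 1 -> 2 -> ... -> n (0 = nil)."""
    n = len(values)
    M = [0] + list(values)
    # Initialization of Next[i]: node n points to nil, every other node to i+1.
    Next = [0] + [0 if i == n else i + 1 for i in range(1, n + 1)]
    total = list(M)  # local register "Total" of each processor
    for _ in range((n - 1).bit_length()):  # ceil(lg n) rounds
        # All reads see the state left by the previous round.
        old_M, old_next = list(M), list(Next)
        for i in range(1, n + 1):
            ptr = old_next[i]
            if ptr != 0:
                total[i] += old_M[ptr]
                M[i] = total[i]
                Next[i] = old_next[ptr]  # jump over the successor
    return M[1:]


print(pram_list_suffix_sums([1, 2, 3, 4, 5]))  # [15, 14, 12, 9, 5]
```

Because every node has a distinct successor, no two processors read the same cell in a round, so this runs on an EREW PRAM.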

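Calculate Node Depth
Each tree node is represented by three list elements holding the values 1, 0 and -1.
1. If there is a left child: the node's 1 element points to the 1 element of the left child, and the left child's -1 element points to the node's 0 element.
2. If there is no left child: the node's 1 element points to its own 0 element.
3. If there is a right child: the node's 0 element points to the 1 element of the right child, and the right child's -1 element points to the node's -1 element.
4. If there is no right child: the node's 0 element points to its own -1 element.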
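These linking rules produce an Euler tour of the tree. The Python sketch below assumes a hypothetical representation (a dict mapping each node to its `(left, right)` children, `None` for a missing child) and two conventions the slides do not state: the root's -1 element ends the tour, and the root has depth 0. It links the elements by the four rules and then computes inclusive prefix sums along the tour by pointer jumping on predecessor links. Because each subtree contributes +1 on entry and -1 on exit, the prefix sum at a node's 1 element counts the nodes on the path from the root to that node.

```python
from typing import Optional

# Hypothetical representation: node -> (left child, right child), None if absent.
Tree = dict[int, tuple[Optional[int], Optional[int]]]
Element = tuple[int, int]  # (node, value) with value in {1, 0, -1}


def node_depths(tree: Tree, root: int) -> dict[int, int]:
    # Record which side of its parent each child hangs on.
    parent_side: dict[int, tuple[int, str]] = {}
    for v, (left, right) in tree.items():
        if left is not None:
            parent_side[left] = (v, "L")
        if right is not None:
            parent_side[right] = (v, "R")

    succ: dict[Element, Optional[Element]] = {}
    for v, (left, right) in tree.items():
        # Rules 1 and 2: the 1 element goes to the left child, else to our 0.
        succ[(v, 1)] = (left, 1) if left is not None else (v, 0)
        # Rules 3 and 4: the 0 element goes to the right child, else to our -1.
        succ[(v, 0)] = (right, 1) if right is not None else (v, -1)
        if v == root:
            succ[(v, -1)] = None  # assumption: the tour ends at the root's -1
        else:
            p, side = parent_side[v]
            # A left child's -1 returns to the parent's 0; a right child's to its -1.
            succ[(v, -1)] = (p, 0) if side == "L" else (p, -1)

    # Reverse the links, then compute inclusive prefix sums by pointer jumping.
    pred: dict[Element, Optional[Element]] = {e: None for e in succ}
    for e, s in succ.items():
        if s is not None:
            pred[s] = e
    total = {e: e[1] for e in succ}
    for _ in range((len(succ) - 1).bit_length()):  # ceil(lg length) rounds
        old_total, old_pred = dict(total), dict(pred)  # lock-step snapshot
        for e, p in old_pred.items():
            if p is not None:
                total[e] = old_total[e] + old_total[p]
                pred[e] = old_pred[p]
    # The prefix sum at (v, 1) counts the nodes on the root-to-v path.
    return {v: total[(v, 1)] - 1 for v in tree}


tree = {1: (2, 3), 2: (4, None), 3: (None, None), 4: (None, None)}
print(node_depths(tree, 1))  # {1: 0, 2: 1, 3: 1, 4: 2}
```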

Concurrent Reads & Writes
- EREW: Exclusive Read, Exclusive Write
- CREW: Concurrent Read, Exclusive Write
- CRCW: Concurrent Read, Concurrent Write, with a rule for resolving write conflicts:
  - Common: all processors writing to the same cell must write the same value.
  - Priority: the highest-priority processor wins the contest.
- CREW is more powerful than EREW; CRCW is more powerful than CREW.
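The write rules can be illustrated with a small Python helper that resolves all writes issued in one synchronous step. It is a sketch with hypothetical names; the rule strings and the convention that a lower processor index means higher priority are assumptions, not taken from the notes.

```python
from collections import defaultdict


def resolve_writes(requests: list[tuple[int, int, int]], rule: str) -> dict[int, int]:
    """Resolve one step's writes, given (processor, address, value) triples.

    rule: "exclusive" (EREW/CREW), "common" or "priority" (CRCW variants).
    Returns the value that ends up at each written address.
    """
    if rule not in ("exclusive", "common", "priority"):
        raise ValueError(f"unknown rule {rule!r}")
    by_addr: dict[int, list[tuple[int, int]]] = defaultdict(list)
    for proc, addr, value in requests:
        by_addr[addr].append((proc, value))
    result = {}
    for addr, writers in by_addr.items():
        if rule == "exclusive" and len(writers) > 1:
            raise RuntimeError(f"illegal concurrent write to address {addr}")
        if rule == "common":
            values = {value for _, value in writers}
            if len(values) > 1:
                raise RuntimeError(f"common-CRCW writers disagree at address {addr}")
            result[addr] = values.pop()
        else:
            # Priority (or a lone exclusive writer): assume the lowest
            # processor index has the highest priority, so it wins.
            result[addr] = min(writers)[1]
    return result
```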

Finding Max
Square array of processors indexed by (i, j):
  Write True into R[i];
  Read M[i] into x;
  Read M[j] into y;
  If x < y Then
    Write False into R[i];
  Else If y < x Then
    Write False into R[j];
  End If
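A sequential Python simulation of this maximum algorithm, with a hypothetical function name. Every processor (i, j) does a fixed amount of work, so with n² processors the algorithm takes a constant number of steps. The concurrent writes are legal on a common CRCW PRAM because all writers to a cell write the same value (True in the first step, False in the second).

```python
def crcw_max_winners(M: list[int]) -> list[bool]:
    """Return R, where R[i] is True exactly when M[i] is a maximum."""
    n = len(M)
    # Step 1: every processor (i, j) writes True into R[i]; all agree.
    R = [True] * n
    # Step 2: processor (i, j) compares M[i] with M[j] and knocks out the loser.
    losers = []
    for i in range(n):
        for j in range(n):
            x, y = M[i], M[j]
            if x < y:
                losers.append(i)
            elif y < x:
                losers.append(j)
    # All writes in this step carry False, so concurrent writes agree.
    for k in losers:
        R[k] = False
    return R


M = [3, 9, 2, 9]
R = crcw_max_winners(M)
print(R, M[R.index(True)])  # [False, True, False, True] 9
```

If several cells hold the maximum, each of them keeps R[i] = True, so any True entry identifies a maximum.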

CRCW vs. CREW
- CRCW max runs in constant time.
- CREW max runs in lg n time.
- A CRCW PRAM can be at most a factor of O(lg p) faster than an EREW PRAM with the same number of processors p.

EREW vs. CREW
Finding roots by shortcutting pointers:
- CREW runs in lg lg n time.
- EREW runs in lg n time.
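Finding roots by shortcutting pointers can be sketched as repeated pointer jumping on a forest stored as a parent array, where a root points to itself (an assumed convention; the function name is hypothetical). Many nodes may read the same pointer in one round, which is why this form needs concurrent reads. The loop test is a sequential convenience; the round counter shows that the number of rounds grows with the logarithm of the tree height.

```python
def find_roots(parent: list[int]) -> tuple[list[int], int]:
    """parent[v] == v marks a root. Returns (root of every node, rounds used)."""
    root = list(parent)
    rounds = 0
    # Stop once every node points directly at a node that points to itself.
    while any(root[root[v]] != root[v] for v in range(len(root))):
        # Lock-step: every node reads its target's pointer from the previous round.
        root = [root[root[v]] for v in range(len(root))]
        rounds += 1
    return root, rounds


print(find_roots([0, 0, 1, 2, 3, 4, 5, 6]))  # ([0, 0, 0, 0, 0, 0, 0, 0], 3)
```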

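Optimal Parallel Algorithms
- NC: the class of problems solvable in polylogarithmic time, $O(\log^m n)$, using a polynomial number of processors, $O(n^k)$, for constants $m$ and $k$.
- On a CREW PRAM, general Boolean functions (for example, the OR of n bits) cannot be computed faster than $\Omega(\lg n)$ time.
- Hence $\Theta(\lg n)$ time is optimal for computing the sum of n integers on these models.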
