Example of usage of Prefix Sum Compacting an Array. Example of usage of Prexix Sum Compacting an Array

Similar documents
CSL 730: Parallel Programming

CSL 860: Modern Parallel

CS256 Applied Theory of Computation

CSL 730: Parallel Programming. Algorithms

CS 223: Data Structures and Programming Techniques. Exam 2

Complexity and Advanced Algorithms Monsoon Parallel Algorithms Lecture 2

Lesson 1 4. Prefix Sum Definitions. Scans. Parallel Scans. A Naive Parallel Scans

Lecture 18. Today, we will discuss developing algorithms for a basic model for parallel computing the Parallel Random Access Machine (PRAM) model.

: Parallel Algorithms Exercises, Batch 1. Exercise Day, Tuesday 18.11, 10:00. Hand-in before or at Exercise Day

List Ranking. Chapter 4

List Ranking. Chapter 4

The PRAM Model. Alexandre David

INDIAN INSTITUTE OF TECHNOLOGY KHARAGPUR Stamp / Signature of the Invigilator

Real parallel computers

Introduction to the Analysis of Algorithms. Algorithm

Parallel scan on linked lists

EE 3613: Computer Organization Homework #2

Hypercubes. (Chapter Nine)

Parallel Algorithms for (PRAM) Computers & Some Parallel Algorithms. Reference : Horowitz, Sahni and Rajasekaran, Computer Algorithms

Lesson 1 1 Introduction

March 3, George Mason University Sorting Networks. Indranil Banerjee. Parallel Sorting: Hardware Level Parallelism

1. (a) O(log n) algorithm for finding the logical AND of n bits with n processors

CSL 860: Modern Parallel

The PRAM model. A. V. Gerbessiotis CIS 485/Spring 1999 Handout 2 Week 2

COP 3502 (Computer Science I) Final Exam 12/10/2015. Last Name:, First Name:

CS 179: GPU Programming. Lecture 7

Fast Sorting and Selection. A Lower Bound for Worst Case

Paradigms for Parallel Algorithms

We will focus on data dependencies: when an operand is written at some point and read at a later point. Example:!

Mark Redekopp, All rights reserved. EE 352 Unit 8. HW Constructs

15-853:Algorithms in the Real World. Outline. Parallelism: Lecture 1 Nested parallelism Cost model Parallel techniques and algorithms

Lecture 19: Arithmetic Modules 14-1

Synchronous Computations

Parallel Sorting Algorithms

Introduction to Computers and Programming. Today

CS 170 DISCUSSION 8 DYNAMIC PROGRAMMING. Raymond Chan raychan3.github.io/cs170/fa17.html UC Berkeley Fall 17

Algorithms and Applications

Parallel and Sequential Data Structures and Algorithms Lecture (Spring 2012) Lecture 25 Suffix Arrays

EE/CSCI 451 Spring 2018 Homework 8 Total Points: [10 points] Explain the following terms: EREW PRAM CRCW PRAM. Brent s Theorem.

CSE 613: Parallel Programming. Lecture 11 ( Graph Algorithms: Connected Components )

CSCI 104 Log Structured Merge Trees. Mark Redekopp

CS302 Topic: Algorithm Analysis #2. Thursday, Sept. 21, 2006

Optimal Parallel Randomized Renaming

Scan and its Uses. 1 Scan. 1.1 Contraction CSE341T/CSE549T 09/17/2014. Lecture 8

COMP Parallel Computing. PRAM (2) PRAM algorithm design techniques

Algorithm Analysis. Spring Semester 2007 Programming and Data Structure 1

CS252 Graduate Computer Architecture Midterm 1 Solutions

Last Lecture: Adder Examples

15-750: Parallel Algorithms

Parallel Models RAM. Parallel RAM aka PRAM. Variants of CRCW PRAM. Advanced Algorithms

Data Structures and Algorithms for Engineers

COMP4128 Programming Challenges

Introduction to OpenMP. OpenMP basics OpenMP directives, clauses, and library routines

CSE030 Fall 2012 Final Exam Friday, December 14, PM

UNIT-2. Problem of size n. Sub-problem 1 size n/2. Sub-problem 2 size n/2. Solution to the original problem

Parallel Scanning. University of Western Ontario, London, Ontario (Canada) Marc Moreno Maza CS2101

Parallel Sorting. Sathish Vadhiyar

ECE250: Algorithms and Data Structures AVL Trees (Part A)

CSci 231 Homework 7. Red Black Trees. CLRS Chapter 13 and 14

Suffix Trees. Martin Farach-Colton Rutgers University & Tokutek, Inc

Writing Parallel Programs; Cost Model.

Sorting Pearson Education, Inc. All rights reserved.

BST Deletion. First, we need to find the value which is easy because we can just use the method we developed for BST_Search.

CIS 194: Homework 6. Due Monday, February 25. Fibonacci numbers

COMP 322: Fundamentals of Parallel Programming. Lecture 22: Parallelism in Java Streams, Parallel Prefix Sums

Appendix A: Source Code

08 A: Sorting III. CS1102S: Data Structures and Algorithms. Martin Henz. March 10, Generated on Tuesday 9 th March, 2010, 09:58

Tutorial 6-7. Dynamic Programming and Greedy

CSci 231 Homework 7. Red Black Trees. CLRS Chapter 13 and 14

A Review of Various Adders for Fast ALU

2008 The McGraw-Hill Companies, Inc. All rights reserved.

Dr. Joe Zhang PDC-3: Parallel Platforms

6.886: Algorithm Engineering

Topic B (Cont d) Dataflow Model of Computation

CSE 638: Advanced Algorithms. Lectures 10 & 11 ( Parallel Connected Components )

Arithmetic Circuits. Nurul Hazlina Adder 2. Multiplier 3. Arithmetic Logic Unit (ALU) 4. HDL for Arithmetic Circuit

Parallel Computing: Parallel Algorithm Design Examples Jin, Hai

1 Short Answer (15 Points Each)

each processor can in one step do a RAM op or read/write to one global memory location

Lecture: Pipelining Basics

ECE608 - Chapter 15 answers

CSE 613: Parallel Programming. Lecture 6 ( Basic Parallel Algorithmic Techniques )

Sorting Goodrich, Tamassia Sorting 1

EE 4683/5683: COMPUTER ARCHITECTURE

COMP171. AVL-Trees (Part 1)

CMPS 102 Solutions to Homework 6

THE EULER TOUR TECHNIQUE: EVALUATION OF TREE FUNCTIONS

DATA STRUCTURES AND ALGORITHMS

IS 709/809: Computational Methods in IS Research. Algorithm Analysis (Sorting)

/463 Algorithms - Fall 2013 Solution to Assignment 3

UNIVERSITY OF MASSACHUSETTS Dept. of Electrical & Computer Engineering. Digital Computer Arithmetic ECE 666

ECE331: Hardware Organization and Design

Parallel Programming. Easy Cases: Data Parallelism

THE EULER TOUR TECHNIQUE: EVALUATION OF TREE FUNCTIONS

Linear Work Suffix Array Construction

Transportation problem

CS558 Programming Languages

Reading Assignment. Lazy Evaluation

Lecture 19 Sorting Goodrich, Tamassia

1. Gusfield text for chapter 5&6 about suffix trees are scanned and uploaded on the web 2. List of Project ideas is uploaded

Transcription:

Example of usage of Prefix um A 0 0 0 e 1 0 0 0 0 0 B e 1 Example of usage of Prexix um A 0 0 0 e 1 0 0 0 0 0 B e 1 Initialize B with zeroes Any idea on the solution (first in sequential)? If A[i]!= 0 then B[????] =???? Example of usage of Prexix um A 0 0 0 e 1 0 0 0 0 0 B e 1 0 0 0 0 0 0 0 0 Hint: use an extra (binary) array such that [i] == 0 if A[i] == 0 [i] == 1 if A[i]!= 0 Example of usage of Prexix um A 0 0 0 e 1 0 0 0 0 0 B e 1 0 0 0 0 0 0 0 0 Initialize B with zeroes If A[i]!= 0 then B[[i]] = A[i] How can we use? How would you do it in parallel? Example of usage of Prexix um A 0 0 0 e 1 0 0 0 0 0 B e 1 Hint2: compute the prexif sum of = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 3] And now? A Example of usage of Prefix um 0 0 0 e 1 0 0 0 0 0 Algorithm COMPACT 1. assign value 1 to e i and value 0 to the others B e 1 2. compute the PREFIX UM of these values, and store the results on 3. begin time of tep 3???? for 1 i n pardo begin B(i): = 0; if A(i) 0 then B( (i) ):=A(i)

A Example of usage of Prefix um 0 0 0 e 1 0 0 0 0 0 Algorithm COMPACT 1. assign value 1 to e i and value 0 to the others 2. compute the PREFIX UM of these values, and store the results on 3. begin constant time! for 1 i n pardo begin B(i): = 0; if A(i) 0 then B( (i) ):=A(i) B e 1 Can we compute the prexif sums of X from Z? To understand the parallel solution, let us start from describing it sequentially We will first present a recursive solution Ideas? [2, Z[?],...] How can we use Y? [2, Z[1],...]

[2, Z[1],?] [2, Z[1], Z[1]+X[3], Z[2], Z[2]+X[5],...] [2, Z[1], Z[1]+X[3],?] [2, Z[1], Z[1]+X[3], Z[2], Z[2]+X[5], Z[3], Z[3]+X[7],...] [2, Z[1], Z[1]+X[3], Z[2],?] Prexif ums (parallel version) 1.if n = 1 then s(1) := x(1) return 2.for 1 i n/2 pardo y(i) := x(2i - 1) * x(2i) 3.recursively compute the prefix sums of y(1),..., y(n/2) and store them in z(1),..., z(n/2) 4. for 1 i n pardo i. if i even then s(i) :=z (i/2) ii.if i =1then s(1):= x(1) iii.if i odd then s(i) := z(i-1/2)*x(i)

via Doubling X 3 5 2 3 4 6 3 1 3 8 10 13 17 23 26 27 Y 8 5 10 4 8 13 23 27 Z Y 13 14 13 27 Z Y 27 27 Z T(n) =???? Another interesting technique that can be used to solve the Prexif ums is the Doubling Iterative A processing technique in which accesses or actions are governed by increasing powers or 2 That is, processing proceeds by 1, 2, 4, 8, 16, etc., doubling on each iteration X 3 5 2 3 4 6 3 1 3 8 10 13 17 23 26 27 Y 8 5 10 4 8 13 23 27 Z Y 13 14 13 27 Z Y 27 27 Z At the first step, each X[i] is added to X[i+1] X = [4, 9, 5, 2, 10, 6, 12, 8] X1 = [4, 13, 14, 7, 12, 16, 18, 20] T(n) = T(n/2) + O(1) T(n) = O(log n) How would you continue? W(n) =???? X 3 5 2 3 4 6 3 1 3 8 10 13 17 23 26 27 Y 8 5 10 4 8 13 23 27 Z Y 13 14 13 27 Z Y 27 27 Z T(n) = T(n/2) + O(1) T(n) = O(log n) work-optimal!! At the first step, each X[i] is added to X[i+1] At any time if an index exceeds n, the operation is supressed X = [4, 9, 5, 2, 10, 6, 12, 8] X1 = [4, 13, 14, 7, 12, 16, 18, 20] At the second step, each X[i] is added to X[i+2] X2 = [4, 13, 18, 20, 26, 23, 30, 36] W(n) = W(n/2) + O(n) W(n) = O(n) Next step?

At the first step, each X[i] is added to X[i+1] by Doubling * Operation supressed # contains final sum At any time if an index exceeds n, the operation is supressed X = [4, 9, 5, 2, 10, 6, 12, 8] X1 = [4, 13, 14, 7, 12, 16, 18, 20] At the second step, each X[i] is added to X[i+2] X2 = [4, 13, 18, 20, 26, 23, 30, 36] Doubling Time: At step k, X[i] is added to X[i+2 k-1 ] p = n-1 Tp = O(log n) by Doubling by Doubling * Operation supressed At the first step:???? operations At the second step:???? operations At the third step:???? operations How many steps do we need to finish? by Doubling * Operation supressed # contains final sum by Doubling At the second step:???? operations At the third step:???? operations p =???? Tp = O(????)

by Doubling At the third step:???? operations by Doubling by Doubling by Doubling (n-1) + (n-2) +... + (n-2 log n - 1 ) = by Doubling by Doubling (n-1) + (n-2) +... + (n-2 log n - 1 ) = (n log n) (1 + 2 + 4 + 2 log n - 1 ) =

by Doubling List Ranking INPUT C e 1 1 n ø n CONTENT UCCEOR (n-1) + (n-2) +... + (n-2 log n - 1 ) = (n log n) (1 + 2 + 4 + 2 log n - 1 ) = (n log n) (2 log n 1) OUTPUT Array R such that R(i) is equal to the distance (rank) of item C(i) from the of the list. Idea e 1 e 4 e 5 e 6 e 7 e 8 Algorithm Work Time R: 1 1 1 1 1 1 1 NIL equential N N Recursive N Log N Doubling N log N (2 log N 1) Log N At the beginning we initialize an array R (rank), that will contain the rank of each element That is, the distance of each element from the of the list List Ranking Given a linked list, stored in an array, compute the distance of each element from the (either ) of the list Problem is similar to prefix sums, using all 1 s to sum Called Pointer Jumping (not doubling) when using pointers Don t destroy original list! Idea e 1 e 4 e 5 e 6 e 7 e 8 R: 1 1 1 1 1 1 1 NIL At the beginning, there are two elements with the correct rank...which ones????