Parallel scan on linked lists

prof. Ing. Pavel Tvrdík CSc.
Department of Computer Systems
Faculty of Information Technology
Czech Technical University in Prague
© Pavel Tvrdík, 00
Advanced Parallel Algorithms (PI-PPA), LS 00/, Seminar 4
European Social Fund. Prague & EU: We invest in your future.
prof. Pavel Tvrdík (FIT ČVUT), Linked List ParScan, PI-PPA, 0, Seminar 4

2 Linked lists
An n-element singly linked list L is represented by an array S of successors. Each element has a unique identifier from {1, ..., n}. The algorithms below recognize the last element by S[i] = i.
[Figure: an example list L and its successor array S(ucc); the values are not recoverable from the transcription.]

3 Conversion of a single-linked list to a double-linked list
$P[i] = j \iff S[j] = i$
Fully parallel (i.e., linearly scalable) EREW PRAM(n, p) algorithm: $T(n, p) = O(n/p)$.

Algorithm EREW PRAM SingleDouble (in: S[1, ..., n]; out: P[1, ..., n])
for all i := 1, ..., n do in parallel
    { P[i] := i;
      if (S[i] ≠ i) then P[S[i]] := i }
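A minimal sequential sketch of SingleDouble, for illustration only. It assumes 0-based identifiers (the slides use 1, ..., n) and emulates the lockstep PRAM loop with two phases; the function name `single_to_double` is hypothetical.

```python
def single_to_double(succ):
    """Build the predecessor array P from the successor array S.

    Assumption: 0-based ids, and the last element satisfies succ[i] == i.
    """
    n = len(succ)
    pred = list(range(n))        # phase 1: P[i] := i (only the head keeps this value)
    for i in range(n):           # phase 2: iterations are independent ("do in parallel")
        if succ[i] != i:
            pred[succ[i]] = i    # exclusive write: each element has at most one predecessor
    return pred


# Hypothetical list 2 -> 0 -> 3 -> 1 (element 1 is the last one).
print(single_to_double([3, 1, 0, 1]))
```

Every cell of P is written by at most one iteration in phase 2, which is why the original loop needs no concurrent reads or writes.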

4 List ranking: sequential vs. parallel algorithms
The rank = the distance (= the number of pointers) from the end of the list.
Sequential algorithm: trivial by traversing pointers backwards.
List ranking by pointer jumping (PJ) on CREW PRAM(n, n)

Algorithm CREW PRAM ListRanking (in, out: S[1, ..., n]; out: R[1, ..., n])
for all i := 1, ..., n do in parallel
    { if (S[i] = i) then R[i] := 0 else R[i] := 1;
      repeat ⌈log n⌉ times
          { R[i] := R[i] + R[S[i]];
            S[i] := S[S[i]] } }        (* pointer jumping *)
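The pointer-jumping loop can be simulated sequentially as follows. This is a sketch under stated assumptions: 0-based identifiers, a synchronous PRAM step emulated by building new arrays from the old ones, and a hypothetical function name `list_ranking`.

```python
def list_ranking(succ):
    """Rank every element by pointer jumping (sequential simulation)."""
    n = len(succ)
    S = list(succ)
    R = [0 if S[i] == i else 1 for i in range(n)]
    # (n - 1).bit_length() equals ceil(log2 n) for n >= 1.
    for _ in range((n - 1).bit_length()):
        # Both right-hand sides read only the old R and S, like one lockstep PRAM step.
        R, S = ([R[i] + R[S[i]] for i in range(n)],
                [S[S[i]] for i in range(n)])
    return R


# Hypothetical list 2 -> 0 -> 3 -> 1.
print(list_ranking([3, 1, 0, 1]))
```

After k rounds, R[i] holds min(rank of i, 2^k), so ⌈log n⌉ rounds are enough for every element.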

5 Parallel suffix sum
Parallel suffix sum on linked lists on CREW PRAM(n, n). Here $v_i$ is the input value of element i, ⊕ is the associative operation of the scan, and 0 is its neutral element.

Alg. CREW PRAM Par Suffix Sum (in, out: S[1, ..., n]; out: V[1, ..., n])
for all i := 1, ..., n do in parallel
    { if (S[i] = i) then V[i] := 0 else V[i] := v_i;
      repeat ⌈log n⌉ times
          { V[i] := V[i] ⊕ V[S[i]];
            S[i] := S[S[i]] };        (* pointer jumping *)
      if (original V[last] ≠ 0) then V[i] := V[i] ⊕ V[last] }
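A sequential sketch of the suffix computation, assuming 0-based identifiers and a generic associative operation passed in as `op` with neutral element `identity` (both parameter names are hypothetical). The final combination with the last element's value is applied unconditionally, which gives the same result as the guarded version.

```python
import operator


def suffix_scan(succ, values, op=operator.add, identity=0):
    """Suffix scan over a linked list by pointer jumping (sequential simulation)."""
    n = len(succ)
    S = list(succ)
    last = next(i for i in range(n) if S[i] == i)
    # The last element starts with the neutral element so that jumping past it is harmless.
    V = [identity if i == last else values[i] for i in range(n)]
    for _ in range((n - 1).bit_length()):
        V, S = ([op(V[i], V[S[i]]) for i in range(n)],
                [S[S[i]] for i in range(n)])
    # Append the original value of the last element to every suffix.
    return [op(V[i], values[last]) for i in range(n)]


# Hypothetical list 2 -> 0 -> 3 -> 1 with values indexed by element id.
print(suffix_scan([3, 1, 0, 1], [10, 20, 30, 40]))
print(suffix_scan([3, 1, 0, 1], ["a", "b", "c", "d"], identity=""))
```

The operand order `op(V[i], V[S[i]])` keeps the list order, so the sketch also works for non-commutative operations such as string concatenation.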

6 An example of CREW PRAM(6, 6) pointer jumping
[Figure: pointer jumping on a 6-element list, showing L, S(ucc), and R(ank) after each step; the values are not recoverable from the transcription.]

7 The importance of list ranking
Ranking makes it possible to transform (permute) a linked list into an array: each element is placed at the location equal to its rank.
After that, all prefix computations can be performed on this array by the previous PPS algorithms.
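A small sketch of this permutation step, assuming 0-based ranks that are already computed. The names `list_to_array`, `ranks`, and `values` are illustrative, and `itertools.accumulate` stands in for a parallel prefix sum.

```python
from itertools import accumulate


def list_to_array(ranks, values):
    """Place each element at the array position equal to its rank."""
    arr = [None] * len(ranks)
    for i, r in enumerate(ranks):   # independent writes: ranks are pairwise distinct
        arr[r] = values[i]
    return arr


# Hypothetical list 2 -> 0 -> 3 -> 1 with ranks [2, 0, 3, 1].
arr = list_to_array([2, 0, 3, 1], [10, 20, 30, 40])
print(arr)                      # last element of the list comes first
print(list(accumulate(arr)))    # prefix sums on the array = suffix sums on the list
```

Because position 0 holds the last list element, a prefix sum over the array corresponds to a suffix sum over the list; using n - 1 - rank as the position would give head-first order instead.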

8 Scalability issues of pointer jumping
The CREW PRAM(n, n) PJ is not cost optimal: $C(n, n) = O(n \log n)$.
The PJ is oblivious: it keeps jumping even over pointers to the last element. Hence, $W(n, n) = O(n \log n)$, too.
However, a cost optimization cannot be based on reducing p and assigning n/p elements of S to each processor, since $\Theta(n/2)$ elements require $\Theta(\log n)$ jumps to get to the end.
The number of elements that are done after step i is $2^i + 1$ for $i \ge 1$ (see the yellow nodes on the previous figure).
A CREW PRAM(n, p) PJ algorithm can be made work-optimal only if each processor keeps a list of active (= not done yet) elements and jumps only over them.
Even this optimized non-oblivious approach does not improve the cost, since one processor can have n/p active elements in all log n steps, and hence $T(n, p) = O((n/p) \log n)$ and $C(n, p) = O(n \log n)$.

9 Observation
Consider a linked list of $n' = n/\log n$ elements and apply pointer jumping using $p = n/\log n$ processors $\Rightarrow$ $T(n', p) = O(\log n)$ and $C(n', p) = O(n)$.

The idea of a scalable list ranking
1. Let $p = \Theta(n/\log n)$ and $|L| = n$ elements.
2. Using p processors, shrink L to L' of size $n' = O(n/\log n)$ with $C(n, p) = O(n)$, i.e., in $T(n, p) = O(\log n)$.
3. Apply pointer jumping to L'. Then $T(n', p) = O(\log n') = O(\log n)$ and $C(n', p) = O(n)$.
4. Restore L from L' and finish the ranking computations for the elements in $L - L'$ with the same complexity as in step 2.

10 Symmetry breaking: Independent sets
Definition 4
A subset $I \subseteq L$ is an independent set of a linked list L if $\forall i \in I$: if $S[i] \ne i$ then $S[i] \notin I$.
Lemma 5
If $I \subseteq L$ is an independent set, then all $i \in I$ can be deleted from L in parallel.
Proof. If $I \subseteq L$ is an independent set, then $\forall i \in I$: $P[i] \notin I$, so no two adjacent elements are deleted at the same time.
[Figure: the list L before and after deleting an independent set.]

11 Local minima of a linked list coloring
Lemma 6
The set I of local minima of a k-coloring c of an n-element list L is an independent set of size $\Omega(n/k)$, and there is a work-optimal parallel algorithm to determine the local minima.
Proof. Let u, v be local minima of c with no other local minima in between. Then u and v cannot be adjacent (if u = S[v], then c[u] < c[v] and c[v] < c[u]).
The colors of the elements between u and v must form a bitonic sequence of at most k colors, hence of at most $2k - 3$ elements $\Rightarrow$ $|I| \ge n/(2k - 2) = \Omega(n/k)$.
Determining local minima on EREW PRAM is inherently parallel: for all u in parallel: compare c[S[u]], c[u], and c[P[u]].
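The local-minimum test can be sketched in a few lines. The example assumes 0-based identifiers, a head with P[i] = i, a last element with S[i] = i, and a hand-made 3-coloring; the function names are hypothetical.

```python
def local_minima(S, P, c):
    """Elements whose color is strictly smaller than both neighbors' colors.

    The head (P[i] == i) and the last element (S[i] == i) never qualify,
    because the strict comparison with their own color fails.
    """
    return [i for i in range(len(S)) if c[i] < min(c[P[i]], c[S[i]])]


def is_independent(S, I):
    """Check the definition: no element of I has its successor in I."""
    members = set(I)
    return all(S[i] == i or S[i] not in members for i in members)


# Hypothetical list 0 -> 1 -> 2 -> 3 -> 4 -> 5 with a valid 3-coloring.
S = [1, 2, 3, 4, 5, 5]
P = [0, 0, 1, 2, 3, 4]
c = [1, 0, 2, 1, 0, 2]
I = local_minima(S, P, c)
print(I, is_independent(S, I))
```

Each element reads only its own color and the colors of its two neighbors, so the n tests are independent of one another.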

12
Lemma 7
The best possible coloring is a 3-coloring.
Proof. Two colors do not allow parallel coloring by processors working independently on disjoint sublists of L.
Lemma 8
After $\Theta(\log \log n)$ removals of local minima of 3-colorings of an n-element list L, L reduces to L' with $|L'| \le n/\log n$.
Proof. Let m = the number of iterations needed to reduce L to L'. Let $L_k$ = L after k reductions and $I_k$ = the set of local minima of a 3-coloring of $L_k$. Then $|I_k| \ge |L_k|/4$ and $|L_{k+1}| = |L_k| - |I_k| \le (3/4)|L_k|$. By recursion, $|L_k| \le (3/4)^k n$. $|L_m| \le n/\log n \Rightarrow m \approx 2.4 \log \log n$.
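The constant in Lemma 8 can be checked by solving the bound for m, assuming that every removal keeps at most 3/4 of the elements and that logarithms are base 2 (the base is an assumption about the slides' convention):

```latex
\left(\tfrac{3}{4}\right)^{m} n \le \frac{n}{\log n}
\iff m \,\log\tfrac{4}{3} \ge \log\log n
\iff m \ge \frac{\log\log n}{\log(4/3)} \approx 2.41\,\log\log n .
```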

13 Symmetry breaking and deterministic coin tossing (DCT)
Definition 9
Symmetry breaking = an initial n-coloring.
If $t > \log i$, let $bin_t(i) = i_{t-1} \ldots i_0$ = the binary representation of i on t bits.
DiffPos(t, i, j) = the least bit number in which $bin_t(i)$ and $bin_t(j)$ differ.
Example: DiffPos(5, 23, 19) = 2 (since $bin_5(23) = 10111$ and $bin_5(19) = 10011$).
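DiffPos can be computed with two bit operations. The sketch below drops the width parameter t, which is an assumption that works because Python integers are not limited to t bits; the name `diff_pos` is hypothetical.

```python
def diff_pos(i, j):
    """Least bit position in which the binary representations of i and j differ."""
    x = i ^ j                            # set bits mark the positions where i and j differ
    if x == 0:
        raise ValueError("i and j are equal, DiffPos is undefined")
    return (x & -x).bit_length() - 1     # index of the lowest set bit


print(diff_pos(0b10111, 0b10011))        # 23 and 19
```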

14 Reduction of an n-coloring to a 6-coloring by DCT
Here $c[i]_{\pi[i]}$ denotes bit number π[i] of the color c[i].

for i := 1, ..., n do in parallel c[i] := bin_t(i), where t = ⌈log n⌉
repeat log* n times {
    for i := 1, ..., n do in parallel {
        π[i] := DiffPos(t, c[i], c[S[i]]);
        c'[i] := bin_{t'}(2π[i] + c[i]_{π[i]}), where t' = ⌈log t⌉ + 1;
        c[i] := c'[i]; };
    t := t' }
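A sequential sketch of the DCT reduction follows. It makes three assumptions that are not in the slides: identifiers are 0-based, the last element (which has no successor to compare with) uses bit position 0, and the loop runs until no color exceeds 5 instead of a fixed number of rounds. All function names are hypothetical.

```python
def diff_pos(a, b):
    """Least bit position in which a and b differ (a != b)."""
    x = a ^ b
    return (x & -x).bit_length() - 1


def dct_step(S, c):
    """One round of deterministic coin tossing on the coloring c."""
    new_c = []
    for i in range(len(S)):          # every i reads only the old coloring
        # Assumption: the last element has no successor, so it uses position 0.
        pi = 0 if S[i] == i else diff_pos(c[i], c[S[i]])
        new_c.append(2 * pi + ((c[i] >> pi) & 1))
    return new_c


def six_coloring(S):
    """Reduce the trivial coloring by identifiers to colors 0..5."""
    c = list(range(len(S)))          # initial n-coloring: the ids themselves
    while c and max(c) > 5:
        c = dct_step(S, c)
    return c


# Hypothetical 8-element list 0 -> 1 -> ... -> 7.
print(six_coloring([1, 2, 3, 4, 5, 6, 7, 7]))
```

The treatment of the last element keeps the coloring valid: its predecessor either differs from it at a position other than 0, or differs in bit 0 itself.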

15 Proof of correctness
Lemma 10
If c is a valid coloring of L, then c' is a valid coloring of L, too.
Proof. c is a coloring $\Rightarrow$ $c[i] \ne c[S[i]]$ $\Rightarrow$ π[i] is well defined. Assume c' is not a valid coloring. Then $\exists i$ such that $c'[i] = c'[S[i]]$ $\Rightarrow$ $\pi[i] = \pi[S[i]]$ and $c[i]_{\pi[i]} = c[S[i]]_{\pi[S[i]]}$. But then $c[i]_{\pi[i]} = c[S[i]]_{\pi[i]}$: a contradiction with the definition of π[i].
Lemma 11
DCT reduces a t-bit coloring c to a $t' = (\lceil \log t \rceil + 1)$-bit coloring c'.
Proof. If c is a t-bit coloring, then for any i, $0 \le \pi[i] \le t - 1$ $\Rightarrow$ the greatest color number in c' can be $2(t - 1) + 1 = 2t - 1$ $\Rightarrow$ to encode colors from 0 to $2t - 1$, we need $\lceil \log(2t) \rceil$ bits.

16
Corollary 12
DCT must be applied $O(\log^* n)$ times, where the log-star function is defined as $\log^*(n) = \min\{i \mid \log^{(i)}(n) \le 1\}$.
For any realistic value of n, the number of iterations of DCT is at most 5.
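The log-star function is easy to evaluate directly. The sketch assumes base-2 logarithms and the threshold 1 from the usual definition; `log_star` is a hypothetical name.

```python
import math


def log_star(n):
    """Number of times log2 must be applied to n until the value is at most 1."""
    count = 0
    x = n
    while x > 1:
        x = math.log2(x)
        count += 1
    return count


for n in (2, 4, 16, 65536, 65537):
    print(n, log_star(n))
```

Under these assumptions the value 5 is reached just above 65536 and then holds up to 2^65536, which is why a handful of DCT rounds suffices in practice.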

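17
[Table: ranges of n and the corresponding values of log* n; the entries are not recoverable from the transcription.]
Lemma 13
Iterations of DCT can reduce the number of colors of a coloring only to 6.
Proof. Assume $t = \lceil \log n \rceil = 3$. Then $t' = \lceil \log(2t) \rceil = 3$ and $0 \le c'[i] \le 2t - 1 = 5$, so a further iteration no longer shrinks the range of colors.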

18 6-coloring → 3-coloring
for i := 1, ..., n do in parallel
    if c[i] = 5 then c[i] := any color from {0, 1, 2} − {c[S[i]], c[P[i]]};
for i := 1, ..., n do in parallel
    if c[i] = 4 then c[i] := any color from {0, 1, 2} − {c[S[i]], c[P[i]]};
for i := 1, ..., n do in parallel
    if c[i] = 3 then c[i] := any color from {0, 1, 2} − {c[S[i]], c[P[i]]};

Theorem 14
Using DCT, a 3-coloring on p processors takes $T(n, p) = O((n \log^* n)/p)$ and $C(n, p) = O(n \log^* n)$.
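The three recoloring passes can be sketched sequentially as below. The sketch assumes 0-based identifiers, ignores the missing neighbor at the head and at the last element, and always picks the smallest free color; the name `reduce_to_three` is hypothetical.

```python
def reduce_to_three(S, P, c):
    """Turn a valid coloring with colors 0..5 into one with colors 0..2."""
    c = list(c)
    for color in (5, 4, 3):
        # Elements of the same color are never adjacent in a valid coloring,
        # so the updates within one pass do not interfere with each other.
        for i in range(len(c)):
            if c[i] == color:
                used = set()
                if S[i] != i:
                    used.add(c[S[i]])
                if P[i] != i:
                    used.add(c[P[i]])
                c[i] = min({0, 1, 2} - used)   # at most two colors are taken
    return c


# Hypothetical list 0 -> 1 -> ... -> 5 with a valid 6-coloring.
S = [1, 2, 3, 4, 5, 5]
P = [0, 0, 1, 2, 3, 4]
print(reduce_to_three(S, P, [5, 4, 3, 5, 0, 4]))
```

Each pass removes one color, so three passes of constant time per element are enough.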

19
Input: L = {n, S[1, ..., n], P[1, ..., n], R[1, ..., n]};
Output: R[1, ..., n]; (* final values of ranks of elements *)
Aux: F[1, ..., n], N[1, ..., n], c[1, ..., n];
1. k := 0; L_k := L; n_k := |L_k|;
2. while n_k > n/log n do {
2.1. apply 3-coloring c to L_k;
2.2. (* identify I_k = the set of local minima of c in L_k *)
     for i := 1, ..., n_k do in parallel
         if (c[i] < min(c[P[i]], c[S[i]])) then F[i] := N[i] := 1 else F[i] := N[i] := 0;
2.3. (* remove I_k from L_k *)
2.3.1. apply parallel scan to N[1, ..., n_k];
2.3.2. for i := 1, ..., n_k do in parallel
         if F[i] = 1 then { U[N[i]] := {i, S[i], P[i], R[i], k};
                            R[P[i]] := R[P[i]] + R[i];
                            S[P[i]] := S[i]; P[S[i]] := P[i] };
2.4. (* compact L_{k+1} = L_k − I_k into consecutive memory locations *)
2.4.1. for i := 1, ..., n_k do in parallel N[i] := 1 − F[i];
2.4.2. apply parallel scan to N[1, ..., n_k];
2.4.3. for i := 1, ..., n_k do in parallel
         if F[i] = 0 then { S[N[i]] := N[S[i]]; P[N[i]] := N[P[i]]; R[N[i]] := R[i]; }
2.5. n_{k+1} := n_k − |I_k|; k := k + 1; };
3. apply pointer jumping to compute ranking in L_k;
4. restore L from L_k and rank all removed nodes by reversing steps 2.3 and 2.4;
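The overall shrink, rank, and restore scheme can be sketched as a sequential simulation. This is not the PRAM algorithm itself: it assumes 0-based identifiers and n ≥ 1, replaces the parallel 3-coloring of step 2.1 by a simple walk that cycles the colors 0, 1, 2, and skips the compaction of step 2.4. The name `contract_and_rank` and the stack `removed` (playing the role of U) are hypothetical.

```python
import math


def contract_and_rank(succ):
    """Rank a list by contraction, pointer jumping, and restoration."""
    n = len(succ)
    S = list(succ)
    P = list(range(n))
    for i in range(n):
        if S[i] != i:
            P[S[i]] = i
    R = [0 if S[i] == i else 1 for i in range(n)]   # weight of the pointer leaving i
    head = next(i for i in range(n) if P[i] == i)
    alive = n
    target = n / max(1.0, math.log2(n))
    removed = []                                    # (element, successor, weight) records

    while alive > target:
        # Stand-in for step 2.1: walk the list and cycle the colors 0, 1, 2.
        color = {}
        i, pos = head, 0
        while True:
            color[i] = pos % 3
            if S[i] == i:
                break
            i, pos = S[i], pos + 1
        # Step 2.2: local minima of the coloring (never the head or the last element).
        minima = [i for i in color if color[i] < min(color[P[i]], color[S[i]])]
        if not minima:
            break
        # Step 2.3: splice the minima out and remember how to restore them.
        for i in minima:
            removed.append((i, S[i], R[i]))
            R[P[i]] += R[i]
            S[P[i]] = S[i]
            P[S[i]] = P[i]
        alive -= len(minima)

    # Step 3: weighted pointer jumping on the contracted list.
    rank, nxt = list(R), list(S)
    for _ in range((alive - 1).bit_length()):
        rank, nxt = ([rank[i] + rank[nxt[i]] for i in range(n)],
                     [nxt[nxt[i]] for i in range(n)])

    # Step 4: restore removed elements in reverse order of removal.
    for i, s, r in reversed(removed):
        rank[i] = r + rank[s]
    return rank


print(contract_and_rank([3, 1, 0, 1]))
```

Restoring in reverse order guarantees that the successor recorded for a removed element already has its final rank when that element is processed.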

20 A more scalable list ranking - performance
Theorem 15
The previous list ranking algorithm takes on EREW PRAM with $p = \Theta(n/\log n)$
$T(n, p) = O(\log n \log \log n)$, $C(n, p) = O(n \log \log n)$, and $W(n, p) = O(n)$,
supposing that we use the approximation $\log^* n = O(1)$.

21 A more scalable list ranking - performance
Proof.
T:
  2.1: $T(n_k, p_k) = O(\log n_k)$ if $p_k = \Omega(n_k/\log n_k)$
  2.2, 2.3.2, 2.4.1, 2.4.3: $T(n_k, p_k) = O(n_k/p_k) = O(\log n_k)$ if $p_k = \Omega(n_k/\log n_k)$
  2.3.1, 2.4.2: $T(n_k, p_k) = O(\log n_k)$ if $p_k = \Omega(n_k/\log n_k)$
  2, 4: $T(n, p) = O(\log n \log \log n)$
  3: $T(n/\log n, p) = O(\log n)$
C:
  2, 4: $C(n, p) = O(n \log \log n)$
  3: $C(n/\log n, p) = O(n)$
W:
  2.1: $W(n_k, p_k) = O(n_k \log^* n_k) = O(n_k)$
  2.2, 2.3.2, 2.4.1, 2.4.3: $W(n_k, p_k) = O(n_k)$
  2.3.1, 2.4.2: $W(n_k, p_k) = O(n_k + p_k \log p_k) = O(n_k)$
  2, 4: $W(n, p) = O(\sum_{k=1}^{\log \log n} n_k) = O(\sum_{k=1}^{\log \log n} (3/4)^k n) = O(n)$
  3: $W(n/\log n, p) = O(n)$
Note that $p = n/\log n > n_k/\log n_k$ for all $k = 1, \ldots, \log \log n$.
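The linear work bound for the shrinking phase follows from a geometric series, assuming that each round keeps at most 3/4 of the elements:

```latex
\sum_{k=1}^{\log\log n} \left(\tfrac{3}{4}\right)^{k} n
\;\le\; n \sum_{k=1}^{\infty} \left(\tfrac{3}{4}\right)^{k}
\;=\; n \cdot \frac{3/4}{1 - 3/4}
\;=\; 3n \;=\; O(n).
```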

22 Example
[Figure: a worked example of the list ranking algorithm, showing the arrays S, P, R, F, N, and U and the lists L with their ranks R; the values are not recoverable from the transcription.]


More information

Seminar on. Edge Coloring Series Parallel Graphs. Mohammmad Tawhidul Islam. Masters of Computer Science Summer Semester 2002 Matrikel Nr.

Seminar on. Edge Coloring Series Parallel Graphs. Mohammmad Tawhidul Islam. Masters of Computer Science Summer Semester 2002 Matrikel Nr. Seminar on Edge Coloring Series Parallel Graphs Mohammmad Tawhidul Islam Masters of Computer Science Summer Semester 2002 Matrikel Nr. 9003378 Fachhochschule Bonn-Rhein-Sieg Contents 1. Introduction. 2.

More information

Euler Tours and Their Applications. Chris Moultrie CSc 8530

Euler Tours and Their Applications. Chris Moultrie CSc 8530 Euler Tours and Their Applications Chris Moultrie CSc 8530 Topics Covered History Terminology Sequential Algorithm Parallel Algorithm BSP/CGM Algorithm History Started with the famous Konigsberg problem.

More information

Randomized algorithms have several advantages over deterministic ones. We discuss them here:

Randomized algorithms have several advantages over deterministic ones. We discuss them here: CS787: Advanced Algorithms Lecture 6: Randomized Algorithms In this lecture we introduce randomized algorithms. We will begin by motivating the use of randomized algorithms through a few examples. Then

More information

COMP Parallel Computing. PRAM (4) PRAM models and complexity

COMP Parallel Computing. PRAM (4) PRAM models and complexity COMP 633 - Parallel Computing Lecture 5 September 4, 2018 PRAM models and complexity Reading for Thursday Memory hierarchy and cache-based systems Topics Comparison of PRAM models relative performance

More information

Fractional Cascading

Fractional Cascading C.S. 252 Prof. Roberto Tamassia Computational Geometry Sem. II, 1992 1993 Lecture 11 Scribe: Darren Erik Vengroff Date: March 15, 1993 Fractional Cascading 1 Introduction Fractional cascading is a data

More information

Statistical Aspects of Intrusion Detection

Statistical Aspects of Intrusion Detection Statistical Aspects of Intrusion Detection Mgr. Rudolf B. Blažek, Ph.D. Department of Computer Systems Faculty of Information Technologies Czech Technical University in Prague Rudolf Blažek 2010-2011 Network

More information

CS302 Topic: Algorithm Analysis #2. Thursday, Sept. 21, 2006

CS302 Topic: Algorithm Analysis #2. Thursday, Sept. 21, 2006 CS302 Topic: Algorithm Analysis #2 Thursday, Sept. 21, 2006 Analysis of Algorithms The theoretical study of computer program performance and resource usage What s also important (besides performance/resource

More information

Problem Set 7 Solutions

Problem Set 7 Solutions Design and Analysis of Algorithms March 0, 2015 Massachusetts Institute of Technology 6.046J/18.410J Profs. Erik Demaine, Srini Devadas, and Nancy Lynch Problem Set 7 Solutions Problem Set 7 Solutions

More information

Algorithms and Applications

Algorithms and Applications Algorithms and Applications 1 Areas done in textbook: Sorting Algorithms Numerical Algorithms Image Processing Searching and Optimization 2 Chapter 10 Sorting Algorithms - rearranging a list of numbers

More information

Distributed and Cloud Computing

Distributed and Cloud Computing Jiří Kašpar, Pavel Tvrdík (ČVUT FIT) Distributed and Cloud Computing MI-POA, 2011, Lecture 12 1/28 Distributed and Cloud Computing Ing. Jiří Kašpar prof. Ing. Pavel Tvrdík CSc. Department of Computer Systems

More information

Burrows Wheeler Transform

Burrows Wheeler Transform Burrows Wheeler Transform The Burrows Wheeler transform (BWT) is an important technique for text compression, text indexing, and their combination compressed text indexing. Let T [0..n] be the text with

More information

NP-Completeness. Algorithms

NP-Completeness. Algorithms NP-Completeness Algorithms The NP-Completeness Theory Objective: Identify a class of problems that are hard to solve. Exponential time is hard. Polynomial time is easy. Why: Do not try to find efficient

More information

Sorting. Data structures and Algorithms

Sorting. Data structures and Algorithms Sorting Data structures and Algorithms Acknowledgement: These slides are adapted from slides provided with Data Structures and Algorithms in C++ Goodrich, Tamassia and Mount (Wiley, 2004) Outline Bubble

More information

Hashing. Yufei Tao. Department of Computer Science and Engineering Chinese University of Hong Kong

Hashing. Yufei Tao. Department of Computer Science and Engineering Chinese University of Hong Kong Department of Computer Science and Engineering Chinese University of Hong Kong In this lecture, we will revisit the dictionary search problem, where we want to locate an integer v in a set of size n or

More information

1 Computing alignments in only linear space

1 Computing alignments in only linear space 1 Computing alignments in only linear space One of the defects of dynamic programming for all the problems we have discussed is that the dynamic programming tables use Θ(nm) space when the input strings

More information

Space vs Time, Cache vs Main Memory

Space vs Time, Cache vs Main Memory Space vs Time, Cache vs Main Memory Marc Moreno Maza University of Western Ontario, London, Ontario (Canada) CS 4435 - CS 9624 (Moreno Maza) Space vs Time, Cache vs Main Memory CS 4435 - CS 9624 1 / 49

More information

1 Leaffix Scan, Rootfix Scan, Tree Size, and Depth

1 Leaffix Scan, Rootfix Scan, Tree Size, and Depth Lecture 17 Graph Contraction I: Tree Contraction Parallel and Sequential Data Structures and Algorithms, 15-210 (Spring 2012) Lectured by Kanat Tangwongsan March 20, 2012 In this lecture, we will explore

More information

Online Coloring Known Graphs

Online Coloring Known Graphs Online Coloring Known Graphs Magnús M. Halldórsson Science Institute University of Iceland IS-107 Reykjavik, Iceland mmh@hi.is, www.hi.is/ mmh. Submitted: September 13, 1999; Accepted: February 24, 2000.

More information

Algorithm Analysis. (Algorithm Analysis ) Data Structures and Programming Spring / 48

Algorithm Analysis. (Algorithm Analysis ) Data Structures and Programming Spring / 48 Algorithm Analysis (Algorithm Analysis ) Data Structures and Programming Spring 2018 1 / 48 What is an Algorithm? An algorithm is a clearly specified set of instructions to be followed to solve a problem

More information

Strings. Zachary Friggstad. Programming Club Meeting

Strings. Zachary Friggstad. Programming Club Meeting Strings Zachary Friggstad Programming Club Meeting Outline Suffix Arrays Knuth-Morris-Pratt Pattern Matching Suffix Arrays (no code, see Comp. Prog. text) Sort all of the suffixes of a string lexicographically.

More information

Vertex Cover is Fixed-Parameter Tractable

Vertex Cover is Fixed-Parameter Tractable Vertex Cover is Fixed-Parameter Tractable CS 511 Iowa State University November 28, 2010 CS 511 (Iowa State University) Vertex Cover is Fixed-Parameter Tractable November 28, 2010 1 / 18 The Vertex Cover

More information

Answer any FIVE questions 5 x 10 = 50. Graph traversal algorithms process all the vertices of a graph in a systematic fashion.

Answer any FIVE questions 5 x 10 = 50. Graph traversal algorithms process all the vertices of a graph in a systematic fashion. PES Institute of Technology, Bangalore South Campus (Hosur Road, 1KM before Electronic City, Bangalore 560 100) Solution Set Test III Subject & Code: Design and Analysis of Algorithms(10MCA44) Name of

More information

Hypercubes. (Chapter Nine)

Hypercubes. (Chapter Nine) Hypercubes (Chapter Nine) Mesh Shortcomings: Due to its simplicity and regular structure, the mesh is attractive, both theoretically and practically. A problem with the mesh is that movement of data is

More information

PRAM ALGORITHMS: BRENT S LAW

PRAM ALGORITHMS: BRENT S LAW PARALLEL AND DISTRIBUTED ALGORITHMS BY DEBDEEP MUKHOPADHYAY AND ABHISHEK SOMANI http://cse.iitkgp.ac.in/~debdeep/courses_iitkgp/palgo/index.htm PRAM ALGORITHMS: BRENT S LAW 2 1 MERGING TWO SORTED ARRAYS

More information

Cache-Oblivious Traversals of an Array s Pairs

Cache-Oblivious Traversals of an Array s Pairs Cache-Oblivious Traversals of an Array s Pairs Tobias Johnson May 7, 2007 Abstract Cache-obliviousness is a concept first introduced by Frigo et al. in [1]. We follow their model and develop a cache-oblivious

More information

Indexing and Searching

Indexing and Searching Indexing and Searching Introduction How to retrieval information? A simple alternative is to search the whole text sequentially Another option is to build data structures over the text (called indices)

More information

Randomized incremental construction. Trapezoidal decomposition: Special sampling idea: Sample all except one item

Randomized incremental construction. Trapezoidal decomposition: Special sampling idea: Sample all except one item Randomized incremental construction Special sampling idea: Sample all except one item hope final addition makes small or no change Method: process items in order average case analysis randomize order to

More information

Clustering. Pattern Recognition IX. Michal Haindl. Clustering. Outline

Clustering. Pattern Recognition IX. Michal Haindl. Clustering. Outline Clustering cluster - set of patterns whose inter-pattern distances are smaller than inter-pattern distances for patterns not in the same cluster a homogeneity and uniformity criterion no connectivity little

More information

Fabian Kuhn. Nicla Bernasconi, Dan Hefetz, Angelika Steger

Fabian Kuhn. Nicla Bernasconi, Dan Hefetz, Angelika Steger Algorithms and Lower Bounds for Distributed Coloring Problems Fabian Kuhn Parts are joint work with Parts are joint work with Nicla Bernasconi, Dan Hefetz, Angelika Steger Given: Network = Graph G Distributed

More information

Sorting and Selection on a Linear Array with Optical Bus System

Sorting and Selection on a Linear Array with Optical Bus System Sorting and Selection on a Linear Array with Optical Bus System Hossam ElGindy Dept. of Elec. & Compt. Eng. Uni. of Newcastle, Australia Sanguthevar Rajasekaran Dept. of CISE Univ. of Florida Abstract

More information

Parallel algorithms for generating combinatorial objects on linear processor arrays with reconfigurable bus systems*

Parallel algorithms for generating combinatorial objects on linear processor arrays with reconfigurable bus systems* SOdhan& Vol. 22. Part 5, October 1997, pp. 62%636. Printed ill India. Parallel algorithms for generating combinatorial objects on linear processor arrays with reconfigurable bus systems* P THANGAVEL Department

More information

Minimum Spanning Trees

Minimum Spanning Trees Minimum Spanning Trees Overview Problem A town has a set of houses and a set of roads. A road connects and only houses. A road connecting houses u and v has a repair cost w(u, v). Goal: Repair enough (and

More information

Sparse Hypercube 3-Spanners

Sparse Hypercube 3-Spanners Sparse Hypercube 3-Spanners W. Duckworth and M. Zito Department of Mathematics and Statistics, University of Melbourne, Parkville, Victoria 3052, Australia Department of Computer Science, University of

More information