Parallel scan on linked lists
prof. Ing. Pavel Tvrdík, CSc.
Department of Computer Systems, Faculty of Information Technology, Czech Technical University in Prague (FIT ČVUT)
© Pavel Tvrdík
Advanced Parallel Algorithms (Pokročilé paralelní algoritmy, PI-PPA), summer semester, Seminar 4
European Social Fund. Prague & EU: We invest in your future.

Linked lists

An $n$-element singly linked list $L$ is represented by an array $S$ of successors. Each element has a unique identifier from $\{1, \ldots, n\}$.

[Figure: an example list $L$ and its successor array $S$.]

Conversion of a singly linked list to a doubly linked list

The predecessor array $P$ is defined by $P[i] = j \Leftrightarrow S[j] = i$.

Fully parallel (i.e., linearly scalable) EREW PRAM$(n, p)$ algorithm: $T(n, p) = O(n/p)$.

Algorithm EREW PRAM SingleDouble (in: S[1, ..., n]; out: P[1, ..., n])
    for all i := 1, ..., n do in parallel {
        P[i] := i;
        if (S[i] ≠ i) then P[S[i]] := i
    }
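
The SingleDouble step can be sketched in Python as a sequential simulation of the synchronous parallel loop. This is only an illustration under assumptions: indices are 0-based (the slides use 1, ..., n), the last element is marked by `S[i] == i`, and the function name `single_to_double` is hypothetical.

```python
def single_to_double(S):
    """Build the predecessor array P from the successor array S.

    Assumption: the tail of the list is the element with S[i] == i.
    """
    n = len(S)
    # First synchronous step: every element points to itself.
    P = list(range(n))
    # Second synchronous step: each non-tail element i registers itself
    # as the predecessor of S[i]. All targets S[i] are distinct, so the
    # writes never collide (exclusive write).
    for i in range(n):
        if S[i] != i:
            P[S[i]] = i
    return P


# List order 1 -> 0 -> 2 -> 3, element 3 is the tail.
S = [2, 0, 3, 3]
P = single_to_double(S)
print(P)
```

The head of the list keeps `P[i] == i`, mirroring the convention used for the tail in `S`.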

List ranking: sequential vs. parallel algorithms

The rank of an element is its distance (the number of pointers) from the end of the list.

Sequential algorithm: trivial, by traversing the pointers backwards.

List ranking by pointer jumping (PJ) on CREW PRAM$(n, n)$:

Algorithm CREW PRAM ListRanking (in, out: S[1, ..., n]; out: R[1, ..., n])
    for all i := 1, ..., n do in parallel {
        if (S[i] = i) then R[i] := 0 else R[i] := 1;
        repeat ⌈log n⌉ times {      (* pointer jumping *)
            R[i] := R[i] + R[S[i]];
            S[i] := S[S[i]]
        }
    }
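
A minimal Python sketch of pointer jumping, assuming 0-based indices and a tail marked by `S[i] == i`. Each round builds new arrays from the old ones, which imitates the synchronous read-then-write behaviour of the PRAM; the name `list_ranking` is an illustrative choice, not taken from the slides.

```python
def list_ranking(S):
    """Rank every element (distance to the tail) by pointer jumping."""
    n = len(S)
    S = list(S)  # work on a copy, the jumps destroy the list structure
    R = [0 if S[i] == i else 1 for i in range(n)]
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1.
    for _ in range((n - 1).bit_length()):
        # All processors read the old R and S, then write the new values.
        R_next = [R[i] + R[S[i]] for i in range(n)]
        S_next = [S[S[i]] for i in range(n)]
        R, S = R_next, S_next
    return R


# List order 1 -> 0 -> 2 -> 3.
print(list_ranking([2, 0, 3, 3]))
```

The invariant is that `R[i]` always holds the distance from `i` to the current `S[i]`, so the rank is final as soon as `S[i]` reaches the tail.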
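
Parallel suffix sum

Parallel suffix sum on linked lists on CREW PRAM$(n, n)$. Here $\oplus$ denotes the associative scan operation, $0$ its neutral element, and $v_i$ the value stored in element $i$.

Algorithm CREW PRAM Par Suffix Sum (in, out: S[1, ..., n]; out: V[1, ..., n])
    for all i := 1, ..., n do in parallel {
        if (S[i] = i) then V[i] := 0 else V[i] := v_i;
        repeat ⌈log n⌉ times {      (* pointer jumping *)
            V[i] := V[i] ⊕ V[S[i]];
            S[i] := S[S[i]]
        };
        if (original v_last ≠ 0) then V[i] := V[i] ⊕ v_last
    }

The value of the last element is replaced by the neutral element during the jumps, because the last element points to itself and would otherwise be accumulated repeatedly; it is added back once at the end.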
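
The suffix computation can be sketched the same way. The sketch below assumes 0-based indices, a tail marked by `S[i] == i`, and an operation passed in as `op` with neutral element `identity` (both parameter names are hypothetical). Unlike the pseudocode, it applies the final combination with the tail value unconditionally, which gives the same result.

```python
import operator


def suffix_scan(S, values, op=operator.add, identity=0):
    """Suffix 'sums' over a linked list by pointer jumping."""
    n = len(S)
    S = list(S)
    tail = next(i for i in range(n) if S[i] == i)
    # The tail temporarily holds the neutral element so that its
    # self-loop does not accumulate its value more than once.
    V = [identity if i == tail else values[i] for i in range(n)]
    for _ in range((n - 1).bit_length()):
        V_next = [op(V[i], V[S[i]]) for i in range(n)]
        S = [S[S[i]] for i in range(n)]
        V = V_next
    # Add the tail value back exactly once.
    return [op(V[i], values[tail]) for i in range(n)]


# List order 1 -> 0 -> 2 -> 3 carrying the values 20, 10, 30, 40.
print(suffix_scan([2, 0, 3, 3], [10, 20, 30, 40]))
```

Operands are always combined in list order, so a non-commutative associative operation such as string concatenation works as well.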

An example of CREW PRAM(6, 6) pointer jumping

[Figure: pointer jumping on a 6-element list $L$, showing the arrays S(ucc) and R(ank).]

The importance of list ranking

Ranking makes it possible to transform (permute) a linked list into an array: each element is placed at the location equal to its rank. From then on, all prefix computations can be performed on this array by the previously presented PPS algorithms.
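
As a small illustration of this permutation, the following sketch places each element at the index given by its rank. It assumes 0-based ranks already computed (for example by pointer jumping); the function name `list_to_array` is hypothetical.

```python
def list_to_array(values, R):
    """Place the value of element i at array position R[i]."""
    A = [None] * len(values)
    for i, rank in enumerate(R):
        # Ranks are distinct, so every processor writes a different cell.
        A[rank] = values[i]
    return A


# List order 1 -> 0 -> 2 -> 3 with ranks R = [2, 3, 1, 0].
print(list_to_array(["a", "b", "c", "d"], [2, 3, 1, 0]))
```

Because the rank is the distance from the end, the array holds the list in reverse order (tail first); using index `n - 1 - R[i]` instead would store it head first.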

Scalability issues of pointer jumping

The CREW PRAM$(n, n)$ PJ is not cost-optimal: $C(n, n) = O(n \log n)$.

The PJ is oblivious: it keeps jumping even over pointers to the last element. Hence $W(n, n) = O(n \log n)$, too.

However, a cost optimization cannot be based on reducing $p$ and assigning $n/p$ elements of $S$ to each processor, since $\Theta(n/2)$ elements require $\Theta(\log n)$ jumps to get to the end. The number of elements that are done after step $i$ is $2^i + 1$ (see the yellow nodes in the previous figure).

The CREW PRAM$(n, p)$ PJ algorithm can be made work-optimal only if each processor keeps a list of its active (= not yet done) elements and jumps only over those.

Even this optimized non-oblivious approach does not improve the cost, since one processor can have $n/p$ active elements in all $\log n$ steps, and hence $T(n, p) = O((n/p) \log n)$ and $C(n, p) = O(n \log n)$.

Observation. Consider a linked list of $n' = n/\log n$ elements and apply pointer jumping using $p = n/\log n$ processors. Then $T(n', p) = O(\log n)$ and $C(n', p) = O(n)$.

The idea of a scalable list ranking

1. Let $p = \Theta(n/\log n)$ and $|L| = n$ elements.
2. Using $p$ processors, shrink $L$ to $L'$ of size $n' = O(n/\log n)$ with $C(n, p) = O(n)$, i.e., in $T(n, p) = O(\log n)$.
3. Apply pointer jumping to $L'$. Then $T(n', p) = O(\log n') = O(\log n)$ and $C(n', p) = O(n)$.
4. Restore $L$ from $L'$ and finish the ranking computations for the elements in $L \setminus L'$ with the same complexity as in step 2.

Symmetry breaking: independent sets

Definition 4. A subset $I \subset L$ is an independent set of a linked list $L$ if $\forall i \in I$: if $S[i] \ne i$ then $S[i] \notin I$.

Lemma 5. If $I \subset L$ is an independent set, then $I$ can be deleted from $L$ in parallel.

Proof. If $I \subset L$ is an independent set, then $\forall i \in I$: $P[i] \notin I$.

[Figure: a list $L$ and the list $L'$ obtained by deleting an independent set.]

Local minima of a linked list coloring

Lemma 6. The set of local minima of a $k$-coloring $c$ of an $n$-element list $L$ is an independent set of size $\Omega(n/k)$, and there is a work-optimal parallel algorithm that determines the local minima.

Proof. Let $u, v$ be local minima of $c$ with no other local minima in between. Then $u$ and $v$ cannot be adjacent: if $u = S[v]$, then both $c[u] < c[v]$ and $c[v] < c[u]$ would have to hold. The colors of the elements between $u$ and $v$ must form a bitonic sequence of at most $2k - 3$ colors, so $|I| \ge n/(2k - 2) = \Omega(n/k)$.

Determining local minima on an EREW PRAM is inherently parallel: $\forall u$ in parallel: compare $c[S[u]]$, $c[u]$, and $c[P[u]]$.
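
The comparison step can be sketched as follows. The treatment of the two ends of the list is an assumption made for this sketch (a missing neighbour is treated as having a larger color); the slides do not spell it out. Indices are 0-based and `local_minima` is a hypothetical name.

```python
def local_minima(S, P, c):
    """Return the elements whose color is smaller than both neighbours'.

    Assumption: head (P[u] == u) and tail (S[u] == u) only compare
    against the one neighbour they have.
    """
    result = []
    for u in range(len(S)):  # one independent comparison per element
        below_succ = S[u] == u or c[u] < c[S[u]]
        below_pred = P[u] == u or c[u] < c[P[u]]
        if below_succ and below_pred:
            result.append(u)
    return result


# List order 0 -> 1 -> 2 -> 3 -> 4 with a valid coloring.
S = [1, 2, 3, 4, 4]
P = [0, 0, 1, 2, 3]
c = [1, 0, 2, 1, 0]
I = local_minima(S, P, c)
print(I)
```

For any valid coloring the result is an independent set, since two adjacent elements cannot both have the strictly smaller color.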

Lemma 7. The best possible coloring is a 3-coloring.

Proof. Two colors do not allow parallel coloring by processors working independently on disjoint sublists of $L$.

Lemma 8. After $\Theta(\log\log n)$ removals of local minima of 3-colorings of an $n$-element list $L$, $L$ reduces to $L'$ with $|L'| \le n/\log n$.

Proof. Let $m$ be the number of iterations needed to reduce $L$ to $L'$. Let $L_k$ be $L$ after $k$ reductions and $I_k$ the set of local minima of a 3-coloring of $L_k$. Then $|I_k| \ge |L_k|/4$ and $|L_{k+1}| = |L_k| - |I_k| \le (3/4)|L_k|$. By recursion, $|L_k| \le (3/4)^k n$. Hence $|L_m| \le n/\log n$ for $m \approx 2.4 \log\log n$.
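
The constant in the iteration count can be checked by solving the bound on the list size for $m$, taking logarithms to base 2:

```latex
\left(\tfrac{3}{4}\right)^{m} n \le \frac{n}{\log n}
\;\Longleftrightarrow\;
\left(\tfrac{4}{3}\right)^{m} \ge \log n
\;\Longleftrightarrow\;
m \ge \log_{4/3} \log n
  = \frac{\log\log n}{\log (4/3)}
  \approx 2.41 \,\log\log n .
```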

Symmetry breaking and deterministic coin tossing (DCT)

Definition 9. Symmetry breaking = an initial $n$-coloring.

If $i$ fits into $t$ bits (roughly $t \ge \log i$), let $\mathrm{bin}_t(i) = i_{t-1} \ldots i_0$ be the binary representation of $i$ on $t$ bits.

$\mathrm{DiffPos}(t, i, j)$ = the least bit number in which $\mathrm{bin}_t(i)$ and $\mathrm{bin}_t(j)$ differ.

[Example: a value of $\mathrm{DiffPos}(5, \cdot, 9)$ obtained by comparing the 5-bit representations of both arguments.]

Reduction of an $n$-coloring to a 6-coloring by DCT

    for i := 1, ..., n do in parallel c[i] := bin_t(i), where t = ⌈log n⌉;
    repeat log* n times {
        for i := 1, ..., n do in parallel {
            π[i] := DiffPos(t, c[i], c[S[i]]);
            c'[i] := bin_t'(2π[i] + c[i]_π[i]), where t' = ⌈log t⌉ + 1;
            c[i] := c'[i];
        };
        t := t'
    }

Here c[i]_π[i] denotes bit number π[i] of c[i].
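
A hedged Python sketch of the DCT reduction follows. Several details are assumptions rather than part of the slides: indices and initial colors are 0-based, colors are plain integers so the bit width `t` is not passed around, the loop runs until all colors are at most 5 instead of counting `log* n` rounds, and the tail (which has no successor to compare with) uses bit position 0. The names `diff_pos`, `dct_step` and `six_coloring` are illustrative.

```python
def diff_pos(a, b):
    """Index of the least significant bit in which a and b differ (a != b)."""
    x = a ^ b
    return (x & -x).bit_length() - 1


def dct_step(S, c):
    """One round of deterministic coin tossing on coloring c."""
    new_c = []
    for i in range(len(S)):
        # Assumption: the tail compares with nothing and uses bit 0.
        pi = 0 if S[i] == i else diff_pos(c[i], c[S[i]])
        bit = (c[i] >> pi) & 1
        new_c.append(2 * pi + bit)
    return new_c


def six_coloring(S):
    """Reduce the trivial coloring c[i] = i to colors 0..5 (needs n >= 1)."""
    c = list(range(len(S)))
    while max(c) > 5:
        c = dct_step(S, c)
    return c


# 13 = 01101 and 9 = 01001 first differ in bit 2.
print(diff_pos(13, 9))
# List order 0 -> 1 -> ... -> 19.
S = list(range(1, 20)) + [19]
print(six_coloring(S))
```

With the tail convention above, the tail always receives color 0 or 1, and its predecessor either differs from it in bit 0 or gets a color of at least 2, so the coloring stays valid.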

Proof of correctness

Lemma 10. If $c$ is a valid coloring of $L$, then $c'$ is a coloring of $L$, too.

Proof. $c$ is a coloring $\Rightarrow$ $c[i] \ne c[S[i]]$ $\Rightarrow$ $\pi[i]$ is well defined. Assume $c'$ is not a valid coloring. Then $\exists i$ such that $c'[i] = c'[S[i]]$, which implies $\pi[i] = \pi[S[i]]$ and $c[i]_{\pi[i]} = c[S[i]]_{\pi[S[i]]}$. But then $c[i]_{\pi[i]} = c[S[i]]_{\pi[i]}$: a contradiction with the definition of $\pi[i]$.

Lemma 11. DCT reduces a $t$-bit coloring $c$ to a $t' = (\lceil \log t \rceil + 1)$-bit coloring $c'$.

Proof. If $c$ is a $t$-bit coloring, then for any $i$, $0 \le \pi[i] \le t - 1$. Hence the greatest color number in $c'$ can be $2(t - 1) + 1 = 2t - 1$, and to encode colors from $0$ to $2t - 1$ we need $\lceil \log(2t) \rceil$ bits.

Corollary 12. DCT must be applied $O(\log^* n)$ times, where the log-star function is defined as $\log^*(n) = \min\{i : \log^{(i)}(n) \le 1\}$.

For any realistic value of $n$, the number of iterations of DCT is at most 5.
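
To see how slowly the log-star function grows, it can be evaluated directly. The sketch below assumes base-2 logarithms and the threshold 1 from the definition above; the name `log_star` is illustrative.

```python
import math


def log_star(n):
    """Number of times log2 must be applied to n to reach a value <= 1."""
    count = 0
    x = n
    while x > 1:
        x = math.log2(x)
        count += 1
    return count


for n in (2, 4, 16, 65536, 65537):
    print(n, log_star(n))
```

The value 5 is not exceeded until $n > 2^{65536}$, which is why the number of DCT iterations can be treated as a constant in practice.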

[Table: values of $\log^* n$ for increasing ranges of $n$.]

Lemma 13. Iterations of DCT can reduce the number of colors of a coloring only to 6.

Proof. Assume $t = \lceil \log n \rceil = 3$. Then $t' = \lceil \log(2t) \rceil = 3$ and $0 \le c'[i] \le 2t - 1 = 5$.

6-coloring → 3-coloring

    for i := 1, ..., n do in parallel
        if c[i] = 5 then choose c[i] ∈ {0, 1, 2} - {c[S[i]], c[P[i]]};
    for i := 1, ..., n do in parallel
        if c[i] = 4 then choose c[i] ∈ {0, 1, 2} - {c[S[i]], c[P[i]]};
    for i := 1, ..., n do in parallel
        if c[i] = 3 then choose c[i] ∈ {0, 1, 2} - {c[S[i]], c[P[i]]};

Theorem 14. Using DCT, a 3-coloring on $p$ processors takes $T(n, p) = O((n \log^* n)/p)$ and $C(n, p) = O(n \log^* n)$.
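
The three recoloring passes can be sketched as below. The sketch assumes 0-based indices, head and tail marked by self-pointers, and picks the smallest free color, which is one possible choice among those the pseudocode allows. The name `reduce_to_three_colors` is hypothetical.

```python
def reduce_to_three_colors(S, P, c):
    """Turn a valid coloring with colors 0..5 into one with colors 0..2."""
    c = list(c)
    for color in (5, 4, 3):
        # Elements of the same color are never adjacent, so recoloring
        # them one after another equals one parallel step.
        for i in range(len(S)):
            if c[i] == color:
                used = set()
                if S[i] != i:
                    used.add(c[S[i]])
                if P[i] != i:
                    used.add(c[P[i]])
                # At most two colors are blocked, so one of 0, 1, 2 is free.
                c[i] = min({0, 1, 2} - used)
    return c


# List order 0 -> 1 -> ... -> 5 with a valid 6-coloring.
S = [1, 2, 3, 4, 5, 5]
P = [0, 0, 1, 2, 3, 4]
print(reduce_to_three_colors(S, P, [5, 4, 3, 5, 4, 3]))
```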

Input: L = {n, S[1, ..., n], P[1, ..., n], R[1, ..., n]};
Output: R[1, ..., n];      (* final values of ranks of elements *)
Aux: F[1, ..., n], N[1, ..., n], c[1, ..., n];

1. k := 0; L_k := L; n_k := |L_k|;
2. while n_k > n / log n do {
   2.1. apply 3-coloring c to L_k;
   2.2. (* identify I_k = the set of local minima of c in L_k *)
        for i := 1, ..., n_k do in parallel
            if (c[i] < min(c[P[i]], c[S[i]])) then F[i] := N[i] := 1 else F[i] := N[i] := 0;
   2.3. (* remove I_k from L_k *)
        2.3.1. apply parallel scan to N[1, ..., n_k];
        2.3.2. for i := 1, ..., n_k do in parallel
                   if F[i] = 1 then {
                       U[N[i]] := {i, S[i], P[i], R[i], k};
                       R[P[i]] := R[P[i]] + R[i];
                       S[P[i]] := S[i];
                       P[S[i]] := P[i] };
   2.4. (* compact L_{k+1} = L_k - I_k into consecutive memory locations *)
        2.4.1. for i := 1, ..., n_k do in parallel N[i] := 1 - F[i];
        2.4.2. apply parallel scan to N[1, ..., n_k];
        2.4.3. for i := 1, ..., n_k do in parallel
                   if F[i] = 0 then {
                       S[N[i]] := N[S[i]];
                       P[N[i]] := N[P[i]];
                       R[N[i]] := R[i]; }
   2.5. n_{k+1} := n_k - |I_k|; k := k + 1;
   };
3. apply pointer jumping to compute ranking in L_k;
4. restore L from L_k and rank all removed nodes by reversing steps 2.3 and 2.4;
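
The shrink and restore phases can be illustrated with a simplified sequential sketch. It deliberately departs from the algorithm above in several labelled ways: the coloring is the trivial one, `c[i] = i`, instead of a 3-coloring; head and tail are never removed; the compaction of step 2.4 is skipped; and step 3 is replaced by a sequential walk from the tail. The name `rank_by_shrinking` and the `threshold` parameter are hypothetical.

```python
def rank_by_shrinking(S, threshold=2):
    """Rank a list by splicing out independent sets and restoring them."""
    n = len(S)
    S = list(S)
    P = list(range(n))
    for i in range(n):
        if S[i] != i:
            P[S[i]] = i
    # R[i] is the weight (number of original pointers) of link i -> S[i].
    R = [0 if S[i] == i else 1 for i in range(n)]
    alive = set(range(n))
    removed = []  # plays the role of the array U

    while len(alive) > threshold:
        # Local minima of the coloring c[i] = i, head and tail excluded.
        I = [i for i in alive
             if S[i] != i and P[i] != i and i < S[i] and i < P[i]]
        if not I:
            break
        for i in I:  # independent set: the updates do not interfere
            removed.append((i, S[i], R[i]))
            R[P[i]] += R[i]
            S[P[i]] = S[i]
            P[S[i]] = P[i]
        alive.difference_update(I)

    # Stand-in for step 3: rank the short list by walking from the tail.
    rank = [0] * n
    node = next(i for i in alive if S[i] == i)
    while P[node] != node:
        prev = P[node]
        rank[prev] = R[prev] + rank[node]
        node = prev

    # Step 4: restore removed nodes in reverse order of removal.
    for i, succ, weight in reversed(removed):
        rank[i] = weight + rank[succ]
    return rank


print(rank_by_shrinking([2, 0, 3, 3]))
```

Each removed node records its successor and link weight at removal time; since that successor survives the same round, its final rank is known by the time the node is restored.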

A more scalable list ranking: performance

Theorem 15. On an EREW PRAM with $p = \Theta(n/\log n)$, the previous list ranking algorithm takes $T(n, p) = O(\log n \log\log n)$, $C(n, p) = O(n \log\log n)$, and $W(n, p) = O(n)$, assuming the approximation $\log^* n = O(1)$.

Proof.

Time $T$:
    Step 2.1: $T(n_k, p_k) = O(\log n_k)$ if $p_k = \Omega(n_k/\log n_k)$.
    Steps 2.2, 2.3.2, 2.4.1, 2.4.3: $T(n_k, p_k) = O(n_k/p_k) = O(\log n_k)$ if $p_k = \Omega(n_k/\log n_k)$.
    Steps 2.3.1, 2.4.2: $T(n_k, p_k) = O(\log n_k)$ if $p_k = \Omega(n_k/\log n_k)$.
    Steps 2, 4: $T(n, p) = O(\log n \log\log n)$.
    Step 3: $T(n/\log n, p) = O(\log n)$.

Cost $C$:
    Steps 2, 4: $C(n, p) = O(n \log\log n)$.
    Step 3: $C(n/\log n, p) = O(n)$.

Work $W$:
    Step 2.1: $W(n_k, p_k) = O(n_k \log^* n_k) = O(n_k)$.
    Steps 2.2, 2.3.2, 2.4.1, 2.4.3: $W(n_k, p_k) = O(n_k)$.
    Steps 2.3.1, 2.4.2: $W(n_k, p_k) = O(n_k + p_k \log p_k) = O(n_k)$.
    Steps 2, 4: $W(n, p) = O\left(\sum_{k=1}^{\log\log n} n_k\right) = O\left(\sum_{k=1}^{\log\log n} (3/4)^k n\right) = O(n)$.
    Step 3: $W(n/\log n, p) = O(n)$.

Note that $p = n/\log n > n_k/\log n_k$ for all $k = 1, \ldots, \log\log n$.

Example

[Figure: a worked example of the algorithm, showing the arrays S, P, R, F, N, and U and the list $L$ with its ranks $R$ before and after the removal of local minima.]
More informationOnline Coloring Known Graphs
Online Coloring Known Graphs Magnús M. Halldórsson Science Institute University of Iceland IS-107 Reykjavik, Iceland mmh@hi.is, www.hi.is/ mmh. Submitted: September 13, 1999; Accepted: February 24, 2000.
More informationAlgorithm Analysis. (Algorithm Analysis ) Data Structures and Programming Spring / 48
Algorithm Analysis (Algorithm Analysis ) Data Structures and Programming Spring 2018 1 / 48 What is an Algorithm? An algorithm is a clearly specified set of instructions to be followed to solve a problem
More informationStrings. Zachary Friggstad. Programming Club Meeting
Strings Zachary Friggstad Programming Club Meeting Outline Suffix Arrays Knuth-Morris-Pratt Pattern Matching Suffix Arrays (no code, see Comp. Prog. text) Sort all of the suffixes of a string lexicographically.
More informationVertex Cover is Fixed-Parameter Tractable
Vertex Cover is Fixed-Parameter Tractable CS 511 Iowa State University November 28, 2010 CS 511 (Iowa State University) Vertex Cover is Fixed-Parameter Tractable November 28, 2010 1 / 18 The Vertex Cover
More informationAnswer any FIVE questions 5 x 10 = 50. Graph traversal algorithms process all the vertices of a graph in a systematic fashion.
PES Institute of Technology, Bangalore South Campus (Hosur Road, 1KM before Electronic City, Bangalore 560 100) Solution Set Test III Subject & Code: Design and Analysis of Algorithms(10MCA44) Name of
More informationHypercubes. (Chapter Nine)
Hypercubes (Chapter Nine) Mesh Shortcomings: Due to its simplicity and regular structure, the mesh is attractive, both theoretically and practically. A problem with the mesh is that movement of data is
More informationPRAM ALGORITHMS: BRENT S LAW
PARALLEL AND DISTRIBUTED ALGORITHMS BY DEBDEEP MUKHOPADHYAY AND ABHISHEK SOMANI http://cse.iitkgp.ac.in/~debdeep/courses_iitkgp/palgo/index.htm PRAM ALGORITHMS: BRENT S LAW 2 1 MERGING TWO SORTED ARRAYS
More informationCache-Oblivious Traversals of an Array s Pairs
Cache-Oblivious Traversals of an Array s Pairs Tobias Johnson May 7, 2007 Abstract Cache-obliviousness is a concept first introduced by Frigo et al. in [1]. We follow their model and develop a cache-oblivious
More informationIndexing and Searching
Indexing and Searching Introduction How to retrieval information? A simple alternative is to search the whole text sequentially Another option is to build data structures over the text (called indices)
More informationRandomized incremental construction. Trapezoidal decomposition: Special sampling idea: Sample all except one item
Randomized incremental construction Special sampling idea: Sample all except one item hope final addition makes small or no change Method: process items in order average case analysis randomize order to
More informationClustering. Pattern Recognition IX. Michal Haindl. Clustering. Outline
Clustering cluster - set of patterns whose inter-pattern distances are smaller than inter-pattern distances for patterns not in the same cluster a homogeneity and uniformity criterion no connectivity little
More informationFabian Kuhn. Nicla Bernasconi, Dan Hefetz, Angelika Steger
Algorithms and Lower Bounds for Distributed Coloring Problems Fabian Kuhn Parts are joint work with Parts are joint work with Nicla Bernasconi, Dan Hefetz, Angelika Steger Given: Network = Graph G Distributed
More informationSorting and Selection on a Linear Array with Optical Bus System
Sorting and Selection on a Linear Array with Optical Bus System Hossam ElGindy Dept. of Elec. & Compt. Eng. Uni. of Newcastle, Australia Sanguthevar Rajasekaran Dept. of CISE Univ. of Florida Abstract
More informationParallel algorithms for generating combinatorial objects on linear processor arrays with reconfigurable bus systems*
SOdhan& Vol. 22. Part 5, October 1997, pp. 62%636. Printed ill India. Parallel algorithms for generating combinatorial objects on linear processor arrays with reconfigurable bus systems* P THANGAVEL Department
More informationMinimum Spanning Trees
Minimum Spanning Trees Overview Problem A town has a set of houses and a set of roads. A road connects and only houses. A road connecting houses u and v has a repair cost w(u, v). Goal: Repair enough (and
More informationSparse Hypercube 3-Spanners
Sparse Hypercube 3-Spanners W. Duckworth and M. Zito Department of Mathematics and Statistics, University of Melbourne, Parkville, Victoria 3052, Australia Department of Computer Science, University of
More information