Multithreaded Programming in Cilk LECTURE 2 Charles E. Leiserson


1 Multithreaded Programming in Cilk LECTURE 2 Charles E. Leiserson Supercomputing Technologies Research Group Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology

2 Minicourse Outline LECTURE 1 Basic Cilk programming: Cilk keywords, performance measures, scheduling. LECTURE 2 Analysis of Cilk algorithms: matrix multiplication, sorting, tableau construction. LABORATORY Programming matrix multiplication in Cilk Dr. Bradley C. Kuszmaul LECTURE 3 Advanced Cilk programming: inlets, abort, speculation, data synchronization, & more. July 14,

3 LECTURE 2 Recurrences (Review) Matrix Multiplication Merge Sort Tableau Construction Conclusion

4 The Master Method The Master Method for solving recurrences applies to recurrences of the form T(n) = a T(n/b) + f(n),* where a ≥ 1, b > 1, and f is asymptotically positive. IDEA: Compare n^(log_b a) with f(n). *The unstated base case is T(n) = Θ(1) for sufficiently small n.

5 Master Method CASE 1 T(n) = a T(n/b) + f(n), where n^(log_b a) ≫ f(n). Specifically, f(n) = O(n^(log_b a − ε)) for some constant ε > 0. Solution: T(n) = Θ(n^(log_b a)).

6 Master Method CASE 2 T(n) = a T(n/b) + f(n), where n^(log_b a) ≈ f(n). Specifically, f(n) = Θ(n^(log_b a) lg^k n) for some constant k ≥ 0. Solution: T(n) = Θ(n^(log_b a) lg^(k+1) n).

7 Master Method CASE 3 T(n) = a T(n/b) + f(n), where n^(log_b a) ≪ f(n). Specifically, f(n) = Ω(n^(log_b a + ε)) for some constant ε > 0, and f(n) satisfies the regularity condition that a f(n/b) ≤ c f(n) for some constant c < 1. Solution: T(n) = Θ(f(n)).

8 Master Method Summary T(n) = a T(n/b) + f(n). CASE 1: f(n) = O(n^(log_b a − ε)), constant ε > 0 ⇒ T(n) = Θ(n^(log_b a)). CASE 2: f(n) = Θ(n^(log_b a) lg^k n), constant k ≥ 0 ⇒ T(n) = Θ(n^(log_b a) lg^(k+1) n). CASE 3: f(n) = Ω(n^(log_b a + ε)), constant ε > 0, and regularity condition ⇒ T(n) = Θ(f(n)).

9 Master Method Quiz T(n) = 4T(n/2) + n: n^(log_b a) = n^2 ≫ n, so CASE 1: T(n) = Θ(n^2). T(n) = 4T(n/2) + n^2: n^(log_b a) = n^2 = n^2 lg^0 n, so CASE 2: T(n) = Θ(n^2 lg n). T(n) = 4T(n/2) + n^3: n^(log_b a) = n^2 ≪ n^3, so CASE 3: T(n) = Θ(n^3). T(n) = 4T(n/2) + n^2/lg n: Master method does not apply!
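As a sanity check on the first quiz answer, the recurrence can also be evaluated directly. The sketch below is not from the lecture; the base case T(1) = 1 and the closed form 2n^2 − n (obtained by unrolling) are assumptions made for illustration, consistent with the CASE 1 answer Θ(n^2).

```c
#include <assert.h>

/* Evaluate T(n) = 4 T(n/2) + n directly, with T(1) = 1 and n a power of 2.
 * Unrolling gives T(n) = n^2 + n(n-1) = 2n^2 - n, i.e. Theta(n^2): CASE 1. */
static long long T(long long n) {
    if (n <= 1) return 1;
    return 4 * T(n / 2) + n;
}
```

For example, T(2) = 4·1 + 2 = 6 = 2·2^2 − 2, and the agreement persists for every power of 2.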

10 LECTURE 2 Recurrences (Review) Matrix Multiplication Merge Sort Tableau Construction Conclusion

11 Square-Matrix Multiplication C = A B, where C = [c_ij], A = [a_ij], and B = [b_ij] are n × n matrices, with c_ij = Σ_{k=1}^{n} a_ik b_kj. Assume for simplicity that n = 2^k.

12 Recursive Matrix Multiplication Divide and conquer: partition each matrix into four (n/2) × (n/2) quadrants:

[C11 C12]   [A11 A12] [B11 B12]   [A11 B11  A11 B12]   [A12 B21  A12 B22]
[C21 C22] = [A21 A22] [B21 B22] = [A21 B11  A21 B12] + [A22 B21  A22 B22]

8 multiplications of (n/2) × (n/2) matrices. 1 addition of n × n matrices.

13 Matrix Multiply in Pseudo-Cilk

cilk void Mult(*C, *A, *B, n) {
  float *T = Cilk_alloca(n*n*sizeof(float));
  ⟨base case & partition matrices⟩
  spawn Mult(C11,A11,B11,n/2);
  spawn Mult(C12,A11,B12,n/2);
  spawn Mult(C22,A21,B12,n/2);
  spawn Mult(C21,A21,B11,n/2);
  spawn Mult(T11,A12,B21,n/2);
  spawn Mult(T12,A12,B22,n/2);
  spawn Mult(T22,A22,B22,n/2);
  spawn Mult(T21,A22,B21,n/2);
  sync;
  spawn Add(C,T,n);
  sync;
  return;
}

C = A * B. Note the absence of type declarations.

14 Matrix Multiply in Pseudo-Cilk (same code as slide 13) Coarsen base cases for efficiency.

15 Matrix Multiply in Pseudo-Cilk (same code as slide 13) Submatrices are produced by pointer calculation, not copying of elements. Also need a rowsize argument for array indexing.

16 Matrix Multiply in Pseudo-Cilk (Mult as on slide 13, plus the addition routine:)

cilk void Add(*C, *T, n) {
  ⟨base case & partition matrices⟩
  spawn Add(C11,T11,n/2);
  spawn Add(C12,T12,n/2);
  spawn Add(C21,T21,n/2);
  spawn Add(C22,T22,n/2);
  sync;
  return;
}

C = C + T.
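The elisions above leave the quadrant arithmetic implicit. The serial C sketch below makes it concrete under stated assumptions: the per-argument rowsize parameters and the packed temporary are my own choices for illustration (the lecture only says a rowsize argument is needed), and each spawn becomes an ordinary recursive call.

```c
#include <assert.h>
#include <stdlib.h>

/* Serial sketch of the divide-and-conquer scheme.  Each matrix argument is
 * an n x n submatrix of a larger matrix whose rows are rc, ra, or rb floats
 * long, so quadrants are reached by pointer arithmetic, not copying:
 * X11 = X, X12 = X + n/2, X21 = X + (n/2)*rowsize, X22 = X21 + n/2. */
static void mult(float *C, int rc, const float *A, int ra,
                 const float *B, int rb, int n) {
    if (n == 1) { C[0] = A[0] * B[0]; return; }
    int h = n / 2;
    float *T = malloc((size_t)n * n * sizeof *T);  /* temporary, rowsize n */
    mult(C,          rc, A,        ra, B,          rb, h); /* C11 = A11 B11 */
    mult(C+h,        rc, A,        ra, B+h,        rb, h); /* C12 = A11 B12 */
    mult(C+h*rc,     rc, A+h*ra,   ra, B,          rb, h); /* C21 = A21 B11 */
    mult(C+h*rc+h,   rc, A+h*ra,   ra, B+h,        rb, h); /* C22 = A21 B12 */
    mult(T,          n,  A+h,      ra, B+h*rb,     rb, h); /* T11 = A12 B21 */
    mult(T+h,        n,  A+h,      ra, B+h*rb+h,   rb, h); /* T12 = A12 B22 */
    mult(T+h*n,      n,  A+h*ra+h, ra, B+h*rb,     rb, h); /* T21 = A22 B21 */
    mult(T+h*n+h,    n,  A+h*ra+h, ra, B+h*rb+h,   rb, h); /* T22 = A22 B22 */
    for (int i = 0; i < n; i++)                    /* Add: C = C + T */
        for (int j = 0; j < n; j++)
            C[i*rc + j] += T[i*n + j];
    free(T);
}
```

For example, multiplying [[1,2],[3,4]] by [[5,6],[7,8]] (n = 2, all rowsizes 2) yields [[19,22],[43,50]].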

17 Work of Matrix Addition (Add as on slide 16.) Work: A_1(n) = 4 A_1(n/2) + Θ(1) = Θ(n^2). CASE 1: n^(log_b a) = n^(log_2 4) = n^2 ≫ Θ(1).

18 Span of Matrix Addition (Add as on slide 16.) The four recursive calls run in parallel, so the span takes their maximum: A_∞(n) = A_∞(n/2) + Θ(1) = Θ(lg n). CASE 2: n^(log_b a) = n^(log_2 1) = 1, and f(n) = Θ(n^(log_b a) lg^0 n).

19 Work of Matrix Multiplication (Mult as on slide 13.) Work: M_1(n) = 8 M_1(n/2) + A_1(n) + Θ(1) = 8 M_1(n/2) + Θ(n^2) = Θ(n^3). CASE 1: n^(log_b a) = n^(log_2 8) = n^3 ≫ Θ(n^2).

20 Span of Matrix Multiplication (Mult as on slide 13.) Span: M_∞(n) = M_∞(n/2) + A_∞(n) + Θ(1) = M_∞(n/2) + Θ(lg n) = Θ(lg^2 n). CASE 2: n^(log_b a) = n^(log_2 1) = 1, and f(n) = Θ(n^(log_b a) lg^1 n).

21 Parallelism of Matrix Multiply Work: M_1(n) = Θ(n^3) Span: M_∞(n) = Θ(lg^2 n) Parallelism: M_1(n)/M_∞(n) = Θ(n^3/lg^2 n). For 1000 × 1000 matrices, parallelism ≈ (10^3)^3/10^2 = 10^7.

22 Stack Temporaries (Mult as on slide 13; note the temporary T allocated with Cilk_alloca.) In hierarchical-memory machines (especially chip multiprocessors), memory accesses are so expensive that minimizing storage often yields higher performance. IDEA: Trade off parallelism for less storage.

23 No-Temp Matrix Multiplication

cilk void MultA(*C, *A, *B, n) { // C = C + A * B
  ⟨base case & partition matrices⟩
  spawn MultA(C11,A11,B11,n/2);
  spawn MultA(C12,A11,B12,n/2);
  spawn MultA(C22,A21,B12,n/2);
  spawn MultA(C21,A21,B11,n/2);
  sync;
  spawn MultA(C21,A22,B21,n/2);
  spawn MultA(C22,A22,B22,n/2);
  spawn MultA(C12,A12,B22,n/2);
  spawn MultA(C11,A12,B21,n/2);
  sync;
  return;
}

Saves space, but at what expense?

24 Work of No-Temp Multiply (MultA as on slide 23.) Work: M_1(n) = 8 M_1(n/2) + Θ(1) = Θ(n^3). CASE 1.

25 Span of No-Temp Multiply (MultA as on slide 23.) The eight updates fall into two groups of four that must run one group after the other, because both groups write the same quadrants of C; within each group the four spawns run in parallel, so each group contributes the maximum of its branches. Span: M_∞(n) = 2 M_∞(n/2) + Θ(1) = Θ(n). CASE 1.

26 Parallelism of No-Temp Multiply Work: M_1(n) = Θ(n^3) Span: M_∞(n) = Θ(n) Parallelism: M_1(n)/M_∞(n) = Θ(n^2). For 1000 × 1000 matrices, parallelism ≈ (10^3)^3/10^3 = 10^6. Faster in practice!

27 Testing Synchronization Cilk language feature: A programmer can check whether a Cilk procedure is synched (without actually performing a sync) by testing the pseudovariable SYNCHED: SYNCHED = 0 means some spawned children might not have returned; SYNCHED = 1 means all spawned children have definitely returned.

28 Best of Both Worlds

cilk void Mult1(*C, *A, *B, n) { // multiply & store
  ⟨base case & partition matrices⟩
  spawn Mult1(C11,A11,B11,n/2); // multiply & store
  spawn Mult1(C12,A11,B12,n/2);
  spawn Mult1(C22,A21,B12,n/2);
  spawn Mult1(C21,A21,B11,n/2);
  if (SYNCHED) {
    spawn MultA1(C11,A12,B21,n/2); // multiply & add
    spawn MultA1(C12,A12,B22,n/2);
    spawn MultA1(C22,A22,B22,n/2);
    spawn MultA1(C21,A22,B21,n/2);
  } else {
    float *T = Cilk_alloca(n*n*sizeof(float));
    spawn Mult1(T11,A12,B21,n/2); // multiply & store
    spawn Mult1(T12,A12,B22,n/2);
    spawn Mult1(T22,A22,B22,n/2);
    spawn Mult1(T21,A22,B21,n/2);
    sync;
    spawn Add(C,T,n); // C = C + T
  }
  sync;
  return;
}

This code is just as parallel as the original, but it only uses more space if runtime parallelism actually exists.

29 Ordinary Matrix Multiplication c_ij = Σ_{k=1}^{n} a_ik b_kj. IDEA: Spawn n^2 inner products in parallel. Compute each inner product in parallel. Work: Θ(n^3) Span: Θ(lg n) Parallelism: Θ(n^3/lg n) BUT, this algorithm exhibits poor locality and does not exploit the cache hierarchy of modern microprocessors, especially CMPs.
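Serially, this inner-product formulation is just the familiar triple loop, sketched below for reference; the parallel version would spawn all n^2 (i, j) pairs and compute each sum by a Θ(lg n)-depth reduction. The function name is mine, not from the lecture.

```c
#include <assert.h>

/* c[i][j] = sum_k a[i][k] * b[k][j]: one inner product per output entry.
 * The lecture's parallel algorithm does all n^2 inner products at once,
 * each as a logarithmic-depth reduction; here they run one after another. */
static void mm(int n, const float *a, const float *b, float *c) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            float s = 0.0f;
            for (int k = 0; k < n; k++)
                s += a[i*n + k] * b[k*n + j];
            c[i*n + j] = s;
        }
}
```

The poor locality the slide mentions is visible in the inner loop: b is walked down a column, striding n floats per step.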

30 LECTURE 2 Recurrences (Review) Matrix Multiplication Merge Sort Tableau Construction Conclusion

31 Merging Two Sorted Arrays

void Merge(int *C, int *A, int *B, int na, int nb) {
  while (na > 0 && nb > 0) {
    if (*A <= *B) {
      *C++ = *A++; na--;
    } else {
      *C++ = *B++; nb--;
    }
  }
  while (na > 0) {
    *C++ = *A++; na--;
  }
  while (nb > 0) {
    *C++ = *B++; nb--;
  }
}

Time to merge n elements = Θ(n).

32 Merge Sort

cilk void MergeSort(int *B, int *A, int n) {
  if (n == 1) {
    B[0] = A[0];
  } else {
    int *C = (int*) Cilk_alloca(n*sizeof(int));
    spawn MergeSort(C, A, n/2);
    spawn MergeSort(C+n/2, A+n/2, n-n/2);
    sync;
    Merge(B, C, C+n/2, n/2, n-n/2);
  }
}

33 Work of Merge Sort (MergeSort as on slide 32.) Work: T_1(n) = 2 T_1(n/2) + Θ(n) = Θ(n lg n). CASE 2: n^(log_b a) = n^(log_2 2) = n, and f(n) = Θ(n^(log_b a) lg^0 n).

34 Span of Merge Sort (MergeSort as on slide 32.) Span: T_∞(n) = T_∞(n/2) + Θ(n) = Θ(n). CASE 3: n^(log_b a) = n^(log_2 1) = 1 ≪ Θ(n).

35 Parallelism of Merge Sort Work: T_1(n) = Θ(n lg n) Span: T_∞(n) = Θ(n) Parallelism: T_1(n)/T_∞(n) = Θ(lg n) We need to parallelize the merge!

36 Parallel Merge Assume na ≥ nb. Take the median element A[na/2] of the larger array A. Binary-search B for the index j such that B[0..j] ≤ A[na/2] ≤ B[j+1..nb-1]. Then recursively merge A[0..na/2-1] with B[0..j], and recursively merge A[na/2..na-1] with B[j+1..nb-1]. KEY IDEA: If the total number of elements to be merged in the two arrays is n = na + nb, the total number of elements in the larger of the two recursive merges is at most (3/4) n.

37 Parallel Merge

cilk void P_Merge(int *C, int *A, int *B, int na, int nb) {
  if (na < nb) {
    spawn P_Merge(C, B, A, nb, na);
  } else if (na == 1) {
    if (nb == 0) {
      C[0] = A[0];
    } else {
      C[0] = (A[0] < B[0]) ? A[0] : B[0]; /* minimum */
      C[1] = (A[0] < B[0]) ? B[0] : A[0]; /* maximum */
    }
  } else {
    int ma = na/2;
    int mb = BinarySearch(A[ma], B, nb);
    spawn P_Merge(C, A, B, ma, mb);
    spawn P_Merge(C+ma+mb, A+ma, B+mb, na-ma, nb-mb);
  }
}

Coarsen base cases for efficiency.
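Eliding the spawns, the divide step can be sketched in serial C as follows. The helper binary_search is an assumption standing in for the lecture's BinarySearch: it returns the number of elements of B that are less than the key, which splits B so that everything routed to the first recursive merge is at most A[ma] and everything routed to the second is at least A[ma].

```c
#include <assert.h>

/* Index of the first element of B[0..nb) that is >= key (lower bound),
 * i.e. the count of elements of B strictly less than key. */
static int binary_search(int key, const int *B, int nb) {
    int lo = 0, hi = nb;
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (B[mid] < key) lo = mid + 1; else hi = mid;
    }
    return lo;
}

/* Serial rendering of P_Merge: merge sorted A[0..na) and B[0..nb) into C. */
static void p_merge(int *C, const int *A, const int *B, int na, int nb) {
    if (na < nb) { p_merge(C, B, A, nb, na); return; } /* ensure na >= nb */
    if (na == 0) return;                               /* both empty */
    if (na == 1) {                                     /* nb is 0 or 1 */
        if (nb == 0) { C[0] = A[0]; return; }
        C[0] = (A[0] < B[0]) ? A[0] : B[0];            /* minimum */
        C[1] = (A[0] < B[0]) ? B[0] : A[0];            /* maximum */
        return;
    }
    int ma = na / 2;
    int mb = binary_search(A[ma], B, nb);
    p_merge(C, A, B, ma, mb);                          /* first halves */
    p_merge(C + ma + mb, A + ma, B + mb, na - ma, nb - mb);
}
```

In the Cilk version the two recursive merges are spawned and write disjoint regions of C, so they can proceed in parallel.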

38 Span of P_Merge (P_Merge as on slide 37.) Span: T_∞(n) = T_∞(3n/4) + Θ(lg n) = Θ(lg^2 n). CASE 2: n^(log_b a) = n^(log_{4/3} 1) = 1, and f(n) = Θ(n^(log_b a) lg^1 n).

39 Work of P_Merge (P_Merge as on slide 37.) Work: T_1(n) = T_1(αn) + T_1((1−α)n) + Θ(lg n), where 1/4 ≤ α ≤ 3/4. CLAIM: T_1(n) = Θ(n).

40-42 Analysis of Work Recurrence T_1(n) = T_1(αn) + T_1((1−α)n) + Θ(lg n), where 1/4 ≤ α ≤ 3/4. Substitution method: Inductive hypothesis is T_1(k) ≤ c_1 k − c_2 lg k, where c_1, c_2 > 0. Prove that the relation holds, and solve for c_1 and c_2.

T_1(n) = T_1(αn) + T_1((1−α)n) + Θ(lg n)
       ≤ c_1(αn) − c_2 lg(αn) + c_1((1−α)n) − c_2 lg((1−α)n) + Θ(lg n)
       = c_1 n − c_2 lg(αn) − c_2 lg((1−α)n) + Θ(lg n)
       = c_1 n − c_2 (lg(α(1−α)) + 2 lg n) + Θ(lg n)
       = c_1 n − c_2 lg n − (c_2 (lg n + lg(α(1−α))) − Θ(lg n))
       ≤ c_1 n − c_2 lg n

by choosing c_1 and c_2 large enough.

43 Parallelism of P_Merge Work: T_1(n) = Θ(n) Span: T_∞(n) = Θ(lg^2 n) Parallelism: T_1(n)/T_∞(n) = Θ(n/lg^2 n)

44 Parallel Merge Sort

cilk void P_MergeSort(int *B, int *A, int n) {
  if (n == 1) {
    B[0] = A[0];
  } else {
    int *C = (int*) Cilk_alloca(n*sizeof(int));
    spawn P_MergeSort(C, A, n/2);
    spawn P_MergeSort(C+n/2, A+n/2, n-n/2);
    sync;
    spawn P_Merge(B, C, C+n/2, n/2, n-n/2);
  }
}

45 Work of Parallel Merge Sort (P_MergeSort as on slide 44.) Work: T_1(n) = 2 T_1(n/2) + Θ(n) = Θ(n lg n). CASE 2.

46 Span of Parallel Merge Sort (P_MergeSort as on slide 44.) Span: T_∞(n) = T_∞(n/2) + Θ(lg^2 n) = Θ(lg^3 n). CASE 2: n^(log_b a) = n^(log_2 1) = 1, and f(n) = Θ(n^(log_b a) lg^2 n).

47 Parallelism of Merge Sort Work: T_1(n) = Θ(n lg n) Span: T_∞(n) = Θ(lg^3 n) Parallelism: T_1(n)/T_∞(n) = Θ(n/lg^2 n)

48 LECTURE 2 Recurrences (Review) Matrix Multiplication Merge Sort Tableau Construction Conclusion

49 Tableau Construction Problem: Fill in an n × n tableau A, where A[i, j] = f(A[i, j−1], A[i−1, j], A[i−1, j−1]). Dynamic programming: Longest common subsequence, Edit distance, Time warping. Work: Θ(n^2).
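The dependency structure can be made concrete with a small serial sketch. The particular f and boundary values below are arbitrary choices of mine for illustration: with f(l, u, d) = l + u − d and boundaries A[i][0] = i, A[0][j] = j, the fill yields A[i][j] = i + j, which makes the result easy to check.

```c
#include <assert.h>

#define N 8

/* Hypothetical f for illustration; any f of (left, up, diag) works. */
static int f(int left, int up, int diag) { return left + up - diag; }

/* Fill the tableau in row-major order, which respects the dependencies:
 * A[i][j-1], A[i-1][j], and A[i-1][j-1] are computed before A[i][j]. */
static int fill_tableau(void) {
    int A[N][N];
    for (int i = 0; i < N; i++) { A[i][0] = i; A[0][i] = i; }
    for (int i = 1; i < N; i++)
        for (int j = 1; j < N; j++)
            A[i][j] = f(A[i][j-1], A[i-1][j], A[i-1][j-1]);
    return A[N-1][N-1];   /* with this f, A[i][j] == i + j */
}
```

The serial order does Θ(n^2) work; the slides that follow ask how much of it can run in parallel.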

50 Recursive Construction Divide the n × n tableau into four (n/2) × (n/2) quadrants: I (upper left), II (upper right), III (lower left), IV (lower right). Quadrant I depends on nothing outside itself; II and III each depend only on I; IV depends on II and III. Cilk code: spawn I; sync; spawn II; spawn III; sync; spawn IV; sync;

51 Recursive Construction (Construction as on slide 50.) Work: T_1(n) = 4 T_1(n/2) + Θ(1) = Θ(n^2). CASE 1.

52 Recursive Construction (Construction as on slide 50.) Span: T_∞(n) = 3 T_∞(n/2) + Θ(1) = Θ(n^(lg 3)). CASE 1.

53 Analysis of Tableau Construction Work: T_1(n) = Θ(n^2) Span: T_∞(n) = Θ(n^(lg 3)) ≈ Θ(n^1.58) Parallelism: T_1(n)/T_∞(n) ≈ Θ(n^0.42)

54 A More-Parallel Construction Divide the n × n tableau into nine (n/3) × (n/3) subsquares, numbered along the antidiagonals: row 1 is I, II, IV; row 2 is III, V, VII; row 3 is VI, VIII, IX. Cilk code: spawn I; sync; spawn II; spawn III; sync; spawn IV; spawn V; spawn VI; sync; spawn VII; spawn VIII; sync; spawn IX; sync;

55 A More-Parallel Construction (Construction as on slide 54.) Work: T_1(n) = 9 T_1(n/3) + Θ(1) = Θ(n^2). CASE 1.

56 A More-Parallel Construction (Construction as on slide 54.) Span: T_∞(n) = 5 T_∞(n/3) + Θ(1) = Θ(n^(log_3 5)). CASE 1.

57 Analysis of Revised Construction Work: T_1(n) = Θ(n^2) Span: T_∞(n) = Θ(n^(log_3 5)) ≈ Θ(n^1.46) Parallelism: T_1(n)/T_∞(n) ≈ Θ(n^0.54) More parallel by a factor of Θ(n^0.54)/Θ(n^0.42) = Θ(n^0.12).

58 Puzzle What is the largest parallelism that can be obtained for the tableau-construction problem using Cilk? You may only use basic Cilk control constructs (spawn, sync) for synchronization. No locks, synchronizing through memory, etc.

59 LECTURE 2 Recurrences (Review) Matrix Multiplication Merge Sort Tableau Construction Conclusion

60 Key Ideas Cilk is simple: cilk, spawn, sync, SYNCHED Recurrences, recurrences, recurrences, Work & span Work & span Work & span Work & span Work & span Work & span Work & span Work & span Work & span Work & span Work & span Work & span Work & span Work & span Work & span

61 Minicourse Outline LECTURE 1 Basic Cilk programming: Cilk keywords, performance measures, scheduling. LECTURE 2 Analysis of Cilk algorithms: matrix multiplication, sorting, tableau construction. LABORATORY Programming matrix multiplication in Cilk Dr. Bradley C. Kuszmaul LECTURE 3 Advanced Cilk programming: inlets, abort, speculation, data synchronization, & more.


More information

Plan. 1 Parallelism Complexity Measures. 2 cilk for Loops. 3 Scheduling Theory and Implementation. 4 Measuring Parallelism in Practice

Plan. 1 Parallelism Complexity Measures. 2 cilk for Loops. 3 Scheduling Theory and Implementation. 4 Measuring Parallelism in Practice lan Multithreaded arallelism and erformance Measures Marc Moreno Maza University of Western Ontario, London, Ontario (Canada) CS 02 - CS 9535 arallelism Complexity Measures 2 cilk for Loops 3 Measuring

More information

Divide-and-Conquer. Dr. Yingwu Zhu

Divide-and-Conquer. Dr. Yingwu Zhu Divide-and-Conquer Dr. Yingwu Zhu Divide-and-Conquer The most-well known algorithm design technique: 1. Divide instance of problem into two or more smaller instances 2. Solve smaller instances independently

More information

CS302 Topic: Algorithm Analysis #2. Thursday, Sept. 21, 2006

CS302 Topic: Algorithm Analysis #2. Thursday, Sept. 21, 2006 CS302 Topic: Algorithm Analysis #2 Thursday, Sept. 21, 2006 Analysis of Algorithms The theoretical study of computer program performance and resource usage What s also important (besides performance/resource

More information

Multithreaded Programming in Cilk. Matteo Frigo

Multithreaded Programming in Cilk. Matteo Frigo Multithreaded Programming in Cilk Matteo Frigo Multicore challanges Development time: Will you get your product out in time? Where will you find enough parallel-programming talent? Will you be forced to

More information

CS 303 Design and Analysis of Algorithms

CS 303 Design and Analysis of Algorithms Mid-term CS 303 Design and Analysis of Algorithms Review For Midterm Dong Xu (Based on class note of David Luebke) 12:55-1:55pm, Friday, March 19 Close book Bring your calculator 30% of your final score

More information

CS Divide and Conquer

CS Divide and Conquer CS483-07 Divide and Conquer Instructor: Fei Li Room 443 ST II Office hours: Tue. & Thur. 1:30pm - 2:30pm or by appointments lifei@cs.gmu.edu with subject: CS483 http://www.cs.gmu.edu/ lifei/teaching/cs483_fall07/

More information

List Ranking. Chapter 4

List Ranking. Chapter 4 List Ranking Chapter 4 Problem on linked lists 2-level memory model List Ranking problem Given a (mono directional) linked list L of n items, compute the distance of each item from the tail of L. Id Succ

More information

List Ranking. Chapter 4

List Ranking. Chapter 4 List Ranking Chapter 4 Problem on linked lists 2-level memory model List Ranking problem Given a (mono directional) linked list L of n items, compute the distance of each item from the tail of L. Id Succ

More information

Case Study: Matrix Multiplication. 6.S898: Advanced Performance Engineering for Multicore Applications February 22, 2017

Case Study: Matrix Multiplication. 6.S898: Advanced Performance Engineering for Multicore Applications February 22, 2017 Case Study: Matrix Multiplication 6.S898: Advanced Performance Engineering for Multicore Applications February 22, 2017 1 4k-by-4k Matrix Multiplication Version Implementation Running time (s) GFLOPS Absolute

More information

Divide and Conquer 4-0

Divide and Conquer 4-0 Divide and Conquer 4-0 Divide-and-Conquer The most-well known algorithm design strategy: 1. Divide instance of problem into two or more smaller instances 2. Solve smaller instances recursively 3. Obtain

More information

Shared-memory Parallel Programming with Cilk Plus

Shared-memory Parallel Programming with Cilk Plus Shared-memory Parallel Programming with Cilk Plus John Mellor-Crummey Department of Computer Science Rice University johnmc@rice.edu COMP 422/534 Lecture 4 19 January 2017 Outline for Today Threaded programming

More information

The DAG Model; Analysis of For-Loops; Reduction

The DAG Model; Analysis of For-Loops; Reduction CSE341T 09/06/2017 Lecture 3 The DAG Model; Analysis of For-Loops; Reduction We will now formalize the DAG model. We will also see how parallel for loops are implemented and what are reductions. 1 The

More information

UNIT-2 DIVIDE & CONQUER

UNIT-2 DIVIDE & CONQUER Overview: Divide and Conquer Master theorem Master theorem based analysis for Binary Search Merge Sort Quick Sort Divide and Conquer UNIT-2 DIVIDE & CONQUER Basic Idea: 1. Decompose problems into sub instances.

More information

Plan. 1 Parallelism Complexity Measures. 2 cilk for Loops. 3 Scheduling Theory and Implementation. 4 Measuring Parallelism in Practice

Plan. 1 Parallelism Complexity Measures. 2 cilk for Loops. 3 Scheduling Theory and Implementation. 4 Measuring Parallelism in Practice lan Multithreaded arallelism and erformance Measures Marc Moreno Maza University of Western Ontario, London, Ontario (Canada) CS 3101 1 2 cilk for Loops 3 4 Measuring arallelism in ractice 5 Announcements

More information

Midterm solutions. n f 3 (n) = 3

Midterm solutions. n f 3 (n) = 3 Introduction to Computer Science 1, SE361 DGIST April 20, 2016 Professors Min-Soo Kim and Taesup Moon Midterm solutions Midterm solutions The midterm is a 1.5 hour exam (4:30pm 6:00pm). This is a closed

More information

Data Structures and Algorithms Week 4

Data Structures and Algorithms Week 4 Data Structures and Algorithms Week. About sorting algorithms. Heapsort Complete binary trees Heap data structure. Quicksort a popular algorithm very fast on average Previous Week Divide and conquer Merge

More information

Taking Stock. IE170: Algorithms in Systems Engineering: Lecture 6. The Master Theorem. Some More Examples...

Taking Stock. IE170: Algorithms in Systems Engineering: Lecture 6. The Master Theorem. Some More Examples... Taking Stock IE170: Algorithms in Systems Engineering: Lecture 6 Jeff Linderoth Last Time Divide-and-Conquer The Master-Theorem When the World Will End Department of Industrial and Systems Engineering

More information

CS584/684 Algorithm Analysis and Design Spring 2017 Week 3: Multithreaded Algorithms

CS584/684 Algorithm Analysis and Design Spring 2017 Week 3: Multithreaded Algorithms CS584/684 Algorithm Analysis and Design Spring 2017 Week 3: Multithreaded Algorithms Additional sources for this lecture include: 1. Umut Acar and Guy Blelloch, Algorithm Design: Parallel and Sequential

More information

Algorithms in Systems Engineering IE172. Midterm Review. Dr. Ted Ralphs

Algorithms in Systems Engineering IE172. Midterm Review. Dr. Ted Ralphs Algorithms in Systems Engineering IE172 Midterm Review Dr. Ted Ralphs IE172 Midterm Review 1 Textbook Sections Covered on Midterm Chapters 1-5 IE172 Review: Algorithms and Programming 2 Introduction to

More information

Merge Sort. Algorithm Analysis. November 15, 2017 Hassan Khosravi / Geoffrey Tien 1

Merge Sort. Algorithm Analysis. November 15, 2017 Hassan Khosravi / Geoffrey Tien 1 Merge Sort Algorithm Analysis November 15, 2017 Hassan Khosravi / Geoffrey Tien 1 The story thus far... CPSC 259 topics up to this point Priority queue Abstract data types Stack Queue Dictionary Tools

More information

Data Structures and Algorithms Running time, Divide and Conquer January 25, 2018

Data Structures and Algorithms Running time, Divide and Conquer January 25, 2018 Data Structures and Algorithms Running time, Divide and Conquer January 5, 018 Consider the problem of computing n for any non-negative integer n. similar looking algorithms to solve this problem. Below

More information

University of Waterloo Department of Electrical and Computer Engineering ECE250 Algorithms and Data Structures Fall 2017

University of Waterloo Department of Electrical and Computer Engineering ECE250 Algorithms and Data Structures Fall 2017 University of Waterloo Department of Electrical and Computer Engineering ECE250 Algorithms and Data Structures Fall 207 Midterm Examination Instructor: Dr. Ladan Tahvildari, PEng, SMIEEE Date: Wednesday,

More information

Divide-and-Conquer. The most-well known algorithm design strategy: smaller instances. combining these solutions

Divide-and-Conquer. The most-well known algorithm design strategy: smaller instances. combining these solutions Divide-and-Conquer The most-well known algorithm design strategy: 1. Divide instance of problem into two or more smaller instances 2. Solve smaller instances recursively 3. Obtain solution to original

More information

SEARCHING, SORTING, AND ASYMPTOTIC COMPLEXITY. Lecture 11 CS2110 Spring 2016

SEARCHING, SORTING, AND ASYMPTOTIC COMPLEXITY. Lecture 11 CS2110 Spring 2016 1 SEARCHING, SORTING, AND ASYMPTOTIC COMPLEXITY Lecture 11 CS2110 Spring 2016 Time spent on A2 2 Histogram: [inclusive:exclusive) [0:1): 0 [1:2): 24 ***** [2:3): 84 ***************** [3:4): 123 *************************

More information

CSC Design and Analysis of Algorithms. Lecture 6. Divide and Conquer Algorithm Design Technique. Divide-and-Conquer

CSC Design and Analysis of Algorithms. Lecture 6. Divide and Conquer Algorithm Design Technique. Divide-and-Conquer CSC 8301- Design and Analysis of Algorithms Lecture 6 Divide and Conuer Algorithm Design Techniue Divide-and-Conuer The most-well known algorithm design strategy: 1. Divide a problem instance into two

More information

Sankalchand Patel College of Engineering - Visnagar Department of Computer Engineering and Information Technology. Assignment

Sankalchand Patel College of Engineering - Visnagar Department of Computer Engineering and Information Technology. Assignment Class: V - CE Sankalchand Patel College of Engineering - Visnagar Department of Computer Engineering and Information Technology Sub: Design and Analysis of Algorithms Analysis of Algorithm: Assignment

More information

Recursion. COMS W1007 Introduction to Computer Science. Christopher Conway 26 June 2003

Recursion. COMS W1007 Introduction to Computer Science. Christopher Conway 26 June 2003 Recursion COMS W1007 Introduction to Computer Science Christopher Conway 26 June 2003 The Fibonacci Sequence The Fibonacci numbers are: 1, 1, 2, 3, 5, 8, 13, 21, 34,... We can calculate the nth Fibonacci

More information

Writing Parallel Programs; Cost Model.

Writing Parallel Programs; Cost Model. CSE341T 08/30/2017 Lecture 2 Writing Parallel Programs; Cost Model. Due to physical and economical constraints, a typical machine we can buy now has 4 to 8 computing cores, and soon this number will be

More information

CSC236H Lecture 5. October 17, 2018

CSC236H Lecture 5. October 17, 2018 CSC236H Lecture 5 October 17, 2018 Runtime of recursive programs def fact1(n): if n == 1: return 1 else: return n * fact1(n-1) (a) Base case: T (1) = c (constant amount of work) (b) Recursive call: T

More information

CS Divide and Conquer

CS Divide and Conquer CS483-07 Divide and Conquer Instructor: Fei Li Room 443 ST II Office hours: Tue. & Thur. 1:30pm - 2:30pm or by appointments lifei@cs.gmu.edu with subject: CS483 http://www.cs.gmu.edu/ lifei/teaching/cs483_fall07/

More information

Data Structures and Algorithms Chapter 4

Data Structures and Algorithms Chapter 4 Data Structures and Algorithms Chapter. About sorting algorithms. Heapsort Complete binary trees Heap data structure. Quicksort a popular algorithm very fast on average Previous Chapter Divide and conquer

More information

Divide-and-Conquer. Combine solutions to sub-problems into overall solution. Break up problem of size n into two equal parts of size!n.

Divide-and-Conquer. Combine solutions to sub-problems into overall solution. Break up problem of size n into two equal parts of size!n. Chapter 5 Divide and Conquer Slides by Kevin Wayne. Copyright 25 Pearson-Addon Wesley. All rights reserved. Divide-and-Conquer Divide-and-conquer. Break up problem into several parts. Solve each part recursively.

More information

g(n) time to computer answer directly from small inputs. f(n) time for dividing P and combining solution to sub problems

g(n) time to computer answer directly from small inputs. f(n) time for dividing P and combining solution to sub problems .2. Divide and Conquer Divide and conquer (D&C) is an important algorithm design paradigm. It works by recursively breaking down a problem into two or more sub-problems of the same (or related) type, until

More information

Recall from Last Time: Big-Oh Notation

Recall from Last Time: Big-Oh Notation CSE 326 Lecture 3: Analysis of Algorithms Today, we will review: Big-Oh, Little-Oh, Omega (Ω), and Theta (Θ): (Fraternities of functions ) Examples of time and space efficiency analysis Covered in Chapter

More information

Jana Kosecka. Linear Time Sorting, Median, Order Statistics. Many slides here are based on E. Demaine, D. Luebke slides

Jana Kosecka. Linear Time Sorting, Median, Order Statistics. Many slides here are based on E. Demaine, D. Luebke slides Jana Kosecka Linear Time Sorting, Median, Order Statistics Many slides here are based on E. Demaine, D. Luebke slides Insertion sort: Easy to code Fast on small inputs (less than ~50 elements) Fast on

More information

Massive Data Algorithmics. Lecture 12: Cache-Oblivious Model

Massive Data Algorithmics. Lecture 12: Cache-Oblivious Model Typical Computer Hierarchical Memory Basics Data moved between adjacent memory level in blocks A Trivial Program A Trivial Program: d = 1 A Trivial Program: d = 1 A Trivial Program: n = 2 24 A Trivial

More information

ECE608 - Chapter 15 answers

ECE608 - Chapter 15 answers ¼ À ÈÌ Ê ½ ÈÊÇ Ä ÅË ½µ ½ º¾¹¾ ¾µ ½ º¾¹ µ ½ º¾¹ µ ½ º ¹½ µ ½ º ¹¾ µ ½ º ¹ µ ½ º ¹¾ µ ½ º ¹ µ ½ º ¹ ½¼µ ½ º ¹ ½½µ ½ ¹ ½ ECE608 - Chapter 15 answers (1) CLR 15.2-2 MATRIX CHAIN MULTIPLY(A, s, i, j) 1. if

More information

Analysis of Algorithms - Quicksort -

Analysis of Algorithms - Quicksort - Analysis of Algorithms - Quicksort - Andreas Ermedahl MRTC (Mälardalens Real-Time Reseach Center) andreas.ermedahl@mdh.se Autumn 2003 Quicksort Proposed by C.A.R. Hoare in 962. Divide- and- conquer algorithm

More information

The divide-and-conquer paradigm involves three steps at each level of the recursion: Divide the problem into a number of subproblems.

The divide-and-conquer paradigm involves three steps at each level of the recursion: Divide the problem into a number of subproblems. 2.3 Designing algorithms There are many ways to design algorithms. Insertion sort uses an incremental approach: having sorted the subarray A[1 j - 1], we insert the single element A[j] into its proper

More information

The divide and conquer strategy has three basic parts. For a given problem of size n,

The divide and conquer strategy has three basic parts. For a given problem of size n, 1 Divide & Conquer One strategy for designing efficient algorithms is the divide and conquer approach, which is also called, more simply, a recursive approach. The analysis of recursive algorithms often

More information

CSE 160 Lecture 3. Applications with synchronization. Scott B. Baden

CSE 160 Lecture 3. Applications with synchronization. Scott B. Baden CSE 160 Lecture 3 Applications with synchronization Scott B. Baden Announcements Makeup session on 10/7 1:00 to 2:20, EBU3B 1202 Will be recorded and available for viewing 2013 Scott B. Baden / CSE 160

More information

Plan. Introduction to Multicore Programming. Plan. University of Western Ontario, London, Ontario (Canada) Multi-core processor CPU Coherence

Plan. Introduction to Multicore Programming. Plan. University of Western Ontario, London, Ontario (Canada) Multi-core processor CPU Coherence Plan Introduction to Multicore Programming Marc Moreno Maza University of Western Ontario, London, Ontario (Canada) CS 3101 1 Multi-core Architecture 2 Race Conditions and Cilkscreen (Moreno Maza) Introduction

More information

Divide and Conquer. Design and Analysis of Algorithms Andrei Bulatov

Divide and Conquer. Design and Analysis of Algorithms Andrei Bulatov Divide and Conquer Design and Analysis of Algorithms Andrei Bulatov Algorithms Divide and Conquer 4-2 Divide and Conquer, MergeSort Recursive algorithms: Call themselves on subproblem Divide and Conquer

More information

Design and Analysis of Algorithms

Design and Analysis of Algorithms Design and Analysis of Algorithms CSE 5311 Lecture 8 Sorting in Linear Time Junzhou Huang, Ph.D. Department of Computer Science and Engineering CSE5311 Design and Analysis of Algorithms 1 Sorting So Far

More information

PROGRAM EFFICIENCY & COMPLEXITY ANALYSIS

PROGRAM EFFICIENCY & COMPLEXITY ANALYSIS Lecture 03-04 PROGRAM EFFICIENCY & COMPLEXITY ANALYSIS By: Dr. Zahoor Jan 1 ALGORITHM DEFINITION A finite set of statements that guarantees an optimal solution in finite interval of time 2 GOOD ALGORITHMS?

More information

Lecture 4 CS781 February 3, 2011

Lecture 4 CS781 February 3, 2011 Lecture 4 CS78 February 3, 2 Topics: Data Compression-Huffman Trees Divide-and-Conquer Solving Recurrence Relations Counting Inversions Closest Pair Integer Multiplication Matrix Multiplication Data Compression

More information

Chapter 5. Divide and Conquer. Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved.

Chapter 5. Divide and Conquer. Slides by Kevin Wayne. Copyright 2005 Pearson-Addison Wesley. All rights reserved. Chapter 5 Divide and Conquer Slides by Kevin Wayne. Copyright 25 Pearson-Addison Wesley. All rights reserved. Divide-and-Conquer Divide-and-conquer. Break up problem into several parts. Solve each part

More information

COMP 3403 Algorithm Analysis Part 2 Chapters 4 5. Jim Diamond CAR 409 Jodrey School of Computer Science Acadia University

COMP 3403 Algorithm Analysis Part 2 Chapters 4 5. Jim Diamond CAR 409 Jodrey School of Computer Science Acadia University COMP 3403 Algorithm Analysis Part 2 Chapters 4 5 Jim Diamond CAR 409 Jodrey School of Computer Science Acadia University Chapter 4 Decrease and Conquer Chapter 4 43 Chapter 4: Decrease and Conquer Idea:

More information

How many leaves on the decision tree? There are n! leaves, because every permutation appears at least once.

How many leaves on the decision tree? There are n! leaves, because every permutation appears at least once. Chapter 8. Sorting in Linear Time Types of Sort Algorithms The only operation that may be used to gain order information about a sequence is comparison of pairs of elements. Quick Sort -- comparison-based

More information

Data Structures and Algorithms CMPSC 465

Data Structures and Algorithms CMPSC 465 Data Structures and Algorithms CMPSC 465 LECTURES 7-8 More Divide and Conquer Multiplication Adam Smith S. Raskhodnikova and A. Smith; based on slides by E. Demaine and C. Leiserson John McCarthy (1927

More information

Comparisons. Θ(n 2 ) Θ(n) Sorting Revisited. So far we talked about two algorithms to sort an array of numbers. What is the advantage of merge sort?

Comparisons. Θ(n 2 ) Θ(n) Sorting Revisited. So far we talked about two algorithms to sort an array of numbers. What is the advantage of merge sort? So far we have studied: Comparisons Insertion Sort Merge Sort Worst case Θ(n 2 ) Θ(nlgn) Best case Θ(n) Θ(nlgn) Sorting Revisited So far we talked about two algorithms to sort an array of numbers What

More information

COS 226 Lecture 4: Mergesort. Merging two sorted files. Prototypical divide-and-conquer algorithm

COS 226 Lecture 4: Mergesort. Merging two sorted files. Prototypical divide-and-conquer algorithm COS 226 Lecture 4: Mergesort Merging two sorted files Prototypical divide-and-conquer algorithm #define T Item merge(t c[], T a[], int N, T b[], int M ) { int i, j, k; for (i = 0, j = 0, k = 0; k < N+M;

More information

CS S-11 Sorting in Θ(nlgn) 1. Base Case: A list of length 1 or length 0 is already sorted. Recursive Case:

CS S-11 Sorting in Θ(nlgn) 1. Base Case: A list of length 1 or length 0 is already sorted. Recursive Case: CS245-2015S-11 Sorting in Θ(nlgn) 1 11-0: Merge Sort Recursive Sorting Base Case: A list of length 1 or length 0 is already sorted Recursive Case: Split the list in half Recursively sort two halves Merge

More information

Week 10. Sorting. 1 Binary heaps. 2 Heapification. 3 Building a heap 4 HEAP-SORT. 5 Priority queues 6 QUICK-SORT. 7 Analysing QUICK-SORT.

Week 10. Sorting. 1 Binary heaps. 2 Heapification. 3 Building a heap 4 HEAP-SORT. 5 Priority queues 6 QUICK-SORT. 7 Analysing QUICK-SORT. Week 10 1 2 3 4 5 6 Sorting 7 8 General remarks We return to sorting, considering and. Reading from CLRS for week 7 1 Chapter 6, Sections 6.1-6.5. 2 Chapter 7, Sections 7.1, 7.2. Discover the properties

More information

Comparisons. Heaps. Heaps. Heaps. Sorting Revisited. Heaps. So far we talked about two algorithms to sort an array of numbers

Comparisons. Heaps. Heaps. Heaps. Sorting Revisited. Heaps. So far we talked about two algorithms to sort an array of numbers So far we have studied: Comparisons Tree is completely filled on all levels except possibly the lowest, which is filled from the left up to a point Insertion Sort Merge Sort Worst case Θ(n ) Θ(nlgn) Best

More information

Introduction to Algorithms

Introduction to Algorithms Introduction to Algorithms 6.046J/18.401J Lecture 5 Prof. Piotr Indyk Today Order statistics (e.g., finding median) Two O(n) time algorithms: Randomized: similar to Quicksort Deterministic: quite tricky

More information