Parallel Algorithms
Parallel Models Hypercube Butterfly Fully Connected Other Networks Shared Memory v.s. Distributed Memory SIMD v.s. MIMD
The PRAM Model Parallel Random Access Machine All processors act in lock-step Number of processors is not limited All processors have local memory One global memory accessible to all processors Processors must read and write global memory
A Pram Algorithm Every Processor knows its own index (usually indicated by variable i) Vector Sum: Read M[i] Into x; Read M[i+n] Into y; x := x + y; Write x into M[i];
Binary Fan-In Read M[i] into Largest; Write M[i] into M[i+n]; Delta := 1; For k := 1 to lg n Read M[i+Delta] into x; Largest := Maximum(x,Largest); Write Largest into M[i]; Delta := Delta * 2; End For
Parallel Addition Read M[i] into Total; Write 0 into M[i+n]; Delta := 1; For k := 1 to lg n Read M[i+Delta] into x; Total := x + Total; Write Total into M[i]; Delta := Delta * 2; End For
Pointer Jumping Read M[i] Into Total; For k := 1 to lg n Read Next[i] into Ptr If Ptr 0 Then Read M[Ptr] Into x; Total := Total + x; Write Total into M[i]; Read Next[Ptr] Into NewPtr Write NewPtr into Next[i] End If End For
Initialization of Next[i] If i = n Then Write 0 Into Next[i]; Else Write i+1 Into Next[i]; End If
Calculate Node Depth I If there is a Left Child 1-1 To 1 of Left Child 0 From -1 of Left Child
Calculate Node Depth 2 If there is no left child 1-1 0
Calculate Node Depth 3 1-1 If there is a Right Child 0 From -1 of Right Child To 1 of Right Child
Calculate Node Depth 4 1-1 0 If there is no right child
Concurrent Reads & Writes EREW - Exclusive Read, Exclusive Write CREW - Common Read, Exclusive Write CRCW - Common Read, Common Write All common writes must write the same thing Highest Priority Processor wins contest CREW is more powerful than EREW CRCW is more powerful than CREW
Finding Max Square Array of Processors Indexed by i,j Write True into R[i]; Read M[i] into x; Read M[j] into y; If x < y Then Write False Into R[i]; Else If y < x Then Write False Into R[j]; End If
CRCW V.S. CREW CRCW Max runs in constant time CREW Max runs in lg n time CRCW cannot be any better than lg p faster than EREW
EREW V.S. CREW Finding Roots by Shortcutting Pointers CREW Runs in lg lg n Time EREW Runs in lg n Time
Optimal Parallel Algorithms NC -- The class of algorithms that run in Θ(log m n) time using Θ(n k ) processors General Boolean Functions Cannot be Computed any Faster than Θ(lg n) Θ(lg n) is optimal for computing the sum of n integers
Parallel Algorithms
Parallel Models Hypercube Butterfly Fully Connected Other Networks Shared Memory v.s. Distributed Memory SIMD v.s. MIMD
The PRAM Model Parallel Random Access Machine All processors act in lock-step Number of processors is not limited All processors have local memory One global memory accessible to all processors Processors must read and write global memory
A Pram Algorithm Every Processor knows its own index (usually indicated by variable i) Vector Sum: Read M[i] Into x; Read M[i+n] Into y; x := x + y; Write x into M[i];
Binary Fan-In Read M[i] into Largest; Write M[i] into M[i+n]; Delta := 1; For k := 1 to lg n Read M[i+Delta] into x; Largest := Maximum(x,Largest); Write Largest into M[i]; Delta := Delta * 2; End For
Parallel Addition Read M[i] into Total; Write 0 into M[i+n]; Delta := 1; For k := 1 to lg n Read M[i+Delta] into x; Total := x + Total; Write Total into M[i]; Delta := Delta * 2; End For
Pointer Jumping Read M[i] Into Total; For k := 1 to lg n Read Next[i] into Ptr If Ptr 0 Then Read M[Ptr] Into x; Total := Total + x; Write Total into M[i]; Read Next[Ptr] Into NewPtr Write NewPtr into Next[i] End If End For
Initialization of Next[i] If i = n Then Write 0 Into Next[i]; Else Write i+1 Into Next[i]; End If
Calculate Node Depth I If there is a Left Child 1-1 To 1 of Left Child 0 From -1 of Left Child
Calculate Node Depth 2 If there is no left child 1-1 0
Calculate Node Depth 3 1-1 If there is a Right Child 0 From -1 of Right Child To 1 of Right Child
Calculate Node Depth 4 1-1 0 If there is no right child
Concurrent Reads & Writes EREW - Exclusive Read, Exclusive Write CREW - Common Read, Exclusive Write CRCW - Common Read, Common Write All common writes must write the same thing Highest Priority Processor wins contest CREW is more powerful than EREW CRCW is more powerful than CREW
Finding Max Square Array of Processors Indexed by i,j Write True into R[i]; Read M[i] into x; Read M[j] into y; If x < y Then Write False Into R[i]; Else If y < x Then Write False Into R[j]; End If
CRCW V.S. CREW CRCW Max runs in constant time CREW Max runs in lg n time CRCW cannot be any better than lg p faster than EREW
EREW V.S. CREW Finding Roots by Shortcutting Pointers CREW Runs in lg lg n Time EREW Runs in lg n Time
Optimal Parallel Algorithms NC -- The class of algorithms that run in Θ(log m n) time using Θ(n k ) processors General Boolean Functions Cannot be Computed any Faster than Θ(lg n) Θ(lg n) is optimal for computing the sum of n integers