Models of Computation for Massive Data

Size: px

Start display at page:

Download "Models of Computation for Massive Data"

Asher Stafford
6 years ago
Views:

1 Models of Computation for Massive Data Jeff M. Phillips August 28, 2013

2 Outline Sequential: External Memory / (I/O)-Efficient Streaming Parallel: P and BSP MapReduce GP-GPU Distributed Computing runtime data size

3 Model model (Von Neumann Architecture): and Memory Operations (+,,,...) constant time Data stored as words, not bits. Read, Write take constant time.

4 Today s Reality Disk What your computer actually looks like: 164 GB 3+ layers of memory hierarchy. Small number of s. Many variations! bigger 2 GB 4 MB L2 faster L1 L1

5 Model model (Von Neumann Architecture): and Memory Operations (+,,,...) constant time Data stored as words, not bits. Read, Write take constant time.

6 External Memory Model D M block I/O P N = size of problem instance B = size of disk block M = number of items that fits in Memory T = number of items in output I/O = block move between Memory and Disk

7 External Memory Model D M block I/O P N = size of problem instance B = size of disk block M = number of items that fits in Memory T = number of items in output I/O = block move between Memory and Disk Advanced Data Structures: Sorting, Searching

8 Streaming Model memory length m word 2 [n] makes one pass on data Ordered set A = a 1, a 2,..., a m Each a i [n], size log n Compute f (A) or maintain f (A i ) for A i = a 1, a 2,..., a i. Space restricted to S = O(poly(log m, log n)). Updates O(poly(S)) for each a i.

9 Streaming Model memory length m word 2 [n] makes one pass on data Ordered set A = a 1, a 2,..., a m Each a i [n], size log n Compute f (A) or maintain f (A i ) for A i = a 1, a 2,..., a i. Space restricted to S = O(poly(log m, log n)). Updates O(poly(S)) for each a i. Advanced Algorithms: Approximate, Randomized

10 P Many (p) processors. Access shared memory: EREW : Exclusive Read Exclusive Write CREW : Concurrent Read Exclusive Write CRCW : Concurrent Read Concurrent Write Simple model, but has shortcomings......such as Synchronization. 1 2 p

11 P Many (p) processors. Access shared memory: EREW : Exclusive Read Exclusive Write CREW : Concurrent Read Exclusive Write CRCW : Concurrent Read Concurrent Write Simple model, but has shortcomings......such as Synchronization. 1 2 p Advanced Algorithms

12 Bulk Synchronous Parallel Each Processor has its own Memory Parallelism Procedes in Rounds: 1. Compute: Each processor computes on its own Data: w i. 2. Synchronize: Each processor sends messages to others: s i = MessSize CommCost. 3. Barrier: All processors wait until others done. Runtime: max w i + max s i Pro: Captures Parallelism and Synchronization Con: Ignores Locality.

13 Bulk Synchronous Parallel Each Processor has its own Memory Parallelism Procedes in Rounds: 1. Compute: Each processor computes on its own Data: w i. 2. Synchronize: Each processor sends messages to others: s i = MessSize CommCost. 3. Barrier: All processors wait until others done. Runtime: max w i + max s i Pro: Captures Parallelism and Synchronization Con: Ignores Locality.

14 Bulk Synchronous Parallel Each Processor has its own Memory Parallelism Procedes in Rounds: 1. Compute: Each processor computes on its own Data: w i. 2. Synchronize: Each processor sends messages to others: s i = MessSize CommCost. 3. Barrier: All processors wait until others done. Runtime: max w i + max s i Pro: Captures Parallelism and Synchronization Con: Ignores Locality.

15 MapReduce Each Processor has full hard drive, data items < key, value >. Parallelism Procedes in Rounds: Map: assigns items to processor by key. Reduce: processes all items using value. Usually combines many items with same key. Repeat M+R a constant number of times, often only one round. Optional post-processing step. MAP REDUCE Pro: Robust (duplication) and simple. Can harness Locality Con: Somewhat restrictive model

16 MapReduce Each Processor has full hard drive, data items < key, value >. Parallelism Procedes in Rounds: Map: assigns items to processor by key. Reduce: processes all items using value. Usually combines many items with same key. Repeat M+R a constant number of times, often only one round. Optional post-processing step. MAP REDUCE Pro: Robust (duplication) and simple. Can harness Locality Con: Somewhat restrictive model Advanced Algorithms

17 General Purpose GPU Massive parallelism on your desktop. Uses Graphics Processing Unit. Designed for efficient video rasterizing. Each processor corresponds to pixel p depth buffer: D(p) = min i x w i w i color buffer: C(p) = i α iχ i p... Pro: Fine grain, massive parallelism. Cheap. Harnesses Locality. Con: Somewhat restrictive model, hierarchy. Small memory. X

18 Distributed Computing Many small slow processors with data. Communication very expensive. Report to base station Merge tree base station Unorganized (peer-to-peer) Data collection or Distribution

19 Distributed Computing Many small slow processors with data. Communication very expensive. Report to base station Merge tree Unorganized (peer-to-peer) base station Data collection or Distribution

20 Distributed Computing Many small slow processors with data. Communication very expensive. Report to base station Merge tree Unorganized (peer-to-peer) Data collection or Distribution

21 Distributed Computing Many small slow processors with data. Communication very expensive. Report to base station Merge tree Unorganized (peer-to-peer) Data collection or Distribution Advanced Algorithms: Approximate, Randomized

22 Themes What are course goals? How to analyze algorithms in each model Taste of how to use each model When to use each model

23 Themes What are course goals? How to analyze algorithms in each model Taste of how to use each model When to use each model Work Plan: 1-3 weeks each model. Background and Model. Example algorithms analysis in each model. I/O Stream Parallel MapReduce GPU Distributed

24 Class Work 1 Credit Students: Attend Class. (some Fridays less important) Ask Questions. If above lacking, may have quizzes. Scribing Notes, Video-taping Lectures, or Giving Lectures. 3 Credit Students: Must also do a project! Project Proposal (Aug 30). Approved or Rejected by Sept 4. Intermediate Report (Oct 23). Presentations (Dec 11 or 13).

25 Sequential Review

26 Sequential Review Turing Machines (Alan Turing 1936) Single Tape: MoveL, MoveR, read, write each constant time content pointer memory, infinite tape (memory)

27 Sequential Review Turing Machines (Alan Turing 1936) Single Tape: MoveL, MoveR, read, write each constant time content pointer memory, infinite tape (memory) Von Neumann Architecture (Von Neumann + Eckert + Mauchly 1945) based on ENIAC + Memory (): read, write, op = constant time

28 Sequential Review Turing Machines (Alan Turing 1936) Single Tape: MoveL, MoveR, read, write each constant time content pointer memory, infinite tape (memory) Von Neumann Architecture (Von Neumann + Eckert + Mauchly 1945) based on ENIAC + Memory (): read, write, op = constant time How fast are the following? Scanning (max):

29 Sequential Review Turing Machines (Alan Turing 1936) Single Tape: MoveL, MoveR, read, write each constant time content pointer memory, infinite tape (memory) Von Neumann Architecture (Von Neumann + Eckert + Mauchly 1945) based on ENIAC + Memory (): read, write, op = constant time How fast are the following? Scanning (max): TM: O(n) VNA: O(n)

30 Sequential Review Turing Machines (Alan Turing 1936) Single Tape: MoveL, MoveR, read, write each constant time content pointer memory, infinite tape (memory) Von Neumann Architecture (Von Neumann + Eckert + Mauchly 1945) based on ENIAC + Memory (): read, write, op = constant time How fast are the following? Scanning (max): TM: O(n) VNA: O(n) Sorting:

31 Sequential Review Turing Machines (Alan Turing 1936) Single Tape: MoveL, MoveR, read, write each constant time content pointer memory, infinite tape (memory) Von Neumann Architecture (Von Neumann + Eckert + Mauchly 1945) based on ENIAC + Memory (): read, write, op = constant time How fast are the following? Scanning (max): TM: O(n) VNA: O(n) Sorting: TM: O(n 2 ) VNA: O(n log n)

32 Sequential Review Turing Machines (Alan Turing 1936) Single Tape: MoveL, MoveR, read, write each constant time content pointer memory, infinite tape (memory) Von Neumann Architecture (Von Neumann + Eckert + Mauchly 1945) based on ENIAC + Memory (): read, write, op = constant time How fast are the following? Scanning (max): TM: O(n) VNA: O(n) Sorting: TM: O(n 2 ) VNA: O(n log n) Searching:

33 Sequential Review Turing Machines (Alan Turing 1936) Single Tape: MoveL, MoveR, read, write each constant time content pointer memory, infinite tape (memory) Von Neumann Architecture (Von Neumann + Eckert + Mauchly 1945) based on ENIAC + Memory (): read, write, op = constant time How fast are the following? Scanning (max): TM: O(n) VNA: O(n) Sorting: TM: O(n 2 ) VNA: O(n log n) Searching: TM: O(n) VNA: O(log n)

34 Asymptotics How large (in seconds) is: Searching (log n) Max (n) Merge-Sort (n log n) Bubble-Sort (n 2 )... or Dynamic Programming n = Search

35 Asymptotics How large (in seconds) is: Searching (log n) Max (n) Merge-Sort (n log n) Bubble-Sort (n 2 )... or Dynamic Programming n = Search Max >15

36 Asymptotics How large (in seconds) is: Searching (log n) Max (n) Merge-Sort (n log n) Bubble-Sort (n 2 )... or Dynamic Programming n = Search Max >15 Merge

37 Asymptotics How large (in seconds) is: Searching (log n) Max (n) Merge-Sort (n log n) Bubble-Sort (n 2 )... or Dynamic Programming n = Search Max >15 Merge Bubble hour 9 days -

38 Asymptotics How large (in seconds) is: Searching (log n) Max (n) Merge-Sort (n log n) Bubble-Sort (n 2 )... or Dynamic Programming n = Search Max >15 Merge Bubble hour 9 days - Complexity Theory: log: poly log(n) = log c n (... need to load data)

39 Asymptotics How large (in seconds) is: Searching (log n) Max (n) Merge-Sort (n log n) Bubble-Sort (n 2 )... or Dynamic Programming n = Search Max >15 Merge Bubble hour 9 days - Complexity Theory: log: poly log(n) = log c n (... need to load data) P : poly(n) = n c (many cool algorithms)

40 Asymptotics How large (in seconds) is: Searching (log n) Max (n) Merge-Sort (n log n) Bubble-Sort (n 2 )... or Dynamic Programming n = Search Max >15 Merge Bubble hour 9 days - Complexity Theory: log: poly log(n) = log c n (... need to load data) P : poly(n) = n c (many cool algorithms) EXP: exp(n) = c n (usually hopeless... but n not bad)

41 Asymptotics How large (in seconds) is: Searching (log n) Max (n) Merge-Sort (n log n) Bubble-Sort (n 2 )... or Dynamic Programming n = Search Max >15 Merge Bubble hour 9 days - Complexity Theory: log: poly log(n) = log c n (... need to load data) P : poly(n) = n c (many cool algorithms) EXP: exp(n) = c n (usually hopeless... but n not bad) NP: verify solution in P, find solution conjectured EXP (If EXP number parallel machines, then in P time)

42 Data Group Data Group Meeting 12:15-1:30pm in LCR (to be confirmed)

Introduction to Parallel Algorithm Analysis

Introduction to Parallel Algorithm Analysis Jeff M. Phillips October 4, 2013 Petri Nets C. A. Petri [1962] introduced analysis model for concurrent systems. Flow chart Described data flow and dependencies.