Introduction to Data Management CSE 344
- Valerie Jordan
- 5 years ago
Transcription
1 Introduction to Data Management CSE 344
Lecture 24: MapReduce
CSE 344, Winter 2015
2 HW8
- MapReduce (Hadoop) w/ declarative language (Pig)
- Due next Thursday evening
- Will send out reimbursement codes later
3 Parallel Data 1990
4 Parallel Join
Data: R(K1, A, B), S(K2, B, C)
Query: R(K1, A, B) ⋈ S(K2, B, C)
- Initially, both R and S are horizontally partitioned on K1 and K2: R1, S1 | R2, S2 | ... | RP, SP
- Shuffle R on R.B and S on S.B
- Each server computes the join locally on its repartitioned fragments: R1, S1 | R2, S2 | ... | RP, SP
5 Data: R(K1, A, B), S(K2, B, C)
Query: R(K1, A, B) ⋈ S(K2, B, C)
[Figure: fragments R1, S1 and R2, S2 (columns K1, B and K2, B) on machines M1 and M2, shown in three stages: Partition, Shuffle, Local Join.]
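The partition, shuffle, and local-join stages can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`shuffle_on_b`, `local_join`, `parallel_join`) are hypothetical, servers are simulated as in-memory lists, and Python's built-in `hash` stands in for whatever hash function a real system would use.

```python
def shuffle_on_b(tuples, b_index, num_servers):
    """Shuffle: route each tuple to server hash(B) mod P (assumed routing rule)."""
    servers = [[] for _ in range(num_servers)]
    for t in tuples:
        servers[hash(t[b_index]) % num_servers].append(t)
    return servers


def local_join(r_part, s_part):
    """Local hash join on R.B = S.B within a single server."""
    index = {}
    for (k1, a, b) in r_part:
        index.setdefault(b, []).append((k1, a))
    result = []
    for (k2, b, c) in s_part:
        for (k1, a) in index.get(b, []):
            result.append((k1, a, b, k2, c))
    return result


def parallel_join(r, s, num_servers=3):
    """R(K1, A, B) joined with S(K2, B, C): shuffle both on B, then join locally."""
    r_parts = shuffle_on_b(r, 2, num_servers)  # B is the third column of R
    s_parts = shuffle_on_b(s, 1, num_servers)  # B is the second column of S
    output = []
    for r_part, s_part in zip(r_parts, s_parts):
        output.extend(local_join(r_part, s_part))
    return output
```

Because both relations are routed by the same function of B, any two tuples that agree on B land on the same simulated server, so the union of the local joins equals the full join.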
6 Parallel Data 2000
7 Optional Reading
- Parallel Data Processing at Massive Scale (MapReduce)
- Chapter 2 (Sections 1, 2, 3 only) of Mining of Massive Datasets, by Rajaraman and Ullman
8 Distributed File System (DFS)
- For very large files: TBs, PBs
- Each file is partitioned into chunks, typically 64MB
- Each chunk is replicated several times (≥ 3), on different racks, for fault tolerance
- Implementations:
  - Google's DFS: GFS, proprietary
  - Hadoop's DFS: HDFS, open source
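To make the chunking arithmetic concrete, here is a small Python sketch. It assumes the 64MB chunk size from the slide; the round-robin rack placement in `place_replicas` is a made-up policy for illustration only and is not how GFS or HDFS actually choose replica locations.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64MB, the typical chunk size quoted on the slide


def num_chunks(file_size_bytes, chunk_size=CHUNK_SIZE):
    """Number of chunks needed to store a file (integer ceiling division)."""
    return -(-file_size_bytes // chunk_size)


def place_replicas(n_chunks, racks, replication=3):
    """Assign each chunk to `replication` distinct racks (hypothetical round-robin)."""
    if replication > len(racks):
        raise ValueError("need at least as many racks as replicas")
    placement = {}
    for chunk in range(n_chunks):
        placement[chunk] = [racks[(chunk + i) % len(racks)] for i in range(replication)]
    return placement
```

For example, a 1 TB file (2**40 bytes) splits into 2**40 / 2**26 = 16,384 chunks, and with three replicas each the cluster stores 49,152 chunk copies.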
9 MapReduce
- Google: paper published 2004
- Free variant: Hadoop
- MapReduce = high-level programming model and implementation for large-scale parallel data processing
10 Typical Problems Solved by MR
- Read a lot of data
- Map: extract something you care about from each record
- Shuffle and Sort
- Reduce: aggregate, summarize, filter, transform
- Write the results
Paradigm stays the same; change the map and reduce functions for different problems.
Slide source: Jeff Dean
11 Data Model
Files! A file = a bag of (key, value) pairs
A MapReduce program:
- Input: a bag of (inputkey, value) pairs
- Output: a bag of (outputkey, value) pairs
12 Step 1: the MAP Phase
User provides the MAP function:
- Input: (input key, value)
- Output: bag of (intermediate key, value)
System applies the map function in parallel to all (input key, value) pairs in the input file
13 Step 2: the REDUCE Phase
User provides the REDUCE function:
- Input: (intermediate key, bag of values)
- Output: bag of output (values)
System groups all pairs with the same intermediate key, and passes the bag of values to the REDUCE function
14 Example
Counting the number of occurrences of each word in a large collection of documents
Each document:
- The key = document id (did)
- The value = set of words (word)

map(String key, String value):
    // key: document name
    // value: document contents
    for each word w in value:
        EmitIntermediate(w, "1");

reduce(String key, Iterator values):
    // key: a word
    // values: a list of counts
    int result = 0;
    for each v in values:
        result += ParseInt(v);
    Emit(AsString(result));
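The word-count pseudocode can be tried out on one machine with the following Python sketch. The driver `run_mapreduce` is a hypothetical single-process stand-in for the system: it applies the map function to every input pair, groups intermediate pairs by key (the shuffle), and calls the reduce function once per key. Unlike the slide's pseudocode, counts are kept as integers rather than strings.

```python
from collections import defaultdict


def map_fn(doc_id, contents):
    """Map: emit (word, 1) for every word in the document contents."""
    for word in contents.split():
        yield (word, 1)


def reduce_fn(word, counts):
    """Reduce: sum all counts collected for one word."""
    yield (word, sum(counts))


def run_mapreduce(inputs, map_fn, reduce_fn):
    """Toy sequential driver: map, group by intermediate key, then reduce."""
    groups = defaultdict(list)
    for key, value in inputs:
        for ikey, ivalue in map_fn(key, value):
            groups[ikey].append(ivalue)      # shuffle: group by intermediate key
    output = []
    for ikey in sorted(groups):              # sorted only to make output deterministic
        output.extend(reduce_fn(ikey, groups[ikey]))
    return output
```

Calling `run_mapreduce([("did1", "a b a"), ("did2", "b c")], map_fn, reduce_fn)` returns `[("a", 2), ("b", 2), ("c", 1)]`.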
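15 [Figure: word-count data flow through MAP, Shuffle, and REDUCE]
- MAP: (did1, v1) → (w1, 1), (w2, 1), (w3, 1), ...; (did2, v2) → (w1, 1), (w2, 1), ...; (did3, v3) → ...
- Shuffle: (w1, (1, 1, 1, ..., 1)), (w2, (1, 1, ...)), (w3, (1, ...))
- REDUCE: (w1, 25), (w2, 77), (w3, 12)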
16 Jobs vs. Tasks
A MapReduce Job
- One single query, e.g. count the words in all docs
- More complex queries may consist of multiple jobs
A Map Task, or a Reduce Task
- A group of instantiations of the map function or the reduce function, which are scheduled on a single worker
17 Workers
- A worker is a process that executes one task at a time
- Typically there is one worker per processor, hence 4 or 8 per node
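18 [Figure: the same word-count data flow, now labeled with MAP Tasks and REDUCE Tasks]
- MAP Tasks: (did1, v1) → (w1, 1), (w2, 1), (w3, 1), ...; (did2, v2) → (w1, 1), (w2, 1), ...; (did3, v3) → ...
- Shuffle: (w1, (1, 1, 1, ..., 1)), (w2, (1, 1, ...)), (w3, (1, ...))
- REDUCE Tasks: (w1, 25), (w2, 77), (w3, 12)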
19 MapReduce Execution Details
[Figure labels]
- Reduce (Shuffle) Task: output to disk, replicated in cluster
- Intermediate data goes to local disk
- Map Task: data not necessarily local
- File system: GFS or HDFS
20 Implementation
- There is one master node
- Master partitions input file into M splits, by key
- Master assigns workers (= servers) to the M map tasks, keeps track of their progress
- Workers write their output to local disk, partitioned into R regions
- Master assigns workers to the R reduce tasks
- Reduce workers read regions from the map workers' local disks
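The step where a map worker splits its output into R regions can be illustrated as follows. This is a sketch under assumptions: the slide does not say how a key is assigned to a region, so the rule used here (a CRC32 checksum of the key modulo R) is only one plausible choice, and the function names are hypothetical.

```python
import zlib


def region_for(key, num_reduce_tasks):
    """Pick the region (= reduce task) for a key; CRC32 is an assumed, stable hash."""
    return zlib.crc32(key.encode("utf-8")) % num_reduce_tasks


def partition_map_output(pairs, num_reduce_tasks):
    """Split one map worker's (key, value) output into R regions."""
    regions = [[] for _ in range(num_reduce_tasks)]
    for key, value in pairs:
        regions[region_for(key, num_reduce_tasks)].append((key, value))
    return regions
```

The property that matters is that every map worker uses the same deterministic function, so all pairs with a given intermediate key end up in the same region number and are read by the same reduce worker.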
21 Interesting Implementation Details
Worker failure:
- Master pings workers periodically
- If a worker is down, the master reassigns its task to another worker
22 Interesting Implementation Details
Backup tasks:
- Straggler = a machine that takes an unusually long time to complete one of the last tasks. E.g.:
  - Bad disk forces frequent correctable errors (30MB/s → 1MB/s)
  - The cluster scheduler has scheduled other tasks on that machine
- Stragglers are a main reason for slowdown
- Solution: pre-emptive backup execution of the last few remaining in-progress tasks
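A back-of-the-envelope model shows why backup execution helps. The sketch below is a deliberately simplified assumption, not the scheduler's actual policy: every task still running at time `backup_start` gets a backup copy that takes `backup_duration`, and a task counts as done as soon as either copy finishes. All names and numbers are hypothetical.

```python
def completion_time(task_times):
    """Without backups, the job ends when the slowest task ends."""
    return max(task_times)


def completion_time_with_backups(task_times, backup_start, backup_duration):
    """Job end time when tasks still running at backup_start get a backup copy."""
    finish_times = []
    for t in task_times:
        if t > backup_start:
            # The task finishes when the original or the backup finishes first.
            finish_times.append(min(t, backup_start + backup_duration))
        else:
            finish_times.append(t)
    return max(finish_times)
```

With task times of 10, 11, 12, and 60 minutes (one straggler), the job takes 60 minutes without backups. Launching backups at minute 15 that each take 10 minutes brings the job down to 25 minutes in this model.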
23 Executing a Large MapReduce Job
24 Anatomy of a Query Execution
- Running problem #4
- 20 nodes = 1 master + 19 workers
- Using PARALLEL 50
25 March 2013
[Screenshot: Hadoop JobTracker page, 3/9/13. User: hadoop. Job Name: PigLatin:DefaultJobName. Job Setup: Successful. Status: Succeeded. Started at: Sat Mar 9 19:49:21 UTC 2013. Finished at: Sat Mar 9 23:33:14 UTC 2013. Finished in: 3hrs, 43mins, 52sec. Job Cleanup: Successful. Map and reduce both 100% complete. The task table and job counters are not legible in this transcription.]
26 Some other time (March 2012)
Let's see what happened
27 1h 16min
[Screenshot: JobTracker task table and completion graphs (copy, sort, reduce). map 33.17% complete, reduce 4.17% complete.]
28 1h 16min
Only 19 reducers active, out of 50. Why?
Copying by 19 reducers in parallel with mappers. When will the other 31 reducers be scheduled?
[Screenshot: JobTracker task table and completion graphs (copy, sort, reduce). map 33.17% complete, reduce 4.17% complete.]
29 1h 16min
Only 19 reducers active, out of 50. Why?
Copying by 19 reducers in parallel with mappers. When will the other 31 reducers be scheduled?
3h 50min
[Screenshots: two JobTracker pages for job PigLatin:DefaultJobName (user hadoop, status Running, started Sun Mar 4 19:08:29 UTC 2012). At 1h 16min: map 33.17%, reduce 4.17%. At 3h 50min: map 100%, reduce 32.42%. Job counters and completion graphs (copy, sort, reduce) are not legible in this transcription.]
30 1h 16min
Only 19 reducers active, out of 50. Why?
Copying by 19 reducers in parallel with mappers. When will the other 31 reducers be scheduled?
3h 50min
Completed. Sorting, and the rest of Reduce, may proceed now.
Speculative Execution
[Screenshots: the same two JobTracker pages as on the previous slide. At 1h 16min: map 33.17%, reduce 4.17%. At 3h 50min: map 100%, reduce 32.42%.]
31 3h 51min
[Screenshot: JobTracker task table and map/reduce completion graphs (copy, sort, reduce). map 100% complete, reduce 37.72% complete.]
32 3h 51min
Some of the 19 reducers have finished
Next batch of reducers started
[Screenshot: JobTracker task table and map/reduce completion graphs (copy, sort, reduce). map 100% complete, reduce 37.72% complete.]
33 3h 51min and 3h 52min
Some of the 19 reducers have finished
Next batch of reducers started
[Screenshots: two JobTracker pages for job PigLatin:DefaultJobName (user hadoop, status Running, started Sun Mar 4 19:08:29 UTC 2012), side by side. Running for 3hrs, 51mins, 19sec: map 100%, reduce 37.72%. Running for 3hrs, 52mins, 51sec: map 100%, reduce 42.35%. Job counters and completion graphs (copy, sort, reduce) are not legible in this transcription.]
34 4h 18min
Several servers failed: fetch error. Their map tasks need to be rerun. All reducers are waiting.
[Screenshot: JobTracker task table and reduce completion graph (copy, sort, reduce). map 99.88% complete, reduce 48.42% complete.]
35 4h 18min
Several servers failed: fetch error. Their map tasks need to be rerun. All reducers are waiting.
Why did we lose some reducers?
[Screenshot: JobTracker task table and reduce completion graph (copy, sort, reduce). map 99.88% complete, reduce 48.42% complete.]
36 4h 18min
Several servers failed: fetch error. Their map tasks need to be rerun. All reducers are waiting.
Why did we lose some reducers?
7h 1min
Mappers finished, reducers resumed.
[Screenshots: two JobTracker pages for job PigLatin:DefaultJobName (user hadoop, status Running, started Sun Mar 4 19:08:29 UTC 2012). At 4h 18min: map 99.88%, reduce 48.42%. At the later snapshot: map 100%, reduce 94.15%, Black-listed TaskTrackers: 3. Job counters and completion graphs (copy, sort, reduce) are not legible in this transcription.]
37 7h 20min
Success! 7hrs, 20mins.
[Screenshot: JobTracker page for job PigLatin:DefaultJobName (user hadoop). Status: Succeeded. Started at: Sun Mar 4 19:08:29 UTC 2012. Finished at: Mon Mar 5 02:28:39 UTC 2012. Finished in: 7hrs, 20mins, 10sec. Job Cleanup: Successful. Black-listed TaskTrackers: 3. Map and reduce both 100% complete.]
38 Parallel Data 2010
39 Issues with MapReduce
- Hides scheduling and parallelization details
- However, very limited queries
  - Difficult to write more complex queries
  - Need multiple MapReduce jobs
- Solution: declarative query language
40 Declarative Languages on MR
- PIG Latin (Yahoo!)
  - New language, like Relational Algebra
  - Open source
- HiveQL (Facebook)
  - SQL-like language
  - Open source
- SQL / Tenzing (Google)
  - SQL on MR
  - Proprietary
41 Implementing PIG Latin
Over Hadoop!
- Parse query: everything between LOAD and STORE → one logical plan
- Logical plan → graph of MapReduce ops
- All statements between two (CO)GROUPs → one MapReduce job
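The rule that (CO)GROUP statements mark job boundaries can be sketched by cutting a flat list of operator names into jobs. This is a simplification made for illustration: real logical plans are graphs rather than lists, and the function `plan_jobs`, the dictionary layout, and the choice to place operators that follow a (CO)GROUP in that job's reduce phase are all assumptions, not the Pig compiler's actual data structures.

```python
GROUPING_OPS = ("GROUP", "COGROUP")


def plan_jobs(ops):
    """Cut a linear list of operator names into MapReduce jobs at (CO)GROUPs."""
    jobs = []
    pending = []  # operators seen since the last (CO)GROUP
    for op in ops:
        if op in GROUPING_OPS:
            if jobs:
                # Operators after the previous (CO)GROUP run in its reduce phase.
                jobs[-1]["reduce"] = pending
                jobs.append({"map": [], "shuffle": op, "reduce": []})
            else:
                # Operators before the first (CO)GROUP run in the map phase.
                jobs.append({"map": pending, "shuffle": op, "reduce": []})
            pending = []
        else:
            pending.append(op)
    if jobs:
        jobs[-1]["reduce"] = pending
    elif pending:
        # No (CO)GROUP at all: a single map-only job.
        jobs.append({"map": pending, "shuffle": None, "reduce": []})
    return jobs
```

For the operator list LOAD, FILTER, GROUP, FOREACH, GROUP, FOREACH, STORE this yields two jobs: the first maps LOAD and FILTER, shuffles on the first GROUP, and reduces with FOREACH; the second shuffles on the second GROUP and reduces with FOREACH and STORE.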
42 Implementation [Olston 2008]
43 Review: MapReduce
- Data is typically a file in the Google File System
  - HDFS for Hadoop
  - File system partitions file into chunks
  - Each chunk is replicated on k (typically 3) machines
- Each machine can run a few map and reduce tasks simultaneously
- Each map task consumes one chunk
  - Can adjust how much data goes into each map task using splits
  - Scheduler tries to schedule map task where its input data is located
- Map output is partitioned across reducers
- Map output is also written locally to disk
- Number of reduce tasks is configurable
- System shuffles data between map and reduce tasks
- Reducers sort-merge data before consuming it
44 MapReduce Phases
[Figure: MapReduce phases; label: Local storage]
CS6030 Cloud Computing Ajay Gupta B239, CEAS Computer Science Department Western Michigan University ajay.gupta@wmich.edu 276-3104 1 Acknowledgements I have liberally borrowed these slides and material