CS 61C: Great Ideas in Computer Architecture (Machine Structures)
Instructors: Randy H. Katz, David A. Patterson
http://inst.eecs.berkeley.edu/~cs61c/fa10
Fall 2010, Lecture #10 (9/20/10)

Agenda
- Request and Data Level Parallelism
- Administrivia
- Technology Break
- Map-Reduce Examples

Request-Level Parallelism
- Hundreds or thousands of requests per second
  - Not your laptop or cell phone, but popular Internet services like Google search
  - Such requests are largely independent
    - Little read-write (aka "producer-consumer") sharing
    - Mostly involve read-only databases
    - Rarely involve read-write data sharing or synchronization across requests
- Computation easily partitioned within a request and across different requests

Google Query-Serving Architecture
(Figure: datacenter query-serving architecture diagram.)

Anatomy of a Web Search
- Google "David A. Patterson"
  - Direct request to "closest" Google datacenter
  - Front-end load balancer directs request to one of many clusters within the datacenter
  - Within cluster, select one of many Google Web Servers (GWS) to handle the request and compose the response pages
  - GWS communicates with Index Servers to find documents that contain the search words, "David", "Patterson"
  - Return document list with associated relevance score
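To make this request flow concrete, here is a minimal sketch (not Google's actual code) of a front-end handler that fans a query out to replicated index-server shards in parallel and merges the partial hit lists by relevance score. The shard names, the search_shard helper, and the fake per-shard results are all hypothetical stand-ins.

    # Sketch: fan a query out to index-server replicas in parallel and merge results.
    # The shard addresses, search_shard() helper, and result fields are hypothetical;
    # a real serving system would use RPCs, timeouts, and replica selection.
    from concurrent.futures import ThreadPoolExecutor

    INDEX_SHARDS = ["index-shard-0", "index-shard-1", "index-shard-2"]  # hypothetical replicas

    def search_shard(shard, query):
        """Stand-in for an RPC to one index server: returns (docid, score) pairs."""
        # A tiny fake per-shard index so the sketch runs on its own.
        fake_index = {
            "index-shard-0": [("doc12", 0.9), ("doc7", 0.4)],
            "index-shard-1": [("doc3", 0.8)],
            "index-shard-2": [("doc12", 0.9), ("doc55", 0.2)],
        }
        return [(d, s) for d, s in fake_index[shard] if query]

    def handle_request(query):
        # Each shard is queried independently: request-level parallelism, no shared writes.
        with ThreadPoolExecutor(max_workers=len(INDEX_SHARDS)) as pool:
            partial_results = pool.map(lambda sh: search_shard(sh, query), INDEX_SHARDS)
        # Merge, de-duplicate by docid, and order by relevance score for the response page.
        best = {}
        for hits in partial_results:
            for docid, score in hits:
                best[docid] = max(score, best.get(docid, 0.0))
        return sorted(best.items(), key=lambda kv: kv[1], reverse=True)

    print(handle_request("David Patterson"))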
Anatomy of a Web Search (continued)
- In parallel:
  - Spell checker: Did you mean "David Paterson"?
  - Ad system: books by Patterson at Amazon.com
  - Images of David Patterson
- Use docids to access indexed documents
- Compose the page
  - Result document extracts (with keyword in context) ordered by relevance score
  - Sponsored links (along the top) and advertisements (along the sides)

Anatomy of a Web Search (continued)
- Implementation strategy
  - Randomly distribute the entries
  - Make many copies (aka "replicas")
  - Load balance requests across replicas
- Redundant copies of indices and documents
  - Breaks up hot spots, e.g., Justin Bieber
  - Increases opportunities for request-level parallelism
  - Makes the system more tolerant of failures

Data-Parallel Divide and Conquer (Map-Reduce Processing)
- Map: Slice data into "shards", distribute these to workers, compute sub-problem solutions
    map(in_key, in_value) -> list(out_key, intermediate_value)
  - Processes input key/value pair
  - Produces set of intermediate pairs
- Reduce: Collect and combine sub-problem solutions
    reduce(out_key, list(intermediate_value)) -> list(out_value)
  - Combines all intermediate values for a particular key
  - Produces a set of merged output values (usually just one)
  - (A minimal Python sketch of this flow appears after the time line below.)

Google Uses MR For ...
- Extracting the set of outgoing links from a collection of HTML documents and aggregating by target document
- Stitching together overlapping satellite images to remove seams and to select high-quality imagery for Google Earth
- Generating a collection of inverted index files using a compression scheme tuned for efficient support of Google search queries
- Processing all road segments in the world and rendering map tile images that display these segments for Google Maps
- Fault-tolerant parallel execution of programs written in higher-level languages across a collection of input data

Map-Reduce Processing Time Line
- Master assigns map + reduce tasks to "worker" machines
- As soon as a map task finishes, the worker can be assigned a new map or reduce task
- Data shuffle begins as soon as a given Map finishes
- Reduce task begins as soon as all data shuffles finish
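As promised above, here is a minimal, purely sequential Python sketch of the map -> shuffle -> reduce flow: run map over input shards, group the intermediate pairs by key, then run reduce per key. The map_reduce driver, the link-extraction functions, and the page data are illustrative stand-ins, not Google's MapReduce API; the worker/master machinery is omitted.

    # Minimal sequential sketch of the MapReduce flow: map -> shuffle (group by key) -> reduce.
    # Everything runs in one process; the distributed worker/master machinery is omitted.
    from collections import defaultdict

    def map_reduce(inputs, map_fn, reduce_fn):
        # Map phase: each (in_key, in_value) shard produces intermediate (out_key, value) pairs.
        intermediate = defaultdict(list)
        for in_key, in_value in inputs:
            for out_key, value in map_fn(in_key, in_value):
                intermediate[out_key].append(value)   # shuffle: collect values per key
        # Reduce phase: combine all intermediate values for each key.
        return {out_key: reduce_fn(out_key, values) for out_key, values in intermediate.items()}

    # Example in the spirit of "extract outgoing links and aggregate by target document".
    # The page contents below are invented for illustration.
    def map_links(page_url, links_on_page):
        for target in links_on_page:
            yield target, page_url            # emit (target_doc, source_doc)

    def reduce_links(target, sources):
        return sorted(set(sources))           # all pages linking to this target

    pages = [
        ("a.html", ["c.html", "d.html"]),
        ("b.html", ["c.html"]),
    ]
    print(map_reduce(pages, map_links, reduce_links))
    # {'c.html': ['a.html', 'b.html'], 'd.html': ['a.html']}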
http://www.nytimes.com/2010/09/20/technology/20arm.html

Agenda
- Data Level Parallelism
- Administrivia
- Technology Break
- Map-Reduce Examples

Map-Reduce Processing Example: Count Word Occurrences

Map-Reduce Processing: Execution
Pseudo code:

    map(String input_key, String input_value):
      // input_key: document name
      // input_value: document contents
      for each word w in input_value:
        EmitIntermediate(w, "1");

    reduce(String output_key, Iterator intermediate_values):
      // output_key: a word
      // output_values: a list of counts
      int result = 0;
      for each v in intermediate_values:
        result += ParseInt(v);
      Emit(AsString(result));

(A runnable Python version of this word count appears below.)

(Figure: each intermediate key is hashed to a reduce task. Map key k1 hashes to Reduce Task 2; k2 to Reduce Task 1; k3 to Reduce Task 2; k4 to Reduce Task 1; k5 to Reduce Task 1. Stages: Map -> Shuffle -> Reduce/Combine -> Collect Results.)
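Here is a runnable Python version of the word-count pseudocode, together with a hash-based partition function of the kind the shuffle figure implies (each intermediate key is hashed to one of R reduce tasks). The two documents and the choice of R = 2 are made up for illustration.

    # Runnable word-count version of the pseudocode above, with a hash partition
    # that assigns each intermediate key to one of R reduce tasks (as in the shuffle figure).
    from collections import defaultdict

    def map_word_count(doc_name, contents):
        for w in contents.split():
            yield w, 1                       # EmitIntermediate(w, "1")

    def reduce_word_count(word, counts):
        return sum(counts)                   # Emit(AsString(result))

    def partition(key, num_reduce_tasks):
        return hash(key) % num_reduce_tasks  # which reduce task handles this key

    R = 2                                    # illustrative number of reduce tasks
    docs = [("doc1", "that that is is that"),
            ("doc2", "that is not is not is that it it is")]

    # Map + shuffle: per-reduce-task buckets of (key -> list of counts).
    buckets = [defaultdict(list) for _ in range(R)]
    for doc_name, contents in docs:
        for word, count in map_word_count(doc_name, contents):
            buckets[partition(word, R)][word].append(count)

    # Reduce + collect.
    results = {}
    for task in buckets:
        for word, counts in task.items():
            results[word] = reduce_word_count(word, counts)
    print(results)   # e.g. {'that': 5, 'is': 6, 'not': 2, 'it': 2}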
Another Example: Word Index (How Often Does a Word Appear?)

Input: "that that is is that that is not is not is that it it is"

Distribute:
  Map 1: "that that is"        Map 2: "is that that"
  Map 3: "is not is not"       Map 4: "is that it it is"

Map outputs:
  Map 1: is 1, that 2          Map 2: is 1, that 2
  Map 3: is 2, not 2           Map 4: is 2, it 2, that 1

Shuffle:
  is: 1,1,2,2    that: 2,2,1    it: 2    not: 2

Reduce:
  Reduce 1: is 6; it 2         Reduce 2: not 2; that 5

Collect: is 6; it 2; not 2; that 5

Map-Reduce Failure Handling
- On worker failure:
  - Detect failure via periodic heartbeats
  - Re-execute completed and in-progress map tasks
  - Re-execute in-progress reduce tasks
  - Task completion committed through master
- Master failure:
  - Could handle, but don't yet (master failure unlikely)
- Robust: lost 1600 of 1800 machines once, but finished fine

Map-Reduce Redundant Execution
- Slow workers significantly lengthen completion time
  - Other jobs consuming resources on machine
  - Bad disks with soft errors transfer data very slowly
  - Weird things: processor caches disabled (!!)
- Solution: Near end of phase, spawn backup copies of tasks
  - Whichever one finishes first "wins"
  - (a small sketch of this "first one wins" pattern appears after the summary)
- Effect: Dramatically shortens job completion time

Map-Reduce Locality Optimization
- Master scheduling policy:
  - Asks GFS for locations of replicas of input file blocks
  - Map tasks typically split into 64 MB (== GFS block size)
  - Map tasks scheduled so GFS input block replicas are on the same machine or same rack
- Effect: Thousands of machines read input at local disk speed
  - Without this, rack switches limit read rate

Summary
- Request-Level Parallelism
  - High request volume, each request largely independent of the others
  - Use replication for better request throughput and availability
- Map-Reduce Data Parallelism
  - Divide large data set into pieces for independent parallel processing
  - Combine and process intermediate results to obtain final result
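As noted in the redundant-execution slide, here is a small Python sketch of the "spawn a backup copy, first one wins" idea. The simulated task durations and the run_with_backup helper are hypothetical; a real master tracks stragglers per task and discards the losing copy's output.

    # Sketch of backup (speculative) execution: run a primary and a backup copy of the
    # same task and accept whichever finishes first. Task durations here are simulated.
    import time
    from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait

    def run_task(task_id, copy, delay):
        time.sleep(delay)                    # stand-in for the real map/reduce work
        return f"task {task_id} finished by {copy} copy"

    def run_with_backup(task_id, primary_delay, backup_delay):
        pool = ThreadPoolExecutor(max_workers=2)
        futures = {
            pool.submit(run_task, task_id, "primary", primary_delay),
            pool.submit(run_task, task_id, "backup", backup_delay),
        }
        done, not_done = wait(futures, return_when=FIRST_COMPLETED)
        for f in not_done:
            f.cancel()                       # best effort; an already-running copy is just ignored
        pool.shutdown(wait=False)            # don't block on the straggler
        return next(iter(done)).result()

    # The primary is a straggler (busy machine, bad disk); the backup copy wins.
    print(run_with_backup(7, primary_delay=2.0, backup_delay=0.1))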