COMP Parallel Computing. Lecture 22, November 29, 2018: Datacenters and Large Scale Data Processing


Topics
- Parallel memory hierarchy extended to include disk storage
- Google web search: a large parallel application distributed over large clusters
- Programming models for large data collections: MapReduce, Spark

Extending the parallel memory hierarchy
- Incorporate disk storage
- Parallel transfers to disks
- Global access to all data
(Diagram: hierarchy levels from cache and local memory per node up through distributed memory, local disk, and distributed disk storage)

Google data processing
- Search and other services require parallel processing
  » processing and/or request volume are too large for a single machine
- Data storage requires replication
  » to tolerate and recover from storage errors
  » for parallel throughput
  » to reduce latency
- Multiple datacenters around the world
  » to reduce latency and long-haul traffic
  » to tolerate network or power failures, or bigger disasters

Google web search: 2002
- Web statistics (2002) [1]
  » 3+ billion static web pages, doubling every 8 months (2012: 1 trillion pages)
  » 30% duplication
- Google usage statistics (2002) [1,2]
  » 260 million users, 80% of whom do searches
  » 150 million searches/day (2017: 3.5 billion searches/day)
    ~2,000 queries/sec average, ~1,000 queries/sec minimum
  » query response time: less than 0.25 sec typical, target 0.5 sec max
  » uptime: target 100%
Sources:
[1] Monika Henzinger, "Indexing the Web: A challenge for supercomputing", invited talk, ISC 2002, Heidelberg, June 2002.
[2] Urs Hoelzle, "Google Linux Cluster", Univ. Washington Colloquium, November 2002.
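As a quick sanity check, dividing the quoted searches per day by the seconds in a day reproduces the queries-per-second figures; a minimal back-of-envelope sketch:

```python
# Back-of-envelope check of the query-rate figures above.
searches_per_day = 150e6          # 150 million searches/day (2002)
seconds_per_day = 24 * 60 * 60    # 86,400 s

avg_qps = searches_per_day / seconds_per_day
print(f"2002 average: {avg_qps:,.0f} queries/sec")   # ~1,736, i.e. roughly 2,000/sec

# Same exercise for the 2017 figure quoted in parentheses.
print(f"2017 average: {3.5e9 / seconds_per_day:,.0f} queries/sec")  # ~40,509
```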

Google Linux Cluster (2002)
- Overview: 15,000+ PC cluster
  » 5 PB disk storage (5 x 10^15 bytes = 50,000 100 GB disks)
- node
  » mid-range processor (P III), 256 MB - 2 GB memory
  » 1-4 100 GB disks
  » 100 Mb Ethernet
  » runs Linux
- rack
  » 100 to 200 nodes
  » Ethernet switch (100 Mb links to nodes, 2 GigE uplinks)
- router/switch
  » serves ~100 racks
  » distributes search requests
(Diagram: nodes connect over 100 Mb links to rack switches; ~100 racks of 100-200 nodes each uplink via 2 GigE to a 256 GB/s switch/router connected to the Internet)
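The storage total is consistent with the per-node figures; a minimal arithmetic check (treating 1 PB as 10^15 bytes, as the slide does):

```python
# Check that the cluster-wide storage figure matches the per-node disk counts.
nodes = 15_000
disks_per_node = (1, 4)           # each node has 1-4 disks
disk_size_bytes = 100e9           # 100 GB disks (decimal GB)

total_disks = 50_000              # implied by 5 PB / 100 GB
print(total_disks * disk_size_bytes)       # 5e15 bytes = 5 PB
print(nodes * disks_per_node[0],           # 15,000 disks at 1 disk/node
      nodes * disks_per_node[1])           # 60,000 disks at 4 disks/node
# 50,000 disks is ~3.3 disks per node on average, within the 1-4 range.
```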

Google Web Search: 2010 vs. 1999*
- # docs: tens of millions -> tens of billions        ~1,000x
- queries processed/day:                               ~1,000x
- per-doc info in index:                               ~3x
- update latency: months -> tens of secs               ~50,000x
- avg. query latency: <1s -> <0.2s                     ~5x
- more machines x faster machines:                     ~1,000x
* Jeff Dean - Building Software Systems at Google and Lessons Learned

Google data centers (recent)

The Joys of Real Hardware*
Typical first year for a new cluster:
- ~1 network rewiring (rolling ~5% of machines down over a 2-day span)
- ~20 rack failures (40-80 machines instantly disappear, 1-6 hours to get back)
- ~5 racks go wonky (40-80 machines see 50% packet loss)
- ~8 network maintenances (4 might cause ~30-minute random connectivity losses)
- ~12 router reloads (takes out DNS and external VIPs for a couple minutes)
- ~3 router failures (have to immediately pull traffic for an hour)
- ~dozens of minor 30-second blips for DNS
- ~1,000 individual machine failures
- ~thousands of hard drive failures
- slow disks, bad memory, misconfigured machines, flaky machines, etc.
- Long-distance links: wild dogs, sharks, dead horses, drunken hunters, etc.
* Jeff Dean - Building Software Systems at Google and Lessons Learned

Google query processing steps (simplified)
1. Secret sauce to map the query to search terms
   » detect query language + fix spelling errors
2. Locate search terms in the dictionary
   » over 100 M words in the dictionary per language
3. For each search term in the dictionary, use the inverted index to locate web pages containing the term
   » ordered by page number
4. Compute and order pages satisfying the full query
   » explicit rules: conjunction, disjunction, etc. of terms; document language
   » implicit rules: search term proximity in documents; location of search terms in document structure; quality of page (PageRank)
5. Construct synopsis reports from documents in order
   » extract page from cache and highlight search terms in context
   » 10 results returned per query
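To make steps 3-4 concrete, here is a minimal sketch (not Google's implementation) of how an inverted index maps each term to a sorted posting list of page IDs, and how a conjunctive query is answered by intersecting those lists:

```python
from collections import defaultdict

# Toy inverted index: term -> sorted list of page IDs containing the term.
def build_index(pages):
    index = defaultdict(set)
    for page_id, text in pages.items():
        for term in text.lower().split():
            index[term].add(page_id)
    return {term: sorted(ids) for term, ids in index.items()}

# Conjunctive (AND) query: intersect the posting lists of all query terms.
def and_query(index, terms):
    postings = [set(index.get(t, [])) for t in terms]
    return sorted(set.intersection(*postings)) if postings else []

pages = {1: "parallel computing with clusters",
         2: "google clusters serve web search",
         3: "web search uses an inverted index"}
index = build_index(pages)
print(and_query(index, ["web", "search"]))   # [2, 3]
```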

Challenges
- Query processing
  » how to distribute the data structures? (dictionary, inverted index, web pages)
  » how to implement the query processing algorithms?
- Fault tolerance: component count is very large
  » 10,000 servers with a 3-year MTBF: expect to lose ten a day
  » 50,000 disks with 10% failing per year is a disk failure every couple of hours
  » a 10^-15 undetected bit error rate on I/O is ~50 incorrect bits in a 5 PB copy
- Scaling: how can the system be designed to scale with
  » increasing number of queries
  » increasing size of the web (number of pages and total text size)
  » increasing component failures (as a consequence of scaling up)
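The fault-tolerance bullets follow from simple arithmetic; a minimal sketch of the calculations:

```python
# Servers: 10,000 machines, 3-year mean time between failures.
servers, mtbf_years = 10_000, 3
print(servers / (mtbf_years * 365))          # ~9.1 failures/day ("ten a day")

# Disks: 50,000 disks, 10% annual failure rate.
disks, afr = 50_000, 0.10
hours_per_year = 365 * 24
print(hours_per_year / (disks * afr))        # ~1.75 hours between disk failures

# Undetected bit errors: 10^-15 error rate over a 5 PB copy.
bits_copied = 5e15 * 8                       # 5 PB expressed in bits
print(bits_copied * 1e-15)                   # ~40 incorrect bits (~50 in round numbers)
```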

Google server architecture

When designing large distributed applications
Numbers Everyone Should Know (Jeff Dean)
- L1 cache reference                        0.5 ns
- Branch mispredict                         5 ns
- L2 cache reference                        7 ns
- Mutex lock/unlock                         100 ns
- Main memory reference                     100 ns
- Compress 1K bytes with Zippy              10,000 ns
- Send 2K bytes over 1 Gbps network         20,000 ns
- Read 1 MB sequentially from memory        250,000 ns
- Round trip within same datacenter         500,000 ns
- Disk seek                                 10,000,000 ns
- Read 1 MB sequentially from network       10,000,000 ns
- Read 1 MB sequentially from disk          30,000,000 ns
- Send packet CA->Netherlands->CA           150,000,000 ns
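These constants are meant for back-of-envelope design estimates. As a hypothetical example (not from the slides), comparing a 1 GB sequential read from memory vs. from a single disk using the numbers above:

```python
# Back-of-envelope estimates using the latency table above (values in ns).
READ_1MB_MEMORY = 250_000
READ_1MB_DISK = 30_000_000
DISK_SEEK = 10_000_000

def read_1gb_ns(per_mb_ns, seeks=0):
    # 1 GB = 1,024 MB read sequentially, plus any initial seeks.
    return 1024 * per_mb_ns + seeks * DISK_SEEK

print(read_1gb_ns(READ_1MB_MEMORY) / 1e9)        # ~0.26 s from memory
print(read_1gb_ns(READ_1MB_DISK, seeks=1) / 1e9) # ~30.7 s from one disk
# Striping the gigabyte across ~100 disks brings the disk estimate down to ~0.3 s,
# which is the motivation for the parallel disk bandwidth on the next slide.
```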

Processing large data sets
- Process data distributed across thousands of disks
- Large datasets pose an I/O bottleneck
  » attach disks to all nodes
  » stripe data across disks
  » how to manage this?
- MapReduce provides
  » parallel disk bandwidth
  » automatic parallelization & distribution
  » fault tolerance
  » I/O scheduling
  » monitoring & status updates

Map/Reduce
- Map/Reduce parallel programming schema
  » name inspired by the functional language view of the schema
- Many problems can be approached this way
- Easy to distribute across nodes
- Simple failure/retry semantics

Map in Lisp (Scheme)

(map f list [list2 list3 ...])

(map square '(1 2 3 4))
=> (1 4 9 16)

(reduce + '(1 4 9 16))
=> (+ 16 (+ 9 (+ 4 1))) = 30
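For readers less familiar with Scheme, the same two operations in Python (using the built-in map and functools.reduce) give the same results:

```python
from functools import reduce

squares = list(map(lambda x: x * x, [1, 2, 3, 4]))   # [1, 4, 9, 16]
total = reduce(lambda a, b: a + b, squares)           # ((1 + 4) + 9) + 16 = 30
print(squares, total)
```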

Map/Reduce a la Google
- An input file contains a large list of items
  » each item is a (key, val) pair
  » the file is distributed across disks on p nodes
- map(key, val) is run on each item in the list
  » emits new-key / new-val pairs
  » map: (k1, v1) -> list(k2, v2)
- reduce(key, vals) is run for each unique key emitted by map()
  » reduce: (k2, list(v2)) -> list(v2)
- The result is written to a file distributed across disks attached to the nodes
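As an illustration of these signatures only (the real system distributes the work and the data across thousands of nodes), a minimal single-process sketch of the map/shuffle/reduce flow might look like the following; the names are placeholders, not Google's API:

```python
from collections import defaultdict
from typing import Callable, Iterable, List, Tuple

KV = Tuple[str, object]

def map_reduce(items: Iterable[KV],
               mapper: Callable[[str, object], Iterable[KV]],
               reducer: Callable[[str, List[object]], Iterable[object]]):
    # Map phase: run mapper on every (key, val) item, collecting (k2, v2) pairs.
    groups = defaultdict(list)
    for key, val in items:
        for k2, v2 in mapper(key, val):
            groups[k2].append(v2)          # shuffle: group values by new key
    # Reduce phase: run reducer once per unique intermediate key.
    return {k2: list(reducer(k2, vals)) for k2, vals in groups.items()}
```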

Example 1: count words in docs
- Input consists of (url, contents) pairs
- map(key=url, val=contents):
  » for each word w in contents, emit (w, "1")
- reduce(key=word, values=uniq_counts):
  » sum all "1"s in the values list
  » emit result (word, sum)

Count words - example
map(key=url, val=contents):
  for each word w in contents, emit (w, "1")
reduce(key=word, values=uniq_counts):
  sum all "1"s in the values list
  emit result (word, sum)

Input documents:  "see bob throw", "see spot run"
Map output:       (see,1) (bob,1) (run,1) (see,1) (spot,1) (throw,1)
Reduce output:    (bob,1) (run,1) (see,2) (spot,1) (throw,1)
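Using the toy map_reduce driver sketched earlier, the word-count example can be run end to end; again a hypothetical illustration, not the production code:

```python
def wc_map(url, contents):
    # Emit (word, 1) for every word occurrence in the document.
    for w in contents.split():
        yield (w, 1)

def wc_reduce(word, counts):
    # Sum the per-occurrence counts for this word.
    yield sum(counts)

docs = [("doc1", "see bob throw"), ("doc2", "see spot run")]
print(map_reduce(docs, wc_map, wc_reduce))
# {'see': [2], 'bob': [1], 'throw': [1], 'spot': [1], 'run': [1]}
```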

Execution

Parallel Execution

Experience (10-15 years ago)
- Rewrote Google's production indexing system using MapReduce
  » set of 10, 14, 17, 21, 24 MapReduce operations
  » new code is simpler, easier to understand (3800 lines of C++ -> 700)
  » MapReduce handles failures, slow machines
  » easy to make indexing faster: add more machines
- Redux
  » MapReduce proved to be a useful abstraction
  » MapReduce has an open-source implementation: Hadoop
  » extensively used with large datasets (e.g. bioinformatics)
  » focus on the problem, let the library deal with the messy details

Improving MapReduce
- Problem: all computation is disk to disk
  » no notion of locality
- Solution: Spark
  » system for expressing computations on objects distributed on disks
  » computations are moved to the data instead of vice-versa
  » in-memory data flow model
  » decreases the number of map/reduce steps
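As a point of comparison with the word-count example above, the same computation in Spark's RDD API keeps intermediate data in memory and chains the steps into a single job; a minimal PySpark sketch (the input path is a placeholder):

```python
from pyspark import SparkContext

sc = SparkContext(appName="wordcount")

# Read the input, split into words, and count occurrences as one in-memory
# data-flow job rather than separate disk-to-disk map/reduce passes.
counts = (sc.textFile("hdfs:///path/to/docs")       # placeholder input path
            .flatMap(lambda line: line.split())      # map: emit words
            .map(lambda w: (w, 1))                   # (word, 1) pairs
            .reduceByKey(lambda a, b: a + b))        # reduce: sum counts per word

print(counts.take(10))
sc.stop()
```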