Distributed Systems. 05r. Case study: Google Cluster Architecture. Paul Krzyzanowski. Rutgers University. Fall 2016


A note about relevancy

This describes the Google search cluster architecture in the mid-2000s. The search infrastructure was overhauled in 2010 (we'll get to this at the end). Nevertheless, the lessons are still valid: this case study demonstrates how incredible scalability has been achieved with commodity computers by exploiting parallelism.

What is needed?

A single Google search query:
- Reads hundreds of megabytes of data
- Uses tens of billions of CPU cycles

The environment needs to support tens of thousands of queries per second.

The environment must be:
- Fault tolerant
- Economical (price-performance ratio matters)
- Energy efficient (this affects price; watts per unit of performance matters)

The workload should be highly parallelizable; CPU performance matters less than the price/performance ratio.

Key design principles

- Have reliability reside in software, not hardware
  - Use low-cost (unreliable) commodity PCs to build a high-end cluster
  - Replicate services across machines & detect failures
- Design for best total throughput, not peak server response time
  - Response time can be controlled by parallelizing requests
  - Rely on replication: this helps with availability too
- The price/performance ratio is more important than peak performance

Step 1: DNS

- The user's browser must map google.com to an IP address
- google.com comprises multiple clusters distributed worldwide
  - Each cluster contains thousands of machines
- DNS-based load balancing
  - Select a cluster by taking the user's geographic & network proximity into account
  - Load balance across clusters

How a query reaches a cluster via Google's load-balanced DNS:
1. Contact DNS server(s) to find the DNS server responsible for google.com
2. Google's DNS server returns addresses based on the location of the request
3. Contact the appropriate cluster
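To make the proximity-based selection concrete, here is a minimal sketch (not Google's actual implementation) of an authoritative resolver choosing a cluster address for the requester. The cluster table, IP addresses, and the distance-based policy are illustrative assumptions.

```python
# Hypothetical sketch of DNS-based cluster selection by geographic proximity.
# The cluster table and the distance ranking are made up for illustration.
import math

CLUSTERS = {
    "us-east":   {"ip": "203.0.113.10", "lat": 40.7, "lon": -74.0},
    "eu-west":   {"ip": "203.0.113.20", "lat": 53.3, "lon": -6.3},
    "asia-east": {"ip": "203.0.113.30", "lat": 35.7, "lon": 139.7},
}

def distance(lat1, lon1, lat2, lon2):
    """Rough great-circle distance (km) -- good enough to rank clusters."""
    to_rad = math.radians
    dlat = to_rad(lat2 - lat1)
    dlon = to_rad(lon2 - lon1)
    a = (math.sin(dlat / 2) ** 2 +
         math.cos(to_rad(lat1)) * math.cos(to_rad(lat2)) * math.sin(dlon / 2) ** 2)
    return 6371 * 2 * math.asin(math.sqrt(a))

def resolve_google_com(client_lat, client_lon):
    """Return the address of the geographically closest cluster."""
    best = min(CLUSTERS.values(),
               key=lambda c: distance(client_lat, client_lon, c["lat"], c["lon"]))
    return best["ip"]

print(resolve_google_com(48.9, 2.4))   # a client near Paris -> the eu-west address
```

A real deployment would also fold in cluster load and network measurements, but proximity ranking captures the core idea.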

Step 2: Send an HTTP request

- The IP address corresponds to a load balancer within a cluster
- Hardware load balancer
  - Monitors the set of Google Web Servers (GWS)
  - Performs local load balancing of requests among the available servers
- A GWS machine receives the query
  - Coordinates the execution of the query
  - Formats the results into an HTML response for the user

[Figure: within the data center, a hardware load balancer distributes queries across multiple Google Web Servers, which handle query coordination]
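A minimal sketch of the local load balancing a cluster's front end might perform. The server names and the round-robin policy are illustrative assumptions, not the actual balancing algorithm.

```python
# Minimal sketch of local load balancing across Google Web Servers (GWS).
# Server names and the round-robin policy are illustrative assumptions.
import itertools

class LoadBalancer:
    """Round-robin over the GWS machines that are currently up."""
    def __init__(self, servers):
        self.rotation = itertools.cycle(servers)

    def dispatch(self, query):
        gws = next(self.rotation)                       # pick the next GWS in rotation
        return f"{gws} coordinates query {query!r}"     # stand-in for the HTTP forward

lb = LoadBalancer(["gws-1", "gws-2", "gws-3"])
for q in ["distributed systems", "gfs", "bigtable"]:
    print(lb.dispatch(q))
```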

Step 3: Find documents via the inverted index

Index servers:
- Map each query word to a list of documents (a hit list)
  - The inverted index is generated from web-crawled data via MapReduce
- Intersect the hit lists of the per-word queries
- Compute a relevance score for each document
- Determine the set of matching documents and sort it by relevance score

[Figure: each query word maps to a document list; the per-word lists are intersected]
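A toy sketch of the lookup-intersect-score step. The index contents are made up, and the scoring function is a simple stand-in for whatever relevance function the real system uses.

```python
# Minimal sketch of querying an inverted index (toy data; the score is just a
# stand-in for a real relevance function).
inverted_index = {
    "distributed": {1: 3, 2: 1, 4: 2},   # word -> {docid: term frequency}
    "systems":     {1: 2, 3: 5, 4: 1},
}

def search(query_words):
    # Intersect the hit lists: keep only docids that contain every query word
    hit_lists = [set(inverted_index.get(w, {})) for w in query_words]
    candidates = set.intersection(*hit_lists) if hit_lists else set()
    # Score each candidate (here: the sum of term frequencies) and sort
    scored = [(sum(inverted_index[w][d] for w in query_words), d) for d in candidates]
    return [d for score, d in sorted(scored, reverse=True)]

print(search(["distributed", "systems"]))   # -> [1, 4]
```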

Parallel search through an inverted index

- The inverted index is tens of terabytes
- The search is parallelized:
  - The index is divided into index shards
  - Each index shard is built from a randomly chosen subset of documents
  - A pool of machines serves requests for each shard
  - Pools are load balanced: the query goes to one machine in each pool responsible for a shard
- The final result is an ordered list of document identifiers (docids)

[Figure: the Google Web Server fans the query out to one replica in each index-server pool, shard 0 through shard N]
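The sketch below illustrates the fan-out: one replica is picked from each shard's pool, each replica returns scored docids for its subset of documents, and the results are combined. The shard count, replica names, modulo placement, and random scores are all assumptions for illustration.

```python
# Sketch of fanning a query out to one replica per index-shard pool.
# Shard count, replica lists, placement, and scores are illustrative assumptions.
import random

NUM_SHARDS = 4
shard_replicas = {s: [f"index-{s}-{r}" for r in range(3)] for s in range(NUM_SHARDS)}

def shard_for(docid):
    """Toy placement; the real shards hold randomly chosen document subsets."""
    return docid % NUM_SHARDS

def query_shard(shard, replica, query):
    """Stand-in for an RPC to one index server; returns (score, docid) pairs."""
    return [(random.random(), docid) for docid in range(100)
            if shard_for(docid) == shard and random.random() < 0.05]

def parallel_search(query):
    results = []
    for shard, replicas in shard_replicas.items():
        replica = random.choice(replicas)          # one machine per pool
        results.extend(query_shard(shard, replica, query))
    return [docid for score, docid in sorted(results, reverse=True)]

print(parallel_search("distributed systems")[:10])
```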

[Figure: index servers organized as pools of replicated machines, one pool per shard (shard 0 through shard N)]

Step 4: Get the title & URL for each docid

For each docid, the GWS looks up:
- The page title
- The URL
- The relevant text: a document summary specific to the query

This is handled by document servers (docservers).
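A small sketch of what a docserver lookup might return for one docid: the title, the URL, and a snippet built around the query terms. The document store and the snippet heuristic are made up for illustration.

```python
# Sketch of a docserver lookup: docid -> title, URL, query-specific snippet.
# The documents dict and the snippet heuristic are illustrative assumptions.
documents = {
    42: {"title": "Google Cluster Architecture",
         "url": "https://example.com/cluster",
         "text": "The cluster uses commodity PCs and relies on software for reliability."},
}

def snippet(text, query_words, width=40):
    """Return a window of text around the first query word that appears."""
    lower = text.lower()
    for w in query_words:
        pos = lower.find(w.lower())
        if pos >= 0:
            return "..." + text[max(0, pos - width // 2): pos + width] + "..."
    return text[:width] + "..."

def lookup(docid, query_words):
    doc = documents[docid]
    return {"title": doc["title"], "url": doc["url"],
            "summary": snippet(doc["text"], query_words)}

print(lookup(42, ["commodity", "reliability"]))
```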

Parallelizing document lookup

- Like index lookup, document lookup is partitioned & parallelized
- Documents are distributed into smaller shards
  - Each shard holds a subset of documents
- A pool of load-balanced servers is responsible for processing each shard
- Together, the document servers hold a cached copy of the entire web!

[Figure: the Google Web Server fans requests out to one replica in each docserver pool, shard 0 through shard N]
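One way to picture the docserver side is grouping the docids from the index step by shard so that each pool receives a single batched request, as in the sketch below. The shard count, pool names, and hash placement are assumptions.

```python
# Sketch of routing docids to docserver pools: group docids by shard so each
# pool gets one batched lookup. Shard count and placement are assumptions.
from collections import defaultdict

NUM_DOC_SHARDS = 3
docserver_pools = {s: [f"docserver-{s}-{r}" for r in range(2)] for s in range(NUM_DOC_SHARDS)}

def doc_shard(docid):
    return hash(docid) % NUM_DOC_SHARDS

def fetch_documents(docids):
    batches = defaultdict(list)
    for d in docids:
        batches[doc_shard(d)].append(d)          # one batch per shard
    results = {}
    for shard, batch in batches.items():
        replica = docserver_pools[shard][0]      # pick any replica in the pool
        for d in batch:                          # stand-in for one batched RPC
            results[d] = {"served_by": replica, "docid": d}
    return results

print(fetch_documents([3, 7, 12, 18]))
```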

Additional operations

In parallel with the search:
- Send the query to a spell-checking system
- Send the query to an ad-serving system to generate ads

When all the results are in, the GWS generates the HTML output:
- Sorted query results, with page titles, summaries, and URLs
- Ads
- Spelling correction suggestions

[Figure: the hardware load balancer forwards the query to a Google Web Server, which fans it out to the index servers, docservers, spell checker, and ad server]
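A minimal sketch of issuing the spell-check and ad requests concurrently with the search and assembling the response only after everything returns. The three backend functions are stand-ins, not real services.

```python
# Sketch of fanning a query out to the search, spell-check, and ad backends in
# parallel. The backend functions are stand-ins for real RPCs.
from concurrent.futures import ThreadPoolExecutor

def search_backend(query):  return ["doc-1", "doc-2"]
def spell_checker(query):   return "did you mean: distributed systems?"
def ad_server(query):       return ["ad for cloud hosting"]

def handle_query(query):
    with ThreadPoolExecutor() as pool:
        results_f = pool.submit(search_backend, query)
        spell_f   = pool.submit(spell_checker, query)
        ads_f     = pool.submit(ad_server, query)
        # The GWS assembles the HTML page only once all pieces have returned
        return {"results": results_f.result(),
                "spelling": spell_f.result(),
                "ads": ads_f.result()}

print(handle_query("distrbuted systems"))
```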

Lesson: exploit parallelism

- Instead of looking up matching documents in one large index:
  - Do many lookups for documents in smaller indices
  - Merge the results together: merging is simple & inexpensive
- Divide the stream of incoming queries
  - Among geographically distributed clusters
  - Load balance among query servers within a cluster
- Linear performance improvement with more machines
  - Shards don't need to communicate with each other
  - Increase the number of shards across more machines to improve performance
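The "merging is simple & inexpensive" point can be seen in a few lines: if each shard returns its hits already sorted by score, the front end only needs a k-way merge of sorted lists. The scores and docids below are toy values.

```python
# Sketch of the cheap merge step: each shard returns hits sorted by score, and
# the front end does a k-way merge (toy scores and docids).
import heapq

shard_results = [
    [(0.97, "doc-14"), (0.60, "doc-3")],     # shard 0, sorted by descending score
    [(0.91, "doc-27"), (0.55, "doc-9")],     # shard 1
    [(0.88, "doc-41")],                      # shard 2
]

# heapq.merge expects ascending sequences, so merge on negated scores
merged = heapq.merge(*[[(-s, d) for s, d in shard] for shard in shard_results])
top = [(d, -neg) for neg, d in merged]
print(top)   # all hits ordered by relevance across shards
```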

Change to Caffeine

In 2010, Google remodeled its search infrastructure.

Old system:
- Based on MapReduce (on GFS) to generate index files
- Batch process: the next phase cannot start until the first is complete (web crawling -> MapReduce -> propagation)
- Initially, Google updated its index every 4 months. Around 2000, it reindexed and propagated changes every month
  - The process took about 10 days
  - Users hitting different servers might get different results

New system, named Caffeine:
- Fully incremental system, based on BigTable running on GFS2
- Supports indexing many more documents: ~100 petabytes
- High degree of interactivity: web crawlers can update tables dynamically
- Analyzes the web continuously in small chunks
  - Identifies pages that are likely to change frequently

BTW, MapReduce is not dead. Caffeine uses it in some places, as do lots of other services.
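The incremental idea can be sketched in a few lines: when a crawler (re)fetches a page, only that page's postings are updated, instead of rebuilding the whole index in a batch. The in-memory dictionaries below are a stand-in for a BigTable-style store, not Caffeine's actual design.

```python
# Sketch of incremental indexing: re-index one page as soon as it is crawled.
# The in-memory "tables" stand in for a BigTable-style store.
from collections import defaultdict

posting_lists = defaultdict(set)    # word -> set of URLs containing it
indexed_words = {}                  # URL  -> words currently indexed for it

def update_page(url, new_text):
    """Incrementally re-index a single page after a (re)crawl."""
    new_words = set(new_text.lower().split())
    old_words = indexed_words.get(url, set())
    for w in old_words - new_words:     # words the page no longer contains
        posting_lists[w].discard(url)
    for w in new_words - old_words:     # words the page newly contains
        posting_lists[w].add(url)
    indexed_words[url] = new_words

update_page("example.com/a", "distributed systems at scale")
update_page("example.com/a", "distributed storage at scale")   # the page changed
print(sorted(posting_lists["storage"]), sorted(posting_lists["systems"]))
```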

GFS to GFS2

GFS was designed with MapReduce in mind
- But it found lots of other applications
- Designed for batch-oriented operations

Problems:
- A single master node is in charge of the chunkservers
  - All info (metadata) about files is stored in the master's memory, which limits the total number of files
  - Problems arose when storage grew to tens of petabytes (1 PB = 10^15 bytes)
  - Automatic failover was added (but it still takes ~10 seconds)
- Designed for high throughput but delivers high latency: the master can become a bottleneck
  - Delays due to recovering from a failed replica chunkserver delay the client

GFS2:
- Distributed masters
- Supports smaller files: chunks go from 64 MB to 1 MB
- Designed specifically for BigTable (this does not make GFS obsolete)
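A rough back-of-the-envelope calculation shows why the single master's memory becomes the limit and why shrinking chunks from 64 MB to 1 MB forces distributed masters. The per-chunk metadata size and the 10 PB total are assumed round numbers, not GFS specifications.

```python
# Back-of-the-envelope sketch: master metadata vs. chunk size.
# The 64 bytes of metadata per chunk and 10 PB of data are assumptions.
TOTAL_STORAGE = 10 * 10**15          # 10 PB of file data
BYTES_PER_CHUNK_METADATA = 64        # rough in-memory record per chunk (assumed)

for chunk_size in (64 * 2**20, 1 * 2**20):          # 64 MB vs 1 MB chunks
    chunks = TOTAL_STORAGE // chunk_size
    metadata = chunks * BYTES_PER_CHUNK_METADATA
    print(f"{chunk_size // 2**20:>3} MB chunks -> {chunks:.2e} chunks, "
          f"~{metadata / 2**30:.0f} GiB of master metadata")
```

With 64 MB chunks the metadata fits comfortably in one machine's RAM; with 1 MB chunks it grows by a factor of 64, which is one way to see why GFS2 distributes the master.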

Main References

Web Search for a Planet: The Google Cluster Architecture
Luiz André Barroso, Jeffrey Dean, Urs Hölzle, Google, Inc.
research.google.com/archive/googlecluster.html

Our new search index: Caffeine
The Official Google Blog
http://googleblog.blogspot.com/2010/06/our-new-search-index-caffeine.html

GFS: Evolution on Fast-forward
Marshall Kirk McKusick, Sean Quinlan
Association for Computing Machinery, August 2009
http://queue.acm.org/detail.cfm?id=1594206

Google search index splits with MapReduce
Cade Metz, The Register, September 2010
http://www.theregister.co.uk/2010/09/09/google_caffeine_explained/

Google File System II: Dawn of the Multiplying Master Nodes
Cade Metz, The Register, August 2009
http://www.theregister.co.uk/2009/08/12/google_file_system_part_deux/

Exclusive: How Google's Algorithm Rules the Web
Steven Levy, Wired Magazine, March 2010
http://www.wired.com/magazine/2010/02/ff_google_algorithm/all/1

The End