Lecture 20: WSC, Datacenters. Topics: warehouse-scale computing and datacenters (Sections )

Similar documents
Lecture: Networks, Disks, Datacenters, GPUs. Topics: networks wrap-up, disks and reliability, datacenters, GPU intro (Sections

ΕΠΛ372 Παράλληλη Επεξεργάσια

Warehouse-Scale Computers to Exploit Request-Level and Data-Level Parallelism

Warehouse-Scale Computing

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Warehouse-Scale Computing

Concepts Introduced in Chapter 6. Warehouse-Scale Computers. Programming Models for WSCs. Important Design Factors for WSCs

Today s Lecture. CS 61C: Great Ideas in Computer Architecture (Machine Structures) Map Reduce

CSE 124: THE DATACENTER AS A COMPUTER. George Porter November 20 and 22, 2017

EN2910A: Advanced Computer Architecture Topic 06: Supercomputers & Data Centers Prof. Sherief Reda School of Engineering Brown University

Distributed Data Infrastructures, Fall 2017, Chapter 2. Jussi Kangasharju

CS/EE 6810: Computer Architecture

Instructors Randy H. Katz hdp://inst.eecs.berkeley.edu/~cs61c/fa13. Power UVlizaVon Efficiency. Datacenter. Infrastructure.

Green Computing: Datacentres

Green Computing: Datacentres

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Lecture 17 Datacenters and Cloud Compu5ng

CPSC 426/526. Cloud Computing. Ennan Zhai. Computer Science Department Yale University

Intelligent Placement of Datacenters for Internet Services. Íñigo Goiri, Kien Le, Jordi Guitart, Jordi Torres, and Ricardo Bianchini

CS 61C: Great Ideas in Computer Architecture. MapReduce

CS / Cloud Computing. Recitation 3 September 9 th & 11 th, 2014

Datacenters. Mendel Rosenblum. CS142 Lecture Notes - Datacenters

Topics. Big Data Analytics What is and Why Hadoop? Comparison to other technologies Hadoop architecture Hadoop ecosystem Hadoop usage examples

PCAP: Performance-Aware Power Capping for the Disk Drive in the Cloud

Data Center Fundamentals: The Datacenter as a Computer

CS 655 Advanced Topics in Distributed Systems

Introduction To Cloud Computing

The End of Redundancy. Alan Wood Sun Microsystems May 8, 2009

vrealize Business Standard User Guide

Lecture 23: Storage Systems. Topics: disk access, bus design, evaluation metrics, RAID (Sections )

Making the impossible data center a reality.

Big Data Management and NoSQL Databases

Introduction to MapReduce

Data Centers and Cloud Computing. Data Centers

Lessons Learned While Building Infrastructure Software at Google

Energy Usage in Cloud Part2. Salih Safa BACANLI

Topics. CIT 470: Advanced Network and System Administration. Google DC in The Dalles. Google DC in The Dalles. Data Centers

Energy Management of MapReduce Clusters. Jan Pohland

context: massive systems

Improving the MapReduce Big Data Processing Framework

Hadoop/MapReduce Computing Paradigm

Next-Generation Cloud Platform

Towards Energy-Proportional Datacenter Memory with Mobile DRAM

Data Center UPS Systems Why 1% Efficiency Matters

DATA CENTER COLOCATION BUILD VS. BUY

Parallelizing Multiple Group by Query in Shared-nothing Environment: A MapReduce Study Case

CSE 490h, Autumn 2008 Scalable Systems: Design, Implementation and Use of Large Scale Clusters

Lecture 2: Performance

Data Clustering on the Parallel Hadoop MapReduce Model. Dimitrios Verraros

CS Project Report

Embedded Technosolutions

Lecture 11 Hadoop & Spark

Tools for Social Networking Infrastructures

Infrastructure Innovation Opportunities Y Combinator 2013

Lecture 12 DATA ANALYTICS ON WEB SCALE

Where We Are. Review: Parallel DBMS. Parallel DBMS. Introduction to Data Management CSE 344

Composable Infrastructure for Public Cloud Service Providers

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Warehouse-Scale Computing, MapReduce, and Spark

Cassandra, MongoDB, and HBase. Cassandra, MongoDB, and HBase. I have chosen these three due to their recent

Lecture 10.1 A real SDN implementation: the Google B4 case. Antonio Cianfrani DIET Department Networking Group netlab.uniroma1.it

SIX Trends in our World

Databases 2 (VU) ( / )

Hot vs Cold Energy Efficient Data Centers. - SVLG Data Center Center Efficiency Summit

Data Centers and Cloud Computing

Data Centers and Cloud Computing. Slides courtesy of Tim Wood

Dremel: Interactice Analysis of Web-Scale Datasets

CS 61C: Great Ideas in Computer Architecture (Machine Structures) Warehouse-Scale Computing, MapReduce, and Spark

Towards Energy Proportional Cloud for Data Processing Frameworks

Interested in conducting your own webinar?

Data Center Fundamentals: The Datacenter as a Computer

COMPARING COST MODELS - DETAILS

ECE Enterprise Storage Architecture. Fall ~* CLOUD *~. Tyler Bletsch Duke University

A Scalable, Commodity Data Center Network Architecture

Cloud Computing and Hadoop Distributed File System. UCSB CS170, Spring 2018

Fusion iomemory PCIe Solutions from SanDisk and Sqrll make Accumulo Hypersonic

what is cloud computing?

Optimization in Data Centres. Jayantha Siriwardana and Saman K. Halgamuge Department of Mechanical Engineering Melbourne School of Engineering

NVMFS: A New File System Designed Specifically to Take Advantage of Nonvolatile Memory

MixApart: Decoupled Analytics for Shared Storage Systems. Madalin Mihailescu, Gokul Soundararajan, Cristiana Amza University of Toronto and NetApp

Networking Issues For Big Data

Server room guide helps energy managers reduce server consumption

Low-Latency Datacenters. John Ousterhout

Lecture 7: Data Center Networks

Cassandra - A Decentralized Structured Storage System. Avinash Lakshman and Prashant Malik Facebook

MapReduce. U of Toronto, 2014

Advanced Computer Architecture (CS620)

Cloud Computing Economies of Scale

Apache Spark Graph Performance with Memory1. February Page 1 of 13

Introduction to Data Management CSE 344

Data Center Infrastructure Management (D.C.I.M.)

Internet Scale Storage

Hadoop 2.x Core: YARN, Tez, and Spark. Hortonworks Inc All Rights Reserved

CISC 7610 Lecture 2b The beginnings of NoSQL

BigData and Map Reduce VITMAC03

Simplifying the Datacenter with Hyper-Convergence. Bob O Donnell, Founder and Chief Analyst

VM Migration Acceleration over 40GigE Meet SLA & Maximize ROI

Expert Reference Series of White Papers. Understanding Data Centers and Cloud Computing

SpongeFiles: Mitigating Data Skew in MapReduce Using Distributed Memory. Khaled Elmeleegy Turn Inc.

Will it one day be green to shun the internet?

HPE Flexible Capacity

Database Architecture 2 & Storage. Instructor: Matei Zaharia cs245.stanford.edu

Driving Data Warehousing with iomemory

Transcription:

Lecture 20: WSC, Datacenters Topics: warehouse-scale computing and datacenters (Sections 6.1-6.7) 1

Warehouse-Scale Computer (WSC) 100K+ servers in one WSC ~$150M overall cost Requests from millions of users (Google, Facebook, etc.) Cloud Computing: a model where users can rent compute and storage within a WSC, there s an associated service-level agreement (SLA) Datacenter: a collection of WSCs in a single building, possibly belonging to different clients and using different hardware/architecture 2

Workloads Typically, software developed in-house MapReduce, BigTable, etc. MapReduce: embarrassingly parallel operations performed on very large datasets, e.g., search on a keyword, aggregate a count over several documents Hadoop is an open-source implementation of the MapReduce framework; makes it easy for users to write MapReduce programs without worrying about low-level task/data management 3

MapReduce Application-writer provides Map and Reduce functions that operate on key-value pairs Each map function operates on a collection of records; a record is (say) a webpage or a facebook user profile The records are in the file system and scattered across several servers; thousands of map functions are spawned to work on all records in parallel The Reduce function aggregates and sorts the results produced by the Mappers, also performed in parallel 4

MR Framework Duties Replicate data for fault tolerance Detect failed threads and re-start threads Handle variability in thread response times Use of MR within Google has been growing every year: Aug 04 Sep 09 Number of MR jobs has increased 100x+ Data being processed has increased 100x+ Number of servers per job has increased 3x 5

WSC Hierarchy A rack can hold 48 1U servers (1U is 1.75 inches high and is the maximum height for a server unit) A rack switch is used for communication within and out of a rack; an array switch connects an array of racks Latency grows if data is fetched from remote DRAM or disk (300us vs. 0.1us for DRAM and 12ms vs. 10ms for disk ) Bandwidth within a rack is much higher than between arrays; hence, software must be aware of data placement and locality 6

Power Delivery and Efficiency Figure 6.9 Power distribution and where losses occur. Note that the best improvement is 11%. (From Hamilton [2010].) Source: H&P Textbook Copyright 2011, Elsevier Inc. All rights Reserved. 7

PUE Metric and Power Breakdown PUE = Total facility power / IT equipment power It is greater than 1; ranges from 1.33 to 3.03, median of 1.69 The cooling power is roughly half the power used by servers Within a server (circa 2007), the power distribution is as follows: Processors (33%), DRAM memory (30%), Disks (10%), Networking (5%), Miscellaneous (22%) 8

CapEx and OpEx Capital expenditure: infrastructure costs for the building, power delivery, cooling, and servers Operational expenditure: the monthly bill for energy, failures, personnel, etc. CapEx can be amortized into a monthly estimate by assuming that the facilities will last 10 years, server parts will last 3 years, and networking parts will last 4 9

CapEx/OpEx Case Study 8 MW facility : facility cost: $88M, server/networking cost: $79M Monthly expense: $3.8M. Breakdown: Servers 53% (amortized CapEx) Networking 8% (amortized CapEx) Power/cooling infrastructure 20% (amortized CapEx) Other infrastructure 4% (amortized CapEx) Monthly power bill 13% (true OpEx) Monthly personnel salaries 2% (true OpEx) 10

Improving Energy Efficiency An unloaded server dissipates a large amount of power Ideally, we want energy-proportional computing, but in reality, servers are not energy-proportional Can approach energy-proportionality by turning on a few servers that are heavily utilized See figures on next two slides for power/utilization profile of a server and a utilization profile of servers in a WSC 11

Power/Utilization Profile Source: H&P textbook. Copyright 2011, Elsevier Inc. All rights Reserved. 12

Server Utilization Profile Source: H&P textbook. Copyright 2011, Elsevier Inc. All rights Reserved. Figure 6.3 Average CPU utilization of more than 5000 servers during a 6-month period at Google. Servers are rarely completely idle or fully utilized, in-stead operating most of the time at between 10% and 50% of their maximum utilization. (From Figure 1 in Barroso and Hölzle [2007].) The column the third from the right in Figure 6.4 calculates percentages plus or minus 5% to come up with the weightings; thus, 1.2% for the 90% row means that 1.2% of servers were between 85% and 95% utilized. 13

Other Metrics Performance does matter, especially latency An analysis of the Bing search engine shows that if a 200ms delay is introduced in the response, the next click by the user is delayed by 500ms; so a poor response time amplifies the user s non-productivity Reliability (MTTF) and Availability (MTTF/MTTF+MTTR) are very important, given the large scale A server with MTTF of 25 years (amazing!) : 50K servers would lead to 5 server failures a day; Similarly, annual disk failure rate is 2-10% 1 disk failure every hour 14

Title Bullet 15