Building warehouse-scale computers, or what's it like to supply exponential growth. John Wilkes

Transcription:

Building warehouse-scale computers, or what's it like to supply exponential growth
John Wilkes, 2018-10

Video: https://youtu.be/m7uig8qfgmi

You're all... Some of you are thinking too small

"Scale has been the single most important force driving changes in system software over the last decade."
John Ousterhout, "Technical perspective: Is scale your enemy, or is scale your friend?", CACM 54(7):110, July 2011.

The kind of things we like to be able to do

Here's the system!
Image: http://processors.wiki.ti.com/index.php/file:compulab_cm-t43.png
(The Dalles, Oregon)

What if the system included not just the computers, but the network fabric and the WAN endpoints
(Council Bluffs, Iowa)

and the cooling system

and the building-management system
(St. Ghislain, Belgium)

and the power system and the technicians
(Backup generator at St. Ghislain, Belgium)

In fact, the entire warehouse-scale computer

By the way, it costs O($200M). No pressure.

Key challenges: the fleet
- Add capacity optimally
- Keep capacity optimized

Planning for compute resources
All have multiple generations/options:
- CPU
- RAM
- disk, SSD: capacity, performance
- new NVRAM thingies...
- accelerators (complication: a wide variety)
- inter- and intra-datacenter networking
- power
- datacenters
- land, water, sewage

Response times are weeks/months/years, not milliseconds
http://googleasiapacific.blogspot.se/2015/06/growing-our-data-center-in-singapore.ht

Planning for network
- Google network: 12 years to build and still growing
- Edge points of presence (>100)

How long does it take to build capacity? Note: variance is as bad as delay
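One way to read "variance is as bad as delay" is sketched below. It assumes, purely for illustration, that delivery times are normally distributed, and asks how far ahead an order must be placed for capacity to land on time with a chosen probability. The function name and every number here are hypothetical, not taken from the talk.

```python
from statistics import NormalDist


def planning_lead_weeks(mean_weeks: float, sd_weeks: float,
                        confidence: float = 0.95) -> float:
    """Weeks ahead to order so delivery is on time with probability `confidence`.

    Assumption (not from the talk): delivery time ~ Normal(mean_weeks, sd_weeks).
    """
    z = NormalDist().inv_cdf(confidence)  # one-sided quantile of the standard normal
    return mean_weeks + z * sd_weeks


# A slow but predictable supplier versus a fast but erratic one (made-up numbers).
predictable = planning_lead_weeks(mean_weeks=20, sd_weeks=1)
erratic = planning_lead_weeks(mean_weeks=12, sd_weeks=6)
print(f"predictable: {predictable:.1f} weeks, erratic: {erratic:.1f} weeks")
```

Under these assumptions the erratic supplier needs orders placed slightly earlier than the predictable one, even though its average delivery is eight weeks faster: each week of standard deviation costs about as much planning horizon as 1.6 weeks of fixed delay at 95% confidence.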

Planning for compute resources
What kind of machine to buy?
- different groups want different things (e.g., search, Cloud)
- MotD + customizations
Idea: reuse machines when they get handed down
- when is it worthwhile? (price of power? room for expansion?)
Also: do we have space? power? networking? budget?
- models; what-if analyses; uncertainty (demand, supply)
- objective function: total cost-of-ownership

Planning for compute resources: Total Cost of Ownership (TCO)
What do you mean by "total"?
- average? over what? continent? time? depreciation schedule?
What do you mean by "cost"?
- initial purchase price, or the average over time?
- is delivery or installation included?
- what if they are being reused?
What do you mean by "ownership"?
- machines can be transferred/given/sold/break
- who owns a machine running a shared service for a customer?
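To make those questions concrete, here is a minimal TCO sketch. It is not Google's model: the cost terms, the `Machine` fields, the `include_install` switch, and all prices are assumptions chosen only to show how the answer moves with the definition.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 24 * 365


@dataclass
class Machine:
    purchase_usd: float      # 0 for a handed-down (reused) machine
    install_usd: float       # delivery + installation
    avg_power_watts: float   # average draw at the wall
    lifetime_years: float    # assumed service life


def tco_usd(m: Machine, usd_per_kwh: float, pue: float,
            include_install: bool = True) -> float:
    """Purchase + (optional) install + energy, with facility overhead via PUE."""
    energy_kwh = m.avg_power_watts / 1000 * HOURS_PER_YEAR * m.lifetime_years * pue
    install = m.install_usd if include_install else 0.0
    return m.purchase_usd + install + energy_kwh * usd_per_kwh


# Hypothetical numbers: a new machine versus an older, hungrier hand-me-down.
new = Machine(purchase_usd=5000, install_usd=200, avg_power_watts=300, lifetime_years=4)
reused = Machine(purchase_usd=0, install_usd=200, avg_power_watts=450, lifetime_years=2)

for name, m in (("new", new), ("reused", reused)):
    total = tco_usd(m, usd_per_kwh=0.07, pue=1.1)
    print(f"{name}: ${total:,.0f} total, ${total / m.lifetime_years:,.0f} per year")
```

Even in this toy version, "total" and "per year" give different pictures, and whether reuse is worthwhile depends on the price of power, which is the question the slide raises.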

A simplified overview
[Diagram labels: demand forecast; time; sites, data centers, power; network, machines, storage; usage; $$ prices; compute/storage capacity]

In a world of exponential demand growth
Source: "Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google's Datacenter Network." SIGCOMM 2015.


In a world of exponential demand growth
"Prediction is very difficult, especially about the future." Mark Twain(?) via Niels Bohr
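A small sketch of why exponential demand growth makes long lead times painful: capacity ordered today has to match demand on the day it arrives, not today's demand. The 12-month doubling time and the planning horizons below are illustrative assumptions, not figures from the talk.

```python
def demand_after(current: float, doubling_months: float, months_ahead: float) -> float:
    """Projected demand after `months_ahead`, assuming steady exponential growth."""
    return current * 2 ** (months_ahead / doubling_months)


# Hypothetical: demand of 100 units today, doubling every 12 months.
for horizon in (6, 18, 24):
    needed = demand_after(100, doubling_months=12, months_ahead=horizon)
    print(f"capacity arriving in {horizon:>2} months must cover {needed:.0f} units")
```

Under this assumption, something with a two-year lead time has to be sized for four times today's demand, and a small error in the assumed doubling time compounds over the same period.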

A few factors affecting forecasts (over time)
- user growth
- new features
- better software
- better hardware (e.g., accelerators)

How much capacity do we need? And when do you need to know?
[Diagram labels along a time axis: PA forecasts; more forecasting magic; firm demand; aggregate quantities; detailed orders]

Putting it all together
Timescale axis: 1 second, 1 minute, 1 hour, 1 day, 1 week, 1 month, 1 year, 2 years, 5 years
Forecasts of resource demand along that axis:
- Borg resource reclamation
- online workload prediction
- short-term PA demand forecasts
- mid-tiers
- long-term fleet forecasts

It takes a few moving parts...

Cloud Datastore transactions/s

Planning for power
PUE (Power Usage Effectiveness) = total Watts / compute Watts
- smaller is better
- industry average ~1.7
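The PUE definition above is simple enough to write down directly. The sketch below computes PUE and the non-compute overhead it implies; the 10 MW load and the 1.1 comparison value are hypothetical, and only the ~1.7 industry average comes from the slide.

```python
def pue(total_watts: float, compute_watts: float) -> float:
    """Power Usage Effectiveness: total facility power over compute power."""
    if compute_watts <= 0:
        raise ValueError("compute_watts must be positive")
    return total_watts / compute_watts


def overhead_watts(compute_watts: float, pue_value: float) -> float:
    """Power spent on cooling, power conversion, etc., for a given compute load."""
    return compute_watts * (pue_value - 1)


compute = 10_000_000  # hypothetical 10 MW of compute load
for label, value in (("industry average", 1.7), ("hypothetical efficient site", 1.1)):
    print(f"{label}: PUE {value} -> {overhead_watts(compute, value) / 1e6:.1f} MW overhead")
```

A PUE of 1.0 would mean every watt goes to computing, so the value above 1.0 is the facility's overhead per watt of compute.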

Planning for power
[Chart: PUE, from high to low, with ML control on vs. ML control off]
https://www.google.com/about/datacenters/efficiency/internal/

Planning for power
Some of the challenges:
- Narrow range of experience
- Exploration may discover unsafe states
- Inputs out of our control (e.g., weather)
- Control system reliability/availability
- Agility (new hardware)
- Reinforcement learning with long delays between action and change in system state
- Safety. Safety. Safety.
[Figure labels: counts; setpoint 1; setpoint 2. Graphics by Steve Webster]

We're set to reach 100% renewable energy in 2017

Meanwhile, what's up with Moore's law?
Single-core performance plateauing after decades of exponential growth
Graph from "40 Years of Microprocessor Trend Data", Karl Rupp, CC-BY 4.0.

Meanwhile, what's up with Moore's law?
Tensor Processing Units

Meanwhile, what's up with Moore's law?
Tensor Processing Unit pods...

Putting it all together
[Diagram labels: Operations; Delivery + fulfillment; Supply; Usage; Monitor + report; Supply requests (orders); Demand forecasting; Demand; Planning]


Putting it all together: a few more details
[Diagram labels: Operations; Fulfillment; Supply (current + future); Monitoring; Usage data; Customer demand system; Monitor + report; Supply planning; Ordering; Historic-demand system; Demand forecast system; Policies]
[Layers shown: High-level services + lower-level stuff; Borg, Colossus + lower-level stuff; Machines, racks,...; Datacenters, power, parts]


Putting it all together: one Google-wide supply
[Diagram labels: 0-6 months: machines; Product Areas (PAs); each PA does these; Policies]

Putting it all together: multiple timelines
- 0-6 months: machines (Product Areas (PAs), Policies, supply forecast)
- 12-18 months: parts (Policies, supply forecast)
- 24+ months: power, data centers (Policies)

Image by Connie Zhou

2018 Q1 CapEx = $5.3B (+$2.4B for an office building in New York)
$29.4B 3-year trailing CapEx, as of March 2017
Source: Alphabet SEC filing

Final thoughts:
- There's a lot of technology behind the cloud
- At scale, efficiency really matters
[and: we're hiring!]
johnwilkes@google.com
Images by Connie Zhou