CS142 Lecture Notes: Datacenters. Mendel Rosenblum

Datacenters Mendel Rosenblum

Evolution of datacenters
- 1960s, 1970s: a few very large time-shared computers
- 1980s, 1990s: heterogeneous collection of lots of smaller machines
- Today and into the future:
  - Datacenter contains large numbers of nearly identical machines
  - Individual applications use thousands of machines simultaneously
  - Companies consider datacenter technology a trade secret
  - No public discussion of the state of the art from industry leaders

Typical specs for a datacenter today
- 15-40 megawatts of power (the limiting factor)
- 50,000-200,000 servers
- $1B construction cost
- Onsite staff (security, administration): ~15
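A quick sanity check on these figures: dividing facility power by server count gives per-server power on the order of a few hundred watts, which is plausible for a 2U server plus its share of cooling and distribution losses. A minimal back-of-envelope sketch, using only the slide's ranges:

```python
# Back-of-envelope: facility power per server, from the slide's ranges.
power_watts = {"low": 15e6, "high": 40e6}   # 15-40 MW facility power
servers = {"low": 50_000, "high": 200_000}  # 50k-200k servers

# Extremes: most servers sharing the least power, and vice versa.
print(power_watts["low"] / servers["high"])   # 75.0  W/server
print(power_watts["high"] / servers["low"])   # 800.0 W/server
```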

Rack
- Typically 19 or 23 inches wide
- Typically 42U tall; a "U" is a Rack Unit, 1.75 inches
- Divided into slots (see next slide)

Rack Slots
- Slots hold power distribution, servers, storage, and networking equipment
- Typical server: 2U, 8-128 cores, 32-512 GB DRAM
- Typical storage: 2U, 30 drives
- Typical network switch: 1U, 72 x 10 Gb ports
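To make the rack arithmetic concrete, here is a minimal sketch. The unit sizes come from the slides above; the all-server fill is a hypothetical, since real racks mix in storage, switches, and power gear.

```python
# Rack arithmetic from the slides: 42U racks, 1U = 1.75 inches.
RACK_UNITS = 42
INCHES_PER_U = 1.75

print(RACK_UNITS * INCHES_PER_U)  # 73.5 inches of usable rack height

# Hypothetical fill: if every slot held a 2U server, a rack tops
# out at 21 servers.
print(RACK_UNITS // 2)  # 21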

Row/Cluster: 30+ racks

Networking - Switch locations
- Top-of-rack switch
  - Effectively a crossbar connecting the machines in a rack
  - Multiple links going up to end-of-row routers
- End-of-row router
  - Aggregates a row of machines
  - Multiple links going up to core routers
- Core routers
  - Multiple core routers
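As a rough illustration of how this hierarchy multiplies out, the sketch below counts the servers reachable under one core router. The fan-out numbers are assumptions chosen for illustration (only the 30-racks-per-row figure comes from the slides), not figures from the lecture.

```python
# Hypothetical fan-out through the switch hierarchy described above.
servers_per_rack = 20   # assumed: ~2U servers in a 42U rack
racks_per_row = 30      # "Row/Cluster: 30+ racks"
rows_per_core = 10      # assumed rows aggregated per core router

servers_per_row = servers_per_rack * racks_per_row   # 600
servers_per_core = servers_per_row * rows_per_core   # 6,000
print(servers_per_row, servers_per_core)
```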

Multipath routing

Ideal: "full bisection bandwidth" Would like network like cross-bar Everyone has a private channel to everyone else In practice today: some oversubscription (can be as high as 100x) Assumes applications have locality to rack or row) but this is hard to achieve in practice. Some problem fundamental: Two machines transferring to the same machine Consider where to place: Web Servers Memcache server Database servers - Near storage slots Current approach: Spread things out

Power Usage Effectiveness (PUE)
- Early datacenters were built with off-the-shelf components: standard servers, HVAC unit designs from malls
- Inefficient: early datacenters had a PUE of 1.7-2.0
- PUE = Total Facility Power / Server and Network Power
- Best published number (Facebook): 1.07 (no air conditioning!)
- Power is about 25% of monthly operating cost
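For instance, a sketch of the PUE arithmetic. The facility numbers are hypothetical, chosen to land on the PUE values quoted above:

```python
def pue(total_facility_watts: float, it_watts: float) -> float:
    """Power Usage Effectiveness: total facility power over the power
    that actually reaches the servers and network gear."""
    return total_facility_watts / it_watts

# Hypothetical facility: 20 MW total draw, 12 MW reaching IT equipment.
print(pue(20e6, 12e6))    # ~1.67, like an early datacenter
# An efficient facility loses far less to cooling and distribution:
print(pue(20e6, 18.7e6))  # ~1.07, the best published (Facebook) figure
```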

Energy Efficient Data Centers
- Better power distribution: fewer transformers
- Better cooling: use the environment (air/water) rather than air conditioning
  - Bring in outside air
  - Evaporate some water
- Hot/cold aisles: IT equipment is OK up to about +115 degrees F; needs containment

Backup Power
- Massive numbers of batteries to tolerate short glitches in power
  - Just need to last long enough for the backup generators to start up
- Massive collections of backup generators
- Huge fuel tanks to supply the generators
- Fuel replenishment transportation network (e.g., fuel trucks)
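A rough sketch of why the batteries only need to cover seconds to a minute. The load is the mid-range of the earlier specs slide; the generator start time is an assumption, not a figure from the lecture:

```python
# How much energy must batteries supply while generators spin up?
facility_load_watts = 30e6  # mid-range of the 15-40 MW spec
generator_start_s = 60      # assumed worst-case generator start time

bridge_energy_j = facility_load_watts * generator_start_s
print(bridge_energy_j / 3.6e6)  # 500.0 -> ~500 kWh to bridge 1 minute
```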

Fault Tolerance
- At the scale of new datacenters, things are breaking constantly
- Every aspect of the datacenter must be able to tolerate failures
- Solution: redundancy
  - Multiple independent copies of all data
  - Multiple independent network connections
  - Multiple copies of every service
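To see why independent copies help, here is a minimal sketch of the standard replication argument. The per-copy failure probability is a made-up illustration, and it assumes failures are independent, which the failure list on the next slide shows is only approximately true (whole racks and PDUs fail together).

```python
# If each copy fails independently with probability p, all n copies
# are down together with probability p**n. Illustrative numbers only.
p = 0.01  # assumed chance one copy is unavailable at a given moment

for n in (1, 2, 3):
    print(n, p ** n)  # 1e-2, 1e-4, 1e-6: each copy buys ~100x
```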

Failures in the first year of a new datacenter (Jeff Dean)
- ~thousands of hard drive failures
- ~1000 individual machine failures
- ~dozens of minor 30-second blips for DNS
- ~3 router failures (have to immediately pull traffic for an hour)
- ~12 router reloads (takes out DNS and external VIPs for a couple of minutes)
- ~8 network maintenances (4 might cause ~30-minute random connectivity losses)
- ~5 racks go wonky (40-80 machines see 50% packet loss)
- ~20 rack failures (40-80 machines instantly disappear, 1-6 hours to get back)
- ~1 network rewiring (rolling ~5% of machines down over a 2-day span)
- ~1 rack move (plenty of warning; ~500-1000 machines powered down, ~6 hours)
- ~1 PDU failure (~500-1000 machines suddenly disappear, ~6 hours to come back)
- ~0.5 overheating events (power down most machines in <5 minutes, ~1-2 days to recover)
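Read as rates, these numbers say failure is the common case. A quick sketch; the fleet size is an assumption taken from the earlier specs slide:

```python
# Turn the annual machine-failure count into day-to-day expectations.
machine_failures_per_year = 1000

print(machine_failures_per_year / 365.0)  # ~2.7 machines die per day

# With an assumed fleet of 100,000 servers, any one machine has only
# ~1% annual failure odds, yet something is always broken somewhere.
print(machine_failures_per_year / 100_000)  # 0.01
```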

Choosing a datacenter location - drivers
- Plentiful, inexpensive electricity (examples - Oregon: hydroelectric; Iowa: wind)
- Good network connections: access to the Internet backbone
- Inexpensive land
- Geographically near users: speed-of-light latency
- Country laws (e.g., "Our citizens' data must be kept in our country.")
- Available labor pool
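The speed-of-light point is easy to quantify. A minimal sketch; the distances are illustrative, and ~2/3 of c is the usual rule of thumb for signal speed in fiber:

```python
# Round-trip time imposed by physics alone, before any queuing delay.
C_FIBER_KM_S = 200_000  # ~2/3 the speed of light, typical for fiber

def min_rtt_ms(distance_km: float) -> float:
    return 2 * distance_km / C_FIBER_KM_S * 1000

print(min_rtt_ms(50))    # nearby datacenter:  ~0.5 ms
print(min_rtt_ms(4000))  # cross-country (US): ~40 ms
print(min_rtt_ms(9000))  # transoceanic:       ~90 ms
```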

Google Data Centers

Google Data Center - Council Bluffs, Iowa, USA

Google datacenter pictures: Council Bluffs