Hardware design for Cloud-scale datacenters
USENIX LISA14
Public Cloud Disaster Recovery / Business Continuity
5.8+ billion worldwide queries each month
250+ million active users
400+ million active accounts
2.4+ million emails per day
8.6+ trillion objects in Microsoft Azure storage
48+ million users in 41 markets
50+ million active users
1 in 4 enterprise customers
50+ billion minutes of connections handled each month
200+ Cloud Services
1+ billion customers
20+ million businesses
90+ markets worldwide
| Design | <10K SMB/Enterprise | 100K Hosters | 1M Cloud-Scale |
|---|---|---|---|
| # SKUs | Several | Limited | Extremely limited |
| Redundancy model | Hardware based (Hot-*) | Software based (Local datacenter) | Software based (Geo-distributed) |
| HW availability | 99.999% or higher | 99.9% - 99.999% | 99% - 99.9% |
| HW type | Enterprise SKU | Off-the-shelf design, custom integration | Custom designs, custom integration |
| Infrastructure co-design | None | Limited integration with Datacenter and Network | OS, Datacenter, Server and Network tightly integrated |
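The HW availability figures above can be turned into rough numbers. The sketch below converts an availability fraction into expected downtime per year, then estimates the availability of a service that stays up as long as any one of several independent copies is up. The replica count and the independence assumption are illustrative and do not come from the slides.

```python
# Illustrative availability arithmetic.
# Assumptions (not from the slides): failures are independent, and the
# service is "up" whenever at least one replica is up.

MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes_per_year(availability: float) -> float:
    """Expected unavailable minutes per year for an availability fraction."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def replicated_availability(single: float, replicas: int) -> float:
    """Availability when any one of `replicas` independent copies suffices."""
    return 1.0 - (1.0 - single) ** replicas


if __name__ == "__main__":
    for label, a in [("99.999%", 0.99999), ("99.9%", 0.999), ("99%", 0.99)]:
        print(f"{label:>8}: {downtime_minutes_per_year(a):9.1f} min/year")
    # Hypothetical example: three software-managed copies on 99% hardware.
    print(f"3 x 99% replicas: {replicated_availability(0.99, 3):.6f}")
```

Under these assumptions, software-level replication on lower-availability hardware can match or exceed the availability of a single highly redundant server, which is the trade the cloud-scale column makes.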
| Operations | <10K SMB/Enterprise | 100K Hosters | 1M Cloud-Scale |
|---|---|---|---|
| Break/fix support | 24 hours x 7 days | 8 hours x 5 days | Up to 1-2 weeks |
| Issue triage model | IT admin | Some automation, Admin support | Fully automated, Machine learning |
| OOB HW management | Full command set, BMC required | Basic feature set, BMC required | Power On/Off only, No BMC |
| Management domain scale | 100s of servers | 1000s of servers | 10s of 1000s of servers |
| FRU granularity | Hot-swappable components | Component replacement | Entire server replacement |
Figure: two Storage Stamps, each containing a Partition Layer and a Stream Layer with Intra-Stamp Replication.
Figure labels: Commit operation, Write, Erasure Coding operations
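The slide names erasure coding without saying which code is used. As a minimal illustration of the idea only, the sketch below uses a single XOR parity fragment, which is far simpler than the codes used in production storage systems. It splits data into k fragments plus one parity fragment and rebuilds any one lost fragment. All function names are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal fragments (zero padded) plus one XOR parity."""
    frag_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(frag_len * k, b"\0")
    frags = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    parity = reduce(xor_bytes, frags)
    return frags + [parity]


def recover(frags: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing fragment (None) by XOR-ing the others."""
    missing = [i for i, f in enumerate(frags) if f is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one lost fragment")
    repaired = list(frags)
    if missing:
        present = [f for f in frags if f is not None]
        repaired[missing[0]] = reduce(xor_bytes, present)
    return repaired


if __name__ == "__main__":
    data = b"cloud-scale storage"
    stored = encode(data, k=4)
    stored[2] = None  # simulate losing one fragment
    restored = recover(stored)
    assert b"".join(restored[:4])[:len(data)] == data
```

With k data fragments and one parity fragment the storage overhead is (k+1)/k, compared with 3x for triple replication, at the cost of extra reads during repair.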
Query distribution
Figure: a query fans out to index units 1 through n; the partitions are labeled 11 through nm (n index units, m partitions each).
Query performance is measured as an aggregate of ALL compute nodes.
Source: Web search using mobile cores, ISCA 2010
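One way to read "an aggregate of ALL compute nodes" is that a fanned-out query finishes only when its slowest partition answers. The sketch below models that reading; the latency values, the straggler probability, and the function names are assumptions for illustration, not measurements from the talk.

```python
import random
from typing import List


def query_latency_ms(partition_latencies_ms: List[float]) -> float:
    """A fanned-out query completes when every partition has answered,
    so its latency is the maximum of the per-partition latencies."""
    return max(partition_latencies_ms)


def simulate(num_partitions: int, trials: int = 2000, seed: int = 0) -> float:
    """Mean query latency when each partition normally takes 10 ms but has
    a 1% chance of being a 100 ms straggler (both numbers are assumed)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        latencies = [100.0 if rng.random() < 0.01 else 10.0
                     for _ in range(num_partitions)]
        total += query_latency_ms(latencies)
    return total / trials


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(f"{n:4d} partitions: mean latency {simulate(n):.1f} ms")
```

In this model a rare slow node hurts more as the fan-out grows, which is why uniform per-node behavior matters at this scale.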
Performance, Customization, Uniformity, Power, Agility, Cost, Reliability, Simplicity
Architecture should adapt to a variety of cloud workloads
Support for global datacenter operating environments (standards: CISPR, ANSI, IEC; UL, IEC, CSA)
Design Principles
Standardization & Modularization
Design Simplicity
Operations Excellence
Open CloudServer (OCS) design
Open Source Code: Chassis management; Operations Toolkit
Specifications: Chassis, Blade, Mezzanines; Management APIs; Certification Requirements
Mechanical CAD Models: Chassis, Blade, Mezzanines
Board Files & Gerbers: Power Distribution Backplane; Tray Backplane
http://www.opencompute.org/wiki/server/specsanddesigns
12U Shared Chassis
EIA Rack Mountable
Shared infrastructure for efficiency and TCO optimization
Diagram labels: Shared management, Shared power, Signal backplane, Compute blade, Shared fans, JBOD expansion
Blind-mated signal connectivity
Simplified installation and repair
Cable free design for significantly fewer operator errors during servicing
Reduces need for cabling reseats
Signal Backplane: Blind-mated connectors (12V Power, Ethernet, SAS, Management)
Chart (Network Repairs 1): values 75% and 25%; legend H/W Replaced, Reseated
HDDs are #1 failure item
AFR increases with temperature [1]
Simplified fan control cools HDDs
HDDs in front of hot motherboard
Closed loop fan moderates temperatures
[1] DSN 2011: Impact of Temperature on Hard Disk Drive Reliability in Large Datacenters
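A closed fan loop can be sketched as a proportional controller that raises fan duty as the measured HDD temperature rises above a setpoint. The setpoint, gain, PWM limits, and the toy thermal model below are all assumptions chosen for illustration. They are not values from the OCS design.

```python
def fan_pwm_percent(hdd_temp_c: float,
                    setpoint_c: float = 40.0,
                    gain_pct_per_c: float = 8.0,
                    min_pwm: float = 20.0,
                    max_pwm: float = 100.0) -> float:
    """Proportional control: more fan duty as temperature exceeds the setpoint.
    All default values are hypothetical."""
    error = hdd_temp_c - setpoint_c
    pwm = min_pwm + gain_pct_per_c * max(error, 0.0)
    return max(min_pwm, min(max_pwm, pwm))


def simulate(steps: int = 60, ambient_c: float = 30.0,
             heat_c: float = 25.0) -> float:
    """Toy thermal model (assumed): full fan speed removes 80% of the heat
    rise, and temperature moves 30% of the way to its target each step."""
    temp = ambient_c + heat_c  # start hot, as if the fans were off
    for _ in range(steps):
        pwm = fan_pwm_percent(temp)                 # sense, then actuate
        target = ambient_c + heat_c * (1.0 - 0.8 * pwm / 100.0)
        temp += 0.3 * (target - temp)               # plant response
    return temp


if __name__ == "__main__":
    print(f"settled HDD temperature: {simulate():.1f} C")
```

The point of the sketch is the feedback path: the fan command depends on the measured temperature, so the loop settles between the setpoint and the temperature reached at minimum fan speed.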
Chassis Manager (CM)
Secure OOB management
Low-cost embedded x86 SoC
REST API for machine management
CLI interface for human operations
Hard-wired management
On/Off to blade power cut-off circuit
IPMI-over-serial out of band communication
Fan and PSU control and monitoring
Remote switch and CM power control
Diagram labels: Blade Address, 1GbE (x2), X86 SoC, COM1 to COM6, 6 PSU, RS232, Serial to/from blades, Remote Power Control, Fans, PDB ON/OFF, TX/RX, CPLD, Serial Multiplexer (x2), Serial to I2C (x2), Blade Enable ON/OFF, PMBUS, PWM, GPIO, I2C Mux, Fan Control
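To show what "REST API for machine management" might look like from an automation script, the sketch below only builds a request URL and sends nothing. The command name, the parameter name, the host, and the default port are placeholders invented for this example. The actual commands are defined in the OCS Chassis Manager specification.

```python
from urllib.parse import urlencode, urlunsplit


def cm_request_url(host: str, command: str, port: int = 8000, **params) -> str:
    """Build an HTTPS URL for a Chassis Manager style REST call.

    `command`, `params`, and the default port are hypothetical placeholders,
    not names taken from the OCS specification.
    """
    query = urlencode(sorted(params.items()))
    return urlunsplit(("https", f"{host}:{port}", f"/{command}", query, ""))


if __name__ == "__main__":
    # Hypothetical call: ask the CM for the state of blade 3.
    print(cm_request_url("cm.example", "ExampleGetBladeState", bladeId=3))
```

A machine-facing API of this shape lets fleet automation drive thousands of chassis the same way, while the CLI stays available to a technician at a single chassis.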
Security at all layers: Hardware, UEFI, APIs, User Management
Trusted Platform Module v1.2: Blades and Chassis manager
UEFI Firmware v2.3.2: Secure BIOS and Boot
Chassis manager interfaces: TLS (SSL) and IPsec for communication encryption
User Management: Active Directory integration and authentication; Role Based Management
Diagram labels: TLS/SSL, UEFI 2.3.2, TPM, IPsec, Active Directory Integration
BMC-Lite
IPMI basic mode over Serial
I2C Master (SDR)
UART I/O
System Event Log
Power Control
KVM, Video drivers
Ethernet, Network Stack or SOL
USB
Full IPMI Command Set
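For readers unfamiliar with IPMI basic mode over serial, the sketch below shows its byte framing: a start character, a stop character, and escaping of special bytes inside the message. The constants are written from my recollection of the IPMI v2.0 specification's serial Basic Mode section and should be checked against the spec before use. The function names are hypothetical, and this is not code from the OCS Chassis Manager.

```python
# Basic Mode framing characters (as recalled from IPMI v2.0; verify in spec).
START, STOP, HANDSHAKE, ESCAPE = 0xA0, 0xA5, 0xA6, 0xAA
# Special data byte -> byte sent after the escape character.
ESCAPED = {0xA0: 0xB0, 0xA5: 0xB5, 0xA6: 0xB6, 0xAA: 0xBA, 0x1B: 0x3B}
UNESCAPED = {v: k for k, v in ESCAPED.items()}


def checksum(data: bytes) -> int:
    """IPMI 2's-complement checksum: (sum(data) + checksum) % 256 == 0."""
    return (-sum(data)) & 0xFF


def frame(payload: bytes) -> bytes:
    """Wrap a message in start/stop characters, escaping special bytes."""
    out = bytearray([START])
    for b in payload:
        if b in ESCAPED:
            out += bytes([ESCAPE, ESCAPED[b]])
        else:
            out.append(b)
    out.append(STOP)
    return bytes(out)


def unframe(packet: bytes) -> bytes:
    """Strip framing and undo escaping; raise ValueError on malformed input."""
    if len(packet) < 2 or packet[0] != START or packet[-1] != STOP:
        raise ValueError("missing start or stop character")
    out = bytearray()
    body = iter(packet[1:-1])
    for b in body:
        if b == ESCAPE:
            nxt = next(body, None)
            if nxt not in UNESCAPED:
                raise ValueError("bad escape sequence")
            out.append(UNESCAPED[nxt])
        else:
            out.append(b)
    return bytes(out)


if __name__ == "__main__":
    msg = bytes([0x20, 0xA0, 0x1B])
    assert unframe(frame(msg)) == msg
```

Because the framing needs only a UART, a blade can be managed over the serial lines shown on the Chassis Manager slide without a network stack on the management controller.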
Targeted for deployment and production support
Features
http://github.com/msopentech/ocsoperationstoolkit
Identify defective components by physical location
Summarize data for quick repairs
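The idea of grouping faults by physical location can be sketched as follows. The record fields, location labels, and sample data are hypothetical. This is not how the OCS Operations Toolkit is implemented; it only illustrates the kind of summary a repair technician could act on.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Fault:
    component: str  # hypothetical field, e.g. "DIMM" or "HDD"
    location: str   # hypothetical physical label, e.g. "DIMM A1"
    detail: str


def summarize(faults: Iterable[Fault]) -> List[Tuple[str, int]]:
    """Count faults per physical location, most frequent first."""
    counts = Counter(f.location for f in faults)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    sample = [
        Fault("DIMM", "DIMM A1", "correctable ECC error"),
        Fault("HDD", "Disk 3", "read error"),
        Fault("DIMM", "DIMM A1", "correctable ECC error"),
    ]
    for location, count in summarize(sample):
        print(f"{location}: {count} fault(s)")
```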
View configuration command: View-WcsConfig
View-Disk, View-Dimm, View-Nic, View-Fru, etc.
Check, clear, and log the Windows System Event Log and BMC SEL
View contents of BMC SEL
Commands
Example: Update-WcsConfig command
Q & A