Why Scale Matters & How the Cloud Really is Different James Hamilton, AWS VP & Distinguished Engineer SPOT205: November 23, 2013 © 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
Agenda Redefining Scale at AWS AWS Designed Hardware & Infrastructure Multi-AZ Design Point & Why it Works
Perspective on Scaling On average, AWS adds enough new server capacity every day to support Amazon's global infrastructure when it was a $7B business (2004).
AWS Global Infrastructure 9 regions 25 availability zones 42 edge locations
Amazon S3 Growth Total Number of S3 Objects: Q4 2006 2.9 Billion, Q4 2007 14 Billion, Q4 2008 40 Billion, Q4 2009 102 Billion, Q4 2010 262 Billion, Q4 2011 762 Billion, Q4 2012 >1.7 Trillion, Q4 2013 >3 Trillion. Peak requests: 1.5M/sec; 2,000,000+ per second
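To put the S3 growth chart in one number, the sketch below computes the average yearly growth factor between its two endpoints. It assumes the chart's values pair with the quarter labels in ascending order (2.9 Billion at Q4 2006, >3 Trillion at Q4 2013) and treats ">3 Trillion" as a lower bound. The function name is illustrative only.

```python
def annual_growth_factor(start: float, end: float, years: int) -> float:
    """Compound annual growth factor that takes `start` to `end` in `years` years."""
    if start <= 0 or end <= 0 or years <= 0:
        raise ValueError("start, end, and years must all be positive")
    return (end / start) ** (1.0 / years)


# Endpoints as read from the chart (assumed pairing of values to quarters).
objects_q4_2006 = 2.9e9    # 2.9 Billion objects
objects_q4_2013 = 3.0e12   # ">3 Trillion" objects, used as a lower bound

factor = annual_growth_factor(objects_q4_2006, objects_q4_2013, years=7)
print(f"Average growth: about {factor:.1f}x per year over 7 years")
```

Because the final value is a lower bound, the computed factor is a floor on the average growth rate, not an exact figure.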
DynamoDB Requests Served/Month
DynamoDB: Consistent Performance at Scale
AWS is the overwhelming market share leader, with more than five times the compute capacity in use than the aggregate total of the other fourteen providers.
Agenda Redefining Scale at AWS AWS Designed Hardware & Infrastructure Multi-AZ Design Point & Why it Works
Pace of Innovation Infrastructure pace of innovation increasing Driven by cloud service providers and high-scale internet applications Cost of datacenter and H/W infrastructure dominates Infrastructure more than just a cost center High focus on innovation Driving down cost Increasing aggregate reliability Reducing resource consumption footprint
AWS Custom Server Designs OEM Server Ecosystem Optimized for 10s to 100s of thousands of customers Broadly applicable servers can run a variety of workloads Cloud Server Ecosystem Optimized for single customer Highly specialized servers optimized for specific workload Large scale deployments allow hardware specialization Move hot s/w kernels to hardware implementations Datacenters, servers, networking, & storage designed to an integrated spec.
AWS Custom Storage Designs Commercial high-density storage: Quanta M4600H 4U Disk Enclosure Impressive best in class general purpose design We use custom design with still higher density OEM storage & servers must target vast workload diversity High scale supports AWS-specific optimizations More space, power, & cost efficient
Networking Equipment Relative cost of networking increasing quickly Profit margins high Ecosystem vertically integrated Monthly Costs 8% 3 year server & 10 year infrastructure amortization
Get the Network Out of the Way Mainframe Model Goes Commodity Amazon custom routers & protocol stacks Current Networks Over-Subscribed Forces workload placement restrictions Goal: Make all points in datacenter equidistant
Power Infrastructure Negotiated power purchasing agreements AWS custom high-voltage sub-stations in some regions Lower power cost Build faster
Super Bowl Power Outage 34 minute outage that very nearly changed the 2013 game "A piece of equipment that was designed to monitor electrical load sensed an abnormality in the system. The equipment operated as designed and opened a breaker that partially cut power to the Superdome in order to isolate the issue. Backup generators kicked in immediately as designed." Lights without immediate backup power Restarting gas discharge lights takes 15+ min Highly likely backup power wouldn't have helped Switchgear lockout We design & deploy custom switch firmware
Carbon Neutral Power Choice Most companies rarely build new datacenters so there are few new power procurement options The entire multi-datacenter US-WEST (Oregon) is 100% carbon neutral One of the largest AWS regions world-wide And, by far, the fastest growing
Procurement & Supply Chain Optimization Procurement Global demand allows purchasing power at volume Direct component purchasing Precise inventory control Better pricing Optimized designs Supply Chain Demand-driven supply chain Shorter cycle time drives higher utilization Predicting next week easier than 4 to 6 months out Less overbuy & less capacity risk yielding lower costs
Utilization & Economics Server Utilization Problem On-premises 30% utilization VERY good & 10% to 20% more common Solution: Pool a number of heterogeneous services Pay as You Go Pay as You Grow Don't block the business Don't over-buy Transfers capital expense to variable expense Apply capital for business investments rather than infrastructure Chargeback Models Drive Good Behavior Cost encourages prioritization of work by application developers High scale needed to make a spot market for low priority work
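A quick way to see why utilization dominates the economics: the cost of each useful server-hour is the all-in hourly cost divided by utilization. The sketch below uses the utilization levels from the slide (10%, 20%, 30%). The $1.00 hourly cost and the 60% pooled-utilization level are hypothetical values chosen only for illustration.

```python
def cost_per_useful_hour(hourly_cost: float, utilization: float) -> float:
    """Effective cost of one fully used server-hour at a given utilization."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return hourly_cost / utilization


HOURLY_COST = 1.00  # hypothetical all-in cost per server-hour, in dollars

# 10% to 30% come from the slide; 60% is an illustrative pooled level.
for utilization in (0.10, 0.20, 0.30, 0.60):
    cost = cost_per_useful_hour(HOURLY_COST, utilization)
    print(f"{utilization:.0%} utilized -> ${cost:.2f} per useful server-hour")
```

Under these assumptions, moving from 10% to 30% utilization cuts the cost of useful work by a factor of three, which is the effect that pooling heterogeneous workloads aims for.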
Amazon Cycle of Innovation [Cycle diagram labels: Listen to Customers, Innovate, Re-invest in Features, Improve Processes, Lower Costs, Reduce Prices] 15+ years of operational excellence 38 AWS price reductions since 2006
AWS Pace of Innovation New Service Announcements & Updates 235
Agenda Redefining Scale at AWS AWS Designed Hardware & Infrastructure Multi-AZ Design Point & Why it Works
Conventional Design: Cross-Region Replication 5th app availability 9 (99.999%) only via multi-datacenter replication Conventional approach: Two datacenters in distant locations Replicate all data to both datacenters The industry-wide dominant multi-dc availability approach Looks rock solid but performs remarkably poorly in practice Acid Test: Are you willing to pull the plug on the primary server?
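For a sense of what the fifth 9 demands, the sketch below converts a number of nines into allowed downtime per year. It uses a 365.25-day year, which is an assumption of this example and not something the slide specifies.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # assumes an average 365.25-day year


def downtime_minutes_per_year(nines: int) -> float:
    """Allowed downtime per year for an availability of `nines` nines."""
    if nines < 1:
        raise ValueError("nines must be at least 1")
    unavailability = 10.0 ** (-nines)  # e.g. 5 nines -> 0.00001
    return MINUTES_PER_YEAR * unavailability


for n in (3, 4, 5):
    print(f"{n} nines -> {downtime_minutes_per_year(n):.1f} minutes of downtime per year")
```

At five nines the yearly budget is only a few minutes, so a single slow or botched failover decision can consume it entirely.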
What is wrong with inter-regional replication? Asynchronous replication between datacenters Committing to an SSD: on the order of 1 to 2 msec LA to New York: 74 msec round trip On failure, a difficult & high skill decision: Fail-over & lose transactions, or Don't fail-over & lose availability I've been on these calls in the past No win situation Very hard to get right
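The latency figures explain why cross-country replication ends up asynchronous. The sketch below estimates synchronous commit time under a simple model that is an assumption of this example: the write goes to the local SSD and to the remote replica in parallel, and the commit completes when both have acknowledged. The 2 msec SSD commit and 74 msec round trip come from the slide; the 3 msec figure is the upper end of the 2 to 3 msec Availability Zone distance cited later in the deck.

```python
def sync_commit_ms(local_commit_ms: float, replica_rtt_ms: float,
                   replica_commit_ms: float) -> float:
    """Commit latency when the local write and the remote write run in parallel.

    The remote path costs one network round trip plus the replica's own commit.
    """
    remote_path_ms = replica_rtt_ms + replica_commit_ms
    return max(local_commit_ms, remote_path_ms)


SSD_COMMIT_MS = 2.0      # slide: 1 to 2 msec
LA_TO_NY_RTT_MS = 74.0   # slide: 74 msec round trip
CROSS_AZ_RTT_MS = 3.0    # deck: less than 2 to 3 msec between AZs

cross_country = sync_commit_ms(SSD_COMMIT_MS, LA_TO_NY_RTT_MS, SSD_COMMIT_MS)
cross_az = sync_commit_ms(SSD_COMMIT_MS, CROSS_AZ_RTT_MS, SSD_COMMIT_MS)

print(f"Local only:         {SSD_COMMIT_MS:.0f} msec")
print(f"Sync cross-country: {cross_country:.0f} msec")
print(f"Sync cross-AZ:      {cross_az:.0f} msec")
```

Under this model a synchronous cross-country commit is dominated by the network round trip, while a cross-AZ commit stays within a few milliseconds of a local one.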
What Else is Wrong with X-Country Replication? Fragile: Active/Passive Doesn't Work Failover to a system that hasn't been taking operational load Passive secondary not recently tested Secondary config or S/W version different, incorrect load balancer config, incorrect network ACLs, latent hardware problem, router problem, resource shortage under load Can't test without negative customer impact If you don't test it, it won't work 2-Way Redundancy Expensive: More than ½ capacity reserved to handle failure 3 datacenters much less expensive but impractical w/o high scale
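The cost argument for three datacenters over two follows from simple arithmetic. The sketch below assumes load is spread evenly across sites and that the surviving sites must absorb the full load after a failure. It gives the minimum reserve; the slide's "more than ½" reflects extra headroom beyond this floor.

```python
def reserved_fraction(sites: int, failures: int = 1) -> float:
    """Minimum fraction of total capacity held idle to survive `failures` site losses."""
    if failures < 0 or sites <= failures:
        raise ValueError("need more sites than tolerated failures")
    return failures / sites


def provisioned_multiple(sites: int, failures: int = 1) -> float:
    """Total capacity to buy, as a multiple of peak load."""
    if failures < 0 or sites <= failures:
        raise ValueError("need more sites than tolerated failures")
    return sites / (sites - failures)


for n in (2, 3, 4):
    print(f"{n} sites: reserve {reserved_fraction(n):.0%} of capacity, "
          f"buy {provisioned_multiple(n):.2f}x peak load")
```

With two sites, half of everything sits idle and total capacity is 2x peak load; with three sites the reserve drops to a third and total capacity to 1.5x.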
AWS Multi-Availability Zone Model Choose Region to be close to user, close to data, or meeting jurisdictional requirements Synchronous replication to 2 (or better 3) Availability Zones Easy when less than 2 to 3 msec away Can failover w/o customer impact ELB over EC2 instances in different AZs Stateless EC2 apps easy For persistent state use DynamoDB, Simple Storage Service, Multi-AZ RDS
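A rough way to reason about why 2 or 3 zones help: if zone failures were fully independent and failover were instantaneous, the application would be down only when every zone is down at once. Both conditions are simplifying assumptions of this sketch, and the 99.9% per-zone availability is a hypothetical input, not an AWS figure.

```python
def combined_availability(per_zone: float, zones: int) -> float:
    """Probability that at least one of `zones` independent zones is up."""
    if not 0 <= per_zone <= 1 or zones < 1:
        raise ValueError("per_zone must be in [0, 1] and zones at least 1")
    return 1.0 - (1.0 - per_zone) ** zones


PER_ZONE = 0.999  # hypothetical availability of a single zone

for zones in (1, 2, 3):
    total = combined_availability(PER_ZONE, zones)
    print(f"{zones} zone(s): {total:.9f}")
```

Real failures are never perfectly independent, so this is an upper bound on what added zones can deliver, but it shows why the second and third zones matter far more than hardening any single one.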
New Research: Customers Improve Availability by Migrating Apps to AWS 2013 AWS Customer Survey 32% reduction in total application downtime Research Note: Benchmarking availability and reliability in the cloud: Amazon Web Services Nucleus Research, November 2013, Document N168
Is Hosting On-premises Less Expensive? Utilization fundamentally higher in cloud Aggregating non-correlated workloads, scale, spot market Amazon specific H/W designs ODM acquisition of custom servers & net gear Direct purchasing of disk, memory, & CPU AWS controlled hypervisor & net protocol layers Deep R&D: Many new data centers built each year Immense scale Volume purchasing, highly automated, specialists in all areas Amazon margins are tiny compared to enterprise margins
Summary AWS Economics driven by scale & singular focus Economies of scale Increased availability through multiple-datacenter deployment Steadily declining price Mega-scale advantages available to all customers regardless of size Datacenter presence near all customers world-wide Multiple datacenters in each region for high availability Deeper R&D investment & operational focus in datacenter, server, storage, & networking than any IT organization in the world Buying power that rivals the biggest in the world Cloud Model Fundamentally different from the last 30 years Even if rebranded as cloud enabled, private cloud, cloud-like