CSE 124: Networked Services Lecture-15

Fall 2010 CSE 124: Networked Services, Lecture 15
Instructor: B. S. Manoj, Ph.D.
http://cseweb.ucsd.edu/classes/fa10/cse124
November 18, 2010

Updates
Signup sheet for the PlanetLab experiment: the signup deadline is today.
Project-2 idea finalization. Sample idea: SuperProxy, a web proxy service that lets you download files at higher speeds.
Client -> Proxy: parallelizes the connection over multiple sockets.
Proxy -> Server: caches data or downloads it on demand.
Free service, with ad insertion for revenue generation; ads must be inserted in a non-invasive manner and may come from third parties.
Users visit the URL (e.g., www.superproxy.com) and use it for all their Internet needs.
Presentation/demo deadline: last lecture class (December 2, 2010).

Giant-Scale Services
Based on "Lessons from Giant-Scale Services" by Eric A. Brewer, UC Berkeley, formerly with Inktomi Corporation.

Why giant-scale services?
Access anywhere, any time: home, office, coffee shop, airport, etc.
Available via multiple devices: computers, smartphones, set-top boxes, etc.
Groupware support: centralization of services helps (calendars, Evite, etc.).
Lower overall cost: end-user device utilization is about 4%, infrastructure resource utilization about 80%, a fundamental cost advantage over stand-alone applications.
Simplified service updates: the most powerful long-term advantage, since no physical distribution of software or hardware is necessary.

Key assumptions
The service provider has limited control over the clients and the network (except the intranet).
Queries drive the traffic: web or database queries, e.g., HTTP, FTP, or RPC.
Read-only queries greatly outnumber updates: reads dominate writes (product evaluations vs. purchases, stock quotes vs. trades).

Basic model of giant-scale services
Clients: web browsers, e-mail clients, XML programs.
The best-effort IP network: provides access to the service.
The load manager: a level of indirection that balances load and routes around faults.
Servers: the system's workers, combining CPU, memory, and disks.
Persistent data store: a replicated or partitioned database spread across the servers' disks; may include network-attached storage, DBMSs, or RAID arrays.
Service backplane: an optional system-area network that handles inter-server traffic.

Clusters in giant-scale services
Collections of commodity servers. Main benefits:
Absolute scalability: many new services must serve a significant fraction of the world's population.
Cost and performance: a commonly cited reason for clusters, although bandwidth and operational costs dwarf the hardware cost.
Independent components: help in handling faults.
Incremental scalability: helps handle the uncertainty and expense of growing the service; nodes typically have a 3-year depreciation lifetime, and a unit of rack space quadruples in computing power every 3 years.

Load management
Simplest strategy: DNS round-robin distribution of IP addresses.
DNS does not hide down or inactive servers; a short time-to-live is an option, but many browsers mishandle expired DNS information.
Layer-4/layer-7 (transport/application layer) switches: process higher-layer information at wire speed, help with fault tolerance, offer high throughput (20 Gbps), and can detect down nodes from connection status.

Load management (contd.)
Layer-4 switches understand TCP and port numbers and can route on them.
Service-specific layer-7 switches (Walmart is one user) can track session information.
Smart clients: an end-to-end approach in which the client uses alternative-server information (e.g., from DNS) to pick a server.
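To make the contrast concrete, here is a small Python sketch (not from the lecture; the server addresses and timeout are made up) of a round-robin load manager that, unlike plain DNS round-robin, skips servers that fail a simple layer-4-style health check (can a TCP connection be opened?).

```python
import itertools
import socket

# Hypothetical backend pool; plain DNS round-robin would hand these out
# blindly, even when a server is down.
SERVERS = [("10.0.0.1", 80), ("10.0.0.2", 80), ("10.0.0.3", 80)]
_rotation = itertools.cycle(SERVERS)

def is_alive(addr, timeout=0.5):
    """Crude layer-4 style check: can we open a TCP connection to the server?"""
    try:
        with socket.create_connection(addr, timeout=timeout):
            return True
    except OSError:
        return False

def pick_server():
    """Return the next healthy server in round-robin order, or None if all are down."""
    for _ in range(len(SERVERS)):
        addr = next(_rotation)
        if is_alive(addr):
            return addr
    return None
```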

Key requirements of giant-scale services
High availability: much like other always-available utility services (water, electricity, telephone).
Must cope with component failures, natural disasters, and growth and evolution.
Design points that reduce failures: symmetry, internal disks, no people, wires, or monitors in the cluster, offsite clusters, and contracts that limit temperature and power variations.

Availability metrics
Uptime: the fraction of time the site is available (e.g., 0.9999 = 99.99%, i.e., at most 8.64 seconds of downtime per day).
MTBF: mean time between failures. MTTR: mean time to repair.
Two ways to improve uptime: increase MTBF or reduce MTTR.
MTBF is hard to improve and to verify, so reducing MTTR is preferred; the total time required to verify an improvement is much smaller for MTTR.
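As a quick illustration (our arithmetic, not lecture code), the relationships behind these metrics can be written down directly; the MTBF/MTTR formulation below follows Brewer's paper, where uptime = (MTBF - MTTR) / MTBF.

```python
def downtime_per_day(uptime):
    """Seconds of allowed downtime per day for a given uptime fraction."""
    return (1.0 - uptime) * 24 * 3600

def uptime_from_mtbf_mttr(mtbf_s, mttr_s):
    """Uptime = (MTBF - MTTR) / MTBF, per Brewer's formulation."""
    return (mtbf_s - mttr_s) / mtbf_s

print(downtime_per_day(0.9999))                               # 8.64 seconds/day
# Failing once a week with a 10-minute repair (assumed numbers):
print(uptime_from_mtbf_mttr(mtbf_s=7*24*3600, mttr_s=10*60))  # ~0.999
```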

Availability metrics
Yield: the fraction of queries completed (queries completed / queries offered), used here as an availability metric, not a throughput metric.
Similar to uptime, but it translates more directly to user experience, because not all seconds have equal value: a second lost when there are no queries costs little, while a second lost during peak hours is a real problem.

Availability metrics
Harvest: a query may be answered partially or fully; harvest measures how much of the data is reflected in the answer (data available / complete data).
Harvest can help preserve user satisfaction while handling faults: the e-mail inbox loads, but the task list or contacts do not; the eBay auction information loads, but the user profile does not.
Key point: we can control whether faults reduce yield, harvest, or both, even though the total capacity lost is the same.
A fault in a replicated system reduces yield; a fault in a partitioned system reduces harvest.
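The two metrics are simple ratios; the definitions below follow Brewer's paper (the code itself is only an illustration):

```python
def query_yield(completed, offered):
    """Yield = queries completed / queries offered."""
    return completed / offered

def harvest(data_available, data_total):
    """Harvest = data reflected in the answer / complete data."""
    return data_available / data_total

# One of two partitions unreachable: the query is answered with half the data.
print(harvest(data_available=1, data_total=2))   # 0.5
# Half the offered queries turned away during an outage: yield drops to 0.5.
print(query_yield(completed=500, offered=1000))  # 0.5
```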

DQ principle
(data per query) x (queries per second) -> constant.
A useful metric for giant-scale systems: it represents a physical capacity bottleneck, such as the maximum I/O bandwidth or the total disk seeks per second, and at high utilization a giant-scale system approaches this constant.
It includes all overhead: data copying, presentation layers, and network-bound work.
Each node has a different absolute DQ value, so what matters is the relative impact of faults on a given system's DQ, which is easy to measure.
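A toy illustration of the trade-off the DQ constant forces (the capacity number is made up): at saturation, serving more data per query means serving fewer queries per second.

```python
DQ_CAPACITY = 1_000_000   # assumed usable bytes/second of I/O for one node

def max_queries_per_second(data_per_query_bytes):
    """At saturation, D x Q is pinned at the node's DQ capacity."""
    return DQ_CAPACITY / data_per_query_bytes

print(max_queries_per_second(1_000))    # 1000 queries/s at 1 KB per query
print(max_queries_per_second(10_000))   # 100 queries/s at 10 KB per query
```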

DQ value (contd.)
The overall DQ value scales linearly with the number of nodes, which makes it possible to sample the behavior of the entire system: a small test cluster can predict the behavior of the full system.
At Inktomi, 4-node clusters were used to predict the impact of software updates on 100-node clusters.
The DQ impact should be evaluated before any proposed hardware or software change.
DQ also degrades linearly with the number of faults (failed nodes).

DQ value (contd.)
Future demand: estimate the additional DQ that must be provisioned.
Fault impact: DQ degrades linearly with the number of failed nodes, but the loss can be absorbed in different ways.
DQ applies mostly to data-intensive services (database access; the majority of the top-100 sites are data intensive), and less directly to computation-intensive services (simulation, supercomputing) or communication-intensive services (chat, news, VoIP).
The key question is how a DQ loss should be reflected in uptime, yield, and harvest.

Replication vs. partitioning
Replication: the traditional method for improving availability.
How a fault affects DQ under replication (e.g., a two-node cluster with one fault): harvest 100%, yield 50%; D is maintained and Q is reduced.
How a fault affects DQ under partitioning (same example): harvest 50%, yield 100%; D is reduced and Q is maintained.
DQ drops to 50% in both cases.
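The same two-node example, written out as a small helper (ours, not the lecture's): with k of n nodes down, replication keeps full harvest and gives up yield, partitioning keeps full (nominal) yield and gives up harvest, and the DQ loss is identical.

```python
def fault_impact(n, k, replicated):
    """Approximate harvest/yield after k of n nodes fail."""
    surviving = (n - k) / n
    if replicated:
        return {"harvest": 1.0, "yield": surviving, "dq": surviving}
    else:
        return {"harvest": surviving, "yield": 1.0, "dq": surviving}

print(fault_impact(n=2, k=1, replicated=True))   # harvest 100%, yield 50%
print(fault_impact(n=2, k=1, replicated=False))  # harvest 50%, yield 100%
```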

Load redirection and replication
Traditional replication provisions excess capacity, because load must be redirected on faults: the replicas absorb the load previously handled by the failed nodes.
This is hard to achieve under high utilization: k failures out of n nodes redirect an extra k/(n-k) of a node's load onto each of the remaining n-k nodes.
Losing 2 out of 5 nodes means each survivor absorbs an extra 2/3 of a node's load, an overload factor of 5/3 (about 166%).
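The arithmetic behind these numbers, written out as two one-line helpers (ours):

```python
def redirected_load(n, k):
    """Extra fraction of one node's load pushed onto each of the n-k survivors."""
    return k / (n - k)

def overload_factor(n, k):
    """Each survivor's new load relative to its normal load."""
    return n / (n - k)

print(redirected_load(5, 2))    # 0.666... -> each survivor absorbs an extra 2/3
print(overload_factor(5, 2))    # 1.666... -> about 166% of normal load
```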

Replication and DQ
Replicating data on disk is cheap: storage is cheap, but processing is not.
Serving the replicated data still costs DQ points, so partitioning has no real savings over replication in terms of DQ points; the same DQ points are needed either way, and only in rare cases does replication demand more DQ points than partitioning.

Replication and DQ (contd.)
Replication and partitioning can be combined for better control over availability: partition the data into suitably sized pieces first, then replicate based on the data's importance.
It is easier to grow the system via replication than via repartitioning.
Replication can be driven by the data's importance, i.e., by which data may be lost in the event of a fault: replicating the key data, at the cost of some extra disks, means a fault can still lose 1/n of the data, but only of the less important data.
Replication can also be randomized so that the lost harvest is a random subset of the data, which avoids hotspots in the partitions.
Examples: search (Inktomi): partial replication; e-mail systems: full replication; user content: full replication; clustered web caches: no replication.

Graceful degradation
Degradation under faults and overload must be handled smoothly. Graceful degradation matters because of:
High peak-to-average load ratios: 1.6:1 to 6:1, and sometimes even 10:1.
Single-event bursts (flash crowds): movie ticket sales, football matches, sensational breaking news.
Natural disasters and power failures: such faults are not independent, so DQ can drop sharply.

Graceful degradation under faults
The DQ principle offers two levers: maintain D and limit Q, or reduce D and maintain Q.
Admission control (AC): maintains D, reduces Q; preserves harvest.
Dynamic database reduction (e.g., cut the data size in half): reduces D, maintains Q; preserves yield.
Graceful degradation can be achieved at various degrees by combining the two.
Key question: how should saturation affect uptime, yield, harvest, and quality of service?

Admission control strategies
Cost-based AC: admit queries based on their estimated cost (in DQ); this reduces the average D per query, and denying one expensive query can let many inexpensive queries through, for a net gain in yield and harvest.
Probabilistic AC: admit queries with some probability; this helps because retried queries are likely to succeed later. Reduced yield, increased harvest.
Priority- or value-based AC: Datek handles stock-trade queries differently from other queries (trades execute within 60 seconds or the commission is waived); dropping low-value queries saves DQ points. Reduced yield, increased harvest.
Reduced data freshness: when saturated, a financial site can let stock quotes expire less frequently; this reduces freshness but also the DQ required per query. Increased yield, reduced harvest.
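A minimal sketch of cost-based admission control (the budget, threshold, and query costs below are invented for illustration): under saturation, shedding one expensive query leaves room for many cheap ones.

```python
DQ_BUDGET_PER_SEC = 1_000_000     # assumed DQ points available each second
EXPENSIVE = 50_000                # assumed cutoff for "expensive" when saturated

def admit(estimated_cost, spent_this_second, saturated):
    """Decide whether to admit a query based on its estimated DQ cost."""
    remaining = DQ_BUDGET_PER_SEC - spent_this_second
    if estimated_cost > remaining:
        return False                    # no capacity left for this query
    if saturated and estimated_cost > EXPENSIVE:
        return False                    # shed expensive queries first
    return True

# Rejecting one 100k-point query leaves room for ~100 queries of 1k points each.
print(admit(estimated_cost=100_000, spent_this_second=850_000, saturated=True))  # False
print(admit(estimated_cost=1_000,   spent_this_second=850_000, saturated=True))  # True
```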

Disaster tolerance
A disaster is the complete loss of one or more replicas. A natural disaster can affect all the replicas at one geographical location, while a fire or similar event typically affects only one replica.
Disaster tolerance is about managing replica groups and degrading gracefully when a disaster strikes.
Key questions: how many locations, and how many replicas per location?
With 2 replicas at each of 3 locations, a natural disaster loses 2 of 6 replicas, and each remaining location must handle 50% more traffic (6/4 = 1.5).
Inktomi's approach at the time: reduce D by 50% at the remaining locations. A better approach: reduce D to 2/3 of its value, which raises Q by 3/2, exactly the needed capacity.
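The site-loss arithmetic, spelled out (our helpers; the 2-replicas-at-3-sites numbers are the slide's): under the DQ constant, shrinking D to 2/3 per query raises Q by exactly the 3/2 the surviving sites must absorb.

```python
def traffic_factor(sites, lost_sites):
    """How much more traffic each surviving site must carry after a site loss."""
    return sites / (sites - lost_sites)

def q_gain_from_d_reduction(d_fraction):
    """Under D x Q = constant, Q scales by 1/d when D shrinks to d of its value."""
    return 1.0 / d_fraction

print(traffic_factor(sites=3, lost_sites=1))   # 1.5 -> 50% more traffic per site
print(q_gain_from_d_reduction(2 / 3))          # 1.5 -> matches the needed capacity
```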

Disaster tolerance (contd.)
Load management is another issue in disaster tolerance: when an entire cluster fails, layer-4 switches do not help.
DNS-based failover has a long response time (up to several hours); smart clients are better suited to quick failover (seconds to minutes).

Evolution and growth
Giant-scale services need frequent updates: product revisions, software bug fixes, security updates, and new features.
Problems are hard to detect: slow memory leaks, non-deterministic bugs.
A plan for continued growth is essential.
The goal is an online evolution process, i.e., evolution with minimal downtime, built on software of acceptable quality: it meets a target MTBF, keeps MTTR minimal, and avoids cascading failures.

Online evolution process
Each online-evolution phase costs a certain number of DQ points.
Total DQ loss for upgrading n nodes, with u the upgrade time per node:
total DQ loss = n x u x (average DQ per node) = DQ x u.
Software upgrades are comparatively quick, new and old versions can coexist, and an upgrade can be handled as a controlled reboot within the MTTR budget; hardware upgrades are harder.
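A quick numeric check of the formula (the node count, per-node DQ, and upgrade time are invented): the total capacity given up works out to DQ x u however the loss is spread in time, so the upgrade approaches on the following slides differ mainly in how, not how much, capacity is lost.

```python
def upgrade_dq_loss(n_nodes, time_per_node_s, dq_per_node):
    """Total DQ-seconds lost: n * u * (average DQ per node) = DQ_total * u."""
    return n_nodes * time_per_node_s * dq_per_node

# 100 nodes, 10 minutes per node, 10k DQ points per node (all assumed):
print(upgrade_dq_loss(n_nodes=100, time_per_node_s=600, dq_per_node=10_000))
# 600,000,000 DQ-seconds, i.e. the full cluster's DQ for 10 minutes.
```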

Upgrade approaches
Fast reboot: quickly reboot all nodes into the upgraded version.
Downtime cannot be avoided, but the effect on yield can be contained by scheduling the reboot during off-peak hours.
A staging area and automation are essential, since all nodes are upgraded simultaneously.

Upgrade approaches
Rolling upgrade: upgrade nodes one at a time, in a wave across the cluster, so only one node is down at any moment.
Old and new versions coexist, so compatibility between them is a must.
In a partitioned system, harvest is affected but yield is not; in a replicated system, neither harvest nor yield is affected, since the upgrade touches one replica at a time.
Rolling upgrades are still done at off-peak hours so that a coincident fault does not hurt yield.
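A minimal sketch of the rolling wave (the helper functions are stubs standing in for real orchestration hooks such as load-balancer drain, package install, and health checks):

```python
import time

def drain(node):        print(f"draining {node}")          # stop routing queries to it
def install(node, ver): print(f"{node} -> version {ver}")  # controlled reboot into new version
def healthy(node):      return True                        # stub health check
def restore(node):      print(f"restoring {node}")         # route traffic back

def rolling_upgrade(nodes, new_version):
    """Upgrade one node at a time so at most one node is out of service."""
    for node in nodes:
        drain(node)
        install(node, new_version)
        while not healthy(node):
            time.sleep(1)
        restore(node)   # old and new versions must interoperate at this point

rolling_upgrade(["node1", "node2", "node3"], new_version="2.0")
```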

Upgrade approaches
Big flip: the most complicated of the three. Upgrade the cluster one half at a time: switch off all traffic to one half, take it down, and upgrade it; then bring the upgraded half online, direct new traffic to it, and wait for old traffic on the other half to complete before taking that half down in turn.
Only one version (one half) runs at a time, giving a 50% DQ loss: replicas lose 50% of Q (yield), partitions lose 50% of D (harvest).
The big flip is powerful: hardware, OS, schema, networking, even physical relocation can all be changed. Inktomi did it twice.

Basics of giant-scale services: summary
Get the basics right: use symmetry to simplify analysis and management.
Decide on availability metrics: yield and harvest are more telling than uptime; focus on MTTR at least as much as MTBF, since repair time is easier to affect (and to control) in an evolving system.
Understand load redirection during faults: replicating the data alone is insufficient; the extra DQ capacity needed to absorb redirected load must also be provisioned.
Plan for graceful degradation: intelligent admission control and dynamic database reduction can help.
Use DQ analysis for all upgrades: evaluate the upgrade options and their DQ cost in advance, and do capacity planning.
Automate upgrades as much as possible: develop automatic upgrade mechanisms such as rolling upgrades, and ensure a simple way to revert to the old version.