CSE 124: Networked Services Lecture-15

Fall 2010 CSE 124: Networked Services, Lecture 15
Instructor: B. S. Manoj, Ph.D.
http://cseweb.ucsd.edu/classes/fa10/cse124
November 18, 2010

Updates
Signup sheet for the PlanetLab experiment: the signup deadline is today.
Project-2 idea finalization. Sample idea: SuperProxy, a web proxy service that lets you download files at higher speeds.
Client -> Proxy: parallelizes the connection over multiple sockets.
Proxy -> Server: caches data or downloads it on demand.
Free service, with ad insertion for revenue generation; ads must be inserted in a non-invasive manner and may come from third parties.
Users visit the URL (e.g., www.superproxy.com) and use it for all their Internet needs.
Presentation/demo deadline: last lecture class (December 2, 2010).

Giant-Scale Services
Based on "Lessons from Giant-Scale Services" by Eric A. Brewer, UC Berkeley, formerly with Inktomi Corporation.

Why giant-scale services?
Access anywhere, any time: home, office, coffee shop, airport, etc.
Available via multiple devices: computers, smartphones, set-top boxes, etc.
Groupware support: centralization of services helps (calendars, Evite, etc.).
Lower overall cost: end-user device utilization is about 4%, infrastructure resource utilization about 80%, a fundamental cost advantage over stand-alone applications.
Simplified service updates: the most powerful long-term advantage, since no physical distribution of software or hardware is necessary.

Key assumptions
The service provider has limited control over the clients and the network (except the intranet).
Queries drive the traffic: web or database queries, e.g., HTTP, FTP, or RPC.
Read-only queries greatly outnumber updates: reads dominate writes (product evaluations vs. purchases, stock quotes vs. trades).

Basic model of giant-scale services
Clients: web browsers, e-mail clients, XML programs.
The best-effort IP network: provides access to the service.
The load manager: a level of indirection that balances load and routes around faults.
Servers: the system's workers, combining CPU, memory, and disks.
Persistent data store: a replicated or partitioned database spread across the servers' disks; may include network-attached storage, DBMSs, or RAID arrays.
Service backplane: an optional system-area network that handles inter-server traffic.

Clusters in giant-scale services
Collections of commodity servers. Main benefits:
Absolute scalability: many new services must serve a significant fraction of the world's population.
Cost and performance: a commonly cited reason for clusters, although bandwidth and operational costs dwarf the hardware cost.
Independent components: help in handling faults.
Incremental scalability: helps handle the uncertainty and expense of growing the service; nodes typically have a 3-year depreciation lifetime, and a unit of rack space quadruples in computing power every 3 years.

Load management
Simplest strategy: DNS round-robin distribution of IP addresses.
DNS does not hide down or inactive servers; a short time-to-live is an option, but many browsers mishandle expired DNS information.
Layer-4/layer-7 (transport/application layer) switches: process higher-layer information at wire speed, help with fault tolerance, offer high throughput (20 Gbps), and can detect down nodes from connection status.

Load management (contd.)
Layer-4 switches understand TCP and port numbers and can route on them.
Service-specific layer-7 switches (Walmart is one user) can track session information.
Smart clients: an end-to-end approach in which the client uses alternative-server information (e.g., from DNS) to pick a server.
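To make the contrast concrete, here is a small Python sketch (not from the lecture; the server addresses and timeout are made up) of a round-robin load manager that, unlike plain DNS round-robin, skips servers that fail a simple layer-4-style health check (can a TCP connection be opened?).

```python
import itertools
import socket

# Hypothetical backend pool; plain DNS round-robin would hand these out
# blindly, even when a server is down.
SERVERS = [("10.0.0.1", 80), ("10.0.0.2", 80), ("10.0.0.3", 80)]
_rotation = itertools.cycle(SERVERS)

def is_alive(addr, timeout=0.5):
    """Crude layer-4 style check: can we open a TCP connection to the server?"""
    try:
        with socket.create_connection(addr, timeout=timeout):
            return True
    except OSError:
        return False

def pick_server():
    """Return the next healthy server in round-robin order, or None if all are down."""
    for _ in range(len(SERVERS)):
        addr = next(_rotation)
        if is_alive(addr):
            return addr
    return None
```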

Key requirements of giant-scale services
High availability: much like other always-available utility services (water, electricity, telephone).
Must cope with component failures, natural disasters, and growth and evolution.
Design points that reduce failures: symmetry, internal disks, no people, wires, or monitors in the cluster, offsite clusters, and contracts that limit temperature and power variations.

Availability metrics
Uptime: the fraction of time the site is available (e.g., 0.9999 = 99.99%, i.e., at most 8.64 seconds of downtime per day).
MTBF: mean time between failures. MTTR: mean time to repair.
Two ways to improve uptime: increase MTBF or reduce MTTR.
MTBF is hard to improve and to verify, so reducing MTTR is preferred; the total time required to verify an improvement is much smaller for MTTR.
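As a quick illustration (our arithmetic, not lecture code), the relationships behind these metrics can be written down directly; the MTBF/MTTR formulation below follows Brewer's paper, where uptime = (MTBF - MTTR) / MTBF.

```python
def downtime_per_day(uptime):
    """Seconds of allowed downtime per day for a given uptime fraction."""
    return (1.0 - uptime) * 24 * 3600

def uptime_from_mtbf_mttr(mtbf_s, mttr_s):
    """Uptime = (MTBF - MTTR) / MTBF, per Brewer's formulation."""
    return (mtbf_s - mttr_s) / mtbf_s

print(downtime_per_day(0.9999))                               # 8.64 seconds/day
# Failing once a week with a 10-minute repair (assumed numbers):
print(uptime_from_mtbf_mttr(mtbf_s=7*24*3600, mttr_s=10*60))  # ~0.999
```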

Availability metrics
Yield: the fraction of queries completed (queries completed / queries offered), used here as an availability metric, not a throughput metric.
Similar to uptime, but it translates more directly to user experience, because not all seconds have equal value: a second lost when there are no queries costs little, while a second lost during peak hours is a real problem.

Availability metrics
Harvest: a query may be answered partially or fully; harvest measures how much of the data is reflected in the answer (data available / complete data).
Harvest can help preserve user satisfaction while handling faults: the e-mail inbox loads, but the task list or contacts do not; the eBay auction information loads, but the user profile does not.
Key point: we can control whether faults reduce yield, harvest, or both, even though the total capacity lost is the same.
A fault in a replicated system reduces yield; a fault in a partitioned system reduces harvest.
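The two metrics are simple ratios; the definitions below follow Brewer's paper (the code itself is only an illustration):

```python
def query_yield(completed, offered):
    """Yield = queries completed / queries offered."""
    return completed / offered

def harvest(data_available, data_total):
    """Harvest = data reflected in the answer / complete data."""
    return data_available / data_total

# One of two partitions unreachable: the query is answered with half the data.
print(harvest(data_available=1, data_total=2))   # 0.5
# Half the offered queries turned away during an outage: yield drops to 0.5.
print(query_yield(completed=500, offered=1000))  # 0.5
```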

DQ principle
(data per query) x (queries per second) -> constant.
A useful metric for giant-scale systems: it represents a physical capacity bottleneck, such as the maximum I/O bandwidth or the total disk seeks per second, and at high utilization a giant-scale system approaches this constant.
It includes all overhead: data copying, presentation layers, and network-bound work.
Each node has a different absolute DQ value, so what matters is the relative impact of faults on a given system's DQ, which is easy to measure.
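A toy illustration of the trade-off the DQ constant forces (the capacity number is made up): at saturation, serving more data per query means serving fewer queries per second.

```python
DQ_CAPACITY = 1_000_000   # assumed usable bytes/second of I/O for one node

def max_queries_per_second(data_per_query_bytes):
    """At saturation, D x Q is pinned at the node's DQ capacity."""
    return DQ_CAPACITY / data_per_query_bytes

print(max_queries_per_second(1_000))    # 1000 queries/s at 1 KB per query
print(max_queries_per_second(10_000))   # 100 queries/s at 10 KB per query
```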

DQ value (contd.)
The overall DQ value scales linearly with the number of nodes, which makes it possible to sample the behavior of the entire system: a small test cluster can predict the behavior of the full system.
At Inktomi, 4-node clusters were used to predict the impact of software updates on 100-node clusters.
The DQ impact should be evaluated before any proposed hardware or software change.
DQ also degrades linearly with the number of faults (failed nodes).

DQ value (contd.)
Future demand: estimate the additional DQ that must be provisioned.
Fault impact: DQ degrades linearly with the number of failed nodes, but the loss can be absorbed in different ways.
DQ applies mostly to data-intensive services (database access; the majority of the top-100 sites are data intensive), and less directly to computation-intensive services (simulation, supercomputing) or communication-intensive services (chat, news, VoIP).
The key question is how a DQ loss should be reflected in uptime, yield, and harvest.

Replication vs. partitioning
Replication: the traditional method for improving availability.
How a fault affects DQ under replication (e.g., a two-node cluster with one fault): harvest 100%, yield 50%; D is maintained and Q is reduced.
How a fault affects DQ under partitioning (same example): harvest 50%, yield 100%; D is reduced and Q is maintained.
DQ drops to 50% in both cases.
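The same two-node example, written out as a small helper (ours, not the lecture's): with k of n nodes down, replication keeps full harvest and gives up yield, partitioning keeps full (nominal) yield and gives up harvest, and the DQ loss is identical.

```python
def fault_impact(n, k, replicated):
    """Approximate harvest/yield after k of n nodes fail."""
    surviving = (n - k) / n
    if replicated:
        return {"harvest": 1.0, "yield": surviving, "dq": surviving}
    else:
        return {"harvest": surviving, "yield": 1.0, "dq": surviving}

print(fault_impact(n=2, k=1, replicated=True))   # harvest 100%, yield 50%
print(fault_impact(n=2, k=1, replicated=False))  # harvest 50%, yield 100%
```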

Load redirection and replication
Traditional replication provisions excess capacity, because load must be redirected on faults: the replicas absorb the load previously handled by the failed nodes.
This is hard to achieve under high utilization: k failures out of n nodes redirect an extra k/(n-k) of a node's load onto each of the remaining n-k nodes.
Losing 2 out of 5 nodes means each survivor absorbs an extra 2/3 of a node's load, an overload factor of 5/3 (about 166%).
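The arithmetic behind these numbers, written out as two one-line helpers (ours):

```python
def redirected_load(n, k):
    """Extra fraction of one node's load pushed onto each of the n-k survivors."""
    return k / (n - k)

def overload_factor(n, k):
    """Each survivor's new load relative to its normal load."""
    return n / (n - k)

print(redirected_load(5, 2))    # 0.666... -> each survivor absorbs an extra 2/3
print(overload_factor(5, 2))    # 1.666... -> about 166% of normal load
```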

Replication and DQ
Replicating data on disk is cheap: storage is cheap, but processing is not.
Serving the replicated data still costs DQ points, so partitioning has no real savings over replication in terms of DQ points; the same DQ points are needed either way, and only in rare cases does replication demand more DQ points than partitioning.

Replication and DQ (contd.)
Replication and partitioning can be combined for better control over availability: partition the data into suitably sized pieces first, then replicate based on the data's importance.
It is easier to grow the system via replication than via repartitioning.
Replication can be driven by the data's importance, i.e., by which data may be lost in the event of a fault: replicating the key data, at the cost of some extra disks, means a fault can still lose 1/n of the data, but only of the less important data.
Replication can also be randomized so that the lost harvest is a random subset of the data, which avoids hotspots in the partitions.
Examples: search (Inktomi): partial replication; e-mail systems: full replication; user content: full replication; clustered web caches: no replication.

Graceful degradation
Degradation under faults and overload must be handled smoothly. Graceful degradation matters because of:
High peak-to-average load ratios: 1.6:1 to 6:1, and sometimes even 10:1.
Single-event bursts (flash crowds): movie ticket sales, football matches, sensational breaking news.
Natural disasters and power failures: such faults are not independent, so DQ can drop sharply.

Graceful degradation under faults
The DQ principle offers two levers: maintain D and limit Q, or reduce D and maintain Q.
Admission control (AC): maintains D, reduces Q; preserves harvest.
Dynamic database reduction (e.g., cut the data size in half): reduces D, maintains Q; preserves yield.
Graceful degradation can be achieved at various degrees by combining the two.
Key question: how should saturation affect uptime, yield, harvest, and quality of service?

Admission control strategies
Cost-based AC: admit queries based on their estimated cost (in DQ); this reduces the average D per query, and denying one expensive query can let many inexpensive queries through, for a net gain in yield and harvest.
Probabilistic AC: admit queries with some probability; this helps because retried queries are likely to succeed later. Reduced yield, increased harvest.
Priority- or value-based AC: Datek handles stock-trade queries differently from other queries (trades execute within 60 seconds or the commission is waived); dropping low-value queries saves DQ points. Reduced yield, increased harvest.
Reduced data freshness: when saturated, a financial site can let stock quotes expire less frequently; this reduces freshness but also the DQ required per query. Increased yield, reduced harvest.
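A minimal sketch of cost-based admission control (the budget, threshold, and query costs below are invented for illustration): under saturation, shedding one expensive query leaves room for many cheap ones.

```python
DQ_BUDGET_PER_SEC = 1_000_000     # assumed DQ points available each second
EXPENSIVE = 50_000                # assumed cutoff for "expensive" when saturated

def admit(estimated_cost, spent_this_second, saturated):
    """Decide whether to admit a query based on its estimated DQ cost."""
    remaining = DQ_BUDGET_PER_SEC - spent_this_second
    if estimated_cost > remaining:
        return False                    # no capacity left for this query
    if saturated and estimated_cost > EXPENSIVE:
        return False                    # shed expensive queries first
    return True

# Rejecting one 100k-point query leaves room for ~100 queries of 1k points each.
print(admit(estimated_cost=100_000, spent_this_second=850_000, saturated=True))  # False
print(admit(estimated_cost=1_000,   spent_this_second=850_000, saturated=True))  # True
```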

Disaster tolerance
A disaster is the complete loss of one or more replicas. A natural disaster can affect all the replicas at one geographical location, while a fire or similar event typically affects only one replica.
Disaster tolerance is about managing replica groups and degrading gracefully when a disaster strikes.
Key questions: how many locations, and how many replicas per location?
With 2 replicas at each of 3 locations, a natural disaster loses 2 of 6 replicas, and each remaining location must handle 50% more traffic (6/4 = 1.5).
Inktomi's approach at the time: reduce D by 50% at the remaining locations. A better approach: reduce D to 2/3 of its value, which raises Q by 3/2, exactly the needed capacity.
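The site-loss arithmetic, spelled out (our helpers; the 2-replicas-at-3-sites numbers are the slide's): under the DQ constant, shrinking D to 2/3 per query raises Q by exactly the 3/2 the surviving sites must absorb.

```python
def traffic_factor(sites, lost_sites):
    """How much more traffic each surviving site must carry after a site loss."""
    return sites / (sites - lost_sites)

def q_gain_from_d_reduction(d_fraction):
    """Under D x Q = constant, Q scales by 1/d when D shrinks to d of its value."""
    return 1.0 / d_fraction

print(traffic_factor(sites=3, lost_sites=1))   # 1.5 -> 50% more traffic per site
print(q_gain_from_d_reduction(2 / 3))          # 1.5 -> matches the needed capacity
```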

Disaster tolerance (contd.)
Load management is another issue in disaster tolerance: when an entire cluster fails, layer-4 switches do not help.
DNS-based failover has a long response time (up to several hours); smart clients are better suited to quick failover (seconds to minutes).

Evolution and growth
Giant-scale services need frequent updates: product revisions, software bug fixes, security updates, and new features.
Problems are hard to detect: slow memory leaks, non-deterministic bugs.
A plan for continued growth is essential.
The goal is an online evolution process, i.e., evolution with minimal downtime, built on software of acceptable quality: it meets a target MTBF, keeps MTTR minimal, and avoids cascading failures.

Online evolution process
Each online-evolution phase costs a certain number of DQ points.
Total DQ loss for upgrading n nodes, with u the upgrade time per node:
total DQ loss = n x u x (average DQ per node) = DQ x u.
Software upgrades are comparatively quick, new and old versions can coexist, and an upgrade can be handled as a controlled reboot within the MTTR budget; hardware upgrades are harder.
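A quick numeric check of the formula (the node count, per-node DQ, and upgrade time are invented): the total capacity given up works out to DQ x u however the loss is spread in time, so the upgrade approaches on the following slides differ mainly in how, not how much, capacity is lost.

```python
def upgrade_dq_loss(n_nodes, time_per_node_s, dq_per_node):
    """Total DQ-seconds lost: n * u * (average DQ per node) = DQ_total * u."""
    return n_nodes * time_per_node_s * dq_per_node

# 100 nodes, 10 minutes per node, 10k DQ points per node (all assumed):
print(upgrade_dq_loss(n_nodes=100, time_per_node_s=600, dq_per_node=10_000))
# 600,000,000 DQ-seconds, i.e. the full cluster's DQ for 10 minutes.
```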

Upgrade approaches
Fast reboot: quickly reboot all nodes into the upgraded version.
Downtime cannot be avoided, but the effect on yield can be contained by scheduling the reboot during off-peak hours.
A staging area and automation are essential, since all nodes are upgraded simultaneously.

Upgrade approaches
Rolling upgrade: upgrade nodes one at a time, in a wave across the cluster, so only one node is down at any moment.
Old and new versions coexist, so compatibility between them is a must.
In a partitioned system, harvest is affected but yield is not; in a replicated system, neither harvest nor yield is affected, since the upgrade touches one replica at a time.
Rolling upgrades are still done at off-peak hours so that a coincident fault does not hurt yield.
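A minimal sketch of the rolling wave (the helper functions are stubs standing in for real orchestration hooks such as load-balancer drain, package install, and health checks):

```python
import time

def drain(node):        print(f"draining {node}")          # stop routing queries to it
def install(node, ver): print(f"{node} -> version {ver}")  # controlled reboot into new version
def healthy(node):      return True                        # stub health check
def restore(node):      print(f"restoring {node}")         # route traffic back

def rolling_upgrade(nodes, new_version):
    """Upgrade one node at a time so at most one node is out of service."""
    for node in nodes:
        drain(node)
        install(node, new_version)
        while not healthy(node):
            time.sleep(1)
        restore(node)   # old and new versions must interoperate at this point

rolling_upgrade(["node1", "node2", "node3"], new_version="2.0")
```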

Upgrade approaches
Big flip: the most complicated of the three. Upgrade the cluster one half at a time: switch off all traffic to one half, take it down, and upgrade it; then bring the upgraded half online, direct new traffic to it, and wait for old traffic on the other half to complete before taking that half down in turn.
Only one version (one half) runs at a time, giving a 50% DQ loss: replicas lose 50% of Q (yield), partitions lose 50% of D (harvest).
The big flip is powerful: hardware, OS, schema, networking, even physical relocation can all be changed. Inktomi did it twice.

Basics of giant-scale services: summary
Get the basics right: use symmetry to simplify analysis and management.
Decide on availability metrics: yield and harvest are more telling than uptime; focus on MTTR at least as much as MTBF, since repair time is easier to affect (and to control) in an evolving system.
Understand load redirection during faults: replicating the data alone is insufficient; the extra DQ capacity needed to absorb redirected load must also be provisioned.
Plan for graceful degradation: intelligent admission control and dynamic database reduction can help.
Use DQ analysis for all upgrades: evaluate the upgrade options and their DQ cost in advance, and do capacity planning.
Automate upgrades as much as possible: develop automatic upgrade mechanisms such as rolling upgrades, and ensure a simple way to revert to the old version.