Scalability: Load Balancing, Provisioning/Auto Scaling, State

Ruby Eaton
5 years ago

Scalability
- Load balancing
- Provisioning/auto scaling
- State
  - Caching: commonly used items of the database are kept in memory.
  - Replication: logical items of the database (e.g., tuples, objects) have multiple physical copies located on different nodes.
  - Partitioning: logical items of the database (e.g., tuples, objects) are divided among multiple physical nodes.
- Fault tolerance

Recommended text: Database Replication, Bettina Kemme, Ricardo Jiménez-Peris, Marta Patiño-Martínez, Morgan & Claypool Publishers.

Load Balancing
- Determines where requests/transactions will be executed
- Blind techniques
  - Require minimal or no state on the load balancer
  - Strategies: random, round robin
- Load aware
  - Consider the load on replicas when routing requests
  - Different requests may generate different load
  - Replica nodes may have different capacity
  - Strategies:
    - Shortest queue first: simple to implement, just track the number of active requests
    - Least loaded: replicas need to periodically provide load information to the load balancer

Load Balancing (continued)
- Application aware
  - Uses knowledge of the application
  - Strategies:
    - Shortest execution length: profile request types to generate execution time estimates. The load balancer estimates the load at each replica by keeping track of pending requests and their types, i.e., it determines an application-specific load metric, and routes to the less loaded replica.
    - Locality-aware request distribution (LARD): takes into account the data accessed by requests, and routes similar requests to the same replica to increase cache hit rates
    - Conflict-aware distribution: execute potentially conflicting requests on the same replica for early conflict detection
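The blind and load-aware strategies listed above can be illustrated with a minimal sketch. The class and method names (`RoundRobinBalancer`, `ShortestQueueBalancer`, `pick`, `done`) are hypothetical and do not come from any particular load balancer; replicas are represented by plain strings.

```python
import itertools


class RoundRobinBalancer:
    """Blind strategy: cycles through replicas and keeps no load state."""

    def __init__(self, replicas):
        self._cycle = itertools.cycle(list(replicas))

    def pick(self):
        return next(self._cycle)


class ShortestQueueBalancer:
    """Load-aware strategy: routes to the replica with the fewest active requests."""

    def __init__(self, replicas):
        self.active = {r: 0 for r in replicas}

    def pick(self):
        # min() returns the first replica with the smallest count,
        # so ties go to the replica listed earliest.
        replica = min(self.active, key=self.active.get)
        self.active[replica] += 1
        return replica

    def done(self, replica):
        # Called when a request finishes on the given replica.
        self.active[replica] -= 1
```

The round-robin balancer needs only a cursor, while shortest queue first must be told when each request completes, which is the extra state the load-aware approach pays for.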

Load Balancing Comparison
- Blind
  - General
  - Easy to implement
  - Suboptimal
- Load aware
  - Better performance
  - Requires more state
- Application aware
  - Best performance
  - Requires even more state
  - Harder to implement
  - Brittle: needs to change if the application changes

AWS Elastic Load Balancer
- Balances load between EC2 instances
- Can distribute load across availability zones
- Within an availability zone:
  - Round-robin among the least-busy instances
  - Tracks active connections per instance
  - Does not monitor CPU
- Supports sticking user sessions to specific EC2 instances

AppEngine Load Balancing
- Google is not saying

Self Provisioning
- Automatically adjust the worker pool or number of replicas to the current workload
  - Add workers/replicas when load increases
  - Retire replicas/workers when underloaded
- Approaches
  - Reactive: reconfigure when a load threshold is reached
  - Proactive: use a prediction mechanism to trigger reconfiguration
    - Reconfigure before saturation
    - Based on time series or machine learning approaches
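A reactive policy of the kind described above can be sketched in a few lines. The function name `reactive_target` and the threshold and pool-size defaults are assumptions chosen for illustration, not values from any provider.

```python
def reactive_target(current, load, high=0.75, low=0.25, min_size=1, max_size=3):
    """Return the new pool size given average utilization `load` in [0, 1].

    Thresholds and size limits are illustrative assumptions.
    """
    if load > high:
        target = current + 1      # add a worker/replica when load is high
    elif load < low:
        target = current - 1      # retire one when underloaded
    else:
        target = current          # inside the band: leave the pool alone
    # Clamp to the configured pool limits.
    return max(min_size, min(max_size, target))
```

A proactive policy would call the same kind of function with a predicted load rather than the currently measured one.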

Self Provisioning
- Retiring replicas/workers
  - Move the instance's load to the rest of the system
  - All new requests go to other nodes
  - Wait for outstanding transactions to commit
  - When idle, release
- Adding replicas/workers
  - Transfer state/database to the new replica
  - Receive all updates that occurred during the previous transfer
  - Add the replica to the pool

Self Provisioning Considerations
- Load metrics
  - Use low-level system metrics to determine node utilization: CPU, I/O utilization
  - Use application-level metrics: response time, e.g., transaction completes within X milliseconds
- Cost/latency prediction
  - How long it takes and how expensive it is to add a replica/worker
  - Reactive: the higher the latency, the lower the reconfiguration threshold
  - Proactive: the higher the latency, the farther into the future we need to predict
- Monitoring
  - Monitoring isn't free
  - The more data we collect, the higher the impact on the system

AWS Auto Scaling
- Scale automatically according to user-defined conditions
- Enabled by CloudWatch
- Create a scale up/down policy
- Associate the scaling policy with a CloudWatch alarm

AS Tutorial
- Step 1: Download the command line tools
- Step 2: Create a credentials file
    AWSAccessKeyId=<Write your AWS access ID>
    AWSSecretKey=<Write your AWS secret key>
- Step 3: Set environment variables
    AWS_AUTO_SCALING_HOME=<directory where you installed the tools>
    JAVA_HOME=<java installation home directory>
    PATH=$AWS_AUTO_SCALING_HOME/bin:$PATH
    AWS_CREDENTIAL_FILE=<the file created in 2>
- Step 4: Test
    as-cmd help
  prints the list of available commands

AS Tutorial (continued)
- Step 5: Create an EC2 security group and a load balancer
- Step 6: Create a launch configuration
    as-create-launch-config ece1779_config --image-id ami-37ec3c5e --instance-type m1.small --group default
- Step 7: Create an auto scaling group
    as-create-auto-scaling-group ece1779_group --availability-zones us-east-1a --launch-configuration ece1779_config --min-size=1 --max-size 3 --load-balancers ece1779loadbalancer --health-check-type ELB --grace-period 300
  - health-check-type can be set to EC2 (machine-level health) or ELB (app-level health)
- Step 8: Create a scaling policy
    as-put-scaling-policy ScaleUp --auto-scaling-group ece1779_group --adjustment=1 --type ChangeInCapacity
  - Other types: ExactCapacity and PercentChangeInCapacity
- Step 9: Create alarms on CloudWatch and associate them with the AS policy

AppEngine Auto Scaling
- The app owner can adjust the frontend instance class, the number of idle instances, and the pending latency.
- Can configure the number of backend nodes
  - war/WEB-INF/backends.xml
- Administer
  - AppEngine management console
  - appcfg.sh command line tool (part of the Java SDK)

Caching
- Objective:
  - Reduce load on the storage server
  - Reduce latency to access data
- Store recent/hot/frequent data in RAM
- Volatile
- Fast read/write access

memcached
- A distributed memory cache implementation
- Provides a key-value store interface
  - put(key, value)
  - value = get(key)
- Scalable and consistent temporary storage
  - Secondary system that provides fast access to data
  - Data is stored reliably somewhere else

Google memcache
- Inspired by memcached
- Commonly used to cache datastore entities by their keys
- Key is up to 250 bytes long
  - Larger keys are hashed
- Value can be up to 1 megabyte
- Single-value operations are atomic
  - No support for multi-item transactions
- Support for atomic numeric value increment/decrement

Usage Model
- Data accesses
  - Get the value from memcache
  - On a cache miss, fetch the data from the datastore and put the value in memcache
  - Operate on the value
- Data updates
  - Possible to overwrite the cached value with new data
    - No transactional way to do this
    - Update may succeed in the datastore and fail in memcache
    - Updates may happen in a different order in the datastore and in memcache
  - Instead:
    - Invalidate the item on update
    - Fetch a fresh value on the next access
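The usage model above (read through the cache, invalidate on update) can be sketched as follows. This is a minimal illustration using plain dictionaries as stand-ins for memcache and the datastore; the class name `CachedStore` and its `misses` counter are hypothetical and not part of any memcache API.

```python
class CachedStore:
    """Cache-aside reads with invalidate-on-update, using dicts as stand-ins."""

    def __init__(self, datastore):
        self.datastore = datastore   # stand-in for the reliable store
        self.cache = {}              # stand-in for the volatile cache
        self.misses = 0

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        # Cache miss: fetch from the datastore, then populate the cache.
        self.misses += 1
        value = self.datastore[key]
        self.cache[key] = value
        return value

    def update(self, key, value):
        self.datastore[key] = value
        # Invalidate instead of overwriting; the next get() refetches.
        self.cache.pop(key, None)
```

Invalidating rather than overwriting means a failed or reordered cache write cannot leave a stale value behind indefinitely; at worst the next read pays one extra datastore fetch.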

Memcache Java API
- Key and value can be any Java serializable object
- Obtain memcache pointer
    MemcacheService synccache = MemcacheServiceFactory.getMemcacheService();
- Put
- Get
- Delete
- Contains
- Increment
- Cache statistics

Replication
- Fault tolerance
  - High availability despite failures
  - If one replica fails, there is another one that can take over
- Throughput (scalability)
  - Support a large user base
  - Each replica handles a fraction of the users
- Response time
  - Support a geographically distributed user base
  - There is a replica close to the user
  - Fast local access

Stateless vs. Stateful
- Stateless
  - Services that require mainly computing
  - State can be regenerated on demand
  - Easy to replicate
- Stateful
  - Services use business-critical data
  - Good performance depends on state availability
  - Harder to replicate

Transactions
- User-defined sequence of read and write operations on data items that meets the following properties:
  - Atomicity: the transaction either executes entirely and commits, or it aborts without leaving any changes in the database (all or nothing)
  - Consistency: integrity constraints on the data are maintained
  - Isolation: provides the impression that no other transaction is executing in the system
  - Durability: once the transaction commits, the changes are reflected in the database, even in the event of failures

Replica Control
- Task of keeping data copies consistent as items are updated
- There is a set of database nodes R_A, R_B, ...
- The database consists of a set of logical data items x, y, ...
- Each logical item x has physical copies x_A, x_B, ..., where x_A resides in R_A
- A transaction is a sequence of read and write operations on logical data items
- The replica control mechanism maps the operations on the logical data items onto the physical copies

Read-One-Write-All (ROWA)
- Common replica control method
- Reads can be sent to any replica
  - Logical read operation r_i(x) of transaction T_i
  - Mapped to r_i(x_A) on one particular copy of x
- Updates are performed on all replicas
  - Logical write operation w_i(x) of transaction T_i
  - Mapped to w_i(x_A), w_i(x_B), ... on all copies of x
- ROWA works well because in most applications reads far outnumber writes (reads >> writes)
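The ROWA mapping from logical operations to physical copies can be sketched as below. This is an illustration only: the class `RowaDatabase` is hypothetical, each replica is a plain dictionary, and the sketch ignores transactions, concurrency, and failures.

```python
class RowaDatabase:
    """Maps logical reads/writes onto physical copies, ROWA style."""

    def __init__(self, replica_names):
        # One dict per replica, e.g. replicas["A"] holds the copies x_A, y_A, ...
        self.replicas = {name: {} for name in replica_names}

    def write(self, item, value):
        # Logical write w(x) becomes a write on every physical copy.
        for copy in self.replicas.values():
            copy[item] = value

    def read(self, item, replica):
        # Logical read r(x) becomes a read on one chosen copy.
        return self.replicas[replica][item]
```

Because every write touches all copies, a read at any single replica sees the latest value, which is why reads stay cheap while write cost grows with the number of replicas.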

1-copy equivalence
- A replicated system provides the same semantics as a nonreplicated system
  - 1-copy-isolation
  - 1-copy-atomicity
  - 1-copy-durability
  - 1-copy-consistency
- Requires tight coupling of replica control mechanisms with the database mechanisms that achieve atomicity, consistency, isolation, and durability

Serializability
- Best-known isolation model
- Provides the strongest isolation level
- Concurrent execution of a set of transactions must be equivalent to some possible serial execution of the set
- Conflict: two operations that access the same item, are from different transactions, and at least one is a write
- To be serializable, all conflicts have to execute in the same order
  - Ti has to appear to execute before Tj, or Tj before Ti
  - Acyclic serialization graph
- (Figure: S1 and S2 are serial executions; S3 is serializable; S4 is not serializable.)

Serializability
- Use concurrency control to enforce transaction isolation
  - Acquire a shared lock before a read
  - Acquire an exclusive lock before a write
  - Release all locks at the end of the transaction (2-phase locking)

1-copy-serializability
- First attempt: just make sure schedules are serializable at each replica
  - Does not work
- Need to order all conflicting operations in the same way
- Whenever one local schedule executes one of Ti's operations before a conflicting operation of Tj, then Ti is executed before Tj at all replicas
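The conflict definition and the acyclic serialization graph test can be made concrete with a short sketch. The schedule encoding (a list of `(transaction, op, item)` tuples) and the function name `is_conflict_serializable` are assumptions made for this illustration.

```python
def is_conflict_serializable(schedule):
    """schedule: list of (transaction, op, item) tuples, op is 'r' or 'w'.

    Builds the serialization graph and reports whether it is acyclic.
    """
    edges = set()
    for i, (ti, op_i, x_i) in enumerate(schedule):
        for tj, op_j, x_j in schedule[i + 1:]:
            # Conflict: same item, different transactions, at least one write.
            if ti != tj and x_i == x_j and "w" in (op_i, op_j):
                edges.add((ti, tj))   # ti's operation comes first
    nodes = {t for t, _, _ in schedule}
    # Repeatedly remove nodes with no incoming edge from remaining nodes;
    # if none can be removed while nodes remain, the graph has a cycle.
    while nodes:
        free = {n for n in nodes
                if not any(b == n and a in nodes for a, b in edges)}
        if not free:
            return False
        nodes -= free
    return True
```

For example, the schedule r1(x) w2(x) w2(y) w1(y) orders T1 before T2 on x but T2 before T1 on y, so its graph has a cycle and the function returns False.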

1-copy-atomicity
- A transaction executes in its entirety and commits, or it aborts and does not leave any effects on the database
- On a replicated database
  - The transaction has to reach the same decision, either all (commit) or nothing (abort), at all replicas at which it executes
  - Under ROWA:
    - All replicas for update transactions
    - One replica for read-only transactions
  - Requires some agreement protocol to force all replicas to reach the same decision on the outcome of the transaction
    - 2-phase commit

Read-One-Write-All (ROWA)
- Key design alternatives:
  - Where are updates executed
  - When are changes propagated
  - How are changes propagated

Primary Copy vs. Update Anywhere
- Primary copy
  - All updates are executed first on a single node, the primary
  - Advantages: simpler to implement
  - Disadvantages: the primary can become a bottleneck
- Update anywhere/everywhere
  - Updates and read-only requests can be sent to any replica
  - Advantages: potentially more scalable
  - Disadvantages: harder to guarantee consistency

Eager vs. Lazy
- Eager replication
  - Confirmation is only returned to the user once all secondaries have executed the update
  - Advantages: strong consistency
  - Disadvantages: replica coordination can be slow
- Lazy/asynchronous replication
  - Confirmation is returned to the client after updates are applied to a single replica
  - Updates are subsequently propagated to the other replicas
  - Advantages: fast
  - Disadvantages: weaker consistency, potential for conflicts
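The agreement step named under 1-copy-atomicity, 2-phase commit, can be outlined in a heavily simplified sketch. The function `two_phase_commit` and the representation of participants as vote callables are assumptions for illustration; a real protocol also needs logging, timeouts, and recovery, none of which are modeled here.

```python
def two_phase_commit(participants):
    """participants: dict mapping replica name -> callable returning
    True (vote commit) or False (vote abort).

    Returns the decision delivered to each replica.
    """
    # Phase 1: the coordinator collects a vote from every participant.
    votes = {name: vote() for name, vote in participants.items()}
    # Phase 2: commit only if every vote was yes; all replicas
    # receive the same decision.
    decision = "commit" if all(votes.values()) else "abort"
    return {name: decision for name in participants}
```

A single "no" vote aborts the transaction everywhere, which is what gives the all-or-nothing outcome across replicas, and the need to wait for every vote is why the protocol is slow.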

Eager Primary Copy
- Figure notation: S(y) = acquire shared lock, X(y) = acquire exclusive lock, U(y) = unlock

Eager Primary Copy Properties
- Strengths
  - 1-copy-serializability
  - 1-copy-atomicity
- Weaknesses
  - A transaction only commits when all replicas have executed its writes
    - Execution time is determined by the slowest replica
    - 2-phase commit is slow and expensive
  - The combination of eager write propagation and locking can lead to blocking in the presence of long read transactions
  - The primary may become a bottleneck, as it has to execute all reads for update transactions
  - Single point of failure

Eager Update Anywhere

Eager Update Anywhere Properties
- Strengths
  - 1-copy-serializability
  - 1-copy-atomicity
  - No single point of failure
- Weaknesses
  - Possibility of distributed deadlock
    - Adds significant complexity to the protocol
  - A transaction only commits when all replicas have executed its writes
    - Execution time is determined by the slowest replica
    - 2-phase commit is slow and expensive
  - The combination of eager write propagation and locking can lead to blocking in the presence of long read transactions

Lazy Primary Copy

Lazy Primary Copy Properties
- Strengths
  - Fast: transactions don't involve remote communication
- Weaknesses
  - Weak consistency: remote replicas have stale data
  - Does not comply with 1-copy-atomicity
  - Similar bottleneck issues as eager primary copy

Lazy Update Anywhere

Lazy Update Anywhere Properties
- Strengths
  - Fast: transactions don't involve remote communication
- Weaknesses
  - Weak consistency: remote replicas have stale data
  - Does not comply with 1-copy-atomicity or 1-copy-isolation

Processing Write Operations
- Writes have to be executed on all replicas
- Write processing is the main overhead of replication
- Symmetric update processing
  - The SQL statement is sent to all replicas
  - All replicas parse the statement, determine the tuples affected, and perform the modification/deletion/insertion
  - Pros: reuses existing mechanisms
  - Cons:
    - Redundancy
    - Execution has to be deterministic. Consider an update that sets a timestamp
- Asymmetric update processing
  - The SQL statement is executed locally on a single replica
  - The writeset (extracted changes) is sent to the other replicas
    - Identifier and after-image of each updated record
  - Pros: efficiency
  - Cons: harder to implement

Replication Architecture
- Kernel-based or white-box approach
  - Implemented inside the database
  - Pros: best integration with concurrency control and other transaction control mechanisms
  - Cons:
    - Hard to maintain and port to other databases
    - Confined to a single database system
- Middleware-based or black-box approach
  - Replication is performed by an outside component interposed between the database replicas and the client
  - Pros:
    - Does not require changes to the database
    - Portability, modularity
    - Can use different database systems
  - Cons:
    - Have to re-implement concurrency control
    - Coarse-grain locking, hence less parallelism
- Grey-box approach
  - The database provides special replication APIs
    - Example: a writeset extraction/application API
  - The middleware calls these APIs

MySQL Replication
- ROWA
- Primary copy
- Eager and lazy replication
- Symmetric and asymmetric update processing
- Full and partial replication

MySQL Isolation Levels
- SERIALIZABLE
- REPEATABLE READ (default)
  - By default, reads do not acquire locks
  - A snapshot is created at transaction start
    - Multiple reads of the same item return a consistent value
    - Writes by others will not be seen
  - Possible to acquire explicit locks
      SELECT * FROM sometable FOR UPDATE
      SELECT * FROM sometable LOCK IN SHARE MODE
- READ COMMITTED
- READ UNCOMMITTED

App Engine Datastore
- High Replication Datastore (HRD)
  - Data is replicated across multiple data centers using a consensus algorithm
  - Queries are eventually consistent
  - Pros: highest level of availability for reads and writes
  - Cons: higher latency on writes due to the propagation of data
- Master/Slave Datastore
  - One data center holds the master copy of all data
  - Data written to the master data center is replicated asynchronously to all other (slave) data centers
  - Pros: strong consistency for all reads and queries
  - Cons: temporary unavailability during data center downtime

Failures
- Machine crashes
- Network errors
  - Partitions: quorum protocols
  - Message loss: TCP handles this case
  - Message corruption: TCP handles this case

Self Healing
- Normal operation
  - Replica control protocols keep replicas consistent
  - Provide enough information to recover from failures
- Failover
  - Failure detection
  - In case of primary failure, choose a new primary
    - Voting
    - Deterministic assignment scheme, e.g., choose the surviving replica with the smallest ID
  - Deal with outstanding transactions
    - Preserve atomicity
    - All submitted transactions should either commit or abort at all remaining replicas
- Recovery
  - Restart a failed replica, or create a new replica
  - Integrate it into the replication system

Failure Tolerant Architecture
- Decouple the application to tolerate failure
- Implemented as shared-nothing independent modules
- Use reliable message queues for coordination
- When a task fails, it is just restarted
- In addition, each part of the application can scale independently
- Examples: Amazon SQS, AppEngine Task Queues

AppEngine Task Queues
- Task queues
  - Asynchronous processing
  - Image transcoding off the critical path
- Scheduled tasks
  - Periodic processing, cron jobs
  - Fetching and caching data from remote services
  - Sending daily status emails
  - Generating reports

Task Queues
- A facility for enqueueing HTTP requests
- Fault tolerant
  - Retry if the task does not succeed (success is signaled by an HTTP 200 status)
  - The retry rate uses an exponential backoff strategy
- Enqueueing a task is fast
  - About 3 times faster than writing to the datastore

Adding a Task
- Tasks consist of
  - URL
    - If null, the URL is set to /_ah/queue/queue-name
    - Default URL for the default queue: /_ah/queue/default
  - Name
    - If null, AppEngine assigns a unique name
    - The name can be used to prevent enqueueing duplicates
  - Request parameters

Configuring Task Queues
- Task queues are defined in WEB-INF/queue.xml
- An application can have up to 10 queues
- Configure queues to determine how fast tasks execute
  - bucket-size: maximum number of tokens for the queue; a task needs a token to execute
  - rate: speed at which tokens are replenished
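The bucket-size and rate settings describe a token bucket, and the retry policy is exponential backoff. Both can be sketched as follows. This is not AppEngine's implementation: the names `TokenBucket`, `try_run`, and `backoff_delays` are hypothetical, time is passed in explicitly as seconds, and the doubling factor is an assumption.

```python
class TokenBucket:
    """A task may run only if a token is available; tokens refill over time."""

    def __init__(self, bucket_size, rate):
        self.bucket_size = bucket_size   # maximum tokens the bucket can hold
        self.rate = rate                 # tokens replenished per second
        self.tokens = float(bucket_size)
        self.last = 0.0

    def try_run(self, now):
        """Return True and consume a token if a task may run at time `now`."""
        elapsed = now - self.last
        # Refill according to elapsed time, never above the bucket size.
        self.tokens = min(self.bucket_size, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def backoff_delays(base, retries):
    """Exponential backoff: the delay doubles after each failed attempt."""
    return [base * 2 ** n for n in range(retries)]
```

The bucket size bounds how large a burst of tasks can start at once, while the rate bounds the sustained throughput of the queue.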

Securing Tasks
- AppEngine's front end treats requests coming from a queue as if they were coming from a developer account
- Can use web.xml to restrict access to task URLs

Sharding
- Challenges:
  - Very large working set
    - Slows reads
    - E.g., Facebook/Google user table
  - Too many writes
- Solution:
  - Partition the data into shards
  - Assign shards to different machines
  - Denormalize the data

Sharding Strategies
- Range-based partitioning
  - E.g., usernames from a to m are assigned to shard 1, n to z to shard 2
  - Pros: simple
  - Cons: hard to get load balancing right
- Hash-based partitioning
  - Compute a hash of the key; different shards are responsible for different hash ranges
  - Pros: good load balancing
  - Cons: a little more complex
- Directory-based partitioning
  - A lookup service keeps track of the partitioning scheme
  - Pros: flexibility
  - Cons: the lookup service may become a bottleneck

Denormalization
- Data that is accessed together is stored together
  - E.g., multi-valued properties in AppEngine datastore entities
- Eliminates costly joins
- May require replication
  - E.g., a comment may be stored on both the commenter's and the commentee's profile
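The range-based and hash-based strategies above can be sketched as shard-selection functions. The function names are hypothetical, and the hash variant uses a modulo over the hash value as a simplification of assigning hash ranges to shards.

```python
import hashlib


def range_shard(username):
    """Range-based: first character up to 'm' (in character order) maps to
    shard 1, anything later maps to shard 2."""
    return 1 if username[:1].lower() <= "m" else 2


def hash_shard(key, num_shards):
    """Hash-based: a stable hash of the key selects the shard.

    Uses hash modulo num_shards, a simplification of hash ranges.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards
```

A stable hash such as MD5 is used instead of Python's built-in `hash()`, whose output for strings varies between processes and would send the same key to different shards after a restart.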

Sharding
- Pros
  - High availability
  - Faster queries
  - More write bandwidth
- Cons
  - Queries get more complicated
    - Need to figure out which shard to target
    - May need to join data from multiple shards
  - Referential integrity
    - Foreign key constraints are hard to enforce
    - Not supported in many databases
