CNA Cloud-native applications

CNA Cloud-native applications
Prof. Dr. Marcel Graf, Prof. Dr. Thomas Michael Bohnert
TSM-ClComp-EN Cloud Computing, (C) 2017 HEIG-VD

Cloud-native applications
- Cloud-native application
- Service Oriented Architecture / Microservice Architecture
- Twelve-factor applications
- Cloud Native Application Design Patterns
- Serverless applications

Benefits & Drawbacks of Cloud Computing

Benefits for application providers:
- Obtain IT resources on demand (compute, storage, network, services, ...)
- Speed up the development / deployment cycle and improve time-to-market
- Pay-as-you-go pricing model optimizes costs

Drawbacks for application providers:
- Cloud infrastructures are built on commodity hardware to leverage economies of scale, which increases the failure rate
- The cloud infrastructure is shared by its customers (resource pooling), which can negatively influence performance (e.g. the noisy neighbour problem)

Goal: run applications economically efficiently by avoiding
- over-provisioning (paying for unused resources)
- under-provisioning (degrading QoS, violating SLAs)

Characteristics of CNA

A cloud-native application is an application optimized for running on a cloud infrastructure (IaaS or PaaS), having the essential characteristics of being scalable and resilient.
- One of the main goals of a cloud-native application is to use its underlying cloud resources as economically efficiently as possible.
- Each phase in the application life-cycle has to be adapted and optimized for running in a cloud environment.
- A cloud-native application is typically designed as a distributed application built up from stateless components (employing asynchronous communication).
- It is also possible to get there by migrating (re-designing) an already existing application; strictly speaking such an application cannot be called cloud-native, though this is just a question of semantics.

Cloud-native application implementation

To implement CNA, the following topics need to be addressed:
- Architecture: the application needs to be designed for scalability and resilience
  - move towards a Service Oriented Architecture / Microservices Architecture
  - use of CNA-specific patterns
- Organization: teams need to be (re)organized into agile teams structured around business capabilities, incorporating DevOps principles and methodologies
- Process: tools and technologies need to be adapted/extended with an automated software development / deployment / management pipeline

Methodologies and guidelines that support cloud-native application development:
- Service Oriented Architecture (SOA) / Microservices Architecture
- Twelve-Factor Application
- CNA Patterns

Service Oriented Architecture (SOA)

Open Group definition: "Service-Oriented Architecture (SOA) is an architectural style that supports service-orientation. Service-orientation is a way of thinking in terms of services and service-based development and the outcomes of services."

Service: a self-contained unit of functionality that can be accessed in a remote, standardized, technology-independent fashion.

SOA principles:
- Standardized protocols (e.g., SOAP, REST)
- Abstraction (from the service implementation)
- Loose coupling
- Reusability
- Composability
- Stateless services
- Discoverable services

Microservices Introduction

Microservices architecture is an SOA architectural style that develops applications as a suite of small services, each running in its own process and communicating with lightweight mechanisms (e.g. REST APIs).
- They are built around business capabilities, following the "do one thing well" principle.
- They are independently deployable in a fully automatic way.
- There is only minimal centralized management of the services.

The term was coined in 2011 at a software architects' workshop and formalized by Martin Fowler and others in 2014, but the idea emerged much earlier among cloud-oriented companies:
- Amazon in the early 2000s moved from the monolithic Obidos application to a service-oriented architecture.
- Netflix (whose infrastructure is completely cloud-based) did a major redesign of their system along a microservice architecture.
- Pivotal is one of the main promoters of microservices today.

For more details about microservices, see the appendix.

Twelve-Factor Applications Introduction

A collection of best practices for cloud-native application architectures, originally developed by engineers at Heroku. It is a methodology for building software-as-a-service apps that:
- are suitable for deployment on modern cloud platforms, obviating the need for servers and systems administration (supports the development / deployment workflow);
- can scale up without significant changes to tooling, architecture, or development practices (scalability).

Published in 2011: http://12factor.net/

Twelve Factors
I. Codebase
II. Dependencies
III. Config
IV. Backing services
V. Build, release, run
VI. Processes
VII. Port binding
VIII. Concurrency
IX. Disposability
X. Dev/prod parity
XI. Logs
XII. Admin processes

Factor I: Codebase. One codebase tracked in revision control, many deploys.
A twelve-factor app is always tracked in a version control system (Git, Mercurial, Subversion, ...). There is only one codebase per app, but there will be many deploys of the app. The codebase is the same across all deploys, although different versions may be active in each deploy.

Factor II: Dependencies. Explicitly declare and isolate dependencies.
A twelve-factor app never relies on the implicit existence of system-wide packages. It declares all dependencies, completely and exactly, via a dependency declaration manifest, and uses a dependency isolation tool to ensure that no implicit dependencies leak in from the surrounding system. The advantage of explicit dependency declaration is that it simplifies setup for developers new to the app: they just run a build command which sets up everything deterministically.
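In the Python ecosystem, for example, the declaration manifest can be a pinned requirements file installed into an isolated virtualenv (the package names and versions below are purely illustrative):

```text
# requirements.txt -- complete, exact dependency declaration
flask==1.0.2
redis==3.2.1
requests==2.21.0
```

Running `pip install -r requirements.txt` inside a virtualenv then reproduces the same dependency set deterministically on any developer machine or deploy.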

Factor III: Config. Store config in the environment.
An app's config is everything that is likely to vary between deploys (staging, production, developer environments, etc). This includes:
- resource handles to the database, Memcached, and other backing services;
- credentials to external services such as Amazon S3 or Twitter;
- per-deploy values such as the canonical hostname for the deploy.
Twelve-factor requires strict separation of config from code and stores config in environment variables. Env vars are easy to change between deploys without changing any code, and unlike config files there is little chance of them being checked into the code repo accidentally.
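As a minimal sketch, a Python app can read all per-deploy configuration from environment variables (the variable names such as DATABASE_URL and S3_BUCKET are illustrative, though common conventions):

```python
import os

# All deploy-specific configuration comes from the environment,
# never from code or checked-in config files.
class Config:
    def __init__(self, env=os.environ):
        self.database_url = env.get("DATABASE_URL", "sqlite:///dev.db")
        self.s3_bucket = env.get("S3_BUCKET", "")
        self.canonical_host = env.get("CANONICAL_HOST", "localhost")

# A production deploy simply sets different env vars; the code is unchanged.
cfg = Config({"DATABASE_URL": "postgres://db.example/prod"})
print(cfg.database_url)
```

The same code runs unchanged in every deploy; only the environment differs.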

Factor IV: Backing services. Treat backing services as attached resources.
A backing service is any service the app consumes over the network as part of its normal operation (databases / datastores, messaging / queueing systems, SMTP services, caching, ...). The code for a twelve-factor app makes no distinction between local and third-party services. To the app, both are attached resources, accessed via a URL or other locator/credentials stored in the config. At runtime, resources can be attached to and detached from deploys at will, without any code changes.

Factor V: Build, release, run. Strictly separate build and run stages.
The twelve-factor app uses strict separation between the build, release, and run stages. For example, it is impossible to make changes to the code at runtime, since there is no way to propagate those changes back to the build stage. Builds are initiated by the app's developers whenever new code is deployed. Runtime execution, by contrast, can happen automatically in cases such as a server reboot, or a crashed process being restarted by the process manager.

Factor VI: Processes. Execute the app as one or more stateless processes.
Twelve-factor processes are stateless and share-nothing. The twelve-factor app never assumes that anything cached in memory or on disk will be available on a future request. Any data that needs to persist must be stored in a stateful backing service, typically a database. Sticky sessions are a violation of twelve-factor and should never be used or relied upon; session state data is a good candidate for a datastore that offers time-expiration, such as Memcached or Redis.

Factor VII: Port binding. Export services via port binding.
The twelve-factor app is completely self-contained and does not rely on runtime injection of a webserver into the execution environment to create a web-facing service. The web app exports HTTP as a service by binding to a port, and listening to requests coming in on that port. One app can become the backing service for another app, by providing the URL to the backing app as a resource handle in the config for the consuming app.

Factor VIII: Concurrency. Scale out via the process model.
Apps handle diverse workloads by assigning each type of work to a process type. For example, HTTP requests may be handled by a web process, and long-running background tasks by a worker process. App processes are scaled out horizontally and independently for each process type. The array of process types and the number of processes of each type is known as the process formation.

Factor IX: Disposability. Maximize robustness with fast startup and graceful shutdown.
The twelve-factor app's processes are disposable, meaning they can be started or stopped at a moment's notice:
- minimize startup time;
- shut down gracefully when receiving a termination signal.
This facilitates fast elastic scaling, rapid deployment of code or config changes, and robustness of production deploys. Processes should also be robust against sudden death, in the case of a failure in the underlying hardware.

Factor X: Dev/prod parity. Keep development, staging, and production as similar as possible.
Continuous delivery and deployment are enabled by keeping development, staging, and production environments as similar as possible. Avoid gaps between development and production:
- avoid the time gap: write code and have it deployed hours or even just minutes later;
- avoid the personnel gap: the developers who wrote the code are closely involved in deploying it and watching its behavior in production;
- avoid the tools gap: use the same tools in development and production.

Factor XI: Logs. Treat logs as event streams.
All running processes and backing services produce logs, which are commonly written to a file on disk. A twelve-factor app never concerns itself with routing or storage of its output stream. In staging or production deploys, each process stream is captured by the execution environment, collated together with all other streams from the app, and routed to one or more final destinations for viewing and long-term archival. The stream can be sent to a log indexing and analysis system such as Splunk. This allows great power and flexibility for introspecting an app's behavior over time, including:
- finding specific events in the past;
- large-scale graphing of trends (such as requests per minute);
- active alerting according to user-defined heuristics (such as an alert when the quantity of errors per minute exceeds a certain threshold).
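A minimal Python sketch of the idea: the app writes one event per line to an output stream and leaves routing and storage to the execution environment. Here the stream is a StringIO only so the resulting log format is visible; a real app would write to stdout:

```python
import io
import logging

# The app emits its event stream; the environment decides where it goes.
stream = io.StringIO()  # stand-in for sys.stdout
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))

log = logging.getLogger("webapp")
log.addHandler(handler)
log.setLevel(logging.INFO)

log.info("request handled path=/products status=200")
print(stream.getvalue().strip())
```

The execution environment (or a log router) can then collate this stream with others and forward it to an indexing system.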

Factor XII: Admin processes. Run admin/management tasks as one-off processes.
Administrative or management tasks, such as database migrations, should be kept in source control and packaged with the application. They are executed as one-off processes in environments identical to the app's long-running processes.

Design patterns for CNA Introduction

After companies started embracing the cloud and building applications for it, best practices and new design patterns for such applications emerged. They all came into existence out of the need to deal with this new environment and its new challenges. The aim of most of the patterns is to make the application more scalable and/or resilient.

Service Registry / Configuration Management

Problem: all components/services in a cloud are dynamic; over its lifecycle, every resource/component/service can be provisioned and disposed at any time. How do clients of a service and/or routers know about the available instances of a service?

Main ideas:
- The service registry manages information about available services:
  - configuration of the services
  - endpoint of the services / how to reach a service
  - status of a service
- Service instances are registered on startup and deregistered on shutdown.
- The registry can be polled/used by other components/services (e.g. a load balancer).
- Mostly implemented as a highly available key-value store.

Implementations: etcd, Consul, ZooKeeper, Netflix Eureka
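A service registry can be sketched as a key-value store mapping service names to live instance endpoints. This toy version is in-memory and single-process; real registries such as etcd or Consul are distributed and highly available, and the addresses below are illustrative:

```python
# Minimal in-memory service registry sketch.
class ServiceRegistry:
    def __init__(self):
        self._services = {}  # service name -> {instance_id: endpoint}

    def register(self, name, instance_id, endpoint):
        # Called by an instance on startup.
        self._services.setdefault(name, {})[instance_id] = endpoint

    def deregister(self, name, instance_id):
        # Called on shutdown (or by a health manager on failure).
        self._services.get(name, {}).pop(instance_id, None)

    def lookup(self, name):
        # Used by clients or load balancers to find live endpoints.
        return list(self._services.get(name, {}).values())

registry = ServiceRegistry()
registry.register("catalog", "i-1", "10.0.0.5:8080")
registry.register("catalog", "i-2", "10.0.0.6:8080")
registry.deregister("catalog", "i-1")
print(registry.lookup("catalog"))  # ['10.0.0.6:8080']
```

A load balancer polling `lookup()` always sees the current set of instances, which is the core of the pattern.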

Service Request Flow

A complex chain of services, each calling one or more other services, may result in high latency.

Source: http://techblog.netflix.com/2015/02/a-microscope-on-microservices.html

Circuit Breaker

Problem: in a distributed environment, remote services or shared resources might become unavailable at some point. This may affect the response time of dependent services.

Main idea:
- A failed service should not influence other parts of a distributed system.
- Fail fast and allow the invoking service to react.

Implementation: a circuit breaker is a proxy that sits between your application and a remote service or shared resource that your application accesses. If a request to the remote service or shared resource is highly likely to fail, the circuit breaker proxy does not forward the request and immediately responds with an alternative default result or an error message. This achieves two things:
- The remote service or shared resource will not be flooded with requests, which could amplify the problem.
- The application won't waste resources (RAM, CPU) on requests that in the end will time out or never return.

Circuit Breaker - State Machine

Source: Cloud Design Patterns: Prescriptive Architecture Guidance for Cloud Applications

The circuit breaker is implemented as a state machine with three states mimicking an electric circuit breaker: Closed, Open and Half-Open.
- Closed: the circuit breaker passes every request. If a request fails, a failure counter is increased; if the failure counter surpasses a certain threshold, the circuit breaker changes its state to open. The failure counter is reset after a certain amount of time (this prevents the state from changing to open if only sporadic failures occur).
- Open: the circuit breaker immediately returns an error when asked to forward a request. After a certain amount of time it changes to the half-open state. Alternatively, the circuit breaker may itself sporadically send some requests/pings to see if the service is responsive again, and change to the half-open state after confirming it.
- Half-Open: the circuit breaker lets some requests pass while still responding with an immediate error to most requests, so as not to overwhelm a possibly recuperating service. It counts the successful requests and changes its state to closed once a certain number of successful requests have passed. If a request fails, the state immediately changes back to open.
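The three-state machine described above can be sketched as follows; the thresholds and timeout values are illustrative, and the periodic reset of the closed-state failure counter is omitted for brevity:

```python
import time

# Circuit breaker state machine sketch: closed -> open -> half-open -> closed.
class CircuitBreaker:
    def __init__(self, failure_threshold=3, recovery_timeout=30.0,
                 success_threshold=2, clock=time.monotonic):
        self.state = "closed"
        self.failures = 0
        self.successes = 0
        self.opened_at = None
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.success_threshold = success_threshold
        self.clock = clock

    def allow_request(self):
        if self.state == "open":
            if self.clock() - self.opened_at >= self.recovery_timeout:
                self.state = "half-open"   # probe the service again
                self.successes = 0
                return True
            return False                   # fail fast, do not forward
        return True                        # closed or half-open: forward

    def record_success(self):
        if self.state == "half-open":
            self.successes += 1
            if self.successes >= self.success_threshold:
                self.state = "closed"      # service has recovered
                self.failures = 0
        else:
            self.failures = 0              # sporadic failure forgiven

    def record_failure(self):
        if self.state == "half-open" or self.failures + 1 >= self.failure_threshold:
            self.state = "open"            # stop forwarding requests
            self.opened_at = self.clock()
        else:
            self.failures += 1
```

The caller wraps each remote call: check `allow_request()`, then report the outcome with `record_success()` or `record_failure()`.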

Service Load Balancer

Problem: a single cloud service implementation has a finite capacity, which leads to runtime exceptions, failures and performance degradation when its processing thresholds are exceeded.

Main idea: redundant deployments of the cloud service are created and a load balancing system is added to dynamically distribute workloads across the cloud service implementations. A service load balancer is a piece of software logic which receives requests and forwards each of them to one of multiple recipients. There are multiple algorithms by which the forwarding decision can be taken, for example:
- Round-robin: requests are distributed in a round-robin fashion.
- Least-connection: a request is sent to the recipient with the fewest current connections.
- Source: requests from the same source are sent to the same recipient as on former connections (used for sticky sessions).
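The simplest of these algorithms, round-robin, reduces to cycling over the known instances (the instance addresses below are illustrative):

```python
import itertools

# Round-robin forwarding sketch over a fixed set of service instances.
class RoundRobinBalancer:
    def __init__(self, instances):
        self._cycle = itertools.cycle(instances)

    def next_instance(self):
        # Each call returns the next instance in rotation.
        return next(self._cycle)

lb = RoundRobinBalancer(["10.0.0.5:8080", "10.0.0.6:8080"])
print([lb.next_instance() for _ in range(3)])
# ['10.0.0.5:8080', '10.0.0.6:8080', '10.0.0.5:8080']
```

In a real deployment the instance list would come from the service registry rather than being hard-coded.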

Service Load Balancer variants

Dedicated / central load balancer:
- Runs in a separate process
- Shared by clients, often shared across multiple services
- Possible bottleneck
- Example: HAProxy

Client-side load balancer:
- Implemented and used as a library; runs in the client process
- The client decides which service instance to connect to
- Needs to be kept in sync with the available instances (requires a service registry)
- Example: Netflix Ribbon

API Gateway

Problem: a large application may consist of many endpoints, especially in a microservices architecture. If the client (e.g. a browser) called these directly, it would cause high network traffic and increased latency, and might even need to know internal structures or support multiple protocols. This makes it difficult to change the internal structure or implementation.

Main idea: provide clients with a single, simplified entry point to the backend.
- May provide client-specific API variants (browser, mobile app - iOS or Android, IoT devices)
- May access tens (or hundreds) of internal services (microservices) concurrently
- Aggregates the responses and transforms them to meet the client's needs
- An implementation of the facade pattern
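The aggregation role of a gateway can be sketched with the internal microservices replaced by stubs; all service names, fields and payloads here are hypothetical:

```python
# Stubbed internal microservices (in reality, remote calls over REST).
def product_service(product_id):
    return {"id": product_id, "name": "Widget"}

def review_service(product_id):
    return [{"rating": 5}, {"rating": 4}]

def inventory_service(product_id):
    return {"in_stock": 12}

# The gateway fans one client request out to several internal services,
# then aggregates and reshapes the responses for the client.
def product_page_gateway(product_id):
    product = product_service(product_id)
    reviews = review_service(product_id)
    stock = inventory_service(product_id)
    return {
        "product": product["name"],
        "avg_rating": sum(r["rating"] for r in reviews) / len(reviews),
        "available": stock["in_stock"] > 0,
    }

print(product_page_gateway(42))
```

The client makes one call and never learns how many internal services exist, so the backend structure can change freely.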

API Gateway - problem
Source: https://spring.io/blog/2014/11/24/springone2gx-2014-replay-developing-microservices-for-paas-with-spring-and-cloud-foundry

API Gateway - solution
Source: https://spring.io/blog/2014/11/24/springone2gx-2014-replay-developing-microservices-for-paas-with-spring-and-cloud-foundry

Health Endpoint Monitoring

Problem: how to detect and report the status (health) of your application? Even if the process is still running, the application could have crashed, be stuck, or have some other problem.

Main idea: implement functional checks within an application that external tools can access through exposed endpoints at regular intervals. This pattern helps verify that applications and services are performing correctly.

Implementation: a health monitoring check typically combines two factors:
- the checks performed by the application/service in response to the request to the health verification endpoint;
- analysis of the result by the tool or framework that performs the health verification check.

Typical checks that the monitoring tools can perform include:
- Validating the (HTTP) response code
- Checking the content of the response to detect errors
- Measuring the response time, which indicates a combination of the network latency and the time the application took to execute the request
- Checking resources or services located outside the application, such as a content delivery network (CDN) used by the application to deliver content from global caches
- Checking for expiration of SSL certificates
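The application side of the pattern can be sketched as a function that runs the internal checks and maps them to an HTTP-style status document; the individual checks here are stand-ins for real probes (a cheap database query, a cache PING, etc.):

```python
# Health endpoint sketch: the app reports a status a monitor can poll.
def check_database():
    return True   # stand-in: e.g. issue a cheap test query

def check_cache():
    return True   # stand-in: e.g. PING the cache server

def health():
    checks = {"database": check_database(), "cache": check_cache()}
    status = 200 if all(checks.values()) else 503
    body = {"status": "ok" if status == 200 else "degraded",
            "checks": checks}
    return status, body

print(health())
```

A monitoring tool polling this endpoint validates the response code and inspects the per-check results to locate the failing subsystem.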

Health Endpoint Monitoring
Source: Cloud Design Patterns: Prescriptive Architecture Guidance for Cloud Applications

Health Manager

Problem: applications may fail at any time (hardware defect, VM failure, programming errors such as memory leaks, etc.). How to make sure that the desired number of instances of each program and service is operational?

Main idea: monitor the application processes and status (health) of applications (e.g. by checking the health endpoints), and assure that the set of deployed resources for a service is consistent with the desired state of the service and functioning correctly.

Implementation: the health manager uses a specification of the desired state, compares it with the actual state, and automatically restarts failed components.

Examples: Fleet (CoreOS), Kubernetes, Cloud Foundry HM9000
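The desired-state versus actual-state comparison can be sketched as a reconciliation function; here the "restart" is only recorded as an intended action, and the state dictionaries are illustrative:

```python
# Health manager sketch: compare desired instance counts with the actual
# state and emit start actions for whatever is missing.
def reconcile(desired, actual):
    actions = []
    for service, want in desired.items():
        have = actual.get(service, 0)
        for _ in range(want - have):   # negative gap -> no actions
            actions.append(("start", service))
    return actions

desired_state = {"web": 3, "worker": 2}
actual_state = {"web": 2, "worker": 2}   # one web instance has died
print(reconcile(desired_state, actual_state))  # [('start', 'web')]
```

Systems such as Kubernetes run this kind of reconciliation in a continuous control loop, so failed components are replaced automatically.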

And more Cloud Design Patterns...

- Valet Key: use a token or key that gives clients restricted direct access to a specific resource or service, in order to offload data transfer operations from the application code.
- Throttling: manage the consumption of resources used by an instance of an application, an individual tenant, or an entire service, by throttling access under high load, with the goal of staying operational.
- Queue-Based Load Levelling: use a queue that acts as a buffer between a task and a service that it invokes, in order to smooth intermittent heavy loads.
- Competing Consumers: enable multiple concurrent consumers to process messages received on the same messaging channel, to optimize throughput.
- Event Sourcing: rather than storing just the current state, record the full series of events that describe actions taken on data in a domain; the events are replayed to reconstruct the current state.
- Command Query Responsibility Segregation (CQRS): split the system into two parts. The command side handles create, update and delete requests; the query side handles queries using one or more materialized views of the application's data.

See details in the appendix, and even more at https://msdn.microsoft.com/en-us/library/dn568099.aspx
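The Event Sourcing pattern in the list above can be sketched with a toy account whose balance is never stored directly but always replayed from the event log (the event schema is illustrative):

```python
# Event sourcing sketch: state is reconstructed by replaying events.
events = [
    {"type": "deposited", "amount": 100},
    {"type": "withdrawn", "amount": 30},
    {"type": "deposited", "amount": 5},
]

def replay(events):
    balance = 0
    for e in events:
        if e["type"] == "deposited":
            balance += e["amount"]
        elif e["type"] == "withdrawn":
            balance -= e["amount"]
    return balance

print(replay(events))  # 75
```

Because the full history is kept, the same log can also feed the materialized views of a CQRS query side, or be replayed up to any point in time for auditing.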

Serverless computing Introduction

Serverless computing refers to applications that significantly depend on custom code run in ephemeral containers in the cloud (Function-as-a-Service, FaaS). FaaS was first introduced by AWS Lambda (2014), followed by Google Cloud Functions (2016).

FaaS characteristics:
- Stateless compute containers
- Event-triggered
- Ephemeral (a container may last only for one invocation)
- Fully managed by the cloud service provider (including autoscaling)
- Pricing based on the number of requests and the time the code executes (in 100-millisecond increments)

When combining a FaaS backend with a single-page application front-end, one obtains a serverless web app. "Serverless" means that the organization building and supporting a serverless application is not looking after server hardware or IaaS virtual machine instances or PaaS instances, instead delegating everything to the cloud provider.

Serverless computing AWS Lambda

Originally designed for use cases such as image uploads, responding to website clicks, or reacting to sensor readings from an IoT device.


Serverless computing AWS Lambda

The code that runs on AWS Lambda is called a Lambda function.
- Lambda functions must be stateless.
- Maximum execution time is 300 seconds (configurable timeout, default 3 seconds).
- Inbound network connections are blocked, but outbound network connections are allowed.
- Supported languages: Java, Node.js, C#, Python.
- Code can include custom libraries, even native ones; this makes it possible to run arbitrary executables (on Amazon Linux).
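In Python, a Lambda function takes the shape of a stateless handler receiving the triggering event and a runtime context object. The event fields below are illustrative; the real event structure depends on the triggering service (an S3 event, for instance, carries a "Records" list):

```python
# Sketch of a Python Lambda handler: stateless, event in, response out.
def handler(event, context):
    # 'event' is supplied by the trigger; 'context' carries runtime metadata.
    name = event.get("name", "world")
    return {"statusCode": 200, "body": "Hello, %s!" % name}

# Locally we can invoke the handler directly with a fake event.
print(handler({"name": "Lambda"}, None))
```

Any state the function needs across invocations must live in a backing service (S3, DynamoDB, ...), since the container may exist only for a single call.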

Serverless computing AWS Lambda Triggers

A Lambda function can be invoked via HTTPS, synchronously or asynchronously, using Amazon API Gateway. One can also schedule Lambda functions to be invoked at a specific time. The service is tightly integrated with other AWS services, which can act as event sources to trigger Lambda functions. A Lambda function can:
- respond to changes in an Amazon S3 (object store) bucket
- respond to updates in an Amazon DynamoDB (NoSQL DBaaS) table
- process records in an Amazon Kinesis (publish/subscribe messaging) stream
- respond to notifications sent by Amazon Simple Notification Service (SNS)
- respond to emails sent to Amazon Simple Email Service (SES)
- respond to Amazon CloudWatch (monitoring) alarms
- process CloudWatch log events
- respond to changes in user or device data managed by Amazon Cognito
- respond to changes in a CodeCommit (version control) repository
- respond to changes in AWS Config resource configurations
- respond to Alexa voice assistant events
- respond to Lex (conversational interface) events

Serverless computing A serverless web app

Example of a traditional web application transformed into a serverless application:
1. Part of the app executes client-side, using JavaScript; code, HTML and CSS are downloaded from a static file server.
2. Authentication logic is moved to a third-party authentication service.
3. The client accesses product listings directly from the product database.
4. Client-side code keeps track of the user session, does page navigation, etc.
5. The compute-heavy search is implemented as a Lambda function.
6. The purchase operation is implemented as a Lambda function as well, kept server-side for security reasons.

(Diagram: in the traditional architecture, the client/browser talks to a single web app running on a server, VM or instance; in the serverless architecture, the client talks directly to the static file server, the authentication service and the product database, and reaches the search and purchase Lambda functions through an API Gateway, with the purchase function backed by its own purchase database.)

References Matt Stine, Migrating to Cloud-Native Application Architectures, O'Reilly Media, 2015-02. Martin Fowler, Microservices, 2014-03, http://www.martinfowler.com/articles/microservices.html. Irakli Nadareishvili, Ronnie Mitra, Matt McLarty, Mike Amundsen, Microservice Architecture, O'Reilly Media, 2016-07. Alex Homer, John Sharp, Larry Brader, Masashi Narumoto, Trent Swanson, Cloud Design Patterns, Microsoft Press, 2014-02. 41

Appendix 42

Life-cycle of a cloud application 43

Microservices Monolithic vs. Microservices architecture Source: http://www.martinfowler.com/articles/microservices.html 44

Microservices Componentization around services Any well-architected system is based on modular components. Traditionally, components have been encapsulated in libraries, with interfaces defined using programming-language mechanisms. Microservices componentize around services. Services are accessed over the network, with interfaces defined by RPC mechanisms (e.g., REST APIs). Services are independently deployable (no need to deploy the whole monolith when a component changes). Services are independently scalable (each able to accommodate its individual processing load). [Diagram: a monolith whose UI, Shipment, Accounting, Supply and Warehouse components share one database, vs. the same components as microservices, each accessed via GET requests and each owning its own database] 45

Microservices Organized around Business Capabilities Conway's law: "Any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization's communication structure." In large organizations it is common to see teams split along UI, server-side logic and database (silos). Because simple changes require burdensome cross-team approval, each team starts to develop code workarounds. In the microservice approach, teams are split along business capabilities. Teams are cross-functional, with skills across the whole stack. Team size follows the two-pizza principle: the whole team can be fed by two pizzas. Siloed functional teams lead to a siloed application architecture; cross-functional teams are organized around business capabilities. 46

Microservices Monolithic vs. Microservices architecture
Architecture. Monolithic: built as a single logical executable (typically the server-side part of a three-tier client-server-database architecture). Microservices: built as a suite of small services, each running separately and communicating with lightweight mechanisms.
Modularity. Monolithic: based on language features. Microservices: based on business capabilities.
Agility. Monolithic: changes to the system involve building and deploying a new version of the entire application. Microservices: changes can be applied to each service independently.
Scaling. Monolithic: the entire application is scaled horizontally behind a load balancer. Microservices: each service is scaled independently when needed.
Implementation. Monolithic: typically written in one language. Microservices: each service implemented in the language that best fits the need.
Maintainability. Monolithic: a large code base is intimidating to new developers. Microservices: a smaller code base is easier to manage.
Persistence. Monolithic: one single database holding all the data. Microservices: each service has its own independent database. 47

More patterns 48

etcd: An example of a service registry Distributed key-value store designed for shared configuration and service discovery. Implements the Raft consensus algorithm, handling machine failures, master election, etc. Actions: read, write, listen. Data structure: /folder, /folder/key. REST API with an easy-to-use client, etcdctl. Read/write a value: etcdctl get /folder/key ; etcdctl set /folder/key value. Read/create a directory: etcdctl mkdir /folder ; etcdctl ls /folder. Listen to changes: etcdctl watch /folder/key ; etcdctl exec-watch /folder/key -- /bin/bash -c "touch /tmp/test". Slide credit: Martin Blöchlinger, Migrating an Application into the Cloud with Docker and CoreOS 49

Retry Pattern Problem In a distributed environment like the cloud you will have to deal with transient errors: errors that occur only for a very short time and are resolved by the system. Main Idea If a service/resource responds with an error and the error message indicates the fault might be transient, first retry the request before assuming the service/resource is down. If the fault indicates that the failure is not transient, or is unlikely to be resolved if repeated, the application should abort the operation and report a suitable exception. Possible reasons for transient errors: temporary overload of the service, network interruption, a corrupted packet. Different implementation strategies apply depending on the kind of application: retry with no time delay (immediate), retry with a fixed time delay, or retry with an increasing (e.g., exponential backoff) delay. Source: Cloud Design Patterns: Prescriptive Architecture Guidance For Cloud Applications 50
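A minimal sketch of the retry-with-exponential-backoff strategy. `TransientError` and `flaky_call` are illustrative assumptions, not part of any real API.

```python
import time

class TransientError(Exception):
    """An error that may resolve itself if the call is retried."""

def call_with_retry(operation, max_attempts=4, base_delay=0.01):
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # not resolving itself: abort and report
            # Exponential backoff: 1x, 2x, 4x, ... the base delay.
            time.sleep(base_delay * (2 ** (attempt - 1)))

# Usage: a call that fails twice with a transient error, then succeeds.
attempts = {"n": 0}
def flaky_call():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TransientError("temporary overload")
    return "ok"
```

A real implementation would inspect the error to decide whether it is actually transient before retrying, as the pattern prescribes.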

Valet Key Pattern Problem When offering static data like images or videos, the process of up- or downloading that media can consume a large amount of resources, resulting in (unnecessarily) high costs. Main Idea Use a token or key that provides clients with restricted, direct access to a specific resource or service, in order to offload data transfer operations from the application code. Example: http://docs.openstack.org/juno/config-reference/content/object-storage-tempurl.html Source: Cloud Design Patterns: Prescriptive Architecture Guidance For Cloud Applications 51
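A sketch of the valet-key idea: the application signs a short-lived URL granting direct, restricted access to one object, so the bytes never flow through the application servers. The function names and exact signing format here are illustrative assumptions; real services (OpenStack Swift tempurl, S3 pre-signed URLs) define their own formats.

```python
import hmac
import time
from hashlib import sha256

SECRET_KEY = b"server-side-secret"  # known only to app and storage service

def make_valet_url(method, path, lifetime_seconds=300):
    """Sign a URL that allows `method` on `path` until it expires."""
    expires = int(time.time()) + lifetime_seconds
    message = f"{method}\n{expires}\n{path}".encode()
    sig = hmac.new(SECRET_KEY, message, sha256).hexdigest()
    return f"{path}?temp_url_sig={sig}&temp_url_expires={expires}"

def verify_valet_url(method, path, sig, expires):
    """The storage service checks the key without contacting the app."""
    if int(expires) < time.time():
        return False  # key has expired
    message = f"{method}\n{expires}\n{path}".encode()
    expected = hmac.new(SECRET_KEY, message, sha256).hexdigest()
    return hmac.compare_digest(sig, expected)
```

Note that the HTTP method is part of the signed message, so a key issued for downloading (GET) cannot be reused for uploading (PUT).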

Throttling Pattern Problem A CNA always has a limit on immediately available (allocated) resources (e.g., CPU, RAM, storage). Bursts in user requests can overwhelm the application (poor performance). Throttling a service can mitigate the problem until scaling has been performed or the situation has normalized. Main Idea Control the consumption of resources used by an instance of an application, an individual tenant, or an entire service. This pattern can allow the system to continue to function and meet service level agreements, even when an increase in demand places an extreme load on resources. Throttling Strategies Reject requests from an individual user who has already accessed system APIs more than n times over a given period of time. Disable or degrade the functionality of selected nonessential services so that essential services can run unimpeded with sufficient resources; e.g., if the application is streaming video output, it could switch to a lower resolution. Defer operations being performed on behalf of lower-priority applications or tenants; these operations can be suspended or curtailed, with an exception generated to inform the tenant that the system is busy and that the operation should be retried later. 52
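The first strategy (rejecting requests from a user who exceeds a rate) is commonly implemented with a token bucket per user. This is a minimal sketch; the class and parameter names are illustrative assumptions.

```python
import time

class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate = rate          # tokens replenished per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # request is rejected (throttled)

# One bucket per user: 1 request/second on average, bursts of up to 3.
buckets = {}
def handle_request(user):
    bucket = buckets.setdefault(user, TokenBucket(rate=1, capacity=3))
    return bucket.allow()
```

Each user gets an independent bucket, so one tenant exhausting its quota does not affect the others.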

Throttling Pattern: Example Figure 1 - Graph showing resource utilization against time for applications running on behalf of three users Figure 2 - Graph showing the effects of combining throttling with autoscaling Source: Cloud Design Patterns: Prescriptive Architecture Guidance For Cloud Applications 53

Queue-Based Load Leveling Pattern Problem It is possible that a service might experience peaks in demand that cause it to become overloaded and unable to respond to requests in a timely manner. Flooding a service with a large number of concurrent requests may also result in the service failing if it is unable to handle the contention that these requests could cause. Main Idea Use a queue that acts as a buffer between a task and a service that it invokes, in order to smooth intermittent heavy loads that may otherwise cause the service to fail or the task to time out. It is also possible to use the queue as an indicator for the auto-scaling mechanism: instantiate more service instances when the message queue contains a certain number of messages (see also the Competing Consumers Pattern). Source: Cloud Design Patterns: Prescriptive Architecture Guidance For Cloud Applications 54
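A minimal in-process sketch of the pattern using Python's standard `queue` and `threading` modules: a burst of tasks is absorbed by the queue while a single consumer drains it at its own pace. All names are illustrative assumptions; in production the queue would be a managed messaging service.

```python
import queue
import threading

task_queue = queue.Queue()
results = []

def consumer():
    while True:
        task = task_queue.get()
        if task is None:          # sentinel: stop the worker
            break
        results.append(task * 2)  # stand-in for real processing
        task_queue.task_done()

worker = threading.Thread(target=consumer)
worker.start()

# A burst of requests: the queue absorbs the peak instead of the service.
for i in range(10):
    task_queue.put(i)

# The queue depth could also drive auto-scaling, e.g.:
# if task_queue.qsize() > threshold: start_another_consumer()

task_queue.put(None)
worker.join()
```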

Competing Consumers Pattern Problem An application running in the cloud may be expected to handle a large number of requests. Rather than process each request synchronously, a common technique is for the application to pass them through a messaging system to another service (a consumer service) that handles them asynchronously. Using a single instance of the consumer service might cause that instance to become flooded with requests, or the messaging system may be overloaded by an influx of messages coming from the application. Main Idea Enable multiple concurrent consumers to process messages received on the same messaging channel (queue). This pattern enables a system to process multiple messages concurrently to optimize throughput, to improve scalability and availability, and to balance the workload. Consider Consumers must be coordinated to ensure that each message is delivered to only a single consumer. The workload also needs to be load-balanced across consumers to prevent any single instance from becoming a bottleneck. Source: Cloud Design Patterns: Prescriptive Architecture Guidance For Cloud Applications 55

Competing Consumers Pattern Benefits Enables a load-leveled system that can handle wide variations in the volume of requests sent by application instances; the queue acts as a buffer between the application instances and the consumer service instances. A message that requires some long-running processing does not prevent other messages from being handled concurrently by other instances of the consumer service. Improves reliability: the message queue ensures that each message is delivered at least once. Makes a solution easily scalable. Issues and Considerations Messages processed multiple times: message processing should be idempotent. Detecting poison messages: detect and handle malformed messages. Result handling: if the consumer generates a result which is needed by the producer of the message, there needs to be a way for the producer to access this result. Scaling the messaging system: it is quite possible that the messaging system (queue) itself becomes overwhelmed by the amount of messages it needs to manage; in such a case the messaging system itself also needs to be scalable. Ensuring reliability of the messaging system: a reliable messaging system is needed to guarantee that, once the application enqueues a message, it will not be lost; this is essential to ensure that all messages are delivered at least once. 56
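The pattern can be sketched with several worker threads competing on one shared queue; the queue guarantees each message is handed to exactly one consumer. Names are illustrative assumptions; a real deployment would use a distributed message broker.

```python
import queue
import threading

channel = queue.Queue()
processed = []
lock = threading.Lock()

def consumer(consumer_id):
    while True:
        message = channel.get()
        if message is None:  # stop sentinel
            break
        with lock:  # protect the shared result list
            processed.append((consumer_id, message))

# Three competing consumer instances on the same channel.
workers = [threading.Thread(target=consumer, args=(i,)) for i in range(3)]
for w in workers:
    w.start()

for msg in range(20):
    channel.put(msg)
for _ in workers:
    channel.put(None)  # one stop sentinel per consumer
for w in workers:
    w.join()
```

Note that each message appears exactly once in `processed`, but its consumer and ordering are nondeterministic, which is why idempotent processing matters.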

Event Sourcing Problem The typical approach for applications is to maintain the current state of the data by updating it directly in the data store (the typical CRUD model). Limitations: The fact that CRUD systems perform update operations directly against a data store may hinder performance and responsiveness, and limit scalability, due to the processing overhead this requires. In a collaborative domain with many concurrent users, data update conflicts are more likely to occur because the update operations take place on a single item of data. Unless there is an additional auditing mechanism which records the details of each operation in a separate log, history is lost. Main idea Use an append-only store to record the full series of events that describe actions taken on data in a domain, rather than storing just the current state. At any point in time it is possible to read the history of events and use it to materialize the current state of an entity. This may occur on demand in order to materialize a domain object, or through a scheduled task so that the state of the entity can be stored as a materialized view to support the presentation layer. This pattern can simplify tasks in complex domains by avoiding the need to synchronize the data model and the business domain; improve performance, scalability, and responsiveness; provide consistency for transactional data; and maintain full audit trails and history that may enable compensating actions. Source: Cloud Design Patterns: Prescriptive Architecture Guidance For Cloud Applications. [Figure: an overview and example of the Event Sourcing pattern] 57
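A minimal sketch of the idea: an append-only event store plus a replay function that materializes the current state, here for a toy bank-account domain. The event names and domain are illustrative assumptions.

```python
event_store = []  # append-only: events are never updated or deleted

def append_event(event_type, amount):
    event_store.append({"type": event_type, "amount": amount})

def materialize_balance(events):
    """Rebuild the current state by replaying the full event history."""
    balance = 0
    for event in events:
        if event["type"] == "deposited":
            balance += event["amount"]
        elif event["type"] == "withdrawn":
            balance -= event["amount"]
    return balance

# The full history is preserved, so the audit trail comes for free.
append_event("deposited", 100)
append_event("withdrawn", 30)
append_event("deposited", 5)
```

Replaying can happen on demand, or a scheduled task can store the result as a materialized view for the presentation layer.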

Command and Query Responsibility Segregation Pattern (CQRS) Problem In traditional data management systems, both commands (updates to the data) and queries (requests for data) are executed against the same set of entities in a single data repository. Often there is a mismatch between the read and write representations of the data, such as additional columns or properties that must be updated correctly even though they are not required as part of an operation. This risks data contention in a collaborative domain when records are locked, or update conflicts caused by concurrent updates when optimistic locking is used; these risks increase as the complexity and throughput of the system grows. It can also make managing security and permissions more cumbersome, because each entity is subject to both read and write operations, which might inadvertently expose data in the wrong context. Main idea Split the system into two parts. The command side handles create, update and delete requests and stores the data in a write data store. The query side handles queries using one or more materialized views of the application's data. The read store can be a read-only replica of the write store, or the read and write stores may have a different structure altogether. Using multiple read-only replicas of the read store can considerably increase query performance. Separation of the read and write stores also allows each to be scaled appropriately to match the load; for example, read stores typically encounter a much higher load than write stores. CQRS is often used in combination with Event Sourcing. 58
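The split can be sketched with two separate stores: commands mutate a write store and refresh a denormalized read model, while queries touch only the read model. All names and the product domain are illustrative assumptions.

```python
write_store = {}   # normalized entities, keyed by product id
read_model = {}    # materialized view optimized for queries

# Command side: handles creates/updates against the write store.
def handle_update_price(product_id, name, price):
    write_store[product_id] = {"name": name, "price": price}
    # Project the change into the read model. In a real system this
    # projection is often updated asynchronously, e.g. from an event
    # stream, so reads are eventually consistent.
    read_model[name] = price

# Query side: serves reads from the materialized view only.
def query_price(name):
    return read_model.get(name)

handle_update_price(1, "widget", 9.99)
handle_update_price(2, "gadget", 19.99)
```

Because the two sides share no schema, each can be structured, replicated and scaled independently, which is the point of the pattern.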