Microservices, Messaging and Science Gateways. Review microservices for science gateways and then discuss messaging systems.

Microservices, Distributed Systems, DevOps

The Gateway Octopus Diagram
[Diagram labels: Browser; HTTPS; Web Interface Server with Client SDK; HTTP or TCP/IP; Application Server with Server SDK and Resource Plugins ("We will focus on this piece"); resources Karst (MOAB/Torque), Stampede2 (SLURM), Comet (SLURM), Jureca (SLURM).]
Jobs sent to supercomputers have a nontrivial lifecycle.

State in Science Gateways
- Scientific applications running on supercomputers have lifecycles:
  - Queued
  - Executing
  - Completed
- Gateways may add additional states:
  - Created (but not yet queued)
  - Archived
- The lifecycle of an executing job may be several minutes, hours, days, or even longer.
  - Analogous to ordering a physical object delivered via Amazon.
- State management is an important concept in gateways.
  - It guides component (microservice) design and implementation.
  - It governs messaging patterns.
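The job lifecycle above can be sketched as a small state machine. This is an illustrative sketch only: the state names come from the slide, but the strictly linear transition table and the `advance` helper are assumptions made for the example, not part of any gateway framework.

```python
from enum import Enum


class JobState(Enum):
    CREATED = "created"      # gateway-side state, before the job is queued
    QUEUED = "queued"
    EXECUTING = "executing"
    COMPLETED = "completed"
    ARCHIVED = "archived"    # gateway-side state, after completion


# Allowed transitions. The linear ordering is an assumption for illustration;
# a real gateway would likely add failure and cancellation paths.
TRANSITIONS = {
    JobState.CREATED: {JobState.QUEUED},
    JobState.QUEUED: {JobState.EXECUTING},
    JobState.EXECUTING: {JobState.COMPLETED},
    JobState.COMPLETED: {JobState.ARCHIVED},
    JobState.ARCHIVED: set(),
}


def advance(current: JobState, target: JobState) -> JobState:
    """Return the new state, or raise if the transition is not allowed."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.name} -> {target.name}")
    return target
```

A service that receives a status message can call `advance` before persisting the new state, so out-of-order or duplicate messages are rejected rather than silently applied.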

Basic Components of the Gateway App Server Become Microservices
[Diagram labels: Server SDK, Application Server, Resource Plugins; API Server, Application Manager, Metadata Server.]
For now, we will not depict the communication connections between the microservices on the left. These are over-the-wire and need to be non-blocking in many cases.

Replicate the Microservices
[Diagram: the API Server, Application Manager, and Metadata Server, each shown as multiple replicated instances.]
Communication patterns and system state are important.

Messaging in Distributed Systems
Examining messaging in distributed systems through Advanced Message Queuing Protocol (AMQP) and RabbitMQ overviews

Basic Components of the Gateway App Server
[Diagram: API Server, Application Manager, Metadata Server.]
How do these services communicate?

Each service actually needs to be replicated.
[Diagram: the API Server, Application Manager, and Metadata Server, each shown as multiple replicated instances.]

Point to Point Communication Is Brittle
[Diagram: API Server, Application Manager, Metadata Server.]
- Combinatorial problems when you have > 3 services
- How do you manage load balancing and service instance discovery?
- How do you rewire communications when you need to add more services?
- What if multiple logical services need to receive the same message?

Messaging Systems Solve Many of These Problems
[Diagram: API Server, Application Manager, Metadata Server, Message Broker.]

They Work Well with Service Replication
[Diagram: three API Server instances, three Application Manager instances, and three Metadata Server instances around a Message Broker.]

Messaging Summary
- Review RabbitMQ's wonderful set of tutorials for several programming languages: https://www.rabbitmq.com/getstarted.html
- RabbitMQ implements some basic AMQP patterns out of the box.
- You don't need to understand the full power and flexibility of AMQP-based systems to use messaging.
- We'll introduce some of the main concepts.

AMQP: Network Protocol and Architecture, Not an API
- Programmable by endpoints
  - Endpoints can create and manage their own queues, exchanges, etc.
  - The broker manager does not need to configure these things.
- Many implementations
  - RabbitMQ, Apache ActiveMQ, Apache Qpid, SwiftMQ, ...
  - I'll focus on Version 0-9-1 and RabbitMQ.
- RabbitMQ and Apache ZooKeeper
  - Similar philosophy: provide primitive operations that can be combined to support more complicated scenarios.
- This is not an AMQP tutorial.

Value of Message Queuing Systems, Generally
- They are queues for messages...
- Even if you are doing point-to-point routing, messaging systems remove the need for publishers and consumers to know each other's IP addresses.
  - Publishers and consumers just need to know how to connect to the message broker.
  - The network locations and specific instances of the publishers and consumers can change over time, but logical instances persist.
- Synchronous and asynchronous messages are natively supported.
- Multiple channels of communication are multiplexed over a single TCP/IP connection.

Basic Concepts

An AMQP Server (or Broker)
- Exchange
  - Accepts producer messages
  - Sends them to 0 or more Message Queues using routing keys
- Message Queue
  - Routes messages to different consumers depending on arbitrary criteria
  - Buffers messages when consumers are not able to accept them fast enough

Producers and Consumers
- Producers only interact with Exchanges.
- Consumers interact with Message Queues.
- Consumers aren't passive: they can create and destroy message queues.
- The same application can act as both a publisher and a consumer.
  - You can implement Request-Response with AMQP, except the publisher doesn't block.
  - Ex: your application may want an ACK or NACK when it publishes. This is a reply queue.
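The reply-queue idea can be illustrated with a toy in-memory model. This is a sketch under stated assumptions, not broker code: the `queues` dictionary, the `publish` and `serve_one` helpers, and the queue name `rpc.requests` are all hypothetical. The `reply_to` and `correlation_id` fields mirror the message properties of the same names in AMQP 0-9-1.

```python
import uuid
from collections import deque

# Toy stand-in for a broker: queue name -> FIFO of messages (hypothetical).
queues = {"rpc.requests": deque()}


def publish(queue_name, body, reply_to=None, correlation_id=None):
    """Append a message to a named queue and return immediately (no blocking)."""
    queues[queue_name].append(
        {"body": body, "reply_to": reply_to, "correlation_id": correlation_id}
    )


# Client side: create a private reply queue, publish the request, and move on.
reply_queue = f"reply.{uuid.uuid4().hex}"
queues[reply_queue] = deque()
corr_id = uuid.uuid4().hex
publish("rpc.requests", "run job-001", reply_to=reply_queue, correlation_id=corr_id)


def serve_one():
    """Server side: consume one request and answer on the caller's reply queue."""
    request = queues["rpc.requests"].popleft()
    publish(
        request["reply_to"],
        f"ACK: {request['body']}",
        correlation_id=request["correlation_id"],
    )


serve_one()

# Later, the client picks up the reply and matches it to its request.
reply = queues[reply_queue].popleft()
```

The correlation identifier lets a client with several outstanding requests decide which request a given reply belongs to.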

The Exchange
- Receives messages
- Inspects a message header, body, and properties
- Routes messages to appropriate message queues
  - Routing is usually done with routing keys supplied with the message (in AMQP 0-9-1, as an argument to Basic.Publish rather than inside the message body).
  - For point-to-point messages, the routing key is the name of the message queue.
  - For pub-sub routing, the routing key is the name of the topic.
    - Topics can be hierarchical.

Message Queue Properties and Examples
- Basic queue properties:
  - Private or shared
  - Durable or temporary
  - Client-named or server-named, etc.
- Combine these to make all kinds of queues, such as:
  - Store-and-forward queue: holds messages and distributes these between consumers on a round-robin basis. Durable and shared between multiple consumers.
  - Private reply queue: holds messages and forwards these to a single consumer. Reply queues are typically temporary, server-named, and private to one consumer.
  - Private subscription queue: holds messages collected from various "subscribed" sources, and forwards these to a single consumer. Temporary, server-named, and private.

Consumers and Message Queues
- AMQP consumers can create their own queues and bind them to Exchanges.
- Queues can have more than one attached consumer.
- AMQP queues are FIFO.
- AMQP allows only one consumer per queue to receive a given message.
  - Round-robin delivery is used if > 1 consumer is attached.
- If you need > 1 consumer to receive a message, you can give each consumer its own queue.
  - Each Queue can attach to the same Exchange, or you can use topic matching.

Publish-Subscribe Patterns
- Useful for many-to-many messaging
- In microservice-based systems, several different types of components may want to receive the same message but take different actions.
  - Ex: you can always add a logger service.
- You can always do this with explicitly named routing keys.
- You may also want to use hierarchical (namespace) key names and pattern matching:
  - gateway.jobs.jobtype.gromacs
  - gateway.jobs.jobtype.*

The Message Payload
- Read the specification for more details.
- In general, AMQP follows the header-body format.
- The message body payload is binary.
  - AMQP assumes the body content is handled by consumers.
  - The message body is opaque to the AMQP server.
- You could serialize your content with JSON or Thrift and deserialize it to directly send objects.
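Because the body is opaque bytes, the publisher and consumer have to agree on an encoding. The sketch below uses JSON from the Python standard library; the `encode` and `decode` helpers and the field names in `job` are made up for illustration.

```python
import json


def encode(message: dict) -> bytes:
    """Serialize a dictionary to a UTF-8 JSON byte string for the message body."""
    return json.dumps(message, sort_keys=True).encode("utf-8")


def decode(body: bytes) -> dict:
    """Reverse of encode: turn a received message body back into a dictionary."""
    return json.loads(body.decode("utf-8"))


# Hypothetical job description; the field names are assumptions for the example.
job = {"job_id": "job-001", "application": "gromacs", "state": "queued"}
body = encode(job)        # what the publisher hands to the broker
restored = decode(body)   # what the consumer reconstructs
```

The broker never looks inside `body`, so changing the schema only requires coordinating publishers and consumers.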

Message Exchange Patterns

Direct Exchange
- A publisher sends a message to an exchange with a specific routing key.
- The exchange routes this to the message queue bound to the routing key.
- A consumer receives the messages if listening to the queue.
- Default: round-robin queuing to deliver to multiple subscribers of the same queue.
- Example AMQP methods:
  - Queue.Declare queue=app.svc01
  - Basic.Consume queue=app.svc01
  - Basic.Publish routingkey=app.svc01
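The direct-exchange behavior can be mimicked with a few lines of in-memory Python. This is a toy model, not RabbitMQ's implementation: the `DirectExchange` class and the consumer names are hypothetical, and only the routing key `app.svc01` comes from the slide.

```python
from collections import defaultdict, deque
from itertools import cycle


class DirectExchange:
    """Toy direct exchange: exact match between routing key and binding key."""

    def __init__(self):
        self.bindings = defaultdict(list)  # binding key -> list of queues

    def bind(self, queue, routing_key):
        self.bindings[routing_key].append(queue)

    def publish(self, routing_key, body):
        """Copy the message to every queue bound with this key; return the count."""
        targets = self.bindings.get(routing_key, [])
        for queue in targets:
            queue.append(body)
        return len(targets)


exchange = DirectExchange()
svc_queue = deque()
exchange.bind(svc_queue, "app.svc01")

for i in range(4):
    exchange.publish("app.svc01", f"msg-{i}")
unrouted = exchange.publish("app.other", "nobody is bound to this key")

# Two consumers share the queue; each message goes to exactly one of them.
received = {"consumer-a": [], "consumer-b": []}
turn = cycle(received)  # iterates over the consumer names in insertion order
while svc_queue:
    received[next(turn)].append(svc_queue.popleft())
```

With two consumers attached, the four messages alternate between them, which is the round-robin default the slide describes.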

Fanout Exchange
- Message Queue binds to an Exchange with no argument.
- Publisher sends a message to the Exchange.
- The Exchange sends the message to the Message Queue.
- All consumers listening to all Message Queues associated with an Exchange get the message.
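A fanout exchange ignores routing keys and copies each message to every bound queue. The following toy sketch shows that behavior in memory; the `FanoutExchange` class and the two queue names are assumptions for the example.

```python
from collections import deque


class FanoutExchange:
    """Toy fanout exchange: every bound queue gets a copy of every message."""

    def __init__(self):
        self.queues = []

    def bind(self, queue):
        # No routing key or other argument is needed for a fanout binding.
        self.queues.append(queue)

    def publish(self, body):
        for queue in self.queues:
            queue.append(body)


exchange = FanoutExchange()
worker_queue = deque()   # hypothetical queue for the service doing the work
logger_queue = deque()   # hypothetical queue for a logging service
exchange.bind(worker_queue)
exchange.bind(logger_queue)

exchange.publish("job-001 queued")
```

Giving each logical service its own queue is what lets several services see the same message while replicas of one service still share a single queue.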

Topic Exchange
- Message Queues bind using routing patterns instead of routing keys.
- A Publisher sends a message with a routing key.
- The Exchange will route to all Message Queues whose pattern matches the routing key.
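Topic matching can be sketched as a function over dot-separated words. In AMQP 0-9-1 topic patterns, `*` stands for exactly one word and `#` for zero or more words. The `topic_matches` function below is an illustrative re-implementation of that rule, not code taken from any broker.

```python
def topic_matches(pattern: str, routing_key: str) -> bool:
    """Return True if a dot-separated routing key matches a topic pattern.

    '*' matches exactly one word; '#' matches zero or more words.
    """
    p = pattern.split(".")
    k = routing_key.split(".")

    def match(i: int, j: int) -> bool:
        if i == len(p):
            return j == len(k)          # pattern used up: key must be too
        if p[i] == "#":
            # Let '#' absorb zero, one, two, ... of the remaining words.
            return any(match(i + 1, n) for n in range(j, len(k) + 1))
        if j == len(k):
            return False                # key used up but pattern is not
        if p[i] == "*" or p[i] == k[j]:
            return match(i + 1, j + 1)
        return False

    return match(0, 0)
```

For example, a queue bound with `gateway.jobs.jobtype.*` would receive a message published with the key `gateway.jobs.jobtype.gromacs`, but not one published with `gateway.jobs.jobtype` alone.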

More Examples
- RabbitMQ Tutorial
  - Has several nice examples of classic message exchange patterns.
  - https://www.rabbitmq.com/getstarted.html
- What It Omits
  - Many publishers
    - Absolute and partial event ordering are hard problems.
  - Broker failure and recovery

Some Useful Capabilities of Messaging Systems for Microservices
- Overarching requirement: it should support your system's distributed state machine.
- Let's brainstorm some.

Useful Capabilities: My List (1/2)
- Supports both push and pull messaging
- Deliver messages in order
- Successfully delivered messages are delivered exactly once
  - OK to have multiple recipients
- Deliver messages to one or more listeners based on pre-defined topics
- Store messages persistently when
  - There are no active recipients
  - All recipients are busy
- Determine if critical messages were delivered correctly

Useful Capabilities: My List (2/2)
- Redeliver messages that weren't correctly received
  - Corrupted messages, no recipients, etc.
  - Recipient can change
- Redeliver messages on request
  - Helps clients resync their states
- Allow other components to inspect message delivery metadata
  - Supports elasticity, fault tolerance
- Priority messaging?
- Qualities of Service
  - Security, fault tolerance

Which Messaging Software to Choose?
- AMQP does not cover all the capabilities listed above.
  - It can be extended to cover these in many cases.
- AMQP messaging system implementations are not necessarily cloud-ready.
  - They have to be configured as highly available services.
    - Primary + failover
    - No fancy leader elections, etc., as used in ZooKeeper + Zab or Apache Kafka
  - They have scaling limitations, although these may not matter at our scales.
- Other messaging systems (Kafka, Hedwig) are alternatives.

Some Applications

Simple Work Queue
- Queue up work to be done.
- Publisher: pushes a request for work into the queue.
  - Queue should be a simple Direct Exchange.
- Message Queue should deliver each message only once, to one consumer.
  - Round-robin scheduling.
  - RabbitMQ does this out of the box.
- Consumer: sends an ACK after completing the task.
- If a Queue-Client connection closes before an ACK, resend the message to a new consumer.
  - RabbitMQ detects these types of failures.

Simple Work Queue
[Diagram: the Gateway API Server sends MSG 1, 2, 3 to the Broker; the Broker delivers MSG 1, MSG 2, and MSG 3 to three AppMan Workers, which return ACKs.]
- Job requests come to the API Server.
- API Server publishes to Broker (Exchange + Message Queue).
- Broker sends to workers in round-robin.
- Workers send ACKs when done.
- If Broker detects a closed connection before an ACK, it sends the MSG to a different worker.
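The ACK and redelivery rules can be made concrete with a toy in-memory queue. This sketch only models the bookkeeping: the `WorkQueue` class, its method names, and the worker names are hypothetical, and a real broker detects closed connections itself rather than being told.

```python
from collections import deque


class WorkQueue:
    """Toy work queue that tracks unacknowledged deliveries."""

    def __init__(self):
        self.ready = deque()   # messages waiting for a worker
        self.unacked = {}      # delivery tag -> (worker, body)
        self._next_tag = 0

    def publish(self, body):
        self.ready.append(body)

    def deliver(self, worker):
        """Hand the next ready message to a worker; return (tag, body) or None."""
        if not self.ready:
            return None
        body = self.ready.popleft()
        self._next_tag += 1
        self.unacked[self._next_tag] = (worker, body)
        return self._next_tag, body

    def ack(self, tag):
        """The worker finished the task, so the message can be forgotten."""
        del self.unacked[tag]

    def connection_closed(self, worker):
        """Requeue everything this worker received but never acknowledged."""
        lost = [tag for tag, (w, _) in self.unacked.items() if w == worker]
        for tag in reversed(lost):          # reversed keeps the original order
            _, body = self.unacked.pop(tag)
            self.ready.appendleft(body)


wq = WorkQueue()
for name in ("MSG1", "MSG2", "MSG3"):
    wq.publish(name)

tag1, _ = wq.deliver("worker-1")
tag2, _ = wq.deliver("worker-2")
wq.ack(tag1)                         # worker-1 completed MSG1
wq.connection_closed("worker-2")     # worker-2 died before sending its ACK
new_tag, redelivered = wq.deliver("worker-3")
```

Here `MSG2` goes back to the front of the queue and is handed to `worker-3`, while `MSG3` is still waiting.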

What Could Possibly Go Wrong?
- Jobs take a long time to finish, so ACKs may not come for hours.
  - Durable connections are needed between Consumers and Message Queues.
  - Alternatively, the ACK could come from a different process.
- Jobs may actually get launched on the external supercomputer, so you don't want to launch twice just because of a missing ACK.
- Clients have to implement their own queues.
  - They could get another work request while doing work.

Work Queue, Take Two
- Orchestrator pushes work into a queue.
- Have workers request work when they are not busy.
  - RabbitMQ supports this as the prefetch count.
  - Use round-robin, but don't send work to busy workers with outstanding ACKs.
  - Workers do not receive work requests when they are busy.
- Worker sends an ACK after successfully submitting the job.
  - This only means the job has been submitted.
  - Worker can take more work.
- A separate process handles the state changes on the supercomputer.
  - Publishes queued, executing, completed, or failed messages.
- When the job is done, Orchestrator creates a cleanup job.
  - Any available worker can take this.
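The effect of a prefetch limit can be sketched as a dispatcher that skips workers with too many outstanding ACKs. This is a toy illustration of the idea rather than how RabbitMQ schedules deliveries; the `dispatch` function, its parameters, and the worker and job names are assumptions.

```python
from collections import deque


def dispatch(ready, workers, in_flight, prefetch_count=1):
    """Assign ready messages round-robin to workers that have spare capacity.

    ready:      deque of pending messages (consumed from the left)
    workers:    list of worker names, in round-robin order
    in_flight:  dict of worker name -> number of unacknowledged messages
    Returns the list of (worker, message) assignments made in this pass.
    """
    assignments = []
    progress = True
    while ready and progress:
        progress = False
        for worker in workers:
            if not ready:
                break
            if in_flight[worker] < prefetch_count:
                in_flight[worker] += 1
                assignments.append((worker, ready.popleft()))
                progress = True
    return assignments


ready = deque(["job-1", "job-2", "job-3"])
workers = ["worker-1", "worker-2"]
in_flight = {"worker-1": 1, "worker-2": 0}   # worker-1 still owes an ACK

first = dispatch(ready, workers, in_flight)

in_flight["worker-1"] -= 1                   # worker-1 sends its ACK
second = dispatch(ready, workers, in_flight)
```

With a limit of one, the busy worker is skipped until it acknowledges, and any remaining work stays in the queue instead of piling up on a worker.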

Work Queue, Take 2
[Diagram: Orchestrator, Broker, two AppMan Workers, Job Monitor, and Experiment Metadata Manager, with numbered messages: 0. RunJob; 1a., 1b. Ready; 2. RunJob; 3. ACK; 4., 6., 10. Job Status; 5. Done; 7. Cleanup; 8. CleanUp; 9. ACK.]

What Could Possibly Go Wrong?
- A Worker may not be able to submit the job.
  - The remote supercomputer is unreachable, for example.
  - We need a NACK.
- The Orchestrator and Experiment Metadata components are also consumers.
  - They should send ACKs to make sure messages are delivered.
- Orchestrator and Experiment Metadata Manager (EMM) may also die and get replaced.
  - Unlike AppMan workers, Orchestrator and EMM may need a leader-follower implementation.
- Broker crashes
  - RabbitMQ provides some durability for restarting.
  - It is possible to lose cached messages that haven't gone to persistent storage.

What Else Could Go Wrong?
- Lots of things. How do you debug unexpected errors?
- Logs
  - A logger like Logstash should be one of your consumers.
- No one-to-one messages any more.
  - Everything has at least 2 subscribers:
    - Your log service
    - The main target
  - Or you could use Fanout.

Summary
- Message systems provide an abstract system for routing communications between distributed entities.
  - You don't need to know the physical addresses.
- They support multiple messaging patterns out of the box.
  - You don't have to implement them.
- Queues are a powerful concept within distributed systems.
  - Entities can save messages in order and deliver/accept them at a desirable rate.
- Queues are a primitive (foundational) concept that you can use to build more sophisticated systems.