A Distributed System Case Study: Apache Kafka. High throughput messaging for diverse consumers

Similar documents
Intra-cluster Replication for Apache Kafka. Jun Rao

Data Acquisition. The reference Big Data stack

Data Acquisition. The reference Big Data stack

Microservices, Messaging and Science Gateways. Review microservices for science gateways and then discuss messaging systems.

Introduction to Kafka (and why you care)

Tools for Social Networking Infrastructures

rkafka rkafka is a package created to expose functionalities provided by Apache Kafka in the R layer. Version 1.1

Building LinkedIn s Real-time Data Pipeline. Jay Kreps

Project Midterms: March 22 nd : No Extensions

Micro- Services. Distributed Systems. DevOps

Lecture 21 11/27/2017 Next Lecture: Quiz review & project meetings Streaming & Apache Kafka

Voldemort. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Overview Design Evaluation

Transformation-free Data Pipelines by combining the Power of Apache Kafka and the Flexibility of the ESB's

Map-Reduce. Marco Mura 2010 March, 31th

Fluentd + MongoDB + Spark = Awesome Sauce

Knowns and Unknowns in Distributed Systems

Distributed Coordination with ZooKeeper - Theory and Practice. Simon Tao EMC Labs of China Oct. 24th, 2015

How VoltDB does Transactions

Introduction to Apache Kafka

Distributed Computation Models

Scaling the Yelp s logging pipeline with Apache Kafka. Enrico

Distributed systems. Lecture 6: distributed transactions, elections, consensus and replication. Malte Schwarzkopf

Data Infrastructure at LinkedIn. Shirshanka Das XLDB 2011

STORM AND LOW-LATENCY PROCESSING.

ZooKeeper. Table of contents

Deep Dive Amazon Kinesis. Ian Meyers, Principal Solution Architect - Amazon Web Services

ZooKeeper & Curator. CS 475, Spring 2018 Concurrent & Distributed Systems

CS 138: Google. CS 138 XVII 1 Copyright 2016 Thomas W. Doeppner. All rights reserved.

Evolution of an Apache Spark Architecture for Processing Game Data

Agreement and Consensus. SWE 622, Spring 2017 Distributed Software Engineering

Outline. Spanner Mo/va/on. Tom Anderson

Building Durable Real-time Data Pipeline

Microservices Lessons Learned From a Startup Perspective

Diving into Open Source Messaging: What Is Kafka?

CS /15/16. Paul Krzyzanowski 1. Question 1. Distributed Systems 2016 Exam 2 Review. Question 3. Question 2. Question 5.

Applications of Paxos Algorithm

Transactions. CS 475, Spring 2018 Concurrent & Distributed Systems

FLAT DATACENTER STORAGE. Paper-3 Presenter-Pratik Bhatt fx6568

CS 138: Google. CS 138 XVI 1 Copyright 2017 Thomas W. Doeppner. All rights reserved.

CA485 Ray Walshe Google File System

Replication. Feb 10, 2016 CPSC 416

Over the last few years, we have seen a disruption in the data management

Distributed File Systems II

Replication in Distributed Systems

Distributed Data Management Replication

A Survey of Distributed Message Broker Queues

Apache Kafka a system optimized for writing. Bernhard Hopfenmüller. 23. Oktober 2018

Last time. Distributed systems Lecture 6: Elections, distributed transactions, and replication. DrRobert N. M. Watson

WHITE PAPER. Reference Guide for Deploying and Configuring Apache Kafka

Modern Stream Processing with Apache Flink

Distributed Systems. Characteristics of Distributed Systems. Lecture Notes 1 Basic Concepts. Operating Systems. Anand Tripathi

Distributed Systems. Characteristics of Distributed Systems. Characteristics of Distributed Systems. Goals in Distributed System Designs

Paxos Replicated State Machines as the Basis of a High- Performance Data Store

Overview. Prerequisites. Course Outline. Course Outline :: Apache Spark Development::

Distributed ETL. A lightweight, pluggable, and scalable ingestion service for real-time data. Joe Wang

Using the SDACK Architecture to Build a Big Data Product. Yu-hsin Yeh (Evans Ye) Apache Big Data NA 2016 Vancouver

The Google File System (GFS)

PNUTS: Yahoo! s Hosted Data Serving Platform. Reading Review by: Alex Degtiar (adegtiar) /30/2013

BigData and Map Reduce VITMAC03

Flat Datacenter Storage. Edmund B. Nightingale, Jeremy Elson, et al. 6.S897

10. Replication. CSEP 545 Transaction Processing Philip A. Bernstein. Copyright 2003 Philip A. Bernstein. Outline

Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler Yahoo! Sunnyvale, California USA {Shv, Hairong, SRadia,

Before proceeding with this tutorial, you must have a good understanding of Core Java and any of the Linux flavors.

The Google File System

Event Streams using Apache Kafka

Microservice Splitting the Monolith. Software Engineering II Sharif University of Technology MohammadAmin Fazli

Auto Management for Apache Kafka and Distributed Stateful System in General

GFS: The Google File System

GFS: The Google File System. Dr. Yingwu Zhu

BookKeeper overview. Table of contents

The Power of Snapshots Stateful Stream Processing with Apache Flink

AGREEMENT PROTOCOLS. Paxos -a family of protocols for solving consensus

How Apache Hadoop Complements Existing BI Systems. Dr. Amr Awadallah Founder, CTO Cloudera,

NPTEL Course Jan K. Gopinath Indian Institute of Science

The Google File System

Putting it together. Data-Parallel Computation. Ex: Word count using partial aggregation. Big Data Processing. COS 418: Distributed Systems Lecture 21

Introduc)on to Apache Ka1a. Jun Rao Co- founder of Confluent

Intuitive distributed algorithms. with F#

Typhoon: An SDN Enhanced Real-Time Big Data Streaming Framework

MapReduce. U of Toronto, 2014

Distributed Systems COMP 212. Revision 2 Othon Michail

<Insert Picture Here> QCon: London 2009 Data Grid Design Patterns

A day in the life of a log message Kyle Liberti, Josef

Apache Kafka Your Event Stream Processing Solution

Distributed Systems 16. Distributed File Systems II

REAL-TIME ANALYTICS WITH APACHE STORM

Comparative Analysis of Big Data Stream Processing Systems

10. Replication. CSEP 545 Transaction Processing Philip A. Bernstein Sameh Elnikety. Copyright 2012 Philip A. Bernstein

Exploiting Commutativity For Practical Fast Replication. Seo Jin Park and John Ousterhout

Distributed Operating Systems

18-hdfs-gfs.txt Thu Oct 27 10:05: Notes on Parallel File Systems: HDFS & GFS , Fall 2011 Carnegie Mellon University Randal E.

Distributed Architectures & Microservices. CS 475, Spring 2018 Concurrent & Distributed Systems

Spark, Shark and Spark Streaming Introduction

Exploiting Commutativity For Practical Fast Replication. Seo Jin Park and John Ousterhout

The Google File System

Yves Goeleven. Solution Architect - Particular Software. Shipping software since Azure MVP since Co-founder & board member AZUG

Dynamic Reconfiguration of Primary/Backup Clusters

Recovering from a Crash. Three-Phase Commit

microsoft

CockroachDB on DC/OS. Ben Darnell, CTO, Cockroach Labs

Transcription:

A Distributed System Case Study: Apache Kafka High throughput messaging for diverse consumers

As always, this is not a tutorial Some of the concepts may no longer be part of the current system or implemented as described.

LinkedIn s Requirements LinkedIn needs to push billions of messages per day to a wide variety of consumers Real-time processing applied to activity streams Keep people engaged with the site through social network-supplied content. Asynchronous processing Find people I may want to connect to Find topics that may be interesting to me Identify advertisements that I may be interested in System logs: Identify problems with operations Machine learning is critical to all of these. But is OK to lose some data.

Consequences of Use Case Requirements Consumers must decide when to pull the data Fast or slow, small or large This means that the messaging system needs to store a lot of data (TBs) This is not what traditional message systems are designed to do. Kafka will need an efficient way to find the requested message Virtue from Necessity: support message rewind and replay This is not a normal operation for a queue, which removes messages after they are delivered. Treat the accumulated, ordered messages as input for a state machine

Apache Kafka and Related Work Apache Kafka is a hybrid of a log aggregator and a messaging system Messaging systems Usually assume low latency of delivery. Messages are delivered quickly This doesn t work for batch systems Have complicated delivery guarantees that aren t needed for real-time data Are not designed for high throughput techniques such as message batching Weakly designed as distributed systems: consistency rather than availability (CAP) Distributed Log Aggregators (Scribe, Flume, Hedwig) Are only for server logs Only support push model: producer sets the rate Kafka use case: pull model lets consumers set the rate, support rewind

Kafka supports many different types of consumers https://image.slidesharecdn.com/current-141113081750-conversion-gate02/95/current-and-future-of-apache-kafka-9-638.jpg?cb=1415866728

Apache Kafka Terminology: Topic-Based Publish-Subscribe Component Topic Producer Broker Consumer Description The label for a stream of messages of a particular type. An entity that publishes to a topic by sending messages to a broker An entity on a network that receives, stores, and routes messages. An entity that subscribes to one or more topics. Kafka generalizes this to Consumer Groups

Kafka Brokers Connecting message producers to consumers

Kafka Uses Clusters of Brokers Kafka is run as a cluster of brokers on one or more servers. Note use of Zookeeper. Zookeeper also used by Producers and Consumers

Distributed Brokers Kafka brokers are designed to be distributed Multiple brokers Messages to each topic are stored in partitions Partitions are allocated across brokers Partitions are stored as multiple segment files on a specific broker.

Topics and Partitions Topics are broken up into partitions that span multiple brokers

Kafka Partitions Kafka topics are divided into partitions. Partitions parallelize topics by splitting the data across multiple brokers Each partition can be placed on a separate machine Allows multiple consumers to read from a topic in parallel. Topic content size can be larger than physical storage on any one broker Consumers can also be parallelized to read from multiple partitions Kafka guarantees all messages in a partition are time-ordered, but it does not guarantee order across partitions. Choose partitioning strategies accordingly.

Sequential, Deterministic Lookups Random access is anathema to Kafka s goals for high throughput. Partitions consist of one or more segment files Each message stored in a segment file has a local ID that is determined by the size of all the messages that come before. An index stores the message ID of the first message in a segment

Aside: Write-Ahead Logging Write-Ahead Logging: this is a technique of writing your file first and then having your broker read the file. This is the reverse of the way logging normally works Why? If the broker needs to be restarted, it reads it log to recover its state. You don t need to worry so much about lost messages from publishers. Kafka uses the file system Linux file systems already have many sophisticated features for balancing inmemory versus on-disk files

Kafka Partitions Are Replicated If a leader for a partition replica fails, a follower becomes the new leader

Producers

Kafka Producers Producers publish data to the topics of their choice. The producer is responsible for choosing which record to assign to which partition within the topic. This can be done in a round-robin fashion simply to balance load Other distribution strategies can be used. Producers write to the partition s leader. The broker acting as lead for that partition replicates it to other brokers

A producer writes to Partition 1 of a topic. Broker 2 is the leader. It writes replicas to Brokers 1 and 3

Consumer Groups

Kafka Consumer Groups Consumer groups contain one or more consumers of a particular topic. Many consumer groups can subscribe to the same topic. A consumer group is whatever is useful for a particular consuming application Only one member of a consumer group consumes the messages in a partition Avoid locking and other state management issues Kafka wants to divide messages stored on brokers evenly among consumers for load balancing But keep it simple, avoid complicated coordination.

Consumers and Consumer Groups Kafka lets you collect consumers into consumer groups In a group, each consumer instance is associated with a specific partition. Collectively, a group receives all messages on a topic. If the group expands or contracts, Kafka will rebalance. Consumer groups resemble queues

Consumer Group Scenarios Consumer Load Balancing: All consumers are in one group Consumer Group B, previous slide Broadcast: each consumer is its own group Each consumer receives messages from all partitions Broker Load Balancing: N(Brokers) > N(Consumer Groups) Consumer Group A, previous slide But messages between partitions may not be ordered Consumer ordering: N(Partitions) == N(Consumers in a Group) Each member gets messages from only one partition We assume round-robin message distribution to partitions. Note message order is preserved only within a partition

Rewinding and Replaying Messages Kafka persistently stores messages much longer than conventional messaging systems Doesn t assume low-latency delivery. The state of a topic is the message order, stored in partition files. A consumer can request the same messages many times if it needs to. Why? Rollback. A consumer may have had a bug, so fix the bug and consume the message again with the corrected code. Recall Blue-Green deployments Or the consumer may have crashed before processing the message This is NOT a typical queue pattern. Rewinding is much more straightforward in a pull-based architecture.

Kafka and Zookeeper Managing brokers, consumers, and producers

Zookeeper Registry Broker Registry Consumer Registry Ownership Registry Offset Registry Description Contains brokers host names, ports, topics, and partitions. Used by the brokers to coordinate themselves. Ex: deal with a broker failure. Contains the consumer groups and their constituent consumers. Contains the ID of the consumer of a particular consumer group that is reading all the messages. This is the owner. Stores the last consumed message in a partition for a particular consumer group. Node Type EPHEMERAL EPHEMERAL EPHEMERAL PERSISTENT Each consumer places a watch on the broker registry and the consumer registry and will be notified if anything changes.

Delivery Guarantees Kafka chooses at least once delivery. It is up to the consuming application to know what to do with duplicates Duplicates are rare, occur when an owning consumer crashes and is replaced Two-phase commits are the classic way to ensure exactly once delivery. Messages from a specific partition are guaranteed to come in order. Kafka stores a CRC (a hash) for each message in the log to check for I/O errors

What About the Message Payload? Apache Kafka supports clients in multiple programming languages. This means that the message must be serialized in a programming language-neutral format. You can make your own with JSON or XML Kafka also supports Apache Avro, which is a schema-based binary serialization format. Compare Avro with Apache Thrift and Protobuf Efficient message formats are essential for high throughput systems

Kafka, Airavata, and Microservices Some thought exercises

Three microservices, replicated Application Manager API Server Metadata Server API Server API Server API Server API Server Application Application Manager Application Manager Application Manager Application Manager Application Manager Manager Metadata Metadata Server Metadata Server Metadata Server Metadata Server Metadata Server Server

Messaging and Microservices In Assignment 2, we that message queues are a good way to deliver messages to service replicas. Each replica of a service gets a message Topics are a good way to associate message types with different service groupings. Application manager and metadata manager In Kafka, we could put each of these into a consumer group However, work queues are very important to Airavata This is not directly implemented in Kafka, so we would need to implement This is probably not a good strategy. We could always just use Kafka as a log aggregator.

Some General Distributed Systems Principals Kafka, logs, and REST

Log-Centric Architecture Distributed servers use a replicated log to maintain a consistent state The log records system states as sequential messages. New servers can be added to expand the system or replace malfunctioning servers by reading the log No in-memory state needs to be preserved The server just needs to know that it has an uncorrupted (not necessarily latest) version of the log. You can use this approach for both highly consistent and highly available systems (CAP) Kafka is a log-oriented system that can be used to build other log-oriented systems

This just leaves one little problem... How do you keep the log replicas up to date?

Primary-Backup Replication 1 leader has the master copy and followers have backups Quorum-Based Replication 1 leader has the master copy and followers have backups On WRITE, the master awaits the appending to all backup for acknowledging the client On WRITE, the master waits on only a majority of the followers to confirm backups before it returns Supports strong consistency for distributed READS, but doesn t scale easily and has lower throughput If the master is lost, restore from a backup Supports eventual consistency and higher throughput; doesn t require good networking between leader and followers If a master is lost, elect a new leader from the replicas that have the latest data F+1 replicas can tolerate F failures 2F+1 replicas can tolerate F failures

Kafka State Management Kafka brokers are stateless They don t track which messages a client has consumed or not. This is the client s job Brokers simply send whatever the client requests Compare to REST Brokers eventually must delete data How does a broker know if all consumers have retrieved data? It doesn t. Kafka has a Service Level Agreement: Delete all data older than N days for example

Kafka Brokers are stateless. This makes broker distribution simpler. State management is done by producers and consumers. Messages can be delivered in batches At least once delivery Eventual consistency model for messages partitioned across one broker Optimized for highly variable latency, large message throughput: streaming data, logs to late night batch pulls Assumes replay of messages Traditional Messaging: AMQP, JMS, Etc Brokers are state-full. This makes distribution more difficult since load balancing and fault recovery are harder. Messages are individually delivered Exactly once delivery Strong consistency between brokers in a distributed system Optimized for low latency delivery of smaller messages Replay is an add-on Apache Kafka resembles in some ways the REST architecture

Sources Kreps, J., Narkhede, N. and Rao, J., 2011, June. Kafka: A distributed messaging system for log processing. In Proceedings of the NetDB (pp. 1-7). Wang, G., Koshy, J., Subramanian, S., Paramasivam, K., Zadeh, M., Narkhede, N., Rao, J., Kreps, J. and Stein, J., 2015. Building a replicated logging system with Apache Kafka. Proceedings of the VLDB Endowment, 8(12), pp.1654-1655. https://kafka.apache.org/documentation/ https://sookocheff.com/post/kafka/kafka-in-a-nutshell/

Log Processing and Distributed Messaging Companies like LinkedIn have both real-time and batch processing needs Batch processing: traditional business analytics Done on Hadoop hourly or daily Too slow to capture social network data: likes, views, updates to news feeds, etc Real-time processing needs to live side by side with batch processing They built a system that could do both: Apache Kafka