CON Apache Kafka

1 CON Apache Kafka - Scalable Message Processing and more!
Guido Schmutz
guidoschmutz.wordpress.com
BASEL BERN BRUGG DÜSSELDORF FRANKFURT A.M. FREIBURG I.BR. GENF HAMBURG KOPENHAGEN LAUSANNE MÜNCHEN STUTTGART WIEN ZÜRICH

2 Guido Schmutz
Working at Trivadis for more than 20 years
Oracle ACE Director for Fusion Middleware and SOA
Consultant, Trainer, Software Architect for Java, Oracle, SOA and Big Data / Fast Data
Head of Trivadis Architecture Board
Technology Trivadis
More than 30 years of software development experience
Contact: guido.schmutz@trivadis.com
Blog: guidoschmutz.wordpress.com
Slideshare:
Twitter: gschmutz

3 With over 600 specialists and IT experts in your region.
14 Trivadis branches and more than 600 employees
200 Service Level Agreements
Over 4,000 training participants
Research and development budget: CHF 5.0 million
Financially self-supporting and sustainably profitable
Experience from more than 1,900 projects per year at over 800 customers
(Map: Copenhagen, Hamburg, Düsseldorf, Frankfurt, Basel, Freiburg, Stuttgart, Brugg, Zurich, Munich, Vienna, Geneva, Bern, Lausanne)

4 Agenda
1. What is Apache Kafka?
2. Kafka Connect
3. Kafka Streams
4. KSQL
5. Kafka and "Big Data" / "Fast Data" Ecosystem
6. Kafka in Enterprise Architecture

5 What is Apache Kafka?

6 Apache Kafka History
0.7 Cluster mirroring, data compression
0.8 Intra-cluster replication
0.9 Data Integration (Connect API)
0.10 Data Processing (Streams API)
0.11 Exactly Once Semantics, Performance Improvements
KSQL Developer Preview

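7 Apache Kafka - Unix Analogy
$ cat < in.txt | grep "kafka" | tr a-z A-Z > out.txt
Kafka Connect API: reading the input and writing the output (cat < in.txt, > out.txt)
Kafka Streams API / KSQL: the processing steps (grep, tr)
Kafka Core (Cluster): the pipes connecting them
Adapted from: Confluent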

8 Kafka High Level Architecture
The who is who: Producers write data to brokers. Consumers read data from brokers. All this is distributed.
The data: Data is stored in topics. Topics are split into partitions, which are replicated.
(Diagram: Producers → Kafka Cluster with Broker 1, Broker 2 and Broker 3, coordinated by a Zookeeper Ensemble → Consumers)

9 Apache Kafka (diagram: a Truck producer writes to the partitions (P) of the Movement Topic, which are spread across Kafka Broker 1, 2 and 3; three Movement Processor instances consume from them)

10 Kafka Producer
Write Ahead Log / Commit Log
Producers always append to the tail (append to a file, i.e. a segment)
Order is preserved for messages within the same partition
(Diagram: Truck → Movement Topic on a Kafka Broker)
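To make the two properties on this slide concrete (producers only append at the tail, and ordering is guaranteed only within a partition), here is a minimal in-memory sketch. It is a hypothetical model, not Kafka code: `PartitionedLogSketch` is invented for illustration, and `String.hashCode()` stands in for the murmur2 hash that Kafka's default partitioner applies to the serialized key. It shows that records with the same key always go to the same partition and receive increasing offsets there, so their relative order is kept; across partitions no order is defined.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of a topic: each partition is an append-only list of records.
final class PartitionedLogSketch {
    private final List<List<String>> partitions = new ArrayList<>();

    PartitionedLogSketch(int numPartitions) {
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
    }

    // Stand-in for key-based partitioning: hash the key, clear the sign bit, take it
    // modulo the partition count. (Kafka's default partitioner uses murmur2, not hashCode().)
    int partitionFor(String key) {
        return (key.hashCode() & 0x7fffffff) % partitions.size();
    }

    // Producers only ever append to the tail; the returned value is the record's
    // offset within its partition.
    long append(String key, String value) {
        List<String> log = partitions.get(partitionFor(key));
        log.add(key + "=" + value);
        return log.size() - 1;
    }

    List<String> read(int partition) {
        return partitions.get(partition);
    }

    public static void main(String[] args) {
        PartitionedLogSketch topic = new PartitionedLogSketch(3);
        topic.append("truck-11", "pos-1");
        topic.append("truck-21", "pos-1");
        topic.append("truck-11", "pos-2");
        // Both truck-11 records are in the same partition, in send order;
        // truck-21 may or may not share that partition.
        System.out.println(topic.read(topic.partitionFor("truck-11")));
    }
}
```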

11 Kafka Consumer - Partition offsets
Offset: a sequential id number assigned to messages in the partitions. It uniquely identifies a message within a partition.
Consumers track their pointers via (offset, partition, topic) tuples.
Kafka 0.10: seek to an offset by a given timestamp using the method KafkaConsumer#offsetsForTimes
(Diagram: new data from the producer is appended at the tail; Consumer Group A and Consumer Group B read independently, with consumers at the earliest offset, at a specific offset and at the latest offset)
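The three consumer positions in the diagram, and the timestamp lookup, can be modelled for a single partition in a few lines. This is a hypothetical stand-in, not the client API: `OffsetLookupSketch` holds one timestamp per offset, and `offsetForTime` returns the earliest offset whose timestamp is greater than or equal to the target, which is the contract the `KafkaConsumer#offsetsForTimes` Javadoc describes; it returns -1 where the real method returns `null` for the partition.

```java
import java.util.List;

// Hypothetical model of a single partition: timestamps.get(i) is the timestamp
// of the record stored at offset i.
final class OffsetLookupSketch {
    private final List<Long> timestamps;

    OffsetLookupSketch(List<Long> timestamps) {
        this.timestamps = timestamps;
    }

    // "Earliest": the first offset still in the log. Here that is always 0; in a real
    // partition it is the log-start offset, which advances as retention deletes data.
    long earliestOffset() {
        return 0L;
    }

    // "Latest": the log-end offset, i.e. the offset the next appended record will get.
    long latestOffset() {
        return timestamps.size();
    }

    // Earliest offset whose timestamp is >= targetTimestamp, or -1 if there is none
    // (KafkaConsumer#offsetsForTimes returns null for the partition in that case).
    long offsetForTime(long targetTimestamp) {
        for (int offset = 0; offset < timestamps.size(); offset++) {
            if (timestamps.get(offset) >= targetTimestamp) {
                return offset;
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        OffsetLookupSketch partition =
            new OffsetLookupSketch(List.of(1_000L, 2_000L, 3_000L, 4_000L));
        System.out.println(partition.offsetForTime(2_500L)); // 2
        System.out.println(partition.latestOffset());         // 4
        System.out.println(partition.offsetForTime(9_000L)); // -1
    }
}
```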

12 Data Retention - 4 options
1. Never
2. Time based (TTL): log.retention.{ms|minutes|hours}
3. Size based: log.retention.bytes
4. Log compaction based (older entries with the same key are removed, the latest value per key is kept):
    kafka-topics.sh --zookeeper zk:2181 \
      --create --topic customers \
      --replication-factor 1 \
      --partitions 1 \
      --config cleanup.policy=compact
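Time- and size-based retention both delete whole log segments, oldest first, never single records. The sketch below is a simplified, hypothetical model of that decision: `Segment` and `segmentsToDelete` are invented names, `retentionMs` and `retentionBytes` play the roles of `log.retention.ms` and `log.retention.bytes` (with -1 meaning no limit), a segment counts as expired once its newest record is older than the time limit, and the last (active) segment is never deleted. Kafka's own retention check evaluates the time and size rules separately and handles further cases this sketch leaves out.

```java
import java.util.ArrayList;
import java.util.List;

final class RetentionSketch {
    // Hypothetical segment descriptor: timestamp of its newest record and its size on disk.
    static final class Segment {
        final String name;
        final long maxTimestampMs;
        final long sizeBytes;

        Segment(String name, long maxTimestampMs, long sizeBytes) {
            this.name = name;
            this.maxTimestampMs = maxTimestampMs;
            this.sizeBytes = sizeBytes;
        }
    }

    // segments must be ordered oldest first; the last one is the active segment.
    // A limit of -1 disables that rule. Returns the names of the segments to delete.
    static List<String> segmentsToDelete(List<Segment> segments, long nowMs,
                                         long retentionMs, long retentionBytes) {
        long totalBytes = 0;
        for (Segment s : segments) {
            totalBytes += s.sizeBytes;
        }
        List<String> toDelete = new ArrayList<>();
        for (int i = 0; i < segments.size() - 1; i++) { // never the active segment
            Segment s = segments.get(i);
            boolean expired = retentionMs >= 0 && nowMs - s.maxTimestampMs > retentionMs;
            // Size rule: delete s only if the log still holds >= retentionBytes afterwards.
            boolean overSize = retentionBytes >= 0 && totalBytes - s.sizeBytes >= retentionBytes;
            if (!expired && !overSize) {
                break; // only a prefix of the oldest segments is ever deleted
            }
            toDelete.add(s.name);
            totalBytes -= s.sizeBytes;
        }
        return toDelete;
    }

    public static void main(String[] args) {
        List<Segment> segments = List.of(
            new Segment("seg-0", 1_000L, 100L),
            new Segment("seg-1", 5_000L, 100L),
            new Segment("seg-2", 9_000L, 100L)); // active segment
        System.out.println(segmentsToDelete(segments, 10_000L, 6_000L, -1L)); // [seg-0]
        System.out.println(segmentsToDelete(segments, 10_000L, -1L, 100L));   // [seg-0, seg-1]
    }
}
```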

13 Data Retention - Log Compaction
Ensures that Kafka always retains at least the last known value for each message key within a single topic partition.
Compaction is done in the background by periodically recopying log segments.
Before compaction (Key/Value in offset order): K1/V1, K2/V2, K1/V3, K1/V4, K3/V5, K2/V6, K4/V7, K5/V8, K5/V9, K2/V10, K6/V11
After compaction (Key/Value in offset order): K1/V4, K3/V5, K4/V7, K5/V9, K2/V10, K6/V11
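The slide's example can be replayed with a small in-memory sketch. `CompactionSketch` is hypothetical and only reproduces the end result: for every key it keeps the record with the highest offset, and the survivors keep their original offsets and order. Offsets are numbered from 0 here purely for illustration, and details of Kafka's background log cleaner (it never cleans the active segment, and null-value tombstones eventually remove a key entirely) are not modelled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CompactionSketch {
    // One log record: its offset in the partition, its key and its value.
    static final class Rec {
        final long offset;
        final String key;
        final String value;

        Rec(long offset, String key, String value) {
            this.offset = offset;
            this.key = key;
            this.value = value;
        }

        @Override
        public String toString() {
            return offset + ":" + key + "=" + value;
        }
    }

    // Keep only the last record per key; survivors stay in offset order and keep their offsets.
    static List<Rec> compact(List<Rec> log) {
        Map<String, Long> lastOffsetByKey = new HashMap<>();
        for (Rec r : log) {
            lastOffsetByKey.put(r.key, r.offset);
        }
        List<Rec> kept = new ArrayList<>();
        for (Rec r : log) {
            if (lastOffsetByKey.get(r.key).longValue() == r.offset) {
                kept.add(r);
            }
        }
        return kept;
    }

    // The slide's example, with offsets numbered from 0 for illustration.
    static List<Rec> slideExample() {
        String[] keys = {"K1", "K2", "K1", "K1", "K3", "K2", "K4", "K5", "K5", "K2", "K6"};
        List<Rec> log = new ArrayList<>();
        for (int i = 0; i < keys.length; i++) {
            log.add(new Rec(i, keys[i], "V" + (i + 1)));
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(compact(slideExample()));
        // [3:K1=V4, 4:K3=V5, 6:K4=V7, 8:K5=V9, 9:K2=V10, 10:K6=V11]
    }
}
```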

14 Topic Viewed as Event Stream or State Stream (Change Log)
Event Stream / State Stream (Change Log Stream), sample events:
…T20:18:46,11,Normal,41.87,…
…T20:18:55,11,Normal,40.38,…
…T20:18:59,21,Normal,42.23,…
…T20:19:01,21,Normal,41.71,…
…T20:19:02,11,Normal,38.65,…
…T20:19:23,21,Normal,41.71,-91.32

15 Demo (I)
Truck-1, Truck-2, Truck-3 → truck_position → console consumer
Sample message from Truck-3: 2016-06-02 14:39:56.605 98 27 803014426 Wichita to Little Rock Route2 Normal 38.65 90.…
Testdata-Generator by Hortonworks

16 Demo (I) Create Kafka Topic
    $ kafka-topics --zookeeper zookeeper:2181 --create \
        --topic truck_position --partitions 8 --replication-factor 1
    $ kafka-topics --zookeeper zookeeper:2181 --list
    __consumer_offsets
    _confluent-metrics
    _schemas
    docker-connect-configs
    docker-connect-offsets
    docker-connect-status
    truck_position

17 Demo (I) Run Producer and Kafka-Console-Consumer

18 Demo (I) Java Producer to truck_position
Constructing a Kafka Producer
    Properties kafkaProps = new Properties();
    kafkaProps.put("bootstrap.servers", "broker-1:9092");
    kafkaProps.put("key.serializer", "...StringSerializer");
    kafkaProps.put("value.serializer", "...StringSerializer");
    KafkaProducer<String, String> producer = new KafkaProducer<>(kafkaProps);

    ProducerRecord<String, String> record =
        new ProducerRecord<>("truck_position", driverid, eventdata);
    try {
        // send() is asynchronous; get() blocks until the broker acknowledges the write
        RecordMetadata metadata = producer.send(record).get();
    } catch (Exception e) {
        // do not swallow failed sends silently: log them (or retry / rethrow)
        e.printStackTrace();
    }

19 Demo (II) devices send to MQTT instead of Kafka
Truck-1, Truck-2, Truck-3 → MQTT topic truck/nn/position
(sample event: Wichita to Little Rock Route2, Normal)

20 Demo (II) devices send to MQTT instead of Kafka

21 Demo (II) - devices send to MQTT instead of Kafka: how to get the data into Kafka?
Truck-1, Truck-2, Truck-3 → MQTT topic truck/nn/position → ? → truck_position (raw)
(sample event: Wichita to Little Rock Route2, Normal)

22 Kafka Connect

23 Kafka Connect - Overview (diagram: Source Connector, Sink Connector)

24 Kafka Connect Single Message Transforms (SMT)
Simple transformations for a single message
Defined as part of Kafka Connect: some useful transforms are provided out-of-the-box, and it is easy to implement your own
Optionally deploy 1+ transforms with each connector: modify messages produced by a source connector, or modify messages sent to sink connectors
Makes it much easier to mix and match connectors
Some of the currently available transforms: InsertField, ReplaceField, MaskField, ValueToKey, ExtractField, TimestampRouter, RegexRouter, SetSchemaMetadata, Flatten, TimestampConverter

25 Kafka Connect - Many Connectors
60+ since first release (0.9+)
20+ from Confluent and Partners
Certified Connectors, Community Connectors, Confluent supported Connectors
Source:

26 Demo (III)
Truck-1, Truck-2, Truck-3 → MQTT topic truck/nn/position → mqtt to kafka → truck_position → console consumer
(sample event: Wichita to Little Rock Route2, Normal)

27 Demo (III) Create MQTT Connect through REST API
    #!/bin/bash
    curl -X "POST" "…138:8083/connectors" \
      -H "Content-Type: application/json" \
      -d $'{
      "name": "mqtt-source",
      "config": {
        "connector.class": "com.datamountaineer.streamreactor.connect.mqtt.source.MqttSourceConnector",
        "connect.mqtt.connection.timeout": "1000",
        "tasks.max": "1",
        "connect.mqtt.kcql": "INSERT INTO truck_position SELECT * FROM truck/+/position",
        "name": "MqttSourceConnector",
        "connect.mqtt.service.quality": "0",
        "connect.mqtt.client.id": "tm-mqtt-connect-01",
        "connect.mqtt.converter.throw.on.error": "true",
        "connect.mqtt.hosts": "tcp://mosquitto:1883"
      }
    }'
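The KCQL statement subscribes to the MQTT topic filter `truck/+/position`. In MQTT, `+` matches exactly one topic level and `#` matches all remaining levels, so each truck's own topic (for example `truck/13/position`) is picked up and written to the Kafka topic `truck_position`. The matcher below is a hypothetical illustration of those wildcard rules, not code from the Stream Reactor connector; it ignores special cases such as topics starting with `$`.

```java
// Hypothetical MQTT topic-filter matcher illustrating the '+' and '#' wildcards.
final class MqttTopicFilterSketch {
    static boolean matches(String filter, String topic) {
        String[] f = filter.split("/", -1); // -1 keeps empty levels
        String[] t = topic.split("/", -1);
        for (int i = 0; i < f.length; i++) {
            if (f[i].equals("#")) {
                return true; // multi-level wildcard: matches this level and everything below
            }
            if (i >= t.length) {
                return false; // filter has more levels than the topic
            }
            if (!f[i].equals("+") && !f[i].equals(t[i])) {
                return false; // literal level does not match
            }
        }
        return f.length == t.length; // without '#', the number of levels must agree
    }

    public static void main(String[] args) {
        System.out.println(matches("truck/+/position", "truck/13/position")); // true
        System.out.println(matches("truck/+/position", "truck/13/speed"));    // false
    }
}
```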

28 Demo (III) Call REST API and Kafka Console Consumer

29 Demo (III) - what about some analytics?
Truck-1, Truck-2, Truck-3 → MQTT topic truck/nn/position → mqtt to kafka → truck_position → console consumer
(sample event: Wichita to Little Rock Route2, Normal)

30 Kafka Streams

31 Kafka Streams - Overview
Designed as a simple and lightweight library in Apache Kafka
No external dependencies on systems other than Apache Kafka
Part of open source Apache Kafka, introduced in 0.10
Leverages Kafka as its internal messaging layer
Supports fault-tolerant local state
Event-at-a-time processing (not microbatch) with millisecond latency
Windowing with out-of-order data using a Google Dataflow-like model
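Out-of-order data can be windowed correctly because each record is assigned to a window by its own event timestamp, not by when it arrives. The sketch shows this for tumbling (fixed-size, non-overlapping) windows, processing one event at a time. `TumblingWindowSketch` is a hypothetical class built on plain Java collections; it leaves out what Kafka Streams adds on top, such as fault-tolerant state stores and limits on how late a record may still update its window, and it assumes non-negative timestamps.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical event-time tumbling-window counter (not the Kafka Streams API).
final class TumblingWindowSketch {
    private final long windowSizeMs;
    // window start timestamp -> number of events that fell into that window
    private final Map<Long, Integer> counts = new TreeMap<>();

    TumblingWindowSketch(long windowSizeMs) {
        this.windowSizeMs = windowSizeMs;
    }

    // Called once per event, in arrival order. The window is chosen from the event's
    // own timestamp, so a late-arriving event still updates the window it belongs to.
    void onEvent(long eventTimestampMs) {
        long windowStart = eventTimestampMs - (eventTimestampMs % windowSizeMs);
        counts.merge(windowStart, 1, Integer::sum);
    }

    Map<Long, Integer> counts() {
        return counts;
    }

    public static void main(String[] args) {
        TumblingWindowSketch w = new TumblingWindowSketch(60_000L);
        w.onEvent(10_000L); // window [0s, 60s)
        w.onEvent(70_000L); // window [60s, 120s)
        w.onEvent(20_000L); // arrives late, still counted in [0s, 60s)
        System.out.println(w.counts()); // {0=2, 60000=1}
    }
}
```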

32 Kafka Streams DSL and Processor Topology
    KStream<Integer, String> stream1 = builder.stream("in-1");
    KStream<Integer, String> stream2 = builder.stream("in-2");
    KStream<Integer, String> joined = stream1.leftJoin(stream2, ...);
    KTable<...> aggregated = joined.groupBy(...).count("store");
    aggregated.to("out-1");
(Topology: sources 1 and 2 → lj (left join) → a (aggregate, backed by a State store) → t (to out-1))

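34 Processor Topology (diagram: a Kafka Streams instance reads input-1 and input-2 from the Kafka Cluster, left-joins (lj) and aggregates (a) them using a local State store backed by a changelog topic (store), and writes (t) the result to output)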

35 Processor Topology (diagram: two instances, Kafka Streams 1 and Kafka Streams 2, share Partitions 0-3 of input-1 and input-2 in the Kafka Cluster)

36 Processor Topology (diagram: scaled out to four instances, Kafka Streams 1-4, over Partitions 0-3 of input-1 and input-2 in the Kafka Cluster)
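Slides 35 and 36 show the scaling model: each partition is processed by exactly one Kafka Streams instance at a time, and because the topology joins `input-1` with `input-2`, partition i of both topics is handled together as one task. Adding instances therefore spreads the same tasks more thinly, but only up to one instance per partition. The round-robin assignment below is a hypothetical illustration of that shape; Kafka Streams' real partition assignor is more elaborate.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical round-robin assignment of tasks to Kafka Streams instances.
final class TaskAssignmentSketch {
    // Returns, for each instance, the task ids it processes.
    // Task i stands for partition i of input-1 together with partition i of input-2.
    static List<List<Integer>> assign(int numPartitions, int numInstances) {
        List<List<Integer>> perInstance = new ArrayList<>();
        for (int i = 0; i < numInstances; i++) {
            perInstance.add(new ArrayList<>());
        }
        for (int task = 0; task < numPartitions; task++) {
            perInstance.get(task % numInstances).add(task);
        }
        return perInstance;
    }

    public static void main(String[] args) {
        System.out.println(assign(4, 2)); // [[0, 2], [1, 3]]
        System.out.println(assign(4, 4)); // [[0], [1], [2], [3]]
        System.out.println(assign(4, 5)); // [[0], [1], [2], [3], []]  (fifth instance idle)
    }
}
```

With four partitions per input, a fifth instance would get no work, which is why the partition count bounds the parallelism of the application.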

37 KSQL

38 KSQL: a Streaming SQL Engine for Apache Kafka
Enables stream processing with zero coding required
The simplest way to process streams of data in real-time
Powered by Kafka and Kafka Streams: scalable, distributed, mature
All you need is Kafka: no complex deployments
Available as Developer Preview!
STREAM and TABLE as first-class citizens
STREAM = data in motion
TABLE = collected state of a stream
Join STREAM and TABLE

39 KSQL Deployment Models
Standalone Mode
Cluster Mode
Source: Confluent

40 Demo (IV)
Truck-1, Truck-2, Truck-3 → MQTT topic truck/nn/position → mqtt to kafka → truck_position_s → detect_dangerous_driving → dangerous_driving → console consumer
(sample event: Wichita to Little Rock Route2, Normal)

41 Demo (IV) - Start Kafka KSQL
    $ docker-compose exec ksql-cli ksql-cli local --bootstrap-server broker-1:9092
    [KSQL ASCII-art logo]
    = Streaming SQL Engine for Kafka =
    Copyright 2017 Confluent Inc.
    CLI v0.1, Server v0.1 located at
    Having trouble? Type 'help' (case-insensitive) for a rundown of how things work!
    ksql>

42 Demo (IV) - Create Stream
    ksql> CREATE STREAM truck_position_s \
          (ts VARCHAR, \
           truckid VARCHAR, \
           driverid BIGINT, \
           routeid BIGINT, \
           routename VARCHAR, \
           eventtype VARCHAR, \
           latitude DOUBLE, \
           longitude DOUBLE, \
           correlationid VARCHAR) \
          WITH (kafka_topic='truck_position', \
                value_format='delimited');
     Message
    ----------------
     Stream created

44 Demo (IV) - Create Stream
    ksql> describe truck_position_s;
     Field         | Type
    -----------------------------------
     ROWTIME       | BIGINT
     ROWKEY        | VARCHAR(STRING)
     TS            | VARCHAR(STRING)
     TRUCKID       | VARCHAR(STRING)
     DRIVERID      | BIGINT
     ROUTEID       | BIGINT
     ROUTENAME     | VARCHAR(STRING)
     EVENTTYPE     | VARCHAR(STRING)
     LATITUDE      | DOUBLE
     LONGITUDE     | DOUBLE
     CORRELATIONID | VARCHAR(STRING)

45 Demo (IV) - Create Stream
    ksql> SELECT * FROM truck_position_s;
    1506922133306 "truck/13/position0! 2017-10-02t07:28:53 31 13 371182829 Memphis to Little Rock Normal 41.76 -89.…
    … "truck/16/position0! …t07:28:… Joplin to Kansas City Route 2 Normal …
    … "truck/30/position0! …t07:28:… Des Moines to Chicago Route 2 Normal …
    … "truck/23/position0! …t07:28:… Peoria to Ceder Rapids Route 2 Normal …
    … "truck/12/position0! …t07:28:… Saint Louis to Memphis Normal …
    … "truck/14/position0! …t07:28:… Springfield to KC Via Columbia Normal …

46 Demo (IV) - Create Stream
    ksql> SELECT * FROM truck_position_s WHERE eventtype != 'Normal';
    1506922264016 "truck/11/position0! …t07:31:… Saint Louis to Tulsa Route2 Lane Departure …
    … "truck/11/position0! …t07:31:… Saint Louis to Tulsa Route2 Unsafe tail distance …
    … "truck/10/position0! …t07:31:… Joplin to Kansas City Unsafe following distance …
    … "truck/11/position0! …t07:31:… Saint Louis to Tulsa Route2 Unsafe following distance …

47 Demo (IV) - Create Stream
    ksql> CREATE STREAM dangerous_driving_s \
          WITH (kafka_topic='dangerous_driving_s', \
                value_format='json') \
          AS SELECT * FROM truck_position_s \
          WHERE eventtype != 'Normal';
     Message
    ----------------------------
     Stream created and running
    ksql> select * from dangerous_driving_s;
    … "truck/11/position0! …t07:40:… Des Moines to Chicago Route 2 Overspeed …
    … "truck/11/position0! …t07:41:… Des Moines to Chicago Route 2 Overspeed …
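Per message, `truck_position_s` and `dangerous_driving_s` roughly amount to: split the comma-delimited value into the columns declared in `CREATE STREAM`, then keep only events whose `eventtype` is not `Normal`. The plain-Java sketch below mirrors that; `TruckPositionSketch` and the sample values are made up for illustration, and it does not model the `ROWTIME`/`ROWKEY` columns KSQL adds or the JSON output written by `dangerous_driving_s`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java counterpart of truck_position_s / dangerous_driving_s.
final class TruckPositionSketch {
    // Columns in the order declared in CREATE STREAM truck_position_s.
    final String ts;
    final String truckId;
    final long driverId;
    final long routeId;
    final String routeName;
    final String eventType;
    final double latitude;
    final double longitude;
    final String correlationId;

    private TruckPositionSketch(String[] c) {
        ts = c[0];
        truckId = c[1];
        driverId = Long.parseLong(c[2]);
        routeId = Long.parseLong(c[3]);
        routeName = c[4];
        eventType = c[5];
        latitude = Double.parseDouble(c[6]);
        longitude = Double.parseDouble(c[7]);
        correlationId = c[8];
    }

    // Parse one comma-delimited message value into typed columns.
    static TruckPositionSketch parse(String delimitedValue) {
        String[] columns = delimitedValue.split(",", -1);
        if (columns.length != 9) {
            throw new IllegalArgumentException("expected 9 columns, got " + columns.length);
        }
        return new TruckPositionSketch(columns);
    }

    // Rough equivalent of: SELECT * FROM truck_position_s WHERE eventtype != 'Normal'
    static List<TruckPositionSketch> dangerous(List<String> values) {
        List<TruckPositionSketch> result = new ArrayList<>();
        for (String v : values) {
            TruckPositionSketch p = parse(v);
            if (!p.eventType.equals("Normal")) {
                result.add(p);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> values = List.of(
            "2017-10-02T07:40:00,11,17,1000,Sample Route,Normal,41.0,-90.0,1",
            "2017-10-02T07:40:05,11,17,1000,Sample Route,Overspeed,41.1,-90.1,2");
        for (TruckPositionSketch p : dangerous(values)) {
            System.out.println(p.truckId + " " + p.eventType); // 11 Overspeed
        }
    }
}
```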

48 Demo (V)
Truck Driver table (e.g. 27, Mark Lochbihler, …:19:00) → jdbc-source → trucking_driver, e.g. {"id":10,"name":"George Vetticaden","last_update":1506923052012}
Truck-1, Truck-2, Truck-3 → MQTT topic truck/nn/position → mqttsource → truck_position → detect_dangerous_driving → dangerous_driving → console consumer
truck_position and trucking_driver → join_truck_position_driver → truck_position_driver
(sample event: Wichita to Little Rock Route2, Normal)

49 Demo (V) Create JDBC Connect through REST API
    #!/bin/bash
    curl -X "POST" "…138:8083/connectors" \
      -H "Content-Type: application/json" \
      -d $'{
      "name": "jdbc-driver-source",
      "config": {
        "connector.class": "JdbcSourceConnector",
        "connection.url": "jdbc:postgresql://db/sample?user=sample&password=sample",
        "mode": "timestamp",
        "timestamp.column.name": "last_update",
        "table.whitelist": "driver",
        "validate.non.null": "false",
        "topic.prefix": "trucking_",
        "key.converter": "org.apache.kafka.connect.json.JsonConverter",
        "key.converter.schemas.enable": "false",
        "value.converter": "org.apache.kafka.connect.json.JsonConverter",
        "value.converter.schemas.enable": "false",
        "name": "jdbc-driver-source",
        "transforms": "createkey,extractint",
        "transforms.createkey.type": "org.apache.kafka.connect.transforms.ValueToKey",
        "transforms.createkey.fields": "id",
        "transforms.extractint.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
        "transforms.extractint.field": "id"
      }
    }'
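The two transforms in this configuration turn each schemaless JDBC row into a keyed record: `ValueToKey` with `fields=id` builds the key from the row's `id` field (giving a key like `{"id":10}`), and `ExtractField$Key` with `field=id` then replaces that key with the bare value `10`. Presumably this is done so that `trucking_driver` is keyed by driver id for the KSQL table `driver_t`, which is joined on `id` in the next steps. The sketch mimics the chain on plain maps; `DriverKeySketch`, `Row` and its methods are hypothetical and are not the Connect `Transformation` API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical mimic of the ValueToKey + ExtractField$Key chain on schemaless records.
final class DriverKeySketch {
    // Stand-in for a Connect record without schemas: a key object and a map value.
    static final class Row {
        final Object key;
        final Map<String, Object> value;

        Row(Object key, Map<String, Object> value) {
            this.key = key;
            this.value = value;
        }
    }

    // Like ValueToKey with fields=id: the new key is a map of the listed value fields.
    static Row valueToKey(Row r, String... fields) {
        Map<String, Object> key = new LinkedHashMap<>();
        for (String f : fields) {
            key.put(f, r.value.get(f));
        }
        return new Row(key, r.value);
    }

    // Like ExtractField$Key with field=id: replace the map key by one of its fields.
    @SuppressWarnings("unchecked")
    static Row extractKeyField(Row r, String field) {
        Map<String, Object> key = (Map<String, Object>) r.key;
        return new Row(key.get(field), r.value);
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("id", 10);
        row.put("name", "George Vetticaden");
        Row fromJdbc = new Row(null, row); // assumed: the source record arrives without a key
        Row keyed = extractKeyField(valueToKey(fromJdbc, "id"), "id");
        System.out.println(keyed.key); // 10
    }
}
```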

50 Demo (V) Create JDBC Connect through REST API

51 Demo (V) - Create Table with Driver State
    ksql> CREATE TABLE driver_t \
          (id BIGINT, \
           name VARCHAR) \
          WITH (kafka_topic='trucking_driver', \
                value_format='json');
     Message
    ---------------
     Table created

52 Demo (V) - Create Table with Driver State
    ksql> CREATE STREAM truck_position_and_driver_s \
          WITH (kafka_topic='truck_position_and_driver_s', \
                value_format='json') \
          AS SELECT driverid, name, truckid, routeid, routename, eventtype \
          FROM truck_position_s \
          LEFT JOIN driver_t \
          ON truck_position_s.driverid = driver_t.id;
     Message
    ----------------------------
     Stream created and running
    ksql> select * from truck_position_and_driver_s;
    … "truck/11/position0! …t07:40:… Des Moines to Chicago Route 2 Overspeed …
    … "truck/11/position0! …t07:41:… Des Moines to Chicago Route 2 Overspeed …

53 Demo (V) - Create Table with Driver State
    ksql> CREATE STREAM truck_position_and_driver_s \
          WITH (kafka_topic='truck_position_and_driver_s', \
                value_format='json') \
          AS SELECT driverid, name, truckid, routeid, routename, eventtype \
          FROM truck_position_s \
          LEFT JOIN driver_t \
          ON truck_position_s.driverid = driver_t.id;
     Message
    ----------------------------
     Stream created and running
    ksql> select * from truck_position_and_driver_s;
    … Jamie Engesser … Saint Louis to Memphis Normal
    … Jamie Engesser … Saint Louis to Memphis Normal
    … Jamie Engesser … Saint Louis to Memphis Overspeed
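The stream-table `LEFT JOIN` behaves like a lookup: the table side holds only the latest row per key (here, the driver rows from `trucking_driver`), each new position event is enriched with the driver row that is current at that moment, and an event whose driver is not (yet) known is still emitted, with `null` for `name`. Updating the table does not re-emit earlier join results. The sketch models this with a `HashMap` as the table state; `StreamTableJoinSketch` and the sample values are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a stream-table LEFT JOIN (not KSQL or Kafka Streams code).
final class StreamTableJoinSketch {
    // Table state: driver id -> latest driver name seen on the table's topic.
    private final Map<Long, String> driverTable = new HashMap<>();
    private final List<String> output = new ArrayList<>();

    // A new table row upserts the state; it produces no join output by itself.
    void onDriverUpdate(long id, String name) {
        driverTable.put(id, name);
    }

    // Each stream event is joined against the table as it is now (null name if missing).
    void onPosition(long driverId, String routeName, String eventType) {
        String name = driverTable.get(driverId);
        output.add(driverId + "," + name + "," + routeName + "," + eventType);
    }

    List<String> output() {
        return output;
    }

    public static void main(String[] args) {
        StreamTableJoinSketch join = new StreamTableJoinSketch();
        join.onPosition(10L, "Route A", "Normal");      // driver unknown yet -> name is null
        join.onDriverUpdate(10L, "George Vetticaden");
        join.onPosition(10L, "Route A", "Overspeed");   // enriched with the current name
        join.output().forEach(System.out::println);
    }
}
```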

54 Kafka and "Big Data" / "Fast Data" Ecosystem

55 Kafka and the Big Data / Fast Data ecosystem
Kafka integrates with many popular products / frameworks:
Apache Spark Streaming, Apache Flink, Apache Storm, Apache Apex, Apache NiFi, StreamSets, Oracle Stream Analytics, Oracle Service Bus, Oracle GoldenGate, Oracle Event Hub Cloud Service, Debezium CDC
Additional Info:

56 Kafka in Enterprise Architecture

57 Traditional Big Data Architecture (diagram: Billing & Ordering, CRM / Profile and Marketing Campaigns; Enterprise Data Warehouse; File Import / SQL Import into a Big Data (Hadoop) Cluster with Distributed Filesystem, Parallel Batch Processing, NoSQL and Search; used by SQL, BI Tools, Search / Explore, Machine Learning, Graph Algorithms, Natural Language Processing, and Online & Mobile Apps)

58 Event Hub - handle event stream data (diagram: Event Hubs added in front of the Big Data Cluster take in event streams from Location, Social, Click stream, Mobile Apps, Weather Data, Call Center and Sensor Data, alongside the Enterprise Data Warehouse, Billing & Ordering, CRM / Profile and Marketing Campaigns)

59 Event Hub - taking Velocity into account (diagram: next to Parallel Batch Processing and Batch Analytics in the Big Data Cluster, the Event Hubs feed Streaming / Stream Analytics, which uses Reference / Models and delivers Results to a Dashboard)

60 Event Hub - Asynchronous Microservice Architecture (diagram: Microservices in containers, each with its own API and RDBMS / NoSQL store, connected through the Event Hubs alongside the Big Data Cluster and the Enterprise Data Warehouse)

61 Guido Schmutz Technology guidoschmutz.wordpress.com