Kafka Connect the Dots


Kafka Connect the Dots Building Oracle Change Data Capture Pipelines With Kafka Mike Donovan CTO Dbvisit Software

Mike Donovan, Chief Technology Officer, Dbvisit Software. Multi-platform DBA (Oracle, MSSQL, ...). Conference speaker: OOW, RMOUG, db tech showcase, Collaborate, NLOUG. NZOUG member. Technical writer and editor. Kafka enthusiast (new). Oracle ACE. Old furniture at Dbvisit. Professional not-knower of things.

Dbvisit Software. Real-time Oracle database streaming software solutions: in the cloud, hybrid, or on-premises. New Zealand-based, with a US office, an Asia sales office, and an EU office (Prague). Unique offering: disaster recovery solutions for Oracle Standard Edition. Logical replication for moving data wherever and whenever you wish. Flexible licensing and cost-effective pricing models available. Exceptional growth: 1300+ customers. Peerless customer support.

BEFORE: Many Ad Hoc Pipelines

Stream Data Platform with Kafka: distributed, fault tolerant. Stream Processing / Data Integration / Message Store.

Quick Recap: what is Kafka? An open-source publish-subscribe messaging system implemented as a distributed commit log. A scalable, fault-tolerant, distributed system where messages are kept in topics that are partitioned and replicated across multiple nodes. Developed at LinkedIn ~2010. Confluent and the open-source project (NB!).

Quick Recap: what is Kafka? Data is written to Kafka in the form of key-value pair messages (the key can be null). Each message belongs to a topic. Messages form a continuous flow (stream) of events. Producers (writers) are decoupled from Consumers (readers). A delivery channel/platform (if you like) crossing systems (data integration).

Kafka - a log writer/reader. Organized by topics, sub-categorized into partitions (log files on disk) that are written from old to new and replicated between nodes for redundancy. [Diagram: Partition 0, Partition 1, Partition 2 as append-only logs, old to new]
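The topic/partition model is easy to poke at with the console tools that ship with Kafka. A minimal sketch, assuming a single local broker on localhost:9092 and the CLI tools on the PATH; the topic name "demo" is made up for illustration:

# Create a topic with three partitions, like the diagram above
# (older Kafka versions take --zookeeper localhost:2181 instead of --bootstrap-server)
kafka-topics --create --topic demo --partitions 3 --replication-factor 1 --bootstrap-server localhost:9092

# Produce key-value messages; the key (before the colon) drives partition assignment
echo "user42:hello world" | kafka-console-producer --broker-list localhost:9092 --topic demo --property parse.key=true --property key.separator=:

# A completely decoupled consumer replays the same log from the beginning
kafka-console-consumer --bootstrap-server localhost:9092 --topic demo --from-beginning --property print.key=true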

Making use of Kafka. For what? A messaging system, a data streaming platform, data storage. To do what? Messaging, website activity tracking, metrics, log aggregation, stream processing, event sourcing, a commit log.

Kafka - components: Kafka itself (the brokers), Zookeeper, Schema Registry, REST Proxy, Kafka Connect. What about KSQL and Kafka Streams?
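Of these components, the REST Proxy is the quickest to demonstrate in one line. A minimal sketch, assuming a Confluent REST Proxy running on its default port 8082; the topic name "demo" is illustrative:

# Produce a JSON message over plain HTTP -- no Kafka client library required
curl -X POST http://localhost:8082/topics/demo \
  -H "Content-Type: application/vnd.kafka.json.v2+json" \
  --data '{"records":[{"value":{"greeting":"hello via REST Proxy"}}]}'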

Data Pipelines: bridging the Old World and the New... Indicative use cases: real-time system monitoring and alerting (financial trading, fraud detection); real-time business intelligence and analytics. Kleppmann: update search indexes, invalidate caches, create snapshots, generate recommendations, copy data into another database.

Kafka Connect - export/import tool. Cluster-able. "Connect is a tool for scalably and reliably streaming data between Apache Kafka and other systems. It makes it simple to quickly define connectors that move large collections of data into and out of Kafka. Kafka Connect can ingest entire databases or collect metrics from all your application servers into Kafka topics, making the data available for stream processing with low latency. An export job can deliver data from Kafka topics into secondary storage and query systems or into batch systems for offline analysis."

Kafka Connect - export/import tool. STANDALONE mode vs DISTRIBUTED mode; key differences: topic storage, rebalancing, interaction (see the sketch below). Core processes: connectors, workers, tasks.
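A minimal sketch of launching each mode, assuming the Connect scripts are on the PATH; the properties file names here are made up:

# STANDALONE: a single process; offsets go to a local file and connectors
# are defined in properties files passed on the command line
connect-standalone worker.properties filesource.properties

# DISTRIBUTED: workers form a cluster; offsets/configs/status live in Kafka
# topics, rebalancing is automatic, and connectors are managed via the REST
# API (default port 8083)
connect-distributed worker-distributed.properties
curl -X POST http://localhost:8083/connectors -H "Content-Type: application/json" \
  --data '{"name":"local-file-source","config":{"connector.class":"FileStreamSource","tasks.max":"1","file":"/tmp/test.txt","topic":"connect-test"}}'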

Kafka Connect - serious power. What about: topic creation, offset management, SMTs (single message transformations), overriding Kafka settings? (A worker-config sketch follows.)
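A minimal worker-config sketch illustrating the offset-management and override points above, assuming a single-broker dev cluster; every value here is an example, not a recommendation:

cat > worker-distributed.properties <<'EOF'
bootstrap.servers=localhost:9092
group.id=connect-cluster

# Connect keeps its own state in Kafka topics
offset.storage.topic=connect-offsets
config.storage.topic=connect-configs
status.storage.topic=connect-status

# single-broker dev settings; use 3 in production
offset.storage.replication.factor=1
config.storage.replication.factor=1
status.storage.replication.factor=1

# converters control (de)serialization on the way in/out of Kafka
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter

# underlying Kafka producer/consumer settings can be overridden with a prefix
producer.compression.type=snappy
EOF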

Kafka Connect - export/import tool. SOURCE CONNECTORS: JDBC, Couchbase, Vertica, Blockchain, Files/Directories, GitHub, FTP, Google PubSub, MongoDB, PostgreSQL, Salesforce, Twitter. SINK CONNECTORS: Cassandra, Elasticsearch, Google BigQuery, HBase, HDFS, JDBC, Kudu, MongoDB, Postgres, S3, SAP HANA, Solr, Vertica.

Kafka Connect - export/import tool

Kafka Connect - export/import tool. filesource.properties:
name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=/u01/app/oracle/diag/rdbms/xe/XE/trace/alert_XE.log
topic=alertlog_test
Look ma, no code! - @RMOFF
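A hedged sketch of running it and watching the alert-log lines arrive, assuming standalone mode with a stock worker config (e.g. the etc/kafka/connect-standalone.properties shipped with the distribution) and a local broker:

connect-standalone worker.properties filesource.properties &
kafka-console-consumer --bootstrap-server localhost:9092 --topic alertlog_test --from-beginning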

Building Oracle CDC Data Pipelines: beyond the plumbing... Example use cases: caches, mviews, aggregates, pre-computes, alerts. Your Oracle data tells a story: logon activity, sales channel data.

The New World of data: data centralization, real-time delivery, integration, stream data processing, new data end points/stores.

INPUT (source connectors)

Kafka Connect setting up. Finding your way around the directories. Installing new connectors: the connector JAR file (Java class) plus its configuration, written either as a .properties file or as JSON. Running Connect: make use of the Confluent CLI! (A sketch follows.)
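A minimal sketch using the circa-2018 Confluent development CLI (the syntax has since changed; newer versions use "confluent local ..."); paths and the connector JAR name are illustrative:

# 1. Put the connector JAR somewhere on the worker's plugin.path / CLASSPATH
cp my-connector.jar $CONFLUENT_HOME/share/java/my-connector/

# 2. Start Connect (and its dependencies), then load a connector from a properties file
confluent start connect
confluent load local-file-source -d filesource.properties

# 3. See what is running
confluent status connectors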

Example 1: SOURCE File Connector. Write to a file with the UTL_FILE package; first do the following as SYS (sysdba):

CREATE OR REPLACE DIRECTORY MIKESDIR AS '/home/oracle/';
GRANT READ ON DIRECTORY MIKESDIR TO PUBLIC;
GRANT EXECUTE ON utl_file TO system;

DECLARE
  out_file UTL_FILE.FILE_TYPE;
BEGIN
  out_file := UTL_FILE.FOPEN('MIKESDIR', 'hellotest.txt', 'a');
  UTL_FILE.PUT_LINE(out_file, 'hello world (a message from Oracle)');
  UTL_FILE.FCLOSE(out_file);
END;
/

Example 1: SOURCE File Connector. CONFIG FILE / GOTCHAs:
name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=/u01/app/oracle/diag/rdbms/xe/XE/trace/alert_XE.log
topic=alertlog_test

Example 1: SOURCE File Connector DEMO

Example 2: SOURCE File Connector. Read from the Oracle database alert log: /u01/app/oracle/diag/rdbms/xe/XE/trace/alert_XE.log. GOTCHAs... CONFIG FILE:
name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=/u01/app/oracle/diag/rdbms/xe/XE/trace/alert_XE.log
topic=alertlog_test

Example 2: SOURCE File Connector DEMO

Example 2: SOURCE File Connector SPECIAL BONUS TOPIC: SMT single message transformations

SPECIAL BONUS TOPIC: SMTs. CONFIG FILE:
transforms=hoistfield,insertsource
transforms.hoistfield.type=org.apache.kafka.connect.transforms.HoistField$Value
transforms.hoistfield.field=alertlog_msg
transforms.insertsource.type=org.apache.kafka.connect.transforms.InsertField$Value
transforms.insertsource.static.field=SMT_DATABASE
transforms.insertsource.static.value=XE

SPECIAL BONUS TOPIC: SMTs. JSON OUTPUT:
Struct{alertlog_msg=Mon Feb 12 15:30:34 2018,SMT_DATABASE=XE}
Struct{alertlog_msg=Thread 1 advanced to log sequence 1721 (LGWR switch),SMT_DATABASE=XE}
Struct{alertlog_msg= Current log# 1 seq# 1721 mem# 0: /u01/app/oracle/fast_recovery_area/XE/onlinelog/o1_mf_1_8x1y15xj_.log,SMT_DATABASE=XE}
Struct{alertlog_msg=Mon Feb 12 15:30:34 2018,SMT_DATABASE=XE}
Struct{alertlog_msg=Archived Log entry 1709 added for thread 1 sequence 1720 ID 0xa0fa1263 dest 1:,SMT_DATABASE=XE}
Struct{alertlog_msg=Mon Feb 12 15:30:53 2018,SMT_DATABASE=XE}
Struct{alertlog_msg=Thread 1 cannot allocate new log, sequence 1722,SMT_DATABASE=XE}
Struct{alertlog_msg=Checkpoint not complete,SMT_DATABASE=XE}
Struct{alertlog_msg= Current log# 1 seq# 1721 mem# 0: /u01/app/oracle/fast_recovery_area/XE/onlinelog/o1_mf_1_8x1y15xj_.log,SMT_DATABASE=XE}
Struct{alertlog_msg=Thread 1 advanced to log sequence 1722 (LGWR switch),SMT_DATABASE=XE}

Example 3: SOURCE JDBC Connector. Read from an Oracle database table. MODE: incrementing column, timestamp column, or bulk. Table whitelist.

Example 3: SOURCE JDBC Connector. CONFIG FILE:
name=test-oracle-jdbc-autoincrement
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
tasks.max=1
connection.url=jdbc:oracle:thin:@example.oracle.server.com:1521/exampleservicename
connection.user=exampleuser
connection.password=examplepassword
table.whitelist=users
mode=incrementing
incrementing.column.name=id
topic.prefix=test-oracle-jdbc-
GOTCHAs: installing the JDBC drivers (see the sketch below).
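For context, a hedged sketch of the table behind this config and of the driver GOTCHA; the user, service, table, and column names are the placeholders from the config above:

sqlplus exampleuser/examplepassword@exampleservicename <<'SQL'
-- incrementing mode needs a strictly increasing NOT NULL numeric column;
-- an identity column works on 12c+ (on 11g/XE, use a sequence instead)
CREATE TABLE users (
  id   NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  name VARCHAR2(100)
);
INSERT INTO users (name) VALUES ('alice');
COMMIT;
SQL

# GOTCHA: the Oracle JDBC driver is not bundled; copy it next to the
# connector JARs so the worker can load it (path assumes Confluent Platform)
cp ojdbc8.jar $CONFLUENT_HOME/share/java/kafka-connect-jdbc/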

Example 3: SOURCE JDBC Connector DEMO

Example 4: SOURCE CDC Connector. Read Oracle database change data from the Oracle redo log (redo-log scanner applications).

Oracle Change Data delivered to Kafka. [Diagram: an INSERT ... into SCOTT.TEST9 captured from the redo log and published to Kafka along with its metadata]

Example 4: SOURCE CDC Connector. CONFIG FILE / GOTCHAs:
name=dbvisit-replicate
connector.class=com.dbvisit.replicate.kafkaconnect.ReplicateSourceConnector
tasks.max=4
topic.prefix=rep2-
plog.location.uri=file:/home/oracle/rq-3595/mine
plog.data.flush.size=100
topic.name.transaction.info=tx.meta
connector.publish.cdc.format=changerow
connector.publish.transaction.info=true
connector.publish.keys=true
connector.publish.no.schema.evolution=false
connector.catalog.topic.name=replicate-info
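A hedged sketch of checking the result: per the config above, change data lands in topics carrying the rep2- prefix, transaction metadata in tx.meta, and the catalog in replicate-info:

kafka-topics --list --bootstrap-server localhost:9092 | grep -E 'rep2-|tx\.meta|replicate-info'
kafka-console-consumer --bootstrap-server localhost:9092 --topic tx.meta --from-beginning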

Example 4: SOURCE CDC Connector DEMO

OUTPUT (sink connectors)

Example 5: SINK File Connector. CONFIG FILE / GOTCHAs:
name=local-file-sink
connector.class=FileStreamSink
tasks.max=1
file=/home/oracle/mike/test.sink.txt
topics=alertlog_test
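A hedged end-to-end check, assuming the Example 2 file source is still feeding alertlog_test: force an alert-log entry on the Oracle side, then watch it flow source -> topic -> sink file:

echo "ALTER SYSTEM SWITCH LOGFILE;" | sqlplus -s "/ as sysdba"
tail -f /home/oracle/mike/test.sink.txt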

Example 5: SINK File Connector DEMO

Building Oracle CDC Data Pipelines. Elasticsearch: storage, search, analytics. Kibana: visualization, reports, dashboards.

Example 6: SINK Elasticsearch Connector. CONFIG FILE:
name=elasticsearch-sink
connector.class=io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
tasks.max=1
topics=rep-soe.customers
connection.url=http://localhost:9200
type.name=kafka-connect
key.ignore=true
topic.index.map=rep-soe.customers:rep-soe.customers
topic.key.ignore=rep-soe.customers
GOTCHAs: key values.
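A minimal sketch of querying the result: the index is named after the topic, and localhost:9200 matches the connection.url above:

curl 'http://localhost:9200/rep-soe.customers/_search?q=*&size=2&pretty'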

Example 6: SINK Elasticsearch Connector DEMO: Elastic queries, Kibana visualizations, Slack alerts.

Get started with Kafka Connect. Kafka and Kafka Connect: www.confluent.io. Download the Confluent Platform (bundled connectors). Check out the available community connectors. Try running it in Docker.

Thank you @dbvisitmike mike.donovan@dbvisit.com