Kafka Connect the Dots: Building Oracle Change Data Capture Pipelines with Kafka. Mike Donovan, CTO, Dbvisit Software
Mike Donovan, Chief Technology Officer, Dbvisit Software. Multi-platform DBA (Oracle, MSSQL...). Conference speaker: OOW, RMOUG, db tech showcase, Collaborate, NLOUG. NZOUG member. Technical writer and editor. Kafka enthusiast (new). Oracle ACE. Old furniture at Dbvisit. Professional not-knower of things.
Dbvisit Software: real-time Oracle database streaming software solutions, in the cloud, hybrid, and on-premises. New Zealand-based, with a US office, an Asia sales office, and an EU office (Prague). Unique offering: disaster recovery solutions for Oracle Standard Edition. Logical replication for moving data wherever and whenever you wish. Flexible licensing, cost-effective pricing models available. Exceptional growth, 1300+ customers. Peerless customer support.
BEFORE: Many Ad Hoc Pipelines
Stream Data Platform with Kafka: distributed, fault tolerant. Stream processing, data integration, message store.
Quick Recap: what is Kafka? An open-source publish-subscribe messaging system implemented as a distributed commit log. A scalable, fault-tolerant, distributed system where messages are kept in topics that are partitioned and replicated across multiple nodes. Developed at LinkedIn ~2010. Confluent and the open-source project (NB!)
Quick Recap: what is Kafka? Data is written to Kafka in the form of key-value pair messages (keys can be null). Each message belongs to a topic. Messages form a continuous flow (stream) of events. Producers (writers) are decoupled from consumers (readers). A delivery channel/platform (if you like) crossing systems (data integration).
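A minimal sketch of that producer/consumer decoupling using the console tools that ship with Kafka; the broker address and topic name are assumptions for a local test setup:
# terminal 1: a producer writes messages to a topic
kafka-console-producer --broker-list localhost:9092 --topic example-topic
# terminal 2: an independent consumer reads the same stream, from the beginning
kafka-console-consumer --bootstrap-server localhost:9092 --topic example-topic --from-beginning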
Kafka - a log writer/reader. [Diagram: Partition 0, Partition 1, Partition 2, messages appended from old to new.] Organized by topics. Sub-categorization by partitions (log files on disk). Replicated between nodes for redundancy.
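Partition and replica counts are fixed when a topic is created; a sketch with the kafka-topics tool (Kafka of this era addresses it via ZooKeeper; the topic name and counts are illustrative):
kafka-topics --zookeeper localhost:2181 --create --topic example-topic --partitions 3 --replication-factor 1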
Making use of Kafka. For what? A messaging system, a data streaming platform, data storage. To do what? Messaging, website activity tracking, metrics, log aggregation, stream processing, event sourcing, commit log.
Kafka - components: Kafka, Zookeeper, Schema Registry, REST Proxy, Kafka Connect. What about KSQL and Kafka Streams?
Data Pipelines: bridging the Old World and the New... Indicative use cases: real-time system monitoring and alerting (financial trading, fraud detection); real-time business intelligence and analytics. Kleppmann: update search indexes, invalidate caches, create snapshots, generate recommendations, copy data into another database.
Kafka Connect - export/import tool. Cluster-able. Connect is a tool for scalably and reliably streaming data between Apache Kafka and other systems. It makes it simple to quickly define connectors that move large collections of data into and out of Kafka. Kafka Connect can ingest entire databases or collect metrics from all your application servers into Kafka topics, making the data available for stream processing with low latency. An export job can deliver data from Kafka topics into secondary storage and query systems or into batch systems for offline analysis.
Kafka Connect - export/import tool. STANDALONE mode vs DISTRIBUTED mode. Key differences: topic storage, rebalancing, interaction. Core processes: connectors, workers, tasks.
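Roughly how the two modes are launched, as a sketch; the worker property file names are the defaults shipped with Apache Kafka / Confluent Platform, and file-source.json is a hypothetical JSON version of a connector config:
# standalone: a single worker process, connector properties passed on the command line
connect-standalone connect-standalone.properties filesource.properties
# distributed: start one or more workers, then submit connectors over the REST API
connect-distributed connect-distributed.properties
curl -X POST -H "Content-Type: application/json" --data @file-source.json http://localhost:8083/connectors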
Kafka Connect - serious power. What about: topic creation, offset management, SMTs (single message transformations), overriding Kafka settings.
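A sketch of where some of that power lives in the worker properties; the key names are standard Connect worker settings, the values are illustrative:
# standalone workers track source offsets in a local file
offset.storage.file.filename=/tmp/connect.offsets
# distributed workers keep offsets, configs and status in Kafka topics instead
offset.storage.topic=connect-offsets
config.storage.topic=connect-configs
status.storage.topic=connect-status
# worker-level overrides for the underlying producers/consumers
producer.compression.type=snappy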
Kafka Connect - export/import tool. SOURCE CONNECTORS: JDBC, Couchbase, Vertica, Blockchain, Files/Directories, GitHub, FTP, Google PubSub, MongoDB, PostgreSQL, Salesforce, Twitter. SINK CONNECTORS: Cassandra, Elasticsearch, Google BigQuery, HBase, HDFS, JDBC, Kudu, MongoDB, Postgres, S3, SAP HANA, Solr, Vertica.
Kafka Connect - export/import tool
Kafka Connect - export/import tool
filesource.properties:
name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=/u01/app/oracle/diag/rdbms/xe/XE/trace/alert_XE.log
topic=alertlog_test
Look ma, no code! - @RMOFF
Building Oracle CDC Data Pipelines. Beyond the plumbing... Example use cases: caches, mviews, aggregates, pre-computes, alerts. Your Oracle data tells a story: logon activity, sales channel data.
The New World of data: data centralization, real-time delivery, integration, stream data processing, new data endpoints/stores.
INPUT (source connectors)
Kafka Connect - setting up. Finding your way around the directories. Installing new connectors: connector JAR file (Java class), properties file. Properties files (JSON and properties). Running Connect: make use of the Confluent CLI!
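A sketch using the development CLI bundled with the Confluent Platform of this vintage (later releases move these under confluent local); the connector name and properties file are from the file-source example:
confluent start connect          # brings up zookeeper, kafka, schema registry, connect
confluent load local-file-source -d filesource.properties
confluent status connectors      # list the connectors that are running
confluent log connect            # tail the Connect worker log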
Example 1: SOURCE File Connector. Write to a file with the UTL_FILE package; need to do the following as SYS (sysdba):
CREATE OR REPLACE DIRECTORY MIKESDIR AS '/home/oracle/';
GRANT READ ON DIRECTORY MIKESDIR TO PUBLIC;
GRANT EXECUTE ON UTL_FILE TO SYSTEM;
DECLARE
  out_file UTL_FILE.FILE_TYPE;
BEGIN
  out_file := UTL_FILE.FOPEN('MIKESDIR', 'hellotest.txt', 'a');
  UTL_FILE.PUT_LINE(out_file, 'hello world (a message from Oracle)');
  UTL_FILE.FCLOSE(out_file);
END;
/
Example 1: SOURCE File Connector
CONFIG FILE:
name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=/u01/app/oracle/diag/rdbms/xe/XE/trace/alert_XE.log
topic=alertlog_test
GOTCHAs:
Example 1: SOURCE File Connector DEMO
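What the demo boils down to, roughly: run a standalone worker with the properties file above, then watch the topic; the worker properties path and broker address are assumed for a local install:
connect-standalone connect-standalone.properties filesource.properties
# in another terminal, confirm the file contents are arriving in Kafka
kafka-console-consumer --bootstrap-server localhost:9092 --topic alertlog_test --from-beginning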
Example 2: SOURCE File Connector. Read from the Oracle database alert log: /u01/app/oracle/diag/rdbms/xe/XE/trace/alert_XE.log
GOTCHAs...
CONFIG FILE:
name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=/u01/app/oracle/diag/rdbms/xe/XE/trace/alert_XE.log
topic=alertlog_test
Example 2: SOURCE File Connector DEMO
Example 2: SOURCE File Connector SPECIAL BONUS TOPIC: SMT single message transformations
SPECIAL BONUS TOPIC: SMTs
CONFIG FILE:
transforms=hoistfield,insertsource
transforms.hoistfield.type=org.apache.kafka.connect.transforms.HoistField$Value
transforms.hoistfield.field=alertlog_msg
transforms.insertsource.type=org.apache.kafka.connect.transforms.InsertField$Value
transforms.insertsource.static.field=SMT_DATABASE
transforms.insertsource.static.value=XE
SPECIAL BONUS TOPIC: SMTs
JSON OUTPUT:
Struct{alertlog_msg=Mon Feb 12 15:30:34 2018,SMT_DATABASE=XE}
Struct{alertlog_msg=Thread 1 advanced to log sequence 1721 (LGWR switch),SMT_DATABASE=XE}
Struct{alertlog_msg=  Current log# 1 seq# 1721 mem# 0: /u01/app/oracle/fast_recovery_area/XE/onlinelog/o1_mf_1_8x1y15xj_.log,SMT_DATABASE=XE}
Struct{alertlog_msg=Mon Feb 12 15:30:34 2018,SMT_DATABASE=XE}
Struct{alertlog_msg=Archived Log entry 1709 added for thread 1 sequence 1720 ID 0xa0fa1263 dest 1:,SMT_DATABASE=XE}
Struct{alertlog_msg=Mon Feb 12 15:30:53 2018,SMT_DATABASE=XE}
Struct{alertlog_msg=Thread 1 cannot allocate new log, sequence 1722,SMT_DATABASE=XE}
Struct{alertlog_msg=Checkpoint not complete,SMT_DATABASE=XE}
Struct{alertlog_msg=  Current log# 1 seq# 1721 mem# 0: /u01/app/oracle/fast_recovery_area/XE/onlinelog/o1_mf_1_8x1y15xj_.log,SMT_DATABASE=XE}
Struct{alertlog_msg=Thread 1 advanced to log sequence 1722 (LGWR switch),SMT_DATABASE=XE}
Example 3: SOURCE JDBC Connector. Read from an Oracle database table. MODE: incrementing column, timestamp column, batch. Table whitelist.
Example 3: SOURCE JDBC Connector
CONFIG FILE:
name=test-oracle-jdbc-autoincrement
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
tasks.max=1
connection.password=examplepassword
connection.url=jdbc:oracle:thin:@example.oracle.server.com:1521/exampleservicename
connection.user=exampleuser
table.whitelist=USERS
mode=incrementing
incrementing.column.name=ID
topic.prefix=test-oracle-jdbc-
GOTCHAs: installing the JDBC drivers
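On the JDBC driver gotcha: the Confluent JDBC connector does not bundle the Oracle driver, so it has to be copied into the connector's jar directory before the worker starts; a sketch (the jar name depends on your Oracle client version, and CONFLUENT_HOME on your install):
cp ojdbc8.jar $CONFLUENT_HOME/share/java/kafka-connect-jdbc/
# then restart the Connect worker so the driver is picked up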
Example 3: SOURCE JDBC Connector DEMO
Example 4: SOURCE CDC Connector Read Oracle database change data from the Oracle redo log Redolog scanner applications
Oracle Change Data delivered to Kafka. [Diagram: an INSERT ... into SCOTT.TEST9 is captured and delivered to Kafka together with its metadata.]
Example 4: SOURCE CDC Connector
CONFIG FILE:
name=dbvisit-replicate
connector.class=com.dbvisit.replicate.kafkaconnect.ReplicateSourceConnector
tasks.max=4
topic.prefix=rep2-
plog.location.uri=file:/home/oracle/rq-3595/mine
plog.data.flush.size=100
topic.name.transaction.info=tx.meta
connector.publish.cdc.format=changerow
connector.publish.transaction.info=true
connector.publish.keys=true
connector.publish.no.schema.evolution=false
connector.catalog.topic.name=replicate-info
GOTCHAs:
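Once the replicate connector is running, its topics can be inspected like any others; a sketch assuming Avro converters and a local Schema Registry, using the transaction-info topic configured above:
kafka-avro-console-consumer --bootstrap-server localhost:9092 --topic tx.meta --from-beginning --property schema.registry.url=http://localhost:8081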
Example 4: SOURCE CDC Connector DEMO
OUTPUT (sink connectors)
Example 5: SINK File Connector
CONFIG FILE:
name=local-file-sink
connector.class=FileStreamSink
tasks.max=1
file=/home/oracle/mike/test.sink.txt
topics=alertlog_test
GOTCHAs:
Example 5: SINK File Connector DEMO
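A quick way to verify the sink during the demo is simply to tail the output file configured above:
tail -f /home/oracle/mike/test.sink.txt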
Building Oracle CDC Data Pipelines. Elasticsearch: storage, search, analytics. Kibana: visualization, reports, dashboards.
Example 6: SINK Elasticsearch Connector
CONFIG FILE:
name=elasticsearch-sink
connector.class=io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
tasks.max=1
topics=rep-soe.customers
connection.url=http://localhost:9200
type.name=kafka-connect
key.ignore=true
topic.index.map=rep-soe.customers:rep-soe.customers
topic.key.ignore=rep-soe.customers
GOTCHAs: key values
Example 6: SINK Elasticsearch Connector DEMO: Elastic queries, Kibana visualizations, Slack alerts
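A couple of the Elastic queries from the demo can be approximated with curl against the index the sink writes to (a local Elasticsearch on the default port is assumed):
# list indices and confirm rep-soe.customers exists
curl -s 'http://localhost:9200/_cat/indices?v'
# pull back a few of the documents written by the connector
curl -s 'http://localhost:9200/rep-soe.customers/_search?size=3&pretty'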
Get started with Kafka Connect. Kafka and Kafka Connect: www.confluent.io. Download the Confluent Platform (bundled connectors). Check out the available community connectors. Try running it in Docker.
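A minimal single-broker sketch for the Docker route, using the confluentinc images; the image tags and listener settings are assumptions suitable only for a local test:
version: '2'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:5.0.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
  kafka:
    image: confluentinc/cp-kafka:5.0.0
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1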
Thank you @dbvisitmike mike.donovan@dbvisit.com