Kafka Connect the Dots
- Olivia Garrison
- 5 years ago
Transcription
1 Kafka Connect the Dots Building Oracle Change Data Capture Pipelines With Kafka Mike Donovan CTO Dbvisit Software
2 Mike Donovan Chief Technology Officer, Dbvisit Software Multi-platform DBA (Oracle, MSSQL, ...) Conference speaker: OOW, RMOUG, dbtech Showcase, Collaborate, nloug NZOUG member Technical Writer and Editor Kafka enthusiast (new) Oracle ACE Old furniture at Dbvisit Professional not-knower of things
3 Dbvisit Software Real-time Oracle Database Streaming software solutions In the Cloud Hybrid On-Premises New Zealand-based, US office, Asia Sales office, EU office (Prague) Unique offering: disaster recovery solutions for Oracle Standard Edition Logical replication for moving data wherever and whenever you wish Flexible licensing, cost-effective pricing models available Exceptional growth, customers Peerless customer support
4 BEFORE: Many Ad Hoc Pipelines
5 Stream Data Platform with Kafka Distributed Fault Tolerant Stream Processing Data Integration Message Store
6 Quick Recap: what is Kafka? An open-source publish-subscribe messaging system implemented as a distributed commit log A scalable, fault-tolerant, distributed system where messages are kept in topics that are partitioned and replicated across multiple nodes Developed at LinkedIn ~2010 Confluent and the OS project (NB!)
7 Quick Recap: what is Kafka? Data is written to Kafka in the form of key-value pair messages (can have null) Each message belongs to a topic Messages as a continuous flow (stream) of events Producers (writers) decoupled from Consumers (readers) A delivery channel/platform (if you like) crossing systems (data Integration)
8 Kafka - a log writer/reader Partition 0 Partition 1 Partition 2 Old Organized by topics Sub-categorization by partitions (log files on disk) Replicated between nodes for redundancy New
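The log model on the slides above (key-value messages, topics split into partitions, readers decoupled from writers) can be illustrated with a toy in-memory sketch. This is not Kafka client code: `ToyTopic` is a hypothetical name, and CRC32 is only a stand-in for whatever hash a real partitioner uses.

```python
import zlib


class ToyTopic:
    """In-memory stand-in for a topic: a fixed number of append-only partitions."""

    def __init__(self, name, num_partitions=3):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key, value):
        """Append a key-value message and return (partition, offset)."""
        if key is None:
            # Null keys are allowed; this toy simply sends them to partition 0.
            partition = 0
        else:
            # Same key -> same partition, so ordering per key is preserved.
            partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        log = self.partitions[partition]
        log.append((key, value))
        return partition, len(log) - 1

    def read(self, partition, offset):
        """Read every message in a partition from the given offset onward."""
        return self.partitions[partition][offset:]


if __name__ == "__main__":
    topic = ToyTopic("alertlog_test")
    part, first = topic.append("XE", "Thread 1 advanced to log sequence 1721")
    topic.append("XE", "Checkpoint not complete")
    # A reader tracks its own offset; the writer never needs to know about it.
    for key, value in topic.read(part, first):
        print(key, value)
```

Because the reader only needs a partition number and an offset, any number of consumers can replay the same log independently, which is the decoupling the slides describe.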
9 Making use of Kafka For what? Messaging system Data streaming platform Data storage To do what? Messaging Website Activity Tracking Metrics Log Aggregation Stream Processing Event Sourcing Commit Log
10 Kafka - components data Schema Registry Zookeeper REST Proxy Kafka What about KSQL and Kafka Streams? Kafka Connect
11 Data Pipelines Bridging the Old World and the New... Indicative use cases: Real-time System Monitoring and Alerting (financial trading, fraud detection) Real-time Business Intelligence and Analytics Kleppmann: Update search indexes, invalidate caches, create snapshots, generate recommendations, copy data into another database
12 Kafka Connect - export/import tool Cluster-able Connect is a tool for scalably and reliably streaming data between Apache Kafka and other systems. It makes it simple to quickly define connectors that move large collections of data into and out of Kafka. Kafka Connect can ingest entire databases or collect metrics from all your application servers into Kafka topics, making the data available for stream processing with low latency. An export job can deliver data from Kafka topics into secondary storage and query systems or into batch systems for offline
13 Kafka Connect - export/import tool STANDALONE mode Key Differences: Topic Storage Rebalancing Interaction DISTRIBUTED mode Core Processes: Connectors Workers Tasks
14 Kafka Connect - serious power What about topic creation Offset management SMT (single message transformations) Override Kafka settings
15 Kafka Connect - export/import tool SOURCE CONNECTORS JDBC Couchbase Vertica Blockchain Files/Directories GitHub FTP Google PubSub MongoDB PostgreSQL Salesforce Twitter SINK CONNECTORS Cassandra Elasticsearch Google BigQuery Hbase HDFS JDBC Kudu MongoDB Postgres S3 SAP HANA Solr Vertica
16 Kafka Connect - export/import tool
17 Kafka Connect - export/import tool Filesource.properties
name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=/u01/app/oracle/diag/rdbms/xe/XE/trace/alert_XE.log
topic=alertlog_test
Look ma, no code!
18 Building Oracle CDC Data Pipelines Beyond the plumbing... Example use cases: caches, mviews, aggregates pre-computes, alerts Your Oracle data tells a story logon activity sales channel data
19 The New World of data Data centralization Real time delivery Integration Stream data processing New data end points/stores
20 INPUT (source connectors)
21 Kafka Connect setting up Finding your way around the directories Installing new connectors Connector JAR file (Java class) Properties file Properties files (JSON and properties) Running connect Make use of the Confluent CLI!
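The slide mentions two configuration formats, properties and JSON. As a rough sketch of how they relate: a standalone worker reads a `key=value` properties file, while the Connect REST interface used in distributed mode takes a JSON body with a `name` and a `config` map. The helper names below are hypothetical, and the parser handles only simple `key=value` lines.

```python
import json


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        config[key.strip()] = value.strip()
    return config


def to_rest_payload(config):
    """Build a JSON-ready body of the form {"name": ..., "config": {...}}."""
    settings = dict(config)
    name = settings.pop("name")
    return {"name": name, "config": settings}


EXAMPLE_PROPERTIES = """
name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=/u01/app/oracle/diag/rdbms/xe/XE/trace/alert_XE.log
topic=alertlog_test
"""

if __name__ == "__main__":
    payload = to_rest_payload(parse_properties(EXAMPLE_PROPERTIES))
    print(json.dumps(payload, indent=2))
```

The printed JSON is the kind of document that would be posted to a distributed worker's `/connectors` endpoint; the host and port depend on the installation and are not shown here.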
22 Example 1: SOURCE File Connector Write to a file with UTL_FILE package
need to do the following as SYS (sysdba)
CREATE OR REPLACE DIRECTORY MIKESDIR AS '/home/oracle/';
GRANT READ ON DIRECTORY MIKESDIR TO PUBLIC;
GRANT EXECUTE ON UTL_FILE TO SYSTEM;
DECLARE
  out_file UTL_FILE.FILE_TYPE;
BEGIN
  out_file := UTL_FILE.FOPEN('MIKESDIR', 'hellotest.txt', 'a');
  UTL_FILE.PUT_LINE(out_file, 'hello world (a message from Oracle)');
  UTL_FILE.FCLOSE(out_file);
END;
/
23 Example 1: SOURCE File Connector CONFIG FILE:
name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=/u01/app/oracle/diag/rdbms/xe/XE/trace/alert_XE.log
topic=alertlog_test
GOTCHAs:
24 Example 1: SOURCE File Connector DEMO
25 Example 2: SOURCE File Connector Read from the Oracle database alert log: /u01/app/oracle/diag/rdbms/xe/XE/trace/alert_XE.log GOTCHAs... CONFIG FILE:
name=local-file-source
connector.class=FileStreamSource
tasks.max=1
file=/u01/app/oracle/diag/rdbms/xe/XE/trace/alert_XE.log
topic=alertlog_test
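Conceptually, a file source connector tails a file and remembers how far it has read, so a restart resumes instead of re-sending old lines. The sketch below shows that idea only; it is not the FileStreamSource implementation, and `read_new_lines` is a hypothetical helper.

```python
import os
import tempfile


def read_new_lines(path, position):
    """Return (records, new_position) for complete lines appended since `position`."""
    records = []
    with open(path, "rb") as handle:
        handle.seek(position)
        while True:
            line = handle.readline()
            if not line.endswith(b"\n"):
                # End of file, or a partially written line: stop and retry later.
                break
            position += len(line)
            records.append(line.rstrip(b"\r\n").decode("utf-8", errors="replace"))
    return records, position


if __name__ == "__main__":
    fd, path = tempfile.mkstemp(suffix=".log")
    os.close(fd)
    with open(path, "ab") as log:
        log.write(b"Thread 1 advanced to log sequence 1721 (LGWR switch)\n")
    records, offset = read_new_lines(path, 0)
    print(records, offset)
    # Polling again from the stored offset yields nothing until new lines arrive.
    print(read_new_lines(path, offset))
    os.remove(path)
```

Storing the byte position alongside the records is what lets each poll pick up only what was appended since the previous one.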
26 Example 2: SOURCE File Connector DEMO
27 Example 2: SOURCE File Connector SPECIAL BONUS TOPIC: SMT single message transformations
28 SPECIAL BONUS TOPIC: SMTs CONFIG FILE
transforms=hoistfield,insertsource
transforms.hoistfield.type=org.apache.kafka.connect.transforms.HoistField$Value
transforms.hoistfield.org.apache.kafka.connect.transforms.hoistfield
transforms.hoistfield.field=alertlog_msg
transforms.insertsource.type=org.apache.kafka.connect.transforms.InsertField$Value
transforms.insertsource.static.field=smt_database
transforms.insertsource.static.value=xe
29 SPECIAL BONUS TOPIC: SMTs JSON OUTPUT:
Struct{alertlog_msg=Mon Feb 12 15:30: ,SMT_DATABASE=XE}
Struct{alertlog_msg=Thread 1 advanced to log sequence 1721 (LGWR switch),smt_database=xe}
Struct{alertlog_msg= Current log# 1 seq# 1721 mem# 0: /u01/app/oracle/fast_recovery_area/xe/onlinelog/o1_mf_1_8x1y15xj_.log,smt_database=xe}
Struct{alertlog_msg=Mon Feb 12 15:30: ,SMT_DATABASE=XE}
Struct{alertlog_msg=Archived Log entry 1709 added for thread 1 sequence 1720 ID 0xa0fa1263 dest 1:,SMT_DATABASE=XE}
Struct{alertlog_msg=Mon Feb 12 15:30: ,SMT_DATABASE=XE}
Struct{alertlog_msg=Thread 1 cannot allocate new log, sequence 1722,SMT_DATABASE=XE}
Struct{alertlog_msg=Checkpoint not complete,smt_database=xe}
Struct{alertlog_msg= Current log# 1 seq# 1721 mem# 0: /u01/app/oracle/fast_recovery_area/xe/onlinelog/o1_mf_1_8x1y15xj_.log,smt_database=xe}
Struct{alertlog_msg=Thread 1 advanced to log sequence 1722 (LGWR switch),smt_database=xe}
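What the two transforms do to each record can be mimicked in a few lines. This is a simplified model using plain dictionaries, not the Connect transform classes themselves. The field names come from the config on the previous slide, and the function names are hypothetical.

```python
def hoist_field(value, field):
    """Like HoistField$Value: wrap the whole value in a struct under one field name."""
    return {field: value}


def insert_static_field(value, static_field, static_value):
    """Like InsertField$Value with static.field/static.value: add a constant field."""
    updated = dict(value)
    updated[static_field] = static_value
    return updated


def apply_chain(raw_line):
    """Apply the transforms in the order they are listed in `transforms=`."""
    value = hoist_field(raw_line, "alertlog_msg")
    return insert_static_field(value, "smt_database", "xe")


if __name__ == "__main__":
    print(apply_chain("Checkpoint not complete"))
```

The order matters: the raw alert-log line is a bare string, so it has to be hoisted into a struct before a second field can be added to it.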
30 Example 3: SOURCE JDBC Connector Read from an Oracle database table MODE Incrementing column Timestamp column BATCH Table Whitelist
31 Example 3: SOURCE JDBC Connector CONFIG FILE:
name=test-oracle-jdbc-autoincrement
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
tasks.max=1
connection.password=examplepassword
connection.url=jdbc:oracle:thin:@example.oracle.server.com:1521/exampleservicename
connection.user=exampleuser
table.whitelist=users
mode=incrementing
incrementing.column.name=id
topic.prefix=test-oracle-jdbc-
GOTCHAs: Installing the JDBC drivers
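The idea behind `mode=incrementing` is to remember the largest value seen in the incrementing column and, on each poll, fetch only rows above it. The sketch below demonstrates that polling pattern against an in-memory SQLite table rather than Oracle. `poll_incrementing` is a hypothetical helper, not the connector's code, and the query the connector actually generates may differ.

```python
import sqlite3


def poll_incrementing(conn, table, column, last_seen):
    """Fetch rows whose incrementing column is greater than the stored offset."""
    # Table and column names come from trusted config; only the offset is bound.
    query = f"SELECT * FROM {table} WHERE {column} > ? ORDER BY {column} ASC"
    cursor = conn.execute(query, (last_seen,))
    names = [description[0] for description in cursor.description]
    rows = [dict(zip(names, row)) for row in cursor.fetchall()]
    if rows:
        last_seen = rows[-1][column]
    return rows, last_seen


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO users (id, name) VALUES (1, 'alice')")
    rows, offset = poll_incrementing(conn, "users", "id", 0)
    print(rows, offset)
    # An UPDATE does not change id, so this mode never sees it.
    conn.execute("UPDATE users SET name = 'alicia' WHERE id = 1")
    print(poll_incrementing(conn, "users", "id", offset))
```

The second poll returns no rows, which illustrates the main limitation: an incrementing column catches new rows only, not updates or deletes. Those need a timestamp column or log-based change data capture.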
32 Example 3: SOURCE JDBC Connector DEMO
33 Example 4: SOURCE CDC Connector Read Oracle database change data from the Oracle redo log Redolog scanner applications
34 Oracle Change Data delivered to Kafka metadata INSERT... into SCOTT.TEST9
35 Example 4: SOURCE CDC Connector CONFIG FILE: GOTCHAs:
name=dbvisit-replicate
connector.class=com.dbvisit.replicate.kafkaconnect.ReplicateSourceConnector
tasks.max=4
topic.prefix=rep2-
plog.location.uri=file:/home/oracle/rq-3595/mine
plog.data.flush.size=100
topic.name.transaction.info=tx.meta
connector.publish.cdc.format=changerow
connector.publish.transaction.info=true
connector.publish.keys=true
connector.publish.no.schema.evolution=false
connector.catalog.topic.name=replicate-info
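One of the use cases named earlier for change data is keeping a cache in step with the source table. The sketch below shows how a consumer might apply a stream of change events to a key-to-row cache. The event shape (`op`, `key`, `row`) is purely hypothetical and is not the connector's change row format.

```python
def apply_change(cache, event):
    """Apply one change event to a key -> row cache and return the cache."""
    op, key = event["op"], event["key"]
    if op in ("INSERT", "UPDATE"):
        # Inserts and updates both leave the latest row image in the cache.
        cache[key] = event["row"]
    elif op == "DELETE":
        cache.pop(key, None)
    else:
        raise ValueError(f"unknown operation: {op}")
    return cache


if __name__ == "__main__":
    # Hypothetical events for a table such as SCOTT.TEST9.
    events = [
        {"op": "INSERT", "key": 1, "row": {"id": 1, "name": "first"}},
        {"op": "UPDATE", "key": 1, "row": {"id": 1, "name": "renamed"}},
        {"op": "INSERT", "key": 2, "row": {"id": 2, "name": "second"}},
        {"op": "DELETE", "key": 2, "row": None},
    ]
    cache = {}
    for event in events:
        apply_change(cache, event)
    print(cache)
```

Replaying the events in order from the start of the topic rebuilds the same cache, which is why log-based change data suits derived stores such as caches and pre-computed aggregates.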
36 Example 4: SOURCE CDC Connector DEMO
37 OUTPUT (sink connectors)
38 Example 5: SINK File Connector CONFIG FILE: GOTCHAs:
name=local-file-sink
connector.class=FileStreamSink
tasks.max=1
file=/home/oracle/mike/test.sink.txt
topics=alertlog_test
39 Example 5: SINK File Connector DEMO
40 Building Oracle CDC Data Pipelines Elasticsearch Storage Search Analytics Kibana Visualization Reports Dashboards
41 Example 6: SINK Elasticsearch Connector CONFIG FILE:
name=elasticsearch-sink
connector.class=io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
tasks.max=1
topics=rep-soe.customers
connection.url=
type.name=kafka-connect
key.ignore=true
topic.index.map=rep-soe.customers:rep-soe.customers
topic.key.ignore=rep-soe.customers
GOTCHAs: Key values
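The "Key values" gotcha comes down to how a document ID is chosen. As documented for the Elasticsearch sink connector, with `key.ignore=true` the ID is derived from the record's topic, partition and offset rather than from the record key. The sketch below models that choice. `document_id` is a hypothetical helper, and raising on a missing key is a simplification for illustration.

```python
def document_id(topic, partition, offset, key, key_ignore):
    """Pick a document ID the way the key.ignore setting suggests."""
    if key_ignore:
        # Every record gets its own ID, so nothing is ever overwritten.
        return f"{topic}+{partition}+{offset}"
    if key is None:
        raise ValueError("record has no key; set key.ignore=true or publish keys")
    # Records sharing a key map to one document, so later changes replace earlier ones.
    return str(key)


if __name__ == "__main__":
    topic = "rep-soe.customers"
    # Two changes to the same customer row, arriving at offsets 10 and 11.
    print(document_id(topic, 0, 10, key=42, key_ignore=True))
    print(document_id(topic, 0, 11, key=42, key_ignore=True))
    print(document_id(topic, 0, 10, key=42, key_ignore=False))
    print(document_id(topic, 0, 11, key=42, key_ignore=False))
```

For change data this is the trade-off to decide up front: ignoring keys keeps every change as a separate document (a history), while using keys keeps one document per source row (the current state).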
42 Example 6: SINK Elasticsearch Connector DEMO Elastic queries Kibana visualizations Slack alerts
43 Get started with Kafka Connect Kafka and Kafka Connect Download the Confluent Platform (bundled connectors) Check out the available community connectors Try running it in Docker
44 Thank
More informationTools for Social Networking Infrastructures
Tools for Social Networking Infrastructures 1 Cassandra - a decentralised structured storage system Problem : Facebook Inbox Search hundreds of millions of users distributed infrastructure inbox changes
More informationmicrosoft
70-775.microsoft Number: 70-775 Passing Score: 800 Time Limit: 120 min Exam A QUESTION 1 Note: This question is part of a series of questions that present the same scenario. Each question in the series
More informationIntellicus Enterprise Reporting and BI Platform
Working with Database Connections Intellicus Enterprise Reporting and BI Platform Intellicus Technologies info@intellicus.com www.intellicus.com Copyright 2014 Intellicus Technologies This document and
More informationData Lake Based Systems that Work
Data Lake Based Systems that Work There are many article and blogs about what works and what does not work when trying to build out a data lake and reporting system. At DesignMind, we have developed a
More informationIntroduction to NoSQL Databases
Introduction to NoSQL Databases Roman Kern KTI, TU Graz 2017-10-16 Roman Kern (KTI, TU Graz) Dbase2 2017-10-16 1 / 31 Introduction Intro Why NoSQL? Roman Kern (KTI, TU Graz) Dbase2 2017-10-16 2 / 31 Introduction
More informationMigrating massive monitoring to Bigtable without downtime. Martin Parm, Infrastructure Engineer for Monitoring
Migrating massive monitoring to Bigtable without downtime Martin Parm, Infrastructure Engineer for Monitoring This is a big deal. -- Nicholas Harteau/VP, Engineering & Infrastructure https://news.spotify.com/dk/2016/02/23/announcing-spotify-infrastructures-googley-future/
More informationSpread the Database Love with Heterogeneous Replication. MC Brown, VP, Products
Spread the Database Love with Heterogeneous Replication MC Brown, VP, Products Heterogeneous Replication is NOT Exporting and Importing Data One Time Exports Moving to a different database platform ETL
More informationInstalling and configuring Apache Kafka
3 Installing and configuring Date of Publish: 2018-08-13 http://docs.hortonworks.com Contents Installing Kafka...3 Prerequisites... 3 Installing Kafka Using Ambari... 3... 9 Preparing the Environment...9
More informationGriddable.io architecture
Griddable.io architecture Executive summary This whitepaper presents the architecture of griddable.io s smart grids for synchronized data integration. Smart transaction grids are a novel concept aimed
More informationIndex. Raul Estrada and Isaac Ruiz 2016 R. Estrada and I. Ruiz, Big Data SMACK, DOI /
Index A ACID, 251 Actor model Akka installation, 44 Akka logos, 41 OOP vs. actors, 42 43 thread-based concurrency, 42 Agents server, 140, 251 Aggregation techniques materialized views, 216 probabilistic
More informationCourse AZ-100T01-A: Manage Subscriptions and Resources
Course AZ-100T01-A: Manage Subscriptions and Resources Module 1: Managing Azure Subscriptions In this module, you ll learn about the components that make up an Azure subscription and how management groups
More informationGain Insights From Unstructured Data Using Pivotal HD. Copyright 2013 EMC Corporation. All rights reserved.
Gain Insights From Unstructured Data Using Pivotal HD 1 Traditional Enterprise Analytics Process 2 The Fundamental Paradigm Shift Internet age and exploding data growth Enterprises leverage new data sources
More informationQualys Cloud Platform
18 QUALYS SECURITY CONFERENCE 2018 Qualys Cloud Platform Looking Under the Hood: What Makes Our Cloud Platform so Scalable and Powerful Dilip Bachwani Vice President, Engineering, Qualys, Inc. Cloud Platform
More information