Bring Context To Your Machine Data With Hadoop, RDBMS & Splunk

Similar documents
DB Connect Is Back. and it is better than ever. Tyler Muth Denis Vergnes. September 2017 Washington, DC

Measuring HEC Performance For Fun and Profit

Stages of Data Processing

Splunk & AWS. Gain real-time insights from your data at scale. Ray Zhu Product Manager, AWS Elias Haddad Product Manager, Splunk

FFIEC Cybersecurity Assessment Tool

Oracle GoldenGate for Big Data

Big Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara

Atlassian s Journey Into Splunk

Data Obfuscation and Field Protection in Splunk

Docker and Splunk Development

Data Onboarding. Where Do I begin? Luke Netto Senior Professional Services Splunk. September 26, 2017 Washington, DC

Create Dashboards that People Love

Dragons and Splunk Do Not Do Well In Captivity

Visualizing the Health of Your Mobile App

Making the Most of the Splunk Scheduler

Big Data Architect.

Monitoring Docker Containers with Splunk

Using Splunk Enterprise To Optimize Tailored Long-term Data Retention

Syncsort DMX-h. Simplifying Big Data Integration. Goals of the Modern Data Architecture SOLUTION SHEET

Next Generation Dashboards

Introducing Splunk Validated Architectures (SVA)

Increase Value from Big Data with Real-Time Data Integration and Streaming Analytics

Dashboard Time Selection

Microsoft Big Data and Hadoop

Flash Storage Complementing a Data Lake for Real-Time Insight

MODERN BIG DATA DESIGN PATTERNS CASE DRIVEN DESINGS

Running Splunk Enterprise within Docker

CONSOLIDATING RISK MANAGEMENT AND REGULATORY COMPLIANCE APPLICATIONS USING A UNIFIED DATA PLATFORM

The Power of Data Normalization. A look at the Common Information Model

@Pentaho #BigDataWebSeries

Metrics Analysis with the Splunk Platform

A Trip Through The Splunk Data Ingestion And Retrieval Pipeline

Oracle Big Data. A NA LYT ICS A ND MA NAG E MENT.

Overview. Prerequisites. Course Outline. Course Outline :: Apache Spark Development::

The Technology of the Business Data Lake. Appendix

Talend Big Data Sandbox. Big Data Insights Cookbook

Big Data Hadoop Developer Course Content. Big Data Hadoop Developer - The Complete Course Course Duration: 45 Hours

Big Data. Big Data Analyst. Big Data Engineer. Big Data Architect

HDInsight > Hadoop. October 12, 2017

Scaling Indexer Clustering

Modern Data Warehouse The New Approach to Azure BI

Search Head Clustering Basics To Best Practices

The Hadoop Ecosystem. EECS 4415 Big Data Systems. Tilemachos Pechlivanoglou

Innovatus Technologies

SpagoBI and Talend jointly support Big Data scenarios

CERTIFICATE IN SOFTWARE DEVELOPMENT LIFE CYCLE IN BIG DATA AND BUSINESS INTELLIGENCE (SDLC-BD & BI)

Question: 1 You need to place the results of a PigLatin script into an HDFS output directory. What is the correct syntax in Apache Pig?

Hadoop An Overview. - Socrates CCDH

Blended Learning Outline: Developer Training for Apache Spark and Hadoop (180404a)

Streaming Integration and Intelligence For Automating Time Sensitive Events

Integrating Oracle Databases with NoSQL Databases for Linux on IBM LinuxONE and z System Servers

Getting Started with Intellicus. Version: 16.0

How Apache Hadoop Complements Existing BI Systems. Dr. Amr Awadallah Founder, CTO Cloudera,

Oracle Big Data Fundamentals Ed 2

Indexer Clustering Internals & Performance

We are ready to serve Latest Testing Trends, Are you ready to learn?? New Batches Info

Data Lake Based Systems that Work

Indexer Clustering Fixups

Security and Performance advances with Oracle Big Data SQL

Configuring and Deploying Hadoop Cluster Deployment Templates

Getting Started With Intellicus. Version: 7.3

Analyze Big Data Faster and Store It Cheaper

Oracle Database 11g for Data Warehousing & Big Data: Strategy, Roadmap Jean-Pierre Dijcks, Hermann Baer Oracle Redwood City, CA, USA

Big Data with Hadoop Ecosystem

Big Data Analytics using Apache Hadoop and Spark with Scala

Big Data Analytics. Description:

Oracle Big Data Connectors

SOLUTION TRACK Finding the Needle in a Big Data Innovator & Problem Solver Cloudera

Copyright 2012, Oracle and/or its affiliates. All rights reserved.

Certified Big Data and Hadoop Course Curriculum

VOLTDB + HP VERTICA. page

August Oracle - GoldenGate Statement of Direction

Best practices for building a Hadoop Data Lake Solution CHARLOTTE HADOOP USER GROUP

The SMACK Stack: Spark*, Mesos*, Akka, Cassandra*, Kafka* Elizabeth K. Dublin Apache Kafka Meetup, 30 August 2017.

Big Data com Hadoop. VIII Sessão - SQL Bahia. Impala, Hive e Spark. Diógenes Pires 03/03/2018

1Z Oracle Big Data 2017 Implementation Essentials Exam Summary Syllabus Questions

Modernizing InfoSec Training and IT Operations at USF

Certified Big Data Hadoop and Spark Scala Course Curriculum

Tracking Logs at Zillow with Lookups & JIRA

Dashboard Wizardry. Advanced Dashboard Interactivity. Siegfried Puchbauer Principal Software Engineer Yuxiang Kou Software Engineer

Splunk N Box. Splunk Multi-Site Clusters In 20 Minutes or Less! Mohamad Hassan Sales Engineer. 9/25/2017 Washington, DC

This is a brief tutorial that explains how to make use of Sqoop in Hadoop ecosystem.

WHITEPAPER. MemSQL Enterprise Feature List

Importing and Exporting Data Between Hadoop and MySQL

2014 年 3 月 13 日星期四. From Big Data to Big Value Infrastructure Needs and Huawei Best Practice

Deploying Spatial Applications in Oracle Public Cloud

Gain Insights From Unstructured Data Using Pivotal HD. Copyright 2013 EMC Corporation. All rights reserved.

Oracle GoldenGate 12c

Hadoop Overview. Lars George Director EMEA Services

Architecting Splunk For High Availability And Disaster Recovery

BIG DATA COURSE CONTENT

Capture Business Opportunities from Systems of Record and Systems of Innovation

Things Every Oracle DBA Needs to Know about the Hadoop Ecosystem. Zohar Elkayam

Microsoft Perform Data Engineering on Microsoft Azure HDInsight.

Modern ETL Tools for Cloud and Big Data. Ken Beutler, Principal Product Manager, Progress Michael Rainey, Technical Advisor, Gluent Inc.

Dashboards & Visualizations: What s New

Evolution of Big Data Facebook. Architecture Summit, Shenzhen, August 2012 Ashish Thusoo

Spotfire Advanced Data Services. Lunch & Learn Tuesday, 21 November 2017

How to choose the right approach to analytics and reporting

Big Data Syllabus. Understanding big data and Hadoop. Limitations and Solutions of existing Data Analytics Architecture

Transcription:

Bring Context To Your Machine Data With Hadoop, RDBMS & Splunk Raanan Dagan and Rohit Pujari September 25, 2017 Washington, DC

Forward-Looking Statements During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release. Splunk, Splunk>, Listen to Your Data, The Engine for Machine Data, Splunk Cloud, Splunk Light and SPL are trademarks and registered trademarks of Splunk Inc. in the United States and other countries. All other brand names, product names, or trademarks belong to their respective owners. 2017 Splunk Inc. All rights reserved.

Agenda Splunk Big Data Architecture Alternative Open Source Approach Real-World Customer Architecture End-to-end Demonstration

Big Data Technologies Relational Database Structured Schema at Write NoSQL Semi-Structured Schema at Read Hadoop Semi-Structured Schema at Read Splunk Schema at Read SQL ETL Key-Value, Column, Document & Other Stores MapReduce HDFS Storage Search Real-Time Indexing RDBMS Oracle, MySQL, IBM DB2, Teradata Cassandra, HBase, MongoDB MapReduce Distributed File System Time-Series, Unstructured, Heterogeneous 4

Splunk: Open And Extensible Databases = Splunk DB Connect (Hive, Impala, Oracle) Hadoop = Analytics for Hadoop, Hadoop Data Roll, Connect Kafka = Splunk Kafka Add-On, Kafka with HEC Spark = Spark SQL NoSQL = MongoDB, Hbase, Cassandra apps

Splunk Enterprise Architecture 2 RDBMS. Search Head Cluster NoSQL 1.. Indexing Tier 3 Splunk Hadoop Data Roll 0101010 0010101 1010010 Modular Syslog TCP/UDP Wire Data Forwarder Windows/*NIX HTTP

Splunk And Hadoop - Products Splunk Analytics for Hadoop: Analyze Hadoop Data using Hadoop MapReduce Processing Splunk Hadoop Connect: Export data from Splunk to Hadoop Hadoop Data Roll Archive Splunk indexers to Hadoop Splunk Monitor Hadoop: Monitor Hadoop

Splunk & Hadoop Architecture 2. Search Head Cluster Splunk Analytics for Hadoop 1 Sqoop Flume Kafka Manual scripts 0101010 0010101 1010010 RDBMS Syslog TCP/UDP Wire Data HTTP

Splunk Big Data Technologies DB Connect Splunk Analytics for Hadoop Splunk Schema at Write Schema at Read Schema at Read Schema at Read SQL ETL Key-Value, Column, Document & Other Stores MapReduce HDFS Storage Search Real-Time Indexing RDBMS Oracle, MySQL, IBM DB2, Teradata Cassandra, Hbase, MongoDB MapReduce Distributed File System Time-Series, Unstructured, Heterogeneous

Splunk Scalability Enterprise-class Availability and Scale Automatic load balancing linearly scales indexing Distributed search and MapReduce linearly scales search and reporting Offload search load to Splunk Search Heads Auto load-balanced forwarding to Splunk Indexers Send data from thousands of servers using any combination of Splunk forwarders

Splunk Real-Time Analytics Data Monitor Input TCP/UDP Input Scripted Input Parsing Queue Parsing Pipeline Source, event typing Character set normalization Line breaking Timestamp identification Regex transforms Index Queue Real-time Buffer Indexing Pipeline Raw data Index Files Real-time Search Process Splunk Index 11

Splunk With Hadoop - Unique Features Virtual Index Schema-on-the-fly Flexibility and Fast Time to Value Enables seamless use of the Splunk technology stack on data wherever it rests Natively handles MapReduce Structure applied at search time No brittle schema Automatically find patterns and trends Interactive search Preview results while MapReduce jobs run Drag-and-drop analytics Security: Access Control, Pass Through Authentication, Kerberos

What About Structured Data? Customer profile Product attributes Employee details Pricing and Rate plans Asset info

Use Cases For Structured Data In Splunk Index machine data from databases, such as logs or sales records Enrich machine data with high-level data, such as customer records Update structured databases with Splunk info, such as risk scores Interactively browse structured and unstructured data from Splunk reports

Machine Data Delivers Real-time Insights Media server logs (machine data) Phone Number IP Address Track ID Mar 01 19:18:50:000 aaa2 radiusd[12548]:[id 959576 local1.info] INFO RADOP(13) acct start for 2172618992@splunktel.com 10.164.232.181 from 12.130.60.5 recorded OK. 2013-03-01 19:18:50:150 10.2.1.34 GET /sync/addtolibrary/01011207201000005652000000000053-80 - 10.164.232.181 "Mozilla/5.0 (iphone; CPU iphone OS 5_0_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9A405 Safari/7534.48.3" 503 0 0 825 1680 Mar 01 19:18:50:163 aaa2 radiusd[12548]:[id 959576 local1.info] INFO RADOP(13) acct stop for 2172618992@splunktel.com 10.164.232.181 from 12.130.60.5 recorded OK.

Structured Data Contains Business Context Media server logs (machine data) Phone number IP address Track ID Mar 01 19:18:50:000 aaa2 radiusd[12548]:[id 959576 local1.info] INFO RADOP(13) acct start for 2172618992@splunktel.com 10.164.232.181 from 12.130.60.5 recorded OK. 2013-03-01 19:18:50:150 10.2.1.34 GET /sync/addtolibrary/01011207201000005652000000000053-80 - 10.164.232.181 "Mozilla/5.0 (iphone; CPU iphone OS 5_0_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9A405 Safari/7534.48.3" 503 0 0 825 1680 Mar 01 19:18:50:163 aaa2 radiusd[12548]:[id 959576 local1.info] INFO RADOP(13) acct stop for 2172618992@splunktel.com 10.164.232.181 from 12.130.60.5 recorded OK. Track ID Artist Title Format ID Run time 01011207201000005652000000000053 Maroon 5 Moves like Jagger MP3 4:30 Customer, product databases Phone # 2172618992 53546 Subscriber ID Subscrib er ID First Name Last Name Age State Custome r Score 53546 Jim Morrison 25 CA 93

Splunk DB Connect Reliable, scalable, real-time integration between Splunk and traditional relational databases Enrich search results with additional business context Easily import data into Splunk for deeper analysis Ingest, transform machine data in Splunk and export it to relational databases Integrate multiple DBs concurrently Simple set-up, non-evasive and secure Database Lookup Oracle Database Connection Pooling JDBC Microsoft SQL Server Database Query Other Databases

Open Source Alternatives

Hadoop Complexity

Hadoop Use Cases Splunk use cases Application Delivery IT Operations Security and Compliance Splunk or Hadoop use cases Business Analytics Business IoT Analytics Hadoop use cases ETL for RDBMS

Customer Architecture

Summary Architecture 3 instances Splunk / Hadoop / DB Connect Search Heads Real Time Data - 25 Indexers Historical data (VIX) 60 Hortonworks nodes Enrichment data (lookup) - MySQL DB 2000 Forwarders

Splunk Deployment Architecture indexer Web server 2,000 forwarders ~2TB per day indexer 25 indexers 3 search head ~250 Users ~30 Concurrent Users Web server forwarder

Hadoop Architecture WebLogic app server ~30 Flume Agents ~60 Data Nodes ~1.2 PB of storage ~2 Years data retention data node WebLogic app server data node data node data node

Splunk + Hadoop = All The Data indexer Web server indexer app server Real Time Analytics Alerts Apps Batch Compliment Splunk Analytics Historical searches data node data node data node

DB Connect Architecture Install DB Connect on a Search Head Use DB Connect for Lookup Several Lookups coming from two different MySQL Databases Lookup Enrich log data with business insight Search Head DB-1 DB-2 MySQL JDBC Driver

DB - Architecture Performance Impact Command Connection Architecture Indexing Inputs - dbmon-tail ** Recommended Inputs dbmon-dump Outputs Not Indexing Medium number of connections (Small amount of data - only delta) Small amount of connections (Lots of data per connection) Lots of DB Connections (Small amount of data) DB to Index (connection pooling) DB to Index (connection pooling) Search Head to DB (connection pooling) Search DBXQuery Lots of DB Connections DB to Search Head Lookups ** Selected this option Lots of DB Connections DB to Search Head

Summary Architecture 3 instances Splunk / Hadoop / DB Connect Search Heads Real Time Data - 25 Indexers Historical data (VIX) 60 Hortonworks nodes Enrichment data (lookup) - MySQL DB 2000 Forwarders

Customer Architecture Demo

Summary Splunk is open and extensible Splunk enables you to combine data from multiple sources for enriched insights Splunk can complement and fill the gaps in open source technologies

2017 SPLUNK INC. Thank You Don't forget to rate this session in the.conf2017 mobile app