Atlassian s Journey Into Splunk

Similar documents
Measuring HEC Performance For Fun and Profit

Splunk & AWS. Gain real-time insights from your data at scale. Ray Zhu Product Manager, AWS Elias Haddad Product Manager, Splunk

DB Connect Is Back. and it is better than ever. Tyler Muth Denis Vergnes. September 2017 Washington, DC

Create Dashboards that People Love

Architecting Splunk For High Availability And Disaster Recovery

Introducing Splunk Validated Architectures (SVA)

Using Splunk Enterprise To Optimize Tailored Long-term Data Retention

Indexer Clustering Internals & Performance

Dashboard Time Selection

Indexer Clustering Fixups

Running Splunk Enterprise within Docker

Data Onboarding. Where Do I begin? Luke Netto Senior Professional Services Splunk. September 26, 2017 Washington, DC

Bring Context To Your Machine Data With Hadoop, RDBMS & Splunk

Scaling Indexer Clustering

FFIEC Cybersecurity Assessment Tool

Search Head Clustering Basics To Best Practices

Docker and Splunk Development

Visualizing the Health of Your Mobile App

A Trip Through The Splunk Data Ingestion And Retrieval Pipeline

Monitoring Docker Containers with Splunk

Next Generation Dashboards

Metrics Analysis with the Splunk Platform

Splunk N Box. Splunk Multi-Site Clusters In 20 Minutes or Less! Mohamad Hassan Sales Engineer. 9/25/2017 Washington, DC

Tracking Logs at Zillow with Lookups & JIRA

Making the Most of the Splunk Scheduler

Modernizing InfoSec Training and IT Operations at USF

Need for Speed: Unleashing the Power of SecOps with Adaptive Response. Malhar Shah CEO, Crest Data Systems Meera Shankar Alliance Manager, Splunk

Squeezing all the Juice out of Splunk Enterprise Security

Data Obfuscation and Field Protection in Splunk

Dragons and Splunk Do Not Do Well In Captivity

Bringing Sweetness to Sour Patch Tuesday

The Power of Data Normalization. A look at the Common Information Model

Dashboard Wizardry. Advanced Dashboard Interactivity. Siegfried Puchbauer Principal Software Engineer Yuxiang Kou Software Engineer

Extending SPL with Custom Search Commands

Manage AWS Services. Cost, Security, Best Practice and Troubleshooting. Principal Software Engineer. September 2017 Washington, DC

Dashboards & Visualizations: What s New

Splunking with Multiple Personalities

Making Sense of Web Fraud With Splunk Stream

Integrating Splunk And AWS Lambda

AWS Solutions Architect Associate (SAA-C01) Sample Exam Questions

Replication of summary data in indexer cluster

Real Time Monitoring Of A Cloud Based Micro Service Architecture Using Splunkcloud And The HTTP Eventcollector

PrepAwayExam. High-efficient Exam Materials are the best high pass-rate Exam Dumps

Building a Threat-Based Cyber Team

Alan Williams Principal Engineer alanwill on Twitter & GitHub

Best Practices and Better Practices for Users

Essentials to creating your own Security Posture using Splunk Enterprise

Wrangling Your IOT Data Into Splunk

SAA-C01. AWS Solutions Architect Associate. Exam Summary Syllabus Questions

Building Your First Splunk App with the Splunk Web Framework

Intro to Big Data on AWS Igor Roiter Big Data Cloud Solution Architect

AWS Agility + Splunk Visibility = Cloud Success. Splunk App for AWS Demo. Laura Ripans, AWS Alliance Manager

Fields, Indexed Tokens, And You

Putting together the platform: Riak, Redis, Solr and Spark. Bryan Hunt

Best Prac:ces + New Feature Overview for the Latest Version of Splunk Deployment Server

HTTP Event Collector in Splunk 6.5 More Super Powers!

Splunk Helping in Productivity

Building High Performance Apps using NoSQL. Swami Sivasubramanian General Manager, AWS NoSQL

KV Store: Hammer Time

How can you implement this through a script that a scheduling daemon runs daily on the application servers?

CPSC 426/526. Cloud Computing. Ennan Zhai. Computer Science Department Yale University

Amazon Web Services and Feb 28 outage. Overview presented by Divya

Conceptual Modeling on Tencent s Distributed Database Systems. Pan Anqun, Wang Xiaoyu, Li Haixiang Tencent Inc.

Beating the Final Boss: Launch your game!

DevOps Made Easy. Shireesh Thanneru, Platform Architect. Intel. Linoy Alexander, Director, DevOps

Log Analytics with Amazon Elasticsearch Service. Christoph Schmitter

AWS Lambda and Cassandra

Splunk Enterprise on the AWS Cloud

REFERENCE ARCHITECTURE Quantum StorNext and Cloudian HyperStore

This document (including, without limitation, any product roadmap or statement of direction data) illustrates the planned testing, release and

EXAM - AWS-Solution-Architect- Associate. AWS Certified Solutions Architect - Associate. Buy Full Product

Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015

ELASTIC DATA PLATFORM

ROCK INK PAPER COMPUTER

Document Sub Title. Yotpo. Technical Overview 07/18/ Yotpo

Scaling MongoDB. Percona Webinar - Wed October 18th 11:00 AM PDT Adamo Tonete MongoDB Senior Service Technical Service Engineer.

Migrating a critical high-performance platform to Azure with zero downtime

Enterprise Security Biology

Microservices without the Servers: AWS Lambda in Action

Ceph Intro & Architectural Overview. Abbas Bangash Intercloud Systems

Example Azure Implementation for Government Agencies. Indirect tax-filing system. By Alok Jain Azure Customer Advisory Team (AzureCAT)

Malware Analysis: Suricata & Splunk for better rule writing

Amazon Web Services (AWS) Training Course Content

Architectural challenges for building a low latency, scalable multi-tenant data warehouse

Your Voice is Your Passport: Implementing Voice-driven Applications with Amazon Alexa

Search Head Clustering

Welcome to Tomorrow... Today

Reactive Microservices Architecture on AWS

Accenture Cloud Platform Serverless Journey

Technical Deep Dive Splunk Cloud. Copyright 2015 Splunk Inc.

REST APIs on z/os. How to use z/os Connect RESTful APIs with Modern Cloud Native Applications. Bill Keller

WHITEPAPER. MemSQL Enterprise Feature List

At Course Completion Prepares you as per certification requirements for AWS Developer Associate.

Container 2.0. Container: check! But what about persistent data, big data or fast data?!

Dynamics 365. for Finance and Operations, Enterprise edition (onpremises) system requirements

HOW TO PLAN & EXECUTE A SUCCESSFUL CLOUD MIGRATION

<Insert Picture Here> MySQL Cluster What are we working on

Streaming Log Analytics with Kafka

Amazon. Exam Questions AWS-Certified-Solutions-Architect- Professional. AWS-Certified-Solutions-Architect-Professional.

Microservices Architekturen aufbauen, aber wie?

Transcription:

Atlassian s Journey Into Splunk The Building Of Our Logging Pipeline On AWS Tim Clancy Engineering Manager, Observability James Mackie Infrastructure Engineer, Observability September 2017 Washington, DC

Forward-Looking Statements During the course of this presentation, we may make forward-looking statements regarding future events or the expected performance of the company. We caution you that such statements reflect our current expectations and estimates based on factors currently known to us and that actual events or results could differ materially. For important factors that may cause actual results to differ from those contained in our forward-looking statements, please review our filings with the SEC. The forward-looking statements made in this presentation are being made as of the time and date of its live presentation. If reviewed after its live presentation, this presentation may not contain current or accurate information. We do not assume any obligation to update any forward looking statements we may make. In addition, any information about our roadmap outlines our general product direction and is subject to change at any time without notice. It is for informational purposes only and shall not be incorporated into any contract or other commitment. Splunk undertakes no obligation either to develop the features or functionality described or to include any such feature or functionality in a future release. Splunk, Splunk>, Listen to Your Data, The Engine for Machine Data, Splunk Cloud, Splunk Light and SPL are trademarks and registered trademarks of Splunk Inc. in the United States and other countries. All other brand names, product names, or trademarks belong to their respective owners. 2017 Splunk Inc. All rights reserved.

Where We Started

2017 SPLUNK INC. 2015

Searchable Query logs in an easy fashion Secure Control who can see the contents Stored Retained indefinitely

2017 SPLUNK INC. Centralized Logging Core team With knowledge of how to run a logging platform

2017 SPLUNK INC. Centralized Logging Core team With knowledge of how to run a logging platform One common way To send, receive and store logs

Architecture Event Pipeline Cold Storage Events Created Search Platform

2017 SPLUNK INC. Event Structure JSON With knowledge of how to run a logging platform

2017 SPLUNK INC. Event Structure JSON With knowledge of how to run a logging platform Service ID Unique way to identify the generating service

2017 SPLUNK INC. Event Structure JSON With knowledge of how to run a logging platform Service ID Unique way to identify the generating service Time Time of creation in UTC

2017 SPLUNK INC. Pipeline Requirements Scalable Easy and quickly add capacity

2017 SPLUNK INC. Pipeline Requirements Scalable Easy and quickly add capacity Queued Store events even if our consumers stop processing

2017 SPLUNK INC. Pipeline Requirements Scalable Easy and quickly add capacity Queued Store events even if our consumers stop processing Large Volume Thousands of producers, many consumers

http://fbrnc.net/blog/2016/03/messaging-on-aws

Event Pipeline Custom metadata service Ingress stream(s) Processing service Many stream(s) Stream service

Recap Event Pipeline Cold Storage Events Created Search Platform

2015 Capacity

2017 SPLUNK INC. 2016

2015 Capacity

2017 SPLUNK INC. Problems Stability Ingestion not keeping up resulting in long delays

2017 SPLUNK INC. Problems Stability Ingestion not keeping up resulting in long delays Scale Indexing model didn t keep pace as more services were brought onboard

2017 SPLUNK INC. Problems Stability Ingestion not keeping up resulting in long delays Scale Indexing model didn t keep pace as more services were brought onboard User Experience One of the top complaints on our shared development platform

2017 SPLUNK INC. Let s Move To Splunk! (oh, and do it in 3-4 months)

How We Did It

2017 SPLUNK INC. Splunk Had A Place In Security Security incident detection

2017 SPLUNK INC. Splunk Had A Place In Security Security incident detection Limited users & data Very restricted user base and subset of logs

Architecture Event Pipeline Cold Storage Events Created Security Search Platform

2017 SPLUNK INC. HTTP Event Collector Our log producers and pipeline already support JSON structured logs. Input to Splunk what is already being produced.

Goal Event Pipeline Cold Storage Events Created All Atlassian

Splunk Ingestion Splunk stream(s) Ingestion service ELB Custom metadata service Indexers running HEC

Indexing Strategy Thousands Of Services Handle having many thousands of services - too many for index per service Make Them Easy To Find Map our serviceid to source Auto generate eventtype No Custom Retention

Clustering Strategy AZ is Site Multiple Indexing Clusters Single SHC AWS AZ maps nicely onto Splunk site, allowed multi-site clusters Different indexer clusters for different types of logs Initially just production or non production No special treatment for teams

Ansible Infrastructure Provision our AWS infrastructure/roles. Heavy use of dynamic cloud formation lookups Configure Make changes to.conf files - join clusters Business As Usual Create indexes, deploy apps

Metadata Service Control Access Provide LDAP restrictions on access to particular sets of logs - mapped to splunk groups Manage Capacity Top services provide their throughput and retention periods - assists in provisioning Routing Split logs to particular clusters and indexes

What Metadata Service Looks Like If it was backed by Google Docs

What Metadata Service Looks Like

Splunk Changes Custom metadata service crontab Infrastructure ASG, CF Configuration indexes, apps

How Did We Go

2015 Capacity

Splunk makes me feel smart 2017 SPLUNK INC.

There are so many things we can do on Splunk that were never possible in the old system! 2017 SPLUNK INC.

The Splunk team helped me get my feet wet, and showed me that there is life after the old system. 2017 SPLUNK INC.

I ve been Splunking all day, much more fun than writing stupid Java code 2017 SPLUNK INC.

2017 SPLUNK INC. Logging went from a constant pain point to a feature of the platform.

2015 Capacity

2015 Capacity

90,000 Searches Every Day

Over 4x Our Planned Capacity

What Worked And What Didn t

2017 SPLUNK INC. Using Kinesis & the HEC at scale Read off Kinesis with KCL workers, not Lambda HEC on each indexer, all behind an ELB Load balancing pools need an accurate healthcheck

2017 SPLUNK INC. Testing At Scale Is Hard Difficult to replicate production load for tests Run into problems only in prod Game Week!

Game Week

2017 SPLUNK INC. An Open Platform Has Challenges Openness and flexibility are important to us Splunk per-user and global limits will save you Sledge hammer capabilities (can_delete, admin_all_objects)

Search Diagnostics

2017 SPLUNK INC. Know Your Resource Constraints Splunk likes disk IOPs, a lot Fixup tasks, bucket copying, heavy search load Limits help, but eventually you will run into resource constraints

2017 SPLUNK INC. Review Your Clustering Strategies Indexing and clustering choices affect performance Making the right choice is worth the time

The Future

Partitioned Services Make better use of the resources we have More Clusters Limit blast radius of bad actors Containers Docker/Kube

2017 SPLUNK INC. Thank You! Don't forget to rate this session in the.conf2017 mobile app