Fluentd + MongoDB + Spark = Awesome Sauce

Size: px
Start display at page:

Download "Fluentd + MongoDB + Spark = Awesome Sauce"

Transcription

1 Fluentd + MongoDB + Spark = Awesome Sauce Nishant Sahay, Sr. Architect, Wipro Limited Bhavani Ananth, Tech Manager, Wipro Limited Your company logo here

2 Wipro Open Source Practice: Vision & Mission Vision Wipro will be the world leader in solving customer problems through the use of innovative and practical open source solutions. We will be a steward of every open source community in which we engage, and always act with sensitivity and integrity. Mission Wipro s Open Source mission is to be the guide and partner to companies seeking to leverage the strategic, financial, organizational and technological benefits of open source software and methods. Wipro will anticipate and solve customers needs through a commitment to research, and by taking a balanced approach to legacy and innovative technologies. Wipro s comprehensive suite of strategic and technology services will be delivered with passion and precision.

3 Wipro Open Source Practice Offerings Advisory Enterprise-wide adoption strategies Best fit analysis & recommendation Business Case Advisory Governance Technical Consulting Productized Services Legacy Migration Services Greenfield Development Open Source Stack Setup Open App Cross Industry Solutions and Process Stacks Support Application and Infrastructure Dev Ops Architecture, Development Open Source Community

4 Connected Warehouse Platform CSC SCP Warehouse Mobility & Dashboards Carrier Vendor Facility Inventory & Operations Orders Alerts & Notification Warehouse KPI s Performance Tracker Equipment Monitor Dashboards Master Data Connected Warehouse Platform Transaction Data Webservices Integration Mapping FTP (Flat file/xml) Subscriber Queues Automation Enabler Publisher Queues Sales Orders [Real-Time] Route Plan / Carrier Tracking Almost Real-Time Associate Performance PUT/PICK Status Purchase Orders Master Data [Scheduled] OMS TMS WMS LMS WCS IOT ERP/HOST Direct to Customer Warehouses Equipment Retailer Supplier

5 The Awesome Sauce ANALYTICS & PREDICTION

6 Clickstream Analytics User Behavior Analysis Product Affinity Website Resource Allocation Prediction & recommendation

7 PREDICTION & RECOMMENDATION Prediction Using Machine Learning Content Recommendation Conversion Prediction Visitor Segmentation Demand Forecasting

8 Sauce Raw Material LOGS

9 Logs, Logs Everywhere! SysLog Clickstre am Data Social Media Feeds Packet Data Sensor Data CDR Device Logs Custom App (C, Ruby,Pyt hon) Payment Data Applicati on Server Logs Web Access Logs Database Logs

10 What can be done with logs? Real time monitoring Root cause analysis Anomaly Detection and Predictive Monitoring Debugging Troubleshooting/Support

11 Challenges with Log Analytics No standard log formats Multiple logging frameworks Logs highly decentralized Limited real time visualization capability Scalability Issues Normalizing and correlating logs from disparate sources

12 What can be done with logs Business PoV? Input Data Analytics User Interactions /Behavior End user Experience/Improvements

13 Awesome Foursome The Ingredients

14 The Ingredients FLUENTD

15 Why Fluentd Unified Logging Simple and Flexible Proven Minimal Resources Reliable Open Source Community

16 Fluentd Plugin Architecture Input Input (udp,tcp,http,tail) Parser (regexp,apache2) Filter Filter (grep,enrich, delete.mask) Output Buffer Output out_mongo Format

17 HA Fluentd topology At Most once and At Least once transfers Log Forwarders Node1 Log Aggregators Destination Log File Node2 Log File Fluentd Fluentd PUSH Fluentd (Active) PUSH MongoD B Node3 Log File Fluentd Fluentd (Backup) Amazon S3

18 Fluentd Failure Scenarios Forwarder goes down Aggregator goes down

19 The Ingredients KAFKA

20 Kafka distributed streaming platform Producers Publish-Subscribe streams of records Store streams of records in fault tolerant way Process streams of records Apps App App DBs App Connectors DBs Kafka Cluster Stream Processor App Apps App App Consumers

21 Kafka Terms Topic Partition Producer Consumer Producer Topics Partition-1 Partition-2 Partition-3 Brokers p1 p2 p3 R1 R2 R3 Consumer Group Consumer Groups C1 C2 C2

22 Why Kafka Ideal unified platform to handle real time data feeds Has high throughput to support high volume event streams such as log aggregation Deals well with high volume data loads from offline systems Fault tolerance and Scalable Able to handle the low latency associated with traditional messaging systems

23 Kafka decouples data pipelines Producers Producers Producers Producers Broker Kafka Consumers Consumer Consumer Consumer

24 Kafka Guarantees Messages sent to the topic and partition are appended in the same order A consumer instance gets the message in the same order as they are produced A topic with replication factor N can tolerate n-1 failures

25 Kafka Replication Producer Producer Logs Logs Logs Logs Follower Leader Topic1- part1 Topic1- part1 Follower Follower Topic1- part1 Follower Leader Topic1- part2 Topic1- part2 Topic1- part2 Broker1 Broker2 Broker3 Broker4

26 Zookeeper Zookeeper enables highly reliable distributed coordination Kafka bundles single node ZooKeeper instance Metadata includes broker addresses, message offsets metadata Zookeeper metadata Producers metadata Consumers messages Kafka Cluster messages

27 Kafka Persistence - File System Sequential File I/O very fast Uses OS page cache for data storage Batching of messages speeds up disk operations, network transfers and in memory iterations.

28 Batch Processing One of the big drivers for efficiency Producers accumulate data in memory and send larger batches in a single request Fix the number of messages in a batch - batch.size Wait no longer than a fixed latency bound - linger.ms Trade off small amount of latency for better throughput

29 Log Compaction Per-record retention, rather than the coarsergrained timebased retention

30 Fluentd Kafka Integration Kafka Fluentd Consumer Fluentd kafka plugin Log Forwarders Fluentd Kafka Ecosystem Consumers Fluentd Destination MongoD B Fluentd PUSH Kafka Clusters PULL Fluentd PUSH Fluentd Fluentd Amazon S3

31 Advantage - Fluentd-Kafka Backpressure - Pull versus Push Reliable, Flexible data pipeline

32 Connected Warehouse Kafka Cluster Architecture Fluentd-Kafka Plugin Data Center 1 - Active Data Center 2 - Active Kafka Cluster Kafka Broker -1 Topic 1, Partition 0..n ZK 1 Leader Zookeeper Ensemble Kafka Broker 2 Topic 1, Partition n+1, n+n ZK 2 Follower

33 The Ingredients MONGODB

34 Why MongoDB Cross platform document-oriented NOSQL database Simple and Flexible Data Model Field Level Indexing Built In Query Capabilities High Performance

35 System Architecture With Shards Config Server Data Sources mongos mongos mongos Primary Primary Primary Primary Primary Secondary Secondary Secondary Secondary Secondary Secondary Secondary Secondary Secondary Secondary

36 MongoDB For Analytics Denormalization with support of Embedded Documents Connector for almost all kind of data source Aggregation Framework Text Search Queries Range Queries, Key value queries

37 The Ingredients SPARK

38 Spark Logical Architecture Scala, Java, Python, R Spark SQL Spark Streaming MLlib GraphX Apache Spark Spark MongoDB Connector

39 Putting It All Together Click Stream + Inventory Mgmt Micro-Service Data Sync Processing Ingestion Collection

40 QUESTIONS & ANSWERS

41 Thank you

42

Application monitoring with BELK. Nishant Sahay, Sr. Architect Bhavani Ananth, Architect

Application monitoring with BELK. Nishant Sahay, Sr. Architect Bhavani Ananth, Architect Application monitoring with BELK Nishant Sahay, Sr. Architect Bhavani Ananth, Architect Why logs Business PoV Input Data Analytics User Interactions /Behavior End user Experience/ Improvements 2017 Wipro

More information

Data Acquisition. The reference Big Data stack

Data Acquisition. The reference Big Data stack Università degli Studi di Roma Tor Vergata Dipartimento di Ingegneria Civile e Ingegneria Informatica Data Acquisition Corso di Sistemi e Architetture per Big Data A.A. 2016/17 Valeria Cardellini The reference

More information

Data Acquisition. The reference Big Data stack

Data Acquisition. The reference Big Data stack Università degli Studi di Roma Tor Vergata Dipartimento di Ingegneria Civile e Ingegneria Informatica Data Acquisition Corso di Sistemi e Architetture per Big Data A.A. 2017/18 Valeria Cardellini The reference

More information

Big Data Architect.

Big Data Architect. Big Data Architect www.austech.edu.au WHAT IS BIG DATA ARCHITECT? A big data architecture is designed to handle the ingestion, processing, and analysis of data that is too large or complex for traditional

More information

Flash Storage Complementing a Data Lake for Real-Time Insight

Flash Storage Complementing a Data Lake for Real-Time Insight Flash Storage Complementing a Data Lake for Real-Time Insight Dr. Sanhita Sarkar Global Director, Analytics Software Development August 7, 2018 Agenda 1 2 3 4 5 Delivering insight along the entire spectrum

More information

Data pipelines with PostgreSQL & Kafka

Data pipelines with PostgreSQL & Kafka Data pipelines with PostgreSQL & Kafka Oskari Saarenmaa PostgresConf US 2018 - Jersey City Agenda 1. Introduction 2. Data pipelines, old and new 3. Apache Kafka 4. Sample data pipeline with Kafka & PostgreSQL

More information

Sizing Guidelines and Performance Tuning for Intelligent Streaming

Sizing Guidelines and Performance Tuning for Intelligent Streaming Sizing Guidelines and Performance Tuning for Intelligent Streaming Copyright Informatica LLC 2017. Informatica and the Informatica logo are trademarks or registered trademarks of Informatica LLC in the

More information

MODERN BIG DATA DESIGN PATTERNS CASE DRIVEN DESINGS

MODERN BIG DATA DESIGN PATTERNS CASE DRIVEN DESINGS MODERN BIG DATA DESIGN PATTERNS CASE DRIVEN DESINGS SUJEE MANIYAM FOUNDER / PRINCIPAL @ ELEPHANT SCALE www.elephantscale.com sujee@elephantscale.com HI, I M SUJEE MANIYAM Founder / Principal @ ElephantScale

More information

Creating a Recommender System. An Elasticsearch & Apache Spark approach

Creating a Recommender System. An Elasticsearch & Apache Spark approach Creating a Recommender System An Elasticsearch & Apache Spark approach My Profile SKILLS Álvaro Santos Andrés Big Data & Analytics Solution Architect in Ericsson with more than 12 years of experience focused

More information

Increase Value from Big Data with Real-Time Data Integration and Streaming Analytics

Increase Value from Big Data with Real-Time Data Integration and Streaming Analytics Increase Value from Big Data with Real-Time Data Integration and Streaming Analytics Cy Erbay Senior Director Striim Executive Summary Striim is Uniquely Qualified to Solve the Challenges of Real-Time

More information

Big Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara

Big Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara Big Data Technology Ecosystem Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara Agenda End-to-End Data Delivery Platform Ecosystem of Data Technologies Mapping an End-to-End Solution Case

More information

The SMACK Stack: Spark*, Mesos*, Akka, Cassandra*, Kafka* Elizabeth K. Dublin Apache Kafka Meetup, 30 August 2017.

The SMACK Stack: Spark*, Mesos*, Akka, Cassandra*, Kafka* Elizabeth K. Dublin Apache Kafka Meetup, 30 August 2017. Dublin Apache Kafka Meetup, 30 August 2017 The SMACK Stack: Spark*, Mesos*, Akka, Cassandra*, Kafka* Elizabeth K. Joseph @pleia2 * ASF projects 1 Elizabeth K. Joseph, Developer Advocate Developer Advocate

More information

Using the SDACK Architecture to Build a Big Data Product. Yu-hsin Yeh (Evans Ye) Apache Big Data NA 2016 Vancouver

Using the SDACK Architecture to Build a Big Data Product. Yu-hsin Yeh (Evans Ye) Apache Big Data NA 2016 Vancouver Using the SDACK Architecture to Build a Big Data Product Yu-hsin Yeh (Evans Ye) Apache Big Data NA 2016 Vancouver Outline A Threat Analytic Big Data product The SDACK Architecture Akka Streams and data

More information

Big Data. Big Data Analyst. Big Data Engineer. Big Data Architect

Big Data. Big Data Analyst. Big Data Engineer. Big Data Architect Big Data Big Data Analyst INTRODUCTION TO BIG DATA ANALYTICS ANALYTICS PROCESSING TECHNIQUES DATA TRANSFORMATION & BATCH PROCESSING REAL TIME (STREAM) DATA PROCESSING Big Data Engineer BIG DATA FOUNDATION

More information

A Distributed System Case Study: Apache Kafka. High throughput messaging for diverse consumers

A Distributed System Case Study: Apache Kafka. High throughput messaging for diverse consumers A Distributed System Case Study: Apache Kafka High throughput messaging for diverse consumers As always, this is not a tutorial Some of the concepts may no longer be part of the current system or implemented

More information

Index. Raul Estrada and Isaac Ruiz 2016 R. Estrada and I. Ruiz, Big Data SMACK, DOI /

Index. Raul Estrada and Isaac Ruiz 2016 R. Estrada and I. Ruiz, Big Data SMACK, DOI / Index A ACID, 251 Actor model Akka installation, 44 Akka logos, 41 OOP vs. actors, 42 43 thread-based concurrency, 42 Agents server, 140, 251 Aggregation techniques materialized views, 216 probabilistic

More information

CONSOLIDATING RISK MANAGEMENT AND REGULATORY COMPLIANCE APPLICATIONS USING A UNIFIED DATA PLATFORM

CONSOLIDATING RISK MANAGEMENT AND REGULATORY COMPLIANCE APPLICATIONS USING A UNIFIED DATA PLATFORM CONSOLIDATING RISK MANAGEMENT AND REGULATORY COMPLIANCE APPLICATIONS USING A UNIFIED PLATFORM Executive Summary Financial institutions have implemented and continue to implement many disparate applications

More information

IBM Data Replication for Big Data

IBM Data Replication for Big Data IBM Data Replication for Big Data Highlights Stream changes in realtime in Hadoop or Kafka data lakes or hubs Provide agility to data in data warehouses and data lakes Achieve minimum impact on source

More information

Transformation-free Data Pipelines by combining the Power of Apache Kafka and the Flexibility of the ESB's

Transformation-free Data Pipelines by combining the Power of Apache Kafka and the Flexibility of the ESB's Building Agile and Resilient Schema Transformations using Apache Kafka and ESB's Transformation-free Data Pipelines by combining the Power of Apache Kafka and the Flexibility of the ESB's Ricardo Ferreira

More information

Data Analytics at Logitech Snowflake + Tableau = #Winning

Data Analytics at Logitech Snowflake + Tableau = #Winning Welcome # T C 1 8 Data Analytics at Logitech Snowflake + Tableau = #Winning Avinash Deshpande I am a futurist, scientist, engineer, designer, data evangelist at heart Find me at Avinash Deshpande Chief

More information

Cloudline Autonomous Driving Solutions. Accelerating insights through a new generation of Data and Analytics October, 2018

Cloudline Autonomous Driving Solutions. Accelerating insights through a new generation of Data and Analytics October, 2018 Cloudline Autonomous Driving Solutions Accelerating insights through a new generation of Data and Analytics October, 2018 HPE big data analytics solutions power the data-driven enterprise Secure, workload-optimized

More information

Overview. Prerequisites. Course Outline. Course Outline :: Apache Spark Development::

Overview. Prerequisites. Course Outline. Course Outline :: Apache Spark Development:: Title Duration : Apache Spark Development : 4 days Overview Spark is a fast and general cluster computing system for Big Data. It provides high-level APIs in Scala, Java, Python, and R, and an optimized

More information

The Hadoop Ecosystem. EECS 4415 Big Data Systems. Tilemachos Pechlivanoglou

The Hadoop Ecosystem. EECS 4415 Big Data Systems. Tilemachos Pechlivanoglou The Hadoop Ecosystem EECS 4415 Big Data Systems Tilemachos Pechlivanoglou tipech@eecs.yorku.ca A lot of tools designed to work with Hadoop 2 HDFS, MapReduce Hadoop Distributed File System Core Hadoop component

More information

Streaming Integration and Intelligence For Automating Time Sensitive Events

Streaming Integration and Intelligence For Automating Time Sensitive Events Streaming Integration and Intelligence For Automating Time Sensitive Events Ted Fish Director Sales, Midwest ted@striim.com 312-330-4929 Striim Executive Summary Delivering Data for Time Sensitive Processes

More information

Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015

Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL. May 2015 Lambda Architecture for Batch and Real- Time Processing on AWS with Spark Streaming and Spark SQL May 2015 2015, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document

More information

Developing Enterprise Cloud Solutions with Azure

Developing Enterprise Cloud Solutions with Azure Developing Enterprise Cloud Solutions with Azure Java Focused 5 Day Course AUDIENCE FORMAT Developers and Software Architects Instructor-led with hands-on labs LEVEL 300 COURSE DESCRIPTION This course

More information

rkafka rkafka is a package created to expose functionalities provided by Apache Kafka in the R layer. Version 1.1

rkafka rkafka is a package created to expose functionalities provided by Apache Kafka in the R layer. Version 1.1 rkafka rkafka is a package created to expose functionalities provided by Apache Kafka in the R layer. Version 1.1 Wednesday 28 th June, 2017 rkafka Shruti Gupta Wednesday 28 th June, 2017 Contents 1 Introduction

More information

BIG DATA REVOLUTION IN JOBRAPIDO

BIG DATA REVOLUTION IN JOBRAPIDO BIG DATA REVOLUTION IN JOBRAPIDO Michele Pinto Big Data Technical Team Leader @ Jobrapido Big Data Tech 2016 Firenze - October 20, 2016 ABOUT ME NAME Michele Pinto LINKEDIN https://www.linkedin.com/in/pintomichele

More information

Distributed systems for stream processing

Distributed systems for stream processing Distributed systems for stream processing Apache Kafka and Spark Structured Streaming Alena Hall Alena Hall Large-scale data processing Distributed Systems Functional Programming Data Science & Machine

More information

Architectural challenges for building a low latency, scalable multi-tenant data warehouse

Architectural challenges for building a low latency, scalable multi-tenant data warehouse Architectural challenges for building a low latency, scalable multi-tenant data warehouse Mataprasad Agrawal Solutions Architect, Services CTO 2017 Persistent Systems Ltd. All rights reserved. Our analytics

More information

Managing IoT and Time Series Data with Amazon ElastiCache for Redis

Managing IoT and Time Series Data with Amazon ElastiCache for Redis Managing IoT and Time Series Data with ElastiCache for Redis Darin Briskman, ElastiCache Developer Outreach Michael Labib, Specialist Solutions Architect 2016, Web Services, Inc. or its Affiliates. All

More information

Down the event-driven road: Experiences of integrating streaming into analytic data platforms

Down the event-driven road: Experiences of integrating streaming into analytic data platforms Down the event-driven road: Experiences of integrating streaming into analytic data platforms Dr. Dominik Benz, Head of Machine Learning Engineering, inovex GmbH Confluent Meetup Munich, 8.10.2018 Integrate

More information

Microsoft Azure Databricks for data engineering. Building production data pipelines with Apache Spark in the cloud

Microsoft Azure Databricks for data engineering. Building production data pipelines with Apache Spark in the cloud Microsoft Azure Databricks for data engineering Building production data pipelines with Apache Spark in the cloud Azure Databricks As companies continue to set their sights on making data-driven decisions

More information

Real-time Streaming Applications on AWS Patterns and Use Cases

Real-time Streaming Applications on AWS Patterns and Use Cases Real-time Streaming Applications on AWS Patterns and Use Cases Paul Armstrong - Solutions Architect (AWS) Tom Seddon - Data Engineering Tech Lead (Deliveroo) 28 th June 2017 2016, Amazon Web Services,

More information

We are ready to serve Latest Testing Trends, Are you ready to learn?? New Batches Info

We are ready to serve Latest Testing Trends, Are you ready to learn?? New Batches Info We are ready to serve Latest Testing Trends, Are you ready to learn?? New Batches Info START DATE : TIMINGS : DURATION : TYPE OF BATCH : FEE : FACULTY NAME : LAB TIMINGS : PH NO: 9963799240, 040-40025423

More information

Big Data Analytics using Apache Hadoop and Spark with Scala

Big Data Analytics using Apache Hadoop and Spark with Scala Big Data Analytics using Apache Hadoop and Spark with Scala Training Highlights : 80% of the training is with Practical Demo (On Custom Cloudera and Ubuntu Machines) 20% Theory Portion will be important

More information

BIG DATA COURSE CONTENT

BIG DATA COURSE CONTENT BIG DATA COURSE CONTENT [I] Get Started with Big Data Microsoft Professional Orientation: Big Data Duration: 12 hrs Course Content: Introduction Course Introduction Data Fundamentals Introduction to Data

More information

20777A: Implementing Microsoft Azure Cosmos DB Solutions

20777A: Implementing Microsoft Azure Cosmos DB Solutions 20777A: Implementing Microsoft Azure Solutions Course Details Course Code: Duration: Notes: 20777A 3 days This course syllabus should be used to determine whether the course is appropriate for the students,

More information

Apache Kafka Your Event Stream Processing Solution

Apache Kafka Your Event Stream Processing Solution Apache Kafka Your Event Stream Processing Solution Introduction Data is one among the newer ingredients in the Internet-based systems and includes user-activity events related to logins, page visits, clicks,

More information

Hadoop An Overview. - Socrates CCDH

Hadoop An Overview. - Socrates CCDH Hadoop An Overview - Socrates CCDH What is Big Data? Volume Not Gigabyte. Terabyte, Petabyte, Exabyte, Zettabyte - Due to handheld gadgets,and HD format images and videos - In total data, 90% of them collected

More information

<Insert Picture Here> MySQL Web Reference Architectures Building Massively Scalable Web Infrastructure

<Insert Picture Here> MySQL Web Reference Architectures Building Massively Scalable Web Infrastructure MySQL Web Reference Architectures Building Massively Scalable Web Infrastructure Mario Beck (mario.beck@oracle.com) Principal Sales Consultant MySQL Session Agenda Requirements for

More information

Oracle NoSQL Database Enterprise Edition, Version 18.1

Oracle NoSQL Database Enterprise Edition, Version 18.1 Oracle NoSQL Database Enterprise Edition, Version 18.1 Oracle NoSQL Database is a scalable, distributed NoSQL database, designed to provide highly reliable, flexible and available data management across

More information

Data Infrastructure at LinkedIn. Shirshanka Das XLDB 2011

Data Infrastructure at LinkedIn. Shirshanka Das XLDB 2011 Data Infrastructure at LinkedIn Shirshanka Das XLDB 2011 1 Me UCLA Ph.D. 2005 (Distributed protocols in content delivery networks) PayPal (Web frameworks and Session Stores) Yahoo! (Serving Infrastructure,

More information

Kafka Connect the Dots

Kafka Connect the Dots Kafka Connect the Dots Building Oracle Change Data Capture Pipelines With Kafka Mike Donovan CTO Dbvisit Software Mike Donovan Chief Technology Officer, Dbvisit Software Multi-platform DBA, (Oracle, MSSQL..)

More information

Microservices Lessons Learned From a Startup Perspective

Microservices Lessons Learned From a Startup Perspective Microservices Lessons Learned From a Startup Perspective Susanne Kaiser @suksr CTO at Just Software @JustSocialApps Each journey is different People try to copy Netflix, but they can only copy what they

More information

Achieving Horizontal Scalability. Alain Houf Sales Engineer

Achieving Horizontal Scalability. Alain Houf Sales Engineer Achieving Horizontal Scalability Alain Houf Sales Engineer Scale Matters InterSystems IRIS Database Platform lets you: Scale up and scale out Scale users and scale data Mix and match a variety of approaches

More information

Introduc)on to Apache Ka1a. Jun Rao Co- founder of Confluent

Introduc)on to Apache Ka1a. Jun Rao Co- founder of Confluent Introduc)on to Apache Ka1a Jun Rao Co- founder of Confluent Agenda Why people use Ka1a Technical overview of Ka1a What s coming What s Apache Ka1a Distributed, high throughput pub/sub system Ka1a Usage

More information

Fast Innovation requires Fast IT

Fast Innovation requires Fast IT Fast Innovation requires Fast IT Cisco Data Virtualization Puneet Kumar Bhugra Business Solutions Manager 1 Challenge In Data, Big Data & Analytics Siloed, Multiple Sources Business Outcomes Business Opportunity:

More information

Evolution of an Apache Spark Architecture for Processing Game Data

Evolution of an Apache Spark Architecture for Processing Game Data Evolution of an Apache Spark Architecture for Processing Game Data Nick Afshartous WB Analytics Platform May 17 th 2017 May 17 th, 2017 About Me nafshartous@wbgames.com WB Analytics Core Platform Lead

More information

Esper EQC. Horizontal Scale-Out for Complex Event Processing

Esper EQC. Horizontal Scale-Out for Complex Event Processing Esper EQC Horizontal Scale-Out for Complex Event Processing Esper EQC - Introduction Esper query container (EQC) is the horizontal scale-out architecture for Complex Event Processing with Esper and EsperHA

More information

Introduction to Kafka (and why you care)

Introduction to Kafka (and why you care) Introduction to Kafka (and why you care) Richard Nikula VP, Product Development and Support Nastel Technologies, Inc. 2 Introduction Richard Nikula VP of Product Development and Support Involved in MQ

More information

Abstract. The Challenges. ESG Lab Review InterSystems IRIS Data Platform: A Unified, Efficient Data Platform for Fast Business Insight

Abstract. The Challenges. ESG Lab Review InterSystems IRIS Data Platform: A Unified, Efficient Data Platform for Fast Business Insight ESG Lab Review InterSystems Data Platform: A Unified, Efficient Data Platform for Fast Business Insight Date: April 218 Author: Kerry Dolan, Senior IT Validation Analyst Abstract Enterprise Strategy Group

More information

Scaling the Yelp s logging pipeline with Apache Kafka. Enrico

Scaling the Yelp s logging pipeline with Apache Kafka. Enrico Scaling the Yelp s logging pipeline with Apache Kafka Enrico Canzonieri enrico@yelp.com @EnricoC89 Yelp s Mission Connecting people with great local businesses. Yelp Stats As of Q1 2016 90M 102M 70% 32

More information

Big data streaming: Choices for high availability and disaster recovery on Microsoft Azure. By Arnab Ganguly DataCAT

Big data streaming: Choices for high availability and disaster recovery on Microsoft Azure. By Arnab Ganguly DataCAT : Choices for high availability and disaster recovery on Microsoft Azure By Arnab Ganguly DataCAT March 2019 Contents Overview... 3 The challenge of a single-region architecture... 3 Configuration considerations...

More information

Building Durable Real-time Data Pipeline

Building Durable Real-time Data Pipeline Building Durable Real-time Data Pipeline Apache BookKeeper at Twitter @sijieg Twitter Background Layered Architecture Agenda Design Details Performance Scale @Twitter Q & A Publish-Subscribe Online services

More information

MapR Enterprise Hadoop

MapR Enterprise Hadoop 2014 MapR Technologies 2014 MapR Technologies 1 MapR Enterprise Hadoop Top Ranked Cloud Leaders 500+ Customers 2014 MapR Technologies 2 Key MapR Advantage Partners Business Services APPLICATIONS & OS ANALYTICS

More information

Hadoop 2.x Core: YARN, Tez, and Spark. Hortonworks Inc All Rights Reserved

Hadoop 2.x Core: YARN, Tez, and Spark. Hortonworks Inc All Rights Reserved Hadoop 2.x Core: YARN, Tez, and Spark YARN Hadoop Machine Types top-of-rack switches core switch client machines have client-side software used to access a cluster to process data master nodes run Hadoop

More information

Making Data Integration Easy For Multiplatform Data Architectures With Diyotta 4.0. WEBINAR MAY 15 th, PM EST 10AM PST

Making Data Integration Easy For Multiplatform Data Architectures With Diyotta 4.0. WEBINAR MAY 15 th, PM EST 10AM PST Making Data Integration Easy For Multiplatform Data Architectures With Diyotta 4.0 WEBINAR MAY 15 th, 2018 1PM EST 10AM PST Welcome and Logistics If you have problems with the sound on your computer, switch

More information

A day in the life of a log message Kyle Liberti, Josef

A day in the life of a log message Kyle Liberti, Josef A day in the life of a log message Kyle Liberti, Josef Karasek @Pepe_CZ Order is vital for scale Abstractions make systems manageable Problems of Distributed Systems Reliability Data throughput Latency

More information

Security and Performance advances with Oracle Big Data SQL

Security and Performance advances with Oracle Big Data SQL Security and Performance advances with Oracle Big Data SQL Jean-Pierre Dijcks Oracle Redwood Shores, CA, USA Key Words SQL, Oracle, Database, Analytics, Object Store, Files, Big Data, Big Data SQL, Hadoop,

More information

How to Route Internet Traffic between A Mobile Application and IoT Device?

How to Route Internet Traffic between A Mobile Application and IoT Device? Whitepaper How to Route Internet Traffic between A Mobile Application and IoT Device? Website: www.mobodexter.com www.paasmer.co 1 Table of Contents 1. Introduction 3 2. Approach: 1 Uses AWS IoT Setup

More information

HDInsight > Hadoop. October 12, 2017

HDInsight > Hadoop. October 12, 2017 HDInsight > Hadoop October 12, 2017 2 Introduction Mark Hudson >20 years mixing technology with data >10 years with CapTech Microsoft Certified IT Professional Business Intelligence Member of the Richmond

More information

WHITEPAPER. MemSQL Enterprise Feature List

WHITEPAPER. MemSQL Enterprise Feature List WHITEPAPER MemSQL Enterprise Feature List 2017 MemSQL Enterprise Feature List DEPLOYMENT Provision and deploy MemSQL anywhere according to your desired cluster configuration. On-Premises: Maximize infrastructure

More information

Let the data flow! Data Streaming & Messaging with Apache Kafka Frank Pientka. Materna GmbH

Let the data flow! Data Streaming & Messaging with Apache Kafka Frank Pientka. Materna GmbH Let the data flow! Data Streaming & Messaging with Apache Kafka Frank Pientka Wer ist Frank Pientka? Dipl.-Informatiker (TH Karlsruhe) Verheiratet, 2 Töchter Principal Software Architect in Dortmund Fast

More information

Designing High-Performance Data Structures for MongoDB

Designing High-Performance Data Structures for MongoDB Designing High-Performance Data Structures for MongoDB The NoSQL Data Modeling Imperative Danny Sandwell, Product Marketing, erwin, Inc. Leigh Weston, Product Manager, erwin, Inc. Learn More at erwin.com

More information

1 Big Data Hadoop. 1. Introduction About this Course About Big Data Course Logistics Introductions

1 Big Data Hadoop. 1. Introduction About this Course About Big Data Course Logistics Introductions Big Data Hadoop Architect Online Training (Big Data Hadoop + Apache Spark & Scala+ MongoDB Developer And Administrator + Apache Cassandra + Impala Training + Apache Kafka + Apache Storm) 1 Big Data Hadoop

More information

REAL-TIME ANALYTICS WITH APACHE STORM

REAL-TIME ANALYTICS WITH APACHE STORM REAL-TIME ANALYTICS WITH APACHE STORM Mevlut Demir PhD Student IN TODAY S TALK 1- Problem Formulation 2- A Real-Time Framework and Its Components with an existing applications 3- Proposed Framework 4-

More information

MAPR DATA GOVERNANCE WITHOUT COMPROMISE

MAPR DATA GOVERNANCE WITHOUT COMPROMISE MAPR TECHNOLOGIES, INC. WHITE PAPER JANUARY 2018 MAPR DATA GOVERNANCE TABLE OF CONTENTS EXECUTIVE SUMMARY 3 BACKGROUND 4 MAPR DATA GOVERNANCE 5 CONCLUSION 7 EXECUTIVE SUMMARY The MapR DataOps Governance

More information

microsoft

microsoft 70-775.microsoft Number: 70-775 Passing Score: 800 Time Limit: 120 min Exam A QUESTION 1 Note: This question is part of a series of questions that present the same scenario. Each question in the series

More information

Hortonworks and The Internet of Things

Hortonworks and The Internet of Things Hortonworks and The Internet of Things Dr. Bernhard Walter Solutions Engineer About Hortonworks Customer Momentum ~700 customers (as of November 4, 2015) 152 customers added in Q3 2015 Publicly traded

More information

Gain Insights From Unstructured Data Using Pivotal HD. Copyright 2013 EMC Corporation. All rights reserved.

Gain Insights From Unstructured Data Using Pivotal HD. Copyright 2013 EMC Corporation. All rights reserved. Gain Insights From Unstructured Data Using Pivotal HD 1 Traditional Enterprise Analytics Process 2 The Fundamental Paradigm Shift Internet age and exploding data growth Enterprises leverage new data sources

More information

Deep Dive Amazon Kinesis. Ian Meyers, Principal Solution Architect - Amazon Web Services

Deep Dive Amazon Kinesis. Ian Meyers, Principal Solution Architect - Amazon Web Services Deep Dive Amazon Kinesis Ian Meyers, Principal Solution Architect - Amazon Web Services Analytics Deployment & Administration App Services Analytics Compute Storage Database Networking AWS Global Infrastructure

More information

An Information Asset Hub. How to Effectively Share Your Data

An Information Asset Hub. How to Effectively Share Your Data An Information Asset Hub How to Effectively Share Your Data Hello! I am Jack Kennedy Data Architect @ CNO Enterprise Data Management Team Jack.Kennedy@CNOinc.com 1 4 Data Functions Your Data Warehouse

More information

OPENSTACK BEIJING CONFERENCE. by: Steven Hallett Head of Cloud Infrastructure Engineering and Operations

OPENSTACK BEIJING CONFERENCE. by: Steven Hallett Head of Cloud Infrastructure Engineering and Operations OPENSTACK BEIJING CONFERENCE by: Steven Hallett Head of Cloud Infrastructure Engineering and Operations August 10, 2012 Agenda X.commerce, an Introduction Platform Vision Current Status The Opportunity

More information

Scalable Streaming Analytics

Scalable Streaming Analytics Scalable Streaming Analytics KARTHIK RAMASAMY @karthikz TALK OUTLINE BEGIN I! II ( III b Overview Storm Overview Storm Internals IV Z V K Heron Operational Experiences END WHAT IS ANALYTICS? according

More information

Enable IoT Solutions using Azure

Enable IoT Solutions using Azure Internet Of Things A WHITE PAPER SERIES Enable IoT Solutions using Azure 1 2 TABLE OF CONTENTS EXECUTIVE SUMMARY INTERNET OF THINGS GATEWAY EVENT INGESTION EVENT PERSISTENCE EVENT ACTIONS 3 SYNTEL S IoT

More information

WHITE PAPER. Reference Guide for Deploying and Configuring Apache Kafka

WHITE PAPER. Reference Guide for Deploying and Configuring Apache Kafka WHITE PAPER Reference Guide for Deploying and Configuring Apache Kafka Revised: 02/2015 Table of Content 1. Introduction 3 2. Apache Kafka Technology Overview 3 3. Common Use Cases for Kafka 4 4. Deploying

More information

VOLTDB + HP VERTICA. page

VOLTDB + HP VERTICA. page VOLTDB + HP VERTICA ARCHITECTURE FOR FAST AND BIG DATA ARCHITECTURE FOR FAST + BIG DATA FAST DATA Fast Serve Analytics BIG DATA BI Reporting Fast Operational Database Streaming Analytics Columnar Analytics

More information

Percona Live September 21-23, 2015 Mövenpick Hotel Amsterdam

Percona Live September 21-23, 2015 Mövenpick Hotel Amsterdam Percona Live 2015 September 21-23, 2015 Mövenpick Hotel Amsterdam MongoDB, Elastic, and Hadoop: The What, When, and How Kimberly Wilkins Principal Engineer/Database Denizen ObjectRocket/Rackspace kimberly@objectrocket.com

More information

The Future of Real-Time in Spark

The Future of Real-Time in Spark The Future of Real-Time in Spark Reynold Xin @rxin Spark Summit, New York, Feb 18, 2016 Why Real-Time? Making decisions faster is valuable. Preventing credit card fraud Monitoring industrial machinery

More information

OPERATIONALIZING MACHINE LEARNING USING GPU ACCELERATED, IN-DATABASE ANALYTICS

OPERATIONALIZING MACHINE LEARNING USING GPU ACCELERATED, IN-DATABASE ANALYTICS OPERATIONALIZING MACHINE LEARNING USING GPU ACCELERATED, IN-DATABASE ANALYTICS 1 Why GPUs? A Tale of Numbers 100x Performance Increase Infrastructure Cost Savings Performance 100x gains over traditional

More information

Lambda Architecture for Batch and Stream Processing. October 2018

Lambda Architecture for Batch and Stream Processing. October 2018 Lambda Architecture for Batch and Stream Processing October 2018 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Notices This document is provided for informational purposes only.

More information

Deploying, Managing and Reusing R Models in an Enterprise Environment

Deploying, Managing and Reusing R Models in an Enterprise Environment Deploying, Managing and Reusing R Models in an Enterprise Environment Making Data Science Accessible to a Wider Audience Lou Bajuk-Yorgan, Sr. Director, Product Management Streaming and Advanced Analytics

More information

A Single Source of Truth

A Single Source of Truth A Single Source of Truth is it the mythical creature of data management? In the world of data management, a single source of truth is a fully trusted data source the ultimate authority for the particular

More information

70-532: Developing Microsoft Azure Solutions

70-532: Developing Microsoft Azure Solutions 70-532: Developing Microsoft Azure Solutions Exam Design Target Audience Candidates of this exam are experienced in designing, programming, implementing, automating, and monitoring Microsoft Azure solutions.

More information

EMC s IT TRANSFORMATION

EMC s IT TRANSFORMATION EMC s IT TRANSFORMATION Sanjay Mirchandani Chief Information Officer 1 EMC IT At A Glance INTERNAL USERS IT ENVIRONMENT BUSINESS APPLICATIONS VIRTUALIZATION 2004 24,000 5 DATA CENTERS, 960 TB STORAGE ~400

More information

70-532: Developing Microsoft Azure Solutions

70-532: Developing Microsoft Azure Solutions 70-532: Developing Microsoft Azure Solutions Objective Domain Note: This document shows tracked changes that are effective as of January 18, 2018. Create and Manage Azure Resource Manager Virtual Machines

More information

The Evolution of Big Data Platforms and Data Science

The Evolution of Big Data Platforms and Data Science IBM Analytics The Evolution of Big Data Platforms and Data Science ECC Conference 2016 Brandon MacKenzie June 13, 2016 2016 IBM Corporation Hello, I m Brandon MacKenzie. I work at IBM. Data Science - Offering

More information

Microservices with Kafka Ecosystem. Guido Schmutz

Microservices with Kafka Ecosystem. Guido Schmutz Microservices with Kafka Ecosystem Guido Schmutz @gschmutz doag2017 Guido Schmutz Working at Trivadis for more than 20 years Oracle ACE Director for Fusion Middleware and SOA Consultant, Trainer Software

More information

Lecture 21 11/27/2017 Next Lecture: Quiz review & project meetings Streaming & Apache Kafka

Lecture 21 11/27/2017 Next Lecture: Quiz review & project meetings Streaming & Apache Kafka Lecture 21 11/27/2017 Next Lecture: Quiz review & project meetings Streaming & Apache Kafka What problem does Kafka solve? Provides a way to deliver updates about changes in state from one service to another

More information

Installing and configuring Apache Kafka

Installing and configuring Apache Kafka 3 Installing and configuring Date of Publish: 2018-08-13 http://docs.hortonworks.com Contents Installing Kafka...3 Prerequisites... 3 Installing Kafka Using Ambari... 3... 9 Preparing the Environment...9

More information

Intra-cluster Replication for Apache Kafka. Jun Rao

Intra-cluster Replication for Apache Kafka. Jun Rao Intra-cluster Replication for Apache Kafka Jun Rao About myself Engineer at LinkedIn since 2010 Worked on Apache Kafka and Cassandra Database researcher at IBM Outline Overview of Kafka Kafka architecture

More information

Talend Big Data Sandbox. Big Data Insights Cookbook

Talend Big Data Sandbox. Big Data Insights Cookbook Overview Pre-requisites Setup & Configuration Hadoop Distribution Download Demo (Scenario) Overview Pre-requisites Setup & Configuration Hadoop Distribution Demo (Scenario) About this cookbook What is

More information

Building Event Driven Architectures using OpenEdge CDC Richard Banville, Fellow, OpenEdge Development Dan Mitchell, Principal Sales Engineer

Building Event Driven Architectures using OpenEdge CDC Richard Banville, Fellow, OpenEdge Development Dan Mitchell, Principal Sales Engineer Building Event Driven Architectures using OpenEdge CDC Richard Banville, Fellow, OpenEdge Development Dan Mitchell, Principal Sales Engineer October 26, 2018 Agenda Change Data Capture (CDC) Overview Configuring

More information

Tools for Social Networking Infrastructures

Tools for Social Networking Infrastructures Tools for Social Networking Infrastructures 1 Cassandra - a decentralised structured storage system Problem : Facebook Inbox Search hundreds of millions of users distributed infrastructure inbox changes

More information

Un'introduzione a Kafka Streams e KSQL and why they matter! ITOUG Tech Day Roma 1 Febbraio 2018

Un'introduzione a Kafka Streams e KSQL and why they matter! ITOUG Tech Day Roma 1 Febbraio 2018 Un'introduzione a Kafka Streams e KSQL and why they matter! ITOUG Tech Day Roma 1 Febbraio 2018 R E T H I N K I N G Stream Processing with Apache Kafka Kafka the Streaming Data Platform 1.0 Enterprise

More information

zspotlight: Spark on z/os

zspotlight: Spark on z/os zspotlight: Spark on z/os Avijit Chatterjee, Ph.D. achatter@us.ibm.com, @ChatterAvijit STSM, IBM Competitive Project Office 1 CEOs are increasingly focused on customers as individuals leveraging contextual

More information

Putting together the platform: Riak, Redis, Solr and Spark. Bryan Hunt

Putting together the platform: Riak, Redis, Solr and Spark. Bryan Hunt Putting together the platform: Riak, Redis, Solr and Spark Bryan Hunt 1 $ whoami Bryan Hunt Client Services Engineer @binarytemple 2 Minimum viable product - the ideologically correct doctrine 1. Start

More information

Oracle NoSQL Database Enterprise Edition, Version 18.1

Oracle NoSQL Database Enterprise Edition, Version 18.1 Oracle NoSQL Database Enterprise Edition, Version 18.1 Oracle NoSQL Database is a scalable, distributed NoSQL database, designed to provide highly reliable, flexible and available data management across

More information

Cloud Analytics and Business Intelligence on AWS

Cloud Analytics and Business Intelligence on AWS Cloud Analytics and Business Intelligence on AWS Enterprise Applications Virtual Desktops Sharing & Collaboration Platform Services Analytics Hadoop Real-time Streaming Data Machine Learning Data Warehouse

More information