10 Things to Consider When Using Apache Kafka: Utilization Points of Apache Kafka Obtained From IoT Use Case


1 10 Things to Consider When Using Apache Kafka: Utilization Points of Apache Kafka Obtained From IoT Use Case
May 16, 2017
NTT DATA Corporation
Naoto Umemori, Yuji Hagiwara
2017 NTT DATA Corporation

2 Contents
1. Project outlines
2. Tips and pitfalls from IoT use case:
   - Performance tuning
   - Dealing with unusual operations
   - Availability pitfalls
3. Summary

3 Project Outlines

4 About us
Who are we?
- Naoto Umemori: Platform Engineer
- Yuji Hagiwara: Platform Engineer
OSS professional headquarters at NTT DATA Corp.
Our main targets:
- IoT (Connected Vehicle)
- Cloud technology (OpenStack, Docker, etc.)
- Automation of platforms

5 Our Target: Connected Vehicle
The assumed volume for connected vehicles:
- Number of connections: > 1 million simultaneous connections
- Number of transactions: > 100k TPS
- Total data rate: > 100 Gbps

6 Apache Kafka: A distributed streaming platform
Apache Kafka is a distributed streaming platform with three key capabilities:
- Publish and subscribe to streams of records, similar to a message queue
- Store streams of records in a fault-tolerant way
- Process streams of records
We use Kafka as a messaging system in our IoT platform.
https://kafka.apache.org/intro

7 Overview of Our IoT Platform
Key architecture: separation of the stream and batch processing units.
[Diagram: Devices (sensors, mobile phones, servers, NW devices, automobiles) feed the IoT Platform, which serves Applications (business systems, inventory info., map info., traffic info., automobile assist, user info.). Platform stages: Connect (Connection & Collection), Store (Accumulation & Conversion, with a Stream Proc. unit and a Batch Proc. unit, each with its own data stores), and Utilize (Analysis, with multiple data stores for analysis), plus Monitoring & Visualization and Distribution.]

8 Architecture of Our IoT Platform
[Diagram: Devices send device info. to the IoT Platform, which serves Applications. Collection: Gateway (Kafka Producer). Accumulation & Conversion: Message Broker, Stream processing and Data Buffering (stream process unit), Batch Proc. (batch process unit), with stream, archive, and temporary data stores. Analysis: Analysis ETL, analysis data stores, Analysis API, and Real-time Analysis API. Also shown: Monitoring & Visualization and Distribution.]

9 Tips and pitfalls from IoT use case

10 Tips and pitfalls from IoT use case
Performance tuning
1. Disk I/O of Kafka Broker
2. Concurrency of Kafka Producer
3. The number of Partitions
4. Async/Sync Bridge
Dealing with unusual operations
5. Offset Monitoring
6. Purging Kafka Topics
7. Slow Pub/Sub Log
Availability pitfalls
8. Undesired RAID Group
9. Unstable Kafka Topics
10. A huge number of Partitions makes Cluster unhealthy

11 Tips and pitfalls from IoT use case: Performance tuning

12 1. Disk I/O of Kafka Broker
Issues:
- Data writes slowed down and throughput degraded.
- The amount of data was too large and exceeded the upper limit of the buffer cache.
- When a cache flush occurred, data could not be written and IOPS decreased.
Solutions/Actions:
- We restricted the inflow rate to a throughput at or below what the disks' IOPS can sustain, taking cache flushes into account.
Result:
- The Kafka cluster is working stably.
Practice:
- Take the impact of buffer cache flushes into account when sizing a Kafka cluster, since Kafka mainly consumes disk I/O resources.
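The slide does not say how the inflow was restricted. One common way to cap a flow rate is a token bucket, sketched below under that assumption. The class name `TokenBucket`, its parameters, and the byte-based accounting are illustrative and not taken from the presentation.

```python
import time


class TokenBucket:
    """Token-bucket limiter: allows at most `rate` bytes/s on average."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)          # refill rate in bytes per second
        self.capacity = float(capacity)  # maximum burst size in bytes
        self.tokens = float(capacity)    # start with a full bucket
        self.clock = clock               # injectable clock (assumption, for testing)
        self.last = clock()

    def try_consume(self, nbytes):
        """Return True and spend tokens if `nbytes` may be sent now."""
        now = self.clock()
        # Refill according to elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False
```

A gateway would call `try_consume(len(payload))` before handing a record to the producer and delay or reject the record when it returns False. The `rate` would be set below the write throughput the disks sustain while a cache flush is in progress.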

13 2. Concurrency of Kafka Producer (kafka-client)
Issues:
- Throughput saturated.
- The Kafka Broker did not appear to be saturated on CPU, memory, network, or disk I/O.
- The Kafka Producer did not appear to be saturated on CPU or network I/O.
- It was unclear where the bottleneck was.
Solutions/Actions:
- Inspect Java thread dumps (jstack) to find out what is happening.
Result:
- Found the bottleneck: the usage of one CPU core reached 100%.
- The Sender Java thread of the Kafka Producer was busy.
Practice:
- Running multiple producer processes may be a good idea.

14 2. Concurrency of Kafka Producer (kafka-client)
Issues: Sender is a single Java thread.
[Diagram: Kafka Producer (the Gateway in our case). Data Source(s) -> Worker (user's AP) -> send() -> append() to RecordAccumulator (Batch, Batch, Batch) -> drain() by Sender -> Request -> Kafka Broker(s)]
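Because each producer instance drains its batches through one Sender thread, spreading traffic over several independent producers removes that single-core limit. The sketch below shows only the dispatch idea. `ProducerPool` and `FakeProducer` are hypothetical names, the round-robin policy is an assumption, and in the slides' setup each real producer would live in its own process.

```python
import itertools


class ProducerPool:
    """Spread records over several independent producer instances."""

    def __init__(self, producers):
        if not producers:
            raise ValueError("at least one producer is required")
        self._producers = list(producers)
        self._cycle = itertools.cycle(self._producers)

    def send(self, topic, value):
        # Round-robin so that no single Sender thread carries all traffic.
        producer = next(self._cycle)
        return producer.send(topic, value)


class FakeProducer:
    """Stand-in for a real Kafka producer; it only records what it was sent."""

    def __init__(self):
        self.sent = []

    def send(self, topic, value):
        self.sent.append((topic, value))
        return len(self.sent)
```

Round-robin dispatch gives up ordering between records that travel through different producers. If per-device ordering matters, choosing the producer from a hash of the device ID would be one alternative.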

15 3. The number of Partitions
Issues:
- There may be no single right answer for the number of partitions.
- Too few partitions: Consumer performance cannot scale.
- Too many partitions: the latency from producing to consuming increases.
Solutions/Actions:
- We chose it heuristically as a multiple of the number of physical disks.
Result:
- Two to four times the number of physical disks looks good.
Practice:
- The right value depends on the physical performance of the disks and the number of messages.
Reference: https://
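The heuristic above is plain arithmetic, written out below as a helper. The function name and the 12-disk example are illustrative assumptions. Only the factors 2 and 4 come from the slide.

```python
def partition_count_range(physical_disks, low_factor=2, high_factor=4):
    """Return the (min, max) partition counts suggested by the heuristic."""
    if physical_disks < 1:
        raise ValueError("physical_disks must be at least 1")
    return physical_disks * low_factor, physical_disks * high_factor


# Hypothetical cluster with 12 physical disks in total.
low, high = partition_count_range(12)
print(f"start between {low} and {high} partitions")
```

The range is a starting point for measurement, since the slide notes that the right value depends on disk performance and message volume.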

16 4. Async/Sync Bridge
Issues:
- When connecting Kafka to the "Things" of IoT to collect massive data, the communication modes may not match:
  - Kafka: asynchronous mode
  - Things: synchronous mode
Solutions/Actions:
- Two approaches for getting higher performance out of the Kafka Producer:
  - A few Kafka Producers accept connections from many Things.
  - The Things support asynchronous communication.
Result:
- We could raise producer performance up to the CPU limit.
Practice:
- Take the communication mode into account.

17 4. Async/Sync Bridge: An example of a sequence diagram when the number of devices is 1
[Sequence diagram: the device side is synchronous (turnaround time = t1 [s]); the Kafka side is asynchronous.]
Throughput of Kafka Producer: T1 = 1 / t1 [TPS]

18 4. Async/Sync Bridge: An example of a sequence diagram when the number of devices is 3
[Sequence diagram: three devices each make a synchronous request (turnaround time = t2 [s]); the Kafka side is asynchronous.]
Throughput of Kafka Producer: T2 = 3 / t2 [TPS] (T2 > T1, normally)
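The two throughput formulas generalize to N / t for N synchronous devices sharing one producer. The helper below is a small worked example. The function name and the turnaround times of 0.25 s and 0.5 s are made-up values for illustration.

```python
def producer_throughput(devices, turnaround_s):
    """Throughput in TPS when `devices` synchronous clients share one producer."""
    if turnaround_s <= 0:
        raise ValueError("turnaround_s must be positive")
    return devices / turnaround_s


t1 = 0.25  # hypothetical turnaround with 1 device [s]
t2 = 0.5   # hypothetical turnaround with 3 devices [s]
T1 = producer_throughput(1, t1)
T2 = producer_throughput(3, t2)
print(T1, T2)
```

T2 > T1 holds exactly when 3 / t2 > 1 / t1, that is, when t2 < 3 * t1. Adding devices therefore pays off as long as the turnaround time grows more slowly than the number of devices.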

19 Tips and pitfalls from IoT use case: Dealing with unusual operations

20 5. Offset Monitoring
Issues:
- We want to monitor the difference between the Produce Offset and the Fetch Offset, to prevent the performance problem caused by data falling out of the cache.
Solutions/Actions:
- Visualize the offsets.
- Get the Produce Offset:
  # bin/kafka-topics.sh --describe --topic <topic>
- Get the Fetch Offset (for storm-kafka 1.0.1):
  # zookeeper-cli get /<zkroot>/<id>/<topic>/<partition>
Result:
- We obtained a Kafka cluster sizing that keeps the cluster stable.
Practice:
- Understand your workload by monitoring performance metrics.
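Once both offsets have been collected per partition, the monitored value is a simple difference. The sketch below assumes the offsets are already available as dictionaries keyed by partition number. The function names, the treatment of a missing fetch offset as 0, and the alert threshold are illustrative assumptions.

```python
def offset_lag(produce_offsets, fetch_offsets):
    """Per-partition lag = produce offset - fetch offset (never negative)."""
    lag = {}
    for partition, produced in produce_offsets.items():
        # Assumption: a partition with no recorded fetch offset has consumed nothing.
        fetched = fetch_offsets.get(partition, 0)
        lag[partition] = max(produced - fetched, 0)
    return lag


def partitions_over(lag, threshold):
    """Partitions whose lag exceeds `threshold`, in ascending order."""
    return sorted(p for p, n in lag.items() if n > threshold)


# Hypothetical offsets for a three-partition topic.
lag = offset_lag({0: 1500, 1: 900, 2: 40}, {0: 1200, 1: 900})
print(lag, partitions_over(lag, 100))
```

Feeding `lag` into a time-series store at a fixed interval would give the kind of graph the next slide shows.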

21 5. Example of Offset Monitoring by Grafana
We use Grafana as the visualizer.
[Graph: the difference between the Produce Offset and the Fetch Offset]

22 6. Purging Kafka Topics
Issues:
- Since 0.8.2, Kafka supports (logical) topic deletion, but afterwards we cannot create a topic with the same name, which is inconvenient for regression tests.
- Deleting only the Kafka segment files does not delete the topic.
Solutions/Actions:
- We run this procedure in order:
  # bin/kafka-topics.sh --delete --topic <topic>   (Logical deletion)
  # bin/kafka-server-stop.sh                       (Stop the server)
  # rm <directory of kafka log>                    (Delete segment files)
  # sysctl -w vm.drop_caches=3                     (Drop caches)
  # bin/kafka-server-start.sh                      (Start the server)
Result:
- The topic was deleted successfully.
Practice:
- Define the operation procedure strictly. Observe the order of the instructions.
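A strict procedure is easier to follow when a script enforces the order and stops at the first failure. The runner below is a sketch of that idea and is not the authors' tooling. `run_in_order` and `PURGE_STEPS` are hypothetical names, and the `<topic>` and `<directory of kafka log>` placeholders are left unexpanded exactly as on the slide.

```python
PURGE_STEPS = [
    ("logical deletion", ["bin/kafka-topics.sh", "--delete", "--topic", "<topic>"]),
    ("stop the server", ["bin/kafka-server-stop.sh"]),
    ("delete segment files", ["rm", "<directory of kafka log>"]),
    ("drop caches", ["sysctl", "-w", "vm.drop_caches=3"]),
    ("start the server", ["bin/kafka-server-start.sh"]),
]


def run_in_order(steps, run):
    """Run steps strictly in order and stop at the first failure.

    `run` takes an argv list and returns an exit code (0 means success).
    Returns the labels of the steps that completed successfully.
    """
    done = []
    for label, argv in steps:
        if run(argv) != 0:
            raise RuntimeError(f"step failed: {label}; completed: {done}")
        done.append(label)
    return done
```

For a dry run, pass a `run` function that prints the command and returns 0. For real execution, after filling in the placeholders, `lambda argv: subprocess.run(argv).returncode` from the standard library would serve as `run`.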

23 7. Slow Pub/Sub Log
Issues:
- We want to identify the performance bottleneck when analyzing a performance problem related to Kafka.
- We can use resource monitoring and Broker metrics.
  - Good point: they give an overview of Kafka load (busy or not).
  - Bad point: they give no detail on the performance of each request.
Solutions/Actions:
- We measure the processing time of producing and consuming in our own application implementation (similar to the slow log of an RDB).
Result:
- We can identify the slow process and improve it.
Practice:
- A slow-log feature is necessary. You can implement your own measurement.
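The presentation does not show how the measurement was implemented. A minimal sketch of an application-side slow log is a timing wrapper that reports only operations exceeding a threshold. The name `slowlog`, the logger name, and the threshold values are assumptions.

```python
import logging
import time
from contextlib import contextmanager

log = logging.getLogger("slowlog")


@contextmanager
def slowlog(operation, threshold_s, clock=time.perf_counter, sink=log.warning):
    """Time the wrapped block and report it only if it was slow."""
    start = clock()
    try:
        yield
    finally:
        elapsed = clock() - start
        if elapsed >= threshold_s:
            # Same idea as a database slow query log: record only the outliers.
            sink("slow %s: %.3f s (threshold %.3f s)",
                 operation, elapsed, threshold_s)
```

Wrapping each produce or consume call, for example `with slowlog("produce", 0.1): ...`, keeps the log small while still pointing at the individual requests that broker-level metrics cannot show.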

24 Tips and pitfalls from IoT use case: Availability pitfalls

25 8. Undesired RAID Group
Issues:
- We want to use Kafka without RAID. *
- In general, Kafka is deployed on servers with multiple HDDs connected to a RAID controller.
- Some cheap RAID controllers do not support one logical volume per physical volume.
Solutions/Actions:
- Unfortunately, we used RAID-0.
Result:
- We could not configure the Kafka cluster without RAID.
Practice:
- Don't buy a cheap RAID controller. Check and compare specifications.
* The choice (using RAID or not) has tradeoffs (https://kafka.apache.org/documentation/#diskandfs), but in IoT there is no reason to use Kafka with RAID if each device sends the same amount of data.

26 9. Unstable Kafka Topics
Issues:
- Some topics were not subscribed to by the Storm application when we created the topics after launching the Storm application, which uses KafkaSpout (storm-kafka 1.0.1) to subscribe to topics with wildcards.
Solutions/Actions:
- Create topics before launching the application.
- Check that these topics have been created:
  # bin/kafka-topics.sh --describe
Result:
- The Storm application subscribed to all of the topics successfully.
Practice:
- Creating topics is a very heavy operation. Confirm that the operation succeeded after executing it.
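The confirmation step can be automated by polling the topic list until every expected topic appears, and only then launching the application. The sketch below assumes a caller-supplied `list_topics` function that returns the current topic names. How that list is obtained (for instance by parsing the describe output) is left open, and all names here are hypothetical.

```python
import time


def missing_topics(expected, existing):
    """Expected topic names that are not in `existing`, sorted."""
    return sorted(set(expected) - set(existing))


def wait_for_topics(expected, list_topics, timeout_s=30.0, interval_s=1.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Block until all expected topics exist, or raise TimeoutError."""
    deadline = clock() + timeout_s
    while True:
        missing = missing_topics(expected, list_topics())
        if not missing:
            return
        if clock() >= deadline:
            raise TimeoutError(f"topics still missing: {missing}")
        sleep(interval_s)
```

A deployment script would call `wait_for_topics(...)` after the create commands and start the Storm topology only when it returns without raising.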

27 10. A huge number of Partitions makes Cluster unhealthy
Issues:
- Kafka Brokers sometimes dropped from the cluster at runtime when we created 1,000 topics (96 partitions/topic) and the Storm application with KafkaSpout (storm-kafka 1.0.1) consumed these topics.
- The Kafka Brokers did not crash. Only ZooKeeper logged messages such as:
  WARN [NIOServerCxn.Factory: / :2181:NIOServerCnxn@357] caught end of stream exception
Solutions/Actions:
- Decrease the number of topics per Kafka cluster.
- A huge number of partitions and/or consumers overloaded ZooKeeper.
- KafkaSpout writes the Fetch Offset of each partition to ZooKeeper on every fetch.
Result:
- The Kafka cluster became stable and healthy.
Practice:
- Check the implementations surrounding Kafka.
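The scale of the problem follows from the numbers on the slide: 1,000 topics with 96 partitions each means 96,000 partitions whose offsets are written to ZooKeeper. The back-of-the-envelope helper below makes that explicit. The per-partition write rate is a made-up figure for illustration, since the slide does not state the fetch frequency.

```python
def total_partitions(topics, partitions_per_topic):
    """Number of partitions across all topics."""
    return topics * partitions_per_topic


def zk_offset_writes_per_sec(partitions, writes_per_partition_per_sec):
    """Estimated ZooKeeper offset writes per second for all partitions."""
    return partitions * writes_per_partition_per_sec


partitions = total_partitions(1000, 96)  # figures from the slide
# Hypothetical rate: one offset write per partition every 2 seconds.
print(partitions, zk_offset_writes_per_sec(partitions, 0.5))
```

Halving the number of topics per cluster halves this write load, which is consistent with the action the slide describes.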

28 Summary

29 Summary
We introduced the 10 things that we learned from our IoT use cases. These things are important when optimizing performance and operating the cluster. Please be careful, as these things cannot be learned from the documentation alone.

30 Disclaimer
1. Any product name, service name, software name and other marks are trademarks or registered marks of their corresponding companies.
2. This presentation is for the purpose of providing the knowledge gained from our activities in the IoT field.
3. The presenters and NTT DATA Corporation provide information on an as-is basis and have no responsibility for results that you get according to the information in this presentation material.

31 Any questions?
Naoto Umemori, Yuji Hagiwara


More information

Caching and Demand- Paged Virtual Memory

Caching and Demand- Paged Virtual Memory Caching and Demand- Paged Virtual Memory Defini8ons Cache Copy of data that is faster to access than the original Hit: if cache has copy Miss: if cache does not have copy Cache block Unit of cache storage

More information

The Lion of storage systems

The Lion of storage systems The Lion of storage systems Rakuten. Inc, Yosuke Hara Mar 21, 2013 1 The Lion of storage systems http://www.leofs.org LeoFS v0.14.0 was released! 2 Table of Contents 1. Motivation 2. Overview & Inside

More information

Conceptual Modeling on Tencent s Distributed Database Systems. Pan Anqun, Wang Xiaoyu, Li Haixiang Tencent Inc.

Conceptual Modeling on Tencent s Distributed Database Systems. Pan Anqun, Wang Xiaoyu, Li Haixiang Tencent Inc. Conceptual Modeling on Tencent s Distributed Database Systems Pan Anqun, Wang Xiaoyu, Li Haixiang Tencent Inc. Outline Introduction System overview of TDSQL Conceptual Modeling on TDSQL Applications Conclusion

More information

Chapter 11: File System Implementation. Objectives

Chapter 11: File System Implementation. Objectives Chapter 11: File System Implementation Objectives To describe the details of implementing local file systems and directory structures To describe the implementation of remote file systems To discuss block

More information

MySQL Database Scalability

MySQL Database Scalability MySQL Database Scalability Nextcloud Conference 2016 TU Berlin Oli Sennhauser Senior MySQL Consultant at FromDual GmbH oli.sennhauser@fromdual.com 1 / 14 About FromDual GmbH Support Consulting remote-dba

More information

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective Part II: Data Center Software Architecture: Topic 1: Distributed File Systems GFS (The Google File System) 1 Filesystems

More information

FUJITSU Software ServerView Cloud Monitoring Manager V1.1. Release Notes

FUJITSU Software ServerView Cloud Monitoring Manager V1.1. Release Notes FUJITSU Software ServerView Cloud Monitoring Manager V1.1 Release Notes J2UL-2170-01ENZ0(00) July 2016 Contents Contents About this Manual... 4 1 What's New?...6 1.1 Performance Improvements... 6 1.2

More information

VOLTDB + HP VERTICA. page

VOLTDB + HP VERTICA. page VOLTDB + HP VERTICA ARCHITECTURE FOR FAST AND BIG DATA ARCHITECTURE FOR FAST + BIG DATA FAST DATA Fast Serve Analytics BIG DATA BI Reporting Fast Operational Database Streaming Analytics Columnar Analytics

More information

Architecture of So-ware Systems Massively Distributed Architectures Reliability, Failover and failures. Mar>n Rehák

Architecture of So-ware Systems Massively Distributed Architectures Reliability, Failover and failures. Mar>n Rehák Architecture of So-ware Systems Massively Distributed Architectures Reliability, Failover and failures Mar>n Rehák Mo>va>on Internet- based business models imposed new requirements on computa>onal architectures

More information

Heterogeneous Resources Management In Modern Data Centers with Dynamic Workloads Ningfang Mi

Heterogeneous Resources Management In Modern Data Centers with Dynamic Workloads Ningfang Mi Heterogeneous Resources Management In Modern Data Centers with Dynamic Workloads Ningfang Mi Electrical and Computer Engineering Dept. Northeastern University ningfang@ece.neu.edu 1 Research Focus To investigate

More information

PaaS SAE Top3 SuperAPP

PaaS SAE Top3 SuperAPP PaaS SAE Top3 SuperAPP PaaS SAE Top3 SuperAPP Pla$orm Services Group Sam Biwing Monika Rambone Skylee Kingho1d AWS S3 CDN ATS 1k 30+ 10+ Go FE Services Panel C++ Go C/C++ ACM FE Pla$orm Services Group

More information

davidklee.net gplus.to/kleegeek linked.com/a/davidaklee

davidklee.net gplus.to/kleegeek linked.com/a/davidaklee @kleegeek davidklee.net gplus.to/kleegeek linked.com/a/davidaklee Specialties / Focus Areas / Passions: Performance Tuning & Troubleshooting Virtualization Cloud Enablement Infrastructure Architecture

More information

Efficient Memory and Bandwidth Management for Industrial Strength Kirchhoff Migra<on

Efficient Memory and Bandwidth Management for Industrial Strength Kirchhoff Migra<on Efficient Memory and Bandwidth Management for Industrial Strength Kirchhoff Migra

More information

hashfs Applying Hashing to Op2mize File Systems for Small File Reads

hashfs Applying Hashing to Op2mize File Systems for Small File Reads hashfs Applying Hashing to Op2mize File Systems for Small File Reads Paul Lensing, Dirk Meister, André Brinkmann Paderborn Center for Parallel Compu2ng University of Paderborn Mo2va2on and Problem Design

More information

Apache Storm. A framework for Parallel Data Stream Processing

Apache Storm. A framework for Parallel Data Stream Processing Apache Storm A framework for Parallel Data Stream Processing Storm Storm is a distributed real- ;me computa;on pla

More information

Large- Scale Sor,ng: Breaking World Records. Mike Conley CSE 124 Guest Lecture 12 March 2015

Large- Scale Sor,ng: Breaking World Records. Mike Conley CSE 124 Guest Lecture 12 March 2015 Large- Scale Sor,ng: Breaking World Records Mike Conley CSE 124 Guest Lecture 12 March 2015 Sor,ng Given an array of items, put them in order 5 2 8 0 2 5 4 9 0 1 0 0 0 0 0 0 1 2 2 4 5 5 8 9 Many algorithms

More information

UNIX Sockets. COS 461 Precept 1

UNIX Sockets. COS 461 Precept 1 UNIX Sockets COS 461 Precept 1 Socket and Process Communica;on application layer User Process Socket transport layer (TCP/UDP) OS network stack network layer (IP) link layer (e.g. ethernet) Internet Internet

More information

IBM InfoSphere Streams v4.0 Performance Best Practices

IBM InfoSphere Streams v4.0 Performance Best Practices Henry May IBM InfoSphere Streams v4.0 Performance Best Practices Abstract Streams v4.0 introduces powerful high availability features. Leveraging these requires careful consideration of performance related

More information

Designing MQ deployments for the cloud generation

Designing MQ deployments for the cloud generation Designing MQ deployments for the cloud generation WebSphere User Group, London Arthur Barr, Senior Software Engineer, IBM MQ 30 th March 2017 Top business drivers for cloud 2 Source: OpenStack user survey,

More information

Introduction to Kafka (and why you care)

Introduction to Kafka (and why you care) Introduction to Kafka (and why you care) Richard Nikula VP, Product Development and Support Nastel Technologies, Inc. 2 Introduction Richard Nikula VP of Product Development and Support Involved in MQ

More information

Cloud Monitoring as a Service. Built On Machine Learning

Cloud Monitoring as a Service. Built On Machine Learning Cloud Monitoring as a Service Built On Machine Learning Table of Contents 1 2 3 4 5 6 7 8 9 10 Why Machine Learning Who Cares Four Dimensions to Cloud Monitoring Data Aggregation Anomaly Detection Algorithms

More information

Single and mul,threaded processes

Single and mul,threaded processes 1 Single and mul,threaded processes Why threads? Express concurrency Web server (mul,ple requests), Browser (GUI + network I/O + rendering), most GUI programs for(;;) { struct request *req = get_request();

More information

10 Million Smart Meter Data with Apache HBase

10 Million Smart Meter Data with Apache HBase 10 Million Smart Meter Data with Apache HBase 5/31/2017 OSS Solution Center Hitachi, Ltd. Masahiro Ito OSS Summit Japan 2017 Who am I? Masahiro Ito ( 伊藤雅博 ) Software Engineer at Hitachi, Ltd. Focus on

More information

Streaming Log Analytics with Kafka

Streaming Log Analytics with Kafka Streaming Log Analytics with Kafka Kresten Krab Thorup, Humio CTO Log Everything, Answer Anything, In Real-Time. Why this talk? Humio is a Log Analytics system Designed to run on-prem High volume, real

More information

Profiling & Tuning Applica1ons. CUDA Course July István Reguly

Profiling & Tuning Applica1ons. CUDA Course July István Reguly Profiling & Tuning Applica1ons CUDA Course July 21-25 István Reguly Introduc1on Why is my applica1on running slow? Work it out on paper Instrument code Profile it NVIDIA Visual Profiler Works with CUDA,

More information

1/10/16. RPC and Clocks. Tom Anderson. Last Time. Synchroniza>on RPC. Lab 1 RPC

1/10/16. RPC and Clocks. Tom Anderson. Last Time. Synchroniza>on RPC. Lab 1 RPC RPC and Clocks Tom Anderson Go Synchroniza>on RPC Lab 1 RPC Last Time 1 Topics MapReduce Fault tolerance Discussion RPC At least once At most once Exactly once Lamport Clocks Mo>va>on MapReduce Fault Tolerance

More information

Time and Space. Indirect communication. Time and space uncoupling. indirect communication

Time and Space. Indirect communication. Time and space uncoupling. indirect communication Time and Space Indirect communication Johan Montelius In direct communication sender and receivers exist in the same time and know of each other. KTH In indirect communication we relax these requirements.

More information

NPTEL Course Jan K. Gopinath Indian Institute of Science

NPTEL Course Jan K. Gopinath Indian Institute of Science Storage Systems NPTEL Course Jan 2012 (Lecture 39) K. Gopinath Indian Institute of Science Google File System Non-Posix scalable distr file system for large distr dataintensive applications performance,

More information

Using Dynamic Voltage Frequency Scaling and CPU Pinning for Energy Efficiency in Cloud Compu1ng. Jakub Krzywda Umeå University

Using Dynamic Voltage Frequency Scaling and CPU Pinning for Energy Efficiency in Cloud Compu1ng. Jakub Krzywda Umeå University Using Dynamic Voltage Frequency Scaling and CPU Pinning for Energy Efficiency in Cloud Compu1ng Jakub Krzywda Umeå University How to use DVFS and CPU Pinning to lower the power consump1on during periods

More information

Applications of Paxos Algorithm

Applications of Paxos Algorithm Applications of Paxos Algorithm Gurkan Solmaz COP 6938 - Cloud Computing - Fall 2012 Department of Electrical Engineering and Computer Science University of Central Florida - Orlando, FL Oct 15, 2012 1

More information

NASPInet 2.0 The Evolu4on of Synchrophasor Networks

NASPInet 2.0 The Evolu4on of Synchrophasor Networks NASPInet 2.0 The Evolu4on of Synchrophasor Networks NASPI Working Group Mee4ng San Mateo, California March 24, 2015 Dick Willson and Dan LuKer Allied Partners LLC 1 Agenda Future Synchrophasor Networks

More information

Paxos Replicated State Machines as the Basis of a High- Performance Data Store

Paxos Replicated State Machines as the Basis of a High- Performance Data Store Paxos Replicated State Machines as the Basis of a High- Performance Data Store William J. Bolosky, Dexter Bradshaw, Randolph B. Haagens, Norbert P. Kusters and Peng Li March 30, 2011 Q: How to build a

More information

Principled Schedulability Analysis for Distributed Storage Systems Using Thread Architecture Models

Principled Schedulability Analysis for Distributed Storage Systems Using Thread Architecture Models Principled Schedulability Analysis for Distributed Storage Systems Using Thread Architecture Models Suli Yang*, Jing Liu, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau * work done while at UW-Madison

More information

Dell PowerVault MD3600f/MD3620f Remote Replication Functional Guide

Dell PowerVault MD3600f/MD3620f Remote Replication Functional Guide Dell PowerVault MD3600f/MD3620f Remote Replication Functional Guide Page i THIS WHITE PAPER IS FOR INFORMATIONAL PURPOSES ONLY, AND MAY CONTAIN TYPOGRAPHICAL ERRORS AND TECHNICAL INACCURACIES. THE CONTENT

More information