10 Things to Consider When Using Apache Kafka: Utilization Points of Apache Kafka Obtained From IoT Use Case
- Conrad Gallagher
- 5 years ago
Transcription
1 10 Things to Consider When Using Apache Kafka: Utilization Points of Apache Kafka Obtained From IoT Use Case
May 16, 2017
NTT DATA Corporation
Naoto Umemori, Yuji Hagiwara
2017 NTT DATA Corporation
2 Contents
1. Project outlines
2. Tips and pitfalls from IoT use case: performance tuning, dealing with unusual operations, availability pitfalls
3. Summary
3 Project Outlines
4 About us
- Who are we? Naoto Umemori: Platform Engineer. Yuji Hagiwara: Platform Engineer.
- OSS Professional Headquarters at NTT DATA Corp.
- Our main targets: IoT (Connected Vehicle), cloud technology (OpenStack, Docker, ...), automation of platforms
5 Our Target: Connected Vehicle
The assumed volume for connected vehicles:
- Number of connections: > 1 million simultaneous connections
- Number of transactions: > 100k TPS
- Total data rate: > 100 Gbps
6 Apache Kafka: A distributed streaming platform
Apache Kafka is a distributed streaming platform with three key capabilities:
- Publish and subscribe to streams of records, similar to a message queue
- Store streams of records in a fault-tolerant way
- Process streams of records
We have used Kafka as a messaging system in our IoT platform.
https://kafka.apache.org/intro
7 Overview of Our IoT Platform
Key architecture: separation of the stream and batch processing units.
[Architecture diagram. Flow: Connect (devices) -> Store (IoT platform) -> Utilize (applications). Labels: Devices, Sensors, Mobile phones, Servers, NW devices, Automobile; Connection & Collection, Accumulation & Conversion, Stream Proc. unit, Batch Proc. unit, Data stores for Stream, Data stores for Batch, Analysis, Multiple Data stores for Analysis, Monitoring & Visualization, Distribution; Applications, Biz Systems, Inventory info., Map info., Traffic info., Assist, User info.]
8 Architecture of Our IoT Platform
[Architecture diagram. Devices send device info. to the IoT platform, which feeds applications. Labels: Collection: Gateway (Kafka Producer); Accumulation & Conversion: Message Broker, Stream processing, Data Buffering, Stream process unit, Batch process unit, Batch Proc., Stream Data stores, Archive Data stores, Temporary Data stores; Analysis: Analysis ETL, Analysis Data stores, Analysis API, Real-time Analysis API; Monitoring & Visualization; Distribution.]
9 Tips and pitfalls from IoT use case
10 Tips and pitfalls from IoT use case
Performance tuning:
- Disk I/O of Kafka Broker
- Concurrency of Kafka Producer
- The number of partitions
- Async/Sync bridge
Dealing with unusual operations:
- Offset monitoring
- Purging Kafka topics
- Slow Pub/Sub log
Availability pitfalls:
- Undesired RAID group
- Unstable Kafka topics
- A huge number of partitions makes the cluster unhealthy
11 Tips and pitfalls from IoT use case: Performance tuning
12 1. Disk I/O of Kafka Broker
Issues: Data writing slowed down and throughput degraded. The amount of data was too large and exceeded the upper limit of the buffer cache. When a cache flush occurred, data could not be written and IOPS decreased.
Solutions/Actions: We restricted the flow rate to a throughput at or below what the disk IOPS can sustain, taking cache flushes into account.
Result: The Kafka cluster is working stably.
Practice: Take the impact of buffer cache flushes into account when sizing a Kafka cluster, since Kafka mainly requires disk I/O resources.
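The slides do not say how the flow rate was restricted. One common way to cap the rate of writes in front of a producer is a token bucket; the sketch below is only an illustration of that technique, not the presenters' implementation. The class name, the `rate` and `capacity` parameters, and the injectable `clock` are all assumptions made for this example.

```python
import time


class TokenBucket:
    """Token-bucket limiter: allows at most `rate` units per second on average."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = float(rate)          # tokens refilled per second (hypothetical value)
        self.capacity = float(capacity)  # maximum burst size (hypothetical value)
        self.clock = clock               # injectable clock, useful for testing
        self.tokens = float(capacity)    # start with a full bucket
        self.last = clock()

    def try_acquire(self, n=1):
        """Consume n tokens and return True if available, otherwise return False."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Refill according to elapsed time, never exceeding the capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False
```

A caller would check `try_acquire()` before each send and back off when it returns False. The rate itself would have to come from measuring what the disks sustain while cache flushes are happening.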
13 2. Concurrency of Kafka Producer (kafka-client)
Issues: Throughput saturated. The Kafka Broker did not appear to be saturated on CPU, memory, network, or disk I/O. The Kafka Producer did not appear to be saturated on CPU or network I/O. It was unknown where the bottleneck was.
Solutions/Actions: Examine a Java thread dump (jstack) to find out what is happening.
Result: Found the bottleneck: the usage of one CPU core reached 100%. The Sender Java thread of the Kafka Producer was busy.
Practice: Running multiple processes may be a good idea.
14 2. Concurrency of Kafka Producer (kafka-client)
Issues: Sender is a single Java thread.
[Diagram: Kafka Producer (the Gateway in our case). Data source(s) feed the Worker (user's application), which calls send(); the Kafka Producer (kafka-client) append()s records to batches in the RecordAccumulator; the Sender thread drain()s the batches and sends requests to the Kafka Broker(s).]
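To tie a busy CPU core back to a Java thread, the decimal OS thread id reported by a per-thread CPU tool can be matched against the hexadecimal `nid=` field in jstack output. The helper below is a minimal sketch of that lookup. The function name and the sample dump lines are illustrative assumptions, not output captured from the presenters' system.

```python
import re

# Illustrative jstack-style lines (made up for this example).
SAMPLE = (
    '"main" #1 prio=5 os_prio=0 tid=0x00007f0000000001 nid=0x1a2a waiting on condition [0x00007f0000100000]\n'
    '"kafka-producer-network-thread | producer-1" #12 daemon prio=5 os_prio=0 '
    'tid=0x00007f0000000002 nid=0x1a2b runnable [0x00007f0000200000]\n'
)

# Thread header lines start with the quoted thread name and contain nid=0x<hex>.
_HEADER = re.compile(r'^"(?P<name>[^"]+)".*\bnid=(?P<nid>0x[0-9a-fA-F]+)\b')


def find_thread_by_tid(jstack_output, tid):
    """Return the name of the thread whose native id equals the decimal OS thread id."""
    for line in jstack_output.splitlines():
        match = _HEADER.match(line)
        if match and int(match.group("nid"), 16) == tid:
            return match.group("name")
    return None


if __name__ == "__main__":
    # 6699 decimal is 0x1a2b, so this selects the second sample line.
    print(find_thread_by_tid(SAMPLE, 6699))
```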
15 3. The number of Partitions
Issues: There may be no single right answer for choosing the number of partitions. If the number of partitions is:
- too few: consumer performance cannot scale
- too many: the latency from producing to consuming increases
Solutions/Actions: We decided it heuristically as a multiple of the number of physical disks.
Result: Two to four times the number of physical disks looks good.
Practice: It depends on the physical performance of the disks and the number of messages.
Reference: https://
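The heuristic above can be written down as a small calculation. This is a sketch only: the function name and the default factors are taken from the "two to four times" result, and the slides do not say whether the disk count is per broker or cluster-wide, so that choice is left to the caller.

```python
def partition_range(physical_disks, low_factor=2, high_factor=4):
    """Return (min, max) partition counts as multiples of the physical disk count."""
    if physical_disks < 1:
        raise ValueError("physical_disks must be at least 1")
    return physical_disks * low_factor, physical_disks * high_factor


if __name__ == "__main__":
    # Hypothetical example: 12 physical disks.
    low, high = partition_range(12)
    print(f"candidate partition count: {low} to {high}")
```

The range is a starting point for measurement, not a final answer, since the right value depends on disk performance and message volume.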
16 4. Async/Sync Bridge
Issues: When connecting Kafka to something like the "Things" of IoT to collect massive data, the communication modes may not match:
- Kafka: asynchronous mode
- Things: synchronous mode
Solutions/Actions: Approaches to get higher performance from the Kafka Producer:
- A few Kafka Producers accept connections from many Things
- Things support asynchronous communication
Result: We could raise producer performance up to the CPU limit.
Practice: Take the communication mode into account.
17 4. Async/Sync Bridge: An example of a sequence diagram when the number of devices is 1
[Sequence diagram: one synchronous device connection (turnaround time = t1 [s]) in front of the asynchronous Kafka side.]
Throughput of Kafka Producer: T1 = 1 / t1 [TPS]
18 4. Async/Sync Bridge: An example of a sequence diagram when the number of devices is 3
[Sequence diagram: three synchronous device connections, each with turnaround time = t2 [s], in front of the asynchronous Kafka side.]
Throughput of Kafka Producer: T2 = 3 / t2 [TPS] (T2 > T1, normally)
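The two throughput formulas are instances of T = n / t for n synchronous devices sharing one producer. The sketch below generalizes them under the assumption that the producer itself is not yet the bottleneck. The function name and the turnaround values are hypothetical.

```python
def producer_throughput(devices, turnaround_s):
    """Throughput in TPS when `devices` synchronous clients each complete one
    request per `turnaround_s` seconds against the same producer."""
    if devices < 1 or turnaround_s <= 0:
        raise ValueError("devices must be >= 1 and turnaround_s must be > 0")
    return devices / turnaround_s


if __name__ == "__main__":
    t1 = 0.25  # hypothetical turnaround with 1 device, in seconds
    t2 = 0.5   # hypothetical (longer) turnaround with 3 devices, in seconds
    print(producer_throughput(1, t1), producer_throughput(3, t2))
```

With these numbers T2 exceeds T1 even though each request takes longer, which holds whenever t2 is less than three times t1.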
19 Tips and pitfalls from IoT use case: Dealing with unusual operations
20 5. Offset Monitoring
Issues: We want to monitor the difference between the Produce Offset and the Fetch Offset, to prevent a performance problem caused by data being evicted from the cache before it is consumed.
Solutions/Actions: Visualize offsets.
Get Produce Offset:
# bin/kafka-topics.sh --describe --topic <topic>
Get Fetch Offset (for storm-kafka 1.0.1):
# zookeeper-cli get /<zkroot>/<id>/<topic>/<partition>
Result: We could size the Kafka cluster for stable operation.
Practice: Understand your workload by monitoring performance metrics.
21 5. Example of Offset Monitoring with Grafana
We use Grafana as the visualizer.
[Graph: the difference between Produce Offset and Fetch Offset.]
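The quantity being graphed is the per-partition difference between the two offsets. A minimal sketch of that calculation follows, assuming the offsets have already been collected into dictionaries keyed by partition number. The function name and the choice to treat a missing fetch offset as 0 are assumptions for this example.

```python
def consumer_lag(produce_offsets, fetch_offsets):
    """Return {partition: produce_offset - fetch_offset}, clamped at zero."""
    lag = {}
    for partition, produced in produce_offsets.items():
        # Assumption: a partition with no recorded fetch offset has consumed nothing.
        fetched = fetch_offsets.get(partition, 0)
        lag[partition] = max(produced - fetched, 0)
    return lag


if __name__ == "__main__":
    # Hypothetical offsets for a three-partition topic.
    produced = {0: 1500, 1: 1200, 2: 900}
    fetched = {0: 1450, 1: 1200}
    print(consumer_lag(produced, fetched))
```

A steadily growing value for any partition suggests the consumer is falling behind and may soon be reading data that is no longer in the cache.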
22 6. Purging Kafka Topics
Issues: Since 0.8.2, Kafka supports (logical) topic deletion, but afterwards we cannot create a topic with the same name, which is inconvenient for regression tests. If we only delete the Kafka segment files, the topic is not deleted.
Solutions/Actions: We run this procedure in order:
# bin/kafka-topics.sh --delete --topic <topic> (Logical deletion)
# bin/kafka-server-stop.sh (Stop the server)
# rm <directory of kafka log> (Delete segment files)
# sysctl -w vm.drop_caches=3 (Drop caches)
# bin/kafka-server-start.sh (Start the server)
Result: The topic was deleted successfully.
Practice: Define operation procedures strictly. Observe the order of instructions.
23 7. Slow Pub/Sub Log
Issues: We want to identify the performance bottleneck when analyzing a performance problem related to Kafka. We can use resource monitoring and broker metrics.
- Good point: gives an overview of Kafka load (whether it is busy or not).
- Bad point: no detail on the performance of each request.
Solutions/Actions: We measure the processing time of producing and consuming in our own application implementation (similar to the slow log of an RDB).
Result: We can identify the slow process and improve it.
Practice: A slow log feature is necessary. You can implement your own measurement.
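The slides do not show the measurement code. One way to build such a slow log in application code is to time each produce or consume call and record only those that exceed a threshold. The sketch below illustrates the idea; the `slow_log` name, the threshold, and the optional `sink` list are assumptions for this example, not the presenters' implementation.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("slowlog")


@contextmanager
def slow_log(operation, threshold_s, clock=time.perf_counter, sink=None):
    """Time the wrapped block and report it only if it took threshold_s or longer."""
    start = clock()
    try:
        yield
    finally:
        elapsed = clock() - start
        if elapsed >= threshold_s:
            logger.warning("slow %s: %.3fs (threshold %.3fs)",
                           operation, elapsed, threshold_s)
            if sink is not None:
                # Optional in-memory record, e.g. for tests or later aggregation.
                sink.append((operation, elapsed))
```

Usage would look like `with slow_log("produce", 0.1): producer_send(record)`, where `producer_send` stands in for whatever call the application makes.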
24 Tips and pitfalls from IoT use case: Availability pitfalls
25 8. Undesired RAID Group
Issues: We want to use Kafka without RAID.* In general, Kafka is deployed on servers with multiple HDDs connected to a RAID controller. Some cheap RAID controllers don't support one logical volume per physical volume.
Solutions/Actions: Use RAID-0, unfortunately.
Result: We could not configure the Kafka cluster without RAID.
Practice: Don't buy a cheap RAID controller. Check and compare specifications.
* The choice (using RAID or not) has tradeoffs (https://kafka.apache.org/documentation/#diskandfs), but in IoT there is no reason to use Kafka with RAID if each device sends the same amount of data.
26 9. Unstable Kafka Topics
Issues: Some topics were not subscribed to by the Storm application when we created topics after launching the Storm application, which uses KafkaSpout (storm-kafka 1.0.1) to subscribe to topics with wildcards.
Solutions/Actions: Create topics before launching the application. Check that these topics have been created:
# bin/kafka-topics.sh --describe
Result: The Storm application subscribed to all topics successfully.
Practice: Creating topics is a very heavy operation. Confirm that the operation succeeded after execution.
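The "confirm that the operation succeeded" step can be automated by comparing the expected topic names with what the cluster reports. The sketch below assumes the topic names were captured from `bin/kafka-topics.sh --list`, which prints one topic per line (the slide itself uses `--describe`). The function name and the topic names are hypothetical.

```python
def missing_topics(expected, list_output):
    """Return the expected topic names that do not appear in the listing, sorted."""
    existing = {line.strip() for line in list_output.splitlines() if line.strip()}
    return sorted(set(expected) - existing)


if __name__ == "__main__":
    # Hypothetical listing captured from the topic list command.
    listing = "vehicle-0001\nvehicle-0002\n"
    print(missing_topics(["vehicle-0001", "vehicle-0002", "vehicle-0003"], listing))
```

Launching the consuming application only once this returns an empty list follows the order the slide recommends.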
27 10. A huge number of Partitions makes the Cluster unhealthy
Issues: Kafka Brokers sometimes dropped from the cluster during runtime. This happened when we created 1,000 topics (96 partitions/topic) and the Storm application with KafkaSpout (storm-kafka 1.0.1) consumed these topics. The Kafka Brokers did not crash. Only ZooKeeper logged messages such as:
WARN [NIOServerCxn.Factory: / :2181:NIOServerCnxn@357] caught end of stream exception
Solutions/Actions: Decrease the number of topics per Kafka cluster. A huge number of partitions and/or consumers overloaded ZooKeeper. KafkaSpout writes the Fetch Offset of each partition to ZooKeeper on every fetch.
Result: The Kafka cluster became stable and healthy.
Practice: Check the implementations surrounding Kafka.
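A quick back-of-the-envelope calculation shows why this setup can overload ZooKeeper. The partition total follows directly from the figures on the slide. The per-partition fetch rate is a hypothetical value chosen only to illustrate the scale, and both function names are made up for this example.

```python
def total_partitions(topics, partitions_per_topic):
    """Total number of partitions across all topics."""
    return topics * partitions_per_topic


def offset_writes_per_second(partitions, fetches_per_second_per_partition):
    """Estimated ZooKeeper writes per second if every fetch stores one offset."""
    return partitions * fetches_per_second_per_partition


if __name__ == "__main__":
    partitions = total_partitions(1000, 96)  # figures from the slide
    # Assumption: each partition is fetched once every two seconds.
    print(partitions, offset_writes_per_second(partitions, 0.5))
```

Even at a modest fetch rate, writes to ZooKeeper scale linearly with the partition count, which is consistent with reducing the number of topics per cluster as the remedy.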
28 Summary
29 Summary
We introduced 10 things that we learned from our IoT use cases. These things are important when optimizing performance and operating Kafka. Please be careful, as these things cannot be learned from the documentation alone.
30 Disclaimer
1. Any product name, service name, software name and other marks are trademarks or registered marks of their corresponding companies.
2. This presentation is for the purpose of providing the knowledge gained from our activities in the IoT field.
3. The presenters and NTT DATA Corporation provide information on an as-is basis and have no responsibility for results that you get according to information in this presentation material.
31 Any questions?
Naoto Umemori, Yuji Hagiwara
Introduc)on to Apache Ka1a. Jun Rao Co- founder of Confluent
Introduc)on to Apache Ka1a Jun Rao Co- founder of Confluent Agenda Why people use Ka1a Technical overview of Ka1a What s coming What s Apache Ka1a Distributed, high throughput pub/sub system Ka1a Usage
More informationREAL-TIME ANALYTICS WITH APACHE STORM
REAL-TIME ANALYTICS WITH APACHE STORM Mevlut Demir PhD Student IN TODAY S TALK 1- Problem Formulation 2- A Real-Time Framework and Its Components with an existing applications 3- Proposed Framework 4-
More informationInstalling and configuring Apache Kafka
3 Installing and configuring Date of Publish: 2018-08-13 http://docs.hortonworks.com Contents Installing Kafka...3 Prerequisites... 3 Installing Kafka Using Ambari... 3... 9 Preparing the Environment...9
More informationData Acquisition. The reference Big Data stack
Università degli Studi di Roma Tor Vergata Dipartimento di Ingegneria Civile e Ingegneria Informatica Data Acquisition Corso di Sistemi e Architetture per Big Data A.A. 2016/17 Valeria Cardellini The reference
More informationUsing the SDACK Architecture to Build a Big Data Product. Yu-hsin Yeh (Evans Ye) Apache Big Data NA 2016 Vancouver
Using the SDACK Architecture to Build a Big Data Product Yu-hsin Yeh (Evans Ye) Apache Big Data NA 2016 Vancouver Outline A Threat Analytic Big Data product The SDACK Architecture Akka Streams and data
More informationSEDA An architecture for Well Condi6oned, scalable Internet Services
SEDA An architecture for Well Condi6oned, scalable Internet Services Ma= Welsh, David Culler, and Eric Brewer University of California, Berkeley Symposium on Operating Systems Principles (SOSP), October
More informationIEMS 5780 / IERG 4080 Building and Deploying Scalable Machine Learning Services
IEMS 5780 / IERG 4080 Building and Deploying Scalable Machine Learning Services Lecture 11 - Asynchronous Tasks and Message Queues Albert Au Yeung 22nd November, 2018 1 / 53 Asynchronous Tasks 2 / 53 Client
More informationTransformation-free Data Pipelines by combining the Power of Apache Kafka and the Flexibility of the ESB's
Building Agile and Resilient Schema Transformations using Apache Kafka and ESB's Transformation-free Data Pipelines by combining the Power of Apache Kafka and the Flexibility of the ESB's Ricardo Ferreira
More informationSizing Guidelines and Performance Tuning for Intelligent Streaming
Sizing Guidelines and Performance Tuning for Intelligent Streaming Copyright Informatica LLC 2017. Informatica and the Informatica logo are trademarks or registered trademarks of Informatica LLC in the
More informationAuto Management for Apache Kafka and Distributed Stateful System in General
Auto Management for Apache Kafka and Distributed Stateful System in General Jiangjie (Becket) Qin Data Infrastructure @LinkedIn GIAC 2017, 12/23/17@Shanghai Agenda Kafka introduction and terminologies
More informationSolace JMS Broker Delivers Highest Throughput for Persistent and Non-Persistent Delivery
Solace JMS Broker Delivers Highest Throughput for Persistent and Non-Persistent Delivery Java Message Service (JMS) is a standardized messaging interface that has become a pervasive part of the IT landscape
More informationIntra-cluster Replication for Apache Kafka. Jun Rao
Intra-cluster Replication for Apache Kafka Jun Rao About myself Engineer at LinkedIn since 2010 Worked on Apache Kafka and Cassandra Database researcher at IBM Outline Overview of Kafka Kafka architecture
More informationData Acquisition. The reference Big Data stack
Università degli Studi di Roma Tor Vergata Dipartimento di Ingegneria Civile e Ingegneria Informatica Data Acquisition Corso di Sistemi e Architetture per Big Data A.A. 2017/18 Valeria Cardellini The reference
More informationOpera&ng Systems: Principles and Prac&ce. Tom Anderson
Opera&ng Systems: Principles and Prac&ce Tom Anderson How This Course Fits in the UW CSE Curriculum CSE 333: Systems Programming Project experience in C/C++ How to use the opera&ng system interface CSE
More informationScalable Streaming Analytics
Scalable Streaming Analytics KARTHIK RAMASAMY @karthikz TALK OUTLINE BEGIN I! II ( III b Overview Storm Overview Storm Internals IV Z V K Heron Operational Experiences END WHAT IS ANALYTICS? according
More informationNFS 3/25/14. Overview. Intui>on. Disconnec>on. Challenges
NFS Overview Sharing files is useful Network file systems give users seamless integra>on of a shared file system with the local file system Many op>ons: NFS, SMB/CIFS, AFS, etc. Security an important considera>on
More informationFluentd + MongoDB + Spark = Awesome Sauce
Fluentd + MongoDB + Spark = Awesome Sauce Nishant Sahay, Sr. Architect, Wipro Limited Bhavani Ananth, Tech Manager, Wipro Limited Your company logo here Wipro Open Source Practice: Vision & Mission Vision
More informationNFS. CSE/ISE 311: Systems Administra5on
NFS CSE/ISE 311: Systems Administra5on Sharing files is useful Overview Network file systems give users seamless integra8on of a shared file system with the local file system Many op8ons: NFS, SMB/CIFS,
More informationFlash Storage Complementing a Data Lake for Real-Time Insight
Flash Storage Complementing a Data Lake for Real-Time Insight Dr. Sanhita Sarkar Global Director, Analytics Software Development August 7, 2018 Agenda 1 2 3 4 5 Delivering insight along the entire spectrum
More informationCSE Opera,ng System Principles
CSE 30341 Opera,ng System Principles Lecture 5 Processes / Threads Recap Processes What is a process? What is in a process control bloc? Contrast stac, heap, data, text. What are process states? Which
More informationMapReduce. Cloud Computing COMP / ECPE 293A
Cloud Computing COMP / ECPE 293A MapReduce Jeffrey Dean and Sanjay Ghemawat, MapReduce: simplified data processing on large clusters, In Proceedings of the 6th conference on Symposium on Opera7ng Systems
More informationCreating a Recommender System. An Elasticsearch & Apache Spark approach
Creating a Recommender System An Elasticsearch & Apache Spark approach My Profile SKILLS Álvaro Santos Andrés Big Data & Analytics Solution Architect in Ericsson with more than 12 years of experience focused
More informationrkafka rkafka is a package created to expose functionalities provided by Apache Kafka in the R layer. Version 1.1
rkafka rkafka is a package created to expose functionalities provided by Apache Kafka in the R layer. Version 1.1 Wednesday 28 th June, 2017 rkafka Shruti Gupta Wednesday 28 th June, 2017 Contents 1 Introduction
More informationFlying Faster with Heron
Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron TALK OUTLINE BEGIN I! II ( III b OVERVIEW MOTIVATION HERON IV Z OPERATIONAL EXPERIENCES V K HERON PERFORMANCE END [! OVERVIEW TWITTER IS
More informationA Distributed System Case Study: Apache Kafka. High throughput messaging for diverse consumers
A Distributed System Case Study: Apache Kafka High throughput messaging for diverse consumers As always, this is not a tutorial Some of the concepts may no longer be part of the current system or implemented
More informationCS / Cloud Compu1ng. Recita1on 8 March 4 th and 6 th, 2014
CS15-319 / 15-619 Cloud Compu1ng Recita1on 8 March 4 th and 6 th, 2014 Announcements Encounter a general bug: Post on Piazza Encounter a grading bug: Post Privately on Piazza Don t ask if my answer is
More informationInforma)on Retrieval and Map- Reduce Implementa)ons. Mohammad Amir Sharif PhD Student Center for Advanced Computer Studies
Informa)on Retrieval and Map- Reduce Implementa)ons Mohammad Amir Sharif PhD Student Center for Advanced Computer Studies mas4108@louisiana.edu Map-Reduce: Why? Need to process 100TB datasets On 1 node:
More informationNo compromises: distributed transac2ons with consistency, availability, and performance
No compromises: distributed transac2ons with consistency, availability, and performance Aleksandar Dragojevic, Dushyanth Narayanan, Edmund B. Nigh2ngale, MaDhew Renzelmann, Alex Shamis, Anirudh Badam,
More informationOracle Database 12c: JMS Sharded Queues
Oracle Database 12c: JMS Sharded Queues For high performance, scalable Advanced Queuing ORACLE WHITE PAPER MARCH 2015 Table of Contents Introduction 2 Architecture 3 PERFORMANCE OF AQ-JMS QUEUES 4 PERFORMANCE
More informationEsper EQC. Horizontal Scale-Out for Complex Event Processing
Esper EQC Horizontal Scale-Out for Complex Event Processing Esper EQC - Introduction Esper query container (EQC) is the horizontal scale-out architecture for Complex Event Processing with Esper and EsperHA
More information20+ MILLION RECORDS A SECOND
20+ MILLION RECORDS A SECOND Running Kafka with Dell EMC Isilon All Flash F800 Scale-out NAS Author: Boni Bruno, CISSP, CISM, CGEIT Chief Solutions Architect, Dell EMC Abstract This paper describes performance
More informationHow to sleep *ght and keep your applica*ons running on IPv6 transi*on. The importance of IPv6 Applica*on Tes*ng
How to sleep *ght and keep your applica*ons running on IPv6 transi*on The importance of IPv6 Applica*on Tes*ng About this presenta*on It presents a generic methodology to test the IPv6 func*onality of
More informationLecture 21 11/27/2017 Next Lecture: Quiz review & project meetings Streaming & Apache Kafka
Lecture 21 11/27/2017 Next Lecture: Quiz review & project meetings Streaming & Apache Kafka What problem does Kafka solve? Provides a way to deliver updates about changes in state from one service to another
More informationDeep Dive Amazon Kinesis. Ian Meyers, Principal Solution Architect - Amazon Web Services
Deep Dive Amazon Kinesis Ian Meyers, Principal Solution Architect - Amazon Web Services Analytics Deployment & Administration App Services Analytics Compute Storage Database Networking AWS Global Infrastructure
More informationWebLogic JMS System Best Practices Daniel Joray Trivadis AG Bern
WebLogic JMS System Best Practices Daniel Joray Trivadis AG Bern Keywords Weblogic, JMS, Performance, J2EE Introduction In many J2EE project the Java Message Service JMS is for exchange of information
More informationEnhancing cloud applications by using messaging services IBM Corporation
Enhancing cloud applications by using messaging services After you complete this section, you should understand: Messaging use cases, benefits, and available APIs in the Message Hub service Message Hub
More informationScaling Internet TV Content Delivery ALEX GUTARIN DIRECTOR OF ENGINEERING, NETFLIX
Scaling Internet TV Content Delivery ALEX GUTARIN DIRECTOR OF ENGINEERING, NETFLIX Inventing Internet TV Available in more than 190 countries 104+ million subscribers Lots of Streaming == Lots of Traffic
More informationEffec%vely using Amazon Web Services
Effec%vely using Amazon Web Services Hobin Yoon, Jim Donahue, Ada Gavrilovska, Karsten Schwan CERCS, Georgia Tech ATL, Adobe Systems Message latency of SQS (Simple Queue Service) Op%mizing upload performance
More informationMySQL Performance Optimization and Troubleshooting with PMM. Peter Zaitsev, CEO, Percona
MySQL Performance Optimization and Troubleshooting with PMM Peter Zaitsev, CEO, Percona In the Presentation Practical approach to deal with some of the common MySQL Issues 2 Assumptions You re looking
More informationBreaking Down Barriers: An Intro to GPU Synchronization. Matt Pettineo Lead Engine Programmer Ready At Dawn Studios
Breaking Down Barriers: An Intro to GPU Synchronization Matt Pettineo Lead Engine Programmer Ready At Dawn Studios Who am I? Ready At Dawn for 9 years Lead Engine Programmer for 5 I like GPUs and APIs!
More informationThe SMACK Stack: Spark*, Mesos*, Akka, Cassandra*, Kafka* Elizabeth K. Dublin Apache Kafka Meetup, 30 August 2017.
Dublin Apache Kafka Meetup, 30 August 2017 The SMACK Stack: Spark*, Mesos*, Akka, Cassandra*, Kafka* Elizabeth K. Joseph @pleia2 * ASF projects 1 Elizabeth K. Joseph, Developer Advocate Developer Advocate
More informationReliable Distributed Messaging with HornetQ
Reliable Distributed Messaging with HornetQ Lin Zhao Software Engineer, Groupon lin@groupon.com Agenda Introduction MessageBus Design Client API Monitoring Comparison with HornetQ Cluster Future Work Introduction
More informationLogisland Event mining at scale. Thomas [ ]
Logisland Event mining at scale Thomas Bailet @hurence [2017-01-19] Overview Logisland provides a stream analy0cs solu0on that can handle all enterprise-scale event data and processing Big picture Open
More informationOS-caused Long JVM Pauses - Deep Dive and Solutions
OS-caused Long JVM Pauses - Deep Dive and Solutions Zhenyun Zhuang LinkedIn Corp., Mountain View, California, USA https://www.linkedin.com/in/zhenyun Zhenyun@gmail.com 2016-4-21 Outline q Introduction
More informationTyphoon: An SDN Enhanced Real-Time Big Data Streaming Framework
Typhoon: An SDN Enhanced Real-Time Big Data Streaming Framework Junguk Cho, Hyunseok Chang, Sarit Mukherjee, T.V. Lakshman, and Jacobus Van der Merwe 1 Big Data Era Big data analysis is increasingly common
More informationKafka Connect the Dots
Kafka Connect the Dots Building Oracle Change Data Capture Pipelines With Kafka Mike Donovan CTO Dbvisit Software Mike Donovan Chief Technology Officer, Dbvisit Software Multi-platform DBA, (Oracle, MSSQL..)
More informationOutline. Failure Types
Outline Database Tuning Nikolaus Augsten University of Salzburg Department of Computer Science Database Group 1 Unit 10 WS 2013/2014 Adapted from Database Tuning by Dennis Shasha and Philippe Bonnet. Nikolaus
More informationOutline. Spanner Mo/va/on. Tom Anderson
Spanner Mo/va/on Tom Anderson Outline Last week: Chubby: coordina/on service BigTable: scalable storage of structured data GFS: large- scale storage for bulk data Today/Friday: Lessons from GFS/BigTable
More informationh7ps://bit.ly/citustutorial
Before We Start Setup a Citus Cloud account for the exercises: h7ps://bit.ly/citustutorial Designing a Mul
More informationConfinement (Running Untrusted Programs)
Confinement (Running Untrusted Programs) Chester Rebeiro Indian Institute of Technology Madras Untrusted Programs How to run untrusted programs and not harm your system? Answer: Confinement (some:mes called
More informationSCALE AND SECURE MOBILE / IOT MQTT TRAFFIC
APPLICATION NOTE SCALE AND SECURE MOBILE / IOT TRAFFIC Connecting millions of devices requires a simple implementation for fast deployments, adaptive security for protection against hacker attacks, and
More informationSAS Event Stream Processing 5.1: Advanced Topics
SAS Event Stream Processing 5.1: Advanced Topics Starting Streamviewer from the Java Command Line Follow these instructions if you prefer to start Streamviewer from the Java command prompt. You must know
More informationLet the data flow! Data Streaming & Messaging with Apache Kafka Frank Pientka. Materna GmbH
Let the data flow! Data Streaming & Messaging with Apache Kafka Frank Pientka Wer ist Frank Pientka? Dipl.-Informatiker (TH Karlsruhe) Verheiratet, 2 Töchter Principal Software Architect in Dortmund Fast
More informationLies, Damn Lies and Benchmarks: How to Accurately Measure Distributed Application Performance. Heinz Schaffner
Lies, Damn Lies and Benchmarks: How to Accurately Measure Distributed Application Performance Heinz Schaffner Science Projects vs. Production Testing to Destruction vs. Distressed Processing Latency Schemes
More informationApache Kafka Your Event Stream Processing Solution
Apache Kafka Your Event Stream Processing Solution Introduction Data is one among the newer ingredients in the Internet-based systems and includes user-activity events related to logins, page visits, clicks,
More informationCLUSTERING HIVEMQ. Building highly available, horizontally scalable MQTT Broker Clusters
CLUSTERING HIVEMQ Building highly available, horizontally scalable MQTT Broker Clusters 12/2016 About this document MQTT is based on a publish/subscribe architecture that decouples MQTT clients and uses
More informationUpgrade Your MuleESB with Solace s Messaging Infrastructure
The era of ubiquitous connectivity is upon us. The amount of data most modern enterprises must collect, process and distribute is exploding as a result of real-time process flows, big data, ubiquitous
More informationBringing Multi-Threading to the I/O Level
Bringing Multi-Threading to the I/O Level Rodrigo Medeiros Solutions Architect, Data Platforms @rationalrodrigo @DataCore Copyright 2018 DataCore Software Corp. All Rights Reserved. Objectives Discuss
More informationEvolution of an Apache Spark Architecture for Processing Game Data
Evolution of an Apache Spark Architecture for Processing Game Data Nick Afshartous WB Analytics Platform May 17 th 2017 May 17 th, 2017 About Me nafshartous@wbgames.com WB Analytics Core Platform Lead
More informationScaling the Yelp s logging pipeline with Apache Kafka. Enrico
Scaling the Yelp s logging pipeline with Apache Kafka Enrico Canzonieri enrico@yelp.com @EnricoC89 Yelp s Mission Connecting people with great local businesses. Yelp Stats As of Q1 2016 90M 102M 70% 32
More informationIndirect Communication
Indirect Communication Vladimir Vlassov and Johan Montelius KTH ROYAL INSTITUTE OF TECHNOLOGY Time and Space In direct communication sender and receivers exist in the same time and know of each other.
More informationScale out Read Only Workload by sharing data files of InnoDB. Zhai weixiang Alibaba Cloud
Scale out Read Only Workload by sharing data files of InnoDB Zhai weixiang Alibaba Cloud Who Am I - My Name is Zhai Weixiang - I joined in Alibaba in 2011 and has been working on MySQL since then - Mainly
More informationSubmitted to: Dr. Sunnie Chung. Presented by: Sonal Deshmukh Jay Upadhyay
Submitted to: Dr. Sunnie Chung Presented by: Sonal Deshmukh Jay Upadhyay Submitted to: Dr. Sunny Chung Presented by: Sonal Deshmukh Jay Upadhyay What is Apache Survey shows huge popularity spike for Apache
More informationCloudline Autonomous Driving Solutions. Accelerating insights through a new generation of Data and Analytics October, 2018
Cloudline Autonomous Driving Solutions Accelerating insights through a new generation of Data and Analytics October, 2018 HPE big data analytics solutions power the data-driven enterprise Secure, workload-optimized
More informationLecture 11 Hadoop & Spark
Lecture 11 Hadoop & Spark Dr. Wilson Rivera ICOM 6025: High Performance Computing Electrical and Computer Engineering Department University of Puerto Rico Outline Distributed File Systems Hadoop Ecosystem
More informationBig Data. Big Data Analyst. Big Data Engineer. Big Data Architect
Big Data Big Data Analyst INTRODUCTION TO BIG DATA ANALYTICS ANALYTICS PROCESSING TECHNIQUES DATA TRANSFORMATION & BATCH PROCESSING REAL TIME (STREAM) DATA PROCESSING Big Data Engineer BIG DATA FOUNDATION
More informationCaching and Demand- Paged Virtual Memory
Caching and Demand- Paged Virtual Memory Defini8ons Cache Copy of data that is faster to access than the original Hit: if cache has copy Miss: if cache does not have copy Cache block Unit of cache storage
More informationThe Lion of storage systems
The Lion of storage systems Rakuten. Inc, Yosuke Hara Mar 21, 2013 1 The Lion of storage systems http://www.leofs.org LeoFS v0.14.0 was released! 2 Table of Contents 1. Motivation 2. Overview & Inside