REAL-TIME ANALYTICS WITH APACHE STORM

1 REAL-TIME ANALYTICS WITH APACHE STORM Mevlut Demir PhD Student

2 IN TODAY'S TALK
1- Problem Formulation
2- A Real-Time Framework and Its Components, with an Existing Application
3- Proposed Framework
4- Conclusion

3 1- INTRODUCTION
The number of IoT devices has increased:
- currently ~7 billion, by 2020 ~50 billion (exponentially growing)
- low manufacturing costs
- availability of internet connections
IoT devices consist of:
- CPU
- memory storage
- a wireless connection
IoT devices are equipped with:
- sensors (produce data)
- actuators (capable of receiving commands)

4 1- INTRODUCTION
An example of IoT in modern life: robots
- limited on-board computation power
- generate large amounts of data
Challenges:
- latency
- computation needs (on-board computing limits the robot's mobility due to weight and power demands)
*Google Images

5 1- INTRODUCTION
Solution:
- scalable data processing platforms -> CLOUD
Cloud computing is a model for enabling ubiquitous, on-demand access to a shared pool of configurable computing resources (e.g., computer networks, servers, storage, applications and services), which can be rapidly provisioned and released with minimal management effort. [9]
- becoming the standard for computation
Advantages of using central data processing:
- the ability to easily draw from vast stores of information,
- efficient allocation of computing resources,
- a proclivity for parallelization.

6 1.1- REQUIREMENTS FOR IOT DEVICES
Data transfer should be efficient and scalable.
- The traditional GET/POST approach is not suitable because it increases latency and network traffic.
Parallel processing
Real-time analysis
Batch analysis

7 2- A REAL-TIME ARCHITECTURE
- Gateway layer: drivers are deployed in the gateway layer.
- Publish-subscribe messaging layer
- Cloud-based big data processing layer: Apache Storm. Processes the data and sends it back to the device.
IoT Cloud Architecture [1]

8 2.1- GATEWAY LAYER
Each gateway has a unique ID.
The gateway master is responsible for:
- Controlling the gateways
- Deploying/undeploying and starting/stopping the drivers
Gateways are responsible for:
- Managing drivers
- Managing connections to the brokers
- Load balancing the device data across the brokers
- Updating the gateway master
- Updating the state information of gateways in ZooKeeper
Gateway layer [2]

9 2.1- GATEWAY LAYER
Driver:
- Data bridge between a device and the cloud application
- Responsible for data conversion
- Has a name and a set of communication channels
- Can be deployed multiple times
Each channel has a unique name.
MQ Layer [2]

10 2.2- MESSAGING LAYER
RabbitMQ
- Topic-based publish-subscribe broker
- Has a rich API; topics can be created easily
- Supports the Advanced Message Queuing Protocol (AMQP) and Message Queue Telemetry Transport (MQTT)
- Low latency
- Creates lightweight topics
RabbitMQ [3]
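
The topic-based publish-subscribe pattern named on this slide can be illustrated with a small in-memory sketch. It is a toy model only, not RabbitMQ's API: the class `ToyBroker`, the helper `demo_pubsub`, and the topic names `turtlebot.depth` and `turtlebot.cmd` are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class ToyBroker:
    """In-memory stand-in for a topic-based publish-subscribe broker (not RabbitMQ)."""

    def __init__(self) -> None:
        # Maps a topic name to the callbacks subscribed to it.
        self._subscribers: DefaultDict[str, List[Callable[[bytes], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[bytes], None]) -> None:
        """Register a callback that receives every payload published to `topic`."""
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, payload: bytes) -> int:
        """Deliver `payload` to each subscriber of `topic`; return the delivery count."""
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(payload)
        return len(callbacks)


def demo_pubsub() -> List[bytes]:
    """A gateway driver publishes frames; a cloud consumer subscribes to one topic."""
    broker = ToyBroker()
    received: List[bytes] = []
    broker.subscribe("turtlebot.depth", received.append)  # hypothetical topic name
    broker.publish("turtlebot.depth", b"frame-1")
    broker.publish("turtlebot.cmd", b"no subscriber")     # nobody listens on this topic
    return received
```

The publisher never addresses a consumer directly; it only names a topic. That decoupling is what distinguishes this layer from the request/response GET/POST approach.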

11 2.2- MESSAGING LAYER
Kafka
- Topic-based publish-subscribe broker
- Messages are appended to a commit log
- Topics are divided into partitions
- Consumers can read the same topic in parallel
- Has its own messaging protocol
- Does not support AMQP or MQTT
Kafka [4]
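
The commit-log and partition ideas listed on this slide can be sketched with a toy in-memory model. This is not Kafka's client API: `ToyTopic` is a hypothetical class, and choosing a partition with a CRC32 hash of the key is an assumption made for illustration, not Kafka's own partitioner.

```python
import zlib
from typing import List, Tuple


class ToyTopic:
    """Toy model of a topic split into append-only partitions (not Kafka itself)."""

    def __init__(self, num_partitions: int) -> None:
        # Each partition is an append-only list; a message's index is its offset.
        self.partitions: List[List[bytes]] = [[] for _ in range(num_partitions)]

    def append(self, key: bytes, value: bytes) -> Tuple[int, int]:
        """Append `value` to the partition selected by `key`; return (partition, offset)."""
        # Assumption: CRC32 stands in for whatever hash a real broker client uses.
        partition = zlib.crc32(key) % len(self.partitions)
        self.partitions[partition].append(value)
        return partition, len(self.partitions[partition]) - 1

    def read(self, partition: int, offset: int) -> List[bytes]:
        """Return every message in `partition` from `offset` onward, leaving the log intact."""
        return self.partitions[partition][offset:]
```

Because reading does not remove messages, several consumers can each take a different partition of the same topic and track their own offsets, which is what allows parallel reads.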

12 2.3- ZOOKEEPER
- Needed to detect online and offline devices
- Storm requires coordination among its processing units because of its distributed nature
Discovery [2]

13 2.4- PROCESSING LAYER
Apache Storm
- Fault tolerant
- Horizontally scalable
- Handles large amounts of streaming data
- Open source
- Message guarantees
- Simple programming model
- Supports multiple programming languages

14 2.4- PROCESSING LAYER
Apache Storm Concepts
- Stream: the Storm data model -> an unbounded sequence of tuples
- Spout
- Bolt
- Topology: directed acyclic graph
  Vertices: computation
  Edges: streams of data tuples
Apache Storm [5]
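
The relationship between stream, spout, bolt and topology can be illustrated with plain Python generators. This is a conceptual sketch, not Storm's API: the function names, the scale factor and the threshold are hypothetical, and the topology shown is a linear chain, the simplest directed acyclic graph.

```python
from typing import Iterable, Iterator, List, Tuple

Reading = Tuple[float]  # a one-field tuple, standing in for a Storm tuple


def sensor_spout(readings: Iterable[float]) -> Iterator[Reading]:
    """Spout: the source of the stream; emits one tuple per sensor reading."""
    for reading in readings:
        yield (reading,)


def scale_bolt(stream: Iterable[Reading], factor: float) -> Iterator[Reading]:
    """Bolt: consumes tuples and emits a transformed tuple for each one."""
    for (value,) in stream:
        yield (value * factor,)


def threshold_bolt(stream: Iterable[Reading], limit: float) -> Iterator[Reading]:
    """Bolt: passes on only the tuples whose value exceeds `limit`."""
    for (value,) in stream:
        if value > limit:
            yield (value,)


def run_topology(readings: Iterable[float]) -> List[Reading]:
    """Topology: vertices are the spout and bolts, edges are the streams between them."""
    stream = sensor_spout(readings)
    stream = scale_bolt(stream, factor=2)
    stream = threshold_bolt(stream, limit=5)
    return list(stream)
```

A real stream is unbounded, so a topology normally runs until it is stopped; the finite input list here exists only so that the sketch terminates.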

15 2.4- PROCESSING LAYER
Apache Storm - Grouping
Twitter [6]
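
A stream grouping decides which task of a downstream bolt receives each tuple. Two common kinds in Storm are shuffle grouping (random distribution) and fields grouping (tuples with equal field values go to the same task). The sketch below illustrates the idea only and does not use Storm's API; the function names and the CRC32 hash are assumptions made for illustration.

```python
import random
import zlib


def fields_grouping(key: str, num_tasks: int) -> int:
    """Pick a task index from the key, so equal keys always reach the same task."""
    # Assumption: CRC32 is a stand-in for the hash a real engine would apply.
    return zlib.crc32(key.encode("utf-8")) % num_tasks


def shuffle_grouping(num_tasks: int, rng: random.Random) -> int:
    """Pick a task index at random, spreading tuples across all tasks."""
    return rng.randrange(num_tasks)
```

Fields grouping matters when a bolt keeps per-key state, for example one running average per device ID; shuffle grouping suits stateless work where only an even load matters.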

16 2.4- PROCESSING LAYER Apache Storm Storm cluster[5]

17 2.4- PROCESSING LAYER Apache Storm Topology

18 2.5- WRAP UP IoT Cloud [2]

19 3- EXISTING APPLICATIONS
TurtleBot [7]
TurtleBot follows a large target in front of it by trying to maintain a constant distance to the target. Compressed depth images from the Kinect camera are sent to the cloud, and the processing topology calculates command messages, in the form of velocity vectors, that keep TurtleBot at a set distance from the large object in front of it.
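
One simple way to turn a measured distance into a velocity command is a clamped proportional controller. The slide does not say which control law the application uses, so the sketch below is an assumption throughout: the function name, target distance, gain and speed limit are hypothetical, and only the forward (linear) component of the velocity vector is computed.

```python
def follow_command(
    measured_distance_m: float,
    target_distance_m: float = 1.0,  # hypothetical set distance
    gain: float = 0.5,               # hypothetical proportional gain
    max_speed_mps: float = 0.3,      # hypothetical speed limit
) -> float:
    """Return a forward velocity that reduces the gap to the target distance."""
    # Positive error: the target is too far away, so drive forward.
    # Negative error: the target is too close, so back up.
    error = measured_distance_m - target_distance_m
    speed = gain * error
    # Clamp the command so it never exceeds the speed limit in either direction.
    return max(-max_speed_mps, min(max_speed_mps, speed))
```

In the architecture described in this talk, a function of this kind would sit inside a bolt: the depth image arrives through the broker, the distance is extracted, and the resulting command is published back to the robot.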

20 3- EXISTING APPLICATIONS
- Storm Nimbus and ZooKeeper -> 1 node
- Gateway -> 2 nodes
- Storm supervisors -> 3 nodes
- Brokers -> 2 nodes
An instance of medium flavor has 2 VCPUs, 4 GB of memory, and 40 GB of HDD.
4 spouts and 4 bolts run in parallel.

21 3- EXISTING APPLICATIONS Cloud Drivers [8]

22 3- EXISTING APPLICATIONS Latency with RabbitMQ Latency with Kafka *[2]

23 3- EXISTING APPLICATIONS Latency with RabbitMQ Latency with Kafka *[2]

24 3- EXISTING APPLICATIONS Latency observed in the TurtleBot application. *[2]

25 4- CONCLUSION
- Introduced a scalable, distributed architecture and its components.
- Apache Storm is a leading real-time processing engine.
- RabbitMQ can be chosen when low latency is a requirement.
- The proof of concept was verified with an example.
- Proposed a new framework.

26 5- REFERENCES
[1] Kamburugamuve, Supun, et al. "Cloud-based parallel implementation of SLAM for mobile robots." Proceedings of the International Conference on Internet of Things and Cloud Computing. ACM,
[2] Kamburugamuve, Supun, Leif Christiansen, and Geoffrey Fox. "A framework for real time processing of sensor data in the cloud." Journal of Sensors 2015 (2015).
[3] [4] [5] [6] [7]
[8] He, Hengjing, et al. "Cloud based real-time multi-robot collision avoidance for swarm robotics." International Journal of Grid and Distributed Computing, May 7 (2015).
[9] [10] [11] [12]

27 Q&A

28 THANK YOU

Real-time Calculating Over Self-Health Data Using Storm Jiangyong Cai1, a, Zhengping Jin2, b

Real-time Calculating Over Self-Health Data Using Storm Jiangyong Cai1, a, Zhengping Jin2, b 4th International Conference on Mechatronics, Materials, Chemistry and Computer Engineering (ICMMCCE 2015) Real-time Calculating Over Self-Health Data Using Storm Jiangyong Cai1, a, Zhengping Jin2, b 1

More information

Storm. Distributed and fault-tolerant realtime computation. Nathan Marz Twitter

Storm. Distributed and fault-tolerant realtime computation. Nathan Marz Twitter Storm Distributed and fault-tolerant realtime computation Nathan Marz Twitter Basic info Open sourced September 19th Implementation is 15,000 lines of code Used by over 25 companies >2700 watchers on Github

More information

Scalable Streaming Analytics

Scalable Streaming Analytics Scalable Streaming Analytics KARTHIK RAMASAMY @karthikz TALK OUTLINE BEGIN I! II ( III b Overview Storm Overview Storm Internals IV Z V K Heron Operational Experiences END WHAT IS ANALYTICS? according

More information

Data Analytics with HPC. Data Streaming

Data Analytics with HPC. Data Streaming Data Analytics with HPC Data Streaming Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License. http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_us

More information

Data Acquisition. The reference Big Data stack

Data Acquisition. The reference Big Data stack Università degli Studi di Roma Tor Vergata Dipartimento di Ingegneria Civile e Ingegneria Informatica Data Acquisition Corso di Sistemi e Architetture per Big Data A.A. 2016/17 Valeria Cardellini The reference

More information

Flying Faster with Heron

Flying Faster with Heron Flying Faster with Heron KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron TALK OUTLINE BEGIN I! II ( III b OVERVIEW MOTIVATION HERON IV Z OPERATIONAL EXPERIENCES V K HERON PERFORMANCE END [! OVERVIEW TWITTER IS

More information

Storm. Distributed and fault-tolerant realtime computation. Nathan Marz Twitter

Storm. Distributed and fault-tolerant realtime computation. Nathan Marz Twitter Storm Distributed and fault-tolerant realtime computation Nathan Marz Twitter Storm at Twitter Twitter Web Analytics Before Storm Queues Workers Example (simplified) Example Workers schemify tweets and

More information

Twitter Heron: Stream Processing at Scale

Twitter Heron: Stream Processing at Scale Twitter Heron: Stream Processing at Scale Saiyam Kohli December 8th, 2016 CIS 611 Research Paper Presentation -Sun Sunnie Chung TWITTER IS A REAL TIME ABSTRACT We process billions of events on Twitter

More information

Streaming & Apache Storm

Streaming & Apache Storm Streaming & Apache Storm Recommended Text: Storm Applied Sean T. Allen, Matthew Jankowski, Peter Pathirana Manning 2010 VMware Inc. All rights reserved Big Data! Volume! Velocity Data flowing into the

More information

Data Acquisition. The reference Big Data stack

Data Acquisition. The reference Big Data stack Università degli Studi di Roma Tor Vergata Dipartimento di Ingegneria Civile e Ingegneria Informatica Data Acquisition Corso di Sistemi e Architetture per Big Data A.A. 2017/18 Valeria Cardellini The reference

More information

Apache Storm: Hands-on Session A.A. 2016/17

Apache Storm: Hands-on Session A.A. 2016/17 Università degli Studi di Roma Tor Vergata Dipartimento di Ingegneria Civile e Ingegneria Informatica Apache Storm: Hands-on Session A.A. 2016/17 Matteo Nardelli Laurea Magistrale in Ingegneria Informatica

More information

Before proceeding with this tutorial, you must have a good understanding of Core Java and any of the Linux flavors.

Before proceeding with this tutorial, you must have a good understanding of Core Java and any of the Linux flavors. About the Tutorial Storm was originally created by Nathan Marz and team at BackType. BackType is a social analytics company. Later, Storm was acquired and open-sourced by Twitter. In a short time, Apache

More information

STORM AND LOW-LATENCY PROCESSING.

STORM AND LOW-LATENCY PROCESSING. STORM AND LOW-LATENCY PROCESSING Low latency processing Similar to data stream processing, but with a twist Data is streaming into the system (from a database, or a netk stream, or an HDFS file, or ) We

More information

Durham Research Online

Durham Research Online Durham Research Online Deposited in DRO: 08 September 2017 Version of attached le: Accepted Version Peer-review status of attached le: Peer-reviewed Citation for published item: He, Hengjing and Zhao,

More information

PaaS SAE Top3 SuperAPP

PaaS SAE Top3 SuperAPP PaaS SAE Top3 SuperAPP PaaS SAE Top3 SuperAPP Pla$orm Services Group Sam Biwing Monika Rambone Skylee Kingho1d AWS S3 CDN ATS 1k 30+ 10+ Go FE Services Panel C++ Go C/C++ ACM FE Pla$orm Services Group

More information

Cloud-based Parallel Implementation of SLAM for Mobile Robots

Cloud-based Parallel Implementation of SLAM for Mobile Robots Cloud-based Parallel Implementation of SLAM for Mobile Robots Supun Kamburugamuve 1, Hengjing He 2, Geoffrey Fox 1, David Crandall 1 1 School of Informatics and Computing, Indiana University, Bloomington,

More information

Typhoon: An SDN Enhanced Real-Time Big Data Streaming Framework

Typhoon: An SDN Enhanced Real-Time Big Data Streaming Framework Typhoon: An SDN Enhanced Real-Time Big Data Streaming Framework Junguk Cho, Hyunseok Chang, Sarit Mukherjee, T.V. Lakshman, and Jacobus Van der Merwe 1 Big Data Era Big data analysis is increasingly common

More information

Intra-cluster Replication for Apache Kafka. Jun Rao

Intra-cluster Replication for Apache Kafka. Jun Rao Intra-cluster Replication for Apache Kafka Jun Rao About myself Engineer at LinkedIn since 2010 Worked on Apache Kafka and Cassandra Database researcher at IBM Outline Overview of Kafka Kafka architecture

More information

Fluentd + MongoDB + Spark = Awesome Sauce

Fluentd + MongoDB + Spark = Awesome Sauce Fluentd + MongoDB + Spark = Awesome Sauce Nishant Sahay, Sr. Architect, Wipro Limited Bhavani Ananth, Tech Manager, Wipro Limited Your company logo here Wipro Open Source Practice: Vision & Mission Vision

More information

Over the last few years, we have seen a disruption in the data management

Over the last few years, we have seen a disruption in the data management JAYANT SHEKHAR AND AMANDEEP KHURANA Jayant is Principal Solutions Architect at Cloudera working with various large and small companies in various Verticals on their big data and data science use cases,

More information

Vortex Whitepaper. Simplifying Real-time Information Integration in Industrial Internet of Things (IIoT) Control Systems

Vortex Whitepaper. Simplifying Real-time Information Integration in Industrial Internet of Things (IIoT) Control Systems Vortex Whitepaper Simplifying Real-time Information Integration in Industrial Internet of Things (IIoT) Control Systems www.adlinktech.com 2017 Table of Contents 1. Introduction........ P 3 2. Iot and

More information

FROM LEGACY, TO BATCH, TO NEAR REAL-TIME. Marc Sturlese, Dani Solà

FROM LEGACY, TO BATCH, TO NEAR REAL-TIME. Marc Sturlese, Dani Solà FROM LEGACY, TO BATCH, TO NEAR REAL-TIME Marc Sturlese, Dani Solà WHO ARE WE? Marc Sturlese - @sturlese Backend engineer, focused on R&D Interests: search, scalability Dani Solà - @dani_sola Backend engineer

More information

Configuring and Deploying Hadoop Cluster Deployment Templates

Configuring and Deploying Hadoop Cluster Deployment Templates Configuring and Deploying Hadoop Cluster Deployment Templates This chapter contains the following sections: Hadoop Cluster Profile Templates, on page 1 Creating a Hadoop Cluster Profile Template, on page

More information

Tutorial: Apache Storm

Tutorial: Apache Storm Indian Institute of Science Bangalore, India भ रत य वज ञ न स स थ न ब गल र, भ रत Department of Computational and Data Sciences DS256:Jan17 (3:1) Tutorial: Apache Storm Anshu Shukla 16 Feb, 2017 Yogesh Simmhan

More information

Research on the Architecture and its Implementation for Instrumentation and Measurement Cloud

Research on the Architecture and its Implementation for Instrumentation and Measurement Cloud IEEE TRANSACTIONS ON SERVICES COMPUTING, MANUSCRIPT ID 1 Research on the Architecture and its Implementation for Instrumentation and Measurement Cloud Hengjing He, Wei Zhao, Songling Huang, Geoffrey C.

More information

Esper EQC. Horizontal Scale-Out for Complex Event Processing

Esper EQC. Horizontal Scale-Out for Complex Event Processing Esper EQC Horizontal Scale-Out for Complex Event Processing Esper EQC - Introduction Esper query container (EQC) is the horizontal scale-out architecture for Complex Event Processing with Esper and EsperHA

More information

Big Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara

Big Data Technology Ecosystem. Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara Big Data Technology Ecosystem Mark Burnette Pentaho Director Sales Engineering, Hitachi Vantara Agenda End-to-End Data Delivery Platform Ecosystem of Data Technologies Mapping an End-to-End Solution Case

More information

Deploying Applications on DC/OS

Deploying Applications on DC/OS Mesosphere Datacenter Operating System Deploying Applications on DC/OS Keith McClellan - Technical Lead, Federal Programs keith.mcclellan@mesosphere.com V6 THE FUTURE IS ALREADY HERE IT S JUST NOT EVENLY

More information

Upgrade Your MuleESB with Solace s Messaging Infrastructure

Upgrade Your MuleESB with Solace s Messaging Infrastructure The era of ubiquitous connectivity is upon us. The amount of data most modern enterprises must collect, process and distribute is exploding as a result of real-time process flows, big data, ubiquitous

More information

rkafka rkafka is a package created to expose functionalities provided by Apache Kafka in the R layer. Version 1.1

rkafka rkafka is a package created to expose functionalities provided by Apache Kafka in the R layer. Version 1.1 rkafka rkafka is a package created to expose functionalities provided by Apache Kafka in the R layer. Version 1.1 Wednesday 28 th June, 2017 rkafka Shruti Gupta Wednesday 28 th June, 2017 Contents 1 Introduction

More information

The SMACK Stack: Spark*, Mesos*, Akka, Cassandra*, Kafka* Elizabeth K. Dublin Apache Kafka Meetup, 30 August 2017.

The SMACK Stack: Spark*, Mesos*, Akka, Cassandra*, Kafka* Elizabeth K. Dublin Apache Kafka Meetup, 30 August 2017. Dublin Apache Kafka Meetup, 30 August 2017 The SMACK Stack: Spark*, Mesos*, Akka, Cassandra*, Kafka* Elizabeth K. Joseph @pleia2 * ASF projects 1 Elizabeth K. Joseph, Developer Advocate Developer Advocate

More information

Sizing Guidelines and Performance Tuning for Intelligent Streaming

Sizing Guidelines and Performance Tuning for Intelligent Streaming Sizing Guidelines and Performance Tuning for Intelligent Streaming Copyright Informatica LLC 2017. Informatica and the Informatica logo are trademarks or registered trademarks of Informatica LLC in the

More information

Priority Based Resource Scheduling Techniques for a Multitenant Stream Processing Platform

Priority Based Resource Scheduling Techniques for a Multitenant Stream Processing Platform Priority Based Resource Scheduling Techniques for a Multitenant Stream Processing Platform By Rudraneel Chakraborty A thesis submitted to the Faculty of Graduate and Postdoctoral Affairs in partial fulfillment

More information

CS 398 ACC Streaming. Prof. Robert J. Brunner. Ben Congdon Tyler Kim

CS 398 ACC Streaming. Prof. Robert J. Brunner. Ben Congdon Tyler Kim CS 398 ACC Streaming Prof. Robert J. Brunner Ben Congdon Tyler Kim MP3 How s it going? Final Autograder run: - Tonight ~9pm - Tomorrow ~3pm Due tomorrow at 11:59 pm. Latest Commit to the repo at the time

More information

UNIK Building Mobile and Wireless Networks Maghsoud Morshedi

UNIK Building Mobile and Wireless Networks Maghsoud Morshedi UNIK4700 - Building Mobile and Wireless Networks Maghsoud Morshedi IoT Market https://iot-analytics.com/iot-market-forecasts-overview/ 21/11/2017 2 IoT Management Advantages Remote provisioning Register

More information

Introduction to IoT. Jianwei Liu Clemson University

Introduction to IoT. Jianwei Liu Clemson University Introduction to IoT Jianwei Liu Clemson University What are IoT & M2M The Internet of Things (IoT), also called Internet of Everything, is the network of physical objects or "things" embedded with electronics,

More information

Let the data flow! Data Streaming & Messaging with Apache Kafka Frank Pientka. Materna GmbH

Let the data flow! Data Streaming & Messaging with Apache Kafka Frank Pientka. Materna GmbH Let the data flow! Data Streaming & Messaging with Apache Kafka Frank Pientka Wer ist Frank Pientka? Dipl.-Informatiker (TH Karlsruhe) Verheiratet, 2 Töchter Principal Software Architect in Dortmund Fast

More information

@unterstein #bedcon. Operating microservices with Apache Mesos and DC/OS

@unterstein #bedcon. Operating microservices with Apache Mesos and DC/OS @unterstein @dcos @bedcon #bedcon Operating microservices with Apache Mesos and DC/OS 1 Johannes Unterstein Software Engineer @Mesosphere @unterstein @unterstein.mesosphere 2017 Mesosphere, Inc. All Rights

More information

Flash Storage Complementing a Data Lake for Real-Time Insight

Flash Storage Complementing a Data Lake for Real-Time Insight Flash Storage Complementing a Data Lake for Real-Time Insight Dr. Sanhita Sarkar Global Director, Analytics Software Development August 7, 2018 Agenda 1 2 3 4 5 Delivering insight along the entire spectrum

More information

Evaluation of Apache Kafka in Real-Time Big Data Pipeline Architecture

Evaluation of Apache Kafka in Real-Time Big Data Pipeline Architecture Evaluation of Apache Kafka in Real-Time Big Data Pipeline Architecture Thandar Aung, Hla Yin Min, Aung Htein Maw University of Information Technology Yangon, Myanmar thandaraung@uit.edu.mm, hlayinmin@uit.edu.mm,

More information

A Distributed System Case Study: Apache Kafka. High throughput messaging for diverse consumers

A Distributed System Case Study: Apache Kafka. High throughput messaging for diverse consumers A Distributed System Case Study: Apache Kafka High throughput messaging for diverse consumers As always, this is not a tutorial Some of the concepts may no longer be part of the current system or implemented

More information

AWS IoT Overview. July 2016 Thomas Jones, Partner Solutions Architect

AWS IoT Overview. July 2016 Thomas Jones, Partner Solutions Architect AWS IoT Overview July 2016 Thomas Jones, Partner Solutions Architect AWS customers are connecting physical things to the cloud in every industry imaginable. Healthcare and Life Sciences Municipal Infrastructure

More information

Apache Storm. Hortonworks Inc Page 1

Apache Storm. Hortonworks Inc Page 1 Apache Storm Page 1 What is Storm? Real time stream processing framework Scalable Up to 1 million tuples per second per node Fault Tolerant Tasks reassigned on failure Guaranteed Processing At least once

More information

Introduction to Kafka (and why you care)

Introduction to Kafka (and why you care) Introduction to Kafka (and why you care) Richard Nikula VP, Product Development and Support Nastel Technologies, Inc. 2 Introduction Richard Nikula VP of Product Development and Support Involved in MQ

More information

OpenStack internal messaging at the edge: In-depth evaluation. Ken Giusti Javier Rojas Balderrama Matthieu Simonin

OpenStack internal messaging at the edge: In-depth evaluation. Ken Giusti Javier Rojas Balderrama Matthieu Simonin OpenStack internal messaging at the edge: In-depth evaluation Ken Giusti Javier Rojas Balderrama Matthieu Simonin Who s here? Ken Giusti Javier Rojas Balderrama Matthieu Simonin Fog Edge and Massively

More information

Transformation-free Data Pipelines by combining the Power of Apache Kafka and the Flexibility of the ESB's

Transformation-free Data Pipelines by combining the Power of Apache Kafka and the Flexibility of the ESB's Building Agile and Resilient Schema Transformations using Apache Kafka and ESB's Transformation-free Data Pipelines by combining the Power of Apache Kafka and the Flexibility of the ESB's Ricardo Ferreira

More information

Apache Kafka Your Event Stream Processing Solution

Apache Kafka Your Event Stream Processing Solution Apache Kafka Your Event Stream Processing Solution Introduction Data is one among the newer ingredients in the Internet-based systems and includes user-activity events related to logins, page visits, clicks,

More information

Hands-On with IoT Standards & Protocols

Hands-On with IoT Standards & Protocols DEVNET-3623 Hands-On with IoT Standards & Protocols Casey Bleeker, Developer Evangelist @geekbleek Cisco Spark How Questions? Use Cisco Spark to communicate with the speaker after the session 1. Find this

More information

How to Route Internet Traffic between A Mobile Application and IoT Device?

How to Route Internet Traffic between A Mobile Application and IoT Device? Whitepaper How to Route Internet Traffic between A Mobile Application and IoT Device? Website: www.mobodexter.com www.paasmer.co 1 Table of Contents 1. Introduction 3 2. Approach: 1 Uses AWS IoT Setup

More information

Performance Benchmarking an Enterprise Message Bus. Anurag Sharma Pramod Sharma Sumant Vashisth

Performance Benchmarking an Enterprise Message Bus. Anurag Sharma Pramod Sharma Sumant Vashisth Performance Benchmarking an Enterprise Message Bus Anurag Sharma Pramod Sharma Sumant Vashisth About the Authors Sumant Vashisth is Director of Engineering, Security Management Business Unit at McAfee.

More information

/ Cloud Computing. Recitation 15 December 6 th 2016

/ Cloud Computing. Recitation 15 December 6 th 2016 15-319 / 15-619 Cloud Computing Recitation 15 December 6 th 2016 Overview Last week s reflection Team project phase 3 Quiz 12 This week s schedule Phase3 report Deadline TODAY 12/6 Project 4.3 Deadline

More information

Cloudline Autonomous Driving Solutions. Accelerating insights through a new generation of Data and Analytics October, 2018

Cloudline Autonomous Driving Solutions. Accelerating insights through a new generation of Data and Analytics October, 2018 Cloudline Autonomous Driving Solutions Accelerating insights through a new generation of Data and Analytics October, 2018 HPE big data analytics solutions power the data-driven enterprise Secure, workload-optimized

More information

Distributed systems for stream processing

Distributed systems for stream processing Distributed systems for stream processing Apache Kafka and Spark Structured Streaming Alena Hall Alena Hall Large-scale data processing Distributed Systems Functional Programming Data Science & Machine

More information

Build Your Own Data Collection IoT Devices

Build Your Own Data Collection IoT Devices Build Your Own Data Collection IoT Devices Inspirations for (even) more data Analytics Seminar at Georgetown University Ulrich Norbisrath 2017-05-03 whoami http://ulno.net, Ulrich Norbisrath email: replace

More information

Diving into Open Source Messaging: What Is Kafka?

Diving into Open Source Messaging: What Is Kafka? Diving into Open Source Messaging: What Is Kafka? The world of messaging middleware has changed dramatically over the last 30 years. But in truth the world of communication has changed dramatically as

More information

Vortex Whitepaper. Intelligent Data Sharing for the Business-Critical Internet of Things. Version 1.1 June 2014 Angelo Corsaro Ph.D.

Vortex Whitepaper. Intelligent Data Sharing for the Business-Critical Internet of Things. Version 1.1 June 2014 Angelo Corsaro Ph.D. Vortex Whitepaper Intelligent Data Sharing for the Business-Critical Internet of Things Version 1.1 June 2014 Angelo Corsaro Ph.D., CTO, PrismTech Vortex Whitepaper Version 1.1 June 2014 Table of Contents

More information

10 Things to Consider When Using Apache Ka7a: U"liza"on Points of Apache Ka4a Obtained From IoT Use Case

10 Things to Consider When Using Apache Ka7a: Ulizaon Points of Apache Ka4a Obtained From IoT Use Case 10 Things to Consider When Using Apache Ka7a: U"liza"on Points of Apache Ka4a Obtained From IoT Use Case May 16, 2017 NTT DATA CorporaAon Naoto Umemori, Yuji Hagiwara 2017 NTT DATA Corporation Contents

More information

Architectural challenges for building a low latency, scalable multi-tenant data warehouse

Architectural challenges for building a low latency, scalable multi-tenant data warehouse Architectural challenges for building a low latency, scalable multi-tenant data warehouse Mataprasad Agrawal Solutions Architect, Services CTO 2017 Persistent Systems Ltd. All rights reserved. Our analytics

More information

Global Data Plane. The Cloud is not enough: Saving IoT from the Cloud & Toward a Global Data Infrastructure PRESENTED BY MEGHNA BAIJAL

Global Data Plane. The Cloud is not enough: Saving IoT from the Cloud & Toward a Global Data Infrastructure PRESENTED BY MEGHNA BAIJAL Global Data Plane The Cloud is not enough: Saving IoT from the Cloud & Toward a Global Data Infrastructure PRESENTED BY MEGHNA BAIJAL Why is the Cloud Not Enough? Currently, peripherals communicate directly

More information

ISSN: [Gireesh Babu C N* et al., 6(7): July, 2017 Impact Factor: 4.116

ISSN: [Gireesh Babu C N* et al., 6(7): July, 2017 Impact Factor: 4.116 IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY REAL-TIME DATA PROCESSING WITH STORM: USING TWITTER STREAMING Gireesh Babu C N 1, Manjunath T N 2, Suhas V 3 1,2,3 Department

More information

HDInsight > Hadoop. October 12, 2017

HDInsight > Hadoop. October 12, 2017 HDInsight > Hadoop October 12, 2017 2 Introduction Mark Hudson >20 years mixing technology with data >10 years with CapTech Microsoft Certified IT Professional Business Intelligence Member of the Richmond

More information

Stanislav Harvan Internet of Things

Stanislav Harvan Internet of Things Stanislav Harvan v-sharva@microsoft.com Internet of Things IoT v číslach Gartner: V roku 2020 bude na Internet pripojených viac ako 25mld zariadení: 1,5mld smart TV 2,5mld pc 5mld smart phone 16mld dedicated

More information

Internet of Things: An Introduction

Internet of Things: An Introduction Internet of Things: An Introduction IoT Overview and Architecture IoT Communication Protocols Acknowledgements 1.1 What is IoT? Internet of Things (IoT) comprises things that have unique identities and

More information

August 23, 2017 Revision 0.3. Building IoT Applications with GridDB

August 23, 2017 Revision 0.3. Building IoT Applications with GridDB August 23, 2017 Revision 0.3 Building IoT Applications with GridDB Table of Contents Executive Summary... 2 Introduction... 2 Components of an IoT Application... 2 IoT Models... 3 Edge Computing... 4 Gateway

More information

IoT Sensor Analytics with Apache Kafka, KSQL and TensorFlow

IoT Sensor Analytics with Apache Kafka, KSQL and TensorFlow 1 IoT Sensor Analytics with Apache Kafka, KSQL and TensorFlow Kafka-Native End-to-End IoT Data Integration and Processing Kai Waehner - Technology Evangelist kontakt@kai-waehner.de - LinkedIn Twitter :

More information

REAL-TIME PEDESTRIAN DETECTION USING APACHE STORM IN A DISTRIBUTED ENVIRONMENT

REAL-TIME PEDESTRIAN DETECTION USING APACHE STORM IN A DISTRIBUTED ENVIRONMENT REAL-TIME PEDESTRIAN DETECTION USING APACHE STORM IN A DISTRIBUTED ENVIRONMENT ABSTRACT Du-Hyun Hwang, Yoon-Ki Kim and Chang-Sung Jeong Department of Electrical Engineering, Korea University, Seoul, Republic

More information

Data Ingestion at Scale. Jeffrey Sica

Data Ingestion at Scale. Jeffrey Sica Data Ingestion at Scale Jeffrey Sica ARC-TS @jeefy Overview What is Data Ingestion? Concepts Use Cases GPS collection with mobile devices Collecting WiFi data from WAPs Sensor data from manufacturing machines

More information

Building a Data-Friendly Platform for a Data- Driven Future

Building a Data-Friendly Platform for a Data- Driven Future Building a Data-Friendly Platform for a Data- Driven Future Benjamin Hindman - @benh 2016 Mesosphere, Inc. All Rights Reserved. INTRO $ whoami BENJAMIN HINDMAN Co-founder and Chief Architect of Mesosphere,

More information

Cisco Tetration Analytics

Cisco Tetration Analytics Cisco Tetration Analytics Enhanced security and operations with real time analytics Christopher Say (CCIE RS SP) Consulting System Engineer csaychoh@cisco.com Challenges in operating a hybrid data center

More information

IoT Intro. Fernando Solano Warsaw University of Technology

IoT Intro. Fernando Solano Warsaw University of Technology IoT Intro Fernando Solano Warsaw University of Technology fs@tele.pw.edu.pl Embedded Systems Wireless Sensor and Actuator Networks Enabling technologies Communication Protocols Cloud Computing Big Data

More information

Performance and Scalability with Griddable.io

Performance and Scalability with Griddable.io Performance and Scalability with Griddable.io Executive summary Griddable.io is an industry-leading timeline-consistent synchronized data integration grid across a range of source and target data systems.

More information

CLUSTERING HIVEMQ. Building highly available, horizontally scalable MQTT Broker Clusters

CLUSTERING HIVEMQ. Building highly available, horizontally scalable MQTT Broker Clusters CLUSTERING HIVEMQ Building highly available, horizontally scalable MQTT Broker Clusters 12/2016 About this document MQTT is based on a publish/subscribe architecture that decouples MQTT clients and uses

More information

(2016) Software Defined Things in Manufacturing

(2016) Software Defined Things in Manufacturing Journal of Software Engineering and Applications, 2016, 9, 425-438 http://www.scirp.org/journal/jsea ISSN Online: 1945-3124 ISSN Print: 1945-3116 Software Defined Things in Manufacturing Networks Arshdeep

More information
