REAL-TIME ANALYTICS WITH APACHE STORM

Mevlut Demir, PhD Student

IN TODAY'S TALK
1- Problem Formulation
2- A Real-Time Framework and Its Components, with an Existing Application
3- Proposed Framework
4- Conclusion

1- INTRODUCTION
The number of IoT devices has increased:
- currently ~7 billion, by 2020 ~50 billion (exponentially growing)
- low manufacturing costs
- availability of internet connections
IoT devices consist of:
- CPU
- memory storage
- a wireless connection
IoT devices are equipped with:
- sensors (produce data)
- actuators (capable of receiving commands)

1- INTRODUCTION
An example of IoT in modern life: robots
- limited on-board computation power
- generate large amounts of data
Challenges:
- latency
- computation needs (on-board computation limits the robot's mobility due to weight and power demands)
Image credit: Google Images

1- INTRODUCTION
Solution: scalable data processing platforms -> CLOUD
- Cloud computing is a model for enabling ubiquitous, on-demand access to a shared pool of configurable computing resources (e.g., computer networks, servers, storage, applications and services), which can be rapidly provisioned and released with minimal management effort. [9]
- It is becoming the standard for computation.
Advantages of using central data processing:
- the ability to easily draw from vast stores of information
- efficient allocation of computing resources
- a proclivity for parallelization

1.1- REQUIREMENTS FOR IOT DEVICES
- Efficient and scalable data transfer (the traditional GET/POST approach is not suitable because it increases latency and network traffic)
- Parallel processing
- Real-time analysis
- Batch analysis

2- A REAL-TIME ARCHITECTURE
- Gateway layer: drivers are deployed in the gateway layer.
- Publish-subscribe messaging layer
- Cloud-based big data processing layer (Apache Storm): processes the data and sends the results back to the device.
Figure: IoT Cloud Architecture [1]

2.1- GATEWAY LAYER
Each gateway has a unique ID.
The gateway master is responsible for:
- controlling the gateways
- deploying/undeploying and starting/stopping the drivers
Gateways are responsible for:
- managing drivers
- managing connections to the brokers
- load balancing the device data across the brokers
- updating the gateway master
- updating the state information of the gateways in ZooKeeper
Figure: Gateway layer [2]

2.1- GATEWAY LAYER
Driver:
- Data bridge between a device and the cloud application
- Responsible for data conversion
- Has a name and a set of communication channels
- Can be deployed multiple times
Each channel has a unique name.
Figure: MQ Layer [2]

2.2- MESSAGING LAYER
RabbitMQ
- Topic-based publish-subscribe broker
- Has a rich API; topics can be created easily
- Supports the Advanced Message Queuing Protocol (AMQP) and Message Queue Telemetry Transport (MQTT)
- Low latency
- Creates lightweight topics
Figure: RabbitMQ [3]
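To make "topic-based publish-subscribe" concrete, here is a minimal in-memory sketch in Python. It is not RabbitMQ client code. It only mimics the routing-key convention of an AMQP topic exchange, where '*' stands for exactly one dot-separated word and '#' for zero or more. The class InMemoryBroker, the Handler type, and the routing keys such as robot.7.depth are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[str, bytes], None]


def topic_matches(pattern: str, routing_key: str) -> bool:
    """AMQP-style match: '*' stands for exactly one word, '#' for zero or more."""

    def match(p: List[str], k: List[str]) -> bool:
        if not p:
            return not k
        if p[0] == "#":
            # '#' either absorbs nothing, or absorbs one word and stays active.
            return match(p[1:], k) or (bool(k) and match(p, k[1:]))
        if not k:
            return False
        return p[0] in ("*", k[0]) and match(p[1:], k[1:])

    return match(pattern.split("."), routing_key.split("."))


class InMemoryBroker:
    """Toy stand-in for a broker: routes messages inside one process only."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, pattern: str, handler: Handler) -> None:
        self._subscribers[pattern].append(handler)

    def publish(self, routing_key: str, payload: bytes) -> int:
        """Deliver to every matching subscriber; return the delivery count."""
        delivered = 0
        for pattern, handlers in self._subscribers.items():
            if topic_matches(pattern, routing_key):
                for handler in handlers:
                    handler(routing_key, payload)
                    delivered += 1
        return delivered


# Hypothetical usage: a processing job subscribes to depth frames of all robots.
received = []
broker = InMemoryBroker()
broker.subscribe("robot.*.depth", lambda key, body: received.append((key, body)))
broker.publish("robot.7.depth", b"frame-1")   # delivered
broker.publish("robot.7.odometry", b"pose")   # no subscriber matches
```

The publisher never addresses a consumer directly, which is what lets gateways and processing jobs be added or removed independently.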

2.2- MESSAGING LAYER
Kafka
- Topic-based publish-subscribe broker
- Messages are appended to a commit log
- Topics are divided into partitions
- Consumers can read the same topic in parallel
- Has its own messaging protocol
- Does not support AMQP or MQTT
Figure: Kafka [4]
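The commit-log and partition ideas can be sketched in a few lines of Python. This is an in-memory illustration under simplifying assumptions, not the Kafka client API: PartitionedLog and Consumer are hypothetical names, and the CRC32-based partition choice is an assumption made only to keep the example deterministic.

```python
import zlib
from typing import List, Tuple


class PartitionedLog:
    """Toy topic: a fixed number of append-only partitions held in memory."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions: List[List[bytes]] = [[] for _ in range(num_partitions)]

    def append(self, key: str, value: bytes) -> Tuple[int, int]:
        """Append a record and return (partition, offset)."""
        # Same key -> same partition, so records of one device stay in order.
        partition = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[partition].append(value)
        return partition, len(self.partitions[partition]) - 1

    def read(self, partition: int, offset: int, max_records: int) -> List[bytes]:
        """Reading never removes records; it only slices the log."""
        return self.partitions[partition][offset:offset + max_records]


class Consumer:
    """Each consumer tracks its own offset, so readers do not interfere."""

    def __init__(self, log: PartitionedLog, partition: int) -> None:
        self.log = log
        self.partition = partition
        self.offset = 0

    def poll(self, max_records: int = 10) -> List[bytes]:
        records = self.log.read(self.partition, self.offset, max_records)
        self.offset += len(records)
        return records


# Hypothetical usage: two independent consumers read the same partition.
log = PartitionedLog(num_partitions=4)
first_partition, first_offset = log.append("robot-1", b"a")
second_partition, second_offset = log.append("robot-1", b"b")
consumer_a = Consumer(log, first_partition)
consumer_b = Consumer(log, first_partition)
batch_a = consumer_a.poll()
batch_b = consumer_b.poll()
```

Because a read only advances the reader's own offset, several consumers can process the same topic in parallel without taking messages away from each other.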

2.3- ZOOKEEPER
- Needed to detect online and offline devices
- Storm requires coordination among its processing units because of its distributed nature
Figure: Discovery [2]
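The slide does not say how online and offline devices are detected. One common ZooKeeper pattern is the ephemeral node, which disappears when the session that created it ends. The Python sketch below imitates that behaviour in memory with heartbeats and a session timeout. It is an assumption for illustration, not the framework's actual mechanism and not ZooKeeper client code. LivenessRegistry and the device IDs are hypothetical.

```python
from typing import Dict, List


class LivenessRegistry:
    """Toy registry: an entry vanishes once its heartbeats stop arriving."""

    def __init__(self, session_timeout: float) -> None:
        self.session_timeout = session_timeout
        self._last_heartbeat: Dict[str, float] = {}

    def heartbeat(self, device_id: str, now: float) -> None:
        """Register the device or refresh its session."""
        self._last_heartbeat[device_id] = now

    def online(self, now: float) -> List[str]:
        """Drop expired sessions, then list the devices still online."""
        expired = [
            device_id
            for device_id, seen in self._last_heartbeat.items()
            if now - seen > self.session_timeout
        ]
        for device_id in expired:
            del self._last_heartbeat[device_id]
        return sorted(self._last_heartbeat)


# Hypothetical usage; time is passed in explicitly to keep the run deterministic.
registry = LivenessRegistry(session_timeout=5.0)
registry.heartbeat("turtlebot-1", now=0.0)
registry.heartbeat("turtlebot-2", now=0.0)
registry.heartbeat("turtlebot-1", now=4.0)   # only turtlebot-1 keeps reporting
online_devices = registry.online(now=6.0)
```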

2.4- PROCESSING LAYER
Apache Storm
- Fault tolerant
- Horizontally scalable
- Handles large amounts of streaming data
- Open source
- Guaranteed message processing
- Simple programming model
- Supports multiple programming languages

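2.4- PROCESSING LAYER
Apache Storm concepts
- Stream: the Storm data model, an unbounded sequence of tuples
- Spout: a source of streams
- Bolt: consumes streams, processes the tuples, and may emit new streams
- Topology: a directed acyclic graph
  Vertices: computation
  Edges: streams of data tuples
Figure: Apache Storm [5]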
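The spout, bolt, and topology concepts can be illustrated with plain Python generators. This is a conceptual sketch only. It does not use the Storm API, it runs in a single process, and its stream is finite where a real one is unbounded. The function names and the sample sentences are hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def sentence_spout() -> Iterator[Tuple[str]]:
    """Spout: the source of the stream; emits one-field tuples."""
    for sentence in ("storm processes streams", "streams of tuples"):
        yield (sentence,)


def split_bolt(stream: Iterable[Tuple[str]]) -> Iterator[Tuple[str]]:
    """Bolt: consumes sentence tuples and emits one tuple per word."""
    for (sentence,) in stream:
        for word in sentence.split():
            yield (word,)


def count_bolt(stream: Iterable[Tuple[str]]) -> Iterator[Tuple[str, int]]:
    """Bolt: keeps running counts and emits (word, count) tuples."""
    counts: Counter = Counter()
    for (word,) in stream:
        counts[word] += 1
        yield (word, counts[word])


# Topology: vertices are the three functions, edges are the streams between
# them (spout -> split -> count).
topology = count_bolt(split_bolt(sentence_spout()))
results = list(topology)
```

In Storm each vertex would run as many parallel tasks spread over the cluster. Here the chain runs sequentially to keep the data flow visible.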

2.4- PROCESSING LAYER
Apache Storm: stream grouping
Figure: Twitter [6]
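A stream grouping decides which task of a bolt receives each tuple. Shuffle grouping and fields grouping are two of the groupings Storm provides. The Python sketch below shows the idea of each and is not Storm code. Round-robin stands in for Storm's random but even distribution, and CRC32 stands in for the hash function, both chosen here as assumptions to keep the example deterministic.

```python
import itertools
import zlib
from typing import Callable, Sequence


def shuffle_grouping(num_tasks: int) -> Callable[[tuple], int]:
    """Spread tuples evenly over tasks, ignoring their content."""
    counter = itertools.count()

    def choose(tup: tuple) -> int:
        return next(counter) % num_tasks

    return choose


def fields_grouping(num_tasks: int, field_indexes: Sequence[int]) -> Callable[[tuple], int]:
    """Send tuples with equal values in the chosen fields to the same task."""

    def choose(tup: tuple) -> int:
        key = "\x1f".join(str(tup[i]) for i in field_indexes)
        return zlib.crc32(key.encode("utf-8")) % num_tasks

    return choose


# Hypothetical usage: a counting bolt with 4 tasks needs every occurrence of a
# word on the same task, so it groups on field 0 (the word).
by_word = fields_grouping(num_tasks=4, field_indexes=[0])
same_task = by_word(("storm", 1)) == by_word(("storm", 99))

shuffle = shuffle_grouping(num_tasks=3)
shuffle_targets = [shuffle(("any",)) for _ in range(6)]
```

Fields grouping matters for stateful bolts such as counters. Shuffle grouping is enough when a bolt treats every tuple independently.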

2.4- PROCESSING LAYER
Apache Storm
Figure: Storm cluster [5]

2.4- PROCESSING LAYER
Apache Storm topology

2.5- WRAP UP
Figure: IoT Cloud [2]

3- EXISTING APPLICATIONS
TurtleBot [7]
- TurtleBot follows a large target in front of it by trying to maintain a constant distance to the target.
- Compressed depth images from the Kinect camera are sent to the cloud, and the processing topology calculates command messages, in the form of velocity vectors, that keep TurtleBot at a set distance from the large object in front of it.
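The slide does not give the control law used to turn depth readings into velocity commands. The Python sketch below assumes a simple proportional controller, which is one plausible way to hold a set distance and is not necessarily what the cited application does. The function name, the gain, the speed limit, and the convention that a reading of 0 means "no data" are all assumptions.

```python
from typing import Sequence, Tuple


def follow_command(
    depths_m: Sequence[float],
    target_distance_m: float = 1.0,
    gain: float = 0.5,
    max_speed: float = 0.3,
) -> Tuple[float, float]:
    """Return a (linear, angular) velocity command from depth readings.

    Only the linear part is computed; the angular part is left at 0.0 in this
    sketch, so the robot moves straight toward or away from the target.
    """
    valid = [d for d in depths_m if d > 0.0]  # assumed: 0 marks a missing reading
    if not valid:
        return (0.0, 0.0)  # nothing in view: stop
    distance = sum(valid) / len(valid)
    error = distance - target_distance_m      # positive: target is too far away
    linear = max(-max_speed, min(max_speed, gain * error))
    return (linear, 0.0)


# Hypothetical readings in meters.
at_target = follow_command([1.0, 1.0])   # already at the set distance
too_far = follow_command([2.0, 2.0])     # speed is clamped to max_speed
too_close = follow_command([0.8, 0.8])   # small reverse command
```

In the architecture described in the talk, a computation of this kind would run inside a bolt, with the command sent back to the robot through the broker and the gateway.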

3- EXISTING APPLICATIONS
- Storm Nimbus and ZooKeeper -> 1 node
- Gateway -> 2 nodes
- Storm supervisors -> 3 nodes
- Brokers -> 2 nodes
An instance of the medium flavor has 2 VCPUs, 4 GB of memory, and a 40 GB HDD.
4 spouts and 4 bolts run in parallel.

3- EXISTING APPLICATIONS
Figure: Cloud Drivers [8]

3- EXISTING APPLICATIONS
Figures: Latency with RabbitMQ; Latency with Kafka [2]

3- EXISTING APPLICATIONS
Figure: Latency observed in the TurtleBot application [2]

4- CONCLUSION
- Introduced a scalable, distributed architecture and its components.
- Apache Storm is a leading real-time processing engine.
- RabbitMQ can be chosen when low latency is a requirement.
- The proof of concept was verified with an example application.
- Proposed a new framework.

5- REFERENCES
[1] Kamburugamuve, Supun, et al. "Cloud-based parallel implementation of slam for mobile robots." Proceedings of the International Conference on Internet of things and Cloud Computing. ACM, 2016.
[2] Kamburugamuve, Supun, Leif Christiansen, and Geoffrey Fox. "A framework for real time processing of sensor data in the cloud." Journal of Sensors 2015 (2015).
[3] http://www.rabbitmq.com/
[4] http://kafka.apache.org/
[5] http://storm.apache.org/
[6] http://www.twitter.com/
[7] http://www.turtlebot.com
[8] He, Hengjing, et al. "Cloud based real-time multi-robot collision avoidance for swarm robotics." International Journal of Grid and Distributed Computing, May 7 (2015).
[9] http://www.wikipedia.com
[10] http://www.tensorflow.org
[11] http://www.kubernetes.io
[12] http://www.github.com

Q&A

THANK YOU