Performance Modeling of IoT Applications

Performance Modeling of IoT Applications
Dr. Subhasri Duttagupta, TCS
www.cmgindia.org

Contents
- Introduction to IoT systems
- Performance modeling of an IoT platform
- Performance modeling of a sample IoT application
- Performance modeling of a real-life WSN system
- Summary

Motivating Examples
- Scalability analysis
- End-to-end delay analysis (WNA Lab, Amrita University)

Elements of a Typical IoT Platform
[Architecture diagram: things with embedded sensors connect through gateway devices (OPC-UA, Modbus, Continua) to cloud services, and onward to apps, clients and portals on mobile devices. Platform blocks include device agents (LWM2M), device management, message routing & event processing, sensor data management, analytics, and app-facing RESTful APIs over http(s), tcp, udp and mqtt.]
A high-performance, scalable platform for the Internet of Things.

Questions that Performance Modelling Can Answer
- When does any subsystem become a bottleneck, as the sensor data rate increases or as the number of queries injected by users increases?
- How many VMs does a particular subsystem need to handle a certain load?
- Will the SLA be met for a certain growth in the number of users?
- What kind of performance modelling is useful in a specific situation?

Challenges in Performance Modelling of IoT
- Diverse technology
- Huge number of smart devices of various types
- Lack of a suitable platform and tools for testing the end-to-end system
- Difficulty in predicting the exact workload mix
- Frequent addition of new services and devices
- Changes in the deployment platform

Modeling Background
- Open systems are characterized by the rate of arrival; closed systems are characterized by the number of users and the think time.
- Input to the model: the service demand, i.e. the amount of CPU/disk time spent serving one unit of output. By the Utilization Law, service demand = utilization of the resource / throughput. A small sketch follows.
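
As a quick illustration of the Utilization Law, this minimal Python sketch derives a service demand from a utilization and throughput sample; the numbers are illustrative, not measurements from the talk:

```python
# Service demand via the Utilization Law: D = U / X.
cpu_util = 0.65        # measured CPU utilization (65%) during a steady interval
throughput = 80.0      # requests completed per second in the same interval
service_demand = cpu_util / throughput
print(f"CPU service demand: {service_demand * 1000:.1f} ms per request")  # ~8.1 ms
```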

What Questions Modelling Helps Answer
- Which component becomes the bottleneck when the system is handling the peak data rate from a number of sensors?
- How many VMs are required at each layer, for a certain number of sensors with a response-time SLA, or for a certain number of API clients with a response-time SLA?
- Can the system handle a certain growth in the users accessing the IoT services?
- Can the platform support different types of APIs (random-access query, range query, sequential-scan query) simultaneously without affecting performance?

Steps in a Modelling Exercise
1. Understand the architecture; identify the components that are significant.
2. Analyse the commonly used workloads and their parameters.
3. Run performance tests, or analyse available data, to obtain service demands for each type of workload.
4. Analyse the workload to find any variation in service demands under certain conditions.
5. Decide the flow of requests within the model; attach probabilities to the various alternate flows.

Architecture Diagram of a Subsystem
[Diagram: REST clients issue Sensor Observation Services API requests to Tomcat NIO servers running Spring MVC and SOS, which access the data tier via Phoenix/JDBC (with a Derby variant); an ID Generation Service pulls and queues IDs; observations and audit trails flow through a Hazelcast distributed cache and message exchange to an HBase cluster, with the Phoenix coprocessor on the region servers.]

Workloads for Sensor Observation Services (SOS)
Different APIs:
- GetObs: latest, by sensor, by time range
- PostObs
- Get/Post Sensor
- GetFeature: get the features of sensors
- GetCapability: get the capabilities of the SOS
Find out whether each API's output depends on the parameters passed.

JMT: A Powerful Java Modelling Tool
- Developed since 2002 by 10+ generations of PG and UG students at Politecnico di Milano and Imperial College London
- http://jmt.sourceforge.net/
- JMT is open source (GPL v2); size: ~4,000 classes, 21 MB of code, ~200k lines
- Download the jar file and simply run: java -jar JMT.jar
M. Bertoli, G. Casale, G. Serazzi. JMT: Performance Engineering Tools for System Modeling. ACM SIGMETRICS Performance Evaluation Review, 36(4), March 2009, pp. 10-15, ACM Press.

JMT: Java Modeling Tools
- JSIMgraph: queueing network model simulator with a graphical user interface
- JSIMwiz: queueing network model simulator with a wizard-based user interface
- JMVA: Mean Value Analysis and approximate solution algorithms for queueing network models
- JABA: asymptotic analysis and bottleneck identification for queueing network models
- JWAT: workload characterization from log data
- JMCH: Markov chain simulator

Analytical Modeling of SOS
[Model diagrams with delay stations and queueing stations, in a Postgres version and an HBase version.]

Modeling of SOS with Postgres
- Inputs: think time, number of threads, SOS modules
- A performance testing tool supplies the measured throughput and utilization
- Predict the maximum throughput and the response time for a specific deployment
- Predict performance for a different deployment using performance-mimicking benchmarks [Duttagupta, IoT 2016]
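
Closed-system predictions of this kind can be produced with exact Mean Value Analysis (the algorithm behind JMVA). The sketch below is a minimal Python implementation using hypothetical service demands and think time, not the talk's calibrated values:

```python
import numpy as np

def mva(demands, think_time, max_users):
    """Exact MVA for a closed product-form network: one delay station
    (the users' think time) plus FCFS queueing stations."""
    q = np.zeros(len(demands))        # mean queue length at each station
    results = []
    for n in range(1, max_users + 1):
        r = demands * (1.0 + q)       # residence time: demand inflated by queue seen on arrival
        R = r.sum()                   # total response time (excluding think time)
        X = n / (R + think_time)      # system throughput, by Little's law
        q = X * r                     # updated mean queue lengths
        results.append((n, X, R))
    return results

# Hypothetical demands (sec): Tomcat ~8.1 ms, Postgres ~4 ms; think time 1 s.
for n, X, R in mva(np.array([0.0081, 0.004]), 1.0, 512)[::128]:
    print(f"{n:4d} users: X = {X:6.1f}/s, R = {R * 1000:6.1f} ms")
```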

Performance of a Single API on AWS (using Postgres)
[Plots: throughput (txn/sec) and response time (ms) for PostObs on AWS vs number of users (0-600), actual vs predicted; response time reaches ~2 s at the high end.]
- Throughput saturates at 256 users, at 123 txn/sec.
- Response time exceeds 1 sec beyond 256 users.
- To scale to a higher number of users, we need to add more VMs to the Tomcat layer.

Mixed API for a Different Datastore (HBase)
[Plots: throughput and response time for GetObsLatest + PostObs vs number of users (0-600), actual vs predicted.]
- Service demands: PostObs SD = 12.5 ms, GetObsLatest SD = 5.2 ms
- Mix: GetObsLatest on 10% of the threads, PostObs on 90%
- Modeling helps predict the performance of a mixed API given that of each single API.

Does the API Support Horizontal Scalability?
[Plots: throughput (txn/sec) and response time (ms) for PostObs with 2 VMs on AWS vs number of users (0-1000), actual vs predicted; response time ~0.5 s.]
- With 2 Tomcat VMs, the application scales up to 512 users.
- The model predicts linear scalability for the API, and the actual test results confirm that two VMs scale to twice the number of users without increasing response time.

Number of VMs Required for a Response-Time SLA
- Response time SLA = 1 sec
- The Tomcat layer scales horizontally
- The Postgres VM needs to be upgraded to a bigger VM beyond 1280 users

Modeling Challenge: Service Demand Variability
Service demand varies with a higher number of threads, and it also depends on the API (a sketch of one way to handle this follows the table):

API            | No of threads | Tomcat service demand
---------------|---------------|----------------------
PostObs        | 64            | 11.2 ms
               | 768           | 8.5 ms
               | 1024          | 7.2 ms
GetObsLatest   | 64            | 8.7 ms
               | 256           | 3.8 ms
               | 512           | 2.4 ms
GetObsbySensor | 32            | 44 ms
               | 64            | 50.8 ms
               | 128           | 68.4 ms
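
One pragmatic way to handle this variability is to measure demands at a few concurrency levels and interpolate between them. This is a modeling convenience, not the talk's prescribed method; the numbers come from the PostObs rows above:

```python
import numpy as np

# Measured Tomcat service demands for PostObs at a few thread counts.
threads = np.array([64.0, 768.0, 1024.0])
demand_ms = np.array([11.2, 8.5, 7.2])

# Piecewise-linear interpolation estimates the demand at an untested concurrency.
print(f"Estimated demand at 300 threads: {np.interp(300, threads, demand_ms):.1f} ms")
```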

Modeling of a Rule Processing Engine
Factors impacting performance:
- Complexity of the rules applied to messages
- Payload of the messages
- Inter-arrival delay of consecutive messages
- Number of rules applied to an observation
- Number of message producers and consumers
- Server/VM architecture
We consider the effect of message rate and rule complexity.

Architecture of a Rule Processing Engine
- Messages first arrive at the tenant RabbitMQ, based on API keys
- The rule processing engine then routes them to various topic exchanges
- Multiple topic MQs exist for multiple tenants with the same topic

Model of a Rule Processing Engine

Performance of a Message Routing System
[Plot: latency (ms) in the MQ module vs flow rate (0-1400 msg/sec), actual vs predicted.]

Flow rate | Latency  | RabbitMQ CPU%
----------|----------|--------------
900/s     | 8 ms     | 80%
975/s     | 13 ms    | 87.4%
1050/s    | 1.98 sec | 93.5%

- Message latency is very low until either the RabbitMQ or the MR VM saturates.
- Once utilization exceeds 90%, latency can increase rapidly due to queue build-up at the server.
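
The knee in the latency curve is classic queueing behaviour. A single open M/M/1 station, with the per-message demand backed out of the table (0.80 / 900 ≈ 0.89 ms), already shows the trend; this sketch is an approximation and understates the measured latency at 1050 msg/sec:

```python
# Open M/M/1 approximation: mean response time R = D / (1 - U), which grows
# without bound as utilization U -> 1. D is inferred from the table above.
D = 0.80 / 900.0                      # ~0.89 ms of RabbitMQ CPU per message
for rate in (900.0, 975.0, 1050.0):   # flow rates from the table (msg/sec)
    U = rate * D
    print(f"{rate:.0f}/s: U = {U:.1%}, predicted R = {D / (1 - U) * 1000:.1f} ms")
```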

Combined Model with SOS Sending Data to MQ
- A PostObs request is forked after processing at SOS and is then processed by RabbitMQ + the MR engine
- Modeled using a combination of open and closed requests [ACM IoT 2016]

We have seen the performance analysis of two subsystems of the IoT platform. Next: performance analysis of a sample application on the IoT platform.

Sample App: Energy Monitoring System

Architecture
[Diagram: data flows from the platform to the backend.]

Modeling Problems
- How many more buildings can the current infrastructure support?
- How many online users can the dashboard support with the present deployment, for typical queries?

Q: How Many More Buildings Can Be Supported?
- Data comes from different sources: occupancy data and energy meter readings
- Find the distribution of the inter-arrival time of observations; the SOS log gives the arrival timestamp of each observation
- Two metrics are calculated from the inter-arrival time samples: mean and standard deviation

Backend Access Log
We extract the timestamps of successive observations being posted. From these timestamps, we calculate the inter-arrival time of each observation.
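
A minimal sketch of this extraction step, assuming the timestamps have already been parsed out of the access log into epoch milliseconds (the sample values are made up):

```python
import numpy as np

# Hypothetical epoch-millisecond timestamps of successive posted observations.
timestamps_ms = np.array([1000, 1022, 1065, 1081, 1150, 1212])

inter_arrivals = np.diff(timestamps_ms)            # gaps between observations
mean, std = inter_arrivals.mean(), inter_arrivals.std(ddof=1)
cv = std / mean      # coefficient of variation: ~1 for exponential arrivals,
                     # >1 suggests a hyper-exponential fit (next slides)
print(f"mean = {mean:.1f} ms, std = {std:.1f} ms, CV = {cv:.2f}")
```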

Probability Distribution of the Inter-Arrival Time
[Plot: PDF of inter-arrival time, 10-210 ms on the x-axis.]
- What trend does this distribution reflect?
- How do we compute the parameters of this distribution?
- If it were an exponential distribution with mean = 37.2, the standard deviation should also be 37.2, but it is 45.6.

What Is a Hyper-Exponential Distribution?
We need to find a set of exponential distributions and their mixing probabilities that match the distribution of the data:

f(t) = Σ_{i=1}^{n} p_i · λ_i · e^(−λ_i t), with Σ_{i=1}^{n} p_i = 1

Fitting a Distribution to the Data
We can find µ1, µ2 and their probabilities so that the mixture matches the desired mean and standard deviation:
- µ1 = 22, p1 = 0.6
- µ2 = 60.1, p2 = 0.4
This yields mean = 37.2 and std = 45.6. Ideally, though, we should use a fitting function to obtain the correct mixture so that it matches the shape of the whole distribution.
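
The moment-matching claim is easy to verify: for a two-branch hyper-exponential with branch means µ1, µ2, we have E[X] = p1·µ1 + p2·µ2 and E[X²] = 2(p1·µ1² + p2·µ2²), since each branch is exponential. A short check with the slide's numbers:

```python
import math

p1, mu1 = 0.6, 22.0    # branch 1: probability and mean (ms), from the slide
p2, mu2 = 0.4, 60.1    # branch 2

mean = p1 * mu1 + p2 * mu2                    # E[X]
second = 2 * (p1 * mu1**2 + p2 * mu2**2)      # E[X^2] of the mixture
std = math.sqrt(second - mean**2)
print(f"mean = {mean:.1f} ms, std = {std:.1f} ms")  # ~37.2 ms and ~45.6 ms
```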

Q: How Many Online Users Can Be Supported?
What if performance testing is not an option?
- Rely on the utilization monitoring interface
- Find the trend of the utilization data to derive the min, max, and average; history is available for 1 day, the last 7 days, or 1 month
- Is the utilization due to one type of workload?

When No Dashboard Queries Are Running
- Utilization is due to data injection and alert queries
- Utilization spikes every 25-30 min and remains high for 15 min; CPU% varies between 11% and 20%

Trend over a Day: What Are the Max and Average?

Deriving Service Demands for Different Workloads
- We compute the service demand based on the mean throughput derived from the log
- The front-end client node handles traffic only from the dashboard
- The backend data node handles traffic from the dashboard as well as from the sensor backend
[Diagram: sensor data, dashboard queries, and alert queries flowing into the ES backend.]

Deriving Service Demands for the ES Data Node
Use the least-squares technique:

U_ES = X_Data · D_Data + X_Alert · D_Alert + X_DashQ · D_DashQ

where X_Data is the throughput for data insertion, D_Data is the service demand for data insertion, and similarly for alert queries (Alert) and dashboard queries (DashQ). A sketch of the fit follows.
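
In this sketch of the least-squares step, each monitoring interval contributes one linear equation in the unknown demands, and numpy solves the over-determined system; all numbers are invented for illustration:

```python
import numpy as np

# Rows: per-interval throughputs [X_Data, X_Alert, X_DashQ] (requests/sec).
X = np.array([[50.0, 2.0, 10.0],
              [55.0, 2.0, 25.0],
              [48.0, 3.0, 40.0],
              [60.0, 2.5, 15.0]])
# Measured ES data-node CPU utilization in each interval (fraction of one CPU).
U = np.array([0.30, 0.41, 0.50, 0.37])

# Solve U ~= X @ D for the service demands D = [D_Data, D_Alert, D_DashQ] (sec).
D, *_ = np.linalg.lstsq(X, U, rcond=None)
print(dict(zip(["D_Data", "D_Alert", "D_DashQ"], np.round(D * 1000, 2))))  # in ms
```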

Model for the Energy Monitoring Subsystem

Takeaways from Modelling the Sample App
- Throughput in an open system mostly remains unchanged
- iowait% occurs due to Debug/Info logging
- Utilization over a short duration can become very high even at low concurrency
- A model can be built from production log data and utilization information

What Other Modeling Techniques Can We Use?
- Markov chains can be used when the system can be modeled as a set of states; we need to know the states and their transition rates
- Example: a real-life landslide monitoring system
  - A number of different types of sensors are used: rain gauge, pore pressure, humidity, movement
  - The system makes decisions based on the values of the sensor readings

Deriving Parameters for the Markov Chain
[State diagram (WNA Lab, Amrita University): the system moves between a power-drain OFF state and power-drain ON states S_r, S_234, S_rm, S_mp as sensor readings cross thresholds, e.g. rainfall r(t) against Th1, Th2, Th3, pore pressure p(t) against Th_p, and movement m(t) against Th_m.]
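
Once the states and transition rates are known, the steady-state probabilities follow from the balance equations πQ = 0 with Σπ = 1. The sketch below uses a placeholder three-state generator matrix, not the calibrated landslide-monitoring rates:

```python
import numpy as np

# Placeholder generator matrix Q: Q[i, j] is the rate of moving from state i
# to state j; each diagonal entry makes its row sum to zero.
# Illustrative states: 0 = OFF, 1 = S_r, 2 = S_234.
Q = np.array([[-0.5,  0.5,  0.0],
              [ 0.2, -0.6,  0.4],
              [ 0.0,  0.3, -0.3]])

# Solve pi @ Q = 0 together with the normalization sum(pi) = 1.
A = np.vstack([Q.T, np.ones(Q.shape[0])])
b = np.array([0.0, 0.0, 0.0, 1.0])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.round(pi, 3))   # long-run fraction of time spent in each state
```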

Summary: Things We Covered
- Basics of performance modeling using queueing networks: given an architecture, how to build the model for the system
- Performance modeling of an IoT platform: closed system, open system, number of VMs required, scalability analysis
- Performance modeling of a sample application running on an IoT platform: gathering the inter-arrival time distribution, deriving service demands of workloads, outcome of a performance model
- Other modeling techniques: Markov chains

Important Resources
- M. Bertoli, G. Casale, G. Serazzi. User-Friendly Approach to Capacity Planning Studies with Java Modelling Tools. Int'l ICST Conf. on Simulation Tools and Techniques (SIMUTools 2009), Rome, Italy, 2009, ACM Press.
- S. Kounev and A. Buchmann. Performance Modeling and Evaluation of Large-Scale J2EE Applications. In Proceedings of the Computer Measurement Group's Conference, 2003.
- E. Lazowska, J. Zahorjan, G. Graham and K. Sevcik. Quantitative System Performance: Computer System Analysis Using Queueing Network Models. Prentice-Hall, 1984.
- M. Harchol-Balter. Performance Modeling and Design of Computer Systems: Queueing Theory in Action.
- W. Stallings. A Gentle Introduction to Some Basic Queueing Concepts.
- R. Mansharamani, S. Duttagupta, A. Nehete. Automatically Determining Load Test Duration Using Confidence Intervals. CMG India, Pune, 2014.
- S. Duttagupta, R. Mansharamani. Extrapolation Tool for Load Testing Results. Int. Symposium on Performance Evaluation of Computer Systems and Telecommunication Systems, 2011.
- S. Duttagupta, M. Kumar and M. Nambiar. Performance Modeling of IoT Applications. 6th ACM Conference on Internet of Things (IoT 2016).

Open Issues
- How to account for the variability of the technology used by various sensors to connect to the IoT system
- How service demand varies with load or a higher flow rate
- What the fundamental limits are for a given technology stack under a certain kind of workload
