Distributed Anomaly Detection using Autoencoder Neural Networks in WSN for IoT

Tony T. Luo, Institute for Infocomm Research, A*STAR, Singapore - https://tonylt.github.io
Sai G. Nagarajan, Singapore University of Technology and Design
IEEE ICC 2018

Introduction
Anomalies (a.k.a. outliers) are data that do not conform to the patterns exhibited by the majority of the data set, e.g. equipment faults, sudden environmental changes, or security attacks.
Conventional approach to anomaly detection: handled mainly by the backend. Disadvantages: inefficient use of resources (bandwidth and energy), and delay.
Other prior work: threshold-based detection with Bayesian assumptions [2]; classification using kNN or SVM [3,6]; distributed detection based on local messaging [4,5]. Disadvantages: computationally expensive, with large communication overhead.

Our approach
Objective: push the detection task to the edge. Challenge: sensors are resource-scarce.
We introduce autoencoder neural networks, a deep learning model traditionally used in image recognition and spacecraft telemetry data analysis. Deep learning is generally not suitable for WSNs, so we build a three-layer autoencoder neural network with only one hidden layer, leveraging the power of autoencoders in reconstructing their inputs.
We design a two-part algorithm, residing on the sensors and the IoT cloud respectively: sensors perform distributed anomaly detection without communicating with each other, while the IoT cloud handles the computation-intensive learning. Only very infrequent communication between sensors and cloud is required.

Contributions
1. The first work to introduce autoencoder neural networks into WSNs to solve the anomaly detection problem.
2. Fully distributed.
3. Minimal communication and edge computation load.
4. Solves the common challenge of lacking anomaly training data.

Preliminaries: autoencoder
A special type of neural network whose objective is to reconstruct its inputs instead of predicting a target variable.
Structure:
Input layer: e.g., a time series of sensor readings.
Output layer: a reconstruction of the inputs.
Hidden layers: encode the essential information of the inputs.

Preliminaries: autoencoder (cont'd)
Activation function: applied at each neuron, usually a sigmoid f(z) = 1 / (1 + e^{-z}).
Parameters: W (weights) and b (bias, the +1 node).
Output at each neuron: a = f(Wx + b).
Objective: minimize the cost function, i.e., the reconstruction error plus a regularization term (to avoid overfitting).
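
To make the preliminaries concrete, below is a minimal NumPy sketch of such a three-layer autoencoder: sigmoid activations, a single hidden layer, and a cost equal to the reconstruction error plus an L2 regularization term. The class name, layer sizes, and the regularization weight are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class Autoencoder:
    """Three-layer autoencoder: input -> hidden -> output (reconstruction).

    Inputs are assumed scaled to [0, 1], since the sigmoid output lies there.
    """

    def __init__(self, n_visible, n_hidden, lam=1e-4, seed=0):
        rng = np.random.default_rng(seed)
        # W1/b1 encode, W2/b2 decode; small random initialization
        self.W1 = rng.normal(0.0, 0.1, (n_hidden, n_visible))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_visible, n_hidden))
        self.b2 = np.zeros(n_visible)
        self.lam = lam  # regularization weight (assumed value)

    def forward(self, x):
        h = sigmoid(self.W1 @ x + self.b1)      # hidden encoding
        return sigmoid(self.W2 @ h + self.b2)   # reconstruction x_hat

    def cost(self, X):
        # Reconstruction error + L2 regularization (to avoid overfitting)
        err = sum(np.sum((self.forward(x) - x) ** 2) for x in X) / (2 * len(X))
        reg = 0.5 * self.lam * (np.sum(self.W1 ** 2) + np.sum(self.W2 ** 2))
        return err + reg
```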

System architecture
Sensors: each runs an autoencoder to detect anomalies, and at low frequency sends the autoencoder's inputs and outputs (in fact, their difference) to the IoT cloud.
Cloud: trains the autoencoder model using the data provided by all sensors, and sends the updated model parameters (W, b) back to all the sensors.

Anomaly detection
Each sensor calculates its reconstruction error (residual), i.e., the difference between the autoencoder's output and input.
The cloud calculates the mean and variance of the residuals over all sensors (D: number of days; S: number of sensors).
Each sensor detects anomalies by testing whether its residual deviates from the mean by more than p standard deviations. Assuming the residuals are Gaussian, p = 2 flags roughly the largest 5% of deviations as anomalies, and p = 3 roughly the largest 0.3%.
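
A sketch of this detection rule, reusing the Autoencoder sketch above; the exact residual definition (Euclidean reconstruction error) and the function names are assumptions consistent with the slide rather than the paper's notation.

```python
import numpy as np

def residual(autoencoder, x):
    # Sensor side: reconstruction error for one input window
    return np.linalg.norm(autoencoder.forward(x) - x)

def fit_residual_stats(residuals):
    # Cloud side: mean and standard deviation over the residuals reported
    # by all S sensors across D days
    r = np.asarray(residuals, dtype=float)
    return r.mean(), r.std()

def is_anomaly(r, mu, sigma, p=2.0):
    # Sensor side: flag readings whose residual deviates from the
    # fleet-wide mean by more than p standard deviations
    return abs(r - mu) > p * sigma
```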

Two-part algorithm
Sensor side: DADA-S. Cloud side: DADA-C.
Computational complexity: O(M^2), compared with O(2^{M-1}) in prior work (TPDS'13).
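
The split of work might look as follows. This is only a hedged sketch in the spirit of DADA-S/DADA-C, not the paper's pseudocode; `train_autoencoder` is a hypothetical helper standing in for the cloud's training routine, and the other functions come from the sketches above.

```python
def dada_s_step(ae, x, mu, sigma, p=2.0):
    # Sensor side (DADA-S flavor): detect locally with the current model and
    # statistics; keep the residual for the next low-frequency report.
    r = residual(ae, x)
    return is_anomaly(r, mu, sigma, p), r

def dada_c_update(training_data, reported_residuals):
    # Cloud side (DADA-C flavor): retrain on data from all sensors, refresh
    # the residual statistics, and broadcast (W, b) plus (mu, sigma) back.
    ae = train_autoencoder(training_data)   # hypothetical training routine
    mu, sigma = fit_residual_stats(reported_residuals)
    return ae, mu, sigma
```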

Performance evaluation
An indoor WSN testbed consisting of 8 sensors that measure temperature and humidity; data collected over 4 months (Sep to Dec 2016).
Synthetic anomalies generated using two common models: spike and burst.
Number of neurons: 720 in the input/output layers and 504 in the hidden layer (optimized using k-fold cross-validation).
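
The slide's exact spike and burst formulas did not survive transcription, so the sketch below uses common generic definitions (a spike perturbs a single sample; a burst offsets a contiguous window); all magnitudes and series values are illustrative assumptions.

```python
import numpy as np

def inject_spike(x, t, magnitude):
    # Spike: a single-sample deviation at time t (generic definition; the
    # paper's exact model is not reproduced here)
    y = x.copy()
    y[t] += magnitude
    return y

def inject_burst(x, start, length, magnitude):
    # Burst: a sustained offset over a contiguous window (generic definition)
    y = x.copy()
    y[start:start + length] += magnitude
    return y

rng = np.random.default_rng(0)
series = rng.normal(25.0, 0.5, 720)   # e.g. one day of temperature readings
spiked = inject_spike(series, 100, rng.normal(3.0, 1.0))
bursty = inject_burst(series, 300, 60, rng.normal(3.0, 1.0))
```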

Reconstruction performance
When no anomaly is present, the recovered data (output) almost coincides with the true data (input), validating the model.

Varying anomaly magnitude
Anomaly magnitudes are varied according to a normal distribution N(μ, σ²), and AUC is plotted with respect to both μ and σ².
AUC > 0.8 in most cases, indicating a good classifier. Lower AUC (0.5 to 0.8) appears only when both μ and σ² are very small, i.e., deviations that are insignificant from normal readings.
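
AUC here is the area under the ROC curve obtained by sweeping the detection threshold over the residual scores. With scikit-learn (an assumed tooling choice, not something the paper specifies) it is a one-liner:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Illustrative data: 1 marks an injected anomaly, 0 a normal reading;
# residuals are the detector's scores (larger = more anomalous)
labels = np.array([0, 0, 1, 0, 1, 0])
residuals = np.array([0.1, 0.2, 0.9, 0.3, 0.7, 0.2])
print(roc_auc_score(labels, residuals))  # 1.0 here; > 0.8 indicates a good detector
```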

Varying anomaly frequency
The detector continues to perform well even when the number of anomalies is large.

Adaptive to non-stationary environments
We use two different configurations of training data:
Random: new observations are mixed at random with the entire historical data.
Prioritized: the most recent 14 days of data are mixed with another randomly chosen 14 days of data.
TPR: Random performs better, because its training data is less affected by changes.
FPR: Prioritized performs better, because the autoencoder learns more from fresh inputs that contain more changes, and thus recognizes that some previous anomalies are no longer anomalous.
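
A sketch of the two configurations, assuming the data are indexed by day; the 14-day windows follow the slide, while the selection details and names are otherwise assumed.

```python
import numpy as np

def random_config(history, new_day, size, rng):
    # Random: the new observations are mixed with days drawn at random
    # from the entire history
    idx = rng.choice(len(history), size=size - 1, replace=False)
    return [new_day] + [history[i] for i in idx]

def prioritized_config(history, rng, window=14):
    # Prioritized: the most recent 14 days plus another randomly chosen
    # 14 days from the older history
    recent = history[-window:]
    older_idx = rng.choice(len(history) - window, size=window, replace=False)
    return recent + [history[i] for i in older_idx]

rng = np.random.default_rng(0)
history = [f"day-{d}" for d in range(120)]   # ~4 months of daily data
random_set = random_config(history, "day-120", 28, rng)
prioritized_set = prioritized_config(history, rng)
```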

Conclusion
The first work to introduce autoencoder neural networks into WSNs to solve the anomaly detection problem.
Fully distributed, with minimal communication (zero among sensors) and minimal edge computation load (polynomial complexity).
Solves the common challenge of lacking anomaly training data (by virtue of unsupervised learning).
High accuracy and low false-alarm rate (characterized by AUC).
Adaptive to new changes in non-stationary environments.

Connect via my research homepage: https://tonylt.github.io