RESISTING RELIABILITY DEGRADATION THROUGH PROACTIVE RECONFIGURATION

Summarized by Andeep Toor. Paper by Deshan Cooray, Sam Malek, Roshanak Roshandel, and David Kilgore.

Definitions. RESIST: REsilient SItuated SofTware system, a framework for mission-critical systems. Situated systems (SS) are embedded, mobile, and pervasive, e.g., mobile devices and robots. Mission-critical examples: emergency response, disaster recovery.

Core Problem. Mission-critical systems require high reliability, yet situated systems are inherently unreliable, and external factors play a large role in this. The best configuration for a system is known only at runtime, so the configuration must be updated at runtime to improve reliability. How do we design such a system?

What does RESIST do? Self-healing / Self-optimizing

What assumptions does RESIST make? Errors are assumed to occur between components; errors internal to a component are not handled by this error model. Configurations may have replicas of components. Replicas of different components fail independently.

How is RESIST different? It optimizes proactively: it uses predictive models to optimize ahead of time and focuses on where the system is expected to be, unlike other systems that focus on the current state. Note: it can only look into the near future. It also considers external factors (context). Other related work is not appropriate here because it expects a priori knowledge of reliability and does not consider context.

How does RESIST work? It determines the optimal (that is, most reliable) configuration of components for the SS. First, calculate individual component reliabilities. Then calculate total system reliability, which is based on the individual component reliabilities. Finally, consider architectural factors: redundant components and the assignment of components to processes.

Scenario. Emergency response involving firefighters, a central dispatcher, and robots. Components: sensors, actuators, controllers.

Calculating Component Reliability. Uses Hidden Markov Models (HMMs). Normal Markov Model. [Figure: states S_1, S_2, S_3 with transitions a_12, a_13, a_21, a_23]

Calculating Component Reliability. Normal Markov Model: can predict the next state based purely on the current state. Set of states S = {S_1, S_2, ..., S_N}. Transition matrix A = {a_ij}, which gives the probability of transitioning from one state to another. [Figure: states S_1, S_2, S_3 with transitions a_12, a_13, a_21, a_23]
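To make the transition matrix concrete, here is a minimal Python sketch of a three-state chain that uses the four transitions drawn on the slide (a_12, a_13, a_21, a_23). The probability values, the self-loops that make each row sum to 1, and the absorbing third state are illustrative assumptions, not numbers from the paper.

```python
import numpy as np

# Hypothetical transition matrix A = {a_ij}:
# row i holds P(next state = S_j | current state = S_i).
A = np.array([
    [0.6, 0.3, 0.1],   # S_1: a_11 (assumed self-loop), a_12, a_13
    [0.2, 0.5, 0.3],   # S_2: a_21, a_22 (assumed self-loop), a_23
    [0.0, 0.0, 1.0],   # S_3: assumed absorbing
])

def step(dist, A, n=1):
    """Propagate a probability distribution over states n steps ahead."""
    return dist @ np.linalg.matrix_power(A, n)

start = np.array([1.0, 0.0, 0.0])   # system known to be in S_1
one_step = step(start, A)           # equals the first row of A
three_steps = step(start, A, 3)     # distribution after three transitions
```

Because the next state depends only on the current one, looking n steps ahead is just repeated multiplication by A.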

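Calculating Component Reliability. Hidden Markov Models (HMMs): HMMs extend this idea by adding hidden states. [Figure: states S_1, S_2, S_3 with transitions a_12, a_13, a_21, a_23, linked to observations O_1, O_2, O_3, O_4 by observation probabilities e_11, e_21, e_22, e_32, e_33, e_34]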

Calculating Component Reliability. Set of observations O = {O_1, O_2, ..., O_N}. Observation probability matrix E = {e_ik}, which represents the probability of observing an event in a particular state. [Figure: HMM diagram with states S_1 to S_3, observations O_1 to O_4, and observation probabilities e_11, e_21, e_22, e_32, e_33, e_34]
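As a rough illustration of how A and E work together, the sketch below computes the probability of an observation sequence with the standard forward algorithm. The zero pattern of E follows the edges in the slide's diagram (e_11, e_21, e_22, e_32, e_33, e_34); all numeric values, the initial distribution `pi`, and the function name are assumptions made for the example.

```python
import numpy as np

# Hypothetical parameters; only the zero pattern of E comes from the diagram.
pi = np.array([1.0, 0.0, 0.0])            # assumed: start in S_1
A = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.0, 0.0, 1.0]])
E = np.array([[1.0, 0.0, 0.0, 0.0],       # S_1 emits O_1 only (e_11)
              [0.4, 0.6, 0.0, 0.0],       # S_2 emits O_1 or O_2 (e_21, e_22)
              [0.0, 0.2, 0.5, 0.3]])      # S_3 emits O_2, O_3, O_4

def forward_likelihood(pi, A, E, obs):
    """P(obs) under the HMM, computed with the forward algorithm."""
    alpha = pi * E[:, obs[0]]             # joint of first observation and state
    for o in obs[1:]:
        alpha = (alpha @ A) * E[:, o]     # move one step, then weight by emission
    return alpha.sum()

p = forward_likelihood(pi, A, E, [0, 1])  # probability of seeing O_1 then O_2
```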

Real State Transitions. The model represents the fact that, with a certain probability, a system can fail instead of transitioning to another state. Note that the failed state can be reached from most other states.
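One way to picture this is to append an absorbing failed state to an existing transition matrix. The sketch below is an assumption about how that could be done, not the paper's construction: each row keeps its original proportions but gives up a per-state failure probability to the new state. The function name and all numbers are hypothetical.

```python
import numpy as np

def add_failed_state(A, fail_prob):
    """Return an (n+1)x(n+1) matrix whose last state is an absorbing failure state."""
    A = np.asarray(A, dtype=float)
    f = np.asarray(fail_prob, dtype=float)
    n = A.shape[0]
    out = np.zeros((n + 1, n + 1))
    out[:n, :n] = A * (1.0 - f)[:, None]  # normal transitions, scaled down
    out[:n, n] = f                        # transition into the failed state
    out[n, n] = 1.0                       # once failed, stay failed
    return out

A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
# A failure probability of 0 models a state from which failure is unreachable.
A_fail = add_failed_state(A, [0.05, 0.0])
```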

Training the HMM. The states are known (e.g., monitoring, moving), so what must be determined is the transition probability matrix. It can be learned from monitoring data, which supplies the observations. Training uses sample data and the Baum-Welch algorithm, a method for finding the hidden parameters of an HMM that uses expectation-maximization.
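For readers unfamiliar with Baum-Welch, here is a compact textbook-style sketch for a discrete HMM trained on a single observation sequence. It is not RESIST's implementation; the function name, the random initialization, and the toy sequence are assumptions. Each iteration runs a scaled forward-backward pass (the expectation step) and then re-estimates the parameters (the maximization step).

```python
import numpy as np

def baum_welch(obs, n_states, n_symbols, n_iter=20, seed=0):
    """Estimate (pi, A, E) for a discrete HMM from one observation sequence."""
    rng = np.random.default_rng(seed)
    A = rng.random((n_states, n_states)); A /= A.sum(axis=1, keepdims=True)
    E = rng.random((n_states, n_symbols)); E /= E.sum(axis=1, keepdims=True)
    pi = np.full(n_states, 1.0 / n_states)
    obs = np.asarray(obs)
    T = len(obs)
    log_liks = []
    for _ in range(n_iter):
        # E-step: forward pass, rescaled at every step to avoid underflow
        alpha = np.zeros((T, n_states)); c = np.zeros(T)
        alpha[0] = pi * E[:, obs[0]]
        c[0] = alpha[0].sum(); alpha[0] /= c[0]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * E[:, obs[t]]
            c[t] = alpha[t].sum(); alpha[t] /= c[t]
        # E-step: backward pass using the same scaling factors
        beta = np.ones((T, n_states))
        for t in range(T - 2, -1, -1):
            beta[t] = (A @ (E[:, obs[t + 1]] * beta[t + 1])) / c[t + 1]
        gamma = alpha * beta  # gamma[t, i] = P(state i at time t | obs)
        # xi[t, i, j] = P(state i at t and state j at t+1 | obs)
        xi = (alpha[:-1, :, None] * A[None]
              * (E[:, obs[1:]].T * beta[1:])[:, None, :]) / c[1:, None, None]
        log_liks.append(np.log(c).sum())  # log-likelihood before this update
        # M-step: re-estimate parameters from expected counts
        pi = gamma[0]
        A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        for k in range(n_symbols):
            E[:, k] = gamma[obs == k].sum(axis=0)
        E /= gamma.sum(axis=0)[:, None]
    return pi, A, E, log_liks

# Toy observation sequence over three symbols (made up for illustration).
obs = [0, 0, 1, 0, 2, 2, 1, 2, 0, 0, 1, 1, 2, 2, 2, 0, 1, 0, 0, 2] * 3
pi, A, E, log_liks = baum_welch(obs, n_states=2, n_symbols=3)
```

Expectation-maximization never lowers the likelihood of the training data, so `log_liks` should be non-decreasing across iterations; this is a handy sanity check on any implementation.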

Predictive Calculations. Reliability is calculated at runtime, before failure. This involves the use of context: events or processes outside the system that affect it, which must be included in calculations for situated systems. This introduces a new set of contextual parameters, C = {C_1, C_2, ..., C_N}.

Using Context in Reliability Calculations. a_kj = u(a_kj, C_n), where a_kj is a transition probability and u is a function based on context that encapsulates the effect that C_n has on a_kj.
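The slides do not give a concrete form for u, so the following is purely a hypothetical example of what such a function might look like: a contextual parameter (here a bump probability) shifts probability mass from the normal transitions toward the failed state, and each row is rescaled so it remains a valid distribution. The function name, the `sensitivity` parameter, and the numbers are all assumptions.

```python
import numpy as np

def apply_context(A, failed, bump_prob, sensitivity=1.0):
    """Hypothetical u(a_kj, C_n): worse context moves mass toward the failed state."""
    A = np.asarray(A, dtype=float).copy()
    extra = float(np.clip(sensitivity * bump_prob, 0.0, 1.0))
    for k in range(A.shape[0]):
        if k == failed:
            continue                               # failed state stays absorbing
        others = 1.0 - A[k, failed]                # mass on non-failure transitions
        to_fail = A[k, failed] + extra * others    # new failure probability
        if others > 0:
            A[k] *= (1.0 - to_fail) / others       # rescale the remaining transitions
        A[k, failed] = to_fail
    return A

A = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.6, 0.1],
              [0.0, 0.0, 1.0]])                    # state index 2 is the failed state
adjusted = apply_context(A, failed=2, bump_prob=0.2)
```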

Calculating Total System Reliability. Based on individual component reliability. k = number of states. R_k = reliability of the exit state. M = matrix of size k x k. |I - M| = determinant of the matrix I - M. |E| = determinant of everything but the first column and row of I - M.
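The formula itself did not survive on this slide, but the symbols match Cheung's well-known architecture-based reliability model, in which R = (-1)^(k+1) * R_k * |E| / |I - M|. The sketch below assumes that model: M[i, j] is taken to be R_i times the probability of control passing from component i to component j, and E is taken to be I - M with its last row and first column removed. Both are assumptions about what the slide intended, and the example values are made up.

```python
import numpy as np

def system_reliability(P, R):
    """Cheung-style system reliability from transfer probabilities P and reliabilities R."""
    P = np.asarray(P, dtype=float)
    R = np.asarray(R, dtype=float)
    k = len(R)
    M = np.diag(R) @ P                    # M[i, j] = R_i * P[i, j]
    I_M = np.eye(k) - M
    # Assumed definition of E: drop the last row and the first column of I - M.
    E = np.delete(np.delete(I_M, k - 1, axis=0), 0, axis=1)
    return (-1) ** (k + 1) * R[-1] * np.linalg.det(E) / np.linalg.det(I_M)

# Three components executed strictly in sequence (hypothetical values).
P = [[0, 1, 0],
     [0, 0, 1],
     [0, 0, 0]]
R = [0.9, 0.8, 0.7]
r_sys = system_reliability(P, R)
```

For a purely sequential system the result reduces to the product of the component reliabilities (0.9 * 0.8 * 0.7 = 0.504 here), which makes a convenient check.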

Considering Architectural Factors. [Figure: two candidate architectures, one labeled "More efficient architecture" and one labeled "More reliable architecture"]

Finding Optimal Configuration. Reliability is the goal, but in practice other factors may influence the calculation. U_q = utility function, which can take any form.

Finding Optimal Configuration. Configurations have constraints. A component must be assigned to at least one process. A component can have a bounded number of replicas. A component cannot share a process and have a replica. Components and their replicas should be on separate processes.
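The slides do not say how the search is carried out, so the sketch below is only a simplified illustration using brute-force enumeration. It assumes that utility equals system reliability, that components run in series, that copies of a component fail independently, and that every copy occupies its own process (which trivially keeps components and their replicas apart). The function and parameter names are hypothetical.

```python
from itertools import product

def best_configuration(reliabilities, max_copies, max_processes):
    """Enumerate copy counts per component and keep the highest-utility feasible one."""
    best, best_u = None, -1.0
    for counts in product(range(1, max_copies + 1), repeat=len(reliabilities)):
        # Assumed constraint: one process per copy, limited process budget.
        if sum(counts) > max_processes:
            continue
        u = 1.0
        for r, n in zip(reliabilities, counts):
            u *= 1.0 - (1.0 - r) ** n     # component works if any of its n copies works
        if u > best_u:
            best, best_u = counts, u
    return best, best_u

# Three components; one spare process is available for a single replica.
config, utility = best_configuration([0.99, 0.90, 0.95], max_copies=2, max_processes=4)
```

With these numbers the spare process goes to the least reliable component, since replicating it gives the largest gain. Exhaustive enumeration grows exponentially with the number of components, so it is only practical for small systems.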

Experimental Results. Uses the robot example from earlier. Context: the probability of hitting an obstacle, called bump probability (BP). Controller failure is examined with respect to different BP values, because the transition from one state to another can fail with a certain probability.

Experimental Results. Observed and predicted reliability are compared, which shows the accuracy of the predictive model. Reliability degrades with context: increased BP means lower reliability.

Experimental Results. Real robotic results: RESIST sees an increase in BP, which is predicted to result in a drop in reliability. Before this degradation in reliability occurs, RESIST adapts the system to maintain its reliability above 97%. As a result, the Navigator is replicated and the Controller is redeployed to a separate process.

Conclusion. Overall, the paper covers a lot of ground and offers an interesting, predictive approach. Questions: What other machine learning techniques can be used to aid prediction? Does the system's accuracy improve with more data / examples?

References
1. Deshan Cooray, Sam Malek, Roshanak Roshandel, David Kilgore, RESISTing reliability degradation through proactive reconfiguration, Proceedings of the IEEE/ACM international conference on Automated software engineering, September 20-24, 2010, Antwerp, Belgium.
2. http://en.wikipedia.org/wiki/baum-Welch_algorithm