Connecting the Dots: Using Runtime Paths for Macro Analysis

1 Connecting the Dots: Using Runtime Paths for Macro Analysis Mike Chen

2 Motivation
Divide and conquer, layering, and replication are fundamental design principles
  e.g. Internet systems, P2P systems, and sensor networks
Execution context is dispersed throughout the system => difficult to monitor and debug
Many existing low-level tools help debug individual components, but not a collection of them
  Much of the system is in how the components are put together
Observation: a widening gap between the systems we are building and the tools we have

3 Current Approach
[Figure: components of the example systems. Labels: Apache, Java Bean, Database, Sensor, Peer]

4 Current Approach
Micro analysis tools, like code-level debuggers (e.g. gdb) and application logs, offer details of each individual component
Scenario
  A user reports that request A1 failed
  You try the same request, A2, but it works fine
  What to do next?
[Figure: gdb inspecting variables (X, Y) in the Apache, Java Bean, and Database components used by requests A1 and A2]

5 Macro Analysis
Macro analysis exploits non-local context to improve reliability and performance
  Performance examples: Scout, ILP, Magpie
A statistical view is essential for large, complex systems
Analogy: micro analysis lets you understand the details of an individual honeybee; macro analysis is needed to understand how the bees interact to keep the beehive functioning

6 Observation
Systems have a single system-wide execution path associated with each request
  E.g. request/response, one-way messages
Scout, SEDA, Ninja use paths to specify how to service requests
Our philosophy
  Use only dynamic, observed behavior
  Application-independent techniques

7 Our Approach
Use runtime paths to connect the dots!
  dynamically capture the interactions and dependencies between components
  look across many requests to get the overall system behavior
    more robust to noise
Components are only partially known ("gray boxes")
[Figure: paths through Apache, Java Bean, and Database components]

8 Our Approach
Applicable to a wide range of systems.
[Figure: paths through three kinds of systems. Labels: Apache, Java Bean, Database, Sensor, Peer A through Peer G]

9 Open Challenges in Systems Today
1. Deducing system structure
  manual approach is error-prone
  static analysis doesn't consider resources
2. Detecting application-level failures
  often don't exhibit lower-level symptoms
3. Diagnosing failures
  failures may manifest far from the actual faults
  multi-component faults
Goal: reduce time to detection, recovery, diagnosis, and repair

10 Talk Outline
Motivation
Model and architecture
Applying macro analysis
Future directions

11 Runtime Paths
Instrument code to dynamically trace requests through a system at the component level
  record call path + the runtime properties
  e.g. components, latency, success/failure, and resources used to service each request
Use statistical analysis to detect and diagnose problems
  e.g. data mining, machine learning, etc.
Runtime analysis tells you how the system is actually being used, not how it may be used
Complements existing micro analysis tools

12 Architecture
Tracer
  Tags each request with a unique ID, and carries it with the request throughout the system
  Reports observations (component name + resource + performance properties) for each component
Aggregator + Repository
  Reconstructs paths and stores them
Declarative Query Engine
  Supports statistical queries on paths
  Data mining and machine learning routines
Visualization
[Figure: a request flows through Tracers A to F; the reusable analysis framework (Aggregator, Path Repository, Query Engine, Visualization) serves developers/operators]
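The Aggregator step can be sketched roughly as follows. This is a minimal Python illustration, not the actual implementation: the `(request_id, seq_num, component)` tuple layout and the `reconstruct_paths` name are assumptions made for the example.

```python
from collections import defaultdict

def reconstruct_paths(observations):
    """Group tracer observations by request ID and order them by sequence number.

    Each observation is a hypothetical (request_id, seq_num, component) tuple.
    """
    by_request = defaultdict(list)
    for request_id, seq_num, component in observations:
        by_request[request_id].append((seq_num, component))
    # Sort each request's observations so the path reads in call order.
    return {
        request_id: [component for _, component in sorted(entries)]
        for request_id, entries in by_request.items()
    }

# Observations may arrive interleaved and out of order from different tracers.
observations = [
    (2, 1, "A"), (1, 2, "D"), (1, 1, "A"), (2, 2, "B"), (1, 3, "E"),
]
paths = reconstruct_paths(observations)
```

Keying on the request ID is what lets observations from independent tracers be stitched together without the tracers coordinating with each other.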

13 Request Tracing
Challenge: maintaining an ID with each request throughout the system
Tracing is platform-specific but can be application-generic and reusable across applications
2 classes of techniques
  Intra-thread tracing
    Use per-thread context to store request ID (e.g. ThreadLocal in Java)
    ID is preserved if the same thread is used to service the request
  Inter-thread tracing
    For extensible protocols like HTTP, inject new headers that will be preserved (e.g. REQ_ID: xx)
    Modify RPC to pass request ID under the covers
    Piggyback onto messages
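Both classes of technique can be illustrated with a short Python sketch. It uses `threading.local` as the analogue of Java's ThreadLocal and the `REQ_ID` header name from the slide; all function names are hypothetical and chosen only for this example.

```python
import threading
import uuid

_context = threading.local()  # per-thread storage, analogous to Java's ThreadLocal

def start_request():
    """Intra-thread: tag the request serviced by this thread with a fresh ID."""
    _context.request_id = uuid.uuid4().hex
    return _context.request_id

def current_request_id():
    """Return the ID of the request this thread is servicing, or None."""
    return getattr(_context, "request_id", None)

def outgoing_headers(headers=None):
    """Inter-thread: copy the request ID into a header for the next hop."""
    headers = dict(headers or {})
    request_id = current_request_id()
    if request_id is not None:
        headers["REQ_ID"] = request_id  # header name taken from the slide's example
    return headers

def accept_request(headers):
    """On the receiving side, restore the ID from the header into thread context."""
    _context.request_id = headers.get("REQ_ID")
    return _context.request_id

rid = start_request()
hdrs = outgoing_headers({"Accept": "text/html"})
```

A thread that has not called `start_request` or `accept_request` sees no ID, which is why the ID has to travel with the message whenever a request crosses a thread or process boundary.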

14 Talk Outline
Motivation
Model and architecture
Applying macro analysis
  Inferring system structure
  Detecting application-level failures
  Diagnosing failures
Future directions

15 Inferring System Structure
Key idea: paths directly capture application structure
[Figure: structure captured by 2 requests]
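One way to read this key idea is as edge counting over observed paths. The sketch below assumes each path is a simple linear sequence of component names (real paths may branch), and `infer_structure` is a name invented for the example.

```python
from collections import Counter

def infer_structure(paths):
    """Count how often each caller -> callee edge appears across observed paths."""
    edges = Counter()
    for path in paths:
        # Consecutive components in a path form a directed edge.
        for caller, callee in zip(path, path[1:]):
            edges[(caller, callee)] += 1
    return edges

paths = [["Apache", "Java Bean", "Database"], ["Apache", "Java Bean"]]
structure = infer_structure(paths)
```

The edge counts double as usage weights, so rarely exercised dependencies stand out from common ones.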

16 Indirect Coupling of Requests
Key idea: paths associate requests with internal state
Trace requests from web server to database
  Parse client-side SQL queries to get sharing of db tables
  Straightforward to extend to more fine-grained state (e.g. rows)
[Figure: request types mapped to database tables]
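A rough sketch of the table-sharing idea follows. The regular expression is deliberately naive and stands in for a real SQL parser; the request types, queries, and function names are made up for illustration.

```python
import re
from collections import defaultdict

# Deliberately naive: matches the identifier after FROM, JOIN, INTO, or UPDATE.
_TABLE_RE = re.compile(
    r"\b(?:from|join|into|update)\s+([A-Za-z_][A-Za-z0-9_]*)", re.IGNORECASE
)

def tables_in(sql):
    """Return the lower-cased table names referenced by one SQL statement."""
    return {name.lower() for name in _TABLE_RE.findall(sql)}

def table_sharing(traced_queries):
    """Map each table to the set of request types whose paths touched it."""
    sharing = defaultdict(set)
    for request_type, sql in traced_queries:
        for table in tables_in(sql):
            sharing[table].add(request_type)
    return dict(sharing)

# Hypothetical (request type, SQL statement) pairs taken from traced paths.
traced_queries = [
    ("checkout", "UPDATE inventory SET qty = qty - 1 WHERE item_id = 7"),
    ("browse", "SELECT name FROM item JOIN inventory ON item.id = inventory.item_id"),
    ("signon", "SELECT * FROM account WHERE user_id = 'u1'"),
]
sharing = table_sharing(traced_queries)
```

Any table mapped to more than one request type (here `inventory`) marks request types that are coupled through shared state even though they never call each other.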

17 Failure Detection and Diagnosis
Detecting application-level failures
  Key idea: paths change under failures => detect failures via path changes.
Diagnosing failures
  Key idea: bad paths touch root cause(s). Find common features.

18 Future Directions
Key idea: violations of macro invariants are signs of buggy implementation or intrusion
Message paths in P2P and sensor networks
  a general mechanism to provide visibility into the collective behavior of multiple nodes
  micro or static approaches by themselves don't work well in dynamic, distributed settings
  e.g. algorithms have upper bounds on the # of hops
    Although hop count violations can be detected locally, paths help identify nodes that route messages incorrectly
  e.g. detecting nodes that are slow or corrupt msgs

19 Conclusion
Macro analysis fills the need when monitoring and debugging systems where local context is of insufficient use
Runtime path-based approach dynamically traces request paths and statistically infers macro properties
A shared analysis framework that is reusable across many systems
  Simplifies the construction of effective tools for other systems and the integration with recovery techniques like RR
Paper includes a commercial example from Tellme! (thanks to Anthony Accardi and Mark Verber)

20 Backup Slides

22 Current Approach
Micro analysis tools, like code-level debuggers (e.g. gdb) and application logs, offer details of each individual component
[Figure: gdb inspecting variables (X, Y) in individual Apache, Java Bean, and Database components]

23 Related Work
Commercial request tracing systems
  Announced in 2002, a few months after Pinpoint was developed
  PerformaSure and AppAssure focus on performance problems.
  IntegriTea captures and plays back failure conditions.
  Focus on individual requests rather than overall behavior, and on recreating the failure condition.
Extensive work in event/alarm correlation, mostly in the context of network management (i.e. IP)
  Don't directly capture relationship between events
  Rely on human knowledge or use machine learning to suppress alarms.
Distributed debuggers
  PDT, P2D2, TotalView, PRISM, pdbx
  Aggregate views from multiple components, but do not capture relationship and interaction between components
  Comparative debuggers: Wizard, GUARD
Dependency models
  Most are statically generated and are likely to be inconsistent.
  Brown et al. take an active, black box approach, but it is invasive.
  Candea et al. dynamically trace failure propagation.

24 1. Detecting Failures using Anomaly Detection
Key idea: paths change under failures => detect failures via path changes
Anomalies
  Unusual paths
  Changes in distribution
  Changes in latency/response time
Examples:
  Error paths are shorter.
  User behavior changes under failures
    Retries a few times then gives up
Implement as long-running queries (i.e. diff)
Challenges:
  detecting application-level failures
  comparing sets of paths
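A "diff" between two sets of paths could look something like the sketch below. It compares the distribution of path shapes in a baseline window against a current window using total variation distance. That metric is a choice made for this example and is not prescribed by the slides; both windows are assumed to be non-empty.

```python
from collections import Counter

def shape_distribution(paths):
    """Relative frequency of each path shape (the tuple of components visited)."""
    counts = Counter(tuple(path) for path in paths)
    total = sum(counts.values())
    return {shape: n / total for shape, n in counts.items()}

def path_diff(baseline_paths, current_paths):
    """Return (shapes unseen in the baseline, total variation distance)."""
    base = shape_distribution(baseline_paths)
    cur = shape_distribution(current_paths)
    unseen = set(cur) - set(base)
    shapes = set(base) | set(cur)
    distance = 0.5 * sum(abs(base.get(s, 0.0) - cur.get(s, 0.0)) for s in shapes)
    return unseen, distance

baseline = [["A", "B", "C"]] * 9 + [["A", "D"]]
current = [["A", "B", "C"]] * 5 + [["A"]] * 5   # many short error paths
unseen, distance = path_diff(baseline, current)
```

A long-running query would evaluate `path_diff` on a sliding window and raise an alert when unseen shapes appear or the distance crosses a threshold tuned for the system.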

25 2. Root-cause Analysis
Key idea: all bad paths touch root cause, find common features
Challenge: a small set of known bad paths and a large set of maybes
Ideally want to correlate and rank all combinations of feature sets
  E.g. association rules mining
May get false alarms because the root cause may not be one of the features
Automatic generation of dynamic functional and state dependency graphs
  Helps developers and operators understand inter-component dependency and inter-request dependency
  Input to recovery algorithms that use dependency graphs
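As a much simpler stand-in for association rule mining, the sketch below ranks single components by the fraction of requests using them that failed. It considers only one feature (component name) and no feature combinations; the `(components, failed)` input format is an assumption for the example.

```python
from collections import Counter

def rank_components(paths):
    """Rank components by the fraction of the requests using them that failed.

    Each item in `paths` is a hypothetical (components, failed) pair.
    """
    used = Counter()
    failed_with = Counter()
    for components, failed in paths:
        for component in set(components):
            used[component] += 1
            if failed:
                failed_with[component] += 1
    scores = {c: failed_with[c] / used[c] for c in used}
    # Highest failure fraction first; ties broken by name for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

paths = [
    (["A", "B"], True),
    (["A", "C"], False),
    (["B", "C"], True),
    (["A", "C"], False),
]
ranking = rank_components(paths)
```

Here `B` ranks first because every request that touched it failed, while `A` and `C` also appear in successful requests.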

26 3. Verifying Macro Invariants
Key idea: violations of high-level invariants are signs of intrusion or bugs
Example: Peer auditing
  Problem: A small number of faulty or malicious nodes can bring down the system
  Corruption should be statistically visible in your behavior
    look for nodes that delay or corrupt messages or route messages incorrectly
  Apply root-cause analysis to locate the misbehaving peers
  Some distributed auditing is necessary
Example: P2P implementation verification
  Problem: are messages delivered as specified by the algorithms?
  Detect extra hops, loops, and verify that the paths are correct
  Can implement as a query:
    select length from paths where (length > log2(n))
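The query on this slide can be approximated in a few lines of Python. The sketch assumes that `length` means hop count (nodes visited minus one) and that `n` is the number of nodes in the system; both function names are hypothetical.

```python
import math

def hop_bound_violations(paths, n):
    """Paths whose hop count exceeds log2(n), where n is the number of nodes."""
    bound = math.log2(n)
    # Assumption: a path's length is its hop count, i.e. nodes visited minus one.
    return [path for path in paths if len(path) - 1 > bound]

def has_loop(path):
    """A path that revisits a node contains a routing loop."""
    return len(set(path)) < len(path)

paths = [
    ["n1", "n2", "n3"],                     # 2 hops, within the bound
    ["n1", "n2", "n3", "n4", "n5", "n6"],   # 5 hops, exceeds log2(16) = 4
    ["n1", "n2", "n1", "n3"],               # revisits n1
]
violations = hop_bound_violations(paths, n=16)
looping = [path for path in paths if has_loop(path)]
```

Because each violating path lists the nodes it traversed, the result also points at the nodes that forwarded the message, which a purely local hop counter cannot do.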

27 4. Detecting Single Point of Failure
Key idea: paths converge on a single point of failure
Useful for finding out what to replicate to improve availability
P2P example:
  Many P2P systems rely on overlay networks, which typically are networks built on top of the IP infrastructure.
  It's common for several overlay links to fail together if they depend on a shared physical IP link that failed
Implement as a query:
  intersect edge.ip_links from paths
[Figure: overlay network. Labels: Sensor, Peer A through Peer G]
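The `intersect` query might be sketched as below. The data model is an assumption for illustration: each path is a list of overlay edges, and each edge is represented by the set of physical IP links it depends on.

```python
def shared_ip_links(paths):
    """Intersect the sets of physical IP links underlying each path.

    Each path is a list of overlay edges; each edge is the set of IP links
    it runs over (a hypothetical representation of edge.ip_links).
    """
    link_sets = [set().union(*path) for path in paths]
    return set.intersection(*link_sets) if link_sets else set()

paths = [
    [{"ip1", "ip2"}, {"ip3"}],
    [{"ip2"}, {"ip4"}],
    [{"ip5", "ip2"}],
]
single_points = shared_ip_links(paths)
```

Any link left in the result is used by every observed path, so its failure would break all of them at once.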

28 5. Monitoring of Sensor Networks
An emerging area with primitive tools
Key idea: use paths to reconstruct topology and membership
Example:
  Membership: select unique node from paths
  Network topology for directed information dissemination
Challenge: limited bandwidth
  Can record a (random) subset of the nodes for each path, then statistically reconstruct the paths
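The membership query and the sampling idea can be combined in a small sketch. It only recovers membership from sampled paths, not full topology, and the sample size `k`, the route, and the function names are all assumptions for the example.

```python
import random

def sample_path(path, k, rng):
    """Record only k randomly chosen nodes of a path, keeping their order."""
    indexes = sorted(rng.sample(range(len(path)), min(k, len(path))))
    return [path[i] for i in indexes]

def membership(recorded_paths):
    """Equivalent of `select unique node from paths`."""
    return {node for path in recorded_paths for node in path}

rng = random.Random(0)  # fixed seed so the example is repeatable
route = ["s1", "s2", "s3", "s4", "s5"]
# Each message carries only 2 of the 5 node names to save bandwidth.
recorded = [sample_path(route, 2, rng) for _ in range(200)]
members = membership(recorded)
```

Each message carries a fixed, small number of node names, and nodes missing from one sample tend to show up in others as more messages are observed.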

29 Macro Analysis
Look across many requests to get the overall system behavior
  more robust to noise
[Figure: matrix marking which of Components A, B, and C were used by each of Requests 1 to 4]

30 Properties of Network Systems
Web services, P2P systems, and sensor networks can have tens of thousands of nodes, each running many application components
Continuous adaptation provides high availability, but also makes it difficult to reproduce and debug errors
Constant evolution of software and hardware

31 Motivation
Difficult to understand and debug network systems
  e.g. clustered Internet systems, P2P systems, and sensor networks
  Composed of many components
  Systems are becoming larger, more dynamic, and more distributed
Workload is unpredictable and impractical to simulate
  Unit testing is necessary but insufficient. Components break when used together under real workload
Don't have tools that capture the interactions between components and the overall behavior
Existing debugging tools and application-level logs only do micro analysis

32 Macro vs Micro Analysis
                Resolution                                      Overhead
Macro Analysis  Component. Complements micro analysis tools.    Low. Can use it in actual deployment.
Micro Analysis  Line or variable                                High. Typically not used in deployment other than application logs.

33 What's a dynamic path?
A dynamic path is the (control flow + runtime properties) of a request
  Think of it as a stack trace across process/machine boundaries with runtime properties
  Dynamically constructed by tracing requests through a system
Runtime properties
  Resources (e.g. host, version)
  Performance properties (e.g. latency)
  Arguments (e.g. URL, args, SQL statement)
  Success/failure
[Figure: a request enters a system of components A to F and follows the path A, D, E; each observation carries fields such as RequestID: 1, Seq Num: 1, Name: A, Host: xx, Latency: 10ms, Success: true]
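The record shown in the figure suggests a data structure along these lines. The field names follow the slide; the class names, the helper methods, and the latencies for D and E are assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Observation:
    request_id: int
    seq_num: int
    name: str          # component name
    host: str
    latency_ms: float
    success: bool

@dataclass
class Path:
    request_id: int
    observations: List[Observation] = field(default_factory=list)

    def components(self):
        """Component names in the order the request visited them."""
        ordered = sorted(self.observations, key=lambda o: o.seq_num)
        return [o.name for o in ordered]

    def succeeded(self):
        """A path counts as successful only if every component reported success."""
        return all(o.success for o in self.observations)

# The slide's example path A, D, E for request 1; host "xx" is the slide's placeholder.
path = Path(1, [
    Observation(1, 2, "D", "xx", 4.0, True),
    Observation(1, 1, "A", "xx", 10.0, True),
    Observation(1, 3, "E", "xx", 2.5, False),
])
```

Keeping the runtime properties on each observation, rather than on the path as a whole, is what allows later queries to group by host, version, or component.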

34 Related Work
Micro debugging tools
  RootCause provides extensible logging of method calls and arguments.
  Diduce looks for inconsistencies in variable usage.
  Complements macro analysis tools.
Languages for monitoring
  InfoSpect looks for inconsistencies in system state using a logic language
Network flow-based monitoring
  RTFM and Cisco NetFlow classify and record network flows
Statistical and data mining languages
  S, DMQL, WebML

35 Visualization Techniques
Tainted paths: mark all flows that have a certain property (e.g. failed or slow) with a distinct color and overlay them on the graph
Detecting performance bottlenecks: look for replicated nodes that have different colors
Detecting anomalies: look for missing edges and unknown paths

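36 Pinpoint Framework
[Figure: requests #1, #2, #3 enter components A, B, C, which sit on a communications layer (tracing & internal F/D); external F/D logs request outcomes (1, success; 2, fail; 3, ...) and the communications layer logs component use (1,A; 1,C; 2,B; ...); statistical analysis of the logs produces detected faults]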

37 Experimental Setup
Demo app: J2EE Pet Store
  e-commerce site w/ ~30 components
Load generator
  replays trace of browsing
  Approx. TPC-W WIPSo load (~50% ordering)
Fault injection parameters
  Trigger faults based on combinations of used components
  Inject exceptions, infinite loops, null calls
55 tests with single-component faults and interaction faults
  5-min runs of a single client (J2EE server limitation)

38 Application Observations
# of components used in a dynamic web page request: median 14, min 6, max 23
Large number of tightly coupled components that are always used together
[Figure: cumulative % of total requests vs. # of components; # of tightly coupled components for cart, productitemlist, InventoryEJB, SignOnEJB, ClientControllerEJB, ShoppingCartEJB]

39 Metrics
Predicted Faults (P), Correctly Identified Faults (C), Actual Faults (A)
Precision: C/P
Recall: C/A
Accuracy: whether all actual faults are correctly identified (recall == 100%)
  boolean measure
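These three metrics translate directly into set operations. In the sketch below, faults are identified by component name, and the handling of empty sets (precision 0.0 with no predictions, recall 1.0 with no actual faults) is a convention chosen for the example, not something the slides specify.

```python
def evaluate(predicted, actual):
    """Return (precision, recall, accurate) for one fault-injection test."""
    predicted, actual = set(predicted), set(actual)
    correct = predicted & actual                          # C
    precision = len(correct) / len(predicted) if predicted else 0.0   # C/P
    recall = len(correct) / len(actual) if actual else 1.0            # C/A
    accurate = recall == 1.0                              # boolean measure
    return precision, recall, accurate

# Four components blamed, two of them actually faulty.
precision, recall, accurate = evaluate({"A", "B", "C", "D"}, {"A", "B"})
```

The example shows why accuracy alone is not enough: blaming many components keeps recall at 100% while precision drops.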

40 4 Analysis Techniques
Pinpoint: clusters of components that statistically correlate with failures
Detection: components where Java exceptions were detected
  union across all failed requests
  similar to what an event monitoring system outputs
Intersection: intersection of components used in failed requests
Union: union of all components used in failed requests
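The two simplest baselines, Intersection and Union, can be sketched as below (Pinpoint's clustering and the Detection technique are not shown). The `(components, succeeded)` request format and the function names are assumptions for the example.

```python
def failed_component_sets(requests):
    """Component sets of the failed requests; each request is (components, succeeded)."""
    return [set(components) for components, succeeded in requests if not succeeded]

def intersection_technique(requests):
    """Components used by every failed request."""
    failed = failed_component_sets(requests)
    return set.intersection(*failed) if failed else set()

def union_technique(requests):
    """Components used by any failed request."""
    return set().union(*failed_component_sets(requests))

requests = [
    (["A", "B"], False),
    (["B", "C"], False),
    (["A", "C"], True),
]
common = intersection_technique(requests)
anywhere = union_technique(requests)
```

Union never misses a faulty component that was exercised by a failed request, but it blames many innocent ones; Intersection is more selective and can miss faults when failed requests do not all share the faulty component.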

41 Results
[Figure: bar chart of average accuracy and average precision for Pinpoint, Detection, Intersection, and Union (values as extracted: 76%, 77%, 50%, 33%, 83%, 15%, 100%, 11%); chart of average accuracy vs. # of interacting components for the same four techniques]
Pinpoint has high accuracy with relatively high precision

42 Pinpoint Prototype Limitations
Assumptions
  client requests provide good coverage over components and combinations
  requests are autonomous (don't corrupt state and cause later requests to fail)
Currently can't detect the following:
  faults that only degrade performance
  faults due to pathological inputs
Single-node only

43 Current Status Simple graph visualization

44 Proposed Research
3 classes of large network systems
  Clustered Internet systems
    Tiered architecture, high bandwidth, many replicas
  Peer-to-peer (P2P) systems, including sensor networks
    Widely distributed nodes, dynamic membership
  Sensor networks
    Limited storage, processing, and bandwidth.

45 P2P Systems: Tracing
Trace messages by piggybacking the current node name onto the messages
Tracing overhead
  Assume 32 bits per node name and a very conservative log2(n) hops for each msg
  Data overhead is 40% for a 1500-byte message in a node system

46 P2P Systems: Implementation Verification
Current debugging techniques: lots of printf()s on each node and manually correlating the paths taken by messages
How do you know the messages are delivered as specified by the algorithms?
Use message paths to check for routing invariants
  detect extra hops, loops, and verify that the paths are correct
  Can implement as a query:
    select length from paths where (length > log2(n))


More information

Basic Concepts. Network Management. Spring Bahador Bakhshi CE & IT Department, Amirkabir University of Technology

Basic Concepts. Network Management. Spring Bahador Bakhshi CE & IT Department, Amirkabir University of Technology Basic Concepts Network Management Spring 2018 Bahador Bakhshi CE & IT Department, Amirkabir University of Technology This presentation is based on the slides listed in references. Outline Introduction:

More information

Self-defending software: Automatically patching errors in deployed software

Self-defending software: Automatically patching errors in deployed software Self-defending software: Automatically patching errors in deployed software Michael Ernst University of Washington Joint work with: Saman Amarasinghe, Jonathan Bachrach, Michael Carbin, Sung Kim, Samuel

More information
