Connecting the Dots: Using Runtime Paths for Macro Analysis
|
|
- Arron Henderson
- 5 years ago
- Views:
Transcription
1 Connecting the Dots: Using Runtime Paths for Macro Analysis Mike Chen
2 Motivation Divide and conquer, layering, and replication are fundamental design principles e.g. Internet systems, P2P systems, and sensor networks Execution context is dispersed throughout the system => difficult to monitor and debug Lots of existing low-level tools that help with debugging individual components, but not a collection of them Much of the system is in how the components are put together Observation: a widening gap between the systems we are building and the tools we have
3 Current Approach Apache Sensor Peer Apache Sensor Peer Java Sensor Peer Bean Java Sensor Peer Bean Java Sensor Peer Bean Database Sensor Peer Database Sensor Peer
4 Current Approach Micro analysis tools, like code-level debuggers (e.g. gdb) and application logs, offers details of each individual component Scenario A user reports request A1 failed You try the same request, A2, but it works fine What to do next? gdb Apache A1 X Java = 1 X Java = 3 Y Bean = 2 Y Bean = 1 Database A2 Apache X Java = 3 Y Bean = 2 Database
5 Macro Analysis Macro analysis exploits non-local context to improve reliability and performance Performance examples: Scout, ILP, Magpie Statistical view is essential for large, complex systems Analogy: micro analysis allows you to understand the details of individual honeybee; macro analysis is needed to understand how the bees interact to keep the beehive functioning
6 Observation Systems have a single system-wide execution paths associated with each request E.g. request/response, one-way messages Scout, SEDA, Ninja use paths to specify how to service requests Our philosophy Use only dynamic, observed behavior Application-independent techniques
7 Our Approach Use runtime paths to connect the dots! dynamically captures the interactions and dependency between components look across many requests to get the overall system behavior more robust to noise Components are only partially known ( gray boxes ) paths Java Bean Apache Database Java Bean Apache Database Java Bean path
8 Our Approach Applicable to a wide range of systems. Apache Sensor Peer A Apache Sensor Peer B paths Java Sensor Peer C Bean Java Sensor Peer D Bean Java Sensor Peer E Bean Java Database Sensor Peer F Bean Java Database Sensor Peer G Bean
9 Open Challenges in Systems Today 1. Deducing system structure manual approach is error-prone static analysis doesn t consider resources 2. Detecting application-level failures often don t exhibit lower-level symptoms 3. Diagnosing failures failures may manifest far from the actual faults multi-component faults Goal: reduce time to detection, recovery, diagnosis, and repair
10 Talk Outline Motivation Model and architecture Applying macro analysis Future directions
11 Runtime Paths Instrument code to dynamically trace requests through a system at the component level record call path + the runtime properties e.g. components, latency, success/failure, and resources used to service each request Use statistical analysis detect and diagnose problems e.g. data mining, machine learning, etc. Runtime analysis tells you how the system is actually being used, not how it may be used Complements existing micro analysis tools
12 Architecture Tracer Tags each request with a unique ID, and carries it with the request throughout the system Report observations (component name + resource + performance properties) for each component Aggregator + Repository Reconstructs paths and stores them Declarative Query Engine Supports statistical queries on paths Data mining and machine learning routines Visualization request Tracer A Tracer C Tracer E Tracer B Tracer D Tracer F Reusable Analysis Framework Aggregator Developers/ Operators Visualization Query Engine Path Repository
13 Request Tracing Challenge: maintaining an ID with each request throughout the system Tracing is platform-specific but can be applicationgeneric and reusable across applications 2 classes of techniques Intra-thread tracing Use per-thread context to store request ID (e.g. ThreadLocal in Java) ID is preserved if the same thread is used to service the request Inter-thread tracing For extensible protocols like HTTP, inject new headers that will be preserved (e.g. REQ_ID: xx) Modify RPC to pass request ID under the cover Piggyback onto messages
14 Talk Outline Motivation Model and architecture Applying macro analysis Inferring system structure Detection application-level failures Diagnosing failures Future directions
15 Inferring System Structure Key idea: paths directly capture application structure 2 requests
16 Indirect Coupling of Requests Key idea: paths associate requests with internal state Trace requests from web server to database Parse client-side SQL queries to get sharing of db tables Straightforward to extend to more fine-grained state (e.g. rows) Request types Database tables
17 Failure Detection and Diagnose Detecting application-level failures Key idea: paths change under failures => detect failures via path changes. Diagnosing failures Key idea: bad paths touch root cause(s). Find common features.
18 Future Directions Key idea: violation of macro invariants are signs of buggy implementation or intrusion Message paths in P2P and sensor networks a general mechanism to provide visibility into the collective behavior of multiple nodes micro or static approaches by themselves don t work well in dynamic, distributed settings e.g. algorithms have upper bounds on the # of hops Although hop count violation can be detected locally, paths help identify nodes that route messages incorrectly e.g. detecting nodes that are slow or corrupt msgs
19 Conclusion Macro analysis fills the need when monitoring and debugging systems where local context is of insufficient use Runtime path-based approach dynamically traces request paths and statistically infer macro properties A shared analysis framework that is reusable across many systems Simplifies the construction of effective tools for other systems and the integration with recovery techniques like RR Paper includes a commercial example from Tellme! (thanks to Anthony Accardi and Mark Verber)
20 Backup Slides
21 Backup Slides
22 Current Approach Micro analysis tools, like code-level debuggers (e.g. gdb) and application logs, offers details of each individual component gdb Java Apache X = 1 Y Bean = 2 Java Apache X = 2 Y Bean = 4 X Java = 1 Y Bean = 2 X Java = 5 Y Bean = 2 X Java = 3 Y Bean = 2 Java Database X = 2 Y Bean = 3 Java Database X = 7 Y Bean = 1
23 Related Work Commercial request tracing systems Announced in 2002, a few months after Pinpoint was developed PerformaSure and AppAssure focus on performance problems. IntegriTea captures and playback failure conditions. Focus on individual requests rather than overall behavior, and on recreating the failure condition. Extensive work in event/alarm correlation, mostly in the context of network management (i.e. IP) Don t directly capture relationship between events Rely on human knowledge or use machine learning to suppress alarms. Distributed debuggers PDT, P2D2, TotalView, PRISM, pdbx Aggregates views from multiple components, but do not capture relationship and interaction between components Comparative debuggers: Wizard, GUARD Dependency models Most are statically generated and are likely to be inconsistent. Brown et al. takes an active, black box approach but is invasive. Candea et al. dynamically trace failures propagation.
24 1. Detecting Failures using Anomaly Detection Key idea: paths change under failures => detect failures via path changes Anomalies Unusual paths Changes in distribution Changes in latency/response time Examples: Error paths are shorter. User behavior changes under failures Retries a few times then give up Implement as long running queries (i.e. diff) Challenges: detecting application-level failures comparing sets of paths
25 2. Root-cause Analysis Key idea: all bad paths touch root cause, find common features Challenge: a small set of known bad paths and a large set of maybes Ideally want to correlate and rank all combinations of feature sets E.g. association rules mining May get false alarms because the root cause may not be one of the features Automatic generation of dynamic functional and state dependency graphs Helps developers and operators understand inter-component dependency and inter-request dependency Input to recovery algorithms that use dependency graphs
26 3. Verifying Macro Invariants Key idea: violations of high-level invariants are signs of intrusion or bugs Example: Peer auditing Problem: A small number of faulty or malicious nodes can bring down the system Corruption should be statistically visible in your behavior look for nodes that delay or corrupt messages or route messages incorrectly Apply root-cause analysis to locate the misbehaving peers Some distributed auditing is necessary Example: P2P implementation verification Problem: are messages delivered as specified by the algorithms? Detect extra hops, loops, and verify that the paths are correct Can implement as a query: select length from paths where (length > log2(n))
27 4. Detecting Single Point of Failure Key idea: paths converge on a single-point of failure Useful for finding out what to replicate to improve availability P2P example: Many P2P systems rely on overlay networks, which typically are networks built on top of the IP infrastructure. It s common for several overlay links to fail together if they depend on a shared physical IP link that failed Implement as a query: intersect edge.ip_links from paths Sensor Peer A Sensor Peer C Sensor Peer D Sensor Peer F Sensor Peer B Sensor Peer E Sensor Peer G
28 5. Monitoring of Sensor Networks An emerging area with primitive tools Key idea: use paths to reconstruct topology and membership Example: Membership select unique node from paths Network topology for directed information dissemination Challenge: limited bandwidth Can record a (random) subset of the nodes for each path, then statistically reconstruct the paths
29 Macro Analysis Look across many requests to get the overall system behavior more robust to noise Macro Analysis Request 1 Request 2 Request 3 Request 4 Component A X X X Component B X Component C X X
30 Properties of Network Systems Web services, P2P systems, and sensor networks can have tens of thousands of nodes each running many application components Continuous adaptation provides high availability, but also makes it difficult to reproduce and debug errors Constant evolution of software and hardware
31 Motivation Difficult to understand and debug network systems e.g. Clustered Internet systems, P2P systems and sensor networks Composed of many components Systems are becoming larger, more dynamic, and more distributed Workload is unpredictable and impractical to simulate Unit testing is necessary but insufficient. Components break when used together under real workload Don t have tools that capture the interactions between components and the overall behavior Existing debugging tools and application-level logs only do micro analysis
32 Macro vs Micro Analysis Resolution Overhead Macro Analysis Component. Complements micro analysis tools. Low. Can use it in actual deployment. Micro Analysis Line or variable High. Typically not used in deployment other than application logs.
33 What s a dynamic path? A dynamic path is the (control flow + runtime properties) of a request Think of it as a stack trace across process/machine boundaries with runtime properties Dynamically constructed by tracing requests through a system Runtime properties Resources (e.g. host, version) Performance properties (e.g. latency) Arguments (e.g. URL, args, SQL statement) Success/failure request A C E B D F Path A D E RequestID: 1 Seq Num: 1 Name: A Host: xx Latency: 10ms Success: true..
34 Related Work Micro debugging tools RootCause provides extensible logging of method calls and arguments. Diduce look for inconsistencies in variable usage. Complements macro analysis tools. Languages for monitoring InfoSpect looks for inconsistencies in system state using a logic language Network flow-based monitoring RTFM and Cisco NetFlow classify and record network flows Statistical and data mining languages S, DMQL, WebML
35 Visualization Techniques Tainted paths: mark all flows that have a certain property (e.g. failed or slow) with a distinct color and overlay it on the graph Detecting performance bottlenecks: look for replicated nodes that have different colors Detecting anomaly: look for missing edges and unknown paths
36 Pinpoint Framework Requests #1 #2 #3 External F/D Components A B C Communications Layer (Tracing & Internal F/D) Logs 1, success 2, fail 3,... 1,A 1,C 2,B.. Statistical Analysis Detected Faults
37 Experimental Setup Demo app: J2EE Pet Store e-commerce site w/~30 components Load generator replay trace of browsing Approx. TPCW WIPSo load (~50% ordering) Fault injection parameters Trigger faults based on combinations of used components Inject exceptions, infinite loops, null calls 55 tests with single-components faults and interaction faults 5-min runs of a single client (J2EE server limitation)
38 Application Observations # of components used in a dynamic web page request: median 14, min 6, max 23 large number of tightly coupled components that are always used together # of tightly coupled components cart productitemlist InventoryEJB SignOnEJB ClientControllerEJB ShoppingCartEJB 100% 80% 60% 40% 20% 0% # of components Cumulative % of total requests
39 Metrics Predicted Faults (P) Correctly Identified Faults (C) Actual Faults (A) Precision: C/P Recall: C/A Accuracy: whether all actual faults are correctly identified (recall == 100%) boolean measure
40 4 Analysis Techniques Pinpoint: clusters of components that statistically correlate with failures Detection: components where Java exceptions were detected union across all failed requests similar to what an event monitoring system outputs Intersection: intersection of components used in failed requests Union: union of all components used in failed requests
41 Results 100% 80% 60% 40% 20% average accuracy average precision 76% 77% 50% 33% 83% 15% 100% 11% Average Accuracy 100% 80% 60% 40% 20% Pinpoint Detection Intersection Union 0% Pinpoint Detection Intersection Union 0% # of Interacting Components Pinpoint has high accuracy with relatively high precision
42 Pinpoint Prototype Limitations Assumptions client requests provide good coverage over components and combinations requests are autonomous (don t corrupt state and cause later requests to fail) Currently can t detect the following: faults that only degrade performance faults due to pathological inputs Single-node only
43 Current Status Simple graph visualization
44 Proposed Research 3 classes of large network systems Clustered Internet systems Tiered architecture, high bandwidth, many replicas Peer-to-peer (P2P) systems, including sensor networks Widely distributed nodes, dynamic membership Sensor networks Limited storage, processing, and bandwidth.
45 P2P Systems: Tracing Trace messages by piggybacking the current node name to the messages Tracing overhead Assume 32-bit per node name and a very conservative log2(n) hops for each msg and Data overhead is 40% for a 1500-byte message in a node system
46 P2P Systems: Implementation Verification Current debugging techniques: lots of printf() s on each node and manually correlate the paths taken by messages How do you know the messages are delivered as specified by the algorithms? Use message paths to check for routing invariants detect extra hops, loops, and verify that the paths are correct Can implement as a query: select length from paths where (length > log2(n))
Path-based Macroanalysis for Large Distributed Systems
Path-based Macroanalysis for Large Distributed Systems 1, Anthony Accardi 2, Emre Kiciman 3 Dave Patterson 1, Armando Fox 3, Eric Brewer 1 mikechen@cs.berkeley.edu UC Berkeley 1, Tellme Networks 2, Stanford
More informationUsing Runtime Paths for Macro Analysis
Using Runtime Paths for Macro Analysis Mike Chen, Emre Kıcıman, Anthony Accardi, Armando Fox, Eric Brewer {mikechen, brewer}@cs.berkeley.edu, {emrek, fox}@cs.stanford.edu, anthony@tellme.com Abstract We
More informationUsing Runtime Paths for Macroanalysis
Using Runtime Paths for Macroanalysis Mike Chen, Emre Kıcıman, Anthony Accardi, Armando Fox, Eric Brewer UC Berkeley, Stanford University, Tellme Networks {mikechen, brewer}@cs.berkeley.edu, {emrek, fox}@cs.stanford.edu,
More informationProfiling and diagnosing large-scale decentralized systems David Oppenheimer
Profiling and diagnosing large-scale decentralized systems David Oppenheimer ROC Retreat Thursday, June 5, 2003 1 Why focus on P2P systems? There are a few real ones file trading, backup, IM Look a lot
More informationDistributed Diagnosis of Failures in a Three Tier E-Commerce System. Motivation
Distributed Diagnosis of Failures in a Three Tier E-ommerce System Gunjan Khanna, Ignacio Laguna, Fahad A. Arshad, and Saurabh agchi Dependable omputing Systems Lab (DSL) School of Electrical and omputer
More informationOn the Way to ROC-2 (JAGR: JBoss + App-Generic Recovery)
On the Way to ROC-2 (JAGR: JBoss + App-Generic Recovery) George Candea and many others Stanford + Berkeley Outline J2EE and JBoss JBoss Additions JAGR µreboots Automatic Failure-Path Inference Self-Recovery
More informationAn Introduction to Software Architecture. David Garlan & Mary Shaw 94
An Introduction to Software Architecture David Garlan & Mary Shaw 94 Motivation Motivation An increase in (system) size and complexity structural issues communication (type, protocol) synchronization data
More informationDecentralized systems need decentralized benchmarks
Decentralized systems need decentralized benchmarks David Oppenheimer, David A. Patterson, and Joseph M. Hellerstein University of California at Berkeley, EECS Computer Science Division {davidopp,patterson,jmh}@cs.berkeley.edu
More informationPerformance Debugging for Distributed Systems of Black Boxes. Yinghua Wu, Haiyong Xie
Performance Debugging for Distributed Systems of Black Boxes Yinghua Wu, Haiyong Xie Outline Problem Problem statement & goals Overview of our approach Algorithms The nesting algorithm (RPC) The convolution
More informationSEER: LEVERAGING BIG DATA TO NAVIGATE THE COMPLEXITY OF PERFORMANCE DEBUGGING IN CLOUD MICROSERVICES
SEER: LEVERAGING BIG DATA TO NAVIGATE THE COMPLEXITY OF PERFORMANCE DEBUGGING IN CLOUD MICROSERVICES Yu Gan, Yanqi Zhang, Kelvin Hu, Dailun Cheng, Yuan He, Meghna Pancholi, and Christina Delimitrou Cornell
More informationProceedings of the First Symposium on Networked Systems Design and Implementation
USENIX Association Proceedings of the First Symposium on Networked Systems Design and Implementation San Francisco, CA, USA March 29 31, 2004 2004 by The USENIX Association All Rights Reserved For more
More informationHow To Keep Your Head Above Water While Detecting Errors
USENIX / ACM / IFIP 10th International Middleware Conference How To Keep Your Head Above Water While Detecting Errors Ignacio Laguna, Fahad A. Arshad, David M. Grothe, Saurabh Bagchi Dependable Computing
More informationGDB Tutorial. A Walkthrough with Examples. CMSC Spring Last modified March 22, GDB Tutorial
A Walkthrough with Examples CMSC 212 - Spring 2009 Last modified March 22, 2009 What is gdb? GNU Debugger A debugger for several languages, including C and C++ It allows you to inspect what the program
More informationSaaS Providers. ThousandEyes for. Summary
USE CASE ThousandEyes for SaaS Providers Summary With Software-as-a-Service (SaaS) applications rapidly replacing onpremise solutions, the onus of ensuring a great user experience for these applications
More informationPath-Based Failure and Evolution Management
Path-Based Failure and Evolution Management Mike Y. Chen, Anthony Accardi, Emre Kıcıman, Jim Lloyd, Dave Patterson, Armando Fox, Eric Brewer UC Berkeley, Tellme Networks, Stanford University, ebay Inc.
More informationClearSpeed Visual Profiler
ClearSpeed Visual Profiler Copyright 2007 ClearSpeed Technology plc. All rights reserved. 12 November 2007 www.clearspeed.com 1 Profiling Application Code Why use a profiler? Program analysis tools are
More informationOracle Database 11g: Real Application Testing & Manageability Overview
Oracle Database 11g: Real Application Testing & Manageability Overview Top 3 DBA Activities Performance Management Challenge: Sustain Optimal Performance Change Management Challenge: Preserve Order amid
More informationThousandEyes for. Application Delivery White Paper
ThousandEyes for Application Delivery White Paper White Paper Summary The rise of mobile applications, the shift from on-premises to Software-as-a-Service (SaaS), and the reliance on third-party services
More informationRouting Outline. EECS 122, Lecture 15
Fall & Walrand Lecture 5 Outline EECS, Lecture 5 Kevin Fall kfall@cs.berkeley.edu Jean Walrand wlr@eecs.berkeley.edu Definition/Key Questions Distance Vector Link State Comparison Variations EECS - Fall
More information1 Dulcian, Inc., 2001 All rights reserved. Oracle9i Data Warehouse Review. Agenda
Agenda Oracle9i Warehouse Review Dulcian, Inc. Oracle9i Server OLAP Server Analytical SQL Mining ETL Infrastructure 9i Warehouse Builder Oracle 9i Server Overview E-Business Intelligence Platform 9i Server:
More informationThe Design and Implementation of a Next Generation Name Service for the Internet (CoDoNS) Presented By: Kamalakar Kambhatla
The Design and Implementation of a Next Generation Name Service for the Internet (CoDoNS) Venugopalan Ramasubramanian Emin Gün Sirer Presented By: Kamalakar Kambhatla * Slides adapted from the paper -
More informationIQ for DNA. Interactive Query for Dynamic Network Analytics. Haoyu Song. HUAWEI TECHNOLOGIES Co., Ltd.
IQ for DNA Interactive Query for Dynamic Network Analytics Haoyu Song www.huawei.com Motivation Service Provider s pain point Lack of real-time and full visibility of networks, so the network monitoring
More informationSelf Checking Network Protocols: A Monitor Based Approach
Self Checking Network Protocols: A Monitor Based Approach Gunjan Khanna, Padma Varadharajan, Saurabh Bagchi Dependable Computing Systems Lab School of Electrical and Computer Engineering Purdue University
More informationMethod-Level Phase Behavior in Java Workloads
Method-Level Phase Behavior in Java Workloads Andy Georges, Dries Buytaert, Lieven Eeckhout and Koen De Bosschere Ghent University Presented by Bruno Dufour dufour@cs.rutgers.edu Rutgers University DCS
More informationFailure models. Byzantine Fault Tolerance. What can go wrong? Paxos is fail-stop tolerant. BFT model. BFT replication 5/25/18
Failure models Byzantine Fault Tolerance Fail-stop: nodes either execute the protocol correctly or just stop Byzantine failures: nodes can behave in any arbitrary way Send illegal messages, try to trick
More informationSteps for project success. git status. Milestones. Deliverables. Homework 1 submitted Homework 2 will be posted October 26.
git status Steps for project success Homework 1 submitted Homework 2 will be posted October 26 due November 16, 9AM Projects underway project status check-in meetings November 9 System-building project
More informationNon Intrusive Detection & Diagnosis of Failures in High Throughput Distributed Applications. DCSL Research Thrusts
Non Intrusive Detection & Diagnosis of Failures in High Throughput Distributed Applications Saurabh Bagchi Dependable Computing Systems Lab (DCSL) & The Center for Education and Research in Information
More informationCopyright 2018, Oracle and/or its affiliates. All rights reserved.
Beyond SQL Tuning: Insider's Guide to Maximizing SQL Performance Monday, Oct 22 10:30 a.m. - 11:15 a.m. Marriott Marquis (Golden Gate Level) - Golden Gate A Ashish Agrawal Group Product Manager Oracle
More informationPeer-to-peer systems and overlay networks
Complex Adaptive Systems C.d.L. Informatica Università di Bologna Peer-to-peer systems and overlay networks Fabio Picconi Dipartimento di Scienze dell Informazione 1 Outline Introduction to P2P systems
More informationMagpie: Distributed request tracking for realistic performance modelling
Magpie: Distributed request tracking for realistic performance modelling Rebecca Isaacs Paul Barham Richard Mortier Dushyanth Narayanan Microsoft Research Cambridge James Bulpin University of Cambridge
More informationLecture 13: Traffic Engineering
Lecture 13: Traffic Engineering CSE 222A: Computer Communication Networks Alex C. Snoeren Thanks: Mike Freedman, Nick Feamster Lecture 13 Overview Evolution of routing in the ARPAnet Today s TE: Adjusting
More informationOctober 25, Berkeley/Stanford Recovery-oriented Computing Course Lecture
0 0 20 30 Berkeley/Stanford Recovery-oriented When bad things happen to good systems: detecting and diagnosing problems -2-0 2 3 Kimberly Keeton HPL Storage and Content Distribution Problem definition
More informationBuilding an Internet-Scale Publish/Subscribe System
Building an Internet-Scale Publish/Subscribe System Ian Rose Mema Roussopoulos Peter Pietzuch Rohan Murty Matt Welsh Jonathan Ledlie Imperial College London Peter R. Pietzuch prp@doc.ic.ac.uk Harvard University
More informationPractical Techniques for Regeneration and Immunization of COTS Applications
Practical Techniques for Regeneration and Immunization of COTS Applications Lixin Li Mark R.Cornwell E.Hultman James E. Just R. Sekar Stony Brook University Global InfoTek, Inc (Research supported by DARPA,
More informationAWS Lambda: Event-driven Code in the Cloud
AWS Lambda: Event-driven Code in the Cloud Dean Bryen, Solutions Architect AWS Andrew Wheat, Senior Software Engineer - BBC April 15, 2015 London, UK 2015, Amazon Web Services, Inc. or its affiliates.
More information02 - Distributed Systems
02 - Distributed Systems Definition Coulouris 1 (Dis)advantages Coulouris 2 Challenges Saltzer_84.pdf Models Physical Architectural Fundamental 2/60 Definition Distributed Systems Distributed System is
More informationInterdomain Routing Design for MobilityFirst
Interdomain Routing Design for MobilityFirst October 6, 2011 Z. Morley Mao, University of Michigan In collaboration with Mike Reiter s group 1 Interdomain routing design requirements Mobility support Network
More informationVLSI Test Technology and Reliability (ET4076)
VLSI Test Technology and Reliability (ET4076) Lecture 4(part 2) Testability Measurements (Chapter 6) Said Hamdioui Computer Engineering Lab Delft University of Technology 2009-2010 1 Previous lecture What
More informationDISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN. Chapter 1. Introduction
DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN Chapter 1 Introduction Definition of a Distributed System (1) A distributed system is: A collection of
More informationIntroduction. Network Architecture Requirements of Data Centers in the Cloud Computing Era
Massimiliano Sbaraglia Network Engineer Introduction In the cloud computing era, distributed architecture is used to handle operations of mass data, such as the storage, mining, querying, and searching
More informationDesigning and debugging real-time distributed systems
Designing and debugging real-time distributed systems By Geoff Revill, RTI This article identifies the issues of real-time distributed system development and discusses how development platforms and tools
More informationJ2EE DIAGNOSING J2EE PERFORMANCE PROBLEMS THROUGHOUT THE APPLICATION LIFECYCLE
DIAGNOSING J2EE PERFORMANCE PROBLEMS THROUGHOUT THE APPLICATION LIFECYCLE ABSTRACT Many large-scale, complex enterprise applications are now built and deployed using the J2EE architecture. However, many
More informationBe Conservative: Enhancing Failure Diagnosis with Proactive Logging
Be Conservative: Enhancing Failure Diagnosis with Proactive Logging Ding Yuan, Soyeon Park, Peng Huang, Yang Liu, Michael Lee, Xiaoming Tang, Yuanyuan Zhou, Stefan Savage University of California, San
More informationCS 520 Theory and Practice of Software Engineering Fall 2018
CS 520 Theory and Practice of Software Engineering Fall 2018 Nediyana Daskalova Monday, 4PM CS 151 Debugging October 30, 2018 Personalized Behavior-Powered Systems for Guiding Self-Experiments Help me
More informationDependability tree 1
Dependability tree 1 Means for achieving dependability A combined use of methods can be applied as means for achieving dependability. These means can be classified into: 1. Fault Prevention techniques
More informationDistributed Computation Models
Distributed Computation Models SWE 622, Spring 2017 Distributed Software Engineering Some slides ack: Jeff Dean HW4 Recap https://b.socrative.com/ Class: SWE622 2 Review Replicating state machines Case
More informationSecurity for Structured Peer-to-peer Overlay Networks. Acknowledgement. Outline. By Miguel Castro et al. OSDI 02 Presented by Shiping Chen in IT818
Security for Structured Peer-to-peer Overlay Networks By Miguel Castro et al. OSDI 02 Presented by Shiping Chen in IT818 1 Acknowledgement Some of the following slides are borrowed from talks by Yun Mao
More informationSoftware Quality Assurance. David Janzen
Software Quality Assurance David Janzen What is quality? Crosby: Conformance to requirements Issues: who establishes requirements? implicit requirements Juran: Fitness for intended use Issues: Who defines
More informationInternet Traffic Classification using Machine Learning
Internet Traffic Classification using Machine Learning by Alina Lapina 2018, UiO, INF5050 Alina Lapina, Master student at IFI, Full stack developer at Ciber Experis 2 Based on Thuy T. T. Nguyen, Grenville
More information02 - Distributed Systems
02 - Distributed Systems Definition Coulouris 1 (Dis)advantages Coulouris 2 Challenges Saltzer_84.pdf Models Physical Architectural Fundamental 2/58 Definition Distributed Systems Distributed System is
More informationLessons Learned Operating Active/Active Data Centers Ethan Banks, CCIE
Lessons Learned Operating Active/Active Data Centers Ethan Banks, CCIE #20655 @ecbanks Senior Network Architect, Carenection Co-founder, Packet Pushers Interactive http://ethancbanks.com http://packetpushers.net
More informationBasic Concepts. Network Management. Spring Bahador Bakhshi CE & IT Department, Amirkabir University of Technology
Basic Concepts Network Management Spring 2018 Bahador Bakhshi CE & IT Department, Amirkabir University of Technology This presentation is based on the slides listed in references. Outline Introduction:
More informationSelf-defending software: Automatically patching errors in deployed software
Self-defending software: Automatically patching errors in deployed software Michael Ernst University of Washington Joint work with: Saman Amarasinghe, Jonathan Bachrach, Michael Carbin, Sung Kim, Samuel
More informationTwo-Tier Oracle Application
Two-Tier Oracle Application This tutorial shows how to use ACE to analyze application behavior and to determine the root causes of poor application performance. Overview Employees in a satellite location
More informationDistributed File Systems
Distributed File Systems Today l Basic distributed file systems l Two classical examples Next time l Naming things xkdc Distributed File Systems " A DFS supports network-wide sharing of files and devices
More informationOracle Database 12c Rel. 2 Cluster Health Advisor - How it Works & How to Use it
Oracle Database 12c Rel. 2 Cluster Health Advisor - How it Works & How to Use it Mark V. Scardina - Director Oracle QoS Management & Oracle Autonomous Health Framework Agenda 1 2 3 4 5 6 7 Introduction
More informationORACLE ENTERPRISE MANAGER 10g ORACLE DIAGNOSTICS PACK FOR NON-ORACLE MIDDLEWARE
ORACLE ENTERPRISE MANAGER 10g ORACLE DIAGNOSTICS PACK FOR NON-ORACLE MIDDLEWARE Most application performance problems surface during peak loads. Often times, these problems are time and resource intensive,
More informationLecture 8: Networks to Internetworks
Lecture 8: Networks to Internetworks CSE 123: Computer Networks Alex C. Snoeren NO CLASS FRIDAY Lecture 8 Overview Bridging & switching Learning bridges Spanning Tree Internetworking Routering Internet
More informationDebugging Applications in Pervasive Computing
Debugging Applications in Pervasive Computing Larry May 1, 2006 SMA 5508; MIT 6.883 1 Outline Video of Speech Controlled Animation Survey of approaches to debugging Turning bugs into features Speech recognition
More informationLecture 10: Internetworking"
Lecture 10: Internetworking" CSE 123: Computer Networks Alex C. Snoeren HW 2 due NOW! Lecture 10 Overview" Spanning Tree Internet Protocol Service model Packet format 2 Spanning Tree Algorithm" Each bridge
More informationProfile-Guided Program Simplification for Effective Testing and Analysis
Profile-Guided Program Simplification for Effective Testing and Analysis Lingxiao Jiang Zhendong Su Program Execution Profiles A profile is a set of information about an execution, either succeeded or
More informationPeer-to-Peer Systems. Network Science: Introduction. P2P History: P2P History: 1999 today
Network Science: Peer-to-Peer Systems Ozalp Babaoglu Dipartimento di Informatica Scienza e Ingegneria Università di Bologna www.cs.unibo.it/babaoglu/ Introduction Peer-to-peer (PP) systems have become
More informationAuto Management for Apache Kafka and Distributed Stateful System in General
Auto Management for Apache Kafka and Distributed Stateful System in General Jiangjie (Becket) Qin Data Infrastructure @LinkedIn GIAC 2017, 12/23/17@Shanghai Agenda Kafka introduction and terminologies
More informationAMP-Based Flow Collection. Greg Virgin - RedJack
AMP-Based Flow Collection Greg Virgin - RedJack AMP- Based Flow Collection AMP - Analytic Metadata Producer : Patented US Government flow / metadata producer AMP generates data including Flows Host metadata
More informationProfiling Distributed File Systems with Computer Animation
Profiling Distributed File Systems with Computer Animation Andrew Leung, Eric Lalonde, Jake Telleen, Crystal Lee {aleung, elalonde, jtelleen, cryslee}@soe.ucsc.edu CMPS262 Presentation Draft File Systems
More informationEXPLODE: a Lightweight, General System for Finding Serious Storage System Errors. Junfeng Yang, Can Sar, Dawson Engler Stanford University
EXPLODE: a Lightweight, General System for Finding Serious Storage System Errors Junfeng Yang, Can Sar, Dawson Engler Stanford University Why check storage systems? Storage system errors are among the
More informationChapter Outline. Chapter 2 Distributed Information Systems Architecture. Distributed transactions (quick refresh) Layers of an information system
Prof. Dr.-Ing. Stefan Deßloch AG Heterogene Informationssysteme Geb. 36, Raum 329 Tel. 0631/205 3275 dessloch@informatik.uni-kl.de Chapter 2 Distributed Information Systems Architecture Chapter Outline
More informationEnabling Performance & Stress Test throughout the Application Lifecycle
Enabling Performance & Stress Test throughout the Application Lifecycle March 2010 Poor application performance costs companies millions of dollars and their reputation every year. The simple challenge
More informationGenerating String Attack Inputs Using Constrained Symbolic Execution. presented by Kinga Dobolyi
Generating String Attack Inputs Using Constrained Symbolic Execution presented by Kinga Dobolyi What is a String Attack? Web applications are 3 tiered Vulnerabilities in the application layer Buffer overruns,
More informationIntroduction to Peer-to-Peer Systems
Introduction Introduction to Peer-to-Peer Systems Peer-to-peer (PP) systems have become extremely popular and contribute to vast amounts of Internet traffic PP basic definition: A PP system is a distributed
More informationQuickly Pinpoint and Resolve Problems in Windows /.NET Applications TECHNICAL WHITE PAPER
Quickly Pinpoint and Resolve Problems in Windows /.NET Applications TECHNICAL WHITE PAPER Table of Contents Executive Overview...1 Problem Resolution A Major Time Consumer...2 > Inefficiencies of the Problem
More informationIntrusion Recovery for Database-backed Web Applications
Intrusion Recovery for Database-backed Web Applications Ramesh Chandra, Taesoo Kim, Meelap Shah, Neha Narula, Nickolai Zeldovich MIT CSAIL Web applications routinely compromised Web applications routinely
More informationNoSQL systems: sharding, replication and consistency. Riccardo Torlone Università Roma Tre
NoSQL systems: sharding, replication and consistency Riccardo Torlone Università Roma Tre Data distribution NoSQL systems: data distributed over large clusters Aggregate is a natural unit to use for data
More informationAn Introduction to the Waratek Application Security Platform
Product Analysis January 2017 An Introduction to the Waratek Application Security Platform The Transformational Application Security Technology that Improves Protection and Operations Highly accurate.
More informationHomework 1: Programming Component Routing Packets Within a Structured Peer-to-Peer (P2P) Network Overlay VERSION 1.1
Homework 1: Programming Component Routing Packets Within a Structured Peer-to-Peer (P2P) Network Overlay VERSION 1.1 DUE DATE: Wednesday February 14 th, 2018 @ 5:00 pm Objectives & Overview The objective
More informationDiagnostics in Testing and Performance Engineering
Diagnostics in Testing and Performance Engineering This document talks about importance of diagnostics in application testing and performance engineering space. Here are some of the diagnostics best practices
More informationBarry D. Lamkin Executive IT Specialist Capitalware's MQ Technical Conference v
What happened to my Transaction? Barry D. Lamkin Executive IT Specialist blamkin@us.ibm.com Transaction Tracking - APM Transaction Tracking is a major part of Application Performance Monitoring To ensure
More informationWHAT IS SOFTWARE ARCHITECTURE?
WHAT IS SOFTWARE ARCHITECTURE? Chapter Outline What Software Architecture Is and What It Isn t Architectural Structures and Views Architectural Patterns What Makes a Good Architecture? Summary 1 What is
More informationProviding Real-Time and Fault Tolerance for CORBA Applications
Providing Real-Time and Tolerance for CORBA Applications Priya Narasimhan Assistant Professor of ECE and CS University Pittsburgh, PA 15213-3890 Sponsored in part by the CMU-NASA High Dependability Computing
More informationStatistical Debugging for Real-World Performance Problems. Linhai Song Advisor: Prof. Shan Lu
Statistical Debugging for Real-World Performance Problems Linhai Song Advisor: Prof. Shan Lu 1 Software Efficiency is Critical No one wants slow and inefficient software Frustrate end users Cause economic
More informationCooperative Bug Isolation
Cooperative Bug Isolation Alex Aiken Mayur Naik Stanford University Alice Zheng Michael Jordan UC Berkeley Ben Liblit University of Wisconsin Build and Monitor Alex Aiken, Cooperative Bug Isolation 2 The
More informationSecuring BGP. Geoff Huston November 2007
Securing BGP Geoff Huston November 2007 Agenda An Introduction to BGP BGP Security Questions Current Work Research Questions An Introduction to BGP Background to Internet Routing The routing architecture
More informationRegression testing. Whenever you find a bug. Why is this a good idea?
Regression testing Whenever you find a bug Reproduce it (before you fix it!) Store input that elicited that bug Store correct output Put into test suite Then, fix it and verify the fix Why is this a good
More informationIntroduction and Statement of the Problem
Chapter 1 Introduction and Statement of the Problem 1.1 Introduction Unlike conventional cellular wireless mobile networks that rely on centralized infrastructure to support mobility. An Adhoc network
More informationAdaptive Cluster Computing using JavaSpaces
Adaptive Cluster Computing using JavaSpaces Jyoti Batheja and Manish Parashar The Applied Software Systems Lab. ECE Department, Rutgers University Outline Background Introduction Related Work Summary of
More informationCS 457 Networking and the Internet. Shortest-Path Problem. Dijkstra s Shortest-Path Algorithm 9/29/16. Fall 2016
9/9/6 S 7 Networking and the Internet Fall 06 Shortest-Path Problem Given: network topology with link costs c(x,y): link cost from node x to node y Infinity if x and y are not direct neighbors ompute:
More informationThe Measurement Manager Modular End-to-End Measurement Services
The Measurement Manager Modular End-to-End Measurement Services Ph.D. Research Proposal Department of Electrical and Computer Engineering University of Maryland, College Park, MD Pavlos Papageorgiou pavlos@eng.umd.edu
More informationWireless Sensor Architecture GENERAL PRINCIPLES AND ARCHITECTURES FOR PUTTING SENSOR NODES TOGETHER TO
Wireless Sensor Architecture 1 GENERAL PRINCIPLES AND ARCHITECTURES FOR PUTTING SENSOR NODES TOGETHER TO FORM A MEANINGFUL NETWORK Mobile ad hoc networks Nodes talking to each other Nodes talking to some
More informationGFS: The Google File System
GFS: The Google File System Brad Karp UCL Computer Science CS GZ03 / M030 24 th October 2014 Motivating Application: Google Crawl the whole web Store it all on one big disk Process users searches on one
More informationA Serializability Violation Detector for Shared-Memory Server Programs
A Serializability Violation Detector for Shared-Memory Server Programs Min Xu Rastislav Bodík Mark Hill University of Wisconsin Madison University of California, Berkeley Serializability Violation Detector:
More informationEnd-to-end Management with Grid Control. John Abrahams Technology Sales Consultant Oracle Nederland B.V.
End-to-end Management with Grid Control John Abrahams Technology Sales Consultant Oracle Nederland B.V. Agenda End-to-end management with Grid Control Database Performance Management Challenges Complexity
More informationJ2EE Instrumentation for software aging root cause application component determination with AspectJ
J2EE Instrumentation for software aging root cause application component determination with AspectJ Javier Alonso Josep Ll. Berral Ricard Gavaldà Jordi Torres Technical University of Catalonia, Spain Contents
More informationDynamic Analytics Extended to all layers Utilizing P4
Dynamic Analytics Extended to all layers Utilizing P4 Tom Tofigh, AT&T Nic VIljoen, Netronome This Talk is about Why P4 should be extended to other layers Interoperability - Utilizing common framework
More informationCSci 4061 Introduction to Operating Systems. Programs in C/Unix
CSci 4061 Introduction to Operating Systems Programs in C/Unix Today Basic C programming Follow on to recitation Structure of a C program A C program consists of a collection of C functions, structs, arrays,
More informationService Mesh and Microservices Networking
Service Mesh and Microservices Networking WHITEPAPER Service mesh and microservice networking As organizations adopt cloud infrastructure, there is a concurrent change in application architectures towards
More informationDeployment Planning Guide
Deployment Planning Guide Community 1.5.1 release The purpose of this document is to educate the user about the different strategies that can be adopted to optimize the usage of Jumbune on Hadoop and also
More informationRecap. CSE 486/586 Distributed Systems Google Chubby Lock Service. Recap: First Requirement. Recap: Second Requirement. Recap: Strengthening P2
Recap CSE 486/586 Distributed Systems Google Chubby Lock Service Steve Ko Computer Sciences and Engineering University at Buffalo Paxos is a consensus algorithm. Proposers? Acceptors? Learners? A proposer
More informationDifferent attack manifestations Network packets OS calls Audit records Application logs Different types of intrusion detection Host vs network IT
Different attack manifestations Network packets OS calls Audit records Application logs Different types of intrusion detection Host vs network IT environment (e.g., Windows vs Linux) Levels of abstraction
More informationKEYNOTE FOR
THE OS SCHEDULER: A PERFORMANCE-CRITICAL COMPONENT IN LINUX CLUSTER ENVIRONMENTS KEYNOTE FOR BPOE-9 @ASPLOS2018 THE NINTH WORKSHOP ON BIG DATA BENCHMARKS, PERFORMANCE, OPTIMIZATION AND EMERGING HARDWARE
More informationDISTRIBUTED COMPUTER SYSTEMS ARCHITECTURES
DISTRIBUTED COMPUTER SYSTEMS ARCHITECTURES Dr. Jack Lange Computer Science Department University of Pittsburgh Fall 2015 Outline System Architectural Design Issues Centralized Architectures Application
More information