TraceBack: First Fault Diagnosis by Reconstruction of Distributed Control Flow

Andrew Ayers (Microsoft); Chris Metcalf, Junghwan Rhee, Richard Schooler (VERITAS); Emmett Witchel (UT Austin); Anant Agarwal (MIT)

Software Support: Why aren't users also useful testers?
- Neither users nor developers benefit from a bug
  - Many options for the user, all bad
  - Developer tools can't be used in production
- Sometimes testers aren't useful testers
[Figure: diagram with the labels Debugger, Designer, Support Desk, Debug Mode, Memory Checker, Developer, Knowledge Base, Support Engineer, User]

Bug Reports Lack Information
- Thoroughly documenting a bug is difficult
- Bug re-creation is difficult and expensive
  - Many software components: which version?
  - Might require large, expensive infrastructure
  - Might require high load levels, concurrent activity (not reproducible by user)
  - Might involve proprietary code or data
- Bug re-creation does not leverage the user's initial bug experience

TraceBack: First Fault Diagnosis
- Provide a debugger-like experience to the developer from the user's execution
  - TraceBack helps the first time a user has a problem
  - Easier model for users and testers
- TraceBack constraints
  - Consume limited resources (always on)
  - Tolerate failure conditions
  - Do not modify source code
- Systems are more expensive to support than they are to purchase

TraceBack In Action
[Screenshot: controls labeled "Step back" and "Step back out"]

Talk Outline
- Design
- Implementation
- Deployment issues
  - Supporting long-running applications
  - Memory allocation issues
  - Generating TraceBack bug reports
- Trace viewing
  - Cross-language, cross-machine traces
- Results

TraceBack Design
- Code instrumentation + runtime support
  - Do more work before and after execution time to minimize impact on execution time
- Records only control flow
  - Stores environment + optionally core dump
- Captures only recent history
  - Circular buffer of trace records in memory
  - Previous 64K lines
- Vendor statically instruments product using TB, gets TB bug reports from the field
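The "captures only recent history" idea can be illustrated with a small ring buffer. This is a minimal sketch, not TraceBack's implementation: the class name `TraceRing`, its methods, and the tiny capacity are hypothetical and chosen only to show how old records are overwritten.

```python
class TraceRing:
    """Fixed-size circular buffer that keeps only the most recent records."""

    def __init__(self, capacity):
        self.slots = [0] * capacity  # preallocated once, never grows
        self.next = 0                # index of the slot to overwrite next
        self.written = 0             # total number of records ever written

    def write(self, record):
        self.slots[self.next] = record
        self.next = (self.next + 1) % len(self.slots)
        self.written += 1

    def recent(self):
        """Return the surviving records, oldest first."""
        if self.written < len(self.slots):
            return self.slots[:self.written]
        return self.slots[self.next:] + self.slots[:self.next]


ring = TraceRing(capacity=4)
for record in range(1, 7):  # six records into four slots
    ring.write(record)
print(ring.recent())  # [3, 4, 5, 6]
```

Because the buffer is allocated up front and each write touches one slot, the cost per record stays constant no matter how long the program runs.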

Instrumentation
- Code instrumentation records execution history
  - Executable instrumented with code probes (statically, to minimize impact on execution time)
  - Code probes write trace records in memory
  - Common case: flip one bit per basic block
- Each thread has its own trace buffer
[Figure: an instrumented executable, with instrumentation probes 1 and 2 placed ahead of the original code, writing trace records 1 and 2 into a trace buffer that begins with a buffer header]

Efficiently Encoding Control Flow
- Minimize time & space overhead
- Partition control flow graph by DAGs
  - Trace record: one word per DAG
  - DAG header writes DAG number
  - DAG blocks set bits (with a single OR)
- Break DAGs at calls
  - Easiest way to get an inter-procedural trace
  - Any call can cross modules
  - Performance overhead
- Module becomes a sequence of DAGs
[Figure: control flow graph of basic blocks 1 to 6, grouped into DAG 1 and DAG 2, with the DAG headers marked]
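The one-word-per-DAG encoding can be sketched as follows. The slide does not give the word layout, so the split used here (low 8 bits for block flags, remaining bits for the DAG number) and the function names are assumptions made for illustration.

```python
BLOCK_BITS = 8  # assumed layout: low 8 bits flag blocks, upper bits hold the DAG number


def dag_header(trace, dag_number):
    """Probe at a DAG entry: start a new record holding the DAG number."""
    trace.append(dag_number << BLOCK_BITS)


def block_probe(trace, block_index):
    """Probe in a DAG block: set one bit in the current record with a single OR."""
    trace[-1] |= 1 << block_index


def decode(record):
    """Split a record into its DAG number and the list of blocks that ran."""
    dag_number = record >> BLOCK_BITS
    blocks = [i for i in range(BLOCK_BITS) if record & (1 << i)]
    return dag_number, blocks


trace = []
dag_header(trace, 1)
block_probe(trace, 0)
block_probe(trace, 2)
dag_header(trace, 2)
block_probe(trace, 1)
print([decode(r) for r in trace])  # [(1, [0, 2]), (2, [1])]
```

Since a DAG has no back edges, each block runs at most once per DAG entry, so the set of flagged blocks is enough to recover the path taken through it.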

Module DAG Renumbering
- Real applications are made of many modules
  - Code modules instrumented independently
  - Which DAG is really DAG number 1?
- Modules heuristically instrumented with disjoint DAG number spaces (as with DLL rebasing)
- TraceBack runtime monitors DAG space
  - If it loads a module with a conflicting space, it renumbers the DAGs
  - If it reloads the same module, it uses the same number space (supports long-running apps)
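One way to picture the renumbering policy is a small allocator for DAG number ranges. This is a sketch under stated assumptions: the `DagSpace` class is hypothetical, modules are identified by name only, and a conflicting module is simply moved past every range seen so far.

```python
class DagSpace:
    """Tracks which DAG number range each loaded module occupies."""

    def __init__(self):
        self.assigned = {}  # module name -> (base, count); kept across unloads
        self.next_free = 0  # first DAG number above every assigned range

    def load(self, module, preferred_base, count):
        if module in self.assigned:
            # Reload of a known module: reuse its range so numbers stay stable.
            return self.assigned[module][0]
        base = preferred_base
        if self._conflicts(base, count):
            base = self.next_free  # renumber into unused space
        self.assigned[module] = (base, count)
        self.next_free = max(self.next_free, base + count)
        return base

    def _conflicts(self, base, count):
        return any(base < b + c and b < base + count
                   for b, c in self.assigned.values())


space = DagSpace()
print(space.load("a.dll", 0, 100))   # 0: preferred range is free
print(space.load("b.dll", 50, 100))  # 100: overlaps a.dll, so it is renumbered
print(space.load("a.dll", 0, 100))   # 0: reload keeps the original range
```

Keeping the assignment after a module is unloaded is what stops a long-running process from consuming fresh DAG numbers on every reload.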

Allocating Trace Buffers
- What happens if there are more threads than trace buffers?
  - Delegate one buffer as desperation buffer
  - Instrumentation must write records somewhere
  - Don't recover trace data, but don't crash
  - On buffer wrap, retry buffer allocation
- What if no buffers can be allocated?
  - Use static buffer, compiled into runtime
- What if thread runs no instrumented code?
  - Start it in zero-length probation buffer
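The desperation-buffer fallback can be sketched with a toy pool. The names (`BufferPool`, `acquire`, `on_wrap`) are hypothetical, buffers are represented by strings, and the static and probation buffers from the slide are left out to keep the example short.

```python
class BufferPool:
    """Hands out private trace buffers, falling back to one shared buffer."""

    def __init__(self, count):
        self.free = [f"buffer-{i}" for i in range(count)]
        self.desperation = "desperation"  # shared; its contents are not recovered

    def acquire(self):
        """Return a private buffer if one is free, else the shared fallback."""
        return self.free.pop() if self.free else self.desperation

    def release(self, buf):
        """Return a private buffer to the pool when its thread exits."""
        if buf != self.desperation:
            self.free.append(buf)

    def on_wrap(self, current):
        """On buffer wrap, a thread stuck on the fallback retries allocation."""
        return self.acquire() if current == self.desperation else current


pool = BufferPool(count=1)
first = pool.acquire()       # gets the only private buffer
second = pool.acquire()      # pool is empty: falls back to the shared buffer
pool.release(first)          # the first thread exits
print(pool.on_wrap(second))  # buffer-0: the retry on wrap now succeeds
```

The point of the fallback is that a probe always has somewhere valid to write, so running out of buffers costs trace data rather than a crash.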

Sub-Buffering
- Current trace record pointer is in thread-local storage
  - When a thread terminates abruptly, the pointer disappears
  - Where is the last record?
- Break buffers into regions
  - Zero sub-region when code enters
  - Current sub-buffer is the one with zeroed records
[Figure: Thread 0's buffer pointer into a trace buffer made of a buffer header, sub-buffers with trailers, trace records, and a zeroed partition]
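The recovery trick can be sketched as follows: if every sub-buffer is zeroed on entry, the newest record is the one just before the zeroed region, so it can be found from the buffer contents alone. The sub-buffer size, the class name, and the rule that zero is never a valid record are assumptions of this sketch, and it needs at least two sub-buffers to retain any history across a wrap.

```python
SUB = 4  # records per sub-buffer (illustrative size)


class SubBufferedTrace:
    def __init__(self, sub_buffers):
        self.records = [0] * (SUB * sub_buffers)
        self.pos = 0  # stands in for the thread-local record pointer

    def write(self, record):
        assert record != 0, "zero is reserved for unwritten slots"
        self.records[self.pos] = record
        self.pos = (self.pos + 1) % len(self.records)
        if self.pos % SUB == 0:
            # Entering a new sub-buffer: zero it before any record lands there.
            self.records[self.pos:self.pos + SUB] = [0] * SUB


def find_last_record(records):
    """Return the index of the newest record, or None if nothing was written.

    Relies on the writer above, which always leaves at least one zeroed slot.
    """
    first_zero = records.index(0)  # start of the zeroed region
    last = (first_zero - 1) % len(records)
    return None if records[last] == 0 else last


trace = SubBufferedTrace(sub_buffers=2)
for record in range(1, 11):  # ten records wrap the eight-slot buffer
    trace.write(record)
# The write pointer (trace.pos) is deliberately not consulted here.
index = find_last_record(trace.records)
print(index, trace.records[index])  # 1 10
```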

Snapshots
- Trace buffer in memory-mapped file
  - Persistent even if application crashes/hangs
- Snapshot is a copy of the trace buffers, triggered by:
  - External program (e.g., on program hang)
  - Program event, like an exception (debugging)
  - Programmatic API (at unreachable point)
- Snap suppression is key: users want to store and examine unique snapshots
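A memory-mapped trace buffer and a snapshot of it can be sketched with Python's standard `mmap` module. The file name, the record width (4 bytes, little-endian), and the record values are made up for illustration; the sketch shows only that records written through the mapping live in a file that a separate reader can copy.

```python
import mmap
import os
import struct
import tempfile

RECORDS = 4  # illustrative buffer size, in 4-byte records
path = os.path.join(tempfile.mkdtemp(), "trace.buf")  # hypothetical file name

# Create the backing file at its full size, filled with zeros.
with open(path, "wb") as f:
    f.write(b"\0" * (4 * RECORDS))

# The "application" maps the file and writes trace records through the mapping.
with open(path, "r+b") as f:
    buf = mmap.mmap(f.fileno(), 0)
    struct.pack_into("<I", buf, 0, 0x0105)
    struct.pack_into("<I", buf, 4, 0x0202)
    buf.flush()
    buf.close()


def snapshot(src):
    """Copy the trace buffer contents, as an external snapshot tool might."""
    with open(src, "rb") as f:
        return f.read()


snap = snapshot(path)
print(struct.unpack("<4I", snap))  # (261, 514, 0, 0)
```

Because the records live in a file-backed mapping rather than in private heap memory, the snapshot reader does not depend on the traced process still being healthy.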

Trace Reconstruction
- Trace records converted to line trace
- Refine line trace for exceptions
  - Users hate seeing a line executed after the line that took an exception
- Call structure recreated
  - Don't waste time & space at runtime
- Threads interleaved plausibly
  - Real-time timestamps for ordering
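Two of the reconstruction steps, expanding records into source lines and interleaving threads by timestamp, can be sketched together. Everything concrete here is hypothetical: the `LINE_MAP` table, the file names, and the simplification that every record carries its own timestamp.

```python
import heapq

# Hypothetical table built at instrumentation time: (DAG, block bit) -> source line.
LINE_MAP = {
    (1, 0): "main.c:10",
    (1, 1): "main.c:12",
    (2, 0): "util.c:40",
}

# Per-thread records as (timestamp, DAG number, block bits), already in time order.
threads = {
    "T1": [(100, 1, 0b11), (300, 2, 0b01)],
    "T2": [(200, 2, 0b01)],
}


def thread_lines(name, records):
    """Expand one thread's records into (timestamp, thread, line) entries."""
    for timestamp, dag, bits in records:
        for block in range(8):
            if bits & (1 << block):
                yield timestamp, name, LINE_MAP[(dag, block)]


# Merge the per-thread streams by timestamp only; ties keep per-thread order.
streams = [thread_lines(name, recs) for name, recs in threads.items()]
for entry in heapq.merge(*streams, key=lambda e: e[0]):
    print(entry)
```

Doing this expansion offline is what lets the runtime probes stay as cheap as a word write and a bit set.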

Cross-language trace
- Trace records are language independent

Distributed Tracing
- Logical threads
- Real-time clocks

Implementation
- TraceBack on x86: 6 engineer-years ('99-'01)
- 20 engineer-years total for TB functionality
- Still sold as part of VERITAS Application Saver
- TraceBack product supports many platforms

Language                  OS/Architecture
C/C++, VB6, Java, .NET    Windows/x86
C/C++, Java               Linux/x86, Solaris/SPARC
Java only                 AIX/PPC, HP-UX/PA
COBOL                     OS/390

SPECInt2000 Performance Results
[Figure: bar chart of run time in seconds (0 to 500), Normal vs. TraceBack, for ammp, art, bzip2, crafty, eon, equake, gap, gcc, gzip, mcf, mesa, parser, perlbmk, vortex, vpr]
- Geometric mean slowdown: 60%
- 3GHz P4, 2GB RAM, VC 7.1, ref inputs

Webserver Performance Results
[Figure: two charts comparing Normal and TraceBack. SPECJbb: completed transactions (0 to 9000) for Win, Lin, and Sun at 1W and 5W. SPECWeb99/Apache: Kbits/sec, ops/sec, and response time (ms).]
- Multi-threaded, long-running server apps
- SPECJbb throughput reduced 16% to 25%
- SPECWeb throughput & latency reduced < 5%
- Phase Forward slowdown less than 5%

Real World Examples
- Phase Forward's C++ application hung due to a third-party database DLL
  - Cross-process trace got Phase Forward a fix from the database company
- At Oracle, TraceBack found the cause of a slow Java/C++ application: too many Java exceptions
  - A call to sleep had been wrapped in a try/catch block
  - Argument to sleep was a random integer, negative half the time
- The TraceBack GUI itself (written in C++) is instrumented with TraceBack
  - At eBay, the GUI became unresponsive (to Ayers)
  - Ayers took a snap, and sent the trace, in real time (to Metcalf)
  - Culprit was an O(n^2) algorithm in the GUI
  - Ayers told the engineers at eBay, right then, about the bug and its fix

Related Work
- Path profiling [Ball/Larus 96, Duesterwald 00, Nandy 03, Bond 05]
  - Some interprocedural extensions [Tallam 04]
  - Most recent work is on making it more efficient (e.g., using sampling, which TraceBack can't)
- Statistical bug hunting [Liblit 03 & 05]
- Virtutech's Hindsight: reverse execution in a machine simulator, 2005
- Omniscient debugger [Lewis 03]
- Microsoft Watson crash dump analysis
- Static translation systems (ATOM, etc.)

Future Work
- Navel: current project at UT
  - Connect user with workarounds for common bugs
  - Use program trace to search knowledge base
  - Machine learning does the fuzzy matching
- Eliminating duplicate bug reports
- Program behavior is useful data

TraceBack
- Application of binary translation research
- Efficient enough to be always on
- Provides developer with debugger-like information from crash report
  - Multiple threads
  - Multiple languages
  - Multiple machines

Thank you