Diagnosing Production-Run Concurrency-Bug Failures
Shan Lu, University of Wisconsin, Madison


Outline
- Myself and my group
- Production-run failure diagnosis
  - What is this problem?
  - What are our solutions?
    - CCI [OOPSLA 10]
    - PBI [ASPLOS 13]
    - LXR [ASPLOS 14]
- Conclusions

A little bit about myself
Shan (山) Lu (卢)

The most exciting thing

Software bugs
How many of you have been bothered by bugs?

Fighting software bugs is crucial
- Software is everywhere
  http://en.wikipedia.org/wiki/list_of_software_bugs
- Software bugs are widespread and costly
  - Lead to 40% of system downtime [Blueprints 2000]
  - 312 billion lost per year [Cambridge 2013]

Different aspects of fighting bugs
- In-house bug detection
- In-field failure recovery
- In-field failure diagnosis
- In-house bug fixing
Requirements across these stages: low overhead, high accuracy

Work from my group
Concurrency bugs
- In-house bug detection: [ASPLOS06]; [SOSP07]; [ASPLOS09]; [ASPLOS10]; [ASPLOS11]; [OOPSLA13]
- In-field failure recovery: [ASPLOS13.A]; [FSE14]
- In-field failure diagnosis: [OOPSLA10]; [ASPLOS13.B]; [ASPLOS14]
- In-house bug fixing: [PLDI11]; [OSDI12]
Performance bugs
- In-house bug detection: [PLDI12]; [ICSE13]
- In-field failure recovery: not yet
- In-field failure diagnosis: [OOPSLA14]
- In-house bug fixing: [CAV13]

Our high-level approach
(Diagram: a fault leads to an error, which leads to a failure. The chain is labeled cause and effect and annotated with the citation groups below.)
- [SOSP07]; [ASPLOS11]; [OOPSLA10]; [PLDI11]; [PLDI12]; [OSDI12]; [ASPLOS13.A]; [CAV13]
- [ASPLOS06]; [SOSP07]; [ASPLOS09]; [OOPSLA10]; [ASPLOS10]; [ASPLOS11]; [ASPLOS13.B]; [ICSE13]; [OOPSLA13]
- [ASPLOS08]; [PLDI12]
- [ASPLOS10]; [ASPLOS11]
- [ASPLOS06]; [MICRO06]; [ASPLOS13.B]; [ASPLOS14]
- [ASPLOS06]; [SOSP07]; [OOPSLA10]; [ASPLOS13.B]; [ASPLOS14]; [OOPSLA14]

Focus of this talk
Concurrency bugs
- In-house bug detection: [ASPLOS06]; [SOSP07]; [ASPLOS09]; [ASPLOS10]; [ASPLOS11]; [OOPSLA13]
- In-field failure recovery: [ASPLOS13.A]; [FSE14]
- In-field failure diagnosis: [OOPSLA10]; [ASPLOS13.B]; [ASPLOS14]
- In-house bug fixing: [PLDI11]; [OSDI12]
Performance bugs
- In-house bug detection: [PLDI12]; [ICSE13]
- In-field failure recovery: not yet
- In-field failure diagnosis: [OOPSLA14]
- In-house bug fixing: [CAV13]

What are concurrency bugs?
Untimely accesses among threads (buggy interleavings)

Mozilla
    Thread 1                    Thread 2
    ptr = malloc(size);         free(ptr);
    if (!ptr) {                 ptr = NULL;
        ReportOutofMem();
        exit(1);
    }

FFT
    Thread 1                    Thread 2
    print("%u", End);           End = time();
    print("%u", End-Start);

Concurrency bugs are common

Concurrency bugs manifest in the field
These failures need to be diagnosed before they can be fixed!

Failure diagnosis is challenging
- Limited information
- Failures are difficult to repeat
- Root causes are difficult to reason about

Example (Mozilla)
    Thread 1                    Thread 2
    ptr = malloc(size);         free(ptr);
    if (!ptr) {                 ptr = NULL;
        ReportOutofMem();
        exit(1);
    }

Example
    InitState(...) {
        table = New();
        if (table == NULL) {
            ReportOutOfMemory();
            return JS_FALSE;
        }
    }

    ReportOutOfMemory() {
        error("out of memory");
    }

Call stack: ReportOutOfMemory() <- InitState() <- main()
Failure report: ***.js: out of memory

Design space
Questions
- What to collect
- How to collect
- How to use the collected information
Goals
- Performance
- Capability
- Latency

Previous work
(Figure: bug detector, coredump, and replay placed along a Performance axis.)

Our work
(Figure: CCI added to the Performance axis next to bug detector, coredump, and replay.)

Our work
(Figure: PBI and CCI on the Performance axis next to bug detector, coredump, and replay.)

Our work
(Figure: PBI, LXR, and CCI placed on Diagnostic Latency and Performance axes, next to bug detector, coredump, and replay.)

Outline
- Myself and my group
- Production-run failure diagnosis
  - What is the problem?
  - What are our solutions? (Figure: PBI, CCI, and LXR on Latency and Performance axes.)
- Conclusion

How to do better than state-of-art?
- What to collect
- How to collect
- How to use the collected information
State of the art: all or nothing
Goals: performance, capability, latency

How to do better than state-of-art?
- What to collect
- How to collect: sampling
- How to use the collected information
Goals: performance, capability, latency

How to do better than state-of-art?
- What to collect
- How to collect: sampling
- How to use the collected information: cooperative statistical analysis
Goals: performance, capability, latency

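Cooperative Bug Isolation (CBI)
Workflow: program source -> compiler (inserts predicates, with sampling) -> predicates and success/failure outcome of each run -> statistical debugging -> failure predictors
- Predicates: branch, return value
- Failure predictor: a predicate that is true in most failure runs and false in most correct runs
- Performance: good
- Capability: ??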
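The statistical-debugging step can be illustrated with a small scoring routine. This is a minimal sketch, assuming the Failure/Context/Increase scores described in the published CBI work; the slide itself gives no formula, and every struct, field, and function name below is hypothetical.

```c
#include <stddef.h>

/* Counts for one predicate, aggregated over many sampled runs.
 * Field names are illustrative, not taken from the CBI tools. */
struct pred_counts {
    const char *name;
    unsigned true_fail;  /* failing runs in which the predicate was observed true */
    unsigned true_succ;  /* successful runs in which it was observed true */
    unsigned obs_fail;   /* failing runs in which the predicate site was sampled */
    unsigned obs_succ;   /* successful runs in which the site was sampled */
};

/* Failure(P): fraction of the runs with P true that failed. */
double failure_score(const struct pred_counts *p) {
    unsigned n = p->true_fail + p->true_succ;
    return n ? (double)p->true_fail / n : 0.0;
}

/* Context(P): fraction of the runs that merely reached P's site that failed. */
double context_score(const struct pred_counts *p) {
    unsigned n = p->obs_fail + p->obs_succ;
    return n ? (double)p->obs_fail / n : 0.0;
}

/* Increase(P): how much P being true raises the failure rate above its context. */
double increase_score(const struct pred_counts *p) {
    return failure_score(p) - context_score(p);
}

/* Index of the predicate with the largest Increase score, or -1 if n == 0. */
int best_predictor(const struct pred_counts *preds, size_t n) {
    int best = -1;
    double best_score = 0.0;
    for (size_t i = 0; i < n; i++) {
        double s = increase_score(&preds[i]);
        if (best < 0 || s > best_score) {
            best = (int)i;
            best_score = s;
        }
    }
    return best;
}
```

A predicate that is true in 90 of 100 failing runs and 10 of 100 successful runs, at a site reached by all 200 runs, scores 0.9 - 0.5 = 0.4 and would be ranked above its negation.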

Does it work for concurrency bugs?
    Thread 1                    Thread 2
    ptr = malloc(size);         free(ptr);
    if (!ptr) {   // b          ptr = NULL;
        ReportOutofMem();
        exit(1);
    }

    Predicate   Success   Failure
    taken_b     0         1
    !taken_b    1         0

Why does CBI not work?

Cooperative Con-Bug Isolation (CCI)
Workflow: program source -> compiler (inserts predicates, with sampling) -> predicates and success/failure outcome of each run -> statistical debugging -> failure predictors
- Performance: mixed
- Capability: good
Instrumentation and Sampling Strategies for Cooperative Concurrency Bug Isolation, OOPSLA 10

What to collect? (predicate design)
Simple properties that reflect the root causes of many concurrency bugs
- Performance: simple properties
- Capability: reflect the root causes of many concurrency bugs

Concurrency bug root cause patterns
- Atomicity violation
- Order violation
Learning from Mistakes: A Comprehensive Study on Real World Concurrency Bug Characteristics, ASPLOS 08

Concurrency bug root cause patterns
(Figure: for atomicity violations and for order violations, a successful and a failing interleaving of accesses to x by thread 1 and thread 2.)

CCI-Prev predicate
Whether two successive accesses to a memory location were made by two distinct threads or by one thread

CCI-Prev can reflect root causes
(Figure: the same successful and failing interleavings of accesses to x, for atomicity violations and order violations.)

Is CCI-Prev useful? (Example, Mozilla)
    Thread 1                    Thread 2
    ptr = malloc(size);         free(ptr);
    if (!ptr) {                 ptr = NULL;
        ReportOutofMem();
        exit(1);
    }

Example (correct runs)
    thread 1                    thread 2
    ptr = malloc(SIZE);
I:  if (!ptr) {
        ReportOutofMem();
        exit(1);
    }
                                free(ptr);
                                ptr = NULL;

    Predicate   Success   Failure
    remote_I    0         0
    local_I     1         0

Example (failure run)
    thread 1                    thread 2
    ptr = malloc(SIZE);
                                free(ptr);
                                ptr = NULL;
I:  if (!ptr) {
        ReportOutofMem();
        exit(1);
    }

    Predicate   Success   Failure
    remote_I    0         1
    local_I     1         0

How to evaluate?
    thread 1                                    thread 2
    ptr = malloc(SIZE);                         free(ptr);
    lock(glock);                                ptr = NULL;
    remote = test_and_insert(&ptr, curtid);
    record(I, remote);
    temp = ptr;
    unlock(glock);
I:  if (!temp) {
        ReportOutofMem();
        exit(1);
    }

A global hash table maps each address to the ID of the thread that last accessed it (entry shown: &ptr, thread 1, then thread 2).

    Predicate   Success   Failure
    remote_I    0         1
    local_I     1         0
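The evaluation step above can be sketched in a few lines. This is a minimal illustration, assuming a fixed-size table in which colliding addresses simply overwrite each other and a first access counts as local; only the name test_and_insert comes from the slide, and the table layout, slot_for, and the tid type are hypothetical. As on the slide, the caller is expected to hold a global lock around the call and the access it guards.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 1024  /* illustrative size; collisions overwrite */

/* One slot: the address and the thread that last accessed it. */
struct last_access {
    const void *addr;
    unsigned long tid;
    bool used;
};

static struct last_access table[TABLE_SIZE];

/* Hypothetical hash: drop the low alignment bits, then take the modulus. */
static size_t slot_for(const void *addr) {
    return (size_t)(((uintptr_t)addr >> 3) % TABLE_SIZE);
}

/* Returns true when the previous recorded access to addr came from a
 * different thread (the "remote" predicate), then records curtid as the
 * latest accessor. Not thread-safe by itself: call it with the lock held. */
bool test_and_insert(const void *addr, unsigned long curtid) {
    struct last_access *e = &table[slot_for(addr)];
    bool remote = e->used && e->addr == addr && e->tid != curtid;
    e->addr = addr;
    e->tid = curtid;
    e->used = true;
    return remote;
}
```

With this sketch, two accesses to the same address by thread 1 yield local, and a following access by thread 2 yields remote, which is the value the failure run records at I.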

How to sample?

How to sample branch predicates?
A:  if (!temp2) { if (sample()) record(A, TRUE); }
    else        { if (sample()) record(A, FALSE); }
B:  if (!temp)  { if (sample()) record(B, TRUE); }
    else        { if (sample()) record(B, FALSE); }
C:  if (!temp3) { if (sample()) record(C, TRUE); }
    else        { if (sample()) record(C, FALSE); }
The sampling decisions at A, B, and C are independent of each other.

How to sample CCI-Prev?
    thread 1                    thread 2
    ptr = malloc(SIZE);
                                free(ptr);
                                ptr = NULL;
    if (!ptr) {
        ReportOutofMem();
        exit(1);
    }
Does traditional sampling work?

How to sample CCI-Prev?
    thread 1:
    if (sample()) { lock(..); ptr = tmp1; unlock(); } else { ... }
    if (sample()) { lock(..); tmp3 = ptr; unlock(); } else { ... }

    thread 2:
    if (sample()) { lock(..); tmp2 = ptr; unlock(); } else { ... }
    if (sample()) { lock(..); ptr = NULL; unlock(); } else { ... }
The sampling decisions within a thread cannot be independent, and neither can the decisions across threads.
Does traditional sampling work? NO!

Thread-coordinated, bursty sampling
    thread 1:
    if (sample()) { lock(..); ptr = tmp1; unlock(); } else { ... }
    if (sample()) { lock(..); tmp3 = ptr; unlock(); } else { ... }

    thread 2:
    if (sample()) { lock(..); tmp2 = ptr; unlock(); } else { ... }
    if (sample()) { lock(..); ptr = NULL; unlock(); } else { ... }
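One way to picture thread-coordinated, bursty sampling is a single shared burst counter: once any thread opens a burst, the next few instrumented accesses are recorded no matter which thread performs them. This is only a sketch under that assumption, not the CCI implementation; begin_burst, in_burst, and burst_remaining are hypothetical names, and the random decision of when to open a burst is left out.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Shared by all threads: how many upcoming accesses still belong to the
 * current sampling burst. Zero means sampling is off everywhere. */
static atomic_int burst_remaining = 0;

/* Any thread may open a burst covering the next `length` accesses. */
void begin_burst(int length) {
    atomic_store(&burst_remaining, length);
}

/* Called at every instrumented access, by any thread. Returns true if this
 * access falls inside the current burst and should be recorded. */
bool in_burst(void) {
    int left = atomic_load(&burst_remaining);
    while (left > 0) {
        /* On failure the CAS reloads `left`, so the loop re-checks it. */
        if (atomic_compare_exchange_weak(&burst_remaining, &left, left - 1))
            return true;
    }
    return false;
}
```

Because the counter is global, consecutive accesses from different threads land in the same burst, which is what a predicate about two successive accesses needs.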

Other predicates
(Figure: Havoc, Prev, and FunRe placed on axes labeled Capability (manual effort) and Performance (overhead).)

Evaluation methodology
    Program        CCI-Prev
    Apache-1       top1
    Apache-2       top1
    Cherokee
    FFT            top1
    LU             top1
    Mozilla-JS-1
    Mozilla-JS-2   top1
    Mozilla-JS-3   top2
    PBZIP2         top1
- CIL-based static code instrumentor
- 1/100 sampling rate, ~3000 runs in total (failure:success ~ 1:1)

Diagnosis capability (w/ sampling)
    Program        CCI-Prev
    Apache-1       top1
    Apache-2       top1
    Cherokee
    FFT            top1
    LU             top1
    Mozilla-JS-1
    Mozilla-JS-2   top1
    Mozilla-JS-3   top2
    PBZIP2         top1
- 1/1000 sampling rate, ~3000 runs in total (failure:success ~ 1:1)

Diagnosis performance (overhead) of the Prev predicate
    Program      No sampling   Sampling
    Apache-1     62.6%         1.9%
    Apache-2     8.4%          0.5%
    Cherokee     19.1%         0.3%
    FFT          169%          24.0%
    LU           57857%        949%
    Mozilla-JS   11311%        606%
    PBZIP2       0.2%          0.2%

Are we done?
(Figure: CCI on the Performance axis next to bug detector, coredump, and replay.)

Outline
(Figure: PBI and CCI on the Performance axis next to bug detector, coredump, and replay.)

How to do better than CCI?
- What to collect: CCI-Prev
- How to collect: sampling
- How to use the collected information: cooperative statistical analysis
Goals: performance, capability, latency

How to do better than CCI?
- What to collect
- How to collect: sampling
- How to use the collected information
Problem: slow sampling infrastructure
Goals: performance, capability, latency

How to do better than CCI?
- What to collect
- How to collect: sampling
- How to use the collected information
Problems: slow sampling infrastructure, inaccurate evaluation
Goals: performance, capability, latency

How to do better than CCI?
- What to collect
- How to collect: hardware-based evaluation & sampling
- How to use the collected information
Problems: slow sampling infrastructure, inaccurate evaluation
Goals: performance, capability, latency

PerfCnt-based Bug Isolation (PBI)
Workflow: program binary -> hardware performance events, sampled through the counter overflow interrupt -> predicates and success/failure outcome of each run -> statistical debugging -> failure predictors
- Performance: good (<5% overhead)
- Capability: good
- Code size: no change
- Change hardware? NO!
Production-Run Software Failure Diagnosis via Hardware Performance Counters, ASPLOS 13

Hardware Performance Counters
- Registers monitor hardware performance events
  - 1-8 registers per core
  - Each register can contain an event count
- Large collection of hardware events
  - Instructions retired, TLB misses, cache misses, etc.
- Traditional usage
  - Hardware testing/profiling
How can this help diagnose software failures?

What to collect?
An existing hardware performance event that reflects the root causes of many concurrency bugs
- Performance: an existing hardware performance event
- Capability: reflect the root causes of many concurrency bugs

Which event can reflect root causes?
L1 data cache cache-coherence events
- They track which cache-coherence state (M/E/S/I) an instruction observes
- States: Modified, Exclusive, Shared, Invalid
- Accesses that change the state: local read, local write, remote read, remote write

Is cache-coherence event useful? (Example, Mozilla)
    Thread 1                    Thread 2
    ptr = malloc(size);         free(ptr);
    if (!ptr) {                 ptr = NULL;
        ReportOutofMem();
        exit(1);
    }

Example (correct runs)
    thread 1 (core 1)           thread 2 (core 2)
    ptr = malloc(SIZE);
I:  if (!ptr) {
        ReportOutofMem();
        exit(1);
    }
                                free(ptr);
                                ptr = NULL;
Cache-line states shown on the slide: core 1: Modified, Invalid; core 2: Modified, Exclusive, Invalid

    Predicate   Success   Failure
    M_I         1         0
    E_I         0         0
    S_I         0         0
    I_I         0         0
Concurrency Bug from Apache HTTP Server

Example (failure run)
    thread 1 (core 1)           thread 2 (core 2)
    ptr = malloc(SIZE);
                                free(ptr);
                                ptr = NULL;
I:  if (!ptr) {
        ReportOutofMem();
        exit(1);
    }
Cache-line states shown on the slide: core 1: Modified, Shared, Invalid; core 2: Modified, Shared, Invalid

    Predicate   Success   Failure
    M_I         1         0
    E_I         0         0
    S_I         0         0
    I_I         0         1
Concurrency Bug from Apache HTTP Server
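The two runs above can be replayed against a textbook MESI state machine to see why instruction I observes Modified in a correct run and Invalid in a failure run. This is a minimal sketch of standard MESI transitions for one core's copy of the cache line holding ptr, not a model of any specific processor; all names are hypothetical.

```c
#include <stdbool.h>

enum mesi { MODIFIED, EXCLUSIVE, SHARED, INVALID };
enum mesi_event { LOCAL_READ, LOCAL_WRITE, REMOTE_READ, REMOTE_WRITE };

/* Next state of this core's copy of a line after one access.
 * others_have_copy only matters for a local read that misses. */
enum mesi mesi_next(enum mesi s, enum mesi_event e, bool others_have_copy) {
    switch (e) {
    case LOCAL_WRITE:
        return MODIFIED;                      /* writer gains ownership */
    case REMOTE_WRITE:
        return INVALID;                       /* our copy is invalidated */
    case REMOTE_READ:
        return s == INVALID ? INVALID : SHARED;
    case LOCAL_READ:
        if (s != INVALID)
            return s;                         /* hit: state unchanged */
        return others_have_copy ? SHARED : EXCLUSIVE;
    }
    return s;
}

/* State that thread 1's read at I observes when nothing interleaves. */
enum mesi observed_in_correct_run(void) {
    enum mesi s = INVALID;
    s = mesi_next(s, LOCAL_WRITE, false);     /* thread 1: ptr = malloc(SIZE) */
    return s;                                 /* thread 1: if (!ptr) */
}

/* State observed at I when thread 2 runs between the write and the read. */
enum mesi observed_in_failure_run(void) {
    enum mesi s = INVALID;
    s = mesi_next(s, LOCAL_WRITE, false);     /* thread 1: ptr = malloc(SIZE) */
    s = mesi_next(s, REMOTE_READ, false);     /* thread 2: free(ptr) reads ptr */
    s = mesi_next(s, REMOTE_WRITE, false);    /* thread 2: ptr = NULL */
    return s;                                 /* thread 1: if (!ptr) */
}
```

The intermediate Shared state in the failure run comes from thread 2 reading ptr before overwriting it, which matches the Modified, Shared, Invalid sequence listed for the failure run.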

Useful for atomicity violations
    Bug type        Failure predictor
    WWR violation   INVALID
    RWR violation   INVALID
    RWW violation   INVALID
    WRW violation   SHARED

Useful for order violations
    Bug type         Failure predictor
    Read-too-early   EXCLUSIVE (!INVALID)
    Read-too-late    INVALID

How to evaluate & sample?
Which performance events occur at a specific instruction?

Accessing performance counters
(Figure: interrupt-based and polling-based access paths through the user, kernel, and HW (PMU) layers. Labels: Config, Interrupt, Read Count.)

More details of counter access
    perf record --event=<code> -c <sampling_rate> <program monitored>

    Log Id   App     Core   Performance event   Instruction   Function
    1        Httpd   2      0x140 (Invalid)     401c3b        decrement_refcnt

Beyond concurrency bugs
- Which event? Branch taken/non-taken event
- How to evaluate & sample? Performance counter overflow interrupt

PBI vs. CBI/CCI (qualitative)
Performance
- CBI checks at run time: sample in this region?
- CCI checks at run time: sample in this region? Are other threads sampling?
- PBI: shown without either check
Diagnostic capability
- Discontinuous monitoring (CCI/CBI)
- Continuous monitoring (PBI)
- PBI differentiates interleaving reads from writes

Evaluation methodology
Programs: Apache-1, Apache-2, Cherokee, FFT, LU, Mozilla-JS-1, Mozilla-JS-2, Mozilla-JS-3, MySQL-1, MySQL-2, PBZIP2
1/100 sampling rate, ~1000 runs in total (failure:success ~ 1:1)

Diagnosis capability (w/ sampling)
    Program        CCI-Prev
    Apache-1       top1
    Apache-2       top1
    Cherokee
    FFT            top1
    LU             top1
    Mozilla-JS-1
    Mozilla-JS-2   top1
    Mozilla-JS-3   top2
    MySQL-1        -
    MySQL-2        -
    PBZIP2         top1

Diagnosis capability (w/ sampling)
    Program        CCI-Prev   PBI
    Apache-1       top1       top1
    Apache-2       top1       top1
    Cherokee                  top1
    FFT            top1       top1
    LU             top1       top1
    Mozilla-JS-1              top1
    Mozilla-JS-2   top1       top1
    Mozilla-JS-3   top2       top1
    MySQL-1        -          top1
    MySQL-2        -          top1
    PBZIP2         top1       top1

Diagnosis capability (w/ sampling)
    Program        CCI-Prev   PBI
    Apache-1       top1       top1-i
    Apache-2       top1       top1-i
    Cherokee                  top1-i
    FFT            top1       top1-e
    LU             top1       top1-e
    Mozilla-JS-1              top1-i
    Mozilla-JS-2   top1       top1-i
    Mozilla-JS-3   top2       top1-i
    MySQL-1        -          top1-s
    MySQL-2        -          top1-s
    PBZIP2         top1       top1-i

Diagnosis performance (overhead)
    Program        CCI-Prev   PBI
    Apache-1       1.90%      0.40%
    Apache-2       0.40%      0.40%
    Cherokee       0.00%      0.50%
    FFT            121%       1.00%
    LU             285%       0.80%
    Mozilla-JS-1   800%       1.50%
    Mozilla-JS-2   432%       1.20%
    Mozilla-JS-3   969%       0.60%
    MySQL-1        -          3.80%
    MySQL-2        -          1.20%
    PBZIP2         1.40%      8.40%
Sequential-bug failure diagnosis results are also good!

Are we done?
(Figure: PBI, LXR, and CCI placed on Diagnostic Latency and Performance axes, next to bug detector, coredump, and replay.)
- 1/100 sampling rate
- ~100 failures required for diagnosis

How to do better than PBI?
- What to collect
- How to collect: sampling
- How to use the collected information
Problems: missing failure-related information, high overhead
Goals: performance, capability, latency
How to collect sufficient root-cause information in 1 run w/ small overhead?

How to do better than PBI?
- What to collect
- How to collect: biased sampling
- How to use the collected information
Problems: missing failure-related information, high overhead
Goals: performance, capability, latency
Collect information at likely root-cause locations

LXR: Last eXecution Record
What to collect?
- Last few branches right before the failure
- Last few cache-coherence events right before the failure
How to collect/maintain LXR?
- Existing* hardware support!
Summary
- Performance: good (<5% overhead)
- Capability: good
- Code size: little change
- Change hardware? Simple extension*
- Diagnosis latency: short
Leveraging the Short-Term Memory of Hardware to Diagnose Production-Run Software Failures, ASPLOS 14

Last Branch Record (LBR)
- Existing hardware feature
- Stores recently taken branches
  - Circular buffer with 16 entries (Intel Nehalem)
  - Each entry: branch source instruction pointer, branch target instruction pointer
- Negligible overhead
Good performance
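The circular-buffer behaviour described above can be mimicked in software to show which branches survive until the failure. This is a sketch for illustration only: real LBR entries live in hardware registers and are not written by code like this, and the struct and function names are hypothetical. The entry count of 16 is the Nehalem figure from the slide.

```c
#include <stddef.h>
#include <stdint.h>

#define LBR_ENTRIES 16  /* buffer size the slide gives for Intel Nehalem */

/* One record: where the branch was taken from and where it went. */
struct branch_record {
    uint64_t from_ip;
    uint64_t to_ip;
};

struct lbr {
    struct branch_record entries[LBR_ENTRIES];
    size_t next;   /* slot the next record will overwrite */
    size_t count;  /* number of valid entries, at most LBR_ENTRIES */
};

/* Record a taken branch, overwriting the oldest entry once the buffer is full. */
void lbr_record(struct lbr *b, uint64_t from_ip, uint64_t to_ip) {
    b->entries[b->next].from_ip = from_ip;
    b->entries[b->next].to_ip = to_ip;
    b->next = (b->next + 1) % LBR_ENTRIES;
    if (b->count < LBR_ENTRIES)
        b->count++;
}

/* Fetch a record by age: 0 is the most recent branch. Returns NULL when
 * fewer than age + 1 branches are still held. */
const struct branch_record *lbr_get(const struct lbr *b, size_t age) {
    if (age >= b->count)
        return NULL;
    size_t idx = (b->next + LBR_ENTRIES - 1 - age) % LBR_ENTRIES;
    return &b->entries[idx];
}
```

After 20 recorded branches, only the last 16 remain, so a diagnosis built on this buffer sees just the branches closest to the failure point.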

Last Cache-coherence Record (LCR)
- Existing hardware feature
  - Configurable cache-coherence event counting
- Extension
  - Buffer to collect this information
  - Set of recent L1 data cache access instructions
  - Each entry: cache-access instruction pointer, cache-coherence state (M/E/S/I)
- Negligible overhead (estimated)
Good performance

Is LXR useful?

Apache
    Thread 1                    Thread 2
    ptr = malloc(size);         free(ptr);
    if (!ptr) {                 ptr = NULL;
        ReportOutofMem();
        exit(1);
    }

FFT
    Thread 1                    Thread 2
    print("%u", End);           End = time();
    print("%u", End-Start);

- Bugs have short error-propagation distance
- LXR is sufficient for failure diagnosis
Good diagnosis capability
ConSeq: Detecting Concurrency Bugs through Sequential Errors, ASPLOS 11

LXR vs PBI vs CBI/CCI
    Technique   Performance   Capability   Diagnosis latency (# failure runs)
    LXR         <5%           23/31        1~10 failures
    PBI         <5%           25/31        1000 failures
    CBI/CCI     3% ~ 969%     18/31        1000 failures

Outline
(Figure: PBI, CCI, and LXR on Latency and Performance axes.)

Conclusions & Future Work
- Constraints/requirements
- Techniques
- Bugs

Thanks! Questions?
My collaborators
- Prof. Tom Reps
- Prof. Ben Liblit
- Prof. Michael Swift
- Prof. Karthikeyan Sankaralingam
- Prof. Darko Marinov
My students
- Wei Zhang (IBM Research)
- Guoliang Jin (N. Carolina State Univ.)
- Linhai Song
- Joy Arulraj
- Po-chun Chang