Phosphor: Illuminating Dynamic. Data Flow in Commodity JVMs

Similar documents
Chronicler: Lightweight Recording to Reproduce Field Failures

Phosphor: Illuminating Dynamic Data Flow in Commodity JVMs

Phosphor: Illuminating Dynamic Data Flow in the JVM

Research Topics in SE & Distributed Systems

Uranine: Real-time Privacy Leakage Monitoring without System Modification for Android

A Trace-based Java JIT Compiler Retrofitted from a Method-based Compiler

Adaptive Multi-Level Compilation in a Trace-based Java JIT Compiler

Towards Parallel, Scalable VM Services

Trace-based JIT Compilation

Identifying the Sources of Cache Misses in Java Programs Without Relying on Hardware Counters. Hiroshi Inoue and Toshio Nakatani IBM Research - Tokyo

An Empirical Study on Deoptimization in the Graal Compiler

Dynamic Taint Tracking for Java with Phosphor (Demo)

IBM Research - Tokyo 数理 計算科学特論 C プログラミング言語処理系の最先端実装技術. Trace Compilation IBM Corporation

How s the Parallel Computing Revolution Going? Towards Parallel, Scalable VM Services

String Deduplication for Java-based Middleware in Virtualized Environments

Probabilistic Calling Context

Lu Fang, University of California, Irvine Liang Dou, East China Normal University Harry Xu, University of California, Irvine

Dynamic Vertical Memory Scalability for OpenJDK Cloud Applications

The Potentials and Challenges of Trace Compilation:

Deriving Code Coverage Information from Profiling Data Recorded for a Trace-based Just-in-time Compiler

JIT Compilation Policy for Modern Machines

Introduction to Programming Using Java (98-388)

Compilation Queuing and Graph Caching for Dynamic Compilers

Efficient Runtime Tracking of Allocation Sites in Java

Chapter 1 GETTING STARTED. SYS-ED/ Computer Education Techniques, Inc.

CGO:U:Auto-tuning the HotSpot JVM

Java: framework overview and in-the-small features

6.858 Quiz 2 Review. Android Security. Haogang Chen Nov 24, 2014

Continuously Measuring Critical Section Pressure with the Free-Lunch Profiler

Contents. Figures. Tables. Examples. Foreword. Preface. 1 Basics of Java Programming 1. xix. xxi. xxiii. xxvii. xxix

INDEX. A SIMPLE JAVA PROGRAM Class Declaration The Main Line. The Line Contains Three Keywords The Output Line

A Dynamic Evaluation of the Precision of Static Heap Abstractions

Towards Garbage Collection Modeling

Administration CS 412/413. Why build a compiler? Compilers. Architectural independence. Source-to-source translator

Comparison of Garbage Collectors in Java Programming Language

Efficient and Viable Handling of Large Object Traces

A- Core Java Audience Prerequisites Approach Objectives 1. Introduction

Albis: High-Performance File Format for Big Data Systems

Java Primer 1: Types, Classes and Operators

Question. 1 Features of Java.

CMSC 430 Introduction to Compilers. Fall Language Virtual Machines

Object Oriented Programming: In this course we began an introduction to programming from an object-oriented approach.

Lecture Notes: Unleashing MAYHEM on Binary Code

A program execution is memory safe so long as memory access errors never occur:

Special Topics: Programming Languages

Programming Language Basics

Toward Efficient Strong Memory Model Support for the Java Platform via Hybrid Synchronization

Lecture 08. Android Permissions Demystified. Adrienne Porter Felt, Erika Chin, Steve Hanna, Dawn Song, David Wagner. Operating Systems Practical

WA1278 Introduction to Java Using Eclipse

Unit Test Virtualization with VMVM

Dynamic Analysis of Inefficiently-Used Containers

Large-Scale API Protocol Mining for Automated Bug Detection

Assessing the Scalability of Garbage Collectors on Many Cores

A Black-box Approach to Understanding Concurrency in DaCapo

Lightweight Data Race Detection for Production Runs

Continuous Object Access Profiling and Optimizations to Overcome the Memory Wall and Bloat

2 Lecture Embedded System Security A.-R. Darmstadt, Android Security Extensions

Introduction to Visual Basic and Visual C++ Introduction to Java. JDK Editions. Overview. Lesson 13. Overview

Java language. Part 1. Java fundamentals. Yevhen Berkunskyi, NUoS

Chapter 1 INTRODUCTION SYS-ED/ COMPUTER EDUCATION TECHNIQUES, INC.

Argos Emulator. Georgios Portokalidis Asia Slowinska Herbert Bos Vrije Universiteit Amsterdam

Weiss Chapter 1 terminology (parenthesized numbers are page numbers)

Advanced programming for Java platform. Introduction

Application Development in JAVA. Data Types, Variable, Comments & Operators. Part I: Core Java (J2SE) Getting Started

Adaptive Optimization using Hardware Performance Monitors. Master Thesis by Mathias Payer

On The Limits of Modeling Generational Garbage Collector Performance

DiSL: A Domain-Specific Language for Bytecode Instrumentation

Practical Malware Analysis

The Sun s Java Certification and its Possible Role in the Joint Teaching Material

Compiling Techniques

Program Calling Context. Motivation

AP COMPUTER SCIENCE JAVA CONCEPTS IV: RESERVED WORDS

Uncovering Performance Problems in Java Applications with Reference Propagation Profiling

Android Debugging ART

JAVA PERFORMANCE. PR SW2 S18 Dr. Prähofer DI Leopoldseder

Cross-Layer Memory Management for Managed Language Applications

SYLLABUS JAVA COURSE DETAILS. DURATION: 60 Hours. With Live Hands-on Sessions J P I N F O T E C H

Performance Potential of Optimization Phase Selection During Dynamic JIT Compilation

1. What are the key components of Android Architecture? 2. What are the advantages of having an emulator within the Android environment?

The NetRexx Interpreter

Jazz: A Tool for Demand-Driven Structural Testing

CSE P 501 Compilers. Java Implementation JVMs, JITs &c Hal Perkins Winter /11/ Hal Perkins & UW CSE V-1

Practical DIFC Enforcement on Android

Java Overview An introduction to the Java Programming Language

Introduction. Assessment Test. Part I The Programmer s Exam 1

VIProf: A Vertically Integrated Full-System Profiler

A Quick Tour p. 1 Getting Started p. 1 Variables p. 3 Comments in Code p. 6 Named Constants p. 6 Unicode Characters p. 8 Flow of Control p.

Detecting and Fixing Memory-Related Performance Problems in Managed Languages

COMP Summer 2015 (A01) Jim (James) Young jimyoung.ca

Debugging With Behavioral Watchpoints

Java and C II. CSE 351 Spring Instructor: Ruth Anderson

Kazunori Ogata, Dai Mikurube, Kiyokuni Kawachiya and Tamiya Onodera

Securing Software Applications Using Dynamic Dataflow Analysis. OWASP June 16, The OWASP Foundation

Applications. Cloud. See voting example (DC Internet voting pilot) Select * from userinfo WHERE id = %%% (variable)

Agenda. CSE P 501 Compilers. Java Implementation Overview. JVM Architecture. JVM Runtime Data Areas (1) JVM Data Types. CSE P 501 Su04 T-1

Advanced Computer Architecture

Computer Components. Software{ User Programs. Operating System. Hardware

Software Development & Education Center. Java Platform, Standard Edition 7 (JSE 7)

A Platform Independent Improved GUI for the PeANUt Computer Simulator AVINASH G PRASAD Supervisor: Dr Peter Strazdins

Software Protection via Obfuscation

Transcription:

Phosphor: Illuminating Dynamic Fork me on Github Data Flow in Commodity JVMs Jonathan Bell and Gail Kaiser Columbia University, New York, NY USA

Dynamic Data Flow Analysis: Taint Tracking Output that is derived from tainted input Inputs Application Outputs Flagged ( Tainted ) Input

Taint Tracking: Applications End-user privacy testing: Does this application send my personal data to remote servers? SQL injection attack avoidance: Do SQL queries contain raw user input Debugging: Which inputs are relevant to the current (crashed) application state?

Qualities of a Successful Analysis

Soundness No data leakage!

Precision Data is tracked with the right tag

Performance Minimal slowdowns

Portability No special hardware, OS, or JVM

Normal Taint Tracking Associate tags with data, then propagate the tags Approaches: Operating System modifications [Vandebogart 07], [Zeldovich 06] Language interpreter modifications [Chandra 07], [Enck 10], [Nair 07], [Son 13] Source code modifications [Lam 06], [Xu 06] Not portable Binary instrumentation of applications [Clause 07], [Cheng 06], [Kemerlis 12] Hard to be sound, precise, and performant

Phosphor Leverages benefits of interpreter-based approaches (information about variables) but fully portably Instruments all byte code that runs in the JVM (including the JRE API) to track taint tags Add a variable for each variable Adds propagation logic

Key contribution: How do we efficiently store meta-data for every variable without modifying the JVM itself?

JVM Type Organization Primitive Types int, long, char, byte, etc. Reference Types Arrays, instances of classes All reference types are assignable to java.lang.object

Phosphor s taint tag storage Local variable Method argument Return value Operand stack Field Object Stored as a field of the object Object array Stored as a field of each object Primitive Shadow variable Shadow argument "Boxed" Below the value on stack Shadow field Primitive array Shadow array variable Shadow array argument "Boxed" Array below value on stack Shadow array field

Taint Propagation Modify all byte code instructions to be taint-aware by adding extra instructions Examples: Arithmetic -> combine tags of inputs Load variable to stack -> Also load taint tag to stack Modify method calls to pass taint tags

Two big problems

Challenge 1: Type Mayhem Primitive Types Always has extra variable! java.lang.object instanceof instanceof Sometimes has extra variable! Never has extra variable! Instances of classes (Objects) Primitive Arrays Always has extra variable

Challenge 1: Type Mayhem byte[] array = new byte[5]; Object ret = array; return ret; int[] array_tag = new int[5]; byte[] array = new byte[5]; Object ret = new TaintedByteArray(array_tag,array); Solution 1: Box taint tag with array when we lose type information

Challenge 2: Native Code We can t instrument everything!

Challenge 2: Native Code public int hashcode() { return super.hashcode() * field.hashcode(); } public native int hashcode(); Solution: Wrappers. Rename every method, and leave a wrapper behind

Challenge 2: Native Code public int hashcode() { return super.hashcode() * field.hashcode(); } public native int hashcode(); Solution: Wrappers. Rename every method, and leave a wrapper behind public TaintedInt hashcode$$wrapper() { return new TaintedInt(0, hashcode()); }

Challenge 2: Native Code Wrappers work both ways: native code can still call a method with the old signature public int[] somemethod(byte in)

Challenge 2: Native Code Wrappers work both ways: native code can still call a method with the old signature public int[] somemethod(byte in) public TaintedIntArray somemethod$$wrapper(int in_tag, byte in) { //The original method "somemethod", but with taint tracking }

Challenge 2: Native Code Wrappers work both ways: native code can still call a method with the old signature public int[] somemethod(byte in) { return somemethod$$wrapper(0, in).val; } public TaintedIntArray somemethod$$wrapper(int in_tag, byte in) { //The original method "somemethod", but with taint tracking }

Challenge 2: Native Code Wrappers work both ways: native code can still call a method with the old signature public int[] somemethod(byte in) { return somemethod$$wrapper(0, in).val; } public TaintedIntArray somemethod$$wrapper(int in_tag, byte in) { //The original method "somemethod", but with taint tracking }

Design Limitations Tracking through native code Return value s tag becomes combination of all parameters (heuristic); not found to be a problem in our evaluation Tracks explicit data flow only (not through control flow)

Evaluation Soundness & Precision Performance Portability

Soundness & Precision DroidBench - series of unit tests for Java taint tracking Passed all except for implicit flows (intended behavior)

Performance Macrobenchmarks (DaCapo, Scalabench) Microbenchmarks Versus TaintDroid [Enck, 2010] on CaffeineMark Versus Trishul [Nair, 2008] on JavaGrande

Macrobenchmarks Phosphor Relative Runtime Overhead (Hotspot 7) scalabench dacapo avrora batik eclipse fop h2 jython luindex lusearch pmd sunflow tomcat tradebeans tradesoap xalan actors apparat factorie kiama scaladoc scalap scalariform scalatest scalaxb specs tmt Average: 53.3% 0% 50% 100% 150% 200% 250% Relative Runtime Overhead

Macrobenchmarks Phosphor Relative Memory Overhead (Hotspot 7) scalabench dacapo avrora batik eclipse fop h2 jython luindex lusearch pmd sunflow tomcat tradebeans tradesoap xalan actors apparat factorie kiama scaladoc scalap scalariform scalatest scalaxb specs tmt Average: 270.9% 0% 100% 200% 300% 400% 500% 600% 700% 800% 900% Relative Memory Overhead

Microbenchmarks Phosphor (Hotspot 7) and Trishul Relative Overhead Arithmetic Assign Cast Create Exception Loop Phoshpor Trishul Math Method Serial 0% 20% 40% 60% 80% 100% 120% Relative Runtime Overhead

Microbenchmarks Phosphor (Hotspot 7) and Trishul Relative Overhead Arithmetic Assign Cast Create Exception Loop Math Phoshpor Phoshpor Trishul Trishul Kaffe Method Serial 0% 20% 40% 60% 80% 100% 120% Relative Runtime Overhead

Microbenchmarks: Taintdroid Taintdroid: Taint tracking for Android s Dalvik VM [Enck, 2010] Not very precise: one tag per array (not per array element!) Applied Phosphor to Android!

Microbenchmarks Phosphor and Taintdroid Relative Overhead String Buffer Sieve Method Taintdroid 4.3 Phosphor (Android 4.3) Loop Array-heavy benchmarks Logic Float 0% 50% 100% 150% 200% Relative Runtime Overhead

Portability JVM Version(s) Success? Oracle (Hotspot) 1.7.0_45, 1.8.0_0 Yes OpenJDK 1.7.0_45, 1.8.0_0 Yes Android Dalvik 4.3.1 Yes Apache Harmony 6.0M3 Yes Kaffe VM 1.1.9 Yes Jikes RVM 3.1.3 No, but may be possible with more work

Future Work & Extension This is a general approach for tracking metadata with variables in unmodified JVMs Could track any sort of data in principle We have already extended this approach to track path constraints on inputs

* Evaluated * OOPSLA * Artifact * AEC Fork me on Github Phosphor: Illuminating Dynamic Data Flow in Commodity JVMs Jonathan Bell and Gail Kaiser Columbia University jbell@cs.columbia.edu @_jon_bell_ https://github.com/programming-systems-lab/phosphor Consistent * Complete * Well Documented * Easy to Reuse *