Blended Program Analysis for Improving Reliability of Real-world Applications

Similar documents
Empirical study of the dynamic behavior of JavaScript objects

Exploring the Impact of Context Sensitivity on Blended Analysis

Understanding Performance in Large-scale Framework-based Systems

Helping Programmers Debug Code Using Semantic Change Impact Analysis PROLANGS

Survey of Existing CI Systems

JAVA PERFORMANCE. PR SW2 S18 Dr. Prähofer DI Leopoldseder

Go with the Flow: Profiling Copies to Find Run-time Bloat

Active Testing for Concurrent Programs

Points-to Analysis for Java Using Annotated Constraints*

Large-Scale API Protocol Mining for Automated Bug Detection

Dimensions of Precision in Reference Analysis of Object-oriented Programming Languages. Outline

Method-Level Phase Behavior in Java Workloads

On the Change in Archivability of Websites Over Time

Why Aren t HTTP-only Cookies More Widely Deployed?

Trace-based JIT Compilation

Agent-Enabling Transformation of E-Commerce Portals with Web Services

Department of Electrical Engineering and Computer Science MASSACHUSETTS INSTITUTE OF TECHNOLOGY Fall Quiz I

Towards Improving the Reliability of JavaScript-based Web 2.0 Applications

HCW WIMWAT Technical Report 1, May Identifying Web Widgets. Technical Reports. Alex Q. Chen and Simon Harper Human Centred Web Lab

Run-time characteristics of JavaScript

P17 System Testing Monday, September 24, 2007

Frameworks & Security

Flow-sensitive rewritings and Inliner improvements for the Graal JIT compiler

OOPLs - call graph construction. Example executed calls

An Empirical Study of PHP Feature Usage: A Static Analysis Perspective

Lecture 1 Introduction to Android. App Development for Mobile Devices. App Development for Mobile Devices. Announcement.

20480C: Programming in HTML5 with JavaScript and CSS3. Course Code: 20480C; Duration: 5 days; Instructor-led. JavaScript code.

CSCE 548 Building Secure Software Software Analysis Basics

COURSE OUTLINE MOC 20480: PROGRAMMING IN HTML5 WITH JAVASCRIPT AND CSS3

ISSTA : Software Testing and Analysis TAJ: Effective Taint Analysis of Web Applications, PLDI 2009

Welcome to the class of Web Information Retrieval. Min ZHANG May 11, 2012

Static analysis of PHP applications

Andromeda: XSS Accurate and Scalable Security Analysis of Web Applications. OWASP* Top Ten Security Vulnerabilities. SQL Injection.

Enterprise Architect. User Guide Series. Profiling

Enterprise Architect. User Guide Series. Profiling. Author: Sparx Systems. Date: 10/05/2018. Version: 1.0 CREATED WITH

Jazz: A Tool for Demand-Driven Structural Testing

Binding and Storage. COMP 524: Programming Language Concepts Björn B. Brandenburg. The University of North Carolina at Chapel Hill

Frameworks & Security How web applications frameworks kill your security scanning

SERVICE-ORIENTED COMPUTING

Microsoft SAGE and LLVM KLEE. Julian Cohen Manual and Automatic Program Analysis

Glacier: A Garbage Collection Simulation System

Paytm Programming Sample paper: 1) A copy constructor is called. a. when an object is returned by value

Third-party Identity Management Usage on the Web

JavaScript Errors in the Wild: An Empirical Study

The UBot Studio SCRIPT REFERENCE. The Qualifier Functions

What do Compilers Produce?

Trusted Types - W3C TPAC

Active Endpoints. ActiveVOS Platform Architecture Active Endpoints

An Eclipse Plug-in for Model Checking

junit RV Adding Runtime Verification to junit

Performance Optimization for Informatica Data Services ( Hotfix 3)

KLEE: Effective Testing of Systems Programs Cristian Cadar

Safety Checks and Semantic Understanding via Program Analysis Techniques

WHITESTEIN. Agents in a J2EE World. Technologies. Stefan Brantschen. All rights reserved.

Hybrid DOM-Sensitive Change Impact Analysis for JavaScript

CURIOUS BROWSERS: Automated Gathering of Implicit Interest Indicators by an Instrumented Browser

Grade Weights. Language Design and Overview of COOL. CS143 Lecture 2. Programming Language Economics 101. Lecture Outline

Run-time Environments

Interprocedural Type Specialization of JavaScript Programs Without Type Analysis

A Trace-based Java JIT Compiler Retrofitted from a Method-based Compiler

Run-time Environments

Securing Software Applications Using Dynamic Dataflow Analysis. OWASP June 16, The OWASP Foundation

Formats of Translated Programs

JSAI: A STATIC ANALYSIS PLATFORM FOR JAVASCRIPT

CS 162 Operating Systems and Systems Programming Professor: Anthony D. Joseph Spring Lecture 22: Remote Procedure Call (RPC)

Semantic Analysis. Outline. The role of semantic analysis in a compiler. Scope. Types. Where we are. The Compiler Front-End

eclipse rich ajax platform (rap)

The Potentials and Challenges of Trace Compilation:

Advanced Program Analyses for Object-oriented Systems

J2EE Application Development : Conversion and Beyond Osmond Ng

MGDI, LLC. Domain Name -

Ripple: Reflection Analysis for Android Apps in Incomplete Information Environments

Profile-Guided Program Simplification for Effective Testing and Analysis

9/21/17. Outline. Expression Evaluation and Control Flow. Arithmetic Expressions. Operators. Operators. Notation & Placement

Dynamic Metrics for Java

Call Paths for Pin Tools

Practical Keystroke Timing Attacks in Sandboxed JavaScript

welcome to BOILERCAMP HOW TO WEB DEV

Don't Call Us, We'll Call You:

TUTORIAL: WHITE PAPER. VERITAS Indepth for the J2EE Platform PERFORMANCE MANAGEMENT FOR J2EE APPLICATIONS

Weeks 6&7: Procedures and Parameter Passing

CA Compiler Construction

Web Serving Architectures

Course 20480: Programming in HTML5 with JavaScript and CSS3

Chapter 14 Performance and Processor Design

Introduction to Java

COURSE 20480B: PROGRAMMING IN HTML5 WITH JAVASCRIPT AND CSS3

EQUIVALENCE PARTITIONING AS A BASIS FOR DYNAMIC CONDITIONAL INVARIANT DETECTION

Finding Vulnerabilities in Web Applications

DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN. Chapter 1. Introduction

Top Ten Enterprise Java performance problems. Vincent Partington Xebia

A Trace-Scaling Agent for Parallel Application Tracing 1

Investigating Source Code Reusability for Android and Blackberry Applications

Programming in HTML5 with JavaScript and CSS3

Programming in HTML5 with JavaScript and CSS3

An Object Oriented Runtime Complexity Metric based on Iterative Decision Points

Dynamic Race Detection with LLVM Compiler

1Z Oracle. Java Platform Enterprise Edition 6 Enterprise JavaBeans Developer Certified Expert

How We Refactor, and How We Know It

Active Testing for Concurrent Programs

Transcription:

Blended Program Analysis for Improving Reliability of Real-world Applications Dr. Barbara G. Ryder J. Byron Maupin Professor of Engineering Virginia Tech Collaborators: Bruno Dufour (Rutgers), Gary Sevitsky (IBM Research), Marc Fisher II (VT), Ben Wiedermann (VT), Shiyi Wei (VT), S. Basu (ug-lafayette), Luke Marrs (ug-vt); Funded by IBM Open Collaborative Research Program and NSF 08-0811518 1

Outline What is program analysis? Challenge of modern software applications What is blended program analysis? Examples of blended analysis Performance diagnosis for framework-based applications (Java) Data integrity & program understanding for webpage codes (JavaScript) Summary 2

What is Program Analysis? Historically, Static program analysis was used for code optimization during compilation Gave a safe approximation of all possible program behaviors, without running the program Allows checking of specific program properties Dynamic program analysis was used for tracking program behavior at runtime Gathered information about program on one execution 3

What is Program Analysis? Static Analysis Input: code Output: model of semantics of program When: At compile time Cost: High Goal: code optimization, property validation, program understanding, test case generation... Dynamic Analysis Input: trace of execution(s) Output: an observed property When? At runtime Cost: Low Goal: Debugging, uncovering code dependences,test coverage... 4

Challenge: Tools for Modern SW Applications Framework-based software systems (Java) Transactional, complex, many libraries/ components E.g., personal financial managers, e-commerce applications, information records managers Performance problems difficult to isolate Need understanding of business logic as well as run-time behavior 5

Dynamic Language Constructs Java Reflection values from run-time environment influence program behavior (e.g., in dynamic class loading) //read name of class to be dynamically loaded //from command line* Class a = Class.forName(args[0]); Method mainmethod = findmain(a); mainmethod.invoke( ); *http://media.techtarget.com/tss/static/articles/content/dm_classforname/ DynLoad.pdf 6

Challenge: Tools for Modern SW Applications Websites (JavaScript) Constructed from combination of static and dynamically loaded/generated code Code can be constructed at runtime and then executed Program understanding (for maintenance), testing, and validating data integrity are hard 7

Dynamic language constructs JavaScript Ability to construct a URL at runtime and then load that webpage Functions with variable numbers of arguments Ability to write code at runtime and execute it with eval statement //xmlhttp is an instance of XMLHttpRequest eval(xmlhttp.response()); 8

What is Blended Program Analysis? ISSTA07, FSE08, ICSM10, CS@VT TR-12-18(2012) A way to integrate dynamic and static analyses to obtain scalability and precision for specific problems Use of a dynamic program representation Reflects call structure of execution(s) Use of light-weight dynamic information To prune (some) infeasible paths To tie static analysis information to actual runtime objects To deal with dynamic program constructs 9

Outline Blended analysis for performance diagnosis in framework-intensive applications (Java) 10

Framework-based Applications App Middleware Libraries and Frameworks Application is an iceberg Bulk of the code in libraries and frameworks Genre not commonly addressed by research community E.g., financial planning services, e- commerce sites, online reservation systems, Tomcat-based systems software Programs are not just large, but are more complex in interactions between frameworks Performance problems span multiple layers 11

Framework-based Applications App Middleware Libraries and Frameworks Software characteristics Not amenable to static analyses Not scalable -- too complex Not amenable to dynamic analysis Too intrusive into execution for production codes Application s main function often is data transformation Goal: design analyses for performance diagnosis of these systems 12

Goal: Find Object Churn Identify execution contexts with excessive use of temporaries Based on total number of instances Not the same as finding often-executed allocation sites Need to identify temporary objects and to approximate object lifetime Elimination strategies Optimize the frequent use of frameworks and libraries together Introduce caching for temporary data structures Code specialization 13

Current Practice: Jinsight Trace of HoldingDataBean_Ser.serialize() Tens of thousands of calls How to find churn locality? 14

Blended Analysis - Scalability 100000 10000 Calling contexts 1000 100 10 Looking at the entire trace 1473 348 17 18267 8089 3919 2223 2 orders of magnitude! 25012 5848 Approximating contexts that use temporaries 322 373 71 Identifying contexts that truly use temporaries getholdings() 1 Trade6: Dct-Std Dct-WS EJB-Std EJB-WS 15

Blended Analysis Paradigm Java Application Profile Loaded Classes Dynamic Calling Structure Models of methods Static Analysis Reflection Specification + Templates 16

Entry Pruning Code in Methods (FSE 08) Allocated types: {B} Observed targets: {D.m} x = new B() w = new A() y = D.m() z = C.m() Exit 17

Escape Analysis, by Example zag() foo() A B bar() baz() C D Captured Arg-escaping Globally escaping G F E F C E D E void bar() { a = new A(); a.x = new B(); Disposition } A C baz() { c = new B C(); c.y = new D(); c.z = C new E(); return D c; } E void foo(f f) { c = baz(); F f.w = c.z; } G void zag() { F f = new F(); foo(f); G.global = f; } 18

Invocation tree for hasconnectaccess! SecurityDescriptor geteffectiveaccess 60 captured, 40 new SecurityContext issidpresent 80 captured Each invoke of hasconnectaccess has a unique SecurityDescriptor for an ID instance; could cache w/i ID; LittleEndianUtil readint 40 captured, 40 new SecurityServer hasconnectaccess 180 captured GCDObject getattribute 20 captured, 20 new AccessControlList deserialize 80 new SecurityDescriptor loadfrombytes 100 captured, 40 new SecurityDescriptor deserialize 60 new AccessControlEntry deserialize 120 new Optimized trace; only shows allocating and capturing methods 1000 instances created but 880 not escaping LittleEndianUtil readshort 40 captured, 20 new LittleEndianUtil readintfrombytes 40 captured LittleEndianUtil readint 80 captured, 80 new PermissionSource getinstancefromint 40 captured, 40 new AceAccess getinstancefromint 40 captured, 40 new LittleEndianUtil readshort 80 captured, 40 new LittleEndianUtil readintfrombytes 80 captured 19

Metrics Designed new metrics for blended escape analysis Measure effectiveness of pruning Scalability of analysis % of blocks in methods pruned 20

Scalability Analysis time (h:m:s) Benchmark No pruning Pruned Speedup %Pruned Direct-Std 00:00:18 00:00:17 1.1 45% Direct-WS 01:34:01 00:04:41 20.0 40% EJB-Std 00:04:24 00:01:46 2.5 43% EJB-WS N/A 29:23:16 N/A 43% Eclipse 24:37:12 06:37:22 3.7 29% Jazz 02:49:55 00:39:06 4.3 27% IBM Appln 00:04:35 00:02:05 2.2 39% 21

Metrics Measure usage of temporaries Disposition- categorizes instances as globally: escaping, captured, mixed 22

Disposition of Instances 100% Captured Mixed Escaping % of dynamic object instances 75% 50% 74.3% 51.6% 56.9% 25% 33.9% 32.3% 42.1% 26.3% 0% Most instances exhibit only 1 behavior ~50% instances captured IBM App Direct/Std Direct/WS EJB/Std EJB/WS Eclipse Jazz CDMS 23

Metrics Measure usage of temporaries Concentration- measures locality of temporary usage 24

Concentration of Captured Instances 100% 5% 10% 20% % of captured instances 75% 50% 25% 0% 29.0% Direct/ Std 54.3% Direct/ WS 88.4% 62.2% 66.8% 51.2% 42.0% EJB/Std EJB/WS Eclipse Jazz IBM CDMS App 25

Outline Blended analysis for capturing effects of dynamically generated or dynamically loaded code (JavaScript) 26

JavaScript Website Glue Code JavaScript is lingua franca of clientside applications 98 out of 100 most popular websites use JavaScript (Guarnieri et al, ISSTA11) Use of dynamic features is evident in websites (Richards et al, ECOOP11, PLDI10; Zorn et al, WebApps10) 27

Blended Analysis Paradigm for JavaScript Application Profile Pruned models of methods Dynamic Calling Structure Static Analysis Gather dynamically generated/loaded code 28

How to Profile a Website for Analysis? Run several executions of the website, to explore its behaviors Each execution is comprised by a set of page traces Gather all page traces for the same webpage together and consider them a single JavaScript program for analysis 29

JavaScript Analysis Framework ` Execution collector Trace selector Dynamic analysis Static analyzer Solution integrator JS Program Solution Static analysis 30

Tainted Input Analysis Integrity violation allowing user input to reach a sensitive operation Tainted Source: user has control of its value Sensitive operation: can affect behavior of website or browser Source-sink pair reported if there is a possible dataflow between them (ignoring sanitizers) Hyp: blended tainted input analysis will report fewer false alarms and more true positives than pure static tainted input analysis 31

Benchmarks1 Website Page count Trace count bing.com 19 30 twitter.com 14 30 linkedin.com 12 30 qq.com 18 30 wordpress.com 23 30 sina.com.cn 16 30 163.com 22 30 cnn.com 18 30 msn.com 18 30 conduit.com 10 16 imdb.com 10 18 myspace.com 10 24 sohu.com 10 19 xing.com 10 18 xunlei.com 10 22 zedo.com 10 16 washingtonpost.com 10 27 pconline.com.cn 10 21 Average 14 25 32

Blended vs Pure Static Analysis Pure static solution Blended solution Part of static solution not found by blended Part of solution from dynamically loaded or generated code Shaded area is part of solution found by pure static and blended 33

a b c Tainted Input Results Website Pure Static Pure Static Blended Blended true soln false alarm true soln false alarm live.com 1 youtube.com 1 1 myspace.com 1 false negatives sohu.com 2 1 2 xunlei.com 3 false positives 3 msn.com 1 bing.com 1 totals 6 2 9 34

Statement-level Side-effects (ST-MOD) For Program Understanding Want to know the number of objects whose f field value may be changed at statement: x.f= Helpful in navigating unfamiliar code How solve? Find which objects o that x can point to Find which objects o.f can point to 35

Website Benchmarks2 Page count Trace count eval page variadic function google.com 203 2104 52 177 facebook.com 138 1098 23 65 youtube.com 122 579 19 29 yahoo.com 52 265 21 13 baidu.com 49 147 6 16 wikipedia.org 67 130 0 3 live.com 54 226 10 44 blogger.com 24 146 6 7 totals 709 4695 137 354 36

Comparison: Pure Static to Blended Solutions (points-to) a b c % Coverage of pure static solution %Additional results Website google.com 89.7 5.9 facebook.com 85.3 7.5 youtube.com 89.1 9.9 yahoo.com 78 9.8 baidu.com 93 6.7 wikipedia.org 92.1 none live.com 81.8 7.5 blogger.com 83.8 1.4 mean 86.6 7.0 37

ST-MOD Solutions Websites Pure Static Blended Average number of objects in static code Average number of objects in static code Average number of objects in dynamic code google.com 5.8 2.4 2.1 facebook.com 7.7 4.1 3.6 youtube.com 5.9 3.5 2.4 yahoo.com 5.2 2.5 2.8 baidu.com 2.6 1.4 1.8 live.com 2.9 1.6 2.2 blogger.com 4.5 2.8 2.3 average 4.9 2.6 2.5 38

Summary Modern SW applications require new approaches to program analysis - the engine behind SW tools New blended analysis paradigm seems promising for handling the more dynamic programming constructs Performance diagnosis of framework-based apps Understanding of website codes for enhancement More experimentation and investigation needed to select best analysis for specific use 39

Challenges Expect more combinations of static and dynamic analyses for web & mobile apps Challenge: analyzing 3 rd party apps (executables) in combo with known codes Challenge: analyzing event-driven codes Challenge: increased use of explicit concurrency will require new tools and supporting analyses 40

Thank You Questions? 41