Lecture 10. Dynamic Analysis

Wei Le, Iowa State University, 2014.11
Thanks to Xiangyu Zhang, Michael Ernst, and Tom Ball for some of the slides.

Outline
- What is dynamic analysis?
- Instrumentation
- Analysis
- Representing dynamic information
- Dynamic slicing
- Applications: specification generation and Daikon

What is dynamic analysis?
Learn from executions:
- Run programs
  - Where are the test inputs?
  - What is the coverage?
  - Which parts of the code should be run?
  - How to create the runtime environment?
- Collect data
  - Online or offline (log analysis)?
  - What is the overhead (memory and performance)?
  - How to capture and replay the data: compression, statistical approaches
- Analyze data
  - Dynamic slicing
  - Model-based approaches
  - Statistical and machine learning approaches (an interesting discussion comparing the two: https://www.quora.com/what-is-the-difference-between-statistics-and-machine-learning)

What is the meaning of "run"?
- Abstract interpretation and static analyses run a program over an abstract domain: OUT = F(IN, s)
- Dynamic analysis uses the abstraction in parallel with, not in place of, concrete values: OUT = F(IN, si, v)
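The contrast above can be illustrated with a minimal sketch. It assumes a sign domain (neg, zero, pos, unknown) as the abstraction and the expression x * x + y as the "program"; both are illustrative choices, not taken from the slides. The static run sees only abstract values, while the dynamic run computes the concrete value and derives the abstract one alongside it.

```python
def sign(v):
    """Abstraction function: map a concrete integer to its sign."""
    return "neg" if v < 0 else "zero" if v == 0 else "pos"

def abs_mul(a, b):
    """Multiplication over the sign domain."""
    if "zero" in (a, b):
        return "zero"
    if "unknown" in (a, b):
        return "unknown"
    return "pos" if a == b else "neg"

def abs_add(a, b):
    """Addition over the sign domain; mixed signs lose all information."""
    if a == "zero":
        return b
    if b == "zero":
        return a
    return a if a == b else "unknown"

def static_run(x_sign, y_sign):
    """Evaluate x * x + y using abstract values only."""
    return abs_add(abs_mul(x_sign, x_sign), y_sign)

def dynamic_run(x, y):
    """Evaluate x * x + y concretely and keep the abstraction in parallel."""
    value = x * x + y
    return value, sign(value)

static_result = static_run("neg", "neg")   # must cover every negative x and y
dynamic_result = dynamic_run(-3, -2)       # covers only this one execution
print(static_result, dynamic_result)
```

The static result holds for all inputs with those signs but is imprecise; the dynamic result is exact but says nothing about inputs that were never run.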

How is dynamic analysis used?
- Profiling-guided code optimization: which paths are frequently executed?
- Performance analysis
- Mining program invariants and protocols
- Fault localization and debugging (e.g., replaying concurrency bugs)
- Bug finding ...

Running the programs
- The test suite determines the expense (in time and space)
- The test suite determines the accuracy (which executions are never seen)
- Coverage or frequency: statements, branches, paths, procedure calls, types, method dispatch
- Values computed: parameters, array indices
- Run time, memory usage
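As a rough illustration of statement frequency, the sketch below uses Python's standard `sys.settrace` hook to count how often each line of one function executes. The helper `trace_lines` and the sample function `classify` are hypothetical names introduced only for this example.

```python
import sys

def trace_lines(func, *args):
    """Run func(*args); return (result, counts), where counts maps a line
    offset (relative to the 'def' line) to its execution frequency."""
    code = func.__code__
    counts = {}

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None                      # do not trace other functions
        if event == "line":
            offset = frame.f_lineno - code.co_firstlineno
            counts[offset] = counts.get(offset, 0) + 1
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)               # always restore the old hook
    return result, counts

def classify(n):
    if n < 0:
        return "negative"
    return "non-negative"

result, counts = trace_lines(classify, 5)
print(result, counts)
```

With only the input 5, the line at offset 2 (`return "negative"`) is never observed, which is the sense in which the test suite bounds what the analysis can see.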

Instrumentation
A technique that inserts extra code into a program to collect runtime information.
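A minimal sketch of the idea, assuming the "extra code" is a wrapper added around a function by hand. The decorator `instrument` and the containers `call_counts` and `observed_args` are hypothetical names for this example; real tools insert such probes automatically.

```python
import functools
from collections import Counter

call_counts = Counter()   # how many times each function was called
observed_args = []        # argument values seen at each call

def instrument(func):
    """Wrap func with a probe that records each call before running it."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        call_counts[func.__name__] += 1
        observed_args.append((func.__name__, args))
        return func(*args, **kwargs)
    return wrapper

@instrument
def gcd(a, b):
    # Recursive calls also pass through the probe, because the name
    # 'gcd' now refers to the wrapper.
    return a if b == 0 else gcd(b, a % b)

result = gcd(12, 18)
print(result, call_counts["gcd"], observed_args)
```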

Source and Binary Instrumentation
- Source instrumentation: instrument source programs
- Binary instrumentation: instrument executables directly
  - No need to recompile or relink
  - Discover code at runtime
  - Handle dynamically generated code
  - Attach to running processes

Binary Instrumentation Tools Are Dominant
- Libraries are a big pain for source-level instrumentation, e.g., proprietary libraries: communication (MPI, PVM), linear algebra (NGA), database query (SQL libraries)
- Multi-lingual programs are handled easily, whereas source-level instrumentation is heavily language dependent
- More complicated semantics: turning off compiler optimizations can maintain an almost perfect mapping from instructions to source code lines
- Worms and viruses are rarely provided with source code

Static Instrumentation

Dynamic Instrumentation

Instrumentation Tools
- Source instrumentation tools: AIMS, Paradyn, Pablo, ... [1]
- Binary instrumentation tools:
  - Static (insert the instrumentation into the code, then run): ATOM, EEL, Etch, Morph
  - Dynamic (insert the instrumentation while the program runs): Dyninst, Vulcan, DTrace, Valgrind, Strata, DynamoRIO, PIN

Valgrind See Valgrind slides

PIN See PIN slides

Instrumentation: a Summary
- Source-level:
  - Pattern matching over the parse tree or AST, followed by rewriting
  - Full access to source information
  - A* [Ladd, Ramming], Astlog [Crew]
- Binary-level:
  - ATOM [Srivastava], EEL [Larus], Diablo, Bluto
  - Analyze programs from multiple languages
  - Typically done at some IR level
- Dynamic instrumentation (run-time): Valgrind, PIN
- Java instrumentation tools:
  - ASM (http://asm.objectweb.org/): small and fast, supports Java 1.5
  - BCEL (http://jakarta.apache.org/bcel/): works well, no longer under active development
  - SERP (http://serp.sourceforge.net/): not designed for speed, somewhat memory intensive
  - Soot (http://www.sable.mcgill.ca/soot/): supports source, designed for optimizations
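Source-level instrumentation by AST matching and rewriting can be sketched with Python's standard `ast` module. This is only an illustration in Python, not how any of the tools listed above work; the probe name `__probe__`, the class `EntryProbeInserter`, and the sample program are assumptions made for the example.

```python
import ast

SOURCE = """
def square(x):
    return x * x

def sum_of_squares(a, b):
    return square(a) + square(b)
"""

class EntryProbeInserter(ast.NodeTransformer):
    """Match every function definition and insert a probe call at its entry."""
    def visit_FunctionDef(self, node):
        self.generic_visit(node)                       # handle nested defs
        probe = ast.parse(f"__probe__({node.name!r})").body[0]
        node.body.insert(0, probe)
        return node

def instrument_source(source):
    """Parse, rewrite, and compile the source into instrumented code."""
    tree = EntryProbeInserter().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return compile(tree, "<instrumented>", "exec")

events = []                                  # the collected trace
namespace = {"__probe__": events.append}     # probe implementation
exec(instrument_source(SOURCE), namespace)
value = namespace["sum_of_squares"](2, 3)
print(value, events)
```

The original source text is left untouched; only the in-memory tree is rewritten, so the same program can be re-instrumented with different probes.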

Design Your Instrumentation
- How much to generate?
  - Everything
  - Just the necessary facts
  - Less than necessary
- On-line vs. off-line analysis
- What/when to instrument?
  - Source code, IR, assembly, machine code
  - Preprocessor, compile-time, link-time, executable, run-time

Control Flow Trace

Dynamic Dependence Graph

Whole Execution Trace

Dynamic Slicing
See Xiangyu's slides

Specification Generation: Daikon
- Daikon infers invariants from a program trace
- Looks for invariants between each combination of variables
- Polynomial in the number of variables
- Easy to implement offline; a first pass finds equal variables
- More complex to implement incrementally
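The offline approach can be sketched as follows: generate candidate invariants for every pair of variables, then keep the ones that hold on every sample in the trace. This is a simplified illustration, not Daikon's implementation or its invariant grammar; the function `infer_invariants`, the three candidate templates, and the sample trace are assumptions.

```python
from itertools import combinations

def infer_invariants(trace):
    """trace: list of dicts, each mapping variable names to their values
    at one visit to the same program point. Returns the candidate
    invariants that no sample falsifies."""
    variables = sorted(trace[0])
    candidates = {}
    # Pairwise candidates: quadratic in the number of variables.
    for x, y in combinations(variables, 2):
        candidates[f"{x} == {y}"] = lambda s, x=x, y=y: s[x] == s[y]
        candidates[f"{x} <= {y}"] = lambda s, x=x, y=y: s[x] <= s[y]
        candidates[f"{x} >= {y}"] = lambda s, x=x, y=y: s[x] >= s[y]
    return [name for name, holds in candidates.items()
            if all(holds(sample) for sample in trace)]

# Hypothetical samples taken at the head of a loop that sums 1..n.
loop_trace = [
    {"i": 0, "n": 3, "total": 0},
    {"i": 1, "n": 3, "total": 1},
    {"i": 2, "n": 3, "total": 3},
    {"i": 3, "n": 3, "total": 6},
]
invariants = infer_invariants(loop_trace)
print(invariants)
```

The reported invariants are only as trustworthy as the trace: a property that happens to hold on all observed samples may still be false in general.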

Data Analysis Techniques
- Data mining (mining databases) fields, related to information retrieval and text mining
  - Association rule mining
  - Sequential pattern mining
  - Beyond: structured pattern (tree, graph) mining
- Definition: what is it?
- Which algorithm? Off-the-shelf packages exist

Association Rule Mining
- Proposed by Agrawal et al. in 1993
- Initially used for market basket analysis, to find how items purchased by customers are related: Bread → Milk [sup = 5%, conf = 100%]
- Applications (object / properties):
  - patient / symptoms
  - movies / ratings
  - web pages / keywords
  - basket / products
- What are examples in computing?
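The two numbers attached to a rule can be computed directly from a list of transactions. In this sketch, support of an itemset is the fraction of transactions containing it, and confidence of a rule X → Y is support(X ∪ Y) / support(X). The baskets are made-up data for illustration and do not reproduce the 5% figure above.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs, transactions):
    """Of the transactions containing lhs, the fraction also containing rhs."""
    both = set(lhs) | set(rhs)
    return support(both, transactions) / support(lhs, transactions)

# Hypothetical market baskets.
baskets = [frozenset(b) for b in (
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"milk", "eggs"},
    {"bread", "milk", "butter"},
    {"eggs"},
)]

sup = support({"bread", "milk"}, baskets)
conf = confidence({"bread"}, {"milk"}, baskets)
print(f"Bread -> Milk [sup = {sup:.0%}, conf = {conf:.0%}]")
```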

The Model: Data

The Model: Rules

Algorithms and Extensions
- Apriori
- Eclat
- FP-growth
Relevant concepts:
- Frequent itemset mining
- Maximal itemset mining [Bayardo, 1998]
- Closed itemset mining [Pasquier, 1999]
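A compact sketch of Apriori's level-wise search for frequent itemsets, assuming an absolute support threshold (a count rather than a percentage). It is written for readability, not speed, and the baskets are made-up data.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset contained in at least
    min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    current = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    k = 1
    while current:
        # Count each size-k candidate and keep the frequent ones.
        counts = {c: sum(c <= t for t in transactions) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into size-k candidates.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        current = {c for c in joined
                   if all(frozenset(s) in level
                          for s in combinations(c, k - 1))}
    return frequent

baskets = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"milk", "eggs"},
    {"bread", "milk", "butter"},
    {"eggs"},
]
frequent_itemsets = apriori(baskets, min_support=2)
for itemset, count in sorted(frequent_itemsets.items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The prune step is what distinguishes Apriori from brute-force enumeration: {bread, eggs, milk} is never counted because its subset {bread, eggs} is already known to be infrequent.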

Patterns in Sequences
- Substrings
- Regular expressions
- Partial orders
- Structured patterns, directed acyclic graphs (mining for frequent subgraphs)
- Episodes
- Constraint-based sequential pattern mining (e.g., the item satisfies certain values)

Sequential Pattern Mining [2]
Given a set of sequences, find the complete set of frequent subsequences.
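The problem statement can be made concrete with a brute-force sketch: enumerate every subsequence up to a length bound, then count how many input sequences contain it. This is exponential and is not one of the published algorithms; the `max_len` bound and the call traces used as input are assumptions for the example.

```python
from itertools import combinations

def is_subsequence(pattern, sequence):
    """True if pattern's items appear in sequence in order (gaps allowed)."""
    remaining = iter(sequence)
    return all(item in remaining for item in pattern)

def frequent_subsequences(sequences, min_support, max_len=3):
    """Return {pattern: count} for patterns contained in at least
    min_support of the input sequences."""
    candidates = set()
    for seq in sequences:
        for k in range(1, max_len + 1):
            # combinations() preserves order, so each tuple is a subsequence.
            candidates.update(combinations(seq, k))
    result = {}
    for pattern in candidates:
        count = sum(is_subsequence(pattern, s) for s in sequences)
        if count >= min_support:
            result[pattern] = count
    return result

# Hypothetical traces of file API calls, one per execution.
call_traces = [
    ("open", "read", "close"),
    ("open", "write", "close"),
    ("open", "read", "read", "close"),
]
patterns = frequent_subsequences(call_traces, min_support=3)
print(patterns)
```

On execution traces like these, the surviving pattern ("open", "close") reads as a candidate usage protocol, which is one way sequence mining connects back to dynamic analysis.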

Sequential Mining Algorithms

Software
C++ implementations of Apriori, Eclat, FP-growth, and several other algorithms are available at http://www.adrem.ua.ac.be/~goethals/software/ and http://fimi.cs.helsinki.fi/

Designing Your Dynamic Analysis
Dynamic analysis is, at its heart, an experimental effort:
- Have insight
- Build tool
- Evaluate efficiency and effectiveness
- Rethink

[1] Jonathan Chen. extensible source-level instrumentation tool for performance measurement. Technical report, Rensselaer Polytechnic Institute, 2007.
[2] Carl H. Mooney and John F. Roddick. Sequential pattern mining approaches and algorithms. ACM Comput. Surv., 45(2):19:1-19:39, March 2013.