Nomen est Omen: Exploring and Exploiting Name Similarities between Arguments and Parameters

Similar documents
Automatic Testing of Sequential and Concurrent Substitutability

Understanding and Automatically Preventing Injection Attacks on Node.js

CS 112 Introduction to Computing II. Wayne Snyder Computer Science Department Boston University

Static Detection of Brittle Parameter Typing

Synode: Understanding and Automatically Preventing Injection Attacks on Node.js

Saying Hi! Is Not Enough: Mining Inputs for Effective Test Generation

Automatic Generation of Program Specifications

Fully Automatic and Precise Detection of Thread Safety Violations

DLint: Dynamically Checking Bad Coding Practices in JavaScript

Analyzing Software using Deep Learning

Combined Static and Dynamic Automated Test Generation

TypeDevil: Dynamic Type Inconsistency Analysis for JavaScript

Lightweight Verification of Array Indexing

Performance Issues and Optimizations in JavaScript: An Empirical Study

MINING SOURCE CODE REPOSITORIES AT MASSIVE SCALE USING LANGUAGE MODELING COMP 5900 X ERIC TORUNSKI DEC 1, 2016

JITProf: Pinpointing JIT-Unfriendly JavaScript Code

5) Attacker causes damage Different to gaining control. For example, the attacker might quit after gaining control.

DLint: Dynamically Checking Bad Coding Practices in JavaScript

Operators in java Operator operands.

Estimating Mobile Application Energy Consumption Using Program Analysis

Empirical Study on Impact of Developer Collaboration on Source Code

Variable Feature Usage Patterns in PHP

Lessons learnt from the Beijing Olympic Games Website Measurement

Large-Scale API Protocol Mining for Automated Bug Detection

3.Constructors and Destructors. Develop cpp program to implement constructor and destructor.

CSCI Test 1.Spring 2004 Student Id: Grading: Undergrads, you are responsible for 110 points. Grads: you are responsible for 120 points

ANSWERS. Birkbeck (University of London) Software and Programming 1 In-class Test Feb Student Name Student Number. Answer all questions

Static Analysis in Practice

CMSC 132: Object-Oriented Programming II

Program Testing and Analysis: Manual Testing Prof. Dr. Michael Pradel Software Lab, TU Darmstadt

Type Inference Systems. Type Judgments. Deriving a Type Judgment. Deriving a Judgment. Hypothetical Type Judgments CS412/CS413

Static Analysis in Practice

Internet Traffic Classification Using Machine Learning. Tanjila Ahmed Dec 6, 2017

CS558 Programming Languages

Combining the Logical and the Probabilistic in Program Analysis. Xin Zhang Xujie Si Mayur Naik University of Pennsylvania

Topics. Java arrays. Definition. Data Structures and Information Systems Part 1: Data Structures. Lecture 3: Arrays (1)

Refactoring with Eclipse

Procedural Programming

VARIABLES AND TYPES CITS1001

CSE 143. Computer Programming II

Inheritance Usage Patterns in Open-Source Systems. Jamie Stevenson and Murray Wood. University of Strathclyde, Glasgow, UK

Northeastern University Systems Security Lab

INDEX. A SIMPLE JAVA PROGRAM Class Declaration The Main Line. The Line Contains Three Keywords The Output Line

MACS 261J Final Exam. Question: Total Points: Score:

Management. Software Quality. Dr. Stefan Wagner Technische Universität München. Garching 28 May 2010

Cover Page. The handle holds various files of this Leiden University dissertation

The Problem with Threads

Introduction to Programming Using Java (98-388)

Object Orientated Analysis and Design. Benjamin Kenwright

Lessons learnt from the Beijing Olympic Games

An Exploratory Study on Interface Similarities in Code Clones

CS473: Course Review CS-473. Luo Si Department of Computer Science Purdue University

Rubicon: Scalable Bounded Verification of Web Applications

PVS. Empirical Study of Usage and Performance of Java Collections. Diego Costa, 1 Artur Andrzejak, 1 Janos Seboek, 2 David Lo

Searching and Sorting

1.00 Introduction to Computers and Engineering Problem Solving. Quiz 1 March 7, 2003

Understanding the Characteristics of Android Wear OS. Renju Liu and Felix Xiaozhu Lin Purdue ECE

ALEMBIC: AUTOMATED MODEL INFERENCE FOR STATEFUL NETWORK FUNCTIONS

CS558 Programming Languages

Differential program verification

Semantic Atomicity for Multithreaded Programs!

Because you can t fix what you don t know is broken

Automatic Repair of Real Bugs in Java: A Large-Scale Experiment on the Defects4J Dataset

Dynamic Inference of Abstract Types

כללי Extreme Programming

Safety Checks and Semantic Understanding via Program Analysis Techniques

Jeff King, Jennifer Stoll, Michael T. Hunter, Mustaque Ahamad. Georgia Tech Information Security Center (GTISC) WICOW '08

Ripple: Reflection Analysis for Android Apps in Incomplete Information Environments

Laboratory 1: Eclipse and Karel the Robot

LINEAR INSERTION SORT

VRP in LLVM. The PredicateSimplifier pass. by Nick Lewycky

Leveraging Test Generation and Specification Mining for Automated Bug Detection without False Positives

cis20.1 design and implementation of software applications I fall 2007 lecture # I.2 topics: introduction to java, part 1

EE368 Project: Visual Code Marker Detection

Towards Incremental Grounding in Tuffy

Javarifier: inference of reference immutability

Managing Complexity. Programming is more than just syntax. 23-Nov-15

age = 23 age = age + 1 data types Integers Floating-point numbers Strings Booleans loosely typed age = In my 20s

Subclass Gist Example: Chess Super Keyword Shadowing Overriding Why? L10 - Polymorphism and Abstract Classes The Four Principles of Object Oriented

Testing. CMSC 433 Programming Language Technologies and Paradigms Spring A Real Testing Example. Example (Black Box)?

Towards a Survival Analysis of Database Framework Usage in Java Projects

Using Network Traffic to Remotely Identify the Type of Applications Executing on Mobile Devices. Lanier Watkins, PhD

C30b: Inner Class, Anonymous Class, and Lambda Expression

Improving Origin Analysis with Weighting Functions

Advanced Topics in Java and on Security in Systems Effective Generics. Eirik Eltvik 26 th of February 2007

A Paradigm (para-dime) is a framework containing the basic assumptions, ways of thinking, and methodology that are commonly accepted by members of a

Lesson 26: ArrayList (W08D1)

RANDOM NUMBER GAME PROJECT

Exam 1. CSC 121 Spring Lecturer: Howard Rosenthal. March 1, 2017

Bouncer: Securing Software by Blocking Bad Input

FORM 2 (Please put your name and form # on the scantron!!!!)

JFlow: Practical Mostly-Static Information Flow Control

Software Bug Detection Using the N-gram Language Model

that focuses on building models of

Predicate Abstraction of Java Programs with Collections. Pavel Parízek, Ondřej Lhoták

CSC324 Principles of Programming Languages

! Two questions when we design an OO system : ! Our Initial goal: learn to design and implement simple objects.

ADVANCED SOFTWARE ENGINEERING CS561 Danny Dig

Integrated Impact Analysis for Managing Software Changes. Malcom Gethers, Bogdan Dit, Huzefa Kagdi, Denys Poshyvanyk

Exam Part 3. Q.1) What is the output of the following code?

Transcription:

Nomen est Omen: Exploring and Exploiting Name Similarities between Arguments and Parameters Hui Liu 1 Qiurong Liu 1 Cristian-Alexandru Staicu 2 Michael Pradel 2 Yue Luo 1 1 Beijing Institute of Technology 2 Technical University of Darmstadt May 23, 2016

Motivation How we see the source code: void writeversionfile(file file, float version) { DataOutputStream dos = new DataOutputStream(file); dos.writefloat(version); dos.close(); } 1/15

Motivation How we see the source code: void writeversionfile(file file, float version) { DataOutputStream dos = new DataOutputStream(file); dos.writefloat(version); dos.close(); } How most analyses see the source code: void a(a b, B c) { C d = new C(b); d.e(c); d.f(); } 1/15

Motivation Main Idea Use similarities between arguments and parameters names in program analysis. 2/15

Motivation Main Idea Use similarities between arguments and parameters names in program analysis. Parameters (method definition) void writeversionfile(file file, float version) {...} Arguments (call site) writeversionfile(file, version); writeversionfile(myfile, currentversion); writeversionfile(target, v); 2/15

This talk Empirical evidence that: names of arguments and parameters are similar dissimilar names can be filtered out Two applications: anomaly detection arguments recommendation 3/15

This talk Empirical evidence that: names of arguments and parameters are similar dissimilar names can be filtered out Two applications: anomaly detection arguments recommendation [Allamanis et al., FSE2014], [Allamanis et al., FSE2015], [Butler et al. CSMR2010], [Pradel and Gross, ISSTA2011] 3/15

Empirical Study Are argument and parameter names similar? Yes, in 31% of the cases even identical. 4/15

Empirical Study Are argument and parameter names similar? Yes, in 31% of the cases even identical. Why are some dissimilar? Mostly because of short and generic names. 4/15

Empirical Study Are argument and parameter names similar? Yes, in 31% of the cases even identical. Why are some dissimilar? Mostly because of short and generic names. Can we eliminate dissimilar names? Yes, a significant part of them. 4/15

Empirical Study Are argument and parameter names similar? Yes, in 31% of the cases even identical. Why are some dissimilar? Mostly because of short and generic names. Can we eliminate dissimilar names? Yes, a significant part of them. Do developers pick the most similar arguments? Yes, in most of the cases. 4/15

Methodology and Setup 60 popular Java programs >600,000 arguments Retrieve parameters using JDT s static solving lexsim(arg, par) = included terms(arg,par) + included terms(par,arg) terms(arg) + terms(par) 5/15

Methodology and Setup 60 popular Java programs >600,000 arguments Retrieve parameters using JDT s static solving lexsim(arg, par) = included terms(arg,par) + included terms(par,arg) terms(arg) + terms(par) lexsim( length, inputlength ) = 1+1 1+2 = 0.67 5/15

Are Argument and Parameter Names Similar? 60% 50% 40% 30% 20% 10% 0% [0.0, 0.1) [0.1, 0.2) [0.2, 0.3) [0.3, 0.4) [0.4, 0.5) [0.5, 0.6) [0.6, 0.7) [0.7, 0.8) [0.8, 0.9) [0.9, 1.0) 1 Similarity 6/15

Why are Some Names Dissimilar? Short identifiers: 40.5% of the dissimilar pairs have a parameter of length < = 3 7/15

Why are Some Names Dissimilar? Short identifiers: 40.5% of the dissimilar pairs have a parameter of length < = 3 Generic identifiers: index, item, key, value account for 14% of dissimilarities 7/15

Can We Eliminate Dissimilar Names? Example of code containing generic identifier names 1 : public int maxvalue(int array[]){ List<Integer> list = new ArrayList<Integer>(); for (int i = 0; i < array.length; i++) { list.add(array[i]); } return Collections.max(list); } 1 stackoverflow, question 1806816 8/15

Can We Eliminate Dissimilar Names? Example of code containing generic identifier names 1 : public int maxvalue(int array[]){ List<Integer> list = new ArrayList<Integer>(); for (int i = 0; i < array.length; i++) { list.add(array[i]); } return Collections.max(list); } Idea Use a corpus of programs to infer parameters names that are likely to appear in dissimilar pairs. 1 stackoverflow, question 1806816 8/15

Pruning Low-Similarity Parameters 60% 50% 40% 30% 20% 10% 0% [0.0, 0.1) [0.1, 0.2) [0.2, 0.3) [0.3, 0.4) [0.4, 0.5) [0.5, 0.6) [0.6, 0.7) [0.7, 0.8) [0.8, 0.9) [0.9, 1.0) 1 Similarity Before Filtering After Filtering ; 9/15

Do Developers Pick the Most Similar Arguments? Compare argument with potential alternatives. 10/15

Do Developers Pick the Most Similar Arguments? Compare argument with potential alternatives. Findings 50% of the arguments have no alternatives 13.5% are strictly more similar than any other alternative if filtering out is applied, this number increases two times 6.9% have a more similar alternative 10/15

Application 1: Anomaly Detection Idea An anomaly is a low-similarity pair that has a potential alternative that would significantly increase the similarity. 11/15

Application 1: Anomaly Detection Idea An anomaly is a low-similarity pair that has a potential alternative that would significantly increase the similarity. Issue in Lightweight Java Game Library: void writeversionfile(file file, float version) {... } File versionfile;... writeversionfile(dir, latestversion); 11/15

Application 1: Anomaly Detection Idea An anomaly is a low-similarity pair that has a potential alternative that would significantly increase the similarity. Issue in Lightweight Java Game Library: void writeversionfile(file file, float version) {... } File versionfile;... writeversionfile(dir, latestversion); 11/15

Anomaly Detection: Results Ground truth: 14 bugs in the history of the subject programs Approach detected: 6 / 14 and three additional ones 127 renaming opportunities Average precision: 80% 12/15

Application 2: Arguments Recommendation 13/15

Application 2: Arguments Recommendation Idea Suggest the most similar potential alternative. 13/15

Arguments Recommendation: Results Analyzed arguments in four applications. Recommended 1,588 arguments with a precision of 83%. Missing recommendations: complex expressions literals typecasts 14/15

Conclusions Empirical evidence that: names of arguments and parameters are similar short and generic parameters cause dissimilarity dissimilar names can be filtered out developers tend to pick the most similar arguments Two applications: anomaly detection arguments recommendation 15/15

Conclusions Empirical evidence that: names of arguments and parameters are similar short and generic parameters cause dissimilarity dissimilar names can be filtered out developers tend to pick the most similar arguments Two applications: anomaly detection arguments recommendation Promising opportunities for more name-based techniques 15/15

Conclusions Empirical evidence that: names of arguments and parameters are similar short and generic parameters cause dissimilarity dissimilar names can be filtered out developers tend to pick the most similar arguments Two applications: anomaly detection arguments recommendation Promising opportunities for more name-based techniques 15/15