Code Clone Detection Technique Using Program Execution Traces

Similar documents
CCFinderSW: Clone Detection Tool with Flexible Multilingual Tokenization

On Refactoring for Open Source Java Program

Extracting Code Clones for Refactoring Using Combinations of Clone Metrics

On Refactoring Support Based on Code Clone Dependency Relation

Refactoring Support Based on Code Clone Analysis

Keywords Code cloning, Clone detection, Software metrics, Potential clones, Clone pairs, Clone classes. Fig. 1 Code with clones

PAPER Proposing and Evaluating Clone Detection Approaches with Preprocessing Input Source Files

Rearranging the Order of Program Statements for Code Clone Detection

Problematic Code Clones Identification using Multiple Detection Results

Sub-clones: Considering the Part Rather than the Whole

Proceedings of the Eighth International Workshop on Software Clones (IWSC 2014)

A Tool for Multilingual Detection of Code Clones Using Syntax Definitions

A Novel Technique for Retrieving Source Code Duplication

CODE CLONE DETECTION A NEW APPROACH. - Sanjeev Chakraborty

A Study on A Tool to Suggest Similar Program Element Modifications

Searching for Configurations in Clone Evaluation A Replication Study

What Kinds of Refactorings are Co-occurred? An Analysis of Eclipse Usage Datasets

An Experience Report on Analyzing Industrial Software Systems Using Code Clone Detection Techniques

Falsification: An Advanced Tool for Detection of Duplex Code

Detection and Behavior Identification of Higher-Level Clones in Software

Detecting and Analyzing Code Clones in HDL

Software Quality Analysis by Code Clones in Industrial Legacy Software

Gapped Code Clone Detection with Lightweight Source Code Analysis

Relation of Code Clones and Change Couplings

On the Robustness of Clone Detection to Code Obfuscation

DETECTING SIMPLE AND FILE CLONES IN SOFTWARE

Incremental Clone Detection and Elimination for Erlang Programs

Code Syntax-Comparison Algorithm based on Type-Redefinition-Preprocessing and Rehash Classification

A Study of Repetitiveness of Code Changes in Software Evolution

Code Clone Analysis and Application

An Ethnographic Study of Copy and Paste Programming Practices in OOPL

Classification of Java Programs in SPARS-J. Kazuo Kobori, Tetsuo Yamamoto, Makoto Matsusita and Katsuro Inoue Osaka University

Performance Evaluation and Comparative Analysis of Code- Clone-Detection Techniques and Tools

An Exploratory Study on Interface Similarities in Code Clones

International Journal of Scientific & Engineering Research, Volume 8, Issue 2, February ISSN

Enhancing Source-Based Clone Detection Using Intermediate Representation

Quick Parser Development Using Modified Compilers and Generated Syntax Rules

Folding Repeated Instructions for Improving Token-based Code Clone Detection

Clone Detection using Textual and Metric Analysis to figure out all Types of Clones

How are Developers Treating License Inconsistency Issues? A Case Study on License Inconsistency Evolution in FOSS Projects

Keywords Clone detection, metrics computation, hybrid approach, complexity, byte code


Visualization of Clone Detection Results

Accuracy Enhancement in Code Clone Detection Using Advance Normalization

Token based clone detection using program slicing

COMPARISON AND EVALUATION ON METRICS

Incremental Code Clone Detection: A PDG-based Approach

A Mutation / Injection-based Automatic Framework for Evaluating Code Clone Detection Tools

Master Thesis. Type-3 Code Clone Detection Using The Smith-Waterman Algorithm

Visualizing Clone Cohesion and Coupling

Evaluation of a Business Application Framework Using Complexity and Functionality Metrics

EVALUATION OF TOKEN BASED TOOLS ON THE BASIS OF CLONE METRICS

Algorithm to Detect Non-Contiguous Clones with High Precision

Very-Large Scale Code Clone Analysis and Visualization of Open Source Programs Using Distributed CCFinder: D-CCFinder

Source Code Reuse Evaluation by Using Real/Potential Copy and Paste

Lecture 25 Clone Detection CCFinder. EE 382V Spring 2009 Software Evolution - Instructor Miryung Kim

Mining Revision Histories to Detect Cross-Language Clones without Intermediates

Reusing Reused Code II. CODE SUGGESTION ARCHITECTURE. A. Overview

Code Similarity Detection by Program Dependence Graph

Archeology of Code Duplication: Recovering Duplication Chains From Small Duplication Fragments

An investigation into the impact of software licenses on copy-and-paste reuse among OSS projects

DCCD: An Efficient and Scalable Distributed Code Clone Detection Technique for Big Code

NICAD: Accurate Detection of Near-Miss Intentional Clones Using Flexible Pretty-Printing and Code Normalization

Using Compilation/Decompilation to Enhance Clone Detection

Analyzing the Effect of Preprocessor Annotations on Code Clones

Detecting software defect patterns and rule violation identification in source code 1

Zjednodušení zdrojového kódu pomocí grafové struktury

Measuring Code Similarity in Large-scaled Code Corpora

Cloning in DSLs: Experiments with OCL

Re-usability based approach Reusability of code, logic, design and/or an entire system are the major reasons of code clone occurrence.

IJREAS Volume 2, Issue 2 (February 2012) ISSN: SOFTWARE CLONING IN EXTREME PROGRAMMING ENVIRONMENT ABSTRACT

Evaluating Software Maintenance Cost Using Functional Redundancy Metrics

Compiling clones: What happens?

A Proposal of Refactoring Method for Existing Program Using Code Clone Detection and Impact Analysis Method

Clone Detection via Structural Abstraction

A Technique to Detect Multi-grained Code Clones

Similar Code Detection and Elimination for Erlang Programs

Automatic Mining of Functionally Equivalent Code Fragments via Random Testing. Lingxiao Jiang and Zhendong Su

Clone code detector using Boyer Moore string search algorithm integrated with ontology editor

A Parallel and Efficient Approach to Large Scale Clone Detection

arxiv: v1 [cs.se] 25 Mar 2014

Code Clone Detection on Specialized PDGs with Heuristics

Clone Detection and Elimination for Haskell

Editing Code. SWE 795, Spring 2017 Software Engineering Environments

Method-Level Code Clone Modification using Refactoring Techniques for Clone Maintenance

Clone Detection and Removal for Erlang/OTP within a Refactoring Environment

Beyond Templates: a Study of Clones in the STL and Some General Implications

Clone Detection Using Scope Trees

Detection of Non Continguous Clones in Software using Program Slicing

Abstract. We define an origin relationship as follows, based on [12].

Design Code Clone Detection System uses Optimal and Intelligence Technique based on Software Engineering

Cloning by Accident:

Android Open Source Project AOSP

3 3-gram. (causal difference graph) Relative Debugging[3] 2. (Execution Trace) [4] Differential Slicing[2] 2. 3-gram. Java. 3-gram 2. DaCapo.

SOURCE CODE RETRIEVAL USING SEQUENCE BASED SIMILARITY

Assertion with Aspect

Scenario-Based Comparison of Clone Detection Techniques

Copy & Paste Education Solving Programming Problems with Web Code Snippets

Deckard: Scalable and Accurate Tree-based Detection of Code Clones. Lingxiao Jiang, Ghassan Misherghi, Zhendong Su, Stephane Glondu

Refactoring Industrial-Strength Code

Transcription:

1,a) 2,b) 1,c) Code Clone Detection Technique Using Program Execution Traces Masakazu Ioka 1,a) Norihiro Yoshida 2,b) Katsuro Inoue 1,c) Abstract: Code clone is a code fragment that has identical or similar fragments to it in the source code. Many code clone detection techniques and tools have been proposed. However, source code derived by copy-and-paste may be disguised by obfuscation because these techniques detect code clone using only static information such as source code or binary. Therefore, we propose a new clone detection technique, which divides execution trace into a set of phases, and then performs the comparison of those phases based on involved method calls. The experimental result shows that the proposed clone detection technique identified obfuscated and original classes, and an evidence of reusing source code. Keywords: Code Clone Detection, Dynamic Analysis, Obfuscation 1. 1 1 Osaka University 2 Nara Institute of Science and Technology a) m-ioka@ist.osaka-u.ac.jp b) yoshida@is.naist.jp c) inoue@ist.osaka-u.ac.jp [3] [11] c 2012 Information Processing Society of Japan 1

2 2 3 4 5 6 2. 2.1 2.1.1 [1] [6] 1 2.1.2 [4], [5], [7], [9], [10], [14] 5 CCFinder[5] CCFinder 4 STEP1() STEP2() STEP3() STEP4( ) 2.2 [13] 1 Amida[12] Amida Java 2.3 1 [13] 1 [8] Least Recently Used (LRU) 2.4 [11] c 2012 Information Processing Society of Japan 2

!#$$%&#'()%* %%(,-.#/)%.0-1%(,-2/3&/,-24%'$45%* %%%%66%78 %%(:;-!%.0-1%$0')/<-2435%* %%%%(,-2/3=>)0?5@!#$$%AB%*!%#!!#$$%&#'()%* %%(,-.#/)%.0-1%(,-2/3&/,-24%'$45%* %%%%66%78 %%(:;-!%.0-1%$0')/<-2435%* %%%%66%78!#$$%AB%* Fig. 1 1!#!!#$$%C%* %%(,-.#/)%.0-1%#3&/,-24%#5%* %%%%66%78 %%(:;-!%.0-1%;35%* %%%%#3=>)0?5@!#$$%D%*!$#!!#$$%&#'()%* %%(:;-!%.0-1%$0')/<-2435%* %%%%ABE(,-2/3/<-$F%=>)0?5@!#$$%AB%* %%(:;-!%$/#/-!%.0-1%(,-2/3&#'()% #F%&/,-24%'$45%* %%%%66%78 An example of obfuscations ( ) API 1 (a) a b 1 (b) public static 1 (c) something print print 3. 2 2 New ( ) 2 (1) (2) (3) 3.1 [13] 3.2 2 1 2 2 2 2 1 (1 ) 1 3 A( X, Y); B( Z, Z); C( W, Z); 0(1, 2); 0(1, 1); 0(1, 2) 2 () 4 A A 0(1, 2); B A B 3(4, 4); C 2(3, 1); 1 2 1 c 2012 Information Processing Society of Japan 3

<=8>*?,-.<=8>/-.@@@, 0$-.0.@@@,!!#$%&'()*, 3!()*4$56789, #$, #%, $'-./0:;, -./0(, -./0), #$, #%, %'ABC, -./0(, -./0), #$, #%, &'-./012, Fig. 2 2 An overview of the proposed approach!#$!%#$%%&'&!#$%,$%,&'&!#$-%.$%,&'& ( ) * ( ) ) ( ) * 3 1 Fig. 3 An example of normalization within one method call!#$!%#$%%&'& ( ) *!#$!%#$%%&'&!#$%,$%,&'& ( ) * -..!#$%,$%,&'&!#$/%0$%,&'& 50 2 ICCA *1 Gemini Virgo 2 ProGuard *2 4.1 3.3 A B ( ) ) * - ) 4 Fig. 4 An example of normalization according to current and previous method calls 3.3 [15] 2 1 1 1 3 2 1 1 4. similarity(a, B) = 2 (A B ) A B A B A B Gemini Virgo ProGuard 5 6 Gemini 7 Gemini Virgo Virgo *1 ICCA: http://sel.ist.osaka-u.ac.jp/icca/ *2 ProGuard: http://proguard.sourceforge.net/ c 2012 Information Processing Society of Japan 4

! #$ (#$ (!#'!#&!#%!#$! Fig. 5! #$ (#$ (!#'!#&!#%!#$! Fig. 6!#$%$! # $ $ % & ' ( ) *, -. '( Fig. 7 7!!#(!#$!#)!#%!#*!#&!#!#'!#, ( 5!#$ -./010 20345 Similarity between obfuscated and original classes!!#(!#$!#)!#%!#*!#&!#!#'!#, ( 6!#$ -./010 20345 Similarity between obfuscated and original methods *'(!#$%$!#$%&'()*& )''( Heat map of similarity between obfuscated and original classes 4.2 Gemini Virgo Gemini Virgo ( 1) Gemini Virgo ( 2) Gemini Virgo ( 3) 3 1 3 ProGuard Gemini 1.00 Gemini Virgo iccalib 1.00 3 Gemini ProGuard 3 iccalib 2 Gemini Virgo Virgo Gemini Gemini Virgo 5. Sæbjørnsen [10] 1 Table 1 Detection result of similar phases 1 75 1.00 0.227 0.612 2 75 1.00 0.192 0.612 3 61 1.00 0.048 0.632 c 2012 Information Processing Society of Japan 5

Lim Java [9] Java ( ) Cornelissen [2] 2 2 6. (A) (: 21240002) (C) (: 22500026) [1] I. D. Baxter, A. Yahin, L. Moura, M. S. Anna, and L. Bier. Clone Detection Using Abstract Syntax Trees. In Proc. of ICSM 1998, pp. 368 377, 1998. [2] B. Cornelissen and L. Moonen. Visualizing similarities in execution traces. In Proc. of PCODA 2007, pp. 6 10, 2007. [3],,.., Vol. J91-D, No. 6, pp. 1465 1481, 2008. [4] L. Jiang, G. Misherghi, Z. Su, and S. Glondu. DECKARD: Scalable and accurate tree-based detection of code clones. In Proc. of ICSE 2007, pp. 96 105, 2007. [5] T. Kamiya, S. Kusumoto, and K. Inoue. CCFinder: A multilinguistic token-based code clone detection system for large scale source code. IEEE Trans. Softw. Eng., Vol. 28, No. 7, pp. 654 670, 2002. [6] M. Kim, L. Bergman, T. Lau, and D. Notkin. An Ethnographic Study of Copy and Paste Programming Practices in OOPL. In Proc. of ISESE 2004, pp. 83 92, 2004. [7] J. Krinke. Identifying similar code with program dependence graphs. In Proc. of WCRE 2001, pp. 301 309, 2001. [8] H. Lieberman and C. Hewitt. A real-time garbage collector based on the lifetimes of objects. Communications of the ACM, Vol. 26, No. 6, pp. 419 429, 1983. [9] H. Lim, H. Park, S. Choi, and T. Han. A method for detecting the theft of Java programs through analysis of the control flow information. Information and Software Technology, Vol. 51, No. 9, pp. 1338 1350, 2009. [10] A. Sæbjørnsen, J. Willcock, T. Panas, D. Quinlan, and Z. Su. Detecting code clones in binary executables. In Proc. of ISSTA 2009, pp. 117 128, 2009. [11],,,. Java DonQuixote. FOSE 2006, pp. 113 118, 2006. [12],,,,.., Vol. 24, No. 3, pp. 153 169, 2007. [13],,.., Vol. 51, No. 12, pp. 2273 2286, 2010. [14] R. Wettel and R. Marinescu. Archeology of code duplication: recovering duplication chanins from small duplication fragments. In Proc. of SYNASC 2005, pp. 63 70, 2005. [15] R. B. Yates and B. R. Neto. Modern Information Retrieval. Addison Wesley, 1999. A.1 4.2 A 1 c 2012 Information Processing Society of Japan 6

!#$%$&'()*'.#$%$45$645$789:47;9<<-=:/<4>;9<<-?:/<?9%: @@@@@@@@@@@@@@@@@@A.#$%$45$645$789:47;9<<-=:/<4>;9<<-?:/<?9%:4B-96!-/8=>=9-9</-7C.#$%$45$645$789:47;9<<-=:/<4>;9<<-?:/<?9%: @@@@@@@@@@@@@@@@@@A.#$%$45$645$789:47;9<<-=:/<4>;9<<-?:/<?9%:4.<>;9:K9<$/C.#$%$45$645$789:47;9<<-=:/<4>;9<<-?:/<?9%:A.#$%$4B9<94E$:4GLG%BMN9=4.<F$:C@.#$%$45$645$789:47;9<<-=:/<4>;9<<-?:/<?9%:A.#$%$4B9<94E$:4GLG%BMN9=4.<F$:C@.#$%$45$645$789:47;9<<-=:/<4>;9<<-?:/<?9%:A.#$%$4B9<94E$:4F$:IEE7<L9<94.<C@.#$%$45$645$789:47;9<<-=:/<4>;9<<-?:/<?9%:A.#$%$4B9<94E$:4F$:IEE7<L9<94.<C@.#$%$45$645$789:47;9<<-=:/<4>;9<<-?:/<?9%:A$;;9:$D4B9<94E$:4F$:G%E/4.<PIJC@.#$%$45$645$789:47;9<<-=:/<4>;9<<-?:/<?9%: @@@@@@@@@@@@@@@@@@A.#$%$45$645$789:47;9<<-=:/<4>;9<<-?:/<?9%:4.<Q/6-Q%.<OC.#$%$45$645$789:47;9<<-=:/<4>;9<<-?:/<?9%:A.#$%$4B9<94E$:4F$:IEE7<L9<94.<C@.#$%$45$645$789:47;9<<-=:/<4>;9<<-?:/<?9%:A$;;9:$D4B9<94E$:4F$:N9%9.-4.<Q97<F$:C@ $;;9:$D4B9<94E$:4F$:N9%9.-A$;;9:$D4B9<94E$:4F$:N9%9.-SF$:TU4.<F$:GLC@ $;;9:$D4B9<94E$:4F$:N9%9.-A$;;9:$D4B9<94E$:4F$:N9%9.-4.<F$:C@ $;;9:$D4B9<94E$:4F$:N9%9.-A$;;9:$D4B9<94E$:4F$:N9%9.-SF$:TU4C@.#$%$45$645$789:47;9<<-=:/<4>;9<<-?:/<?9%:A$;;9:$D4B9<94E$:4F$:G%E/4.<!-/8=GLC@,$-./&'()*'0123 5$-./4949A$;;9:$D4B9<94E$:4F$:G%E/4.<HIJC@ $;;9:$D4B9<94E$:4F$:G%E/A$;;9:$D4B9<94E$:4F$:N<-$;7N9%9.-4.<HIJC@ $;;9:$D4B9<94E$:4F$:N<-$;7N9%9.-A$;;9:$D4B9<94E$:4F$:N<-$;7N9%9.-4.<J/BF-9.#%<7C@ $;;9:$D4B9<94E$:4F$:N<-$;7N9%9.-A$;;9:$D4B9<94E$:4F$:N<-$;7N9%9.-4.<J/BF-9.#%<7C@ $;;9:$D4B9<94E$:4F$:N<-$;7N9%9.-A$;;9:$D4B9<94E$:4F$:N<-$;7N9%9.-4;-9<J/BF-9.#%<7C@ $;;9:$D4B9<94E$:4F$:N<-$;7N9%9.-A$;;9:$D4B9<94E$:4F$:G%E/4.<!-/8=GLC@ $;;9:$D4B9<94E$:4F$:N<-$;7N9%9.-A$;;9:$D4B9<94E$:4F$:G%E/4.<F$:GLC@ $;;9:$D4B9<94E$:4F$:N<-$;7N9%9.-A$;;9:$D4B9<94;:/%4J:/%N9%9.-4.<F$:J/BF-9.#%<7C@ Fig. A 1 A 1 An example of detected pair of similar phases c 2012 Information Processing Society of Japan 7