1,a) 2,b) 1,c) Code Clone Detection Technique Using Program Execution Traces Masakazu Ioka 1,a) Norihiro Yoshida 2,b) Katsuro Inoue 1,c) Abstract: Code clone is a code fragment that has identical or similar fragments to it in the source code. Many code clone detection techniques and tools have been proposed. However, source code derived by copy-and-paste may be disguised by obfuscation because these techniques detect code clone using only static information such as source code or binary. Therefore, we propose a new clone detection technique, which divides execution trace into a set of phases, and then performs the comparison of those phases based on involved method calls. The experimental result shows that the proposed clone detection technique identified obfuscated and original classes, and an evidence of reusing source code. Keywords: Code Clone Detection, Dynamic Analysis, Obfuscation 1. 1 1 Osaka University 2 Nara Institute of Science and Technology a) m-ioka@ist.osaka-u.ac.jp b) yoshida@is.naist.jp c) inoue@ist.osaka-u.ac.jp [3] [11] c 2012 Information Processing Society of Japan 1
2 2 3 4 5 6 2. 2.1 2.1.1 [1] [6] 1 2.1.2 [4], [5], [7], [9], [10], [14] 5 CCFinder[5] CCFinder 4 STEP1() STEP2() STEP3() STEP4( ) 2.2 [13] 1 Amida[12] Amida Java 2.3 1 [13] 1 [8] Least Recently Used (LRU) 2.4 [11] c 2012 Information Processing Society of Japan 2
!#$$%&#'()%* %%(,-.#/)%.0-1%(,-2/3&/,-24%'$45%* %%%%66%78 %%(:;-!%.0-1%$0')/<-2435%* %%%%(,-2/3=>)0?5@!#$$%AB%*!%#!!#$$%&#'()%* %%(,-.#/)%.0-1%(,-2/3&/,-24%'$45%* %%%%66%78 %%(:;-!%.0-1%$0')/<-2435%* %%%%66%78!#$$%AB%* Fig. 1 1!#!!#$$%C%* %%(,-.#/)%.0-1%#3&/,-24%#5%* %%%%66%78 %%(:;-!%.0-1%;35%* %%%%#3=>)0?5@!#$$%D%*!$#!!#$$%&#'()%* %%(:;-!%.0-1%$0')/<-2435%* %%%%ABE(,-2/3/<-$F%=>)0?5@!#$$%AB%* %%(:;-!%$/#/-!%.0-1%(,-2/3&#'()% #F%&/,-24%'$45%* %%%%66%78 An example of obfuscations ( ) API 1 (a) a b 1 (b) public static 1 (c) something print print 3. 2 2 New ( ) 2 (1) (2) (3) 3.1 [13] 3.2 2 1 2 2 2 2 1 (1 ) 1 3 A( X, Y); B( Z, Z); C( W, Z); 0(1, 2); 0(1, 1); 0(1, 2) 2 () 4 A A 0(1, 2); B A B 3(4, 4); C 2(3, 1); 1 2 1 c 2012 Information Processing Society of Japan 3
<=8>*?,-.<=8>/-.@@@, 0$-.0.@@@,!!#$%&'()*, 3!()*4$56789, #$, #%, $'-./0:;, -./0(, -./0), #$, #%, %'ABC, -./0(, -./0), #$, #%, &'-./012, Fig. 2 2 An overview of the proposed approach!#$!%#$%%&'&!#$%,$%,&'&!#$-%.$%,&'& ( ) * ( ) ) ( ) * 3 1 Fig. 3 An example of normalization within one method call!#$!%#$%%&'& ( ) *!#$!%#$%%&'&!#$%,$%,&'& ( ) * -..!#$%,$%,&'&!#$/%0$%,&'& 50 2 ICCA *1 Gemini Virgo 2 ProGuard *2 4.1 3.3 A B ( ) ) * - ) 4 Fig. 4 An example of normalization according to current and previous method calls 3.3 [15] 2 1 1 1 3 2 1 1 4. similarity(a, B) = 2 (A B ) A B A B A B Gemini Virgo ProGuard 5 6 Gemini 7 Gemini Virgo Virgo *1 ICCA: http://sel.ist.osaka-u.ac.jp/icca/ *2 ProGuard: http://proguard.sourceforge.net/ c 2012 Information Processing Society of Japan 4
! #$ (#$ (!#'!#&!#%!#$! Fig. 5! #$ (#$ (!#'!#&!#%!#$! Fig. 6!#$%$! # $ $ % & ' ( ) *, -. '( Fig. 7 7!!#(!#$!#)!#%!#*!#&!#!#'!#, ( 5!#$ -./010 20345 Similarity between obfuscated and original classes!!#(!#$!#)!#%!#*!#&!#!#'!#, ( 6!#$ -./010 20345 Similarity between obfuscated and original methods *'(!#$%$!#$%&'()*& )''( Heat map of similarity between obfuscated and original classes 4.2 Gemini Virgo Gemini Virgo ( 1) Gemini Virgo ( 2) Gemini Virgo ( 3) 3 1 3 ProGuard Gemini 1.00 Gemini Virgo iccalib 1.00 3 Gemini ProGuard 3 iccalib 2 Gemini Virgo Virgo Gemini Gemini Virgo 5. Sæbjørnsen [10] 1 Table 1 Detection result of similar phases 1 75 1.00 0.227 0.612 2 75 1.00 0.192 0.612 3 61 1.00 0.048 0.632 c 2012 Information Processing Society of Japan 5
Lim Java [9] Java ( ) Cornelissen [2] 2 2 6. (A) (: 21240002) (C) (: 22500026) [1] I. D. Baxter, A. Yahin, L. Moura, M. S. Anna, and L. Bier. Clone Detection Using Abstract Syntax Trees. In Proc. of ICSM 1998, pp. 368 377, 1998. [2] B. Cornelissen and L. Moonen. Visualizing similarities in execution traces. In Proc. of PCODA 2007, pp. 6 10, 2007. [3],,.., Vol. J91-D, No. 6, pp. 1465 1481, 2008. [4] L. Jiang, G. Misherghi, Z. Su, and S. Glondu. DECKARD: Scalable and accurate tree-based detection of code clones. In Proc. of ICSE 2007, pp. 96 105, 2007. [5] T. Kamiya, S. Kusumoto, and K. Inoue. CCFinder: A multilinguistic token-based code clone detection system for large scale source code. IEEE Trans. Softw. Eng., Vol. 28, No. 7, pp. 654 670, 2002. [6] M. Kim, L. Bergman, T. Lau, and D. Notkin. An Ethnographic Study of Copy and Paste Programming Practices in OOPL. In Proc. of ISESE 2004, pp. 83 92, 2004. [7] J. Krinke. Identifying similar code with program dependence graphs. In Proc. of WCRE 2001, pp. 301 309, 2001. [8] H. Lieberman and C. Hewitt. A real-time garbage collector based on the lifetimes of objects. Communications of the ACM, Vol. 26, No. 6, pp. 419 429, 1983. [9] H. Lim, H. Park, S. Choi, and T. Han. A method for detecting the theft of Java programs through analysis of the control flow information. Information and Software Technology, Vol. 51, No. 9, pp. 1338 1350, 2009. [10] A. Sæbjørnsen, J. Willcock, T. Panas, D. Quinlan, and Z. Su. Detecting code clones in binary executables. In Proc. of ISSTA 2009, pp. 117 128, 2009. [11],,,. Java DonQuixote. FOSE 2006, pp. 113 118, 2006. [12],,,,.., Vol. 24, No. 3, pp. 153 169, 2007. [13],,.., Vol. 51, No. 12, pp. 2273 2286, 2010. [14] R. Wettel and R. Marinescu. Archeology of code duplication: recovering duplication chanins from small duplication fragments. In Proc. of SYNASC 2005, pp. 63 70, 2005. [15] R. B. Yates and B. R. Neto. Modern Information Retrieval. Addison Wesley, 1999. A.1 4.2 A 1 c 2012 Information Processing Society of Japan 6
!#$%$&'()*'.#$%$45$645$789:47;9<<-=:/<4>;9<<-?:/<?9%: @@@@@@@@@@@@@@@@@@A.#$%$45$645$789:47;9<<-=:/<4>;9<<-?:/<?9%:4B-96!-/8=>=9-9</-7C.#$%$45$645$789:47;9<<-=:/<4>;9<<-?:/<?9%: @@@@@@@@@@@@@@@@@@A.#$%$45$645$789:47;9<<-=:/<4>;9<<-?:/<?9%:4.<>;9:K9<$/C.#$%$45$645$789:47;9<<-=:/<4>;9<<-?:/<?9%:A.#$%$4B9<94E$:4GLG%BMN9=4.<F$:C@.#$%$45$645$789:47;9<<-=:/<4>;9<<-?:/<?9%:A.#$%$4B9<94E$:4GLG%BMN9=4.<F$:C@.#$%$45$645$789:47;9<<-=:/<4>;9<<-?:/<?9%:A.#$%$4B9<94E$:4F$:IEE7<L9<94.<C@.#$%$45$645$789:47;9<<-=:/<4>;9<<-?:/<?9%:A.#$%$4B9<94E$:4F$:IEE7<L9<94.<C@.#$%$45$645$789:47;9<<-=:/<4>;9<<-?:/<?9%:A$;;9:$D4B9<94E$:4F$:G%E/4.<PIJC@.#$%$45$645$789:47;9<<-=:/<4>;9<<-?:/<?9%: @@@@@@@@@@@@@@@@@@A.#$%$45$645$789:47;9<<-=:/<4>;9<<-?:/<?9%:4.<Q/6-Q%.<OC.#$%$45$645$789:47;9<<-=:/<4>;9<<-?:/<?9%:A.#$%$4B9<94E$:4F$:IEE7<L9<94.<C@.#$%$45$645$789:47;9<<-=:/<4>;9<<-?:/<?9%:A$;;9:$D4B9<94E$:4F$:N9%9.-4.<Q97<F$:C@ $;;9:$D4B9<94E$:4F$:N9%9.-A$;;9:$D4B9<94E$:4F$:N9%9.-SF$:TU4.<F$:GLC@ $;;9:$D4B9<94E$:4F$:N9%9.-A$;;9:$D4B9<94E$:4F$:N9%9.-4.<F$:C@ $;;9:$D4B9<94E$:4F$:N9%9.-A$;;9:$D4B9<94E$:4F$:N9%9.-SF$:TU4C@.#$%$45$645$789:47;9<<-=:/<4>;9<<-?:/<?9%:A$;;9:$D4B9<94E$:4F$:G%E/4.<!-/8=GLC@,$-./&'()*'0123 5$-./4949A$;;9:$D4B9<94E$:4F$:G%E/4.<HIJC@ $;;9:$D4B9<94E$:4F$:G%E/A$;;9:$D4B9<94E$:4F$:N<-$;7N9%9.-4.<HIJC@ $;;9:$D4B9<94E$:4F$:N<-$;7N9%9.-A$;;9:$D4B9<94E$:4F$:N<-$;7N9%9.-4.<J/BF-9.#%<7C@ $;;9:$D4B9<94E$:4F$:N<-$;7N9%9.-A$;;9:$D4B9<94E$:4F$:N<-$;7N9%9.-4.<J/BF-9.#%<7C@ $;;9:$D4B9<94E$:4F$:N<-$;7N9%9.-A$;;9:$D4B9<94E$:4F$:N<-$;7N9%9.-4;-9<J/BF-9.#%<7C@ $;;9:$D4B9<94E$:4F$:N<-$;7N9%9.-A$;;9:$D4B9<94E$:4F$:G%E/4.<!-/8=GLC@ $;;9:$D4B9<94E$:4F$:N<-$;7N9%9.-A$;;9:$D4B9<94E$:4F$:G%E/4.<F$:GLC@ $;;9:$D4B9<94E$:4F$:N<-$;7N9%9.-A$;;9:$D4B9<94;:/%4J:/%N9%9.-4.<F$:J/BF-9.#%<7C@ Fig. A 1 A 1 An example of detected pair of similar phases c 2012 Information Processing Society of Japan 7