A Comparison of Code Similarity Analyzers C. Ragkhitwetsagul, J. Krinke, D. Clark


1 A Comparison of Code Similarity Analyzers. C. Ragkhitwetsagul, J. Krinke, D. Clark. SCAM '16, EMSE (under review)

2 When source code is copied and modified, which code similarity detection techniques or tools get the most accurate results?

3 Bellon et al. (TSE 2007); Roy et al. (Sci. Comp. Prog. 2009); Hage et al. (CSERC 2010); Biegel et al. (MSR '11)

4 (1) The selected tools are limited to only a subset of clone or plagiarism detectors (and their parameters). (2) The results are based on different data sets.

5 30 tools

6 Pervasive Modifications. From: SW plagiarism, clone evolution, refactoring

    /* ORIGINAL */
    private static int partition (Comparable[] a, int lo, int hi) {
        int i = lo;
        int j = hi+1;
        Comparable v = a[lo];
        while (true) {
            while (less(a[++i], v)) { if (i == hi) break; }
            while (less(v, a[--j])) { if (j == lo) break; }
            if (i >= j) break;
            exch(a, i, j);
        }
        exch(a, lo, j);
        return j;
    }

    /* PERVASIVELY MODIFIED CODE */
    private static int partition (int[] bob, int left, int right){
        int x = left;
        int y = right+1;
        for (;;) {
            while (less(bob[left],bob[--y])) if (y == left) break;
            while (less(bob[++x],bob[left])) if (x == right) break;
            if (x >= y) break;
            swap(bob, y, x);
        }
        swap(bob, y, left);
        return y;
    }


8 Generation pipeline [diagram]: original -> source obfuscator (ARTIFICE) -> compiler (javac) -> bytecode obfuscator (ProGuard) -> decompilers (Krakatau, Procyon) -> pervasively modified code to be used in detection phase. Original files: BubbleSort.java, EightQueens.java, GuessWord.java, TowerOfHanoi.java, InfixConverter.java, Kapreka_Tran.java, MagicSquare.java, RailRoadCar.java, SLinkedList.java, SqrtAlgorithm.java

9 Boiler-Plate Code: Detection of SOurce COde re-use (SOCO). Flores E., Rosso P., Moreno L., Villatoro-Tello E. (2014)

10 Parameter Settings (image: Jonathan H. Ward, Wikipedia, CC BY-SA 3.0)


12 Similarity Report [matrix of pairwise similarity values; rows and columns are file versions: InfConv/orig, InfConv/artifice, InfConv/orig_no_krakatau, InfConv/orig_no_procyon, InfConv/orig_pg_krakatau, InfConv/orig_pg_procyon, InfConv/artifice_no_krakatau, InfConv/artifice_no_procyon, InfConv/artifice_pg_krakatau, InfConv/artifice_pg_procyon, Sqrt/orig, Sqrt/artifice, ..., Square/artifice_pg_krakatau, Square/artifice_pg_procyon]

13 Similarity Threshold = 50 [same similarity matrix as slide 12, InfConv/orig through Square/artifice_pg_procyon]

14 Best Threshold [plot: F-measure (axis up to 1.00) against Threshold Value (T)]
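The threshold search on slides 13 and 14 can be illustrated with a small sketch. It assumes that similarity values are on a 0 to 100 scale, that a pair is reported as similar when its value is at or above T, and that T is swept in integer steps; the slides do not state these details. The scores and ground-truth labels below are made up for illustration.

```python
from typing import List, Tuple


def f1_at_threshold(scores: List[float], labels: List[bool], threshold: float) -> float:
    """F1 when every pair with similarity >= threshold is reported as similar."""
    tp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l)
    fp = sum(1 for s, l in zip(scores, labels) if s >= threshold and not l)
    fn = sum(1 for s, l in zip(scores, labels) if s < threshold and l)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(scores: List[float], labels: List[bool]) -> Tuple[int, float]:
    """Sweep T = 0..100 and return the first (T, F1) with the highest F1."""
    best_t, best_f1 = 0, -1.0
    for t in range(0, 101):
        f1 = f1_at_threshold(scores, labels, t)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Hypothetical similarity values and ground truth for six file pairs.
scores = [95, 80, 62, 55, 40, 10]
labels = [True, True, True, False, False, False]

print(f1_at_threshold(scores, labels, 50))
print(best_threshold(scores, labels))
```

With a fixed threshold of 50 one false positive remains in this toy data, while the sweep finds a threshold that separates the two groups, which is the motivation for choosing T per tool rather than fixing it.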

15 Optimal Configuration: Best Param Settings, Best Threshold. Pervasive: 14,880,000 pairwise comparisons. SOCO: 99,816,528 pairwise comparisons. Icons made by Freepik, licensed by Creative Commons BY.
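As a rough check on the size of the pervasive-modification experiment, the sketch below works backwards from the quoted total. It assumes that each run compares every file with every file (ordered pairs, self-pairs included) and that the data set consists of the ten programs on slide 8 in the ten versions labelled on slide 12. Reading the resulting quotient as the number of tool and parameter configurations is an interpretation, not something the slides state.

```python
def runs_implied(total_comparisons: int, n_files: int) -> tuple:
    """Split a comparison total into (full all-pairs runs, leftover comparisons)."""
    pairs_per_run = n_files * n_files  # assumption: ordered pairs, self-pairs included
    return divmod(total_comparisons, pairs_per_run)


programs = 10  # Java files named on slide 8
versions = 10  # version labels per program on slide 12 (orig, artifice, ...)
n_files = programs * versions

print(n_files, runs_implied(14_880_000, n_files))  # prints: 100 (1488, 0)
```

The total divides evenly, which is consistent with the all-pairs assumption.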

16 F1 scores on the Pervasive Mod data set [bar chart]. Clone det.: ccfx, deckard, iclones, nicad, simian. Plag det.: jplag-java, jplag-text, plaggie, sherlock, simjava, simtext. Comp.: 7zncd-BZip2, 7zncd-Deflate, 7zncd-Deflate2, 7zncd-LZMA, 7zncd-Deflate64, 7zncd-PPMd, bzip2ncd, gzipncd, icd, ncd-bzlib, ncd-zlib, xz-ncd. Others: bsdiff, diff, difflib, fuzzywuzzy, jellyfish, ngram, cosine.
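Many of the compression-based entries carry "ncd" in their names, which suggests normalised compression distance. The sketch below uses the standard NCD definition with Python's zlib as the compressor; it is an illustration under that assumption and not necessarily how any of the listed tools is implemented. The three byte strings are made-up inputs.

```python
import zlib


def ncd(x: bytes, y: bytes) -> float:
    """Normalised compression distance: lower values mean more shared content."""
    cx = len(zlib.compress(x))
    cy = len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)


original = (
    b"int i = lo; int j = hi + 1; while (true) { "
    b"while (less(a[++i], v)) { if (i == hi) break; } "
    b"while (less(v, a[--j])) { if (j == lo) break; } "
    b"if (i >= j) break; exch(a, i, j); }"
)
# Same fragment with identifiers renamed (hypothetical modification).
renamed = original.replace(b"lo", b"left").replace(b"hi", b"right").replace(b"exch", b"swap")
unrelated = (
    b"The quick brown fox jumps over the lazy dog; pack my box with five dozen "
    b"liquor jugs and send them north by express mail tomorrow morning."
)

for name, other in [("identical", original), ("renamed", renamed), ("unrelated", unrelated)]:
    print(name, round(ncd(original, other), 3))
```

A threshold on 1 - NCD (scaled to the range the evaluation uses) would turn this distance into the kind of similar/not-similar decision the F1 scores are computed from.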

17 F1 scores on the Boiler-Plate data set [bar chart; same 30 tools and categories as slide 16]

18 Highly specialised source code similarity detection techniques and tools can perform better than more general, compression and textual similarity measures. Interesting: difflib and fuzzywuzzy.
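For readers unfamiliar with difflib, it is a module in the Python standard library whose `SequenceMatcher` computes a similarity ratio between two sequences. The sketch below shows one plausible way to obtain a 0 to 100 similarity from it, both on characters and on whitespace-separated tokens. How the study actually invoked difflib is not stated on the slides, so the scaling and the tokenisation here are assumptions.

```python
import difflib


def char_similarity(a: str, b: str) -> float:
    """Character-level similarity on a 0-100 scale."""
    return 100.0 * difflib.SequenceMatcher(None, a, b, autojunk=False).ratio()


def token_similarity(a: str, b: str) -> float:
    """Similarity over whitespace-separated tokens, on a 0-100 scale."""
    return 100.0 * difflib.SequenceMatcher(None, a.split(), b.split(), autojunk=False).ratio()


# Two statements that differ only in identifier names (made-up example).
before = "int i = lo ;"
after = "int x = left ;"

print(char_similarity(before, after))
print(token_similarity(before, after))
```

Three of the five tokens on each side match (`int`, `=`, `;`), so renaming alone already lowers the token-level score considerably, which hints at why general textual measures are sensitive to pervasive modifications.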

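19 Optimal Configurations: CCFX's Precision vs. Recall [table with columns Measure, Value, ccfx's params (b, t) and rows Precision, Recall; only the fragment ", 8, 9" of the Precision row's t values is legible]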

20 CCFX Optimal Config.

21 b = 5, t = 11, 12; b = 19, t = 7, 8, 9

22 Pervasive Mod., Boiler-Plate

23 The optimal configurations derived from one data set have a detrimental impact on the similarity detection results for another data set. (Image: Cbuckley, Jpowell on en.wikipedia)

24 Normalisation by Decompilation [diagram]: pervasively modified code -> normalisation (compile, then decompile) -> normalised code

25 F1 scores per tool, original (Orig.) vs. decompiled (Dec.) code [bar chart]. Clone det.: ccfx, deckard, iclones, nicad, simian. Plag det.: jplag-java, jplag-text, plaggie, sherlock, simjava, simtext. Comp.: 7zncd-BZip2, 7zncd-Deflate, 7zncd-Deflate2, 7zncd-LZMA, 7zncd-LZMA2, 7zncd-PPMd, bzip2ncd, gzipncd, icd, ncd-bzlib, ncd-zlib, xz-ncd. Others: bsdiff, diff, py-difflib, py-fuzzywuzzy, py-jellyfish, py-ngram, py-sklearn.

26 Compilation and decompilation can be used as an effective normalisation method that greatly improves similarity detection on Java source code (with statistical significance). IWSC '17

27 Ranked Results: Only Top k Results [two bar charts]. Mean Average Precision (MAP), Pervasive Mod: ccfx, fuzzywuzzy, ncd-bzlib, bzip2ncd, simian, gzipncd, ncd-zlib, jplag-java, difflib, jplag-text. Mean Average Precision (MAP), Boiler-Plate: simjava, gzipncd, ncd-zlib, sherlock, jplag-text, 7zncd-PPMd, xzncd, 7zncd-Deflate64, 7zncd-Deflate, fuzzywuzzy.
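Mean average precision treats each file as a query and scores the ranked list of files a tool returns for it. The sketch below follows the usual information-retrieval definition; the exact variant used in the study (for example, how ties or the top-k cut-off are handled) is not given on the slide, and the file names are placeholders.

```python
from typing import Iterable, List, Set, Tuple


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Mean of precision@rank taken at each rank where a relevant item appears."""
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(queries: Iterable[Tuple[List[str], Set[str]]]) -> float:
    """Average of average_precision over all (ranked list, relevant set) queries."""
    queries = list(queries)
    return sum(average_precision(r, rel) for r, rel in queries) / len(queries)


# Two hypothetical queries: results ordered by decreasing similarity.
queries = [
    (["A_v1.java", "X.java", "A_v2.java"], {"A_v1.java", "A_v2.java"}),
    (["X.java", "B_v1.java"], {"B_v1.java"}),
]
print(mean_average_precision(queries))
```

Unlike F1, this measure needs no similarity threshold, since it depends only on the order in which candidates are ranked.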

28 Distribution of tools' F1 scores vs. pervasive mod. type. Legend: Original: O = original. Obfuscator: A = Artifice (source), Pg = ProGuard (bytecode). Decompiler: K = Krakatau, Pc = Procyon.

29 F1 Score per tool and pervasive mod. type [table; values not shown]. Columns: Tool | O | A | K | Pc | Pg K | Pg Pc | A K | A Pc | A Pg K | A Pg Pc. Legend: Original: O = original. Obfuscator: A = Artifice (source), Pg = ProGuard (bytecode). Decompiler: K = Krakatau, Pc = Procyon. Rows: ccfx, deckard, iclones, nicad, simian, jplag-java, jplag-text, plaggie, sherlock, simjava, simtext, 7zncd-BZip2, 7zncd-Deflate, 7zncd-Deflate2, 7zncd-LZMA, 7zncd-LZMA2, 7zncd-PPMd, bzip2ncd, gzipncd, icd, ncd-zlib, ncd-bzlib, xzncd, bsdiff, diff, difflib, fuzzywuzzy, jellyfish, ngram, cosine.

30 To Sum Up: A Comparison of Code Similarity Analyzers. Research Note: Website:


More information

Lab5. Wooseok Kim

Lab5. Wooseok Kim Lab5 Wooseok Kim wkim3@albany.edu www.cs.albany.edu/~wooseok/201 Question Answer Points 1 A 8 2 A 8 3 E 8 4 D 8 5 20 5 for class 10 for main 5 points for output 6 A 8 7 B 8 8 0 15 9 D 8 10 B 8 Question

More information

JSCTracker: A Tool and Algorithm for Semantic Method Clone Detection

JSCTracker: A Tool and Algorithm for Semantic Method Clone Detection JSCTracker: A Tool and Algorithm for Semantic Method Clone Detection Using Method IOE-Behavior Rochelle Elva and Gary T. Leavens CS-TR-12-07 October 15, 2012 Keywords: Automated semantic clone detection

More information

Playing Cupid: The IDE as a Matchmaker for Plug-Ins

Playing Cupid: The IDE as a Matchmaker for Plug-Ins Playing Cupid: The IDE as a Matchmaker for Plug-Ins Todd W. Schiller and Brandon Lucia Department of Computer Science University of Washington Seattle, Washington {tws,blucia0a}@cs.washington.edu Abstract

More information

엄현상 (Eom, Hyeonsang) School of Computer Science and Engineering Seoul National University COPYRIGHTS 2017 EOM, HYEONSANG ALL RIGHTS RESERVED

엄현상 (Eom, Hyeonsang) School of Computer Science and Engineering Seoul National University COPYRIGHTS 2017 EOM, HYEONSANG ALL RIGHTS RESERVED 엄현상 (Eom, Hyeonsang) School of Computer Science and Engineering Seoul National University COPYRIGHTS 2017 EOM, HYEONSANG ALL RIGHTS RESERVED Outline - Questionnaire Results - Java Overview - Java Examples

More information

Soot, a Tool for Analyzing and Transforming Java Bytecode

Soot, a Tool for Analyzing and Transforming Java Bytecode Soot, a Tool for Analyzing and Transforming Java Bytecode Laurie Hendren, Patrick Lam, Jennifer Lhoták, Ondřej Lhoták and Feng Qian McGill University Special thanks to John Jorgensen and Navindra Umanee

More information

A Survey of Software Clone Detection Techniques

A Survey of Software Clone Detection Techniques A Survey of Software Detection Techniques Abdullah Sheneamer Department of Computer Science University of Colorado at Colo. Springs, USA Colorado Springs, USA asheneam@uccs.edu Jugal Kalita Department

More information

MeCC: Memory Comparison-based Clone Detector

MeCC: Memory Comparison-based Clone Detector MeCC: Memory Comparison-based Clone Detector ABSTRACT Heejung Kim Seoul National University hjkim@ropas.snu.ac.kr Sunghun Kim The Hong Kong University of Science and Technology hunkim@cse.ust.hk In this

More information

Research Article An Empirical Study on the Impact of Duplicate Code

Research Article An Empirical Study on the Impact of Duplicate Code Advances in Software Engineering Volume 212, Article ID 938296, 22 pages doi:1.1155/212/938296 Research Article An Empirical Study on the Impact of Duplicate Code Keisuke Hotta, Yui Sasaki, Yukiko Sano,

More information

On the Stability of Software Clones: A Genealogy-Based Empirical Study

On the Stability of Software Clones: A Genealogy-Based Empirical Study On the Stability of Software Clones: A Genealogy-Based Empirical Study A Thesis Submitted to the College of Graduate Studies and Research in Partial Fulfillment of the Requirements for the degree of Master

More information

Big picture. Definitions. Internal sorting. Exchange sorts. Insertion sort Bubble sort Selection sort Comparison. Comp Sci 1575 Data Structures

Big picture. Definitions. Internal sorting. Exchange sorts. Insertion sort Bubble sort Selection sort Comparison. Comp Sci 1575 Data Structures Internal sorting Comp Sci 1575 Data Structures Admin notes Advising appointments will eclipse office hours this week, so no guarantees about availability during normal times. With 130 appointments at 15

More information

Mining Revision Histories to Detect Cross-Language Clones without Intermediates

Mining Revision Histories to Detect Cross-Language Clones without Intermediates Mining Revision Histories to Detect Cross-Language Clones without Intermediates Xiao Cheng 1, Zhiming Peng 2, Lingxiao Jiang 2, Hao Zhong 1, Haibo Yu 3, Jianjun Zhao 4 1 Department of Computer Science

More information

Lecture 2. COMP1406/1006 (the Java course) Fall M. Jason Hinek Carleton University

Lecture 2. COMP1406/1006 (the Java course) Fall M. Jason Hinek Carleton University Lecture 2 COMP1406/1006 (the Java course) Fall 2013 M. Jason Hinek Carleton University today s agenda a quick look back (last Thursday) assignment 0 is posted and is due this Friday at 2pm Java compiling

More information

code pattern analysis of object-oriented programming languages

code pattern analysis of object-oriented programming languages code pattern analysis of object-oriented programming languages by Xubo Miao A thesis submitted to the School of Computing in conformity with the requirements for the degree of Master of Science Queen s

More information

Cost of Your Programs

Cost of Your Programs Department of Computer Science and Engineering Chinese University of Hong Kong In the class, we have defined the RAM computation model. In turn, this allowed us to define rigorously algorithms and their

More information

arxiv: v1 [cs.se] 8 Aug 2017

arxiv: v1 [cs.se] 8 Aug 2017 Cherry-Picking of Code Commits in Long-Running, Multi-release Software University of the Thai Chamber of Commerce panuchart_bun,chadarat_phi@utcc.ac.th arxiv:1708.02393v1 [cs.se] 8 Aug 2017 ABSTRACT This

More information

1/30/18. Overview. Code Clones. Code Clone Categorization. Code Clones. Code Clone Categorization. Key Points of Code Clones

1/30/18. Overview. Code Clones. Code Clone Categorization. Code Clones. Code Clone Categorization. Key Points of Code Clones Overview Code Clones Definition and categories Clone detection Clone removal refactoring Spiros Mancoridis[1] Modified by Na Meng 2 Code Clones Code clone is a code fragment in source files that is identical

More information

Code Clone Detection on Specialized PDGs with Heuristics

Code Clone Detection on Specialized PDGs with Heuristics 2011 15th European Conference on Software Maintenance and Reengineering Code Clone Detection on Specialized PDGs with Heuristics Yoshiki Higo Graduate School of Information Science and Technology Osaka

More information

Process Model Improvement for Source Code Plagiarism Detection in Student Programming Assignments

Process Model Improvement for Source Code Plagiarism Detection in Student Programming Assignments Informatics in Education, 2016, Vol. 15, No. 1, 103 126 2016 Vilnius University DOI: 10.15388/infedu.2016.06 103 Process Model Improvement for Source Code Plagiarism Detection in Student Programming Assignments

More information

The goal of this project is to enhance the identification of code duplication which can result in high cost reductions for a minimal price.

The goal of this project is to enhance the identification of code duplication which can result in high cost reductions for a minimal price. Code Duplication New Proposal Dolores Zage, Wayne Zage Ball State University June 1, 2017 July 31, 2018 Long Term Goals The goal of this project is to enhance the identification of code duplication which

More information

4.1, 4.2 Performance, with Sorting

4.1, 4.2 Performance, with Sorting 1 4.1, 4.2 Performance, with Sorting Running Time As soon as an Analytic Engine exists, it will necessarily guide the future course of the science. Whenever any result is sought by its aid, the question

More information

Sorting. 4.2 Sorting and Searching. Sorting. Sorting. Insertion Sort. Sorting. Sorting problem. Rearrange N items in ascending order.

Sorting. 4.2 Sorting and Searching. Sorting. Sorting. Insertion Sort. Sorting. Sorting problem. Rearrange N items in ascending order. 4.2 and Searching pentrust.org Introduction to Programming in Java: An Interdisciplinary Approach Robert Sedgewick and Kevin Wayne Copyright 2002 2010 23/2/2012 15:04:54 pentrust.org pentrust.org shanghaiscrap.org

More information

Gapped Code Clone Detection with Lightweight Source Code Analysis

Gapped Code Clone Detection with Lightweight Source Code Analysis Gapped Code Clone Detection with Lightweight Source Code Analysis Hiroaki Murakami, Keisuke Hotta, Yoshiki Higo, Hiroshi Igaki, Shinji Kusumoto Graduate School of Information Science and Technology, Osaka

More information