CloPlag. A Study of Effects of Code Obfuscation to Code Similarity Detection Tools. Chaiyong Ragkhitwetsagul, Jens Krinke, Albert Cabré Juan

1 CloPlag: A Study of Effects of Code Obfuscation to Code Similarity Detection Tools. Chaiyong Ragkhitwetsagul, Jens Krinke, Albert Cabré Juan

2 Cloned Code vs Plagiarised Code
Cloned code
  A result of source code reuse by copying and pasting [maybe with some modifications]
  Segments of code which are identical or similar
  Code maintenance and management
  In some cases, code cloning may violate software licenses [1]
Plagiarised code
  Created in a similar way to code clones but with a different intention
  Source code plagiarism violates academic regulations
  Oracle vs Google lawsuit [2]
[1] A. Monden, S. Okahara, Y. Manabe, and K. Matsumoto, Guilty or Not Guilty: Using Clone Metrics to Determine Open Source Licensing Violations, IEEE Software, vol. 28, no. 2.
[2]

3 What is Obfuscation?
Modifying a program while preserving its semantics
Can be achieved at 2 levels:
  Source code
  Byte code

4 Research Questions
RQ1: how do current detection tools perform against code obfuscation?
RQ2: what are the best parameter settings and similarity threshold of each tool?
RQ3: how do compilation and decompilation facilitate the detection process?
RQ4: can we apply the best parameters and threshold to other datasets effectively?

5 Overview of the Empirical Study
Java programs are obfuscated at:
  Source code level
  Byte code level
  Combination of both
Several similarity detection tools are applied to the data set
Varying the settings and threshold of each tool
Measure performance of each tool

6 Tools
Obfuscators: ARTIFICE, ProGuard
Decompilers: Procyon, Krakatau
Detectors: Clone, SW plagiarism, Compression, Others

7 Obfuscators
ARTIFICE
  Source code level
  Renaming, changing loops & conditional statements, changing increment/decrement statements
ProGuard
  Bytecode level
  Renames classes, fields, and variables to short, meaningless names
Schulze, S., & Meyer, D. (2013). On the robustness of clone detection to code obfuscation. International Workshop on Software Clones (IWSC)
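The kinds of source-level changes listed for ARTIFICE (renaming, loop rewriting, expanded increments) can be illustrated with a small sketch. ARTIFICE works on Java; the Python functions below are hypothetical examples written for this note, not ARTIFICE output. They only show that such edits change the surface form while leaving behaviour intact.

```python
def sum_of_squares(values):
    """Original version: descriptive names and a for loop."""
    total = 0
    for index in range(len(values)):
        total += values[index] * values[index]
    return total


def a(b):
    """Hand-obfuscated version: same behaviour, different surface form."""
    c = 0
    d = 0
    while d < len(b):          # for loop rewritten as a while loop
        c = c + b[d] * b[d]    # compound assignment expanded
        d = d + 1              # increment written out in full
    return c


if __name__ == "__main__":
    data = [1, 2, 3, 4]
    # Both versions compute the same value for the same input.
    print(sum_of_squares(data), a(data))
```

A purely textual detector sees two quite different token streams here, which is the effect the study measures.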

8 Detectors
Clone detectors: CCFinderX, iClones, Simian, NiCad, Deckard
Plagiarism detectors: JPlag, Sherlock, Plaggie, Sim
Compression: ncd-bzlib, 7zncd-BZip2, Inclusion
Others: diff, bsdiff, py-difflib, py-sklearn.cosine_similarity
* 21 tools in total
** All tools have to report similarity values (0-100)
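To make the compression-based family concrete, here is a minimal sketch of a normalised compression distance (NCD) computed with Python's standard bz2 module and mapped onto the 0-100 scale required above. The slides do not give the formula used by ncd-bzlib; the standard NCD definition and the mapping 100 * (1 - NCD) are assumptions, and the function names are hypothetical.

```python
import bz2


def compressed_size(data: bytes) -> int:
    """Length in bytes of the bzip2-compressed input."""
    return len(bz2.compress(data))


def ncd(x: bytes, y: bytes) -> float:
    """Standard NCD: (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def ncd_similarity(x: bytes, y: bytes) -> float:
    """Assumed mapping onto 0-100: 100 * (1 - NCD), clamped to the range."""
    return max(0.0, min(100.0, 100.0 * (1.0 - ncd(x, y))))
```

Compressor overhead means that even identical inputs rarely score exactly 100, which is one reason a per-tool threshold is needed rather than a fixed cut-off.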

9 Test Case 1
RQ1: how do current detection tools perform against code obfuscation?
RQ2: what are the best parameter settings and similarity threshold of each tool?
A series of small Java programs:
  InfixConverter
  SqrtAlgorithm
  Hanoi
  Queens
  MagicSquare

10 Test Data Preparation
[Diagram] original source (InfixConverter.java, SqrtAlgorithm.java, Hanoi.java, Queens.java, MagicSquare.java) -> source obfuscator (ARTIFICE) -> compiler (javac) -> bytecode obfuscator (ProGuard) -> decompilers (Procyon, Krakatau) -> obfuscated source code to be used in detection phase

11 Similarity Calculation
[Diagram] 5 sets (e.g. Hanoi), 10 files/set (original and obfuscated) -> detection tools (ccfx*, jplag*, sim, py-difflib) -> similarity report
* Most tools have different parameter settings which can strongly affect the results
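The pairwise step can be sketched with Python's difflib, which is presumably what the py-difflib entry wraps. With 5 sets of 10 files there are 50 files and therefore 50 x 50 = 2,500 ordered pairs. The function names and the scaling of `ratio()` to 0-100 are assumptions made for this sketch; `autojunk` is a real `SequenceMatcher` parameter.

```python
from difflib import SequenceMatcher
from itertools import product
from typing import Dict, Tuple


def difflib_similarity(a: str, b: str, autojunk: bool = True) -> float:
    """Similarity on a 0-100 scale from SequenceMatcher.ratio()."""
    return 100.0 * SequenceMatcher(None, a, b, autojunk=autojunk).ratio()


def similarity_report(files: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Score every ordered pair of files, including each file against itself."""
    return {
        (name_a, name_b): difflib_similarity(files[name_a], files[name_b])
        for name_a, name_b in product(files, repeat=2)
    }
```

Scoring ordered pairs rather than unordered ones keeps the report usable for tools whose measure is not symmetric.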

12 Similarity Calculation for Unsupported Tools
[Diagram] 0_, 1_artifice -> Simian -> 0_.txt, 1_artifice.txt -> GCF File Converters -> 0_.xml, 1_artifice.xml -> SimCal
Tools using GCF [1] + SimCal include:
  Simian (textual report)
  iClones (RCF format)
  NiCad (XML report)
  Deckard (textual report)
[1] Wang, T., Harman, M., Jia, Y., & Krinke, J. (2013). Searching for Better Configurations: A Rigorous Approach to Clone Evaluation. FSE

13 Similarity Report (ncd-bzlib)
[Figure: matrix of pairwise similarity values; row and column labels include InfCv/, InfCv/artifice, Sqrt/, Sqrt/artifice, Squr/, Square/artifice]

14 ncd-bzlib with similarity threshold =
[Figure: the same pairwise similarity matrix with a threshold applied; row and column labels include InfCv/, InfCv/artifice, Sqrt/, Sqrt/artifice, Squr/, Square/artifice]

15 ncd-bzlib with similarity threshold =
[Figure: the same pairwise similarity matrix with a threshold applied; row and column labels include InfCv/, InfCv/artifice, Sqrt/, Sqrt/artifice, Squr/, Square/artifice]

16 1. Best threshold (T)
Find the best threshold (T) of each tool with a specific parameter setting
Calculate the sum of false positives and false negatives (FP + FN) for every candidate threshold
Choose the T with the minimum number of false results
BestThreshold = {T | min(FP_T + FN_T)}
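The selection rule can be sketched as a scan over integer thresholds. Two details are assumptions not stated on the slide: a pair is reported as similar when its score is at least T, and ties are broken in favour of the lowest T. The names `count_errors` and `best_threshold` are hypothetical.

```python
from typing import List, Tuple

# (similarity score 0-100, True if the pair really is similar)
Pair = Tuple[float, bool]


def count_errors(pairs: List[Pair], threshold: float) -> Tuple[int, int]:
    """Return (FP, FN) when pairs scoring >= threshold are reported as similar."""
    fp = sum(1 for score, truth in pairs if score >= threshold and not truth)
    fn = sum(1 for score, truth in pairs if score < threshold and truth)
    return fp, fn


def best_threshold(pairs: List[Pair]) -> Tuple[int, int]:
    """Scan T = 0..100 and return (T, FP + FN) with the fewest false results."""
    best_t, best_errors = 0, None
    for t in range(0, 101):
        fp, fn = count_errors(pairs, t)
        if best_errors is None or fp + fn < best_errors:
            best_t, best_errors = t, fp + fn
    return best_t, best_errors
```

Running this once per tool and per parameter setting gives the (setting, T, FP + FN) triples that the later result tables compare.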

17 Threshold selection
Best threshold = 31 (FP+FN=166)
Table columns: Threshold, TP, FP, TN, FN, FP+FN, Precision, Recall, F-measure (F1)
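The derived columns follow the usual definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is their harmonic mean. A minimal sketch, with a hypothetical function name and with zero returned when a denominator is empty (an assumption):

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

TN does not enter any of the three measures, which is why FP + FN is a convenient single number to minimise when most pairs are true negatives.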

18 ncd-bzlib with similarity threshold =
[Figure: the same pairwise similarity matrix with a threshold applied; row and column labels include InfCv/, InfCv/artifice, Sqrt/, Sqrt/artifice, Squr/, Square/artifice]

19 2. Manually-inspected results
Pairs closest to the threshold are the most likely to be false positives or false negatives
We have a fixed cost for doing manual inspection
Remove the top 50, 100, and 200 pairs closest to the threshold for manual inspection
Evaluate the new results after removal
distance from T = |classifier(x, y) - T|
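The removal step can be sketched as a sort by distance from T followed by re-counting the errors on what is left. This assumes that pairs sent to manual inspection no longer count as tool errors and that a pair is reported as similar when its score is at least T; the function names are hypothetical.

```python
from typing import List, Tuple

# (similarity score 0-100, True if the pair really is similar)
Pair = Tuple[float, bool]


def remove_closest(pairs: List[Pair], threshold: float, k: int) -> List[Pair]:
    """Drop the k pairs whose score is closest to the threshold."""
    ranked = sorted(pairs, key=lambda pair: abs(pair[0] - threshold))
    return ranked[k:]


def false_results(pairs: List[Pair], threshold: float) -> int:
    """FP + FN when pairs scoring >= threshold are reported as similar."""
    fp = sum(1 for score, truth in pairs if score >= threshold and not truth)
    fn = sum(1 for score, truth in pairs if score < threshold and truth)
    return fp + fn
```

Comparing `false_results` before and after `remove_closest` for k = 50, 100, and 200 gives the figures that the "Remove 50/100/200" columns report.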

20 RQ1: how do current detection tools perform against code obfuscation?

21 RQ2: what are the best parameter settings and similarity threshold of each tool?
[Table: for each tool, the best settings, threshold (T), and FP + FN under four conditions: no removal, remove 50, remove 100, remove 200]
Tools and recoverable settings:
  7zncd-BZip2: m0=2, mx={1,3,5} (all four conditions)
  ncd-bzlib
  ccfx: b=20, t={1..7} with no removal; the other conditions list further combinations of b (18 to 24) and t ({1..7})
  jplag-java: t=
  jplag-text: t=
  simjava: r=
  py-difflib: SM_autojunk (three conditions), SM_whitespace_autojunk (one condition)
  py-sklearn.cosine_similarity

22 Test Case 2
Observation: compiling/decompiling canonicalises the data
RQ3: how do compilation and decompilation facilitate the detection process?
Experiment
  Compiled/decompiled version of the 1st dataset
  Two different decompilers: Krakatau vs Procyon
  Repeat the detection steps of Test Case 1 and compare the tool performances

23 Compiling/Decompiling Process
[Diagram] 5 sets (e.g. Hanoi), 10 files/set (original and obfuscated) -> compile (javac) -> decompile (Procyon, Krakatau) -> detection tools (ccfx*, jplag*, sim, py-difflib) -> similarity report
* some tools have different parameter settings which can affect the results

24 RQ3: how do compilation and decompilation facilitate the detection process?

25 [Figure panels] Original; ARTIFICE (source-code obfuscated version)

26 [Figure panels] Original (decompiled); ARTIFICE (decompiled)

27 Test Case 3
RQ4: can we apply the best parameters and threshold to other datasets effectively?
Experiment
  Munich dataset containing independently developed Java programs for address validation [1]
[1] Juergens, E., Deissenboeck, F., & Hummel, B. (2011). Code similarities beyond copy & paste. Proceedings of the European Conference on Software Maintenance and Reengineering, CSMR, 78-87

28 RQ4: can we apply the best parameters and threshold to other datasets effectively?
Table columns: Tools, Settings, Test Case 1 (2,500): T and FP+FN, Test Case 3 (munich) (11,881): T and FP+FN
  ccfx, b=20, t={1..7}, ,797
  simjava, r=, ,680
  jplag-text, t=, ,770
  py-difflib, SM_autojunk, ,446
  7zncd-BZip2, m0=2, mx={1,3,5}, ,432
  ncd-bzlib, ,754
  jplag-java, t=, ,162
  py-sklearn.cosine_similarity, ,282

29 Summary
Current tools behave differently on obfuscated code
Clone and plagiarism detectors outperform the others
The best parameter settings and threshold can be found using the proposed method
Compiling/decompiling can help canonicalise the obfuscated code
The derived settings and threshold cannot be applied directly to other data sets

30 What's next?
Replicate the experiment on other data sets and compare the best parameter settings and thresholds
  SOCO (detection of SOurce COde re-use)
  The Java collection contains 259 source codes
  The C collection contains 79 source codes
Find a better way to learn the best parameter settings and threshold in an ad hoc manner for each data set (or pair)

31 Questions?

CloPlag. A Study of Effects of Code Obfuscation to Clone/Plagiarism Detection Tools. Jens Krinke, Chaiyong Ragkhitwetsagul, Albert Cabré Juan

CloPlag. A Study of Effects of Code Obfuscation to Clone/Plagiarism Detection Tools. Jens Krinke, Chaiyong Ragkhitwetsagul, Albert Cabré Juan CloPlag A Study of Effects of Code Obfuscation to Clone/Plagiarism Detection Tools Jens Krinke, Chaiyong Ragkhitwetsagul, Albert Cabré Juan 1 Outline Background Motivation and Research Questions Tools

More information

A Comparison of Code Similarity Analyzers C. Ragkhitwetsagul, J. Krinke, D. Clark

A Comparison of Code Similarity Analyzers C. Ragkhitwetsagul, J. Krinke, D. Clark A Comparison of Code Similarity Analyzers C. Ragkhitwetsagul, J. Krinke, D. Clark SCAM 16, EMSE (under reviewed) Photo: https://c1.staticflickr.com/1/316/31831180223_38db905f28_c.jpg 1 When source code

More information

Research Note RN/17/04. A Comparison of Code Similarity Analysers

Research Note RN/17/04. A Comparison of Code Similarity Analysers UCL DEPARTMENT OF COMPUTER SCIENCE Research Note RN/17/04 A Comparison of Code Similarity Analysers 20 February 2017 Chaiyong Ragkhitwetsagul Jens Krinke David Clark Abstract Source code analysis to detect

More information

A Comparison of Code Similarity Analysers

A Comparison of Code Similarity Analysers Noname manuscript No. (will be inserted by the editor) A Comparison of Code Similarity Analysers Chaiyong Ragkhitwetsagul Jens Krinke David Clark Received: date / Accepted: date Abstract Copying and pasting

More information

Measuring Code Similarity in Large-scaled Code Corpora

Measuring Code Similarity in Large-scaled Code Corpora Measuring Code Similarity in Large-scaled Code Corpora Chaiyong Ragkhitwetsagul CREST, Department of Computer Science University College London, UK Abstract Source code similarity measurement is a fundamental

More information

Code Duplication: A Measurable Technical Debt?

Code Duplication: A Measurable Technical Debt? UCL 2014 wmjkucl 05/12/2016 Code Duplication: A Measurable Technical Debt? Jens Krinke Centre for Research on Evolution, Search & Testing Software Systems Engineering Group Department of Computer Science

More information

Searching for Configurations in Clone Evaluation A Replication Study

Searching for Configurations in Clone Evaluation A Replication Study Searching for Configurations in Clone Evaluation A Replication Study Chaiyong Ragkhitwetsagul 1, Matheus Paixao 1, Manal Adham 1 Saheed Busari 1, Jens Krinke 1 and John H. Drake 2 1 University College

More information

On the Robustness of Clone Detection to Code Obfuscation

On the Robustness of Clone Detection to Code Obfuscation On the Robustness of Clone Detection to Code Obfuscation Sandro Schulze TU Braunschweig Braunschweig, Germany sandro.schulze@tu-braunschweig.de Daniel Meyer University of Magdeburg Magdeburg, Germany Daniel3.Meyer@st.ovgu.de

More information

Using Compilation/Decompilation to Enhance Clone Detection

Using Compilation/Decompilation to Enhance Clone Detection Using Compilation/Decompilation to Enhance Clone Detection Chaiyong Ragkhitwetsagul, Jens Krinke University College London, UK Abstract We study effects of compilation and decompilation to code clone detection

More information

Keywords Clone detection, metrics computation, hybrid approach, complexity, byte code

Keywords Clone detection, metrics computation, hybrid approach, complexity, byte code Volume 3, Issue 5, May 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com An Emerging Approach

More information

Plagiarism detection for Java: a tool comparison

Plagiarism detection for Java: a tool comparison Plagiarism detection for Java: a tool comparison Jurriaan Hage e-mail: jur@cs.uu.nl homepage: http://www.cs.uu.nl/people/jur/ Joint work with Peter Rademaker and Nikè van Vugt. Department of Information

More information

A Framework for Evaluating Mobile App Repackaging Detection Algorithms

A Framework for Evaluating Mobile App Repackaging Detection Algorithms A Framework for Evaluating Mobile App Repackaging Detection Algorithms Heqing Huang, PhD Candidate. Sencun Zhu, Peng Liu (Presenter) & Dinghao Wu, PhDs Repackaging Process Downloaded APK file Unpack Repackaged

More information

An Information Retrieval Approach for Source Code Plagiarism Detection

An Information Retrieval Approach for Source Code Plagiarism Detection -2014: An Information Retrieval Approach for Source Code Plagiarism Detection Debasis Ganguly, Gareth J. F. Jones CNGL: Centre for Global Intelligent Content School of Computing, Dublin City University

More information

An Exploratory Study on Interface Similarities in Code Clones

An Exploratory Study on Interface Similarities in Code Clones 1 st WETSoDA, December 4, 2017 - Nanjing, China An Exploratory Study on Interface Similarities in Code Clones Md Rakib Hossain Misu, Abdus Satter, Kazi Sakib Institute of Information Technology University

More information

Threshold-free Code Clone Detection for a Large-scale Heterogeneous Java Repository

Threshold-free Code Clone Detection for a Large-scale Heterogeneous Java Repository Threshold-free Code Clone Detection for a Large-scale Heterogeneous Java Repository Iman Keivanloo Department of Electrical and Computer Engineering Queen s University Kingston, Ontario, Canada iman.keivanloo@queensu.ca

More information

Classification Part 4

Classification Part 4 Classification Part 4 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville Model Evaluation Metrics for Performance Evaluation How to evaluate

More information

Study and Analysis of Object-Oriented Languages using Hybrid Clone Detection Technique

Study and Analysis of Object-Oriented Languages using Hybrid Clone Detection Technique Advances in Computational Sciences and Technology ISSN 0973-6107 Volume 10, Number 6 (2017) pp. 1635-1649 Research India Publications http://www.ripublication.com Study and Analysis of Object-Oriented

More information

ON AUTOMATICALLY DETECTING SIMILAR ANDROID APPS. By Michelle Dowling

ON AUTOMATICALLY DETECTING SIMILAR ANDROID APPS. By Michelle Dowling ON AUTOMATICALLY DETECTING SIMILAR ANDROID APPS By Michelle Dowling Motivation Searching for similar mobile apps is becoming increasingly important Looking for substitute apps Opportunistic code reuse

More information

Rendezvous: A search engine for binary code

Rendezvous: A search engine for binary code Rendezvous: A search engine for binary code Wei Ming Khoo, Alan Mycroft, Ross Anderson University of Cambridge CREST Open Workshop on Malware 29 May 2013 Demo: http://www.rendezvousalpha.com 1 Software

More information

Source Code Plagiarism Detection using Machine Learning

Source Code Plagiarism Detection using Machine Learning Source Code Plagiarism Detection using Machine Learning Utrecht University Daniël Heres August 2017 Contents 1 Introduction 1 1.1 Formal Description.......................... 3 1.2 Thesis Overview...........................

More information

An Approach to Detect Clones in Class Diagram Based on Suffix Array

An Approach to Detect Clones in Class Diagram Based on Suffix Array An Approach to Detect Clones in Class Diagram Based on Suffix Array Amandeep Kaur, Computer Science and Engg. Department, BBSBEC Fatehgarh Sahib, Punjab, India. Manpreet Kaur, Computer Science and Engg.

More information

Quantifying and Assessing the Merge of Cloned Web-Based System: An Exploratory Study

Quantifying and Assessing the Merge of Cloned Web-Based System: An Exploratory Study Quantifying and Assessing the Merge of Cloned Web-Based System: An Exploratory Study Jadson Santos Department of Informatics and Applied Mathematics Federal University of Rio Grande do Norte, UFRN Natal,

More information

DELDroid: Determination & Enforcement of Least Privilege Architecture in AnDroid

DELDroid: Determination & Enforcement of Least Privilege Architecture in AnDroid DELDroid: Determination & Enforcement of Least Privilege Architecture in AnDroid Mahmoud Hammad Software Engineering Ph.D. Candidate Mahmoud Hammad, Hamid Bagheri, and Sam Malek IEEE International Conference

More information

An Approach to Source Code Plagiarism Detection Based on Abstract Implementation Structure Diagram

An Approach to Source Code Plagiarism Detection Based on Abstract Implementation Structure Diagram An Approach to Source Code Plagiarism Detection Based on Abstract Implementation Structure Diagram Shuang Guo 1, 2, b 1, 2, a, JianBin Liu 1 School of Computer, Science Beijing Information Science & Technology

More information

Web Information Retrieval. Exercises Evaluation in information retrieval

Web Information Retrieval. Exercises Evaluation in information retrieval Web Information Retrieval Exercises Evaluation in information retrieval Evaluating an IR system Note: information need is translated into a query Relevance is assessed relative to the information need

More information

Manning Chapter: Text Retrieval (Selections) Text Retrieval Tasks. Vorhees & Harman (Bulkpack) Evaluation The Vector Space Model Advanced Techniques

Manning Chapter: Text Retrieval (Selections) Text Retrieval Tasks. Vorhees & Harman (Bulkpack) Evaluation The Vector Space Model Advanced Techniques Text Retrieval Readings Introduction Manning Chapter: Text Retrieval (Selections) Text Retrieval Tasks Vorhees & Harman (Bulkpack) Evaluation The Vector Space Model Advanced Techniues 1 2 Text Retrieval:

More information

Dr. Sushil Garg Professor, Dept. of Computer Science & Applications, College City, India

Dr. Sushil Garg Professor, Dept. of Computer Science & Applications, College City, India Volume 3, Issue 11, November 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Study of Different

More information

INTRODUCTION TO MACHINE LEARNING. Measuring model performance or error

INTRODUCTION TO MACHINE LEARNING. Measuring model performance or error INTRODUCTION TO MACHINE LEARNING Measuring model performance or error Is our model any good? Context of task Accuracy Computation time Interpretability 3 types of tasks Classification Regression Clustering

More information

Folding Repeated Instructions for Improving Token-based Code Clone Detection

Folding Repeated Instructions for Improving Token-based Code Clone Detection 2012 IEEE 12th International Working Conference on Source Code Analysis and Manipulation Folding Repeated Instructions for Improving Token-based Code Clone Detection Hiroaki Murakami, Keisuke Hotta, Yoshiki

More information

Joe Raad [1,2], Wouter Beek [3], Frank van Harmelen [3], Nathalie Pernelle [2], Fatiha Saïs [2]

Joe Raad [1,2], Wouter Beek [3], Frank van Harmelen [3], Nathalie Pernelle [2], Fatiha Saïs [2] DETECTING ERRONEOUS IDENTITY LINKS IN THE WEB OF DATA Joe Raad [1,2], Wouter Beek [3], Frank van Harmelen [3], Nathalie Pernelle [2], Fatiha Saïs [2] joe.raad@agroparistech.fr [1] INRA, Paris France [2]

More information

Internet Traffic Classification using Machine Learning

Internet Traffic Classification using Machine Learning Internet Traffic Classification using Machine Learning by Alina Lapina 2018, UiO, INF5050 Alina Lapina, Master student at IFI, Full stack developer at Ciber Experis 2 Based on Thuy T. T. Nguyen, Grenville

More information

Large Scale Data Analysis Using Deep Learning

Large Scale Data Analysis Using Deep Learning Large Scale Data Analysis Using Deep Learning Machine Learning Basics - 1 U Kang Seoul National University U Kang 1 In This Lecture Overview of Machine Learning Capacity, overfitting, and underfitting

More information

Cross Language Higher Level Clone Detection- Between Two Different Object Oriented Programming Language Source Codes

Cross Language Higher Level Clone Detection- Between Two Different Object Oriented Programming Language Source Codes Cross Language Higher Level Clone Detection- Between Two Different Object Oriented Programming Language Source Codes 1 K. Vidhya, 2 N. Sumathi, 3 D. Ramya, 1, 2 Assistant Professor 3 PG Student, Dept.

More information

Large-Scale Clone Detection and Benchmarking

Large-Scale Clone Detection and Benchmarking Large-Scale Clone Detection and Benchmarking A Thesis Submitted to the College of Graduate and Postdoctoral Studies in Partial Fulfillment of the Requirements for the degree of Doctor of Philosophy in

More information

A Measurement of Similarity to Identify Identical Code Clones

A Measurement of Similarity to Identify Identical Code Clones The International Arab Journal of Information Technology, Vol. 12, No. 6A, 2015 735 A Measurement of Similarity to Identify Identical Code Clones Mythili ShanmughaSundaram and Sarala Subramani Department

More information

ForkSim: Generating Software Forks for Evaluating Cross-Project Similarity Analysis Tools

ForkSim: Generating Software Forks for Evaluating Cross-Project Similarity Analysis Tools ForkSim: Generating Software Forks for Evaluating Cross-Project Similarity Analysis Tools Jeffrey Svajlenko Chanchal K. Roy University of Saskatchewan, Canada {jeff.svajlenko, chanchal.roy}@usask.ca Slawomir

More information

IDENTIFICATION OF PROMOTED ECLIPSE UNSTABLE INTERFACES USING CLONE DETECTION TECHNIQUE

IDENTIFICATION OF PROMOTED ECLIPSE UNSTABLE INTERFACES USING CLONE DETECTION TECHNIQUE International Journal of Software Engineering & Applications (IJSEA), Vol.9, No.5, September 2018 IDENTIFICATION OF PROMOTED ECLIPSE UNSTABLE INTERFACES USING CLONE DETECTION TECHNIQUE Simon Kawuma 1 and

More information

JSCTracker: A Tool and Algorithm for Semantic Method Clone Detection

JSCTracker: A Tool and Algorithm for Semantic Method Clone Detection JSCTracker: A Tool and Algorithm for Semantic Method Clone Detection Using Method IOE-Behavior Rochelle Elva and Gary T. Leavens CS-TR-12-07 October 15, 2012 Keywords: Automated semantic clone detection

More information

Software Clone Detection. Kevin Tang Mar. 29, 2012

Software Clone Detection. Kevin Tang Mar. 29, 2012 Software Clone Detection Kevin Tang Mar. 29, 2012 Software Clone Detection Introduction Reasons for Code Duplication Drawbacks of Code Duplication Clone Definitions in the Literature Detection Techniques

More information

Use of Synthetic Data in Testing Administrative Records Systems

Use of Synthetic Data in Testing Administrative Records Systems Use of Synthetic Data in Testing Administrative Records Systems K. Bradley Paxton and Thomas Hager ADI, LLC 200 Canal View Boulevard, Rochester, NY 14623 brad.paxton@adillc.net, tom.hager@adillc.net Executive

More information

Enhancing Program Dependency Graph Based Clone Detection Using Approximate Subgraph Matching

Enhancing Program Dependency Graph Based Clone Detection Using Approximate Subgraph Matching Enhancing Program Dependency Graph Based Clone Detection Using Approximate Subgraph Matching A DISSERTATION SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENT FOR THE AWARD OF THE DEGREE OF MASTER OF

More information

Detection and Analysis of Software Clones

Detection and Analysis of Software Clones Detection and Analysis of Software Clones By Abdullah Mohammad Sheneamer M.S., University of Colorado at Colorado Springs, Computer Science, USA, 2012 B.S., University of King Abdulaziz, Computer Science,

More information

Floating point. Today! IEEE Floating Point Standard! Rounding! Floating Point Operations! Mathematical properties. Next time. !

Floating point. Today! IEEE Floating Point Standard! Rounding! Floating Point Operations! Mathematical properties. Next time. ! Floating point Today! IEEE Floating Point Standard! Rounding! Floating Point Operations! Mathematical properties Next time! The machine model Chris Riesbeck, Fall 2011 Checkpoint IEEE Floating point Floating

More information

Basic Tokenizing, Indexing, and Implementation of Vector-Space Retrieval

Basic Tokenizing, Indexing, and Implementation of Vector-Space Retrieval Basic Tokenizing, Indexing, and Implementation of Vector-Space Retrieval 1 Naïve Implementation Convert all documents in collection D to tf-idf weighted vectors, d j, for keyword vocabulary V. Convert

More information

Detecting Common Modules in Java Packages Based on Static Object Trace Birthmark

Detecting Common Modules in Java Packages Based on Static Object Trace Birthmark The Computer Journal Advance Access published November 5, 2009 The Author 2009. Published by Oxford University Press on behalf of The British Computer Society. All rights reserved. For Permissions, please

More information

Classify My Social Contacts into Circles Stanford University CS224W Fall 2014

Classify My Social Contacts into Circles Stanford University CS224W Fall 2014 Classify My Social Contacts into Circles Stanford University CS224W Fall 2014 Amer Hammudi (SUNet ID: ahammudi) ahammudi@stanford.edu Darren Koh (SUNet: dtkoh) dtkoh@stanford.edu Jia Li (SUNet: jli14)

More information

Evaluating Classifiers

Evaluating Classifiers Evaluating Classifiers Reading for this topic: T. Fawcett, An introduction to ROC analysis, Sections 1-4, 7 (linked from class website) Evaluating Classifiers What we want: Classifier that best predicts

More information

Research Article An Empirical Study on the Impact of Duplicate Code

Research Article An Empirical Study on the Impact of Duplicate Code Advances in Software Engineering Volume 212, Article ID 938296, 22 pages doi:1.1155/212/938296 Research Article An Empirical Study on the Impact of Duplicate Code Keisuke Hotta, Yui Sasaki, Yukiko Sano,

More information

List of Exercises: Data Mining 1 December 12th, 2015

List of Exercises: Data Mining 1 December 12th, 2015 List of Exercises: Data Mining 1 December 12th, 2015 1. We trained a model on a two-class balanced dataset using five-fold cross validation. One person calculated the performance of the classifier by measuring

More information

Evaluation of different biological data and computational classification methods for use in protein interaction prediction.

Evaluation of different biological data and computational classification methods for use in protein interaction prediction. Evaluation of different biological data and computational classification methods for use in protein interaction prediction. Yanjun Qi, Ziv Bar-Joseph, Judith Klein-Seetharaman Protein 2006 Motivation Correctly

More information

code pattern analysis of object-oriented programming languages

code pattern analysis of object-oriented programming languages code pattern analysis of object-oriented programming languages by Xubo Miao A thesis submitted to the School of Computing in conformity with the requirements for the degree of Master of Science Queen s

More information

Enhanced Compositional Safety Analysis for Distributed Embedded Systems using LTS Equivalence

Enhanced Compositional Safety Analysis for Distributed Embedded Systems using LTS Equivalence Proceedings of the 6th WSEAS Internatial Cference Applied Computer Science, Hangzhou, China, April 15-17, 2007 115 Enhanced Compositial Safety Analysis for Distributed Embedded Systems using LTS Equivalence

More information

Natural Language Processing Is No Free Lunch

Natural Language Processing Is No Free Lunch Natural Language Processing Is No Free Lunch STEFAN WAGNER UNIVERSITY OF STUTTGART, STUTTGART, GERMANY ntroduction o Impressive progress in NLP: OS with personal assistants like Siri or Cortan o Brief

More information

Evaluation of similarity metrics for programming code plagiarism detection method

Evaluation of similarity metrics for programming code plagiarism detection method Evaluation of similarity metrics for programming code plagiarism detection method Vedran Juričić Department of Information Sciences Faculty of humanities and social sciences University of Zagreb I. Lučića

More information

EVALUATIONS OF THE EFFECTIVENESS OF ANOMALY BASED INTRUSION DETECTION SYSTEMS BASED ON AN ADAPTIVE KNN ALGORITHM

EVALUATIONS OF THE EFFECTIVENESS OF ANOMALY BASED INTRUSION DETECTION SYSTEMS BASED ON AN ADAPTIVE KNN ALGORITHM EVALUATIONS OF THE EFFECTIVENESS OF ANOMALY BASED INTRUSION DETECTION SYSTEMS BASED ON AN ADAPTIVE KNN ALGORITHM Assosiate professor, PhD Evgeniya Nikolova, BFU Assosiate professor, PhD Veselina Jecheva,

More information

Evaluating Classifiers

Evaluating Classifiers Evaluating Classifiers Reading for this topic: T. Fawcett, An introduction to ROC analysis, Sections 1-4, 7 (linked from class website) Evaluating Classifiers What we want: Classifier that best predicts

More information

Use of Source Code Similarity Metrics in Software Defect Prediction

Use of Source Code Similarity Metrics in Software Defect Prediction 1 Use of Source Code Similarity Metrics in Software Defect Prediction Ahmet Okutan arxiv:1808.10033v1 [cs.se] 29 Aug 2018 Abstract In recent years, defect prediction has received a great deal of attention

More information

A Guided Genetic Algorithm for Automated Crash Reproduction

A Guided Genetic Algorithm for Automated Crash Reproduction A Guided Genetic Algorithm for Automated Crash Reproduction Soltani, Panichella, & van Deursen 2017 International Conference on Software Engineering Presented by: Katie Keith, Emily First, Pradeep Ambati

More information

Review for Test 1 (Chapter 1-5)

Review for Test 1 (Chapter 1-5) Review for Test 1 (Chapter 1-5) 1. Introduction to Computers, Programs, and Java a) What is a computer? b) What is a computer program? c) A bit is a binary digit 0 or 1. A byte is a sequence of 8 bits.

More information

Accuracy Enhancement in Code Clone Detection Using Advance Normalization

Accuracy Enhancement in Code Clone Detection Using Advance Normalization Accuracy Enhancement in Code Clone Detection Using Advance Normalization 1 Ritesh V. Patil, 2 S. D. Joshi, 3 Digvijay A. Ajagekar, 4 Priyanka A. Shirke, 5 Vivek P. Talekar, 6 Shubham D. Bankar 1 Research

More information
