CloPlag. A Study of Effects of Code Obfuscation to Code Similarity Detection Tools. Chaiyong Ragkhitwetsagul, Jens Krinke, Albert Cabré Juan
1 CloPlag: A Study of Effects of Code Obfuscation to Code Similarity Detection Tools. Chaiyong Ragkhitwetsagul, Jens Krinke, Albert Cabré Juan
2 Cloned Code vs Plagiarised Code
Cloned code:
- A result of source code reuse by copying and pasting (maybe with some modifications)
- Segments of code which are identical or similar
- Code maintenance and management
- In some cases, code cloning may violate software licenses [1]
Plagiarised code:
- Created in a similar way as code clones but with a different intention
- Source code plagiarism violates academic regulations
- Oracle vs Google lawsuit [2]
[1] A. Monden, S. Okahara, Y. Manabe, and K. Matsumoto, "Guilty or Not Guilty: Using Clone Metrics to Determine Open Source Licensing Violations," IEEE Software, vol. 28, no. 2.
3 What is Obfuscation?
- Modifying a program while preserving its semantics
- Can be achieved at 2 levels: source code and byte code
4 Research Questions
- RQ1: How do current detection tools perform against code obfuscation?
- RQ2: What are the best parameter settings and similarity threshold of each tool?
- RQ3: How do compilation and decompilation facilitate the detection process?
- RQ4: Can we apply the best parameters and threshold to other datasets effectively?
5 Overview of the Empirical Study
- Java programs are obfuscated at the source code level, the byte code level, and a combination of both
- Several similarity detection tools are applied to the data set
- The settings and threshold of each tool are varied
- The performance of each tool is measured
6 Tools
- Obfuscators: ARTIFICE, ProGuard
- Decompilers: Procyon, Krakatau
- Detectors: clone, software plagiarism, compression, others
7 Obfuscators
- ARTIFICE: source code level. Renaming, changing loops and conditional statements, changing increment/decrement statements
- ProGuard: bytecode level. Renames classes, fields, and variables to short, meaningless names
Schulze, S., & Meyer, D. (2013). On the robustness of clone detection to code obfuscation. International Workshop on Software Clones (IWSC).
8 Detectors
- Clone detectors: CCFinderX, iClones, Simian, NiCad, Deckard
- Plagiarism detectors: JPlag, Sherlock, Plaggie, Sim
- Compression: ncd-bzlib, 7zncd-BZip2, Inclusion
- Others: diff, bsdiff, py-difflib, py-sklearn.cosine_similarity
* 21 tools in total
** All tools have to report similarity values (0-100)
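To make the compression-based family concrete, here is a minimal sketch of a normalised compression distance (NCD) computed with Python's `bz2` module and mapped onto the 0-100 similarity scale that all tools have to report. The slides do not show how ncd-bzlib is implemented; the standard NCD formula, the `(1 - NCD) * 100` mapping, the clamping, and the function names are assumptions made for illustration.

```python
import bz2


def compressed_size(data: bytes) -> int:
    """Length in bytes of the bzip2-compressed input."""
    return len(bz2.compress(data))


def ncd(x: bytes, y: bytes) -> float:
    """Standard NCD: (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def ncd_similarity(x: bytes, y: bytes) -> float:
    """Assumed mapping of NCD onto a 0-100 similarity score (clamped)."""
    score = (1.0 - ncd(x, y)) * 100.0
    return max(0.0, min(100.0, score))
```

Compressor overhead dominates on very short inputs, so scores for tiny files should be read with care.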
9 Test Case 1
- RQ1: How do current detection tools perform against code obfuscation?
- RQ2: What are the best parameter settings and similarity threshold of each tool?
- Data: a series of small Java programs: InfixConverter, SqrtAlgorithm, Hanoi, Queens, MagicSquare
10 Test Data Preparation
[Figure: original source → source obfuscator (ARTIFICE) → compiler (javac) → bytecode obfuscator (ProGuard) → decompilers (Krakatau, Procyon) → obfuscated source code to be used in the detection phase]
Input programs: InfixConverter.java, SqrtAlgorithm.java, Hanoi.java, Queens.java, MagicSquare.java
11 Similarity Calculation
[Figure: 5 sets (e.g. Hanoi), 10 files per set (original and obfuscated) → detection tools (ccfx*, jplag*, sim, py-difflib) → similarity report]
* Most tools have different parameter settings which can strongly affect the results
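The all-pairs comparison behind the similarity report can be sketched as follows, using `difflib.SequenceMatcher` from the Python standard library as a stand-in for the py-difflib tool. The function names and the in-memory `files` mapping are hypothetical. With 5 sets of 10 files, comparing every file with every file (including itself) would give 50 × 50 = 2,500 ordered pairs, which is consistent with the 2,500 reported for Test Case 1.

```python
from difflib import SequenceMatcher
from itertools import product


def difflib_similarity(a: str, b: str) -> float:
    """Similarity on a 0-100 scale; autojunk is left at its default (True)."""
    return SequenceMatcher(None, a, b).ratio() * 100.0


def similarity_report(files: dict) -> dict:
    """Map every ordered pair of file names to its similarity score.

    `files` maps a file name to its source text (hypothetical layout).
    """
    return {
        (x, y): difflib_similarity(files[x], files[y])
        for x, y in product(files, repeat=2)
    }
```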
12 Similarity Calculation for Unsupported Tools
[Figure: Simian, file converters, GCF, and SimCal; example files 0_, 0_.txt, 0_.xml and 1_artifice, 1_artifice.txt, 1_artifice.xml]
Tools using GCF [1] + SimCal include:
- Simian (textual report)
- iClones (RCF format)
- NiCad (XML report)
- Deckard (textual report)
[1] Wang, T., Harman, M., Jia, Y., & Krinke, J. (2013). Searching for Better Configurations: A Rigorous Approach to Clone Evaluation. FSE.
13 Similarity Report (ncd-bzlib)
[Figure: pairwise similarity matrix; rows and columns are labelled by program and variant, e.g. InfCv/artifice, Sqrt/artifice, Square/artifice]
14 ncd-bzlib with a similarity threshold applied
[Figure: the same matrix after thresholding]
15 ncd-bzlib with a similarity threshold applied
[Figure: the same matrix after thresholding]
16 1. Best threshold (T)
- Find the best threshold (T) of each tool with a specific parameter setting
- Calculate the sum of false positives and false negatives (FP + FN) for all thresholds
- Choose the T with the minimum number of false results
$T_{best} = \arg\min_{T} (FP_T + FN_T)$
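The threshold search described on this slide can be sketched as a scan over candidate thresholds. This is a minimal illustration, assuming integer thresholds from 0 to 100, a pair being reported as similar when its score is at or above the threshold, and ties resolved in favour of the lowest threshold; the slides do not state these details, and the function names are hypothetical.

```python
def false_results(scores, labels, threshold):
    """Return (FP, FN) when pairs with score >= threshold are reported similar.

    `labels[i]` is True when pair i is truly similar (ground truth).
    """
    fp = fn = 0
    for score, is_similar in zip(scores, labels):
        predicted = score >= threshold
        if predicted and not is_similar:
            fp += 1
        elif not predicted and is_similar:
            fn += 1
    return fp, fn


def best_threshold(scores, labels, thresholds=range(0, 101)):
    """Return (T, FP + FN) for the threshold with the fewest false results."""
    best_t, best_err = None, None
    for t in thresholds:
        fp, fn = false_results(scores, labels, t)
        if best_err is None or fp + fn < best_err:
            best_t, best_err = t, fp + fn
    return best_t, best_err
```

Running the scan once per tool and parameter setting gives the per-setting minimum that the comparison between tools is based on.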
17 Threshold selection
Best threshold = 31 (FP+FN = 166)
[Table columns: Threshold, TP, FP, TN, FN, FP+FN, Precision, Recall, F-measure (F1)]
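The last three columns of the threshold table follow from the confusion counts by the usual definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 as their harmonic mean. A small sketch, with the convention (an assumption here) that an undefined ratio is reported as 0.0:

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Standard precision, recall, and F1 from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1
```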
18 ncd-bzlib with a similarity threshold applied
[Figure: pairwise similarity matrix after thresholding; rows and columns are labelled by program and variant, e.g. InfCv/artifice, Sqrt/artifice, Square/artifice]
19 2. Manually-inspected results
- Pairs that are closest to the threshold are the most likely to be false positives or false negatives
- We have a fixed cost for doing manual inspection
- Remove the top 50, 100, and 200 pairs closest to the threshold for manual inspection
- Evaluate the new results after removal
distance from T = classifier(x, y) - T
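Selecting the pairs for manual inspection can be sketched as a sort by distance from the threshold. The slide gives the distance as classifier(x, y) - T; the sketch assumes its absolute value is what ranks pairs as "closest", and the function name and the `(pair_id, score)` layout are hypothetical.

```python
def split_for_inspection(pairs, threshold, k):
    """Split scored pairs into the k closest to the threshold and the rest.

    `pairs` is a list of (pair_id, score) tuples. The closest k pairs are
    set aside for manual inspection; the remainder is evaluated automatically.
    """
    ranked = sorted(pairs, key=lambda p: abs(p[1] - threshold))
    return ranked[:k], ranked[k:]
```

Calling it with k = 50, 100, and 200 and re-counting FP + FN on the second list reproduces the "remove 50/100/200" scenarios in outline.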
20 RQ1: How do current detection tools perform against code obfuscation?
21 RQ2: What are the best parameter settings and similarity threshold of each tool?
[Table: best settings, threshold T, and FP+FN per tool for four scenarios: no removal, remove 50, remove 100, remove 200. Tools: 7zncd-BZip2, ncd-bzlib, ccfx, jplag-java, jplag-text, simjava, py-difflib, py-sklearn.cosine_similarity. Settings that survive in the transcript: 7zncd-BZip2 uses m0=2, mx={1,3,5} in all four scenarios; ccfx uses b=20, t={1..7} with no removal and several b/t combinations (b=18 to b=24, t={1..7}) after removal; py-difflib uses SM_autojunk with no removal, remove 50, and remove 100, and SM_whitespace_autojunk with remove 200.]
22 Test Case 2
- Observation: compiling/decompiling canonicalises the data
- RQ3: How do compilation and decompilation facilitate the detection process?
- Experiment: compiled/decompiled version of the 1st dataset
- Two different decompilers: Krakatau vs Procyon
- Repeat the detection steps of Test Case 1 and compare the tool performances
23 Compiling/Decompiling Process
[Figure: 5 sets (e.g. Hanoi), 10 files per set (original and obfuscated) → compile (javac) → decompile (Krakatau, Procyon) → detection tools (ccfx*, jplag*, sim, py-difflib) → similarity report]
* Some tools have different parameter settings which can affect the results
24 RQ3: How do compilation and decompilation facilitate the detection process?
25 [Figure: Original vs ARTIFICE (source-code obfuscated version)]
26 [Figure: Original (decompiled) vs ARTIFICE (decompiled)]
27 Test Case 3
- RQ4: Can we apply the best parameters and threshold to other datasets effectively?
- Experiment: Munich dataset containing simions, independently developed Java programs for address validation [1]
[1] Juergens, E., Deissenboeck, F., & Hummel, B. (2011). Code similarities beyond copy & paste. Proceedings of the European Conference on Software Maintenance and Reengineering, CSMR, 78-87.
28 RQ4: Can we apply the best parameters and threshold to other datasets effectively?
Columns: Tools; Settings; Test Case 1 (2,500): T, FP+FN; Test Case 3 (Munich) (11,881): T, FP+FN
- ccfx: b=20, t={1..7}; ,797
- simjava: r=; ,680
- jplag-text: t=; ,770
- py-difflib: SM_autojunk; ,446
- 7zncd-BZip2: m0=2, mx={1,3,5}; ,432
- ncd-bzlib: ,754
- jplag-java: t=; ,162
- py-sklearn.cosine_similarity: ,282
29 Summary
- Current tools behave differently on obfuscated code
- Clone and plagiarism detectors outperform the others
- The best parameter settings and threshold can be found using the proposed method
- Compiling/decompiling can help canonicalise the obfuscated code
- The derived settings and threshold cannot be applied directly to other data sets
30 What's next?
- Replicate the experiment on other data sets and compare the best parameter settings and thresholds
- SOCO (detection of SOurce COde re-use): the Java collection contains 259 source codes; the C collection contains 79 source codes
- Find a better way to learn the best parameter settings and threshold in an ad-hoc manner for each data set (or pair)
31 Questions?
More informationComputing Science 114 Solutions to Midterm Examination Tuesday October 19, In Questions 1 20, Circle EXACTLY ONE choice as the best answer
Computing Science 114 Solutions to Midterm Examination Tuesday October 19, 2004 INSTRUCTOR: I E LEONARD TIME: 50 MINUTES In Questions 1 20, Circle EXACTLY ONE choice as the best answer 1 [2 pts] What company
More informationDetecting malware even when it is encrypted
Detecting malware even when it is encrypted Machine Learning for network HTTPS analysis František Střasák strasfra@fel.cvut.cz @FrenkyStrasak Sebastian Garcia sebastian.garcia@agents.fel.cvut.cz @eldracote
More informationPage 1. Reading assignment: Reviews and Inspections. Foundations for SE Analysis. Ideally want general models. Formal models
Reading assignment: Reviews and Inspections Foundations for SE Analysis M. E. Fagan, "Design and code inspections to reduce error in program development, IBM Systems Journal, 38 (2&3), 999, pp. 258-28.
More informationWeka ( )
Weka ( http://www.cs.waikato.ac.nz/ml/weka/ ) The phases in which classifier s design can be divided are reflected in WEKA s Explorer structure: Data pre-processing (filtering) and representation Supervised
More informationCode Similarities Beyond Copy & Paste
Code Similarities Beyond Copy & Paste Elmar Juergens, Florian Deissenboeck and Benjamin Hummel Institut für Informatik, Technische Universität München, Germany {juergens,deissenb,hummelb@in.tum.de Abstract
More informationIntroduction to OCaml
Fall 2018 Introduction to OCaml Yu Zhang Course web site: http://staff.ustc.edu.cn/~yuzhang/tpl References Learn X in Y Minutes Ocaml Real World OCaml Cornell CS 3110 Spring 2018 Data Structures and Functional
More informationThe goal of this project is to enhance the identification of code duplication which can result in high cost reductions for a minimal price.
Code Duplication New Proposal Dolores Zage, Wayne Zage Ball State University June 1, 2017 July 31, 2018 Long Term Goals The goal of this project is to enhance the identification of code duplication which
More informationRECENTLY, web is playing an important role in people s
IEEE TRANSACTIONS ON SERVICES COMPUTING, VOL. X. NO. X, 2014 1 Automatic Reuse of User Inputs to Services among End-users in Service Composition Shaohua Wang, Member, IEEE, Ying Zou, Member, IEEE, Iman
More informationUsing Real-valued Meta Classifiers to Integrate and Contextualize Binding Site Predictions
Using Real-valued Meta Classifiers to Integrate and Contextualize Binding Site Predictions Offer Sharabi, Yi Sun, Mark Robinson, Rod Adams, Rene te Boekhorst, Alistair G. Rust, Neil Davey University of
More informationIntroduction to Dynamic Analysis
Introduction to Dynamic Analysis Reading assignment Gary T. Leavens, Yoonsik Cheon, "Design by Contract with JML," draft paper, http://www.eecs.ucf.edu/~leavens/jml//jmldbc.pdf G. Kudrjavets, N. Nagappan,
More informationExpectation Maximization!
Expectation Maximization! adapted from: Doug Downey and Bryan Pardo, Northwestern University and http://www.stanford.edu/class/cs276/handouts/lecture17-clustering.ppt Steps in Clustering Select Features
More information