Performance Evaluation and Comparative Analysis of Code-Clone-Detection Techniques and Tools

Harpreet Kaur 1* (Assistant Professor) and Raman Maini 2 (Professor)
Computer Engineering Department, Punjabi University, Patiala

Abstract

Since code cloning is a recent area of research in software engineering, it is crucial to have a good understanding of all code-clone-detection techniques. Clones in software increase maintenance cost and lead to poor software quality. This paper combines two issues: a literature review of code-clone-detection techniques, and experimental work evaluating techniques chosen from the literature. The paper first lists the various studies and then evaluates the performance of three chosen techniques (text-based, token-based and tree-based) by means of automated tools. The Netbeans-Javadoc, JBoss and Java-Quizz source codes have been examined to validate the results. The analysis shows that the token-based approach reports more false positives than the other techniques. Text-based and token-based approaches have higher precision than the tree-based approach, but the tree-based approach has higher recall. Token-based, tree-based and metric-based approaches are useful in combination with refactoring tools. In terms of speed, the text-based approach is suited to small projects, whereas the token-based technique also scales to large projects. Tree-based and token-based techniques work effectively to detect near-miss clones and give safer and sounder results. The DuDe, CCFinder, SolidSDD and CloneDR tools have been used for validation. The experiments show that DuDe is suitable for small projects, whereas CCFinder scales from small to large projects. CCFinder reports false positives because of its token-based approach, whereas CloneDR produces fewer false positives than CCFinder.
The aim of the paper is to identify the strengths and weaknesses of these techniques, which will help in selecting a clone detection technique for a particular purpose.

Keywords: Code Clone, Software Maintenance, Code Fragment, Clone Class

ISSN: IJSEIA, Copyright © 2017 SERSC

1. Introduction

Code cloning is a well-known problem in software engineering that leads to poor software quality. Code fragments are copied because: 1) copying code is simpler and faster than writing it from scratch, and 2) producing more source code earns programmers better incentives in industry [6]. Techniques and tools for detecting duplicate code are a major concern in software maintenance research. Some definitions related to code cloning are discussed below.

Code Fragment: A code fragment (CF) is any sequence of code lines (with or without comments). It can be of any granularity, e.g., a function definition, a begin-end block, or a sequence of statements [9].

Code Clone: A code clone is a code portion in source files that is similar or identical to another code portion [7]. A code portion CP1 is a clone of another code portion CP2 if they are similar according to some relation, i.e., iff f(CP1) = f(CP2), where f is any similarity function [9].

Clone Pair: A pair of code portions/fragments is called a clone pair if a clone relation exists between them.

Types of Clones

Type-1: Identical code fragments except for variations in whitespace, layout and comments. Type-1 clones are exact copies of each other that differ only in whitespace and comments, which is why they are widely known as exact clones.

Type-2: Syntactically identical fragments except for variations in identifiers, literals, types, whitespace, layout and comments. Two code fragments form a Type-2 clone when they are similar except for variations in the names of declared identifiers (variables, constants, classes, methods and so on), types, layout and comments.

Type-3: Copied fragments with further modifications, such as added, removed or modified statements, in addition to variations in identifiers, literals, types, whitespace, layout and comments.

Type-4: Two or more code fragments that perform the same computation but are implemented by different syntactic variants. Semantically similar code fragments result in Type-4 clones. For this type, a fragment need not have been copied from anywhere: two fragments forming a Type-4 clone may have been written by different programmers to implement the same functionality.

2. Literature Survey of Clone Detection Techniques

Clone detection attempts to find duplicate code within the whole software, whether it was copied exactly or modified afterwards. Several techniques are available to detect duplicate code.

A) Token-Based Clone Detection Technique: The token-based process described by Kamiya et al. [8] is shown in Figure 1. It consists of four steps:

1. Lexical Analysis: Each line of the source files is divided into tokens according to the lexical rules of the respective programming language.
The tokens generated from all source files are concatenated into a single token sequence, which makes the subsequent analysis easier. Whitespace, comments and tabs are removed from the source code during preprocessing.
2. Transformation: Identifiers are replaced with special tokens according to transformation rules, and the replaced information is kept so that the results can later be mapped back to the original text.
3. Match Detection: On the transformed token sequence, matching subsequences are found efficiently with a similarity-detection algorithm (a token suffix tree), and the matching subsequences are reported as clone pairs.
4. Formatting: Each location of a clone pair is converted into line numbers in the original source files.

The clone detection tool Dup [22] represents source code as a sequence of lines and detects clones line by line. It 1) replaces the identifiers in the source code with a special identifier, and 2) extracts matches with a suffix-tree algorithm [10] of O(n) time complexity, where n is the number of lines in the input. The line-by-line method is weak when the line structure has been modified [23]. Token suffix trees scale very well in time and space because of their linear complexity. Studies [9], [11] have shown that token-based clone detection approaches suffer from many false positives: the technique has high recall but low precision.

Ueda et al. [21] developed Gemini, a maintenance support environment that visualizes the clones found by CCFinder. Gemini provides a GUI with a scatter plot and a metrics graph of code clones. The scatter plot graphically shows the areas of code clones among the source files, and the metrics graph shows the metric values of each clone. Using Gemini, one can identify the code clones that deserve attention in the maintenance phase.

B) Text-Based Clone Detection Technique: In this approach, the entire source code is treated as a sequence of strings. Lines are compared with each other using string-matching algorithms, and similar strings are reported as clones. Because the method is purely textual, raw source code is used for detection and no transformation is applied to it, although whitespace and comments may sometimes need to be removed. Ducasse et al. [6] proposed an approach in which the source code is first transformed into an internal format by removing comments and whitespace; this is called the effective file. Comparison algorithms are then applied to this effective file.
In this approach, one line of source code is taken as a code fragment. For example, the C line "if( code & pcobjtype ){ /* print type */" is condensed to "if(code&pcobjtype){" by removing whitespace and the comment. Every source line is then compared with every other source line using string-matching techniques: an exact match returns Boolean true, otherwise false. Each result is stored in a matrix whose coordinates are the positions of the two compared lines in their respective ordered collections, and the result is displayed as a dot plot.

S. Lee et al. [25] developed SDD (Similar Data Detection), an algorithm that finds exact clones and similar parts of software. SDD keeps its complexity under control by using an inverted index and an index. The authors report that SDD gives better results than PMD. SDD also detects modified clones using an N-neighbor distance concept, and it is language independent.
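The line-by-line comparison and dot plot described above can be sketched in a few lines of Python (the language used for all examples added here). This is a minimal illustration, not Ducasse et al.'s implementation: it assumes that only comments closed on the same line and whitespace are removed to build the effective lines, and the names normalize_line, dot_plot and diagonal_runs are hypothetical. Diagonal runs of true cells in the matrix correspond to sequences of consecutively copied lines.

```python
import re


def normalize_line(line: str) -> str:
    """Build one line of the 'effective file': drop comments, then all whitespace."""
    line = re.sub(r"/\*.*?\*/", "", line)  # /* ... */ comments closed on the same line
    line = re.sub(r"//.*", "", line)       # // comments up to the end of the line
    return re.sub(r"\s+", "", line)


def dot_plot(lines_a, lines_b):
    """matrix[i][j] is True when line i of A equals line j of B after normalization."""
    norm_a = [normalize_line(l) for l in lines_a]
    norm_b = [normalize_line(l) for l in lines_b]
    # Empty effective lines (blank or comment-only) never count as matches.
    return [[a != "" and a == b for b in norm_b] for a in norm_a]


def diagonal_runs(matrix, min_len=2):
    """Return (start_a, start_b, length) for every diagonal run of matches >= min_len."""
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    runs = []
    for i in range(rows):
        for j in range(cols):
            # Only start a run where the previous diagonal cell is not a match.
            continues_run = i > 0 and j > 0 and matrix[i - 1][j - 1]
            if matrix[i][j] and not continues_run:
                k = 0
                while i + k < rows and j + k < cols and matrix[i + k][j + k]:
                    k += 1
                if k >= min_len:
                    runs.append((i, j, k))
    return runs


if __name__ == "__main__":
    file_a = ["int x = 0;", "x++;", "print(x);"]
    file_b = ["// copied", "int x=0;", "x ++ ;", "print( x );"]
    print(diagonal_runs(dot_plot(file_a, file_b)))  # [(0, 1, 3)]
```

Because every line is compared with every other line, the matrix grows with the product of the two file lengths; this quadratic cost is one reason the later tools index lines or tokens instead.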

J. R. Cordy et al. [26] discussed a lightweight text-based approach to detecting near-miss clones. They apply pretty-printing and code normalization: code lines are broken into parts, and clones are extracted by comparing the broken-up text after normalization. The UPI (Unique Percentage of Items) is calculated, and unique-line gaps are detected on that basis. The technique is implemented in the tool NICAD, which is parser-based and language-specific but reasonably lightweight because it uses simple line matching. The case studies covered are Abyss [2] of 1500 lines and Weltab [3]; these two were taken as test beds because results for them had already been published.

C) Metric-Based Clone Detection Technique: In this approach, different software metrics are gathered for the code, and clones are detected on the basis of similar metric values. First, a set of software metrics is computed for syntactic units such as functions, classes or even statements; then the metric values are compared. If two syntactic units have the same metric values, they can be regarded as a clone pair. Mayrand et al. [4] used various metrics to detect clones: functions with similar metric values are returned as clone pairs, with the metrics calculated from the names, layout, expressions and control flow of the functions.

D) Abstract Syntax Tree (AST) Based: In this approach, an abstract syntax tree of the program is produced using a parser for the language, and a tree-matching technique is applied to the AST to detect similar subtrees. When two subtrees match, their source code is returned as a clone pair. Baxter et al. [5] use a hash function to partition the subtrees of a program's abstract syntax tree; subtrees in the same partition are then compared using a tree-matching technique.
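Baxter et al.'s hash-then-compare idea can be illustrated with Python's standard ast module. The sketch assumes Python 3.8+ and Python source as input (CloneDR itself is not reproduced here); the class Normalizer, the function clone_buckets and the min_nodes threshold are hypothetical names chosen for illustration. Identifiers and literals are replaced by placeholders so that renamed copies produce identical subtrees, and statement subtrees are bucketed by their normalized form. Unlike Baxter's method, which compares subtrees within a bucket using a similarity measure, this sketch simply treats identical normalized subtrees as clones.

```python
import ast
from collections import defaultdict


class Normalizer(ast.NodeTransformer):
    """Replace identifiers and literals with placeholders so renamed copies look alike."""

    def visit_Name(self, node):
        return ast.copy_location(ast.Name(id="_ID", ctx=node.ctx), node)

    def visit_arg(self, node):
        self.generic_visit(node)
        node.arg = "_ARG"
        return node

    def visit_Constant(self, node):
        return ast.copy_location(ast.Constant(value=0), node)

    def visit_FunctionDef(self, node):
        self.generic_visit(node)
        node.name = "_FN"
        return node


def clone_buckets(source: str, min_nodes: int = 5):
    """Group statement subtrees whose normalized form is identical.

    The dict keyed by the normalized dump plays the role of the hash partition.
    Each returned list holds the start lines of one clone class.
    """
    tree = Normalizer().visit(ast.parse(source))
    buckets = defaultdict(list)
    for node in ast.walk(tree):
        if isinstance(node, ast.stmt):
            size = sum(1 for _ in ast.walk(node))
            if size >= min_nodes:  # skip tiny subtrees such as "return x"
                buckets[ast.dump(node)].append(node.lineno)
    return [sorted(lines) for lines in buckets.values() if len(lines) > 1]


EXAMPLE = '''
def total(items):
    s = 0
    for it in items:
        s = s + it
    return s

def summe(values):
    acc = 0
    for v in values:
        acc = acc + v
    return acc
'''

if __name__ == "__main__":
    for lines in clone_buckets(EXAMPLE):
        print("clone class starting at lines", lines)
```

Both functions differ only in names, so the whole function bodies, the loops and the inner assignments each fall into a shared bucket. Nested clones are reported separately, which is a known source of redundant output in subtree-based detectors.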
A comparable strategy was also proposed by Yang [2], using dynamic programming to find differences between different versions of the same file.

Jiang et al. [24]: The approach presented by Jiang et al. has been implemented in the tool Deckard, which is platform independent. Characteristic vectors of the AST are computed in a Euclidean space and then merged to compute the similarity among subtrees. Locality-Sensitive Hashing (LSH) is used to cluster similar vectors: it hashes two similar vectors to the same hash value with arbitrarily high probability and two distant vectors with arbitrarily low probability, and clones are found in this way. The case studies used to evaluate Deckard are the JDK and the Linux kernel, as shown in Table 1.

Table 1. Case Studies under Deckard Tool
Case study | Files and LOC covered | Files not parsed | Compared with | Parameters for comparison
JDK | 8532 Java files, …,418,767 LOC | Only 2 files not parsed by Deckard; 81 files not parsed in the CloneDR comparison | CloneDR (CloneDR fails to work on the whole JDK at once, so 9 groups of 1000 files each were made) | Characteristic vectors; similarity/distance metric
Linux kernel | 7,988 C files, 5,287,090 LOC | - | CP-Miner (evaluated on the Linux kernel) | A gap is used to find clones
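Deckard's idea of summarizing a subtree as a characteristic vector and comparing vectors by Euclidean distance can be sketched as follows. This is an approximation under stated assumptions: it runs on Python source via the standard ast module, uses counts of AST node types as the vector, and compares every pair of functions directly, whereas Deckard clusters the vectors with locality-sensitive hashing. The functions characteristic_vector, euclidean and similar_functions and the max_distance threshold are hypothetical.

```python
import ast
import math
from collections import Counter
from itertools import combinations


def characteristic_vector(node: ast.AST) -> Counter:
    """Count the AST node types in the subtree rooted at node.

    Identifier names and literal values are not counted, so renaming does not
    change the vector.
    """
    return Counter(type(n).__name__ for n in ast.walk(node))


def euclidean(u: Counter, v: Counter) -> float:
    """Euclidean distance between two sparse count vectors (missing keys count as 0)."""
    return math.sqrt(sum((u[k] - v[k]) ** 2 for k in set(u) | set(v)))


def similar_functions(source: str, max_distance: float):
    """Pairs of function names whose characteristic vectors lie within max_distance.

    Every pair is compared here; LSH would avoid this quadratic step on large inputs.
    """
    funcs = [n for n in ast.walk(ast.parse(source)) if isinstance(n, ast.FunctionDef)]
    vectors = [(f.name, characteristic_vector(f)) for f in funcs]
    return [
        (name_a, name_b)
        for (name_a, va), (name_b, vb) in combinations(vectors, 2)
        if euclidean(va, vb) <= max_distance
    ]


EXAMPLE = '''
def total(items):
    s = 0
    for it in items:
        s = s + it
    return s

def total_logged(items):
    s = 0
    for it in items:
        s = s + it
    print(s)
    return s

def greet(name):
    return "hi " + name
'''

if __name__ == "__main__":
    print(similar_functions(EXAMPLE, max_distance=4.0))  # [('total', 'total_logged')]
```

The extra print(s) statement moves total_logged a distance of about 3.2 away from total, so it is reported as a near-miss (Type-3) clone at a threshold of 4.0 but not at 2.0; the threshold therefore trades recall against precision.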

E) Program Dependency Graph (PDG) Based: A program dependency graph shows the control-flow and data-flow dependencies of a program, and isomorphic subgraphs in the PDG are reported as clone pairs. Because a PDG carries semantic information, this method can detect Type-3 and Type-4 clones. Liu [10] implemented a PDG-based plagiarism detection algorithm in the tool GPlag.

Related Comparison Studies: Rysselberghe et al. [13] compared three techniques: simple line matching, parameterized matching and metric fingerprints. The research process used in the experiment is based on the Goal-Question-Metric paradigm, with questions such as: What kinds of matches are found? How accurate are the results? How useful is the information gained? They concluded that simple line matching is fully language independent, whereas all the other techniques need some kind of configuration. The metric-fingerprint technique finds duplicated function blocks, whereas the other techniques find general duplication. Simple line matching finds no false matches, parameterized matching finds a few, and metric fingerprints find even more, because the characterization of expressions lacks accuracy. False matches for metric fingerprints therefore depend on how expressions are characterized and on the length of the code fragments processed, while the number of recognizable matches is high for this technique.

F. Zibran et al. [27] discussed a focused approach that searches for clones of a selected code segment, known as the seed segment, instead of detecting all clones in the entire code base. Available techniques are limited in finding Type-3 clones and are mainly implemented as stand-alone tools, which leaves clone-aware development unsupported. The seed fragment is compared with the search space (the whole source code) to find its Type-3 clones.
A fingerprinting technique is used to generate fingerprints for the unique lines, and a syntax tree is then generated for the whole fingerprint sequence. A suffix tree is generated for the generalized fingerprint sequence using Ukkonen's online algorithm, and Eclipse's JDT API is used to generate ASTs (Abstract Syntax Trees).

Fabio Calefato et al. [28] described a semi-automated approach to finding clones in the scripting code of web applications. The approach helps select function clones and inspect the selected script functions, and it is both effective and efficient at identifying function clones in web applications.

Muhammad Asaduzzaman [29] pointed out that, although many clone detection tools are available, handling the raw clone data remains a challenge because it is textual and large in volume. To address this, VisCad was proposed as a framework for large-scale code clone analysis that also acts as a maintenance support environment. VisCad offers various visualization techniques, a number of metrics and data-filtering options, so that users can analyze and identify distinctive code clones.

3. Research Methodology

The methodology used to carry out the survey is as follows.
Step 1: Information is collected from primary research as well as from empirical observations; research papers covering different techniques and search criteria have been studied.
Step 2: A set of tools is chosen that were developed or used to implement and test different code clone detection techniques.
Step 3: Different questions are formulated to cover the issues related to detection techniques.

Step 4: The case studies most used in research papers are also listed, since they help in analyzing clone detection techniques and may serve as benchmarks.

The information flow for gathering research data is shown in Figure 1.

Figure 1. Flow of Gathering Information

Based on the literature survey, various tools and techniques to detect clones are summarized in Table 2 and Table 3.

Table 2. Overview of Clone Detection Techniques
Text based: dependent on layout; very few chances of false positives; the line-by-line method does not detect line breaks.
Token based: no parsing needed; independent of layout; adaptable to new languages [24]; high recall; more chances of false positives than the other techniques; compatible with refactoring techniques (helpful in refactoring).
Tree based: parsing is performed, which makes the technique complex; high precision; low rate of false positives; finds syntactic clones.
Metric based: coarse-grained abstractions of a piece of code, useful for evaluation at the level of functions rather than individual statements; sometimes finds clones that are not syntactic; compatible with refactoring techniques (helpful in refactoring).

Token-based approaches work on parameterized matching and are robust against rename operations, so they work best in combination with fine-grained refactoring tools that operate at the level of statements (e.g., Extract Method). Metric fingerprints (representative of the parse-tree-based techniques) are good at revealing duplicated subroutines regardless of small differences, so they work best in combination with refactoring tools that operate at the method level (e.g., Remove Method and Pull Up Method) [13].

From the study of code clone detection techniques in the literature, it has been observed that detecting duplicate code manually is impossible for large software. The following points are of some concern:

1. Because various techniques are available for finding clones, the question arises: which technique should be followed for clone detection? This question motivates our experiment comparing different duplication-detection techniques on source code.
2. Each technique detects a different number of clones in the same software. Some clones missed by one technique may be detected by another, and some detected clones may not be of real concern.
3. Which technique is best suited to improving a system's design with minimal effort?

Table 3. Comparison of Tools
Technique | Internal representation of source code | Tool | Availability (free/paid) | Clone relation | Clone types
Text based | Lines | DuDe [30] | Free | CP | Type-1
Text based | Lines | Duploc [31] | - | CP | -
Text based | Lines | SDD [25] | Free | - | Type-1
Text based | Lines | NICAD [26] | Free | CP | Type-1, Type-2 and Type-3
Token based | Tokens | Dup [22] | Free | CP | Type-1 and Type-2
Token based | Tokens | CCFinder [8] | Free | CP | Type-1 and Type-2
Token based | Tokens | CP-Miner [32] | Free | - | -
Token based | Tokens | SolidSDD [36] | Free/paid (both) | CP | Type-1, Type-2 and Type-3
Tree based | Nodes in AST (abstract syntax tree) | Clone Detective [33] | - | CP | -
Tree based | Nodes in AST | CloneDR [5] | Free (evaluation version)/paid (full version) | CP | Type-1, Type-2 and Type-3
Tree based | Nodes in AST | cpdetector [11] | Trial version available | CP | Type-1 and Type-2
Tree based | Nodes in AST | Deckard [24] | Trial version available | CP | Type-1, Type-2 and Type-3
Metric based | Functions and methods | Mayrand et al. [1] | - | Function blocks or methods | Type-1, Type-2 and Type-3

Matching algorithms: suffix-tree algorithm (Baker and CCFinder); DPM, Dynamic Pattern Matching (Duploc and Kontogiannis); hash-value comparison (Baxter, Mayrand's, and SMC); characteristic vectors of the AST computed in a Euclidean space (Deckard); use of an inverted index and an index (SDD).

Mostly used case studies: ScoreMaster, TextEdit [20], Brahms [1], JMocha, JavaParser of JMetric [10], ANTLR (version 2.7.1) [1], NetBeans Javadoc, JBoss.

Considering all the above issues, three techniques (text-based, token-based and tree-based) were chosen for comparison by means of three case studies, Netbeans-Javadoc, Java-Quizz and JBoss SP1-src (all in Java), using automated tools; the case studies and tools are listed in Table 4 and Table 5 respectively.

Table 4. Case Studies
Case | Language | Total number of files processed
Netbeans Javadoc | Java | 21
Java Quizz | Java | 10
JBoss SP1-src | Java | 4951

Table 5. Tools and Background Processing Technique
Tool | Working technique | Reference
DuDe | Text-based approach | [17]
SolidSDD | Token-based approach | [19]
CCFinder | Token-based approach | [18]
CloneDR | Tree-based approach | [20]

DuDe is a language-independent code clone detector that works on a text-based approach; clone detection is performed on duplication chains. The tool is written in Java and runs on every major platform. Although DuDe is text-based, it combines small duplicated fragments into larger ones by allowing gaps in its scatter-plot representation.

CCFinder converts source code into tokens, which are then transformed into special tokens using the lexical rules of the respective programming language. CCFinder detects clone portions that have different syntax but similar meaning; another purpose is to filter out code portions with specified structural patterns. The token sequence makes it possible to detect clones with different line structures, which a line-by-line algorithm cannot detect.

SolidSDD (the Duplicate Code Detector) is a tool for finding and analyzing duplicated code (i.e., code clones), for example clones introduced during development by copy-and-paste operations. SolidSDD supports C, C++, C# and Java. It provides a graphical interface to assess code-duplication characteristics and can locate clone positions in the software stack; this view helps developers, architects and software managers with refactoring.

In CloneDR, an annotated parse tree (AST) is generated, and subtrees are then compared by metrics based on a hash function. The source code of similar subtrees is returned as clones. The hash function enables parameterized matching and the detection of gapped clones, particularly when the gaps are inside a line. The trial version analyzes the whole project but reports only 10 sample clones of medium size (at most 50 lines).
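The token-based pipeline that CCFinder follows (lexical analysis, transformation, match detection) can be approximated as below. This is a sketch, not CCFinder: it assumes Java-like input and a small hand-picked keyword set, maps every identifier to the placeholder $p and every literal to $lit, and replaces the suffix-tree match detection with a hash map of fixed-length token windows that are extended to maximal runs. The names tokenize, find_clones and min_tokens are hypothetical; min_tokens plays the role of the minimum clone length discussed in Section 4.1.

```python
import re
from collections import defaultdict

# A small, assumed subset of Java keywords; a real lexer would use the full list.
KEYWORDS = {"int", "for", "while", "if", "else", "return", "void", "new", "class"}

TOKEN_RE = re.compile(
    r"(?P<skip>\s+|//[^\n]*|/\*.*?\*/)"            # whitespace and comments are dropped
    r'|(?P<lit>"(?:\\.|[^"\\])*"|\d+(?:\.\d+)?)'  # string and numeric literals
    r"|(?P<word>[A-Za-z_]\w*)"                     # keywords and identifiers
    r"|(?P<op>\S)",                                # any other single character
    re.S,
)


def tokenize(code: str) -> list:
    """Lexical analysis plus transformation: identifiers become $p, literals $lit."""
    tokens = []
    for m in TOKEN_RE.finditer(code):
        kind, text = m.lastgroup, m.group()
        if kind == "skip":
            continue
        if kind == "lit":
            tokens.append("$lit")
        elif kind == "word" and text not in KEYWORDS:
            tokens.append("$p")
        else:
            tokens.append(text)
    return tokens


def find_clones(tokens: list, min_tokens: int = 8) -> list:
    """Return maximal matching token runs (start_a, start_b, length), length >= min_tokens."""
    index = defaultdict(list)
    for i in range(len(tokens) - min_tokens + 1):
        index[tuple(tokens[i:i + min_tokens])].append(i)
    clones = []
    for starts in index.values():
        for x in range(len(starts)):
            for y in range(x + 1, len(starts)):
                i, j = starts[x], starts[y]
                # If the preceding tokens also match, this pair lies inside a longer
                # run that is reported from its own starting pair.
                if i > 0 and tokens[i - 1] == tokens[j - 1]:
                    continue
                length = 0
                while j + length < len(tokens) and tokens[i + length] == tokens[j + length]:
                    length += 1
                clones.append((i, j, length))
    return sorted(clones)


EXAMPLE = """
int sum(int[] a) { int s = 0; for (int i = 0; i < a.length; i++) { s += a[i]; } return s; }
int total(int[] xs) { int t = 0; for (int k = 0; k < xs.length; k++) { t += xs[k]; } return t; }
"""

if __name__ == "__main__":
    print(find_clones(tokenize(EXAMPLE)))  # [(0, 45, 45)]
```

The two methods differ only in identifier names, so after transformation they are identical 45-token sequences and the whole method is reported as one Type-2 clone pair. The same normalization is also why token-based tools report uninteresting matches: any two fragments with the same shape collapse to the same sequence, whatever their names mean.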
4. Evaluation Criteria

Different techniques are suitable under different conditions. The qualitative and quantitative parameters for evaluating techniques outlined in the literature are:
Qualitative parameters: suitability, confidence, relevance, focus, number of clones reported.
Quantitative parameters: false positives, kind of matches detected, precision, recall.

The techniques and tools have been compared using the following qualitative criteria:
Criterion 1. Number of clones: Which technique finds more clones in each file of the given project?

Criterion 2. Suitability: Which technique is suitable for detecting Type-1, Type-2, Type-3 and Type-4 clones? Which technique detects clones that are suitable for refactoring?
Criterion 3. Relevance: Does a technique prioritize the matches it finds for refactoring purposes? For example, a code segment that has been copied again and again is more relevant for refactoring, because its removal has a direct impact on the code; likewise, clones in the same class are easier to modify than clones in different classes [13].
Criterion 4. Confidence: Which code-clone-detection tool gives reliable results, and when is a line-by-line manual inspection necessary [13]?
Criterion 5. Focus: Does one have to concentrate on a single class, or is it also possible to assess an entire project [13]?
Criterion 6. Scalability to speed: Which technique adapts its speed to the size of the project (from small to large projects)?

4.1 Various Issues

Some of the issues, or research questions, concerning clone detection techniques are:

Q: What kinds of matches are found? An overall report of the amount of duplication in all program files, and the programming constructs that can be restructured using a particular tool.

Q: How accurate are the results?
- Which technique produces more false positives, i.e., pieces of code incorrectly identified as duplicated?
- The number of useless matches, i.e., matches that are not relevant for refactoring.
- The number of recognizable matches, i.e., matches that are interesting for refactoring.

Precision: the percentage of correct clones among all clones detected by the technique.
Precision = relevant clones detected / total detected candidates

Recall: the number of reference clone groups detected by the technique relative to all reference clone groups.
Recall = relevant clones detected / total relevant clones in the database

Q: How much execution time does a detection technique consume?
Clone detection techniques take time to process the input source code, for applications ranging from small to large.

Q: Do the clones detected by each technique help to derive information related to design and maintenance? That is, do the detected clones provide information about structural clones, reusable components or removal opportunities, which helps in understanding the design of the application in terms of components?
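The precision and recall definitions in Section 4.1 translate directly into code. The sketch assumes each clone is identified by a comparable ID, so that a detected clone either is or is not in the reference set; real benchmarks usually need an overlap criterion to decide whether a detected clone matches a reference clone, which is not modelled here. The function name precision_recall and the clone IDs are hypothetical.

```python
def precision_recall(detected: set, reference: set) -> tuple:
    """Precision and recall of detected clones against a reference set.

    precision = |detected & reference| / |detected|
    recall    = |detected & reference| / |reference|
    """
    true_positives = len(detected & reference)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical clone IDs: c1 and c2 are real clones found by the tool,
    # c3 and c4 are false positives, and c5 is a real clone the tool missed.
    detected = {"c1", "c2", "c3", "c4"}
    reference = {"c1", "c2", "c5"}
    print(precision_recall(detected, reference))  # (0.5, 0.6666666666666666)
```

A tool that reports many candidates (as token-based tools tend to) can raise recall while lowering precision, which is why the two measures are reported together.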

Q: Are the clones detected by the techniques removable? That is, can the detected clones be refactored, and which techniques work well in combination with refactoring tools?

Q: Which technique should be used to find different types of clones? This addresses which technique is useful for finding Type-1, Type-2, Type-3 or Type-4 clones.

Q: What should the minimum length of a clone be? The minimum number of statements (threshold value) that should be considered a clone is a very important factor. If the threshold is very large, the technique will report fewer clones; if it is very small, a large number of clones will be detected, many of which may be spurious. In previous studies the clone length varies (in CCFinder the clone length is 30, whereas in Clone Miner it is less than 30).

4.2 Mostly Used Case Studies in Literature
- ScoreMaster is a Java application created for the Enhydra web server. Since a large portion of its code was generated automatically, it contains a high level of duplication.
- TextEdit is a project distributed with Borland's JBuilder to demonstrate GUI programming in Java. Because of its instructive nature it contains little duplication [20].
- Brahms is music sequencing and notation software for Linux, written in C++ and earlier known as KooBase. The small amount of duplication present is of a different nature because the code was written manually in an open-source context [1].
- JMocha is a Java beans benchmark developed by IBM [11].
- JavaParser of JMetric is, as its name indicates, a Java parser generated automatically for the JMetric project. It is a larger example of automatically generated code full of duplication [10].
- ANTLR (version 2.7.1) [1]. ANTLR (ANother Tool for Language Recognition) is a language tool that provides a framework for constructing recognizers, compilers and translators from grammatical descriptions containing C++ or Java actions. ANTLR comprises 189 files and 42,000 LOC [37].

5. Results and Discussions

The analysis has been performed by comparing the clone detection techniques and the automated tools based on those techniques.

5.1 Comparison of Techniques

To evaluate the performance of code-clone-detection techniques, the open-source code of Netbeans-Javadoc, Java-Quizz and JBoss SP1-src was used. This section reports the results of the techniques using the criteria discussed in Section 4. Table 6 gives the total number of clones detected by each technique, and each criterion is then evaluated.

Table 6. Number of Clones Detected by Three Techniques (Netbeans Javadoc)
Approach | Tool used | Clones reported | Complexity
Token based | CCFinder | 28 clone sets | O(length(longest clone) × |tokens|)
Tree based | CloneDR | 19 clone sets | O(|subtrees of AST|)
Text based | DuDe | 40 clone pairs | O(LOC)

Analysis on the Basis of the Chosen Criteria

A) Number of Clones: The number of reported clones is also important in assessing clone detection techniques and tools. The text-based technique reports the largest number of clones in the whole source code, whereas the token-based technique detects more clones in each individual file, as shown in the graphs below (only one case study is shown here).

Case A: Netbeans Javadoc
Figure 2. No. of Clones in Each File (Netbeans Javadoc) (Text-based)
Figure 3. No. of Clones in Each File (Tree-based) (Netbeans Javadoc)

Figure 4. No. of Clones in Each File (Token-based) (Netbeans Javadoc)

Text-Based Techniques: Because textual approaches apply little or no transformation to the source code during preprocessing, text-based techniques and tools are not good at detecting Type-3 near-miss clones. On the other hand, they are less likely to find uninteresting clones, because an exact textual match is required [14].

B) Suitability: All the techniques find exact clones, but the evaluation showed that the textual and token-based techniques find matching case structures. Such clones are easy to remove by merging the duplicated functions or methods under a single common name. The text-based technique also gives more detail about the renaming of variables in the source code, which makes it easy to locate opportunities for removing duplicate code. Tree-based techniques are more oriented towards finding near-miss (Type-3) clones than text-based and token-based techniques (as shown in Table 7).

Text-based clone detectors rely solely on the textual representation of the source code. Only minor transformations are performed, such as normalizing whitespace, comments and layout, which makes it difficult for the detector to find Type-2, Type-3 and Type-4 clones [16]. The token-based detector parses the whole source code and works on a token sequence as the representation of the code. While this token sequence is created, identifiers, whitespace and layout are normalized, so the detector should be able to find Type-1 and Type-2 clones. Type-3 and Type-4 clones, however, are difficult for it, since they interrupt the token chain [16]. The tree-based detector works on an abstract syntax tree. As in the token-based approach, whitespace, layout and identifiers are normalized, so Type-1 and Type-2 clones should be detectable.
Loop transformations can also be detected easily, since the AST representations of for and while loops are very similar [16].
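The difference between the text-based and token-based views described above can be made concrete by classifying a pair of fragments according to the first representation in which they become equal. This is an illustrative sketch under stated assumptions: Java-like input, a small hand-picked keyword set, and exact equality as the comparison; the names text_view, token_view and exact_clone_type are hypothetical. It shows why Type-3 clones escape both views: an inserted statement breaks the exact sequence.

```python
import re

# A small, assumed subset of Java keywords used to tell keywords from identifiers.
KEYWORDS = {"int", "for", "while", "if", "else", "return"}
COMMENT_RE = re.compile(r"/\*.*?\*/|//[^\n]*", re.S)
TOKEN_RE = re.compile(r'"(?:\\.|[^"\\])*"|\d+(?:\.\d+)?|[A-Za-z_]\w*|\S')


def text_view(code: str) -> str:
    """Text-based view: comments and all whitespace/layout removed."""
    return re.sub(r"\s+", "", COMMENT_RE.sub("", code))


def token_view(code: str) -> list:
    """Token-based view: identifiers and literals replaced by placeholders."""
    out = []
    for tok in TOKEN_RE.findall(COMMENT_RE.sub("", code)):
        if tok[0] == '"' or tok[0].isdigit():
            out.append("$lit")
        elif (tok[0].isalpha() or tok[0] == "_") and tok not in KEYWORDS:
            out.append("$id")
        else:
            out.append(tok)
    return out


def exact_clone_type(a: str, b: str) -> str:
    """Classify a fragment pair by the first view in which the two are equal."""
    if text_view(a) == text_view(b):
        return "Type-1"
    if token_view(a) == token_view(b):
        return "Type-2"
    return "no exact match (possibly Type-3 or Type-4)"


if __name__ == "__main__":
    original = "int s = 0; // init\nfor (int i = 0; i < n; i++) s += v[i];"
    relaid = "int s=0;\nfor (int i=0; i<n; i++)\n    s += v[i];"
    renamed = "int total = 1; for (int k = 0; k < m; k++) total += w[k];"
    extended = "int s = 0; for (int i = 0; i < n; i++) { s += v[i]; log(s); }"
    for other in (relaid, renamed, extended):
        print(exact_clone_type(original, other))
```

The relaid copy differs only in layout and comments (Type-1), the renamed copy also changes identifiers and a literal (Type-2), and the extended copy adds a statement, which neither exact view can match.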

Evaluation Clone Detection Report for project file C:/Documents and Settings/Administrator/Desktop/parameters-final-try.prj using the CloneDR tool

Table 7. Near-miss Clones Reported by Tree-based Technique (Clone Detection Statistics)
Statistic | Value
File count | 21
Total source lines of code (SLOC) | 2852
Total clone sets | 19
Exact-match clone sets | 9
Near-miss clone sets | 10
Number of cloned SLOC | 469
SLOC in clones | 16.3%

C) Relevance: With respect to this criterion, all techniques behave more or less the same. All of them report clones in terms of clone pairs, clone sets, clone length, line numbers and file names, and none gives information about the priority of clones. In the token-based approach, however, unwanted clones can be filtered to some extent using a filter file or filtered clone sets, as in CCFinder. Tree-based clone analysis (Baxter et al., 1998) aims to be more precise than a textual or token-based approach by building the abstract syntax tree. Token generation still depends to some extent on the transformation rules applied to the original source code, whereas the generated tree gives an exact view of the code and its information flow; therefore the tree-based technique gives more accurate results than the other two techniques.

The token-based technique finds some uninteresting code clones, as CCFinder does. CCFinder is a language-dependent code clone detector. The grey parts in Figure 5 represent clones between files A and B, in which the variable and method names differ. Because the CCFinder algorithm transforms user-defined names into the same special token, source code with different variable names (for example, copied code in which some variable names were changed after pasting) is detected as a clone [14]. However, clone length and file name alone are not sufficient for a real evaluation of relevant code clones.
All techniques should provide information such as the name of the class to which a particular clone belongs. This would give the user a better view and help in refactoring.
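The following Python sketch makes the name-normalization step concrete. It maps user-defined names and literals to placeholder tokens before comparison. It is not CCFinder's implementation. The regular-expression lexer, the small keyword set and the names `normalize` and `is_type2_clone` are assumptions made for illustration. A real tool would use a full lexer for the target language and match over the whole token stream rather than comparing two fragments.

```python
import re

# Simplified, hypothetical lexer for Java-like code: string literals,
# identifiers/keywords, numbers, then any other single non-space character.
TOKEN_RE = re.compile(r'"(?:\\.|[^"\\])*"|[A-Za-z_]\w*|\d+(?:\.\d+)?|\S')

# Small subset of Java keywords and literal keywords (an assumption; a real
# tool would use the language's full keyword list).
KEYWORDS = {
    "int", "double", "boolean", "void", "return", "if", "else", "for",
    "while", "new", "public", "private", "static", "class",
    "true", "false", "null",
}


def normalize(source: str) -> list[str]:
    """Tokenize source, replacing user-defined names with '$id' and
    numeric/string literals with '$lit'; keywords and operators are kept."""
    tokens = []
    for tok in TOKEN_RE.findall(source):
        if tok in KEYWORDS:
            tokens.append(tok)
        elif tok.startswith('"') or tok[0].isdigit():
            tokens.append("$lit")
        elif tok[0].isalpha() or tok[0] == "_":
            tokens.append("$id")
        else:
            tokens.append(tok)
    return tokens


def is_type2_clone(a: str, b: str) -> bool:
    """Two fragments match if their normalized token sequences are equal."""
    return normalize(a) == normalize(b)


if __name__ == "__main__":
    print(is_type2_clone("int total = price * qty; return total;",
                         "int sum = cost * count; return sum;"))   # True
    print(is_type2_clone("int sum = cost + count;",
                         "int sum = cost * count;"))               # False
```

Every identifier becomes `$id`, so `int a = b * c;` and `int x = y * z;` also match even when the two statements are unrelated. This is the same effect that produces the uninteresting clones discussed above.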

Figure 5. Example of Uninteresting Clones Detected by CCFinder [14]

D) Confidence
The results obtained show that the textual technique gives good confidence. It detects exact matches and, unlike the token-based technique, does not get confused by language constructs. This technique also reports the number of occurrences (instances) of each clone ID, as shown in Figure 6.

Figure 6. No. of Occurrences of each Clone-set

The textual approach also gives details about the renaming of variables and about fan-out, that is, the number of files across which a code clone is scattered. This helps to estimate the effort required to remove duplicate code. The token-based approach gives lower confidence because it reports more false positives (as is the case with CCFinder in Figure 4). The tree-based approach gives far better confidence. Tree-based (syntactic) techniques use a parser to convert source programs into parse trees or abstract syntax trees (ASTs), which are then processed with tree matching to find clones. Because these techniques search for clones syntactically by comparing syntax trees [14], they ignore accidental matches and give more accurate results.
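Below is a minimal sketch of the textual idea. It assumes line-based comparison after stripping surrounding whitespace. Every run of `window` consecutive non-blank lines is used as a key, and keys seen more than once form a clone set. For each set it reports the number of instances and the fan-out (number of distinct files), in the spirit of Figure 6. The function name `clone_sets`, the window size and the report fields are hypothetical. They do not reflect the output format of any of the tools used in this study.

```python
from collections import defaultdict


def clone_sets(files: dict[str, str], window: int = 3) -> list[dict]:
    """Report every run of `window` consecutive non-blank lines that occurs
    more than once. Lines are compared after stripping surrounding
    whitespace, so only indentation and trailing-space differences are
    ignored."""
    occurrences = defaultdict(list)  # tuple of lines -> [(file, start line)]
    for name, text in files.items():
        lines = [(no, ln.strip())
                 for no, ln in enumerate(text.splitlines(), 1) if ln.strip()]
        for i in range(len(lines) - window + 1):
            key = tuple(t for _, t in lines[i:i + window])
            occurrences[key].append((name, lines[i][0]))
    report = []
    for key, locations in occurrences.items():
        if len(locations) > 1:
            report.append({
                "lines": key,
                "instances": len(locations),                # occurrences of the clone set
                "fan_out": len({f for f, _ in locations}),  # distinct files involved
                "locations": locations,
            })
    return report


if __name__ == "__main__":
    FILES = {
        "A.java": "int a = 1;\nint b = 2;\nSystem.out.println(a + b);\n",
        "B.java": "    int a = 1;\nint b = 2;\n\nSystem.out.println(a + b);\n",
        "C.java": "int a = 1;\nint b = 2;\nSystem.out.println(a + b);\nfoo();\n",
    }
    for cs in clone_sets(FILES):
        print(cs["instances"], cs["fan_out"], cs["locations"])
    # prints: 3 3 [('A.java', 1), ('B.java', 1), ('C.java', 1)]
```

Overlapping windows of a longer duplicate are reported here as separate clone sets. A practical tool would merge them into one larger clone.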

E) Focus
In our case study, all the techniques were able to focus on the entire project at once, including all the classes and methods used in it. For a large LOC, however, it will not be possible to focus on the whole project at once, and swapping will be needed in those cases.

Table 8. Clone Detection Run Time Duration
Case Study          Lines of Code   Clone Detection Run Duration
Netbeans Javadoc    SLOC 2852       DuDe 11 s, solidsdd 4 s, clonedr … s
Java Quizz          10…             DuDe 4 s, solidsdd 3 s, clonedr … s
Jboss SP1-src       SLOC …          DuDe … s, solidsdd 82 s, clonedr … s

Figure 7. Clone Detection Speed by the Tools (clone detection run duration of DuDe, solidsdd and clonedr on Netbeans Javadoc, Java Quizz and Jboss SP1-src)

Table 9. Observation on the Basis of Mentioned Criteria
Criteria      Applicability of Detection Techniques Chosen
Relevance     All are same
Confidence    Text based
Focus         All are same
Suitability   Tree based

F) Scalable to Speed
Table 8 shows that, across the three case studies, the token-based technique scales to all kinds of projects. This ranges from small code bases (Java Quizz) to large-scale projects (Jboss SP1-src). The text-based technique, by contrast, takes more time to detect clones in these three cases. The speed of the tools used is shown graphically in Figure 7. Table 9 summarizes the observations for each chosen criterion.

5.2. Comparison of Tools Used
To study the three techniques discussed in this paper, we worked with automated tools that rely on different underlying techniques (Table 10). Four tools are used: DuDe, solidsdd, CCFinder and clonedr.

Table 10. Tools and their Background Technique
Tool       Working Technique
DuDe       Text-based approach
solidsdd   Token-based approach
CCFinder   Token-based approach
clonedr    Tree-based approach

To compare these tools, we describe their properties according to the following criteria (Table 11 gives the details):
Platform: the execution platform of the tool.
Special Environment: whether the tool requires a special environment to operate.
Availability: whether the tool is freely available, is available as an evaluation version, or requires a license.
User Interface: whether the tool has an interactive graphical interface or is used from the command line.
Output: the kind of output supported by the tool. Some tools report clones textually with the file name and the begin and end line numbers of the cloned fragments. Some give the original source of the cloned fragments in some format, and some show a scatter plot of the cloned code.
Clone Relation: how clones are reported, as clone pairs, clone classes or clone sets.
Types of clones: whether the tool detects type-1, type-2 or type-3 clones.

Table 11. Comparison of Tools
Criteria                 Description of Tools Used
Platform                 DuDe: run on Windows; no other information available
                         CCFinder: run on Windows, but also supports Linux
                         solidsdd: platform independent
                         clonedr: run on Windows
Special Environment      DuDe: available as a JAR file; JRE 1.5 required
                         CCFinder: Python and JRE 1.5 required
                         solidsdd: no extra software required
                         clonedr: no extra software required
Availability             DuDe: evaluation version available on request
                         CCFinder: freely available for research
                         solidsdd: free evaluation license available
                         clonedr: freely available for research
User Interface           All these tools provide a graphical user interface
Output                   DuDe: shows results textually (its output is the most suitable and easily understandable)
                         CCFinder: shows results textually, as a scatter plot and as a scrapbook
                         solidsdd: shows results textually
                         clonedr: provides results as a web page (HTML page) in the system directory
Type of Clone Detected   DuDe: type-1, type-2 and type-3
                         CCFinder: type-1 and type-2
                         solidsdd: type-1, type-2 and type-3
                         clonedr: type-1, type-2 and type-3
Clone Relation           DuDe: clone pairs
                         CCFinder: clone sets
                         solidsdd: clone pairs
                         clonedr: clone sets

DuDe: Although DuDe [15] is text-based, it can combine small duplicated segments into larger ones by allowing gaps in duplicated segments. It is thus able to detect near-miss clones, and in our case study DuDe was also able to detect type-3 clones.
CCFinder: token-based; detects type-1 and type-2 clones.
solidsdd: also token-based, but it detects type-1, type-2 and type-3 clones. This tool shows details of inserted and deleted lines in two copied fragments. Figure 8 shows the details given by solidsdd for two files: the reference file ExternalJavadocExecutorBeanInfo.java and the duplicate file JavadocModule.java. Line 27 is modified to line 28 in the same file, lines 35 and 36 of the reference file are deleted, and the rest of the clone is copied into the duplicate file. solidsdd therefore gives a better view of the source code.
clonedr: In clonedr, a compiler generator is used to generate an annotated parse tree (AST), and its subtrees are compared by characterization metrics based on a hash function. Source code of similar subtrees is then returned as clones. The hash function enables parameterized matching and the detection of gapped clones, especially if the gaps are within a line [14]. It detects type-1, type-2 and type-3 clones.
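The subtree-hashing idea can be sketched with Python's standard `ast` module. It is used here only as a stand-in, because the Python standard library has no Java parser. Each statement or expression subtree with at least `min_nodes` nodes is reduced to a structural fingerprint. The fingerprint keeps node types and operators but drops identifier names and literal values. Subtrees with equal fingerprints fall into the same dictionary bucket and are reported as clones. The names `fingerprint` and `find_clones` and the threshold are assumptions. The description above also covers near-miss and gapped matches, but this sketch finds only exact structural matches (type-1 and type-2).

```python
import ast
from collections import defaultdict


def fingerprint(node: ast.AST) -> str:
    """Structural fingerprint of a subtree. Only AST children are kept, so
    identifier names, attribute names and literal values (stored as plain
    strings or numbers) are ignored and renamed copies still match."""
    parts = []
    for _, value in ast.iter_fields(node):
        if isinstance(value, ast.AST):
            parts.append(fingerprint(value))
        elif isinstance(value, list):
            items = [fingerprint(v) for v in value if isinstance(v, ast.AST)]
            parts.append("[" + ",".join(items) + "]")
    return type(node).__name__ + "(" + ",".join(parts) + ")"


def find_clones(source: str, min_nodes: int = 10) -> list[list[int]]:
    """Return the start lines of each group of structurally identical
    statement/expression subtrees that have at least min_nodes nodes."""
    buckets = defaultdict(list)  # fingerprint -> start lines
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.stmt, ast.expr)):
            continue
        if sum(1 for _ in ast.walk(node)) < min_nodes:
            continue  # skip trivial subtrees such as single names
        buckets[fingerprint(node)].append(node.lineno)
    return [lines for lines in buckets.values() if len(lines) > 1]


SOURCE = """\
def total(prices, qty):
    s = 0
    for p in prices:
        s = s + p * qty
    return s

def amount(costs, n):
    acc = 0
    for c in costs:
        acc = acc + c * n
    return acc
"""

if __name__ == "__main__":
    print(find_clones(SOURCE))
```

For the sample, the two renamed functions (lines 1 and 7) are reported together. Their nested loops and assignments are also reported as separate groups. A practical detector would typically suppress such nested sub-clones and keep only the largest match.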

Figure 8. Output of solid-sdd
Modifications (solid-sdd view)
27 : desc.setdisplayname (NbBundle.getMessage (ExternalJavadocExecutorBeanInfo.class, "CTL_Javadoc_executor")); //NOI18N
28 : desc.setshortdescription (NbBundle.getMessage (ExternalJavadocExecutorBeanInfo.class, "HINT_Javadoc_executor"))
87 : bd.setvalue ("global", Boolean.TRUE)
Deletions
35 : if (Boolean.getBoolean ("netbeans.debug.exceptions")) //NOI18N
36 : ie.printstacktrace ()

6. Conclusions and Future Work
In this paper, we have focused on clone detection techniques and tools and provided a review of both. Previous research shows that the token-based approach returns more false positives than other techniques. The text-based and token-based tools had very similar recall values, but their precision differed because they found different numbers of clone pairs. The tree-based tool had a higher recall value and a lower precision value than the tools of the other techniques. Token-based, tree-based and metric-based techniques are also helpful in combination with refactoring tools.

We have attempted to evaluate the performance of the text-based, tree-based and token-based approaches for detecting duplicate code. The respective techniques were examined with the DuDe, CCFinder, solidsdd and clonedr tools. We observed that the text-based approach is best suited to restructuring with little effort and to finding exact clones. The tree-based approach works effectively to find near-miss clones, and it ignores accidental matches. The token-based approach detects exact clones as well as near-miss clones, but it depends strongly on language constructs. In terms of scalability to speed, the token-based technique adapts to small as well as large-scale projects. All the techniques provide file-level, class-level and clone-set details.
Relevant code clones were actually evaluated by comparing the same reported clone portion across the three techniques, to obtain a parallel view. Among the tools used, DuDe is less suitable for large-scale projects, as it takes the most time to evaluate a large LOC. solidsdd is suitable for all kinds of applications, from small to large LOC, and clonedr also takes more time to evaluate large projects. The work can be extended by applying a metric-based approach and studying its performance in conjunction with the evaluated techniques. Refactoring components can also be determined on the basis of the metric-based approach.

References
[1] N. F. Schneidewind and H. Hoffman, An experiment in software error data collection and analysis, IEEE Transactions on Software Engineering, vol. 5, no. 3, (1979), pp.
[2] W. Yang, Identifying syntactic differences between two programs, Software: Practice and Experience, vol. 21, no. 7, (1991), pp.
[3] J. H. Johnson, Identifying redundancy in source code using fingerprints, CASCON, IBM Press, (1993).
[4] J. Mayrand, C. Leblanc and E. M. Merlo, Experiment on the Automatic Detection of Function Clones in a Software System Using Metrics, Proceedings of the IEEE International Conference on Software Maintenance (ICSM '96), (1996), pp.
[5] I. D. Baxter, A. Yahin, L. Moura, M. Sant'Anna and L. Bier, Clone Detection Using Abstract Syntax Trees, ICSM, (1998).
[6] S. Ducasse, M. Rieger and S. Demeyer, A Language Independent Approach for Detecting Duplicated Code, ICSM, (1999).
[7] T. Kamiya, S. Kusumoto and K. Inoue, A Token Based Code Clone Detection Tool: CCFinder and its Empirical Evaluation, Technical Report, (2000).
[8] T. Kamiya, S. Kusumoto and K. Inoue, CCFinder: A Multilinguistic Token-Based Code Clone Detection System for Large Scale Source Code, IEEE, vol. 28, no. 7, (2002).
[9] The Source for Java Technology, (2002).
[10] C. Liu, C. Chen and J. Han, GPLAG: Detection of Software Plagiarism by Program Dependence Graph Analysis, Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, (2006), pp.
[11] R. Koschke, R. Falke and P. Frenzel, Clone Detection Using Abstract Syntax Suffix Trees, Proceedings of the 13th Working Conference on Reverse Engineering, WCRE 2006, (2006), pp.
[12] C. K. Roy, J. R. Cordy and R. Koschke, Comparison and Evaluation of Code Clone Detection Techniques and Tools: A Qualitative Approach, Science of Computer Programming, vol. 74, no. 7, (2009), pp.
[13] F. Van Rysselberghe and S. Demeyer, Evaluating clone detection techniques from a refactoring perspective, Lab on Re-Engineering, University of Antwerp.
[14] C. K. Roy, J. R. Cordy and R. Koschke, Comparison and Evaluation of Code Clone Detection Techniques and Tools: A Qualitative Approach, School of Computing, Queen's University, Canada, and University of Bremen, Germany.
[15] R. Wettel and R. Marinescu, Archeology of Code Duplication: Recovering Duplication Chains From Small Duplication Fragments, Proceedings of the 7th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing, SYNASC, (2005).
[16] D. Meyer, Analyzing the Robustness of Clone Detection Tools Regarding Code Obfuscation, University of Magdeburg, School of Computer Science, (2012) October.
[17]
[18] ccfinder is available at http:
[19]
[20]
[21] Y. Ueda, T. Kamiya, S. Kusumoto and K. Inoue, Gemini: Maintenance Support Environment Based on Code Clone Analysis, Proceedings of the Eighth IEEE Symposium on Software Metrics, Ottawa, Canada, (2002).
[22] B. S. Baker, A Program for Identifying Duplicated Code, Proceedings of Computing Science and Statistics: 24th Symposium on the Interface, vol. 24, (1992) March, pp.
[23] R. Falke, R. Koschke and P. Frenzel, Empirical Evaluation of Clone Detection Using Syntax Suffix Trees, Empirical Software Engineering, vol. 13, (2008), pp.
[24] L. Jiang, G. Misherghi, Z. Su and S. Glondu, DECKARD: Scalable and Accurate Tree-based Detection of Code Clones, Proceedings of the 29th International Conference on Software Engineering, ICSE 2007, (2007), pp.
[25] S. Lee and I. Jeong, SDD: High Performance Code Clone Detection System for Large Scale Source Code, Companion to the 20th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA Companion 2005, (2005), pp.
[26] C. K. Roy and J. R. Cordy, NICAD: Accurate Detection of Near-Miss Intentional Clones Using Flexible Pretty-Printing and Code Normalization, Proceedings of the 16th IEEE International Conference on Program Comprehension, ICPC 2008, (2008), pp.
[20] P. Bulychev and M. Minea, Duplicate Code Detection Using Anti-Unification, Spring Young Researchers Colloquium on Software Engineering, SYRCoSE 2008, (2008), 4 pp.
[27] M. F. Zibran and C. K. Roy, IDE-based Real-time Focused Search for Near-miss Clones, SAC '12, March 25-29, 2012, Riva del Garda, Italy.
[28] F. Calefato, F. Lanubile and T. Mallardo, Function Clone Detection in Web Applications: A Semiautomated Approach, Journal of Web Engineering, vol. 3, no. 1, pp.
[29] M. Asaduzzaman, Visualization and Analysis of Software Clones, Thesis, Department of Computer Science, University of Saskatchewan, Saskatoon, January
[30] R. Wettel and R. Marinescu, Archeology of Code Duplication: Recovering Duplication Chains From Small Duplication Fragments, Proceedings of the 7th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing, SYNASC, (2005), p. 8.
[31] S. Ducasse, M. Rieger and S. Demeyer, A Language Independent Approach for Detecting Duplicated Code, Proceedings of the 15th International Conference on Software Maintenance, ICSM 1999, (1999), pp.
[32] Z. Li, S. Lu, S. Myagmar and Y. Zhou, CP-Miner: Finding Copy-Paste and Related Bugs in Large-Scale Software Code, IEEE Transactions on Software Engineering, vol. 32, no. 3, (2006), pp.
[33] Tool Clone Detective (part of ConQAT), URL, last accessed November
[34] Tool SimScan, URL, last accessed November
[35] P. Bulychev and M. Minea, Duplicate Code Detection Using Anti-Unification, Spring Young Researchers Colloquium on Software Engineering, SYRCoSE 2008, (2008), p. 4.
[36]
[37] S. Shafieian and Y. Zou, Comparison of Clone Detection Techniques, Technical Report, pp.
[38] K. Kontogiannis, R. DeMori, E. Merlo, M. Galler and M. Bernstein, Pattern Matching for Clone and Concept Detection, Journal of Automated Software Engineering, vol. 3, no. 1-2, (1996), pp.

More information

Enhancing Source-Based Clone Detection Using Intermediate Representation

Enhancing Source-Based Clone Detection Using Intermediate Representation Enhancing Source-Based Detection Using Intermediate Representation Gehan M. K. Selim School of Computing, Queens University Kingston, Ontario, Canada, K7L3N6 gehan@cs.queensu.ca Abstract Detecting software

More information

A Mutation / Injection-based Automatic Framework for Evaluating Code Clone Detection Tools

A Mutation / Injection-based Automatic Framework for Evaluating Code Clone Detection Tools A Mutation / Injection-based Automatic Framework for Evaluating Code Clone Detection Tools Chanchal K. Roy and James R. Cordy School of Computing, Queen s University Kingston, ON, Canada K7L 3N6 {croy,

More information

Study and Analysis of Object-Oriented Languages using Hybrid Clone Detection Technique

Study and Analysis of Object-Oriented Languages using Hybrid Clone Detection Technique Advances in Computational Sciences and Technology ISSN 0973-6107 Volume 10, Number 6 (2017) pp. 1635-1649 Research India Publications http://www.ripublication.com Study and Analysis of Object-Oriented

More information

Similar Code Detection and Elimination for Erlang Programs

Similar Code Detection and Elimination for Erlang Programs Similar Code Detection and Elimination for Erlang Programs Huiqing Li and Simon Thompson School of Computing, University of Kent, UK {H.Li, S.J.Thompson}@kent.ac.uk Abstract. A well-known bad code smell

More information

Detection and Behavior Identification of Higher-Level Clones in Software

Detection and Behavior Identification of Higher-Level Clones in Software Detection and Behavior Identification of Higher-Level Clones in Software Swarupa S. Bongale, Prof. K. B. Manwade D. Y. Patil College of Engg. & Tech., Shivaji University Kolhapur, India Ashokrao Mane Group

More information