Clone Detection Using Scope Trees


Int'l Conf. Software Eng. Research and Practice, SERP'18

M. Mohammed and J. Fawcett
Department of Computer Science and Electrical Engineering, Syracuse University, Syracuse, USA

Abstract - Code clones are exact or nearly exact matching code segments in software source code. Clones make software maintenance difficult, so identifying and managing them is an important part of software maintenance. This paper proposes a clone detection system that uses a scope tree representation of the structure of source files, together with several measures of similarity. The method can be used for detecting clones in C, C++, Java, and C#. An experimental clone detection tool is evaluated based on its analysis of OpenSSL and Notepad++. Because the detected clones are bounded by scopes, they are an effective guide for code restructuring. Our approach is also compared with two other clone detection tools, CCFinder and Deckard. The results are encouraging in terms of both efficiency and detection accuracy. In addition, we expect our method to scale well for large systems, based on its computational complexity and the fact that much of the analysis can be performed concurrently, one file at a time.

Keywords: Clone Detection, Maintenance, Restructuring, Scope Tree, Token.

1. Introduction

A code clone is a segment of source code that is the same as, or similar to, another segment in the same or another file. If software has clones, maintenance becomes burdensome [4]. When a change is required in a code segment that has clones in other parts of the source, the change may be needed in all corresponding parts. Therefore, detecting and managing code clones is an essential part of software maintenance. Code may be cloned for several reasons [5]; quick fixes using copy-and-paste are the most common. Code cloning has pros and cons [6]. There is extensive research on clone detection.
Some of the tools discussed, such as CCFinder, detect clones using methods focused on the text of code segments and miss modified but important clones [1]. Others use AST-based approaches to detect relevant clones. The drawback of these approaches is that building the AST is itself a complex process, even with available compiler components such as LLVM, which discourages their use in the software maintenance process. Also, the resulting clones may not be suitable for restructuring.

We propose a clone detection method using scope trees, which are similar to but significantly simpler than ASTs, that yields relevant clones in software source code across multiple files relatively efficiently. The approach builds a scope tree for each file using a lightweight parser, with tokens stored in blocks for each scope. Structural matches within a file and among files are found by comparing scope trees with a structural match algorithm. Once a structural match is detected, the tokens of corresponding scopes are compared using string comparison to find exact or near matches. An edit distance algorithm [13] is also adapted to detect modified code.

The rest of the paper is organized as follows. Part 2 gives background on clone detection and scope trees. The proposed clone detection method is presented in part 3. Details of the major algorithms are provided in part 4. Part 5 describes the implementation of our analysis tool. Part 6 describes results obtained by running the clone detection tool on example test source code and on two open source software systems, to examine clone detection capability and to explore refactoring opportunities. Part 7 concludes with a summary of results and planned future work. Part 8 summarizes related clone detection work.

2. Background

This part gives a brief overview of clone detection and scope trees.
Part 2.1 discusses clones; the concept of scope trees and how they are used in this paper is discussed in part 2.2.

2.1. Code clone basics

There is no single agreed-upon definition of a code clone [4]. However, several clone types have been defined [2, 3, 4, 5]:

Type 1 (exact clones) - fragments of code which are identical except for variations in white space and comments.
Type 2 (renamed/parameterized clones) - fragments of code which are structurally/syntactically the same except for changes in identifiers, literals, types, layout and comments.
Type 3 (near miss clones) - fragments of code that are copied with further modifications, such as statement insertions/deletions, in addition to changes in identifiers, literals, types and layout.

Type 4 (semantic clones) - fragments of code which are functionally similar without having text similarity.

In this work, we address efficient detection and management of all the clone types except type 4. In the next section, we focus on the method we used for clone detection.

2.2. Scope trees

A scope tree is a representation of source code as a tree in which each scope is a node. A scope is usually delimited by open and closed braces. Functions, if, else, and switch statements, loops, enums, classes, structs, and namespaces are all scopes. An example is shown in Fig. 1 for the pseudocode in Table 1, File 1.

Table 1 - File 1 pseudocode.

    Fun( )
        B0
        if(..)
            B2
        else
            B3
        B4
        for(...)
            B5
            if(..)
                B51
        B6

Fig. 1. Example file, File 1: simplified source and scope tree representation.

The source file, File 1, is represented as scopes containing blocks; each block is a sequence of tokens that represents statements in the source file. In the scope tree diagram, the blocks of a scope are stored in a list, each with an index that indicates where the block is found relative to the child scopes. For instance, (B0, 0) means block B0 is located before all the child scopes, whereas (B4, 2) means B4 is found right after the second child scope.

In addition, scope cyclomatic complexity, henceforth called cyclomatic complexity, is defined based on the scope tree hierarchy. A leaf node has a cyclomatic complexity of one. An internal node has a cyclomatic complexity equal to the sum of the complexities of its child nodes plus one. This complexity is similar to McCabe's cyclomatic complexity [10]. However, the computation and its purpose are different.

3. Proposed method of clone detection

This section discusses our clone detection method. First, a general overview is given using a block diagram. The last step in the diagram, clone detection, is explained in detail in section 4.

3.1. Description of clone detection computation

The proposed method uses scope trees to detect clones in software system source code. The process is illustrated in Fig. 2. The block diagram shows four major steps. The first three are discussed below; the last one is discussed in part 4.

Fig. 2. Scope tree-based clone detection block diagram.

Tokenizer

This step takes the path to a set of files to analyze and generates a stream of tokens for each file. The result is passed to the next step, the rule-based parser.

Rule-based parser

Rules are specified to extract data from source code. This step takes the tokens from the previous step and generates scopes such as namespaces, classes, functions, conditions, loops, and even plain blocks delimited by curly braces. See [11] for details.

Scope tree builder

This step builds a scope tree based on the containment of one scope in another. The output of this stage is a scope tree, like Fig. 1, for each file, which is used to check structural similarity in the next stage. Algorithm 1 describes how a scope tree is built. Token block extraction is done by the same algorithm. The algorithm scans tokens to detect the start and end of each scope and acts accordingly.

Algorithm 1 - Scope tree builder

    BuildScopeTree()
        root = CreateNode(file)
        BuildScopeTree(root)

    BuildScopeTree(current_node)
        for each tok of tokenized file
            if (start of scope)
                scope_node = CreateScopeNode()
                current.stack.push_back(scope_node)
            if (end of scope)
                current.blocks.push_back(block)
                current.stack.pop()
            push token to block

In addition, the cyclomatic complexity of each node is computed by assigning the value 1 to leaf nodes and the sum of the cyclomatic complexities of the child nodes plus 1 to internal nodes. To make the structure matching step efficient, nodes are ordered by their cyclomatic complexity and placed in buckets. The maximum bucket is determined from the cyclomatic complexities of the root nodes. Nodes having the same cyclomatic complexity are stored in the same bucket. For example, the root node of File 1, in Fig. 1, is stored in bucket 6 as it has a cyclomatic complexity of 6. Function Fun of File 1 is stored in bucket 5 as it has a complexity of 5. Buckets may be combined to detect modified code at the structural level. There is only one such structure, and it spans all of the analyzed files. In the next part, the clone detection step is discussed.

4. Clone detections

This is the last step shown in Fig. 2. It detects matches within a file or among different files in the analyzed source. It is a two-stage process. First, structurally similar trees are identified. The output of this stage is passed to the exact or near exact match stage, which uses a more precise comparison to detect the clones. Algorithm 2 handles both stages.
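Before turning to the matching step, the scope tree builder of Algorithm 1, the scope cyclomatic complexity, and the complexity buckets can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the tool's C++ implementation: scopes are recognized only by `{` and `}` tokens, the tokens of a scope header (for example `if ( c )`) stay in the parent's preceding block, and the names `ScopeNode`, `build_scope_tree`, and `bucket_nodes` are hypothetical. The token list encodes the File 1 pseudocode, assuming the second child scope of Fun is an else branch.

```python
from collections import defaultdict


class ScopeNode:
    """One scope; blocks are (tokens, index) pairs, index = child scopes seen so far."""

    def __init__(self, name):
        self.name = name
        self.children = []
        self.blocks = []


def build_scope_tree(tokens):
    """Build a scope tree from a flat token list (assumption: braces delimit scopes)."""
    root = ScopeNode("file")
    stack = [root]
    block = []

    def flush():
        # Store the pending token block in the current scope, tagged with its position.
        nonlocal block
        if block:
            node = stack[-1]
            node.blocks.append((block, len(node.children)))
            block = []

    for tok in tokens:
        if tok == "{":            # start of scope
            flush()
            node = ScopeNode("scope")
            stack[-1].children.append(node)
            stack.append(node)
        elif tok == "}":          # end of scope
            flush()
            if len(stack) > 1:
                stack.pop()
        else:
            block.append(tok)
    flush()
    return root


def bucket_nodes(root):
    """Group nodes by scope cyclomatic complexity: leaf = 1, internal = sum(children) + 1."""
    buckets = defaultdict(list)

    def visit(node):
        complexity = 1 + sum(visit(child) for child in node.children)
        buckets[complexity].append(node)
        return complexity

    visit(root)
    return buckets


tokens = ("fun ( ) { b0 if ( c ) { b2 } else { b3 } b4 "
          "for ( x ) { b5 if ( c ) { b51 } } b6 }").split()
root = build_scope_tree(tokens)
buckets = bucket_nodes(root)
print(sorted(buckets))  # [1, 2, 5, 6]
```

With these tokens the file root lands in bucket 6 and the function scope in bucket 5, which agrees with the File 1 example in the text.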
Algorithm 2 - Matching algorithm

    matchtrees(node1, node2, isexact)
        NodePairQueue.push(node1, node2)
        while (!NodePairQueue.empty())
            node_pair = NodePairQueue.front()
            NodePairQueue.pop()
            if (!isexact)
                match = matchnodes(node_pair.first, node_pair.second)
                if (match < matchthreshold)
                    return 0
            else
                // Algorithm 3.
                nodematch = matchnodesed(node_pair.first, node_pair.second, noofnodetokens)
                totalnumoftokens = totalnumoftokens + noofnodetokens
                numofmatchedtreetokens += nodematch
            for each pair (child1, child2) of corresponding children of node_pair.first and node_pair.second
                NodePairQueue.push(child1, child2)
        if (!isexact)
            return match
        if (totalnumoftokens == 0)
            return 1.0
        return (numofmatchedtreetokens / totalnumoftokens)

4.1. Structural matching

At this stage, scope trees having the same cyclomatic complexity, or complexities that differ by 1 or 2, are retrieved from their buckets for comparison. Similarity checking uses criteria such as number of blocks, number of children, number of tokens, number of identifiers, and type of node. These are parameters whose values can be changed before clone detection starts. Two trees are structurally similar if their corresponding nodes are similar.

4.2. Exact or near exact match

At this stage, the structurally similar nodes are filtered with additional matching criteria. The blocks hold sequences of tokens. Given a required level of accuracy, say 0.90, the tokens of corresponding nodes in the structurally similar trees are compared. This approach can detect modified code. To detect modified code, we also use an edit distance algorithm applied to lists of tokens. Algorithm 3 describes the edit distance-based algorithm.

Algorithm 3 - Modified edit distance algorithm

    matchnodesed(node1, node2)
        for each block in node1
            for each token in block
                tokenlist1.push_back(token)
        for each block in node2
            for each token in block
                tokenlist2.push_back(token)
        // EditDistanceStrLst is the edit distance
        // algorithm for lists of strings.
        editdist = EditDistanceStrLst(tokenlist1, tokenlist2)
        nooftokens = max(len(tokenlist1), len(tokenlist2))
        approxmatch = nooftokens - editdist
        return approxmatch, nooftokens

5. Implementation

As a proof of concept, a tool has been implemented in C++. The tokenizer and rule-based parser are general purpose modules developed for source code analysis and have been used in previous research and classroom projects. Some tuning was done to fit this work. The other components, the scope tree builder and clone detection, were implemented for this research.

6. Results and discussion

We carried out experiments on test code prepared for clone detection purposes by copy-and-pasting and systematically modifying files and fragments of code. First, we used C source code implementing quick sort to detect clones using our tool and two other clone detection tools, CCFinder [1] and Deckard [15]. For real case studies, we used Notepad++ [12] and OpenSSL [14]. Our tool also successfully detects clones in the benchmark code specified in [2].

6.1. Test code for clone detection

To evaluate the clone detection capability of our tool and of two other tools, CCFinder and Deckard, we prepared test code examples by modifying an implementation of the quick sort algorithm. The algorithm is stored in a file (qsort_.c []). Various modifications are made to this file. The important function for clone detection is the partition function.
Most of the modifications, 1a2 to 1a9, are based on this function, and they target function level clone detection even though each is stored separately in its own file. The rest, 0 and 1, are prepared for file level clone detection. In 0, the original partition function is replaced with a different implementation, leaving the rest of the functions as they are; in 1, a different algorithm, merge sort, is implemented, leaving only two small functions, display and main, the same. Here are the descriptions of each test file used for clone detection:

i. qsort_.c () - quick sort implementation.
ii. partition_1a2.c (1a2) - copy-and-pasted partition function from the original () with only identifier changes, including function name and parameters.
iii. partion_1a3.c (1a3) - an if block, lines 6-9, is added to the original () partition function.
iv. partion_1a4.c (1a4) - an empty block, lines 12-13, is added to the original () partition function.
v. partition_1a5.c (1a5) - several comments are added to the original () partition function.
vi. Partition_1a6.c (1a6) - an if block, lines 6-9, is added to 1a2's part before the outer loop.
vii. Partition_1a7.c (1a7) - a condition, lines 16-17, is added to 1a6's part.
viii. Partition_1a8.c (1a8) - comments are added at several places to 1a7.
ix. Partition_1a9.c (1a9) - the empty statement in 1a8 is modified by adding two statements.
x. Qsort (0) - copy-pasted with a completely different partition function. However, the rest of the functions are the same.
xi. Merge-sort_.c (1) - this is a different algorithm, merge sort. However, two of the functions are the same: display and main.

The changes made in 1a2 to 1a5 are relative to the original file, whereas the changes in 1a6 to 1a9 are cumulative, with 1a9 modified the most. We ran CCFinder [1], Deckard [15], and our tool with the above test files as input. To compare the clones detected, refer to Fig. 3, which shows the distinct parts of the code that are used in the discussion of the clone detection results. The inserted code blocks are shown in gray. The figure shows most of the partition modifications.
The content is actually that of 1a9, which is the result of all the modifications listed in 1a2 to 1a9. Fig. 3 can be used to make sense of most of the discussion of the clone detection results. The last two files, 0 and 1, are not shown in the figure. Please see the link in reference [16] for these and for the individual files up to 1a9. Table 2 reports the clones detected by the three tools. As can be seen in Table 2, all three tools detected the same clone for 1a2, 1a3, and 1a5 versus the original. They detected the whole partition function, the outer loop, and the whole partition function, respectively.

    size_t part (int * data, int left, int right)
    {
        int i = left + 1, j = right;                    // [Function upper]
        // Left item is selected as pivot.
        int pivot = data[left];
        // This case is redundant.
        if (l == r && data[pivot] <= data[r])           // [If block]
            return pivot;
        while (i <= j) {
            while (pivot < data[j]) j--;                // [Upper]
            if (j == left) return left;
            else {                                      // [Else block]
                // No effect. Added to test clone.
                swap(&data[j], &data[left]);
                swap(&data[j], &data[left]);
            }
            while (i < right && pivot > data[i]) i++;   // [Lower]
            if (i == j) break;
            // Swapped to strictly put items to the left
            // and right of pivot.
            if (i < j) swap(&data[i], &data[j]);
        }
        swap(&data[left], &data[j]);                    // [Function lower]
        return j;
    }

Fig. 3. Modified partition function.

However, 1a4 versus the original, where an empty block is added to the outer loop of the partition function, produced different results. CCFinder detected only the lower part of the inner loop, whereas Deckard detected the whole function as a clone. Our tool is slightly different here: it detected the whole partition function, but it also reported the added block as mutated code. It should be noted that 1a2 to 1a5 are modifications of the original partition function, and the clones detected are at the function level or within the function. At the file level, all three tools reported all the functions other than the partition function as clones for 0 versus the original; however, only our tool detected the small functions, main and display, in 1 versus the original as clones.

Table 2. Test code clones. Columns: File Pairs; Clone Detection by CCFinder, Deckard, Our Tool. Rows 1-10 compare 1a2, 1a3, 1a4, 1a5, 1a6, 1a7, 1a8, 1a9, 0, and 1 with the original file; rows 11-14 compare 1a6 vs 1a2, 1a7 vs 1a6, 1a8 vs 1a7, and 1a9 vs 1a8. Row 9 (0 vs the original): only the partition function differs, for all three tools. Row 10 (1 vs the original): CCFinder, none; Deckard, none; Our Tool, display and main.

The files 1a6 to 1a9, which are cumulative modifications starting from 1a2, are used in two different ways.
Rows 11 to 14 in Table 2, like rows 1-4, evaluate the effect of an immediate change on clone detection, whereas rows 5-8 evaluate the effect of the cumulative changes as compared with the original file. For the former, all three tools reported the same clone for 1a6 vs 1a2 and 1a8 vs 1a7. However, for 1a7 vs 1a6, CCFinder detected the lower loop and the upper part of the function as clones; Deckard reported the whole function as a clone. Our tool detected the whole function, reporting the modification as such. For 1a9 vs 1a8, both CCFinder and Deckard reported only fragments. Our tool reported the outer loop as a clone, reporting the modified block as mutated. For rows 5-8, our tool consistently reported the outer loop as a clone, together with the mutated block. Deckard reported the outer loop as a clone, except in the last case, where it reported the lower and the upper part of the outer loop as clones. CCFinder, on the other hand, reported part of the loop in all the cases except the first one, row 5, where it detected the outer loop.

All in all, the three tools detected identical code and code with identifier changes in an equivalent way. The differences in clone detection became significant when code was modified systematically. Deckard tolerated modification of code, such as the insertion of an empty block; however, it reported only the similar fragments if the change was above a certain threshold, as defined by its similarity metric [15]. CCFinder, on the other hand, reported only fragments if there was any change, other than identifier changes, inside a given scope. Our tool includes the mutated code as part of the clone detection report. However, if the mutated scope dominates the similar scopes, as determined by a mutation threshold and a similarity threshold like Deckard's, our tool reports only the similar scopes. Reporting the mutated scope as part of the clone report is one of the contributions of our work.

6.2. OpenSSL clones

We also ran the three tools with the OpenSSL source code [14] as input. OpenSSL is suitable for a clone detection study as it has similar cryptographic algorithms and routines. It is a widely used open source software library with more than 400,000 lines of code.
Unlike the test code clones discussed above, where the expected clones are known in advance, here we report the approximate number of clones detected by each tool and a few sample clone pairs for discussion. In addition, the number of overlapping clones between our tool and Deckard is given, to see whether our approach detects different clones than Deckard's. Unfortunately, CCFinder does not produce a report that we could parse in the way we parse ours and Deckard's. As a result, CCFinder is not considered in the overlap comparison. Table 3 shows the total number of clones, CLOC (clone lines of code), and the metrics used to detect the clones. Two major metrics are used to filter small clones: threshold and similarity. For CCFinder and Deckard, we used 50 tokens, the default, as the minimum threshold, whereas for our tool we used 5 CLOC as the minimum threshold. The major reason we used CLOC as our threshold is that we use two separate clone matching steps for clone detection, and the exact clone matching step does not contain all of the tokens. Deckard, in addition to tokens, uses a stride to merge small clones.

Table 3. OpenSSL clones

    Metrics         CCFinder     Deckard                       Our Tool
    Threshold       50 tokens    50 tokens with stride of 2    5 lines of code
    Similarity      N/A          0.9                           0.9
    CLOC overlap    N/A
    Total CLOC

Our tool and Deckard used 0.9 as the clone similarity level. We define similarity as the ratio between the matched tokens and the total number of tokens in the pair of nodes, as in Algorithm 2; Deckard's definition of similarity is provided in [15]. Deckard and our tool each detected almost 10 times the total CLOC reported by CCFinder. It should be noted that the total CLOC for CCFinder is generated by the tool itself, so we could not compute it in the same way as for our tool and Deckard. For those two, the CLOC of each clone is summed, using a separate program, from the clone reports generated by the tools.
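The similarity ratio used above (matched tokens over total tokens, with the matched count taken from the edit distance of Algorithm 3) can be sketched in Python for a single pair of nodes. This is a hedged illustration, not the tool's code: `edit_distance` is a standard Levenshtein distance over lists of tokens standing in for EditDistanceStrLst, `node_similarity` is a hypothetical helper name, and Algorithm 2 would additionally sum the matched and total counts over all node pairs of a tree before dividing.

```python
def edit_distance(a, b):
    """Levenshtein distance between two token lists (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or keep
        prev = cur
    return prev[-1]


def node_similarity(tokens1, tokens2):
    """Ratio of matched tokens to total tokens, following Algorithm 3's counts."""
    total = max(len(tokens1), len(tokens2))
    if total == 0:
        return 1.0
    matched = total - edit_distance(tokens1, tokens2)
    return matched / total


a = "if ( i < j ) swap ( a , b ) ;".split()
b = "if ( i <= j ) swap ( a , b ) ;".split()
similarity = node_similarity(a, b)
print(similarity >= 0.9)  # True: 12 of 13 tokens match
```

One changed token out of 13 gives a similarity of 12/13, which passes the 0.9 level; a second changed token in the same fragment would drop it to 11/13 and fail.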
We also did a visual inspection of the OpenSSL code, comparing it with the clone report of each tool. Our tool detected 5% more clones than Deckard. This is in part due to the use of scopes for detecting clones; as a result, clones that would fall below the threshold are detected along with their enclosing scope. Approximately 60% of the clones detected by our tool and Deckard overlap. Because of the difference in the way the two tools detect clones, the actual overlap may be higher than what is reported. However, we expect that there will be differences between the two tools.

All three tools detect exact matches and slightly modified code, such as code with identifier changes. Deckard takes clone detection further by reporting significantly modified code. The similarity can span more than one function, possibly only partially. Please refer to [16] for sample clone pairs that demonstrate this. Deckard is capable of reporting better clones than CCFinder in terms of similarity; however, letting it detect clones spread across different scopes, possibly partial ones, may not result in clones that can reasonably be used for tasks such as refactoring. Our tool also detects clones extending over more than one block. However, the similarity is examined at different levels. For each scope, at the function level or in lower-level blocks, a similarity level is computed. Based on this criterion, clones that meet the overall clone similarity level are included in the clone report, but significantly modified code (mutated scopes) is reported explicitly as part of each clone pair. Figure 4 shows a clone pair taken from our tool's report to demonstrate this concept. Even though part of the code is significantly modified, as measured by the threshold level, it is reported as a mutated scope in the clone pair report.

    if (!noout && !cert)
    {
        if (outformat == FORMAT_ASN1)
            i=i2d_SSL_SESSION_bio(out,x);
        else if (outformat == FORMAT_PEM)
            i=PEM_write_bio_SSL_SESSION(out,x);
        else {
            BIO_printf(bio_err,"bad output format specified for outfile\n");
            goto end;
        }
        if (!i) {
            BIO_printf(bio_err,"unable to write SSL_SESSION\n");
            goto end;
        }
    }

    if (!noout)
    {
        if (outformat == FORMAT_ASN1)
            i=i2d_DHparams_bio(out,dh);
        else if (outformat == FORMAT_PEM)
            i=PEM_write_bio_DHparams(out,dh);
        else {
            BIO_printf(bio_err,"bad output format specified for outfile\n");
            goto end;
        }
        if (!i) {
            BIO_printf(bio_err,"unable to write DH parameters\n");
            ERR_print_errors(bio_err);
            goto end;
        }
    }

Fig. 4. Clone with mutated scope.

The two code fragments as a whole are reported as a clone pair. The last pair of if statements, in gray, are similar but did not pass the threshold criterion for the given scope; therefore, they are reported as a mutated scope within the overall clone report. This could be used to refactor the similar part of the code into a function, leaving the significantly modified part as it is. This concept will be further explored in an extended version of this paper.

7. Conclusion

We proposed and implemented a novel clone detection system. It is like AST-based clone detection, such as Deckard [15], in that it builds a tree-like structure. However, our tool does not build a complete AST. It extracts only the scope data and tokens needed for clone detection, using a single node type.
Most of the clones that we detect may be used for software restructuring, as we use scopes as boundaries for the clones. The tool efficiently detects clones that are not detected by CCFinder, such as clones due to systematic changes in code. Nor does it suffer from reporting clones that span partial scopes. In addition, we expect our tool to scale well for large systems, based on its computational complexity and the fact that much of the analysis can be performed concurrently, one file at a time. In the future, we want to make use of the detected clones for restructuring. The mutated scopes can also be examined to study program correctness modelling. We expect to use software dependency visualization to show clone distribution to plan restructuring [17].

8. Related work

There are many published clone detection papers. A detailed survey is provided in [7]. Based on their detection algorithms, clone detection techniques are grouped into four categories: normalized lines of code, parameterized string matching, AST-based, and program dependence graph-based. In the following, a brief description of each category is provided, along with a sample work.

8.1. Normalized lines of code

This method removes white space and comments and compares text directly at the line level [8]. It has quadratic complexity, as every line of one code fragment is compared with every line of the other. Even though it can be applied to multi-language source code, clone detection counts are affected by slight changes to the code, or even by merging or splitting lines.

8.2. Parameterized string matching

This method uses a token stream as the data structure for detecting clones. It ignores white space and comments, as in the normalized lines of code approach. Identifiers and names are replaced by placeholders. A suffix tree is used to find substrings for partial comparison.
The comparison is done by tokenizing the whole file [1][9].

8.3. AST-based methods

This method uses an abstract syntax tree for detecting clones. The comparison may be done on blocks (open brace to close brace) or functions. However, this approach needs the AST to be generated, which is a complex process. This is similar to our approach, but our tool builds a lightweight tree that avoids the complexity of a full AST. Deckard [15] is the most common AST-based clone detection tool. It generates vectors to represent nodes in the AST and uses these vectors to detect clones relatively efficiently. Our approach uses a scope tree instead of an AST to detect clones.

8.4. Program dependence graphs

This method uses the program dependence graph (PDG) as the underlying data structure for clone detection. Unlike the other approaches, clone detection based on the PDG is not affected by statement swapping in a fragment of code. However, since it uses sub-graph comparison, which is an NP-hard problem, it may not scale well.

Our approach uses cyclomatic complexity-ordered sub-tree comparison, whose cost grows linearly with the number of nodes. Token comparison is done at the node level, and its worst-case cost is quadratic in the number of tokens. In practice, the comparison is likely to be efficient, as the number of tokens stored in each node is small.

Among existing clone detection techniques, CCFinder [1], CP-Miner [9] and Deckard [15] are among the most prominent tools. CCFinder and CP-Miner are token-based, whereas Deckard is AST-based. However, they are either not suited for parallelization or have some accuracy issues. Based on our analysis, Deckard's performance is better than that of the others. In this paper, a lightweight parser [11] is used to build a scope tree for all the files being analyzed. It has some similarity with AST-based clone detection tools such as Deckard [15]. However, it does not extract detailed information as an AST does, thereby avoiding significant complexity.

9. References

[1] T. Kamiya, S. Kusumoto, and K. Inoue, CCFinder: A multilinguistic token-based code clone detection system for large scale source code, IEEE Transactions on Software Engineering, 28(7), 2002.
[2] S. Bellon, R. Koschke, G. Antoniol, J. Krinke, E. Merlo, Comparison and evaluation of clone detection tools, IEEE Transactions on Software Engineering 33 (9) (2007).
[3] R. Koschke, Survey of Research on Software Clones, in Duplication, Redundancy, and Similarity in Software, Dagstuhl, Germany.
[4] C.K. Roy, J.R. Cordy, A Survey on Software Clone Detection Research, Technical Report, Queen's University at Kingston, Ontario, Canada, 2007.
[5] C.J. Kapser, Toward an Understanding of Software Code Cloning as a Development Practice, PhD dissertation, University of Waterloo, 2009.
[6] C.J. Kapser, M.W. Godfrey, Supporting the analysis of clones in software systems: a case study, Journal of Software Maintenance and Evolution: Research and Practice 18 (2) (2006).
[7] D. Rattan, R. Bhatia, M. Singh, Software clone detection: A systematic review, Information and Software Technology, Volume 55, Issue 7, July 2013.
[8] S. Ducasse, M. Rieger, and S. Demeyer, A language independent approach for detecting duplicated code, in Software Maintenance, (ICSM '99) Proceedings. IEEE International Conference on, 1999.
[9] Z. Li, S. Lu, S. Myagmar, and Y. Zhou, CP-Miner: A Tool for Finding Copy-paste and Related Bugs in Operating System Code, in Proceedings of the 6th Conference on Symposium on Operating Systems Design & Implementation - Volume 6, Berkeley, CA, USA, 2004.
[10] T. J. McCabe, A Complexity Measure, IEEE Transactions on Software Engineering, vol. SE-2, no. 4, Dec.
[11] Light Weight Parser, retrieved from ser.htm, December 2016.
[12] Notepad++, retrieved September.
[13] S. Dasgupta, C. Papadimitriou and U. Vazirani, Algorithms, McGraw Hill.
[14] OpenSSL, retrieved September.
[15] L. Jiang, G. Misherghi, Z. Su, and S. Glondu, Deckard: Scalable and accurate tree-based detection of code clones, ICSE '07.
[16] Support files, retrieved from Mohammed/CloneDetectionExamples/, April.
[17] M. Mohammed and J. W. Fawcett, Package Dependency Visualization: Exploration and Rule Generation, The 23rd International Conference on Distributed Multimedia Systems, Visual Languages and Sentient Systems, 2017.


More information

Keywords Clone detection, metrics computation, hybrid approach, complexity, byte code

Keywords Clone detection, metrics computation, hybrid approach, complexity, byte code Volume 3, Issue 5, May 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com An Emerging Approach

More information

COMPARISON AND EVALUATION ON METRICS

COMPARISON AND EVALUATION ON METRICS COMPARISON AND EVALUATION ON METRICS BASED APPROACH FOR DETECTING CODE CLONE D. Gayathri Devi 1 1 Department of Computer Science, Karpagam University, Coimbatore, Tamilnadu dgayadevi@gmail.com Abstract

More information

1/30/18. Overview. Code Clones. Code Clone Categorization. Code Clones. Code Clone Categorization. Key Points of Code Clones

1/30/18. Overview. Code Clones. Code Clone Categorization. Code Clones. Code Clone Categorization. Key Points of Code Clones Overview Code Clones Definition and categories Clone detection Clone removal refactoring Spiros Mancoridis[1] Modified by Na Meng 2 Code Clones Code clone is a code fragment in source files that is identical

More information

An Effective Approach for Detecting Code Clones

An Effective Approach for Detecting Code Clones An Effective Approach for Detecting Code Clones Girija Gupta #1, Indu Singh *2 # M.Tech Student( CSE) JCD College of Engineering, Affiliated to Guru Jambheshwar University,Hisar,India * Assistant Professor(

More information

A Novel Technique for Retrieving Source Code Duplication

A Novel Technique for Retrieving Source Code Duplication A Novel Technique for Retrieving Source Code Duplication Yoshihisa Udagawa Computer Science Department, Faculty of Engineering Tokyo Polytechnic University Atsugi-city, Kanagawa, Japan udagawa@cs.t-kougei.ac.jp

More information

Enhancing Source-Based Clone Detection Using Intermediate Representation

Enhancing Source-Based Clone Detection Using Intermediate Representation Enhancing Source-Based Detection Using Intermediate Representation Gehan M. K. Selim School of Computing, Queens University Kingston, Ontario, Canada, K7L3N6 gehan@cs.queensu.ca Abstract Detecting software

More information

Rearranging the Order of Program Statements for Code Clone Detection

Rearranging the Order of Program Statements for Code Clone Detection Rearranging the Order of Program Statements for Code Clone Detection Yusuke Sabi, Yoshiki Higo, Shinji Kusumoto Graduate School of Information Science and Technology, Osaka University, Japan Email: {y-sabi,higo,kusumoto@ist.osaka-u.ac.jp

More information

Folding Repeated Instructions for Improving Token-based Code Clone Detection

Folding Repeated Instructions for Improving Token-based Code Clone Detection 2012 IEEE 12th International Working Conference on Source Code Analysis and Manipulation Folding Repeated Instructions for Improving Token-based Code Clone Detection Hiroaki Murakami, Keisuke Hotta, Yoshiki

More information

Code duplication in Software Systems: A Survey

Code duplication in Software Systems: A Survey Code duplication in Software Systems: A Survey G. Anil kumar 1 Dr. C.R.K.Reddy 2 Dr. A. Govardhan 3 A. Ratna Raju 4 1,4 MGIT, Dept. of Computer science, Hyderabad, India Email: anilgkumar@mgit.ac.in, ratnaraju@mgit.ac.in

More information

Automatic Mining of Functionally Equivalent Code Fragments via Random Testing. Lingxiao Jiang and Zhendong Su

Automatic Mining of Functionally Equivalent Code Fragments via Random Testing. Lingxiao Jiang and Zhendong Su Automatic Mining of Functionally Equivalent Code Fragments via Random Testing Lingxiao Jiang and Zhendong Su Cloning in Software Development How New Software Product Cloning in Software Development Search

More information

Towards the Code Clone Analysis in Heterogeneous Software Products

Towards the Code Clone Analysis in Heterogeneous Software Products Towards the Code Clone Analysis in Heterogeneous Software Products 11 TIJANA VISLAVSKI, ZORAN BUDIMAC AND GORDANA RAKIĆ, University of Novi Sad Code clones are parts of source code that were usually created

More information

ISSN: (PRINT) ISSN: (ONLINE)

ISSN: (PRINT) ISSN: (ONLINE) IJRECE VOL. 5 ISSUE 2 APR.-JUNE. 217 ISSN: 2393-928 (PRINT) ISSN: 2348-2281 (ONLINE) Code Clone Detection Using Metrics Based Technique and Classification using Neural Network Sukhpreet Kaur 1, Prof. Manpreet

More information

Proceedings of the Eighth International Workshop on Software Clones (IWSC 2014)

Proceedings of the Eighth International Workshop on Software Clones (IWSC 2014) Electronic Communications of the EASST Volume 63 (2014) Proceedings of the Eighth International Workshop on Software Clones (IWSC 2014) Toward a Code-Clone Search through the Entire Lifecycle Position

More information

Design Code Clone Detection System uses Optimal and Intelligence Technique based on Software Engineering

Design Code Clone Detection System uses Optimal and Intelligence Technique based on Software Engineering Volume 8, No. 5, May-June 2017 International Journal of Advanced Research in Computer Science RESEARCH PAPER Available Online at www.ijarcs.info ISSN No. 0976-5697 Design Code Clone Detection System uses

More information

PAPER Proposing and Evaluating Clone Detection Approaches with Preprocessing Input Source Files

PAPER Proposing and Evaluating Clone Detection Approaches with Preprocessing Input Source Files IEICE TRANS. INF. & SYST., VOL.E98 D, NO.2 FEBRUARY 2015 325 PAPER Proposing and Evaluating Clone Detection Approaches with Preprocessing Input Source Files Eunjong CHOI a), Nonmember, Norihiro YOSHIDA,

More information

On Refactoring for Open Source Java Program

On Refactoring for Open Source Java Program On Refactoring for Open Source Java Program Yoshiki Higo 1,Toshihiro Kamiya 2, Shinji Kusumoto 1, Katsuro Inoue 1 and Yoshio Kataoka 3 1 Graduate School of Information Science and Technology, Osaka University

More information

Study and Analysis of Object-Oriented Languages using Hybrid Clone Detection Technique

Study and Analysis of Object-Oriented Languages using Hybrid Clone Detection Technique Advances in Computational Sciences and Technology ISSN 0973-6107 Volume 10, Number 6 (2017) pp. 1635-1649 Research India Publications http://www.ripublication.com Study and Analysis of Object-Oriented

More information

A Technique to Detect Multi-grained Code Clones

A Technique to Detect Multi-grained Code Clones Detection Time The Number of Detectable Clones A Technique to Detect Multi-grained Code Clones Yusuke Yuki, Yoshiki Higo, and Shinji Kusumoto Graduate School of Information Science and Technology, Osaka

More information

Lecture 25 Clone Detection CCFinder. EE 382V Spring 2009 Software Evolution - Instructor Miryung Kim

Lecture 25 Clone Detection CCFinder. EE 382V Spring 2009 Software Evolution - Instructor Miryung Kim Lecture 25 Clone Detection CCFinder Today s Agenda (1) Recap of Polymetric Views Class Presentation Suchitra (advocate) Reza (skeptic) Today s Agenda (2) CCFinder, Kamiya et al. TSE 2002 Recap of Polymetric

More information

An Exploratory Study on Interface Similarities in Code Clones

An Exploratory Study on Interface Similarities in Code Clones 1 st WETSoDA, December 4, 2017 - Nanjing, China An Exploratory Study on Interface Similarities in Code Clones Md Rakib Hossain Misu, Abdus Satter, Kazi Sakib Institute of Information Technology University

More information

A Tree Kernel Based Approach for Clone Detection

A Tree Kernel Based Approach for Clone Detection A Tree Kernel Based Approach for Clone Detection Anna Corazza 1, Sergio Di Martino 1, Valerio Maggio 1, Giuseppe Scanniello 2 1) University of Naples Federico II 2) University of Basilicata Outline Background

More information

DCC / ICEx / UFMG. Software Code Clone. Eduardo Figueiredo.

DCC / ICEx / UFMG. Software Code Clone. Eduardo Figueiredo. DCC / ICEx / UFMG Software Code Clone Eduardo Figueiredo http://www.dcc.ufmg.br/~figueiredo Code Clone Code Clone, also called Duplicated Code, is a well known code smell in software systems Code clones

More information

CCFinderSW: Clone Detection Tool with Flexible Multilingual Tokenization

CCFinderSW: Clone Detection Tool with Flexible Multilingual Tokenization 2017 24th Asia-Pacific Software Engineering Conference CCFinderSW: Clone Detection Tool with Flexible Multilingual Tokenization Yuichi Semura, Norihiro Yoshida, Eunjong Choi and Katsuro Inoue Osaka University,

More information

A Novel Ontology Metric Approach for Code Clone Detection Using FusionTechnique

A Novel Ontology Metric Approach for Code Clone Detection Using FusionTechnique A Novel Ontology Metric Approach for Code Clone Detection Using FusionTechnique 1 Syed MohdFazalulHaque, 2 Dr. V Srikanth, 3 Dr. E. Sreenivasa Reddy 1 Maulana Azad National Urdu University, 2 Professor,

More information

A Measurement of Similarity to Identify Identical Code Clones

A Measurement of Similarity to Identify Identical Code Clones The International Arab Journal of Information Technology, Vol. 12, No. 6A, 2015 735 A Measurement of Similarity to Identify Identical Code Clones Mythili ShanmughaSundaram and Sarala Subramani Department

More information

SourcererCC -- Scaling Code Clone Detection to Big-Code

SourcererCC -- Scaling Code Clone Detection to Big-Code SourcererCC -- Scaling Code Clone Detection to Big-Code What did this paper do? SourcererCC a token-based clone detector, that can detect both exact and near-miss clones from large inter project repositories

More information

Refactoring Support Based on Code Clone Analysis

Refactoring Support Based on Code Clone Analysis Refactoring Support Based on Code Clone Analysis Yoshiki Higo 1,Toshihiro Kamiya 2, Shinji Kusumoto 1 and Katsuro Inoue 1 1 Graduate School of Information Science and Technology, Osaka University, Toyonaka,

More information

NICAD: Accurate Detection of Near-Miss Intentional Clones Using Flexible Pretty-Printing and Code Normalization

NICAD: Accurate Detection of Near-Miss Intentional Clones Using Flexible Pretty-Printing and Code Normalization NICAD: Accurate Detection of Near-Miss Intentional Clones Using Flexible Pretty-Printing and Code Normalization Chanchal K. Roy and James R. Cordy School of Computing, Queen s University Kingston, ON,

More information

DETECTING SIMPLE AND FILE CLONES IN SOFTWARE

DETECTING SIMPLE AND FILE CLONES IN SOFTWARE DETECTING SIMPLE AND FILE CLONES IN SOFTWARE *S.Ajithkumar, P.Gnanagurupandian, M.Senthilvadivelan, Final year Information Technology **Mr.K.Palraj ME, Assistant Professor, ABSTRACT: The objective of this

More information

To Enhance Type 4 Clone Detection in Clone Testing Swati Sharma #1, Priyanka Mehta #2 1 M.Tech Scholar,

To Enhance Type 4 Clone Detection in Clone Testing Swati Sharma #1, Priyanka Mehta #2 1 M.Tech Scholar, To Enhance Type 4 Clone Detection in Clone Testing Swati Sharma #1, Priyanka Mehta #2 1 M.Tech Scholar, 2 Head of Department, Department of Computer Science & Engineering, Universal Institute of Engineering

More information

Cross Language Higher Level Clone Detection- Between Two Different Object Oriented Programming Language Source Codes

Cross Language Higher Level Clone Detection- Between Two Different Object Oriented Programming Language Source Codes Cross Language Higher Level Clone Detection- Between Two Different Object Oriented Programming Language Source Codes 1 K. Vidhya, 2 N. Sumathi, 3 D. Ramya, 1, 2 Assistant Professor 3 PG Student, Dept.

More information

Sub-clones: Considering the Part Rather than the Whole

Sub-clones: Considering the Part Rather than the Whole Sub-clones: Considering the Part Rather than the Whole Robert Tairas 1 and Jeff Gray 2 1 Department of Computer and Information Sciences, University of Alabama at Birmingham, Birmingham, AL 2 Department

More information

IJREAS Volume 2, Issue 2 (February 2012) ISSN: SOFTWARE CLONING IN EXTREME PROGRAMMING ENVIRONMENT ABSTRACT

IJREAS Volume 2, Issue 2 (February 2012) ISSN: SOFTWARE CLONING IN EXTREME PROGRAMMING ENVIRONMENT ABSTRACT SOFTWARE CLONING IN EXTREME PROGRAMMING ENVIRONMENT Ginika Mahajan* Ashima** ABSTRACT Software systems are evolving by adding new functions and modifying existing functions over time. Through the evolution,

More information

Dr. Sushil Garg Professor, Dept. of Computer Science & Applications, College City, India

Dr. Sushil Garg Professor, Dept. of Computer Science & Applications, College City, India Volume 3, Issue 11, November 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Study of Different

More information

Deckard: Scalable and Accurate Tree-based Detection of Code Clones. Lingxiao Jiang, Ghassan Misherghi, Zhendong Su, Stephane Glondu

Deckard: Scalable and Accurate Tree-based Detection of Code Clones. Lingxiao Jiang, Ghassan Misherghi, Zhendong Su, Stephane Glondu Deckard: Scalable and Accurate Tree-based Detection of Code Clones Lingxiao Jiang, Ghassan Misherghi, Zhendong Su, Stephane Glondu The Problem Find similar code in large code bases, often referred to as

More information

Accuracy Enhancement in Code Clone Detection Using Advance Normalization

Accuracy Enhancement in Code Clone Detection Using Advance Normalization Accuracy Enhancement in Code Clone Detection Using Advance Normalization 1 Ritesh V. Patil, 2 S. D. Joshi, 3 Digvijay A. Ajagekar, 4 Priyanka A. Shirke, 5 Vivek P. Talekar, 6 Shubham D. Bankar 1 Research

More information

Similar Code Detection and Elimination for Erlang Programs

Similar Code Detection and Elimination for Erlang Programs Similar Code Detection and Elimination for Erlang Programs Huiqing Li and Simon Thompson School of Computing, University of Kent, UK {H.Li, S.J.Thompson}@kent.ac.uk Abstract. A well-known bad code smell

More information

Incremental Clone Detection and Elimination for Erlang Programs

Incremental Clone Detection and Elimination for Erlang Programs Incremental Clone Detection and Elimination for Erlang Programs Huiqing Li and Simon Thompson School of Computing, University of Kent, UK {H.Li, S.J.Thompson}@kent.ac.uk Abstract. A well-known bad code

More information

KClone: A Proposed Approach to Fast Precise Code Clone Detection

KClone: A Proposed Approach to Fast Precise Code Clone Detection KClone: A Proposed Approach to Fast Precise Code Clone Detection Yue Jia 1, David Binkley 2, Mark Harman 1, Jens Krinke 1 and Makoto Matsushita 3 1 King s College London 2 Loyola College in Maryland 3

More information

Detection and Behavior Identification of Higher-Level Clones in Software

Detection and Behavior Identification of Higher-Level Clones in Software Detection and Behavior Identification of Higher-Level Clones in Software Swarupa S. Bongale, Prof. K. B. Manwade D. Y. Patil College of Engg. & Tech., Shivaji University Kolhapur, India Ashokrao Mane Group

More information

Software Clone Detection and Refactoring

Software Clone Detection and Refactoring Software Clone Detection and Refactoring Francesca Arcelli Fontana *, Marco Zanoni *, Andrea Ranchetti * and Davide Ranchetti * * University of Milano-Bicocca, Viale Sarca, 336, 20126 Milano, Italy, {arcelli,marco.zanoni}@disco.unimib.it,

More information

Taxonomy Dimensions of Complexity Metrics

Taxonomy Dimensions of Complexity Metrics 96 Int'l Conf. Software Eng. Research and Practice SERP'15 Taxonomy Dimensions of Complexity Metrics Bouchaib Falah 1, Kenneth Magel 2 1 Al Akhawayn University, Ifrane, Morocco, 2 North Dakota State University,

More information

Problematic Code Clones Identification using Multiple Detection Results

Problematic Code Clones Identification using Multiple Detection Results Problematic Code Clones Identification using Multiple Detection Results Yoshiki Higo, Ken-ichi Sawa, and Shinji Kusumoto Graduate School of Information Science and Technology, Osaka University, 1-5, Yamadaoka,

More information

LEXIMET: A Lexical Analyzer Generator including McCabe's Metrics.

LEXIMET: A Lexical Analyzer Generator including McCabe's Metrics. IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661, p- ISSN: 2278-8727Volume 16, Issue 1, Ver. VI (Feb. 2014), PP 11-18 LEXIMET: A Lexical Analyzer Generator including McCabe's Metrics.

More information

Efficiently Measuring an Accurate and Generalized Clone Detection Precision using Clone Clustering

Efficiently Measuring an Accurate and Generalized Clone Detection Precision using Clone Clustering Efficiently Measuring an Accurate and Generalized Clone Detection Precision using Clone Clustering Jeffrey Svajlenko Chanchal K. Roy Department of Computer Science, University of Saskatchewan, Saskatoon,

More information

CnP: Towards an Environment for the Proactive Management of Copy-and-Paste Programming

CnP: Towards an Environment for the Proactive Management of Copy-and-Paste Programming CnP: Towards an Environment for the Proactive Management of Copy-and-Paste Programming Daqing Hou, Patricia Jablonski, and Ferosh Jacob Electrical and Computer Engineering, Clarkson University, Potsdam,

More information

Code Clone Detector: A Hybrid Approach on Java Byte Code

Code Clone Detector: A Hybrid Approach on Java Byte Code Code Clone Detector: A Hybrid Approach on Java Byte Code Thesis submitted in partial fulfillment of the requirements for the award of degree of Master of Engineering in Software Engineering Submitted By

More information

code pattern analysis of object-oriented programming languages

code pattern analysis of object-oriented programming languages code pattern analysis of object-oriented programming languages by Xubo Miao A thesis submitted to the School of Computing in conformity with the requirements for the degree of Master of Science Queen s

More information

SOURCE CODE RETRIEVAL USING SEQUENCE BASED SIMILARITY

SOURCE CODE RETRIEVAL USING SEQUENCE BASED SIMILARITY SOURCE CODE RETRIEVAL USING SEQUENCE BASED SIMILARITY Yoshihisa Udagawa Faculty of Engineering, Tokyo Polytechnic University, Atsugi City, Kanagawa, Japan udagawa@cs.t-kougei.ac.jp ABSTRACT Duplicate code

More information

FORTH SEMESTER DIPLOMA EXAMINATION IN ENGINEERING/ TECHNOLIGY- OCTOBER, 2012 DATA STRUCTURE

FORTH SEMESTER DIPLOMA EXAMINATION IN ENGINEERING/ TECHNOLIGY- OCTOBER, 2012 DATA STRUCTURE TED (10)-3071 Reg. No.. (REVISION-2010) Signature. FORTH SEMESTER DIPLOMA EXAMINATION IN ENGINEERING/ TECHNOLIGY- OCTOBER, 2012 DATA STRUCTURE (Common to CT and IF) [Time: 3 hours (Maximum marks: 100)

More information

Packet Classification Using Dynamically Generated Decision Trees

Packet Classification Using Dynamically Generated Decision Trees 1 Packet Classification Using Dynamically Generated Decision Trees Yu-Chieh Cheng, Pi-Chung Wang Abstract Binary Search on Levels (BSOL) is a decision-tree algorithm for packet classification with superior

More information

Compiling clones: What happens?

Compiling clones: What happens? Compiling clones: What happens? Oleksii Kononenko, Cheng Zhang, and Michael W. Godfrey David R. Cheriton School of Computer Science University of Waterloo, Canada {okononen, c16zhang, migod}@uwaterloo.ca

More information

CONVERTING CODE CLONES TO ASPECTS USING ALGORITHMIC APPROACH

CONVERTING CODE CLONES TO ASPECTS USING ALGORITHMIC APPROACH CONVERTING CODE CLONES TO ASPECTS USING ALGORITHMIC APPROACH by Angad Singh Gakhar, B.Tech., Guru Gobind Singh Indraprastha University, 2009 A thesis submitted to the Faculty of Graduate and Postdoctoral

More information

A Mutation / Injection-based Automatic Framework for Evaluating Code Clone Detection Tools

A Mutation / Injection-based Automatic Framework for Evaluating Code Clone Detection Tools A Mutation / Injection-based Automatic Framework for Evaluating Code Clone Detection Tools Chanchal K. Roy and James R. Cordy School of Computing, Queen s University Kingston, ON, Canada K7L 3N6 {croy,

More information

Code Clone Detection Technique Using Program Execution Traces

Code Clone Detection Technique Using Program Execution Traces 1,a) 2,b) 1,c) Code Clone Detection Technique Using Program Execution Traces Masakazu Ioka 1,a) Norihiro Yoshida 2,b) Katsuro Inoue 1,c) Abstract: Code clone is a code fragment that has identical or similar

More information

Parser Design. Neil Mitchell. June 25, 2004

Parser Design. Neil Mitchell. June 25, 2004 Parser Design Neil Mitchell June 25, 2004 1 Introduction A parser is a tool used to split a text stream, typically in some human readable form, into a representation suitable for understanding by a computer.

More information

Performance Evaluation and Comparative Analysis of Code- Clone-Detection Techniques and Tools

Performance Evaluation and Comparative Analysis of Code- Clone-Detection Techniques and Tools , pp. 31-50 http://dx.doi.org/10.14257/ijseia.2017.11.3.04 Performance Evaluation and Comparative Analysis of Code- Clone-Detection Techniques and Tools Harpreet Kaur 1 * (Assistant Professor) and Raman

More information

From Whence It Came: Detecting Source Code Clones by Analyzing Assembler

From Whence It Came: Detecting Source Code Clones by Analyzing Assembler From Whence It Came: Detecting Source Code Clones by Analyzing Assembler Ian J. Davis and Michael W. Godfrey David R. Cheriton School of Computer Science University of Waterloo Waterloo, Ontario, Canada

More information

Chapter 12: Indexing and Hashing. Basic Concepts

Chapter 12: Indexing and Hashing. Basic Concepts Chapter 12: Indexing and Hashing! Basic Concepts! Ordered Indices! B+-Tree Index Files! B-Tree Index Files! Static Hashing! Dynamic Hashing! Comparison of Ordered Indexing and Hashing! Index Definition

More information

File Structures and Indexing

File Structures and Indexing File Structures and Indexing CPS352: Database Systems Simon Miner Gordon College Last Revised: 10/11/12 Agenda Check-in Database File Structures Indexing Database Design Tips Check-in Database File Structures

More information

Software Clone Detection Using Cosine Distance Similarity

Software Clone Detection Using Cosine Distance Similarity Software Clone Detection Using Cosine Distance Similarity A Dissertation SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENT FOR THE AWARD OF DEGREE OF MASTER OF TECHNOLOGY IN COMPUTER SCIENCE & ENGINEERING

More information

Enhancing Program Dependency Graph Based Clone Detection Using Approximate Subgraph Matching

Enhancing Program Dependency Graph Based Clone Detection Using Approximate Subgraph Matching Enhancing Program Dependency Graph Based Clone Detection Using Approximate Subgraph Matching A DISSERTATION SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENT FOR THE AWARD OF THE DEGREE OF MASTER OF

More information

Visualization of Clone Detection Results

Visualization of Clone Detection Results Visualization of Clone Detection Results Robert Tairas and Jeff Gray Department of Computer and Information Sciences University of Alabama at Birmingham Birmingham, AL 5294-1170 1-205-94-221 {tairasr,

More information

Chapter 12: Indexing and Hashing

Chapter 12: Indexing and Hashing Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B+-Tree Index Files B-Tree Index Files Static Hashing Dynamic Hashing Comparison of Ordered Indexing and Hashing Index Definition in SQL

More information

A Survey of Software Clone Detection Techniques

A Survey of Software Clone Detection Techniques A Survey of Software Detection Techniques Abdullah Sheneamer Department of Computer Science University of Colorado at Colo. Springs, USA Colorado Springs, USA asheneam@uccs.edu Jugal Kalita Department

More information

Gapped Code Clone Detection with Lightweight Source Code Analysis

Gapped Code Clone Detection with Lightweight Source Code Analysis Gapped Code Clone Detection with Lightweight Source Code Analysis Hiroaki Murakami, Keisuke Hotta, Yoshiki Higo, Hiroshi Igaki, Shinji Kusumoto Graduate School of Information Science and Technology, Osaka

More information

Clone code detector using Boyer Moore string search algorithm integrated with ontology editor

Clone code detector using Boyer Moore string search algorithm integrated with ontology editor EUROPEAN ACADEMIC RESEARCH Vol. IV, Issue 2/ May 2016 ISSN 2286-4822 www.euacademic.org Impact Factor: 3.4546 (UIF) DRJI Value: 5.9 (B+) Clone code detector using Boyer Moore string search algorithm integrated

More information

MeCC: Memory Comparisonbased Clone Detector

MeCC: Memory Comparisonbased Clone Detector MeCC: Memory Comparisonbased Clone Detector Heejung Kim 1, Yungbum Jung 1, Sunghun Kim 2, and Kwangkeun Yi 1 1 Seoul National University 2 The Hong Kong University of Science and Technology http://ropas.snu.ac.kr/mecc/

More information

for (i=1; i<=100000; i++) { x = sqrt (y); // square root function cout << x+i << endl; }

for (i=1; i<=100000; i++) { x = sqrt (y); // square root function cout << x+i << endl; } Ex: The difference between Compiler and Interpreter The interpreter actually carries out the computations specified in the source program. In other words, the output of a compiler is a program, whereas

More information

Code Clone Detection on Specialized PDGs with Heuristics

Code Clone Detection on Specialized PDGs with Heuristics 2011 15th European Conference on Software Maintenance and Reengineering Code Clone Detection on Specialized PDGs with Heuristics Yoshiki Higo Graduate School of Information Science and Technology Osaka

More information

On the Robustness of Clone Detection to Code Obfuscation

On the Robustness of Clone Detection to Code Obfuscation On the Robustness of Clone Detection to Code Obfuscation Sandro Schulze TU Braunschweig Braunschweig, Germany sandro.schulze@tu-braunschweig.de Daniel Meyer University of Magdeburg Magdeburg, Germany Daniel3.Meyer@st.ovgu.de

More information

Code Similarity Detection by Program Dependence Graph

Code Similarity Detection by Program Dependence Graph 2016 International Conference on Computer Engineering and Information Systems (CEIS-16) Code Similarity Detection by Program Dependence Graph Zhen Zhang, Hai-Hua Yan, Xiao-Wei Zhang Dept. of Computer Science,

More information

An Information Retrieval Process to Aid in the Analysis of Code Clones

An Information Retrieval Process to Aid in the Analysis of Code Clones An Information Retrieval Process to Aid in the Analysis of Code Clones Robert Tairas Jeff Gray Abstract The advent of new static analysis tools has automated the searching for code clones, which are duplicated

More information

Intro to DB CHAPTER 12 INDEXING & HASHING

Intro to DB CHAPTER 12 INDEXING & HASHING Intro to DB CHAPTER 12 INDEXING & HASHING Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B+-Tree Index Files B-Tree Index Files Static Hashing Dynamic Hashing Comparison of Ordered Indexing

More information

A Study on A Tool to Suggest Similar Program Element Modifications

A Study on A Tool to Suggest Similar Program Element Modifications WASEDA UNIVERSITY Graduate School of Fundamental Science and Engineering A Study on A Tool to Suggest Similar Program Element Modifications A Thesis Submitted in Partial Fulfillment of the Requirements

More information

A Weighted Layered Approach for Code Clone Detection

A Weighted Layered Approach for Code Clone Detection Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 12, December 2014,

More information

Comparison of Online Record Linkage Techniques
