Detecting and Analyzing Code Clones in HDL


Kyohei Uemura, Akira Mori, Kenji Fujiwara, Eunjong Choi, and Hajimu Iida
Nara Institute of Science and Technology, Japan
National Institute of Advanced Industrial Science and Technology, Japan
National Institute of Technology, Toyota College, Japan

Abstract

In this paper, we study code clones in hardware description languages (HDLs) in comparison with general programming languages. For this purpose, we have developed a method for detecting code clones in Verilog HDL. The key idea of the proposed method is to convert Verilog HDL code into pseudo C++ code, which is then processed by an existing code clone detector for C++. We conducted an experiment on 10 open source hardware products described in Verilog HDL, in which we detected nearly 1,800 clone sets with approximately 80% precision. We compared code clones in Verilog HDL with those in Java and C using metrics to identify the differences among the languages. We also identified patterns in how code clones are created in Verilog HDL, including cases where cloning increases the stability and the parallel processing capability of the circuit.

I. INTRODUCTION

Recently, hardware-based computation has been studied, in addition to software development on general-purpose CPUs, as a way to improve efficiency and save energy [1], [2]. A field-programmable gate array (FPGA) is hardware whose circuit structure can be rewritten. In general, the structure of circuits on FPGAs is defined in hardware description languages (HDLs) such as VHDL and Verilog HDL. The number of logic gates and the clock frequency of FPGAs are growing, and an FPGA with a large number of gates can implement a single complex circuit. As the performance of FPGAs improves, their usage is expanding. For example, Bing, the internet search engine operated by Microsoft, uses FPGA clusters [3]. However, the development of complex circuits takes a long time for validation and bug fixes.
The growing development period will become more problematic in the future. HDLs have characteristics similar to those of general programming languages, so applying software engineering methods to HDLs is important for improving the efficiency of circuit development. Sudakrishnan et al. investigated the patterns of bug fixes in hardware development projects written in Verilog HDL [4]. Duley et al. proposed a differencing algorithm for Verilog HDL [5]. As far as we know, however, the structure of HDL code has not been studied in depth.

In this paper, we focus on code clones in HDL code. A code clone is a duplicated code fragment in the source code. In general, a code clone is created by a copy-and-paste operation when a developer adds a new feature similar to an existing one [6], [7]. Although reusing code fragments facilitates software development, code clones are said to be one of the factors that reduce the maintainability of source code [8], [9]. Code clone detection techniques [10], [11], [12] and code clone management techniques [13], [14], [15] have been proposed.

Our contributions in this paper are as follows. First, we developed a code clone detection method for Verilog HDL, which is the most popular HDL. To the best of our knowledge, this is the first reported method on the subject. The method uses CCFinderX [10], [16] to detect code clones. CCFinderX can process only C/C++, Java, C#, and COBOL source code, so we propose rules for converting Verilog HDL code into pseudo C++ code. Since the pseudo C++ code can be parsed by CCFinderX, CCFinderX can detect code clones in it. We conducted an experiment on 10 open source hardware products described in Verilog HDL, in which we detected 1,798 clone sets.

Second, we analyzed the differences between code clones in general programming languages and those in Verilog HDL. We studied 10 products written in Verilog HDL, 5 in C, and 5 in Java.
By comparing the metrics, we discovered that code clones in Verilog HDL are longer and more complex than those in C/Java. In addition, the ratio of code clones in Verilog HDL is higher than that in C/Java. We also discovered that code clones in Verilog HDL are created to mitigate problems peculiar to circuits. For example, bit operations and reset circuits are described by similar statements and blocks.

The rest of the paper is organized as follows. Section II gives an overview of Verilog HDL. Section III describes the proposed method. Section IV describes the investigation design. Section V shows the results of the experiment. Section VI discusses the results and threats to validity. Section VII introduces related work. Finally, Section VIII concludes with future work.

II. OVERVIEW OF VERILOG HDL

A hardware description language (HDL) is a language for defining the structure of circuits. HDLs are typically used for developing FPGAs and application-specific integrated circuits (ASICs). Verilog HDL is the most popular HDL. In Verilog HDL, a module is the base unit for describing a circuit. In hardware development, developers define a module for each function and connect modules to construct the whole circuit. A module contains definitions of input/output ports, wires, registers, and the behaviors of the circuits. The behaviors of the circuits are described by combinatorial circuits and sequential circuits. A combinatorial circuit is defined by sets of assignment statements and function declarations. A sequential circuit is defined by always blocks, which describe events that should happen under certain conditions. A module may refer to other modules to reuse circuits that have already been defined.

III. PROPOSED METHOD

This section describes our proposed code clone detection method for Verilog HDL. Section III-A gives an overview and Section III-B describes the rules for converting Verilog HDL code into pseudo C++ code.

A. Overview

Fig. 1 shows an overview of the proposed method. In the method, Verilog HDL code is converted into pseudo C++ code, which is then processed by CCFinderX.

[Fig. 1: Overview of Proposed Method. Elements: Verilog HDL Code, Convertor, Conversion Rules, Pseudo C++ Code, CCFinderX, List of Code Clone]

CCFinderX parses the source code to identify syntactic units such as functions and statements. However, it does not perform strict parsing to construct entire abstract syntax trees (ASTs). Therefore, we can use CCFinderX to detect clones in pseudo C++ code that may not follow the grammar of C++.

Although there are tools that convert Verilog HDL into C++ [17], [18], they are not appropriate for our approach. These tools are designed for simulating a circuit written in Verilog HDL as a program and do not preserve the structure of the original code. It is not meaningful to detect code clones in the converted code when the original structure is lost. Even if we detect code clones in the converted C++ code, there may not be matching instances in the original Verilog HDL code.
In our proposed method, Verilog HDL code is converted into pseudo C++ code in a way that preserves the original structure. This approach is simpler and easier than developing a Verilog HDL parser for CCFinderX, since the conversion rules can be described by simple regular expressions. We believe that the proposed method can easily be applied to languages other than Verilog HDL.

B. Conversion Rules

Table I shows the proposed conversion rules. A Verilog HDL module is translated to a C++ class constructor whose arguments are the original input/output ports. Always and case blocks are translated to C++ function definitions, and if and else statements are translated to the corresponding C++ statements in a straightforward manner. List 1 and List 2 show an example of conversion by applying the rules.

List 1: Sample Code of Verilog HDL

    module sample (ck, j, k, q, rb, sb );
    input ck, j, k, rb, sb;
    output q;
    reg q;

    always @( posedge ck or negedge rb or negedge sb ) begin
      if ( rb == 1'b0 )
        q <= 1'b0;
      else if ( sb == 1'b0 )
        q <= 1'b1;
      else begin
        case ( { j, k } )
          2'b00 : q <= q;
          2'b01 : q <= 1'b0;
          2'b10 : q <= 1'b1;
          2'b11 : q <= q;
        endcase
      end
    end
    endmodule

List 2: Sample Code of Pseudo C++

    class sample { sample (ck, j, k, q, rb, sb ) {
    input ck, j, k, rb, sb;
    output q;
    reg q;

    always @( posedge ck or negedge rb or negedge sb ) {
      if ( rb == 1'b0 )
        q <= 1'b0;
      else if ( sb == 1'b0 )
        q <= 1'b1;
      else {
        case ( { j, k } ) {
          2'b00 : q <= q;
          2'b01 : q <= 1'b0;
          2'b10 : q <= 1'b1;
          2'b11 : q <= q;
        }
      }
    }
    }};

As we mentioned earlier, the converted pseudo C++ code is not compilable but can be parsed and processed by CCFinderX. In the first line of List 2, the module definition is replaced by the definition of a class and a constructor. CCFinderX recognizes the port list as the argument list of the constructor. A typical definition of an argument consists of the type and the name of the argument.
Although the converted code does not contain type information, this does not cause any problem because the CCFinderX parser is not strict. The beginning of the always block on the sixth line and the beginning of the case block on the twelfth line are regarded as function definitions.

CCFinderX outputs the position information of each clone set and calculates metrics over all clone sets. The position information consists of file paths, start token/line numbers, and end token/line numbers. The metrics are classified into file metrics and clone metrics. Since our conversion keeps the structure and line numbers of the original code, we can use the output of CCFinderX directly to inspect the cloned code regions. This is a very convenient feature of our approach.

The proposed conversion rules replace only words that will be tokenized by the CCFinderX parser. This means that the conversion preserves the token-level structure. Since CCFinderX is a token-based code clone detector, the proposed conversion neither creates code clones that do not exist in the original code nor misses code clones that do exist in it.

TABLE I: CONVERSION RULES

Original:  module <ModuleName> ( <Port List> ) begin <Description of Module> endmodule
Converted: class <ModuleName> { <ModuleName> ( <Port List> ) { <Description of Module> }};

Original:  always @( <Condition> ) begin <Description of Always-Statement> end
Converted: always @( <Condition> ) { <Description of Always-Statement> }

Original:  if ( <Condition> ) <Description of If-Statement> end
Converted: if ( <Condition> ) { <Description of If-Statement> }

Original:  else <Description of If-Statement> end
Converted: else { <Description of If-Statement> }

Original:  case ( <Condition> ) <Description of Case-Statement> end
Converted: case ( <Condition> ) { <Description of Case-Statement> }

C. Evaluations of the Proposed Method

In this section, we present evaluations of the proposed method. Table II shows the products selected for our evaluations. #Files represents the number of files. #LoC represents the total number of lines, including empty lines and comment lines.

TABLE II: SELECTED VERILOG HDL SUBJECT PRODUCTS

Product Name                      #Files   #LoC
mor1kx                            48       22,335
ridecore                          59       12,726
sd_card_mass_storage_controller   40       15,807
ethernet_tri_mode                 43       43,153
nova                              52       25,826
jpegencode                        21       34,808
sdr_ctrl                          16       10,552
hpdmc                             22       5,738
aes_highthroughput_lowarea        12       2,362
aes-encryption                    15       6,064

We chose nine Verilog HDL products from OpenCores and one Verilog HDL product from GitHub. OpenCores is a community for open source hardware and classifies circuits into several categories such as Processor, Communication Controller, Video Controller, Memory Core, and Crypto Core. We chose two products from each category except for Processor. For the Processor category, we chose mor1kx from OpenCores and ridecore from GitHub.
The ridecore product is one of several implementations of RISC-V, a project for a novel open RISC processor. Companies such as Google, Oracle, and Hewlett Packard participate in this project. The development status of all selected products is stable.

We detected code clones by three methods, including the proposed method. Every method uses CCFinderX.

Method-1 (the proposed method): Detecting code clones in the converted pseudo C++ code in the C/C++ mode.
Method-2: Detecting code clones in the original Verilog HDL code in the C/C++ mode.
Method-3: Detecting code clones in the original Verilog HDL code as plaintext in the plaintext mode.

Table III shows the detection results of the three methods. When detecting in the C/C++ mode, CCFinderX excludes empty lines, comments, and simple definitions such as consecutive variable declarations. LoC refers to the total number of lines. SLoC refers to the number of lines that are the target of detection. In this experiment, we deleted empty lines and comment lines beforehand. Therefore, the difference between LoC and SLoC represents the number of lines excluded by CCFinderX as simple definitions. CLoC refers to the number of lines that contain clone fragments. CVR is the ratio of lines that contain clone fragments. LEN denotes the number of tokens in the source code, measured after excluding the simple definitions recognized by CCFinderX. #Clone Set is the number of detected clone sets.

Method-1 (the proposed method) detected more clone sets than Method-2. In addition, the SLoC and LEN of Method-2 are smaller than those of the other methods. This means that, in Method-2, CCFinderX failed to parse the Verilog HDL code and excluded many fragments from the detection target. In Method-1, because the Verilog HDL code was converted into pseudo C++ code, CCFinderX could tokenize it correctly and detect more code clones.
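The conversion step used by Method-1 can be illustrated with a short script. The following is a minimal sketch, assuming a handful of regular expressions that approximate the rules of Table I as they are applied in List 1 and List 2; it is not the authors' actual convertor. The names `RULES`, `to_pseudo_cpp`, and `SAMPLE` are hypothetical. The sketch rewrites keywords wherever they appear, including inside comments and string literals, which a real convertor would probably need to handle.

```python
import re

# Hypothetical regex rules approximating Table I. Each rule rewrites keywords
# in place and never adds or removes a newline, so line numbers are preserved.
RULES = [
    # module <Name> ( <ports> );  ->  class <Name> { <Name> ( <ports> ) {
    (re.compile(r"\bmodule\s+(\w+)\s*\(([^)]*)\)\s*;"), r"class \1 { \1 (\2) {"),
    (re.compile(r"\bendmodule\b"), "}};"),
    (re.compile(r"\bendcase\b"), "}"),
    # case ( <cond> )  ->  case ( <cond> ) {
    (re.compile(r"\bcase\s*(\(.*\))"), r"case \1 {"),
    (re.compile(r"\bbegin\b"), "{"),
    (re.compile(r"\bend\b"), "}"),
]


def to_pseudo_cpp(verilog_source: str) -> str:
    """Apply every rule in order to the whole source text."""
    converted = verilog_source
    for pattern, replacement in RULES:
        converted = pattern.sub(replacement, converted)
    return converted


# The Verilog HDL sample from List 1.
SAMPLE = """\
module sample (ck, j, k, q, rb, sb );
input ck, j, k, rb, sb;
output q;
reg q;

always @( posedge ck or negedge rb or negedge sb ) begin
  if ( rb == 1'b0 )
    q <= 1'b0;
  else if ( sb == 1'b0 )
    q <= 1'b1;
  else begin
    case ( { j, k } )
      2'b00 : q <= q;
      2'b01 : q <= 1'b0;
      2'b10 : q <= 1'b1;
      2'b11 : q <= q;
    endcase
  end
end
endmodule
"""

if __name__ == "__main__":
    print(to_pseudo_cpp(SAMPLE))
```

Because every substitution stays within its original lines, a clone position reported for the converted file refers to the same line numbers in the original Verilog HDL file.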
When detecting as plaintext, CCFinderX searches for similar token lists in the entire code, including comments and simple definitions. If the detection ability were equal, Method-3 should have detected more code clones, because Method-3 also targets simple definitions. However, Method-3 has a lower number of detected clone sets and lower coverage (CVR) than Method-1. When tokenizing plaintext, CCFinderX splits the text at characters such as space, -, and _. Thus, an identifier written in snake case is divided into two or more different tokens. Due to these tokenization failures, Method-3 has a lower detection ability than Method-1.

IV. INVESTIGATION DESIGN

This section explains our approach to code clone analysis for Verilog HDL. Section IV-A presents the research questions and the motivations behind them. Section IV-B describes the subjects and procedures of the investigation.

A. Motivations of Research Question

In this study, we conducted an experiment to answer the following research questions.

RQ1: What is the difference between general programming languages and Verilog HDL with regard to code clones?
RQ2: Why are code clones created in Verilog HDL?

RQ1 and RQ2 are intended to clarify the characteristics of code clones in Verilog HDL, which is important for determining approaches to support code clone management in HDL. For RQ1, we focus on the clone metrics offered by CCFinderX. For RQ2, we examine detected instances to identify the situations in which code clones are created.

TABLE III: COMPARISON OF METHODS

Method                                   LoC       SLoC     CLoC    CVR   LEN          #Clone Set
Method-1 (proposed method)               144,…     …,365    47,…    …     …,547        1,798
Method-2 (not converted, as C/C++)       144,344   46,843   23,…    …     …,166        1,066
Method-3 (not converted, as plaintext)   144,…     …,344    45,…    …     …,634,703    1,610

TABLE IV: FILE METRICS

Metrics   Definition
LEN       The number of tokens in a file.
CLN       The number of clone fragments included in the file.
NBR       The number of files which have clone pairs to other files.
RSA       The ratio of tokens that are clone fragments between other files.
RSI       The ratio of tokens that are clone fragments within only same file.
CVR       The ratio of tokens that are clone fragments.
RNR       The ratio of non repeated code.

TABLE V: CLONE SET METRICS

Metrics   Definition
LEN       The length (number of tokens) of the clone fragment.
POP       The number of clone fragments included in the clone set.
NIF       The number of files which have one or more clone fragments included in the clone set.
RAD       The range of clone fragments in the directory tree.
RNR       The ratio of tokens which are not included in a repeated fragment.
TKS       The number of kinds of the token included in the clone set.
LOOP      The number of loops included in the clone fragment.
COND      The number of conditional branches included in the clone fragment.

B. Procedure of Investigation

To answer RQ1, we investigated the clone set metrics and file metrics calculated by GemX, a GUI frontend of CCFinderX [19]. Table IV and Table V explain the meaning of the metrics.
We focused on the differences in the metrics between Verilog HDL and general programming languages. As general programming languages, we selected two languages with different abstraction levels, C and Java.

To answer RQ2, we analyzed the code clones detected in Verilog HDL. For this, we chose instances that are peculiar in terms of the differences in the metrics. For comparison, we collected the 10 Verilog HDL products shown in Table II and, for each of C and Java, the top 5 products by number of stars on GitHub, as shown in Table VI.

TABLE VI: SELECTED C/JAVA SUBJECT PRODUCTS

Language   Product Name                              #Files   #LoC
C          git                                       …        …,480
C          How-to-Make-a-computer-Operating-System   …        …,947
C          Linux                                     23,485   15,870,993
C          netdata                                   53       33,883
C          redis                                     …        …,296
Java       elasticsearch                             5,…      …,983
Java       okhttp                                    …        …,144
Java       react-native                              …        …,455
Java       retrofit                                  …        …,304
Java       RxJava                                    …        …,685

V. INVESTIGATION RESULTS

In this section, we show the investigation results on the characteristics of code clones in Verilog HDL.

A. Quantitative Analysis

Table VII shows the file metrics and the clone set metrics. Each value is the average, over the products, of the per-product median.

Let us first look at the file metrics. Verilog HDL has the largest number of clone fragments per file. In addition, Verilog HDL has the highest coverage of files by code clones. On the other hand, the NBR values indicate that clone pairs are less distributed across files in Verilog HDL. It should be noted, however, that this result may have been affected by the fact that the Verilog HDL products we chose do not include large numbers of files.

Now let us turn to the clone set metrics. Verilog HDL has the smallest number of clone sets, but its value of POP (the number of code clones contained in one clone set) is the largest. In addition, Verilog HDL has the longest clone fragments. The COND values, which indicate code complexity, are significantly larger in Verilog HDL and C than in Java.
From these results, we can say that code in Verilog HDL products tends to be copied in large chunks to multiple places across relatively few files. In addition, these code clones contain many conditional branches, such as if or case statements, and hence are logically complex.
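The aggregation described above, an average over products of each product's median, can be sketched as follows. This is a minimal illustration with made-up numbers, not data from the study; the function names `average_of_medians` and `pooled_median` and the dictionary `EXAMPLE` are hypothetical. The comparison with a pooled median shows one plausible reason for aggregating this way: a product with many files or clone sets does not dominate the result.

```python
from statistics import mean, median


def average_of_medians(values_by_product):
    """Take the median of a metric within each product, then average the medians."""
    return mean(median(values) for values in values_by_product.values())


def pooled_median(values_by_product):
    """For contrast: the median over all values, ignoring product boundaries."""
    pooled = [v for values in values_by_product.values() for v in values]
    return median(pooled)


# Made-up metric values (for example, a clone set metric per clone set).
EXAMPLE = {
    "product_a": [10, 12, 14, 16, 18],  # many clone sets
    "product_b": [2],                   # a single clone set
}

if __name__ == "__main__":
    print(average_of_medians(EXAMPLE))  # medians are 14 and 2
    print(pooled_median(EXAMPLE))       # dominated by product_a
```

With the average of medians each product contributes equally, whereas the pooled median mostly reflects the larger product.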

[TABLE VII: FILE AND CLONE SET METRICS. Columns: Verilog HDL, C, Java. File metrics: LEN, CLN, NBR, RSA, RSI, CVR, RNR. Clone set metrics: #Clone Set, LEN, POP, NIF, RAD, RNR, TKS, LOOP, COND.]

B. Qualitative Analysis

Let us examine code clone instances detected in Verilog HDL products, guided by the metrics values in comparison with Java and C, to identify frequent code clone patterns in Verilog HDL. In the following, we analyze code clone instances classified by syntactic unit.

1) Assign statements: Code clones involving assignment statements are frequently found in Verilog HDL. For example, in the code fragment shown in List 3, the rows of consecutive non-blocking assignments are detected as a clone set. The assign statement shown in List 4 also appears in different files. These code clones have high POP values and low RNR values. Bit array operations are usually modeled by lists of similar assignments, and developers often copy and paste such code fragments for convenience. However, this can cause defects when variable names are changed inconsistently. In general, one has to edit all clone fragments when the original statements are changed. Macros may be used to avoid this inconvenience. In addition, tools for consistently renaming identifiers have been proposed [13].

2) Always block: List 5 contains an example of code clones consisting of always blocks. An always block is a syntactic unit that describes a circuit that is executed asynchronously in accordance with the rising/falling edge of a signal. A reset circuit is usually defined by an always block and is an obvious cause of code clones. These code clones also have high POP values and low RNR values.

3) If/case block: Circuits that change their output depending on the state of registers are common. The behavior of such a circuit is usually defined by conditional branches. There are many code clones involving if/case statements, as shown in List 6.
This explains why the COND values for Verilog HDL products are much higher than those for Java products. These code clones can be merged by rearranging the conditional expressions. However, doing so has side effects. The degree of parallel processing of the circuit may decrease. Also, the operation of the circuit may become unstable, since there may be signals, such as the reset signal, that should not be used in logical operations with other signals.

List 3: nova/src/bs_decoding.v

    mvx_H0_diff_a <= (MB_inter_size != I8x8)? 0:mvx_CurrMb2[7:0];
    mvx_H0_diff_b <= (MB_inter_size != I8x8)? 0:mvx_CurrMb2[23:16];
    mvx_H1_diff_a <= (MB_inter_size != I8x8)? 0:mvx_CurrMb2[15:8];
    mvx_H1_diff_b <= (MB_inter_size != I8x8)? 0:mvx_CurrMb2[31:24];
    mvx_H2_diff_a <= (MB_inter_size != I8x8)? 0:mvx_CurrMb3[7:0];
    mvx_H2_diff_b <= (MB_inter_size != I8x8)? 0:mvx_CurrMb3[23:16];
    mvx_H3_diff_a <= (MB_inter_size != I8x8)? 0:mvx_CurrMb3[15:8];
    mvx_H3_diff_b <= (MB_inter_size != I8x8)? 0:mvx_CurrMb3[31:24];

    mvy_H0_diff_a <= (MB_inter_size != I8x8)? 0:mvy_CurrMb2[7:0];
    mvy_H0_diff_b <= (MB_inter_size != I8x8)? 0:mvy_CurrMb2[23:16];
    mvy_H1_diff_a <= (MB_inter_size != I8x8)? 0:mvy_CurrMb2[15:8];
    mvy_H1_diff_b <= (MB_inter_size != I8x8)? 0:mvy_CurrMb2[31:24];
    mvy_H2_diff_a <= (MB_inter_size != I8x8)? 0:mvy_CurrMb3[7:0];
    mvy_H2_diff_b <= (MB_inter_size != I8x8)? 0:mvy_CurrMb3[23:16];
    mvy_H3_diff_a <= (MB_inter_size != I8x8)? 0:mvy_CurrMb3[15:8];
    mvy_H3_diff_b <= (MB_inter_size != I8x8)? 0:mvy_CurrMb3[31:24];

List 4: mor1kx/rtl/verilog/mor1kx_lsu_cappuccino.v

    assign next_dbus_adr = (OPTION_DCACHE_BLOCK_WIDTH == 5)?
        {dbus_adr[31:5], dbus_adr[4:0] + 5'd4} : // 32 byte
        {dbus_adr[31:4], dbus_adr[3:0] + 4'd4};  // 16 byte

List 5: aes-encryption/aes_10cycle_10stage/aes_cipher_top.v

    always @(posedge clk)
    begin
        text_out_stage9[127:120] <= sa00_next_round10;
        text_out_stage9[095:088] <= sa01_next_round10;
        text_out_stage9[063:056] <= sa02_next_round10;
        text_out_stage9[031:024] <= sa03_next_round10;
        text_out_stage9[119:112] <= sa10_next_round10;
        text_out_stage9[087:080] <= sa11_next_round10;
        text_out_stage9[055:048] <= sa12_next_round10;
        text_out_stage9[023:016] <= sa13_next_round10;
        text_out_stage9[111:104] <= sa20_next_round10;
        text_out_stage9[079:072] <= sa21_next_round10;
        text_out_stage9[047:040] <= sa22_next_round10;
        text_out_stage9[015:008] <= sa23_next_round10;
        text_out_stage9[103:096] <= sa30_next_round10;
        text_out_stage9[071:064] <= sa31_next_round10;
        text_out_stage9[039:032] <= sa32_next_round10;
        text_out_stage9[007:000] <= sa33_next_round10;
    end

List 6: sdr_ctrl/verif/model/mt48lc8m8a2.v

    if ((Auto_precharge[0] == 1'b1) && (Read_precharge[0] == 1'b1)) begin
        if ((($time - RAS_chk0 >= tras) &&                              // Case 2
            ((Burst_length_1 == 1'b1 && Count_precharge[0] >= 1) ||     // Case 1
             (Burst_length_2 == 1'b1 && Count_precharge[0] >= 2) ||
             (Burst_length_4 == 1'b1 && Count_precharge[0] >= 4) ||
             (Burst_length_8 == 1'b1 && Count_precharge[0] >= 8))) ||
             (RW_interrupt_read[0] == 1'b1)) begin                      // Case 3
            Pc_b0 = 1'b1;
            Act_b0 = 1'b0;
            RP_chk0 = $time;
            Auto_precharge[0] = 1'b0;
            Read_precharge[0] = 1'b0;
            RW_interrupt_read[0] = 1'b0;
            if (Debug) $display ("at time %t NOTE : Start Internal Auto Precharge for Bank 0", $time);
        end
    end

4) Module/file: Files or module blocks are frequently cloned in Verilog HDL. This explains why code clones tend to be longer and clone coverage tends to be higher in Verilog HDL products than in C/Java products. Some of these code clones are created to increase the degree of parallel processing.
However, most of these are created to implement multiple specifications. In circuit development, similar circuits are often developed simultaneously. For example, circuits with different performance characteristics, such as the number of bits or the speed, are defined in the same Verilog HDL source files. This is particularly common in FPGA development, where there are many variations with different specifications. Because Verilog HDL relies on low-level features for defining raw events in circuits, it is difficult to merge this type of code clone.

VI. DISCUSSION

This section answers the RQs based on the preceding investigations. In addition, we discuss threats to validity.

A. Answer to Research Question

1) RQ1: What is the difference between general programming languages and Verilog HDL with regard to code clones?

By comparing the metrics of C/Java products with those of Verilog HDL products, we identified several differences. Code clones in Verilog HDL do not spread across as many files as those in C/Java. On the other hand, the clone coverage of Verilog HDL files is higher than that of C/Java files. These results indicate that code clones in Verilog HDL are relatively large and spread within a single file or a small number of files. In addition, clone fragments in Verilog HDL contain numerous conditional branches. The overall trend of the Verilog HDL metrics is closer to C than to Java. This may reflect the fact that both C and Verilog HDL are used for describing low-level aspects of a system, such as device drivers and circuit constructions, respectively. Java, on the other hand, is an object-oriented language capable of abstract description.

Through qualitative analysis, we found many code clones in Verilog HDL involving if statements in which only the conditions differ. From a software developer's viewpoint, such code should be merged by reorganizing the conditions. However, if these if statements are merged, problems in circuit performance and stability may occur. The problems include an increase in circuit footprint, a decrease in processing speed, and unintended behaviors. This is an important difference between general programming languages and Verilog HDL.
HDLs must deal with not only logical but also physical consequences.

2) RQ2: Why are code clones created?

As a result of our observation, the following patterns of code clones were confirmed.

1: Code clones related to assignment statements.
2: Code clones which are always block units.
3: Code clones including if/case blocks.
4: Code clones which are module/file units.

Bit operations are characteristic of digital circuit design, and they are repeated in many places in the form of assignment statements. This accounts for the code clones in the first category above. Always blocks specifying asynchronous events are often replicated, with the conditions in their if/case statements modified, to increase the degree of parallel processing and the stability of the system. This accounts for the second and third categories. Multiple products with different performance requirements are often developed simultaneously, and the lack of abstraction support in Verilog HDL forces large-scale duplication. This accounts for the last category.

B. Threats to Validity

Although we have argued for the effectiveness of the conversion from Verilog HDL to pseudo C++, we have not thoroughly evaluated the accuracy of code clone detection. Since no tokens recognized by CCFinderX are added or deleted, the accuracy of the proposed method depends on the ability of CCFinderX. The first author manually classified the detected code clones and confirmed that 1,466 cases, about 80% of the detection results, are actual code clones. The remaining 20% of the detection results are unrelated code fragments in which the sequence of grammatical elements happens to be identical. This is characteristic of token-based detection tools such as CCFinderX. To evaluate accuracy, we need an oracle. However, it is hard to determine the complete set of clones to serve as an oracle. Bellon et al. created such an oracle for evaluating multiple detection tools [20].
Roy and Cordy proposed a method to automatically evaluate recall by constructing an obviously correct answer using mutation/injection [21]. Evaluation of the proposed method, especially its recall, is left for future work.

In this study, we investigated 10 Verilog HDL products and 10 C/Java products. These may not be sufficient for us to generalize the results reported in this paper. However, we chose the Verilog HDL products from 5 categories, so the characteristics of Verilog HDL code clones should be well covered. Although the current results are based on observation of code clones, we plan to analyze statistical differences across languages in the future.

In this study, we relied on CCFinderX for code clone detection. CCFinderX cannot detect Type-3 code clones, which are code fragments extensively modified after duplication. Although we believe that our analysis based on Type-1 and Type-2 code clones covers fundamental aspects of code clones in Verilog HDL, analysis of Type-3 code clones may offer new insights. Since our method is applicable to other code clone detection tools that allow non-strict parsing, we may be able to analyze Type-3 code clones in the future.

VII. RELATED WORK

Sudakrishnan et al. investigated the bug fix patterns of Verilog HDL code [4]. They reported that 29-55% of bug fix changes were related to assignment statements and 18-25% were related to if statements. Their investigation used bug reports and the change histories of the code. In our investigation, many code clones related to assignments and if statements were found. It would be interesting to investigate whether code clones affect these bugs. Alternatively, appropriate management of copy and paste may help reduce bugs.

In our proposed method, Verilog HDL code is converted into pseudo C++ code and CCFinderX detects code clones in the pseudo C++ code. As a similar method, Roy and Cordy proposed the code clone detection tool NiCAD [22].
NiCAD parses and transforms the original code according to rules and then detects code clones. The rules are described in TXL, which is a programming language designed for source code analysis. However, it is not easy to describe rules in TXL. Our proposed method can detect code clones using CCFinderX and conversion rules that consist of only a few regular expressions. This method can be applied quickly and easily to other languages.

Kapser and Godfrey classified the intentions behind code clones in software and analyzed their advantages and disadvantages [23]. We plan to carry out such a classification of code clones in Verilog HDL code as future work.

VIII. CONCLUSION

In this paper, we investigated code clones in Verilog HDL. As far as we know, no investigation of code clones in HDL has been reported before. We have proposed a simple but effective method for detecting code clones in Verilog HDL. To the best of our knowledge, this is the first reported method on the subject. The method converts Verilog HDL code to pseudo C++ code and employs CCFinderX in its C/C++ mode to detect code clones. Since the method does not add or delete any tokens recognized by CCFinderX, the detection capability depends entirely on the ability of CCFinderX to detect C++ code clones. We have conducted an experiment on 10 open source Verilog HDL products to demonstrate the effectiveness of the method. Since the proposed method is not limited to Verilog HDL, it enables code clone analysis for languages not supported by existing tools.

We have compared code clones in Verilog HDL products with those in C/Java products using metrics to identify the differences between them. Although the overall trend in the metrics for Verilog HDL products is similar to that of C, Verilog HDL products show a higher coverage of code clones than C/Java products.

We have noticed that code clones in Verilog HDL reflect how circuits are developed in the language. For instance, a bit string operation is often defined by a sequence of similar assignment statements, and reset circuits and input-dependent circuits are often defined by cloning an always block and an if/case block, respectively.
We observed many cases of file/module cloning, which result from developing multiple products with similar, yet different specifications. We believe that tool support for consistent modification over clone sets is also useful for Verilog HDL. On the other hand, we noted that aggregating clone sets in Verilog HDL must take into consideration the trade-offs between computational parallelism and circuit footprints. We plan to work on this issue in the near future.

Future work also includes measuring the recall of the proposed method. We plan to employ the clone mutation injection method [21] for this purpose. Because the proposed method cannot detect Type-3 code clones, we also plan to employ methods based on abstract syntax trees.

ACKNOWLEDGMENT

The second author is supported by JSPS KAKENHI.

REFERENCES

[1] K. Neshatpour, M. Malik, and H. Homayoun, Accelerating Machine Learning Kernel in Hadoop Using FPGAs, in Proc. of CCGrid 15, 2015.
[2] N. Katayoun, M. Maria, G. Mohammad, Ali, and H. Houman, Energy-Efficient Acceleration of Big Data Analytics Applications Using FPGAs, in Proc. of BigData 15, 2015.
[3] A. Putnam, A. Caulfield, E. Chung, D. Chiou, K. Constantinides, J. Demme, H. Esmaeilzadeh, J. Fowers, G. Gopal, J. Gray, M. Haselman, S. Hauck, S. Heil, A. Hormati, J.-Y. Kim, S. Lanka, J. Larus, E. Peterson, S. Pope, A. Smith, J. Thong, P. Xiao, and D. Burger, A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services, in Proc. of ISCA 14, 2014.
[4] S. Sudakrishnan, J. Madhavan, E. J. Whitehead, Jr., and J. Renau, Understanding Bug Fix Patterns in Verilog, in Proc. of MSR 08, 2008.
[5] A. Duley, C. Spandikow, and M. Kim, Vdiff: a program differencing algorithm for Verilog hardware description language, Automated Software Engineering, vol. 19, no. 4.
[6] I. D. Baxter, A. Yahin, L. Moura, M. Sant'Anna, and L. Bier, Clone Detection Using Abstract Syntax Trees, in Proc. of ICSM 98, 1998.
[7] M. Kim, L. Bergman, T. Lau, and D. Notkin, An Ethnographic Study of Copy and Paste Programming Practices in OOPL, in Proc. of ISESE 04, 2004.
[8] A. Zeller, Why Programs Fail. Morgan Kaufmann Publishers.
[9] B. S. Baker, Finding Clones with Dup: Analysis of an Experiment, IEEE Transactions on Software Engineering, vol. 33, no. 9.
[10] T. Kamiya, S. Kusumoto, and K. Inoue, CCFinder: a multilinguistic token-based code clone detection system for large scale source code, IEEE Transactions on Software Engineering, vol. 28, no. 7.
[11] Z. Li, S. Myagmar, S. Lu, and Y. Zhou, CP-Miner: Finding Copy-Paste and Related Bugs in Large-Scale Software Code, IEEE Transactions on Software Engineering, vol. 32.
[12] L. Jiang, G. Misherghi, Z. Su, and S. Glondu, DECKARD: Scalable and Accurate Tree-Based Detection of Code Clones, in Proc. of ICSE 07, 2007.
[13] P. Jablonski and D. Hou, CReN: A Tool for Tracking Copy-and-paste Code Clones and Renaming Identifiers Consistently in the IDE, in Proc. of OOPSLA Eclipse 07, 2007.
[14] H. A. Nguyen, T. T. Nguyen, N. Pham, J. Al-Kofahi, and T. Nguyen, Clone Management for Evolving Software, IEEE Transactions on Software Engineering, vol. 38, no. 5.
[15] E. Choi, N. Yoshida, and K. Inoue, An Investigation into the Characteristics of Merged Code Clones during Software Evolution, IEICE Transactions on Information and Systems, vol. E97.D, no. 5.
[16] CCFinder Official Site.
[17] W. Stoye, D. Greaves, N. Richards, and J. Green, Using RTL-to-C++ translation for large soc concurrent engineering: a case study, Electronics Systems and Software, vol. 1, no. 1.
[18] Verilog2C++.
[19] Y. Ueda, T. Kamiya, S. Kusumoto, and K. Inoue, Gemini: maintenance support environment based on code clone analysis, in METRICS 02, 2002.
[20] S. Bellon, R. Koschke, G. Antoniol, J. Krinke, and E. Merlo, Comparison and Evaluation of Clone Detection Tools, IEEE Transactions on Software Engineering, vol. 33, no. 9.
[21] C. K. Roy and J. R. Cordy, A Mutation/Injection-Based Automatic Framework for Evaluating Code Clone Detection Tools, in Proc. of ICSTW, 2009.
[22] C. K. Roy and J. R. Cordy, NICAD: Accurate Detection of Near-Miss Intentional Clones Using Flexible Pretty-Printing and Code Normalization, in Proc. of ICPC, 2008.
[23] C. J. Kapser and M. W. Godfrey, Cloning considered harmful considered harmful: patterns of cloning in software, Empirical Software Engineering, vol. 13, no. 6, 2008.


More information

Exploring the Design Space of Proactive Tool Support for Copy-and-Paste Programming

Exploring the Design Space of Proactive Tool Support for Copy-and-Paste Programming Exploring the Design Space of Proactive Tool Support for Copy-and-Paste Programming Daqing Hou, Ferosh Jacob, and Patricia Jablonski Electrical and Computer Engineering, Clarkson University, Potsdam, NY

More information

Deckard: Scalable and Accurate Tree-based Detection of Code Clones. Lingxiao Jiang, Ghassan Misherghi, Zhendong Su, Stephane Glondu

Deckard: Scalable and Accurate Tree-based Detection of Code Clones. Lingxiao Jiang, Ghassan Misherghi, Zhendong Su, Stephane Glondu Deckard: Scalable and Accurate Tree-based Detection of Code Clones Lingxiao Jiang, Ghassan Misherghi, Zhendong Su, Stephane Glondu The Problem Find similar code in large code bases, often referred to as

More information

Software Clone Detection. Kevin Tang Mar. 29, 2012

Software Clone Detection. Kevin Tang Mar. 29, 2012 Software Clone Detection Kevin Tang Mar. 29, 2012 Software Clone Detection Introduction Reasons for Code Duplication Drawbacks of Code Duplication Clone Definitions in the Literature Detection Techniques

More information

Research Article Software Clone Detection and Refactoring

Research Article Software Clone Detection and Refactoring ISRN Software Engineering Volume 2013, Article ID 129437, 8 pages http://dx.doi.org/10.1155/2013/129437 Research Article Software Clone Detection and Refactoring Francesca Arcelli Fontana, Marco Zanoni,

More information

A Tree Kernel Based Approach for Clone Detection

A Tree Kernel Based Approach for Clone Detection A Tree Kernel Based Approach for Clone Detection Anna Corazza 1, Sergio Di Martino 1, Valerio Maggio 1, Giuseppe Scanniello 2 1) University of Naples Federico II 2) University of Basilicata Outline Background

More information

The Verilog Language COMS W Prof. Stephen A. Edwards Fall 2002 Columbia University Department of Computer Science

The Verilog Language COMS W Prof. Stephen A. Edwards Fall 2002 Columbia University Department of Computer Science The Verilog Language COMS W4995-02 Prof. Stephen A. Edwards Fall 2002 Columbia University Department of Computer Science The Verilog Language Originally a modeling language for a very efficient event-driven

More information

Identification of Structural Clones Using Association Rule and Clustering

Identification of Structural Clones Using Association Rule and Clustering Identification of Structural Clones Using Association Rule and Clustering Dr.A.Muthu Kumaravel Dept. of MCA, Bharath University, Chennai-600073, India ABSTRACT: Code clones are similar program structures

More information

Software Clone Detection Using Cosine Distance Similarity

Software Clone Detection Using Cosine Distance Similarity Software Clone Detection Using Cosine Distance Similarity A Dissertation SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENT FOR THE AWARD OF DEGREE OF MASTER OF TECHNOLOGY IN COMPUTER SCIENCE & ENGINEERING

More information

Synthesizable Verilog

Synthesizable Verilog Synthesizable Verilog Courtesy of Dr. Edwards@Columbia, and Dr. Franzon@NCSU http://csce.uark.edu +1 (479) 575-6043 yrpeng@uark.edu Design Methodology Structure and Function (Behavior) of a Design HDL

More information

Incremental Code Clone Detection: A PDG-based Approach

Incremental Code Clone Detection: A PDG-based Approach Incremental Code Clone Detection: A PDG-based Approach Yoshiki Higo, Yasushi Ueda, Minoru Nishino, Shinji Kusumoto Graduate School of Information Science and Technology, Osaka University, 1-5, Yamadaoka,

More information

Design Code Clone Detection System uses Optimal and Intelligence Technique based on Software Engineering

Design Code Clone Detection System uses Optimal and Intelligence Technique based on Software Engineering Volume 8, No. 5, May-June 2017 International Journal of Advanced Research in Computer Science RESEARCH PAPER Available Online at www.ijarcs.info ISSN No. 0976-5697 Design Code Clone Detection System uses

More information

IDENTIFICATION OF PROMOTED ECLIPSE UNSTABLE INTERFACES USING CLONE DETECTION TECHNIQUE

IDENTIFICATION OF PROMOTED ECLIPSE UNSTABLE INTERFACES USING CLONE DETECTION TECHNIQUE International Journal of Software Engineering & Applications (IJSEA), Vol.9, No.5, September 2018 IDENTIFICATION OF PROMOTED ECLIPSE UNSTABLE INTERFACES USING CLONE DETECTION TECHNIQUE Simon Kawuma 1 and

More information

IJREAS Volume 2, Issue 2 (February 2012) ISSN: SOFTWARE CLONING IN EXTREME PROGRAMMING ENVIRONMENT ABSTRACT

IJREAS Volume 2, Issue 2 (February 2012) ISSN: SOFTWARE CLONING IN EXTREME PROGRAMMING ENVIRONMENT ABSTRACT SOFTWARE CLONING IN EXTREME PROGRAMMING ENVIRONMENT Ginika Mahajan* Ashima** ABSTRACT Software systems are evolving by adding new functions and modifying existing functions over time. Through the evolution,

More information

Writing Circuit Descriptions 8

Writing Circuit Descriptions 8 8 Writing Circuit Descriptions 8 You can write many logically equivalent descriptions in Verilog to describe a circuit design. However, some descriptions are more efficient than others in terms of the

More information

Reduction of Periodic Broadcast Resource Requirements with Proxy Caching

Reduction of Periodic Broadcast Resource Requirements with Proxy Caching Reduction of Periodic Broadcast Resource Requirements with Proxy Caching Ewa Kusmierek and David H.C. Du Digital Technology Center and Department of Computer Science and Engineering University of Minnesota

More information

Clone Detection Using Abstract Syntax Suffix Trees

Clone Detection Using Abstract Syntax Suffix Trees Clone Detection Using Abstract Syntax Suffix Trees Rainer Koschke, Raimar Falke, Pierre Frenzel University of Bremen, Germany http://www.informatik.uni-bremen.de/st/ {koschke,rfalke,saint}@informatik.uni-bremen.de

More information

The goal of this project is to enhance the identification of code duplication which can result in high cost reductions for a minimal price.

The goal of this project is to enhance the identification of code duplication which can result in high cost reductions for a minimal price. Code Duplication New Proposal Dolores Zage, Wayne Zage Ball State University June 1, 2017 July 31, 2018 Long Term Goals The goal of this project is to enhance the identification of code duplication which

More information

A Reconfigurable MapReduce Accelerator for multi-core all-programmable SoCs

A Reconfigurable MapReduce Accelerator for multi-core all-programmable SoCs A Reconfigurable MapReduce Accelerator for multi-core all-programmable SoCs Christoforos Kachris, Georgios Ch. Sirakoulis Department of Electrical & Computer Engineering Democritus University of Thrace,

More information

Large-Scale Clone Detection and Benchmarking

Large-Scale Clone Detection and Benchmarking Large-Scale Clone Detection and Benchmarking A Thesis Submitted to the College of Graduate and Postdoctoral Studies in Partial Fulfillment of the Requirements for the degree of Doctor of Philosophy in

More information

Memory and Programmable Logic

Memory and Programmable Logic Memory and Programmable Logic Memory units allow us to store and/or retrieve information Essentially look-up tables Good for storing data, not for function implementation Programmable logic device (PLD),

More information