Clone Analysis in the Web Era: an Approach to Identify Cloned Web Pages

Giuseppe Antonio Di Lucca, Massimiliano Di Penta*, Anna Rita Fasolino, Pasquale Granato

Dipartimento di Informatica e Sistemistica, Università di Napoli Federico II, Via Claudio, 21, Napoli, Italy
(*) Università del Sannio, Facoltà di Ingegneria, Piazza Roma, I Benevento, Italy

Abstract

The diffusion of the Internet and the World Wide Web is producing a substantial increase in the demand for web sites and web applications. The very short time-to-market of a web application, and the lack of method in developing it, promote an incremental development style in which new pages are usually obtained by reusing (i.e., cloning) pieces of existing pages without adequate documentation of these code duplications and redundancies. The presence of clones increases system complexity and the effort needed to test, maintain and evolve web systems; identifying clones may therefore reduce the effort devoted to these activities and facilitate the migration to different architectures. This paper proposes an approach for detecting clones in web sites and web applications, obtained by tailoring the existing methods for detecting clones in traditional software systems. The approach has been assessed by analyzing several web sites and web applications.

1. Introduction

The rapid diffusion of the Internet and of the World Wide Web infrastructure has recently produced a considerable increase in the demand for new web sites and web applications (WA). The lack of method in developing these applications, together with the very short time-to-market due to pressing demand, very often results in disordered and chaotic architectures, and in inadequate, incorrect, and incomplete development documentation.
Indeed, the development of a WA is generally performed in an incremental fashion, where additional pages are usually obtained by reusing the code of existing pages or page components, but without explicitly documenting these code duplications and redundancies. This in turn may increase code complexity and the effort required to test, maintain and evolve these applications. Moreover, if the WAs are maintained and evolved with the same approach, further duplications and redundancies are likely to be added, and increased disorder may affect the code structure and worsen its maintainability.

This situation is similar to the one that occurred in the past in the development and maintenance of large systems where, especially as a consequence of poor design and of maintenance interventions, large portions of duplicated code were produced. These portions of duplicated code are generally called clones, and clone analysis is the research field that investigates methods and techniques for automatically detecting duplicated portions of code in software artifacts. The approaches to clone analysis proposed in the literature are suitable for analyzing traditional software systems with a procedural or object-oriented implementation. In particular, methods based on the matching of Abstract Syntax Trees (AST), on the comparison of arrays of specific software metrics, or on the matching of the character strings composing the code have been presented and experimented with.

In the Internet era, web applications are good candidates for clone proliferation, because of the lack of suitable reuse and delegation mechanisms in the languages generally used for implementing them. Moreover, this trend is reinforced by the hurried and unstructured approaches typically used for developing and maintaining web software. In this paper, we propose an approach for detecting clones in web sites or WAs.
The approach has been obtained by tailoring the existing clone analysis methods in order to take into account the specific features of a WA. The approach addresses the detection of clones of static pages implemented in HTML: two HTML pages will be considered clones if they have the same predefined structural components (or properties), such as the components defining the final rendering of the page in a browser, or the components defining the processing of the application (like scripts, applets, modules, etc.). Moreover, two pages can also be considered clones if they are characterized by the same values of predefined metrics.

In order to efficiently address the detection of cloned pages, the technique we propose takes into account only a limited set of components implementing relevant structural features of a page; this limitation, however, does not affect the effectiveness of the approach. These elements are involved in the computation of a distance measure between web pages that can be used to determine the degree of similarity of the pages.

The validity of the proposed technique has been assessed by means of experiments involving several web sites and WAs. The experimental results showed that the approach adequately detects cloned pages. In order to carry out the experiments, a prototype tool has been developed that automatically obtains the distance between pages.

The remaining part of the paper is structured as follows: Section 2 provides a short background in clone analysis, while Section 3 presents our approach to clone analysis. The experiment carried out to assess the approach is described in Section 4, and concluding remarks are given in Section 5.

2. Background

Clone analysis is the research area that investigates methods and techniques for automatically detecting duplicated portions of code, or portions of similar code, in software artifacts. These portions of code are usually called clones. The research interest in this area arose in the 1980s [Ber84] [Hor90] [Jan88] [Gri81] and focused on the definition of methods and techniques for identifying replicated code portions in procedural software systems.
Clone detection can be performed to support different activities, such as recovering the reusable functional abstractions implemented by the clones in order to reengineer the system with more generic components, or correcting software bugs in each cloned fragment. A clone, usually produced by copying and possibly modifying a piece of code implementing a well defined concept, a data structure, or a processing item, can be generated for several reasons, such as: lack of a good modular design, which prevents an effective reuse of a piece of code implementing a common service; use of programming languages that do not provide suitable reuse mechanisms; pressing performance requirements that do not allow the use of delegation and function call mechanisms; undisciplined maintenance interventions producing replications of already existing code.

The methods and techniques for clone analysis described in the literature focus either on the identification of clones that consist of exactly matching code portions (exact match) [Bak95,Bak93,Bak95b], or on the identification of clones that consist of code portions that coincide provided that the names of the involved variables and constants are systematically substituted (p-match or parameterized match). The approach to clone detection proposed in [Bal00] and [Bal99] exploits the Dynamic Pattern Matching algorithm [Kon96][Kon95], which computes the Levenshtein distance between fragments of code: each fragment is represented by a sequence of tokens, and two fragments are considered clones if their Levenshtein distance is below a given threshold. The approach described in [Bax98] exploits the concept of near-miss clone, that is, a fragment of code that partially coincides with another one. Ducasse and Rieger propose an approach to clone detection that is independent of the coding language used for implementing the subject systems [Duc99].
Further approaches, such as the ones proposed in [Kon97][Lag97] [May96] [Pat99], exploit software metrics concerning the code control-flow or data-flow.

In the Internet era, web sites and web applications are good candidates for clone proliferation, because of the lack of suitable reuse and delegation mechanisms in the languages generally used for implementing them (1). At the moment, a considerable growth in the size of web sites and WAs can be observed, and the need to maintain these applications effectively is spreading fast [Ric00] [War99]. Therefore, the effectiveness of traditional clone analysis techniques in the context of WAs should be assessed, and suitable approaches for tailoring these techniques to the new context should be investigated. Clones can be looked for in web software with different aims, such as gathering information to support its maintenance or its migration towards a dynamic architecture, and clustering similar or identical structures, which facilitates the process of separating the content from the user interface (which may be a PC browser, a PDA, a WAP phone, etc.).

(1) In general, a web site may be thought of as a static site that may sometimes display dynamic information. In contrast, a WA provides the Web user with a means to modify the site status (e.g., by adding or updating information on the site).

One of the difficulties in analyzing clones in web software derives from the wide set of technologies available for implementing web sites and WAs, which makes it harder to choose the replicated software components to look for. Web sites and WAs include both static pages (i.e., HTML pages saved in a file and always offering the same information and layout to a client system) and dynamic pages (i.e., pages whose content and layout are not permanently saved in a file, but are dynamically generated). Therefore, the concept of clone may involve either the static pages or the dynamic ones. Web pages include a control component (i.e., the set of items determining the page layout, business rule processing, and event management) and a data component (i.e., the information to be read from or displayed to a user). Therefore, the clones to be detected may involve either the control or the data component of a page.

Since the control and the data component of a dynamic page depend on the sequence of events that occurred at runtime, searching for clones in these pages should involve dynamic analysis techniques. Conversely, the structure of a static page is predefined in the file that implements it, and clone detection can be carried out by statically analyzing the file.

In this paper, we focus on techniques for detecting clones among static web pages. In particular, a clone will be thought of as an HTML page that includes the same set of tags as another page, since tags are the means used for defining the control component of a static page. Among the various approaches proposed in the literature for clone analysis, the technique based on the Levenshtein distance will be analyzed. Moreover, a frequency based approach will be proposed, and the validity and effectiveness of both approaches will be discussed.

3. An approach to clone analysis for web systems

3.1 The Levenshtein distance

The comparison of strings is used with similar aims in several fields, such as molecular biology, speech recognition, and coding theory. One of the most important models for string comparison is the edit distance model, based on the notion of edit operation proposed in 1972 [Ula72].
An edit operation consists of a set of rules that transform a character of a source string into a new character of a target string. The alignment of two strings is a sequence of edit operations that transforms the former string into the latter. A cost function can be used to associate each edit operation with a cost, and the cost of an alignment is the sum of the costs of the edit operations it includes. The concepts of optimum alignment and longest common subsequence are also related to the definition of the Levenshtein distance.

The edit distance can be defined as the minimum cost required to align two strings; an alignment is optimum if its cost coincides with the minimum cost, that is, the edit distance. If we consider a unitary cost function (i.e., a cost function that associates each edit operation with unitary cost), the edit distance is called the unit edit distance. The unit edit distance is also called the Levenshtein distance: the Levenshtein distance D(x, y) of two strings x and y is the minimum number of insert, replacement or delete operations required to transform x into y. Moreover, a subsequence of a string is any string obtainable by deleting zero or more characters from it. A common subsequence of two strings is a subsequence of both strings, while the longest common subsequence of two strings is the longest such string.

As an example, given the strings informatics and systematics, the longest common subsequence is the string matics. Counting only insert and delete operations, the distance between the two strings is 10 (the five characters i, n, f, o, r are deleted and the five characters s, y, s, t, e are inserted), as in the alignment below; if replacements are also counted as unit-cost operations, the distance is 5.

i n f o r - - - - - m a t i c s
- - - - - s y s t e m a t i c s

3.2 Detecting cloned pages by the Levenshtein distance

The computation of the Levenshtein distance requires that an alphabet of distinct symbols be defined in advance. In order to define this alphabet, the items implementing the relevant features of a static web page must be identified.
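Because the distance is defined over any alphabet of distinct symbols, it can be sketched once for generic sequences and then applied either to characters or to tags. The following is a minimal sketch based on the textbook dynamic programming formulation, not the prototype tool used in the paper; the function names and the `allow_replace` switch are assumptions introduced here to show both ways of counting operations on the informatics/systematics example.

```python
def edit_distance(x, y, allow_replace=True):
    """Unit-cost edit distance between two symbol sequences.

    With allow_replace=False only insert and delete operations are counted
    (illustrative switch, not part of the original paper).
    """
    # prev[j] holds the distance between the processed prefix of x and y[:j].
    prev = list(range(len(y) + 1))
    for i, a in enumerate(x, start=1):
        curr = [i]  # transforming x[:i] into the empty sequence takes i deletions
        for j, b in enumerate(y, start=1):
            best = min(prev[j] + 1, curr[j - 1] + 1)  # delete a, or insert b
            if a == b:
                best = min(best, prev[j - 1])  # matching symbols cost nothing
            elif allow_replace:
                best = min(best, prev[j - 1] + 1)  # replace a with b
            curr.append(best)
        prev = curr
    return prev[-1]


def lcs_length(x, y):
    """Length of the longest common subsequence of two symbol sequences."""
    prev = [0] * (len(y) + 1)
    for a in x:
        curr = [0]
        for j, b in enumerate(y, start=1):
            curr.append(prev[j - 1] + 1 if a == b else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


# The example strings of Section 3.1.
with_replace = edit_distance("informatics", "systematics")
indel_only = edit_distance("informatics", "systematics", allow_replace=False)
common = lcs_length("informatics", "systematics")
```

Since the functions only compare symbols for equality, the same code accepts lists of tag names in place of character strings.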
Since our approach focuses on the degree of similarity of the control components of two static pages, disregarding the data components, a candidate alphabet will include the set of HTML tags implementing the control component of a page. In this way, a string composed of all the HTML tags in the page will be extracted from each web page, and the Levenshtein distance between couples of these strings will be used to compare couples of pages. Since the Levenshtein distance represents the minimum number of edit operations required to transform a first string into a second string, its value expresses the degree of similarity of two static pages. In particular, if the distance value is 0, the pages will be cloned pages, while if the distance is greater than 0 but less than a sufficiently small threshold, the pages are candidates to be near-miss clones.

In order to improve the effectiveness of the approach, the risk of detecting misleading similarities between pages and the risk of not detecting meaningful similarities have to be minimized. The first type of risk, for instance, may depend on the approach used to manage the set of attributes that characterize each tag. In fact, in HTML the same sequences of attributes can refer to different tags, and their detection may produce false positives if they are not linked to the correct tag. The second type of risk is connected both with the problem of composite tags, that is, sequences of tags providing a result equivalent to another single tag, and with the categories of tags that influence only the format of the data, like tags for text formatting, font selection and for inserting hypertext links.

These problems can be solved by refining the preliminary alphabet, which includes all the HTML tags, and substituting each composite tag in the alphabet with its equivalent tag: the resulting alphabet will be called Α2. The set of tags that establish the data formatting will then be eliminated, and a new refined alphabet Α* will be obtained. Α* will include the set of tag attributes too, provided that they are correctly associated with the tag they belong to.

The detection of cloned static pages will therefore be carried out according to the process described in Figure 1. In the first phase, the HTML files are parsed, their tags are extracted and the composite tags are substituted with their equivalent ones. The resulting strings will be composed of symbols from the Α2 alphabet. These strings will be processed in order to eliminate the symbols that do not belong to the Α* alphabet. These final strings will be submitted to the computation of the Levenshtein distance: the Distance matrix will finally include the distance between each couple of analyzed strings.

Figure 1: The process of cloned page detection (HTML files and Α2 alphabet; tag extraction and composite tag substitution; strings of Α2 symbols; elimination of symbols not belonging to the Α* alphabet of tags; strings of Α* symbols; Levenshtein distance computation; Distance matrix).
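The process of Figure 1 can be sketched with the Python standard library as follows. This is only an illustration under stated assumptions, not the authors' prototype: the set `FORMATTING_TAGS` is a hypothetical stand-in for the data-formatting tags excluded from Α*, each symbol pairs a tag with its attribute names (attribute values are ignored, which is an assumption), and composite tag substitution is omitted because the paper does not list the composite tags.

```python
from html.parser import HTMLParser

# Hypothetical set of data-formatting tags to discard; the paper does not list its own.
FORMATTING_TAGS = {"b", "i", "u", "em", "strong", "font", "a"}


class TagCollector(HTMLParser):
    """Collects one symbol per tag; attribute names stay attached to their tag."""

    def __init__(self):
        super().__init__()
        self.symbols = []

    def handle_starttag(self, tag, attrs):
        if tag not in FORMATTING_TAGS:
            names = ",".join(sorted(name for name, _ in attrs))
            self.symbols.append((tag, names))

    def handle_endtag(self, tag):
        if tag not in FORMATTING_TAGS:
            self.symbols.append(("/" + tag, ""))


def page_symbols(html):
    """Return the sequence of retained tag symbols of one HTML page."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return collector.symbols


def levenshtein(x, y):
    """Unit-cost edit distance (insert, delete, replace) between two sequences."""
    prev = list(range(len(y) + 1))
    for i, a in enumerate(x, start=1):
        curr = [i]
        for j, b in enumerate(y, start=1):
            cost = 0 if a == b else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def distance_matrix(pages):
    """Distance between each couple of pages, given their HTML sources."""
    strings = [page_symbols(html) for html in pages]
    return [[levenshtein(a, b) for b in strings] for a in strings]


# Three toy pages: the first two differ only in text, formatting and attribute values.
pages = [
    '<html><body bgcolor="white"><h1>News</h1><p>Hello <b>world</b></p></body></html>',
    '<html><body bgcolor="gray"><h1>Forum</h1><p>Other <i>text</i></p></body></html>',
    '<html><body bgcolor="white"><h1>News</h1><table><tr><td>x</td></tr></table></body></html>',
]
matrix = distance_matrix(pages)
```

In this toy input the first two pages have a null distance, while the third page, which replaces the paragraph with a table, lies at a positive distance from both.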
3.3 Detecting cloned pages with a frequency based method

The method based on the Levenshtein distance is in general very expensive from a computational point of view: in order to determine an edit distance, the optimal alignment has to be found among all the possible alignments between the strings. The computational complexity of the algorithm for computing the Levenshtein distance is in fact O(n^2), where n is the length of the longer string.

A frequency based method to detect clones in web systems has been investigated too. The method requires that each HTML page be associated with an array whose components represent the frequencies (i.e., the number of occurrences) of each HTML tag in the page. The dimension of the array coincides with the number of considered HTML tags, and the i-th component of the array provides the number of occurrences of the i-th tag in the associated page. Given the arrays associated with each page, a distance function in a vector space can be defined, such as the linear distance or the Euclidean distance. Exact cloned pages will be represented by vectors having a zero distance, since they are characterized by the same frequency of each tag, while similar pages will be represented by vectors with a small distance.

Of course this method may produce false positives, since even completely different pages may exhibit the same frequencies but not the same sequence of tags, especially when the pages have a small size or use a limited number of tags. However, the lower precision of this method is counterbalanced by its computational cost, which is lower than that of the Levenshtein distance.

4. A case study

A number of Web systems have been submitted to clone analysis using the proposed approaches, with the aim of assessing their feasibility and effectiveness. A prototype tool that parses the files, extracts the tags, produces the strings and automatically computes the distances between the pages has been developed to support the experiments.
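The frequency based method of Section 3.3 can be illustrated with a short sketch. The names below are hypothetical, the tag sequences are toy data rather than pages from the case study, and reading the "linear distance" as the sum of absolute differences is an assumption, since the paper does not define it.

```python
import math
from collections import Counter


def frequency_vector(tags, alphabet):
    """Occurrences of each tag of the alphabet in one page, in alphabet order."""
    counts = Counter(tags)
    return [counts[tag] for tag in alphabet]


def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def linear_distance(u, v):
    # Assumption: "linear distance" is taken as the sum of absolute differences.
    return sum(abs(a - b) for a, b in zip(u, v))


alphabet = ["p", "table", "tr", "td"]

# Same tags in a different order: identical frequencies, different sequences.
page_a = ["table", "tr", "td", "td", "p"]
page_b = ["p", "table", "tr", "td", "td"]
page_c = ["p", "p", "p"]

vec_a = frequency_vector(page_a, alphabet)
vec_b = frequency_vector(page_b, alphabet)
vec_c = frequency_vector(page_c, alphabet)
```

Here `vec_a` and `vec_b` coincide although the two tag sequences differ, which is exactly the kind of false positive discussed above; computing each vector takes time linear in the page length, which is where the lower cost comes from.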
This section provides the results of a case study involving a WA implementing a juridical laboratory that supports the work of professional lawyers. The WA includes 201 files distributed in 19 directories, and its overall size is 4.26 Mbytes. Its static HTML pages are implemented by 54 files with the htm extension distributed in 10 directories, while 19 files with the asp extension, contained in 4 directories, implement 19 server pages. The remaining files include data or other objects, like images, logos, etc., to be displayed in the pages.

The 54 HTML files have been submitted to clone analysis according to the proposed approach. The name and the size of each analyzed file are listed in Table 1.

Table 1: The HTML files analyzed in the case study

File ID   File Name                                                   KB
1         \index.htm
2         \Specialisti\MainFrame.htm
3         \Specialisti\Specialisti.htm
4         \Specialisti\Text.htm
5         \Specialisti\Title.htm
6         \Novita\Brugaletta.htm
7         \Novita\CalendarioTarNA.htm
8         \Novita\CalendarioTarSA.htm
9         \Novita\MainFrame.htm
10        \Novita\Novita.htm
11        \Novita\RivisteConsOrdAvvSa.htm
12        \Novita\Text.htm
13        \Novita\Title.htm
14        \Forum\Forum.htm
15        \Forum\MainFrame.htm
16        \Forum\Text.htm
17        \Forum\Title.htm
18        \Common\FrameLeftPulsanti.htm
19        \Common\bottomFrame.htm
20        \ChiSiamo\ChiSiamo.htm
21        \ChiSiamo\MainFrame.htm
22        \ChiSiamo\Text.htm
23        \ChiSiamo\Title.htm
24        \Cerca\Cerca.htm
25        \Cerca\MainFrame.htm
26        \Cerca\Text.htm
27        \Cerca\Title.htm
28        \Caso\Caso.htm
29        \Caso\MainFrame.htm
30        \Caso\Text.htm
31        \Caso\Title.htm
32        \Caso\Testi\Autovelox.htm
33        \Caso\Testi\Corruzione_Identificazione_atto.htm
34        \Caso\Testi\Danno_biologico.htm
35        \Caso\Testi\Mobbing.htm
36        \Caso\Testi\Mobbing_nel_pubblico_impiego.htm
37        \Caso\Testi\Occupazione.htm
38        \Caso\Testi\Oltraggio.htm
39        \Caso\Testi\Parentelemafiose.html
40        \Caso\Testi\Problematica_beni_confiscati.htm
41        \Caso\Testi\Professioni_intellettuali.htm
42        \Caso\Testi\Relazione_attivita_commissario.htm
43        \Caso\Testi\Responsabilita_amministrativa.htm
44        \Caso\Testi\Responsabilita_medica.htm
45        \Caso\Testi\Responsabilita_medico.htm
46        \Caso\Testi\Riflessioni-Omicidio_di_Peppino_Impastato.htm
47        \Caso\Testi\Societa_miste.htm
48        \Caso\Testi\Truffa_in_attivita_lavorativa.htm
49        \Caso\Testi\Uso_beni_condominiali.htm
50        \Caso\Testi\Misure_patrimoniali_nel_sistema.htm
51        \Archivio\Archivio.htm
52        \Archivio\MainFrame.htm
53        \Archivio\Text.htm
54        \Archivio\Title.htm

The Levenshtein distances between each couple of pages have been computed using the Α2 alphabet, and the Distance matrix has been obtained. The matrix included 46 couples of perfect cloned pages involving 18 distinct files. The couples of cloned pages are listed in Table 2, where each page is identified by the file ID shown in Table 1.

Table 2: Couples of clones with null Levenshtein distance

(3,10) (3,14) (3,20) (3,24) (3,28) (3,51)
(9,15) (9,21) (9,25) (9,29)
(10,14) (10,20) (10,24) (10,28) (10,51)
(13,17) (13,23) (13,27) (13,31) (13,54)
(14,20) (14,24) (14,28) (14,51)
(15,21) (15,25) (15,29)
(17,23) (17,27) (17,31) (17,54)
(20,24) (20,28) (20,51)
(21,25) (21,29)
(23,27) (23,31) (23,54)
(24,28) (24,51)
(25,29)
(27,31) (27,54)
(28,51)
(31,54)

Moreover, the Distance matrix included 25 couples of pages with a very low distance, which made them potential near-miss clones. The 46 couples of perfect cloned pages have been visualized with a browser in order to validate the results of the analysis, and each couple actually implemented perfect clones. As an example, Figure 2 shows the rendered HTML pages corresponding to the couple of clones (10, 28). In a similar way, the 25 couples of pages representing near-miss clones have been visualized with the browser, and their relatively small differences confirmed that they could not be considered perfect clones.

Figure 2: A couple of cloned pages

The 18 files implementing the 46 couples of perfect clones were further analyzed, and they could be grouped into three different clusters of identical or very similar pages. Table 3 reports the three clusters of pages.

Table 3: Clusters of clones

Cluster A: 3, 10, 14, 20, 24, 28, 51
Cluster B: 9, 15, 21, 25, 29
Cluster C: 13, 17, 23, 27, 31, 54

The pages from the same cluster were actually very similar, and their differences were essentially due to the parametric components providing the information displayed in the pages. Their similarity was essentially due to the frame-based structure of the application.
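The grouping of the clone couples of Table 2 into clusters can be reproduced with a small sketch. It assumes that a cluster is a connected component of the "null distance" relation; the paper does not state how the clusters were actually derived, and the function name is hypothetical.

```python
# The 46 couples of Table 2, identified by the file IDs of Table 1.
CLONE_PAIRS = [
    (3, 10), (3, 14), (3, 20), (3, 24), (3, 28), (3, 51),
    (9, 15), (9, 21), (9, 25), (9, 29),
    (10, 14), (10, 20), (10, 24), (10, 28), (10, 51),
    (13, 17), (13, 23), (13, 27), (13, 31), (13, 54),
    (14, 20), (14, 24), (14, 28), (14, 51),
    (15, 21), (15, 25), (15, 29),
    (17, 23), (17, 27), (17, 31), (17, 54),
    (20, 24), (20, 28), (20, 51),
    (21, 25), (21, 29),
    (23, 27), (23, 31), (23, 54),
    (24, 28), (24, 51),
    (25, 29),
    (27, 31), (27, 54),
    (28, 51),
    (31, 54),
]


def clone_clusters(pairs):
    """Group file IDs into clusters (connected components) with union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)  # merge the two components

    groups = {}
    for x in parent:
        groups.setdefault(find(x), []).append(x)
    return sorted(sorted(group) for group in groups.values())


clusters = clone_clusters(CLONE_PAIRS)
```

Under this assumption the 46 couples yield three clusters covering 18 distinct files, in agreement with the figures reported in the text.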
In particular, the pages from the A cluster represented the roots of sub-trees of the web site, all reachable from the home page of the application; all the pages of the B cluster were implemented by files with the same name MainFrame.htm, while the pages of the C cluster were all implemented by files with the same name Title.htm.

Using the frequency based method, the same set of clones was obtained and no additional clone was detected. However, the second method produced more near-miss clones than the Levenshtein method. It is worth noting that in all the other experiments we carried out, involving other web systems, the frequency based method always produced the same set of clones detected by applying the Levenshtein distance, and no additional clones (i.e., false positives) were detected.

Even if both approaches detected the same set of clones, their computational costs were significantly different. In particular, the computation of the Levenshtein distance for all couples of pages required 2 hours and 50 minutes, while just 15 seconds were necessary for computing the frequency based distances (on a PC with a Pentium III 850 MHz processor).

In order to reduce the computational cost of the Levenshtein method and the potential inaccuracy of the frequency based one, an opportunistic approach may be proposed. This approach will use the frequency based method for preliminarily identifying potential couples of clones, and then apply the Levenshtein method to these couples only, in order to detect the actual clones and reject the false ones.

5. Conclusions

In this paper an approach to clone analysis in the context of web systems has been proposed. Clone detection highlights the reuse of patterns of HTML tags (i.e., recurrent structures among pages, implemented by specific sequences of HTML tags), and facilitates web software maintenance and the migration to a model where the content is separated from the presentation. Moreover, identifying clones facilitates the testing process of a WA, since it is possible to partition the pages into equivalence classes and specify a suitable number of test cases accordingly.

Two methods for clone analysis have been defined and experimented with. We considered as clones the pages having the same control components, even if they differed in their data components. During the experiments, the proposed methods detected clones among static web pages, and a manual verification confirmed the effectiveness of the methods.

The two proposed methods have produced comparable results, but with different computational costs. Since the frequency based method always produced, in all the experiments, the same set of clones obtained by applying the Levenshtein distance method, but at a much lower computational cost, it could be an effective method for detecting clones among static web pages.

Future work will be devoted to further experimentation to better validate the proposed methods. Moreover, approaches based on the use of other suitable web software metrics to identify clones, as well as further approaches to identify clones among server pages, will be investigated.

References

[Bak93] Baker B. S., A theory of parametrized pattern matching: algorithms and applications, in Proceedings of the 25th Annual ACM Symposium on Theory of Computing, 71-80, May.
[Bak95] Baker B. S., On finding duplication and near duplication in large software systems, in Proc. of the 2nd Working Conference on Reverse Engineering, IEEE Computer Society Press.
[Bak95b] Baker B. S., Parametrized pattern matching via Boyer-Moore algorithms, in Proceedings of Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, Jan.
[Bal00] Balazinska M., Merlo E., Dagenais M., Lagüe B., Kontogiannis K., Advanced clone-analysis to support object-oriented system refactoring, in Seventh Working Conference on Reverse Engineering, Nov.
[Bal99] Balazinska M., Merlo E., Dagenais M., Lagüe B., Kontogiannis K., Measuring clone based reengineering opportunities, in International Symposium on Software Metrics, METRICS 99, IEEE Computer Society Press, Nov.
[Bax98] Baxter I. D., Yahin A., Moura L., Sant'Anna M., Bier L., Clone Detection Using Abstract Syntax Trees, in Proceedings of the International Conference on Software Maintenance, IEEE Computer Society Press.
[Ber84] Berghel H. L., Sallach D. L., Measurements of program similarity in identical task environments, SIGPLAN Notices, 9(8):65-76, Aug.
[Duc99] Ducasse S., Rieger M., Demeyer S., A Language Independent Approach for Detecting Duplicated Code, in Proceedings of the International Conference on Software Maintenance, IEEE Computer Society Press.
[Gri81] Grier S., A tool that detects plagiarism in PASCAL programs, in SIGCSE Bulletin, 13(1).
[Hor90] Horwitz S., Identifying the semantic and textual differences between two versions of a program, in Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation, June.
[Jan88] Jankowitz H. T., Detecting plagiarism in student PASCAL programs, in Computer Journal, 31(1):1-8.
[Kon96] Kontogiannis K., DeMori R., Merlo E., Galler M., Bernstein M., Pattern Matching for clone and concept detection, in Journal of Automated Software Engineering, 3:77-108, Mar.
[Kon95] Kontogiannis K., DeMori R., Bernstein M., Merlo E., Pattern Matching for Design Concept Localization, in Proc. of the 2nd Working Conference on Reverse Engineering, IEEE Computer Society Press.
[Kon97] Kontogiannis K., Evaluation Experiments on the Detection of Programming Patterns Using Software Metrics, in Proc. of the 4th Working Conference on Reverse Engineering, 44-54.
[Lag97] Lagüe B., Proulx D., Merlo E., Mayrand J., Hudepohl J., Assessing the benefits of incorporating function clone detection in a development process, in Proceedings of the International Conference on Software Maintenance 1997, IEEE Computer Society Press.
[May96] Mayrand J., Leblanc C., Merlo E., Experiment on the Automatic Detection of Function Clones in a Software System Using Metrics, in Proceedings of the International Conference on Software Maintenance, IEEE Computer Society Press.
[Pat99] Patenaude J.-F., Merlo E., Dagenais M., Lagüe B., Extending software quality assessment techniques to Java systems, in Proceedings of the 7th International Workshop on Program Comprehension IWPC 99, IEEE Computer Society Press.
[Ric00] Ricca F., Tonella P., Web Analysis: Structure and Evolution, in Proceedings of the International Workshop on Web Site Evolution, 76-86.
[Ula72] Ulam S. M., Some Combinatorial Problems Studied Experimentally on Computing Machines, in Zaremba S. K., Applications of Number Theory to Numerical Analysis, 1-3, Academic Press.
[War99] Warren P., Boldyreff C., Munro M., The evolution of websites, in Proceedings of the International Workshop on Program Comprehension, 1999.


More information

Porfirio Tramontana Anna Rita Fasolino. Giuseppe A. Di Lucca. University of Sannio, Benevento, Italy

Porfirio Tramontana Anna Rita Fasolino. Giuseppe A. Di Lucca. University of Sannio, Benevento, Italy A Technique for Reducing User Session Data Sets in Web Application Testing Porfirio Tramontana Anna Rita Fasolino Dipartimento di Informatica e Sistemistica University of Naples Federico II, Italy Giuseppe

More information

DCC / ICEx / UFMG. Software Code Clone. Eduardo Figueiredo.

DCC / ICEx / UFMG. Software Code Clone. Eduardo Figueiredo. DCC / ICEx / UFMG Software Code Clone Eduardo Figueiredo http://www.dcc.ufmg.br/~figueiredo Code Clone Code Clone, also called Duplicated Code, is a well known code smell in software systems Code clones

More information

Tool Support for Refactoring Duplicated OO Code

Tool Support for Refactoring Duplicated OO Code Tool Support for Refactoring Duplicated OO Code Stéphane Ducasse and Matthias Rieger and Georges Golomingi Software Composition Group, Institut für Informatik (IAM) Universität Bern, Neubrückstrasse 10,

More information

Software Clone Detection. Kevin Tang Mar. 29, 2012

Software Clone Detection. Kevin Tang Mar. 29, 2012 Software Clone Detection Kevin Tang Mar. 29, 2012 Software Clone Detection Introduction Reasons for Code Duplication Drawbacks of Code Duplication Clone Definitions in the Literature Detection Techniques

More information

A Tree Kernel Based Approach for Clone Detection

A Tree Kernel Based Approach for Clone Detection A Tree Kernel Based Approach for Clone Detection Anna Corazza 1, Sergio Di Martino 1, Valerio Maggio 1, Giuseppe Scanniello 2 1) University of Naples Federico II 2) University of Basilicata Outline Background

More information

Beyond the Refactoring Browser: Advanced Tool Support for Software Refactoring

Beyond the Refactoring Browser: Advanced Tool Support for Software Refactoring Beyond the Refactoring Browser: Advanced Tool Support for Software Refactoring Tom Mens Tom Tourwé Francisca Muñoz Programming Technology Lab Vrije Universiteit Brussel Pleinlaan 2, 1050 Brussel, Belgium

More information

Token based clone detection using program slicing

Token based clone detection using program slicing Token based clone detection using program slicing Rajnish Kumar PEC University of Technology Rajnish_pawar90@yahoo.com Prof. Shilpa PEC University of Technology Shilpaverma.pec@gmail.com Abstract Software

More information

Code duplication in Software Systems: A Survey

Code duplication in Software Systems: A Survey Code duplication in Software Systems: A Survey G. Anil kumar 1 Dr. C.R.K.Reddy 2 Dr. A. Govardhan 3 A. Ratna Raju 4 1,4 MGIT, Dept. of Computer science, Hyderabad, India Email: anilgkumar@mgit.ac.in, ratnaraju@mgit.ac.in

More information

AN ALGORITHM FOR TEST DATA SET REDUCTION FOR WEB APPLICATION TESTING

AN ALGORITHM FOR TEST DATA SET REDUCTION FOR WEB APPLICATION TESTING AN ALGORITHM FOR TEST DATA SET REDUCTION FOR WEB APPLICATION TESTING A. Askarunisa, N. Ramaraj Abstract: Web Applications have become a critical component of the global information infrastructure, and

More information

Concept Analysis. Porfirio Tramontana Anna Rita Fasolino. Giuseppe A. Di Lucca. University of Sannio, Benevento, Italy

Concept Analysis. Porfirio Tramontana Anna Rita Fasolino. Giuseppe A. Di Lucca. University of Sannio, Benevento, Italy Web Pages Classification using Concept Analysis Porfirio Tramontana Anna Rita Fasolino Dipartimento di Informatica e Sistemistica University of Naples Federico II, Italy Giuseppe A. Di Lucca RCOST Research

More information

On Refactoring Support Based on Code Clone Dependency Relation

On Refactoring Support Based on Code Clone Dependency Relation On Refactoring Support Based on Code Dependency Relation Norihiro Yoshida 1, Yoshiki Higo 1, Toshihiro Kamiya 2, Shinji Kusumoto 1, Katsuro Inoue 1 1 Graduate School of Information Science and Technology,

More information

A Top-Down Visual Approach to GUI development

A Top-Down Visual Approach to GUI development A Top-Down Visual Approach to GUI development ROSANNA CASSINO, GENNY TORTORA, MAURIZIO TUCCI, GIULIANA VITIELLO Dipartimento di Matematica e Informatica Università di Salerno Via Ponte don Melillo 84084

More information

Software Quality Analysis by Code Clones in Industrial Legacy Software

Software Quality Analysis by Code Clones in Industrial Legacy Software Software Quality Analysis by Code Clones in Industrial Legacy Software Akito Monden 1 Daikai Nakae 1 Toshihiro Kamiya 2 Shin-ichi Sato 1,3 Ken-ichi Matsumoto 1 1 Nara Institute of Science and Technology,

More information

Detection and Behavior Identification of Higher-Level Clones in Software

Detection and Behavior Identification of Higher-Level Clones in Software Detection and Behavior Identification of Higher-Level Clones in Software Swarupa S. Bongale, Prof. K. B. Manwade D. Y. Patil College of Engg. & Tech., Shivaji University Kolhapur, India Ashokrao Mane Group

More information

Rearranging the Order of Program Statements for Code Clone Detection

Rearranging the Order of Program Statements for Code Clone Detection Rearranging the Order of Program Statements for Code Clone Detection Yusuke Sabi, Yoshiki Higo, Shinji Kusumoto Graduate School of Information Science and Technology, Osaka University, Japan Email: {y-sabi,higo,kusumoto@ist.osaka-u.ac.jp

More information

Quantifying and Assessing the Merge of Cloned Web-Based System: An Exploratory Study

Quantifying and Assessing the Merge of Cloned Web-Based System: An Exploratory Study Quantifying and Assessing the Merge of Cloned Web-Based System: An Exploratory Study Jadson Santos Department of Informatics and Applied Mathematics Federal University of Rio Grande do Norte, UFRN Natal,

More information

Toward a Taxonomy of Clones in Source Code: A Case Study

Toward a Taxonomy of Clones in Source Code: A Case Study Toward a Taxonomy of Clones in Source Code: A Case Study Cory Kapser and Michael W. Godfrey Software Architecture Group (SWAG) School of Computer Science, University of Waterloo fcjkapser, migodg@uwaterloo.ca

More information

Clone code detector using Boyer Moore string search algorithm integrated with ontology editor

Clone code detector using Boyer Moore string search algorithm integrated with ontology editor EUROPEAN ACADEMIC RESEARCH Vol. IV, Issue 2/ May 2016 ISSN 2286-4822 www.euacademic.org Impact Factor: 3.4546 (UIF) DRJI Value: 5.9 (B+) Clone code detector using Boyer Moore string search algorithm integrated

More information

COMPARISON AND EVALUATION ON METRICS

COMPARISON AND EVALUATION ON METRICS COMPARISON AND EVALUATION ON METRICS BASED APPROACH FOR DETECTING CODE CLONE D. Gayathri Devi 1 1 Department of Computer Science, Karpagam University, Coimbatore, Tamilnadu dgayadevi@gmail.com Abstract

More information

TESTBEDS Paris

TESTBEDS Paris TESTBEDS 2010 - Paris Rich Internet Application Testing Using Execution Trace Data Dipartimento di Informatica e Sistemistica Università di Napoli, Federico II Naples, Italy Domenico Amalfitano Anna Rita

More information

A Novel Ontology Metric Approach for Code Clone Detection Using FusionTechnique

A Novel Ontology Metric Approach for Code Clone Detection Using FusionTechnique A Novel Ontology Metric Approach for Code Clone Detection Using FusionTechnique 1 Syed MohdFazalulHaque, 2 Dr. V Srikanth, 3 Dr. E. Sreenivasa Reddy 1 Maulana Azad National Urdu University, 2 Professor,

More information

Zjednodušení zdrojového kódu pomocí grafové struktury

Zjednodušení zdrojového kódu pomocí grafové struktury Zjednodušení zdrojového kódu pomocí grafové struktury Ing. Tomáš Bublík 1. Introduction Nowadays, there is lot of programming languages. These languages differ in syntax, usage, and processing. Keep in

More information

Software Clone Detection and Refactoring

Software Clone Detection and Refactoring Software Clone Detection and Refactoring Francesca Arcelli Fontana *, Marco Zanoni *, Andrea Ranchetti * and Davide Ranchetti * * University of Milano-Bicocca, Viale Sarca, 336, 20126 Milano, Italy, {arcelli,marco.zanoni}@disco.unimib.it,

More information

An Effective Approach for Detecting Code Clones

An Effective Approach for Detecting Code Clones An Effective Approach for Detecting Code Clones Girija Gupta #1, Indu Singh *2 # M.Tech Student( CSE) JCD College of Engineering, Affiliated to Guru Jambheshwar University,Hisar,India * Assistant Professor(

More information

A Study of Bad Smells in Code

A Study of Bad Smells in Code International Journal for Science and Emerging ISSN No. (Online):2250-3641 Technologies with Latest Trends 7(1): 16-20 (2013) ISSN No. (Print): 2277-8136 A Study of Bad Smells in Code Gurpreet Singh* and

More information

Visualization of Clone Detection Results

Visualization of Clone Detection Results Visualization of Clone Detection Results Robert Tairas and Jeff Gray Department of Computer and Information Sciences University of Alabama at Birmingham Birmingham, AL 5294-1170 1-205-94-221 {tairasr,

More information

Code Clone Analysis and Application

Code Clone Analysis and Application Code Clone Analysis and Application Katsuro Inoue Osaka University Talk Structure Clone Detection CCFinder and Associate Tools Applications Summary of Code Clone Analysis and Application Clone Detection

More information

Folding Repeated Instructions for Improving Token-based Code Clone Detection

Folding Repeated Instructions for Improving Token-based Code Clone Detection 2012 IEEE 12th International Working Conference on Source Code Analysis and Manipulation Folding Repeated Instructions for Improving Token-based Code Clone Detection Hiroaki Murakami, Keisuke Hotta, Yoshiki

More information

A Measurement of Similarity to Identify Identical Code Clones

A Measurement of Similarity to Identify Identical Code Clones The International Arab Journal of Information Technology, Vol. 12, No. 6A, 2015 735 A Measurement of Similarity to Identify Identical Code Clones Mythili ShanmughaSundaram and Sarala Subramani Department

More information

Meta-Heuristic Generation of Robust XPath Locators for Web Testing

Meta-Heuristic Generation of Robust XPath Locators for Web Testing Meta-Heuristic Generation of Robust XPath Locators for Web Testing Maurizio Leotta, Andrea Stocco, Filippo Ricca, Paolo Tonella Abstract: Test scripts used for web testing rely on DOM locators, often expressed

More information

Turning Web Applications into Web Services by Wrapping Techniques

Turning Web Applications into Web Services by Wrapping Techniques Turning Web Applications into Web Services by Wrapping Techniques Giusy Di Lorenzo Anna Rita Fasolino Lorenzo Melcarne Porfirio Tramontana Valeria Vittorini Dipartimento di Informatica e Sistemistica University

More information

To Enhance Type 4 Clone Detection in Clone Testing Swati Sharma #1, Priyanka Mehta #2 1 M.Tech Scholar,

To Enhance Type 4 Clone Detection in Clone Testing Swati Sharma #1, Priyanka Mehta #2 1 M.Tech Scholar, To Enhance Type 4 Clone Detection in Clone Testing Swati Sharma #1, Priyanka Mehta #2 1 M.Tech Scholar, 2 Head of Department, Department of Computer Science & Engineering, Universal Institute of Engineering

More information

A Database of Graphs for Isomorphism and Sub-Graph Isomorphism Benchmarking

A Database of Graphs for Isomorphism and Sub-Graph Isomorphism Benchmarking A Database of Graphs for Isomorphism and Sub-Graph Isomorphism Benchmarking P. Foggia, C. Sansone, M. Vento Dipartimento di Informatica e Sistemistica - Università di Napoli "Federico II" Via Claudio 21,

More information

Learning Probabilistic Ontologies with Distributed Parameter Learning

Learning Probabilistic Ontologies with Distributed Parameter Learning Learning Probabilistic Ontologies with Distributed Parameter Learning Giuseppe Cota 1, Riccardo Zese 1, Elena Bellodi 1, Fabrizio Riguzzi 2, and Evelina Lamma 1 1 Dipartimento di Ingegneria University

More information

An Approach for Reverse Engineering of Web-Based Applications

An Approach for Reverse Engineering of Web-Based Applications An Approach for Reverse Engineering of Web-Based Applications G.A. Di Lucca**, M. Di Penta*, G. Antoniol*, G. Casazza** dilucca@unina.it dipenta@unisannio.it antoniol@ieee.org gec @unisannio.it (*)University

More information

Evaluating the Evolution of a C Application

Evaluating the Evolution of a C Application Evaluating the Evolution of a C Application Elizabeth Burd, Malcolm Munro Liz.Burd@dur.ac.uk The Centre for Software Maintenance University of Durham South Road Durham, DH1 3LE, UK Abstract This paper

More information

Clone Detection Using Abstract Syntax Suffix Trees

Clone Detection Using Abstract Syntax Suffix Trees Clone Detection Using Abstract Syntax Suffix Trees Rainer Koschke, Raimar Falke, Pierre Frenzel University of Bremen, Germany http://www.informatik.uni-bremen.de/st/ {koschke,rfalke,saint}@informatik.uni-bremen.de

More information

Eliminating Duplication in Source Code via Procedure Extraction

Eliminating Duplication in Source Code via Procedure Extraction Eliminating Duplication in Source Code via Procedure Extraction Raghavan Komondoor raghavan@cs.wisc.edu Computer Sciences Department University of Wisconsin-Madison 1210 W. Dayton St, Madison, WI 53706

More information

MULTIMEDIA TECHNOLOGIES FOR THE USE OF INTERPRETERS AND TRANSLATORS. By Angela Carabelli SSLMIT, Trieste

MULTIMEDIA TECHNOLOGIES FOR THE USE OF INTERPRETERS AND TRANSLATORS. By Angela Carabelli SSLMIT, Trieste MULTIMEDIA TECHNOLOGIES FOR THE USE OF INTERPRETERS AND TRANSLATORS By SSLMIT, Trieste The availability of teaching materials for training interpreters and translators has always been an issue of unquestionable

More information

A UML-based Process Meta-Model Integrating a Rigorous Process Patterns Definition

A UML-based Process Meta-Model Integrating a Rigorous Process Patterns Definition A UML-based Process Meta-Model Integrating a Rigorous Process Patterns Definition Hanh Nhi Tran, Bernard Coulette, Bich Thuy Dong 2 University of Toulouse 2 -GRIMM 5 allées A. Machado F-3058 Toulouse,

More information

EARLY PREDICTION OF HARDWARE COMPLEXITY IN HLL-TO-HDL TRANSLATION

EARLY PREDICTION OF HARDWARE COMPLEXITY IN HLL-TO-HDL TRANSLATION UNIVERSITA DEGLI STUDI DI NAPOLI FEDERICO II Facoltà di Ingegneria EARLY PREDICTION OF HARDWARE COMPLEXITY IN HLL-TO-HDL TRANSLATION Alessandro Cilardo, Paolo Durante, Carmelo Lofiego, and Antonino Mazzeo

More information

Recovering Conceptual Models from Web Applications

Recovering Conceptual Models from Web Applications Recovering Conceptual Models from Web Applications Giuseppe Antonio Di Lucca Research Centre On Software Technology, University of Sannio Via Traiano, Palazzo ex Poste 82100 Benevento, Italy dilucca@unisannio.it

More information

THE BCS PROFESSIONAL EXAMINATION BCS Level 6 Professional Graduate Diploma in IT September 2017 EXAMINERS REPORT. Software Engineering 2

THE BCS PROFESSIONAL EXAMINATION BCS Level 6 Professional Graduate Diploma in IT September 2017 EXAMINERS REPORT. Software Engineering 2 General Comments THE BCS PROFESSIONAL EXAMINATION BCS Level 6 Professional Graduate Diploma in IT September 2017 EXAMINERS REPORT Software Engineering 2 The pass rate was 40% representing the lowest mark

More information

JSCTracker: A Semantic Clone Detection Tool for Java Code Rochelle Elva and Gary T. Leavens

JSCTracker: A Semantic Clone Detection Tool for Java Code Rochelle Elva and Gary T. Leavens JSCTracker: A Semantic Clone Detection Tool for Java Code Rochelle Elva and Gary T. Leavens CS-TR-12-04 March 2012 Keywords: semantic clone detection, input-output behavior, effects, IOE behavior, Java

More information

Impact of Dependency Graph in Software Testing

Impact of Dependency Graph in Software Testing Impact of Dependency Graph in Software Testing Pardeep Kaur 1, Er. Rupinder Singh 2 1 Computer Science Department, Chandigarh University, Gharuan, Punjab 2 Assistant Professor, Computer Science Department,

More information

Digital Archives: Extending the 5S model through NESTOR

Digital Archives: Extending the 5S model through NESTOR Digital Archives: Extending the 5S model through NESTOR Nicola Ferro and Gianmaria Silvello Department of Information Engineering, University of Padua, Italy {ferro, silvello}@dei.unipd.it Abstract. Archives

More information

Research and implementation of search engine based on Lucene Wan Pu, Wang Lisha

Research and implementation of search engine based on Lucene Wan Pu, Wang Lisha 2nd International Conference on Advances in Mechanical Engineering and Industrial Informatics (AMEII 2016) Research and implementation of search engine based on Lucene Wan Pu, Wang Lisha Physics Institute,

More information

Relation of Code Clones and Change Couplings

Relation of Code Clones and Change Couplings Relation of Code Clones and Change Couplings Reto Geiger, Beat Fluri, Harald C. Gall, and Martin Pinzger s.e.a.l. software evolution and architecture lab, Department of Informatics, University of Zurich,

More information

A Review on Identifying the Main Content From Web Pages

A Review on Identifying the Main Content From Web Pages A Review on Identifying the Main Content From Web Pages Madhura R. Kaddu 1, Dr. R. B. Kulkarni 2 1, 2 Department of Computer Scienece and Engineering, Walchand Institute of Technology, Solapur University,

More information

A Software Architecture for Progressive Scanning of On-line Communities

A Software Architecture for Progressive Scanning of On-line Communities A Software Architecture for Progressive Scanning of On-line Communities Roberto Baldoni, Fabrizio d Amore, Massimo Mecella, Daniele Ucci Sapienza Università di Roma, Italy Motivations On-line communities

More information

Reusing Reused Code II. CODE SUGGESTION ARCHITECTURE. A. Overview

Reusing Reused Code II. CODE SUGGESTION ARCHITECTURE. A. Overview Reusing Reused Tomoya Ishihara, Keisuke Hotta, Yoshiki Higo, Shinji Kusumoto Graduate School of Information Science and Technology, Osaka University 1-5, Yamadaoka, Suita, Osaka, 565-0871, Japan {t-ishihr,

More information

TopicViewer: Evaluating Remodularizations Using Semantic Clustering

TopicViewer: Evaluating Remodularizations Using Semantic Clustering TopicViewer: Evaluating Remodularizations Using Semantic Clustering Gustavo Jansen de S. Santos 1, Katyusco de F. Santos 2, Marco Tulio Valente 1, Dalton D. S. Guerrero 3, Nicolas Anquetil 4 1 Federal

More information

Using Slicing to Identify Duplication in Source Code

Using Slicing to Identify Duplication in Source Code Using Slicing to Identify Duplication in Source Code Raghavan Komondoor Computer Sciences Department University of Wisconsin-Madison Madison, WI 53706 USA raghavan@cs.wisc.edu Susan Horwitz Computer Sciences

More information

Annotated Suffix Trees for Text Clustering

Annotated Suffix Trees for Text Clustering Annotated Suffix Trees for Text Clustering Ekaterina Chernyak and Dmitry Ilvovsky National Research University Higher School of Economics Moscow, Russia echernyak,dilvovsky@hse.ru Abstract. In this paper

More information

Detection of Non Continguous Clones in Software using Program Slicing

Detection of Non Continguous Clones in Software using Program Slicing Detection of Non Continguous Clones in Software using Program Slicing Er. Richa Grover 1 Er. Narender Rana 2 M.Tech in CSE 1 Astt. Proff. In C.S.E 2 GITM, Kurukshetra University, INDIA Abstract Code duplication

More information

EXTRACTION OF REUSABLE COMPONENTS FROM LEGACY SYSTEMS

EXTRACTION OF REUSABLE COMPONENTS FROM LEGACY SYSTEMS EXTRACTION OF REUSABLE COMPONENTS FROM LEGACY SYSTEMS Moon-Soo Lee, Yeon-June Choi, Min-Jeong Kim, Oh-Chun, Kwon Telematics S/W Platform Team, Telematics Research Division Electronics and Telecommunications

More information

Detecting Code Similarity Using Patterns. K. Kontogiannis M. Galler R. DeMori. McGill University

Detecting Code Similarity Using Patterns. K. Kontogiannis M. Galler R. DeMori. McGill University 1 Detecting Code Similarity Using atterns K. Kontogiannis M. Galler R. DeMori McGill University 3480 University St., Room 318, Montreal, Canada H3A 2A7 Abstract Akey issue in design recovery is to localize

More information

Hermion - Exploiting the Dynamics of Software

Hermion - Exploiting the Dynamics of Software Hermion - Exploiting the Dynamics of Software Authors: David Röthlisberger, Orla Greevy, and Oscar Nierstrasz Affiliation: Software Composition Group, University of Bern, Switzerland Homepage: http://scg.iam.unibe.ch/research/hermion

More information

Device Independent Principles for Adapted Content Delivery

Device Independent Principles for Adapted Content Delivery Device Independent Principles for Adapted Content Delivery Tayeb Lemlouma 1 and Nabil Layaïda 2 OPERA Project Zirst 655 Avenue de l Europe - 38330 Montbonnot, Saint Martin, France Tel: +33 4 7661 5281

More information

An Improved Algorithm for Matching Large Graphs

An Improved Algorithm for Matching Large Graphs An Improved Algorithm for Matching Large Graphs L. P. Cordella, P. Foggia, C. Sansone, M. Vento Dipartimento di Informatica e Sistemistica Università degli Studi di Napoli Federico II Via Claudio, 2 8025

More information

Inheritance Metrics: What do they Measure?

Inheritance Metrics: What do they Measure? Inheritance Metrics: What do they Measure? G. Sri Krishna and Rushikesh K. Joshi Department of Computer Science and Engineering Indian Institute of Technology Bombay Mumbai, 400 076, India Email:{srikrishna,rkj}@cse.iitb.ac.in

More information

Measuring Web Service Interfaces

Measuring Web Service Interfaces Measuring Web Service Interfaces Harry M. Sneed ANECON GmbH, Vienna Harry.Sneed@t-online.de Abstract The following short paper describes a tool supported method for measuring web service interfaces. The

More information

SOURCE CODE RETRIEVAL USING SEQUENCE BASED SIMILARITY

SOURCE CODE RETRIEVAL USING SEQUENCE BASED SIMILARITY SOURCE CODE RETRIEVAL USING SEQUENCE BASED SIMILARITY Yoshihisa Udagawa Faculty of Engineering, Tokyo Polytechnic University, Atsugi City, Kanagawa, Japan udagawa@cs.t-kougei.ac.jp ABSTRACT Duplicate code

More information

e-ccc-biclustering: Related work on biclustering algorithms for time series gene expression data

e-ccc-biclustering: Related work on biclustering algorithms for time series gene expression data : Related work on biclustering algorithms for time series gene expression data Sara C. Madeira 1,2,3, Arlindo L. Oliveira 1,2 1 Knowledge Discovery and Bioinformatics (KDBIO) group, INESC-ID, Lisbon, Portugal

More information

Software Clone Detection Using Cosine Distance Similarity

Software Clone Detection Using Cosine Distance Similarity Software Clone Detection Using Cosine Distance Similarity A Dissertation SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENT FOR THE AWARD OF DEGREE OF MASTER OF TECHNOLOGY IN COMPUTER SCIENCE & ENGINEERING

More information

Evaluating Software Maintenance Cost Using Functional Redundancy Metrics

Evaluating Software Maintenance Cost Using Functional Redundancy Metrics 1 Evaluating Software Maintenance Cost Using Functional Redundancy Metrics Takeo Imai, Yoshio Kataoka, Tetsuji Fukaya System Engineering Laboratory Corporate Research & Development Center Toshiba Corp.

More information

Keywords Clone detection, metrics computation, hybrid approach, complexity, byte code

Keywords Clone detection, metrics computation, hybrid approach, complexity, byte code Volume 3, Issue 5, May 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com An Emerging Approach

More information

Clone Detection via Structural Abstraction

Clone Detection via Structural Abstraction Clone Detection via Structural Abstraction William S. Evans will@cs.ubc.ca Christopher W. Fraser cwfraser@gmail.com Fei Ma Fei.Ma@microsoft.com Abstract This paper describes the design, implementation,

More information

Visual Detection of Duplicated Code

Visual Detection of Duplicated Code Visual Detection of Duplicated Code Matthias Rieger, Stéphane Ducasse Software Composition Group, University of Berne ducasse,rieger@iam.unibe.ch http://www.iam.unibe.ch/scg/ Abstract Code duplication

More information

SHOTGUN SURGERY DESIGN FLAW DETECTION. A CASE-STUDY

SHOTGUN SURGERY DESIGN FLAW DETECTION. A CASE-STUDY STUDIA UNIV. BABEŞ BOLYAI, INFORMATICA, Volume LVIII, Number 4, 2013 SHOTGUN SURGERY DESIGN FLAW DETECTION. A CASE-STUDY CAMELIA ŞERBAN Abstract. Due to the complexity of object oriented design, its assessment

More information

Sparse Dynamic Programming for Longest Common Subsequence from Fragments 1

Sparse Dynamic Programming for Longest Common Subsequence from Fragments 1 Journal of Algorithms 42, 231 254 (2002) doi:10.1006/jagm.2002.1214, available online at http://www.idealibrary.com on Sparse Dynamic Programming for Longest Common Subsequence from Fragments 1 Brenda

More information

IDENTIFICATION OF PROMOTED ECLIPSE UNSTABLE INTERFACES USING CLONE DETECTION TECHNIQUE

IDENTIFICATION OF PROMOTED ECLIPSE UNSTABLE INTERFACES USING CLONE DETECTION TECHNIQUE International Journal of Software Engineering & Applications (IJSEA), Vol.9, No.5, September 2018 IDENTIFICATION OF PROMOTED ECLIPSE UNSTABLE INTERFACES USING CLONE DETECTION TECHNIQUE Simon Kawuma 1 and

More information

Design concepts for data-intensive applications

Design concepts for data-intensive applications 6 th International Conference on Applied Informatics Eger, Hungary, January 27 31, 2004. Design concepts for data-intensive applications Attila Adamkó Department of Information Technology, Institute of

More information

VIFOR 2: A Tool for Browsing and Documentation

VIFOR 2: A Tool for Browsing and Documentation VIFOR 2: A Tool for Browsing and Documentation Vaclav Rajlich, Sridhar Reddy Adnapally Department of Computer Science Wayne State University Detroit, MI 48202, USA rajlich@ c s. w ayne.edu Abstract. During

More information

Identifying Similar Code with Program Dependence Graphs

Identifying Similar Code with Program Dependence Graphs Identifying Similar Code with Program Dependence Graphs Jens Krinke Lehrstuhl S oftw aresy steme Universitat Passau Passau, Germany Abstract We present an approach to identify similar code in programs

More information

SOFTWARE ENGINEERING SOFTWARE EVOLUTION. Saulius Ragaišis.

SOFTWARE ENGINEERING SOFTWARE EVOLUTION. Saulius Ragaišis. SOFTWARE ENGINEERING SOFTWARE EVOLUTION Saulius Ragaišis saulius.ragaisis@mif.vu.lt CSC2008 SE Software Evolution Learning Objectives: Identify the principal issues associated with software evolution and

More information

A Crawljax Based Approach to Exploit Traditional Accessibility Evaluation Tools for AJAX Applications

A Crawljax Based Approach to Exploit Traditional Accessibility Evaluation Tools for AJAX Applications A Crawljax Based Approach to Exploit Traditional Accessibility Evaluation Tools for AJAX Applications F. Ferrucci 1, F. Sarro 1, D. Ronca 1, S. Abrahao 2 Abstract In this paper, we present a Crawljax based

More information

An Exploratory Analysis of Semantic Network Complexity for Data Modeling Performance

An Exploratory Analysis of Semantic Network Complexity for Data Modeling Performance An Exploratory Analysis of Semantic Network Complexity for Data Modeling Performance Abstract Aik Huang Lee and Hock Chuan Chan National University of Singapore Database modeling performance varies across

More information

Incremental Clone Detection and Elimination for Erlang Programs

Incremental Clone Detection and Elimination for Erlang Programs Incremental Clone Detection and Elimination for Erlang Programs Huiqing Li and Simon Thompson School of Computing, University of Kent, UK {H.Li, S.J.Thompson}@kent.ac.uk Abstract. A well-known bad code

More information

Conceptual Model for a Software Maintenance Environment

Conceptual Model for a Software Maintenance Environment Conceptual Model for a Software Environment Miriam. A. M. Capretz Software Engineering Lab School of Computer Science & Engineering University of Aizu Aizu-Wakamatsu City Fukushima, 965-80 Japan phone:

More information

The Goal of this Document. Where to Start?

The Goal of this Document. Where to Start? A QUICK INTRODUCTION TO THE SEMILAR APPLICATION Mihai Lintean, Rajendra Banjade, and Vasile Rus vrus@memphis.edu linteam@gmail.com rbanjade@memphis.edu The Goal of this Document This document introduce

More information

Tools for Remote Web Usability Evaluation

Tools for Remote Web Usability Evaluation Tools for Remote Web Usability Evaluation Fabio Paternò ISTI-CNR Via G.Moruzzi, 1 56100 Pisa - Italy f.paterno@cnuce.cnr.it Abstract The dissemination of Web applications is enormous and still growing.

More information

A Language Independent Approach for Detecting Duplicated Code

A Language Independent Approach for Detecting Duplicated Code A Language Independent Approach for Detecting Duplicated Code Stéphane Ducasse, Matthias Rieger, Serge Demeyer Software Composition Group, University of Berne ducasse,rieger,demeyer@iam.unibe.ch http://www.iam.unibe.ch/scg/

More information

A Technique to Detect Multi-grained Code Clones

A Technique to Detect Multi-grained Code Clones Detection Time The Number of Detectable Clones A Technique to Detect Multi-grained Code Clones Yusuke Yuki, Yoshiki Higo, and Shinji Kusumoto Graduate School of Information Science and Technology, Osaka

More information

Insights into System Wide Code Duplication WCRE 2004

Insights into System Wide Code Duplication WCRE 2004 Insights into System Wide Code Duplication Matthias Rieger, Stéphane Ducasse, and Michele Lanza Software Composition Group University of Bern, Switzerland {rieger,ducasse,lanza}@iam.unibe.ch WCRE 2004

More information

Investigation of Metrics for Object-Oriented Design Logical Stability

Investigation of Metrics for Object-Oriented Design Logical Stability Investigation of Metrics for Object-Oriented Design Logical Stability Mahmoud O. Elish Department of Computer Science George Mason University Fairfax, VA 22030-4400, USA melish@gmu.edu Abstract As changes

More information

A Load Balancing Fault-Tolerant Algorithm for Heterogeneous Cluster Environments

A Load Balancing Fault-Tolerant Algorithm for Heterogeneous Cluster Environments 1 A Load Balancing Fault-Tolerant Algorithm for Heterogeneous Cluster Environments E. M. Karanikolaou and M. P. Bekakos Laboratory of Digital Systems, Department of Electrical and Computer Engineering,

More information

HOW AND WHEN TO FLATTEN JAVA CLASSES?

HOW AND WHEN TO FLATTEN JAVA CLASSES? HOW AND WHEN TO FLATTEN JAVA CLASSES? Jehad Al Dallal Department of Information Science, P.O. Box 5969, Safat 13060, Kuwait ABSTRACT Improving modularity and reusability are two key objectives in object-oriented

More information