XSCoRe: A Program Comprehension Workbench

Pablo Montes
Politécnico Grancolombiano, Ing. de Sistemas, Bogotá, Colombia
pmontesa@poligran.edu.co

Silvia Takahashi
Universidad de los Andes, Ing. de Sistemas y Computación, Bogotá, Colombia
stakahas@uniandes.edu.co

Presented as a paper at the XXXI Conferencia Latinoamericana de Informática, within the framework of the Centro Latinoamericano de Estudios en Informática (CLEI) 2005.

Abstract

Even software that was developed using modern object-oriented languages like Java can profit from reengineering. This is particularly true for software systems that were not developed following all the guidelines for developing good quality software. As always, the first step in the reengineering effort is program understanding. This paper presents an approach for aiding in the understanding of Java programs. We describe an XML representation that can be used to model Java code. This representation is at a higher level of abstraction than an abstract syntax tree because it incorporates constructs from the underlying object-oriented paradigm. A novel characteristic is the inclusion of Javadoc comments in the representation of the Java code. We also describe a workbench that can be used to aid in the understanding of Java programs. This workbench includes the translator from source code to the XML representation of Java code, as well as specific XSL transformations that can be used to extract information. The workbench can be easily extended with new XSL transformations depending on the needs of the software maintenance engineer.

Keywords: Source code representation, XML, Javadoc comments, XSL transformations, program comprehension workbenches.

Resumen

Even software that was developed using modern programming languages such as Java can benefit from reengineering. This is particularly true for software systems developed without following all the techniques for developing high-quality software. As always, the first step in a reengineering project is program comprehension. This article presents an approach to aid in the comprehension of Java programs. We define an XML representation for modeling Java code. This representation is at a higher level of abstraction than the syntax tree, since it incorporates constructs from the underlying object model. A novel feature is the inclusion of Javadoc comments in the representation of the Java code. We also describe a workbench that can be used to aid in the comprehension of Java programs. This workbench includes a translator from source code to the XML representation, as well as specific XSL transformations that can be used to extract information. The workbench can easily be extended with new transformations depending on the needs of the software maintainer.

Keywords: Source code representation, XML, Javadoc comments, XSL transformations, program comprehension workbenches.

1. INTRODUCTION

Software reengineering emerged as a field of study in response to the large number of legacy systems that have to be maintained effectively. With the advent of object-oriented languages, techniques, and methodologies, it was believed that object-oriented software would in itself be maintainable and that no additional tools would be required to aid in program comprehension: all the program artifacts would always be up to date, and it would always be possible to know what the actual design of the application was. However, object-oriented software also suffers from a lack of good software practices, and we are often faced with the problem of reengineering software written in modern programming languages such as Java. As always, the first step in the reengineering effort is program understanding.

This paper summarizes research carried out in Pablo Montes's Master's thesis [17]. The purpose of the project was to design a tool that would ease the process of program comprehension for Java programs. The result was a workbench for reverse engineering Java programs, which we called XSCoRe. The backbone of this workbench is xjava, an XML-based representation of Java source code. The workbench includes a translator from Java projects to xjava and other tools used to extract information from the translated code.

The definition of the XML-based representation was a pivotal result of the project because it influenced the development of the workbench. By using XML, the tools that make up the workbench can take advantage of existing techniques for managing XML documents. We can obtain different views of the same model which can, in turn, be used to further the understanding of the software system. Though the workbench described in this paper implements only a few views and their corresponding transformations, it can be easily extended to provide the information needed to achieve a better understanding of a software system.

This paper is organized as follows: Section 2 briefly describes the motivation for building a reverse engineering workbench and the justification for using an XML-based representation. Section 3 describes the approach used to solve the problem. Section 4 describes the actual solution that was developed. Section 5 outlines directions for future research, and Section 6 presents some conclusions.

2. BACKGROUND

In this section we list the characteristics that reengineering tools should have. We also state the motivation for using XML.

2.1. The ideal reengineering workbench

Most authors (see [10], [18], [19], [20], [21], and [22]) agree on the characteristics that must be offered by an application that aids in the task of software reengineering. These features are summarized below.

1. The tool must be able to extract information from the software artifacts and generate representations that are at a higher level of abstraction. For example, if the artifact being studied is the source code, the representation of the corresponding abstract syntax tree would be at a higher level of abstraction; when dealing with a design document, it would be helpful to obtain analysis information. Examples of analysis information include the real-world data represented in the application data, as well as the functional and non-functional requirements.

2. Once the information has been extracted, the tool must provide means for exploring the extracted data. This exploration activity is, according to many authors, one of the most important tasks in a reengineering project. To provide facilities for exploration, the tool should offer the following features:

   a. Navigation: Software systems are generally nonlinear and can be viewed as a multidimensional network of artifacts. The tool must facilitate navigating this network to aid in the process of software understanding.

   b. Multiple views: Visual metaphors are useful and are commonly used to represent information and ease its comprehension. The tool must offer not only the information obtained from the extraction phases, but also mechanisms for the software engineer to create new views according to his or her needs.

   c. Extensibility: Most reverse engineering applications offer a finite set of features. Though the designers of the application may consider these to be sufficient, it is not possible to predict which aspects of a system are important for all users, nor how to represent them. The application, therefore, must offer mechanisms by which users can extend its functionality.

   d. Information exchange: Currently, there is widespread development of applications that support software development, both for reverse and forward engineering. The reengineering application must be able to interoperate with other reengineering tools and to connect with other systems, by using results from or providing information to other components.

2.2. Using XML

XML is a standard format for data exchange [2]. It seems inevitable to relate the use of XML to the practice of reverse engineering: XML intrinsically provides, or rather promises, all the characteristics mentioned above. Moreover, recent and current research has resulted in the definition of mechanisms for processing XML documents (SAX, DOM [14]) and for generating multiple views from XML documents through XSLT transformations [8]. These views often involve the use of HTML, thus giving the option of hyperlink navigation. Services for querying data in XML documents (using XPath, XQuery, and XQL) can also be provided. In addition, XML has an inherently hierarchical structure in which tags are used to identify data. Since naming data often gives an idea of its meaning, XML is often described as a mechanism that can be used to specify data semantics.

Therefore: why not define a representation of an application using XML? This would offer the possibility of obtaining information at a higher level of abstraction than the source code, provide extensibility, and support information exchange. We are aware that many other approaches have also used XML as a representation language. In the next section, we describe other approaches and compare them to our proposed solution.

3. A FIRST STEP

The source code of a program, though unambiguous because it adheres to a well-defined grammar, is in a flat text format that lacks an explicit structure. Its structure can be made explicit after syntactic analysis [20]. Usually, the next level of abstraction is the abstract syntax tree (AST). This representation is often used by compilers to represent the source code in order to analyze it, optimize it, and translate it to a lower-level language (e.g., bytecode, for Java) [15]. This representation is not always intuitive and often reflects the language's syntactic peculiarities instead of higher-level language building blocks [1].

One of the contributions Java has made to the software engineering community is the standardization of documentation comments in source code, informally known as Javadoc comments. This not only provides a standard, but also encourages software engineers to document their programs by offering the possibility of automatically generating the basic class documentation as an HTML document. It also serves the reverse engineering community by offering more information that can aid in the process of program understanding. When designing a high-level representation for Java source code, Javadoc documentation should not be ignored. However, many existing high-level representations do ignore it.

In this section we outline the design of XSCoRe, the reengineering workbench, and xjava, the source code representation used by the workbench. We begin by outlining other approaches to the problem of representation. We also describe the workbench and give the main characteristics of the representation.

3.1. An XML Based Representation

There are various approaches, mostly at the academic level, for representing source code using XML. Each one has its own characteristics as well as advantages and disadvantages. Some of the most significant approaches are the following:

- SrcML [4], [5], [20]
- CppML [15], [20]
- OOML [15]
- JavaML [1], [15], [20]

We studied these approaches to determine their strengths and weaknesses and to define a representation that integrates the best characteristics of the existing approaches. Our goal was to define a mechanism that would aid in the task of reverse engineering and program comprehension.

SrcML, for example, uses a generic approach. It can be used to model source code written in any language. To achieve this efficiently, the source code is semi-analyzed (the method's body and other low-level characteristics are not represented) and the code structure, including spaces and line breaks, is kept intact. By using SrcML, only the high-level characteristics can be obtained.

CppML defines an XML-based grammar for C++ source code. It represents source code with its structure as well as the analyzed structure of the syntax tree. This representation adheres to the syntactic structure of the program and does not model high-level object-oriented concepts.

OOML presents a much more generic approach. Similarly to SrcML, it models only the basic concepts of the object-oriented paradigm and, therefore, does not include all the information found in the source code. On the other hand, it represents all the analyzed information in a tree-like XML document, without taking the original code structure into consideration. Consequently, this representation can be generated not only from source code, but also from a more complex predefined representation of the source code.

Finally, JavaML is the most complete representation found and the one that provides the most complete documentation. As such, it is the basis for this research work. This representation, however, only takes into account the information needed by the compiler to generate the bytecode. It therefore omits comments, which are of great importance in a program comprehension workbench. Additionally, the XML representation is based almost exclusively on the Java grammar and, therefore, does not take into account the semantic information of the object-oriented paradigm.

We decided to use XML as the representation language for modeling the Java syntax. In the defined representation, we also model some aspects related to the semantics of the object-oriented paradigm. The representation also supports Javadoc comments. As we mentioned above, we called this representation xjava. The novel aspects of our representation are, as we have said, that it is at a higher level of abstraction than the syntax tree and that it represents comments. In Section 4.1, we show an example of the representation of a simple class. It shows how we include information regarding classes and packages that can be used to enhance high-level software comprehension.

3.2. Using the representation

Once the code has been translated to xjava, it can be analyzed, transformed (using XSL transformations), and queried to obtain the required information or to view the parts of the code that are relevant to a particular problem. Some of the views that should be provided by the workbench to aid in the task of program comprehension are listed below.

- Extraction of the information necessary to construct UML diagrams using XMI.
- Dependency graph and call tree, using GXL to represent these structures.
- Reformatted plain text.
- Hyperlinked text (HTML).
- Documentation view (à la Javadoc).
- Graphical view of the program's structure.

Querying mechanisms can also be defined using tools such as XPath.

3.3. Putting it all together

The workbench is made up of the following tools: a translator from source code to xjava and a few transformations. It is extensible, so the initial set of tools and services should not constitute a roadblock to the person in charge of program comprehension. The user can define new transformations and querying mechanisms depending on his or her program comprehension needs. Standard XML processing APIs such as SAX and DOM can be used to process and analyze the extracted data.

Figure 1 illustrates the complete process that is supported by the workbench. All of the source code files are first translated to their equivalent XML representation: each Java file is translated to its corresponding XML file. Additionally, there is a single XML file that specifies the packages that make up the system. By using XLink, this file references the XML files that represent each individual source code file, indicating the package to which each one belongs. To create a new view of the system, we only need to define an XSL transformation and run it through the XSLT processor; the desired view is created without modifying the representation and without knowing any specifics about the Java grammar.

Figure 1. xjava and XSL Transformations
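The process in Figure 1 can be sketched in a few lines of Java with the standard JAXP API (javax.xml.transform). This is only an illustration under stated assumptions: the element and attribute names (compilation-unit, class, method, name) and the stylesheet are invented for the example and are not taken from the actual xjava schema, and the class name XjavaViewSketch is hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XjavaViewSketch {

    // Hypothetical xjava-like fragment; the real xjava vocabulary may differ.
    static final String SAMPLE_XJAVA =
        "<compilation-unit package='bank'>"
      + "<class name='Account'>"
      + "<method name='deposit'/><method name='balance'/>"
      + "</class>"
      + "</compilation-unit>";

    // A view defined purely as a stylesheet: one "Class.method;" entry per method.
    static final String METHOD_LIST_XSL =
        "<xsl:stylesheet version='1.0' "
      + "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:for-each select='//class/method'>"
      + "<xsl:value-of select='../@name'/>.<xsl:value-of select='@name'/>;"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Runs the stylesheet over the XML document and returns the generated view. */
    static String applyView(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Covers both stylesheet compilation and transformation failures.
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(applyView(SAMPLE_XJAVA, METHOD_LIST_XSL));
    }
}
```

Adding another view in this sketch means supplying a different stylesheet string; the Java code and the XML document stay unchanged, which is the property the architecture relies on.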

The proposed architecture could potentially be used in a software development environment based on various views of a single model. In this way, the problem of maintaining coherence between the source code and all other software artifacts could be solved. An environment like this would aid in software maintenance tasks as well as in the complete cycle of software development. This would be particularly helpful in large and complex systems in which many people participate in the process of software development.

The current version of the workbench provides the following transformations: one that generates a documentation view and another that translates the code into an HTML format with hyperlinks. We are currently integrating a tool that obtains the XMI description of the code from xjava in order to produce UML class diagrams that include relations among classes. Each of these examples is useful in itself and also illustrates that the workbench can easily be extended to provide more functionality. This is particularly true of the transformation to XMI, which was done in a Senior Project by a person who did not participate in the original construction of the workbench. It is important to point out that, given that xjava is well defined and documented, anyone who has a good understanding of XML technology can define new views and queries to satisfy new requirements. By adding query services we can provide an environment similar to that described in [1]. We could even define a new transformation to create a representation that can then be queried with OCL, as is done in [1].

4. IMPLEMENTATION DETAILS

This section describes some of the implementation issues. We show xjava examples for a simple class and for a comment. We then provide technical information regarding the actual implementation of the translator and the workbench. Finally, we give some examples of how XSCoRe has been used.

4.1. The representation: xjava

Once we decided that XML would be the language used to define the representation, we proceeded to research the best practices in XML design and manipulation. We followed the guidelines proposed in [3]. The main characteristics that the representation should have are the following:

- It should be easily extensible.
- It should be complete: the representation should be able to model any Java program that can be compiled by a standard Java compiler, and all the details present in the code must be represented. In other words, the representation must retain enough detail that the source code can be regenerated with a new transformation.
- It should be as generic and as simple as possible, and should represent the structuring concepts of the object-oriented programming paradigm.
- It should also represent Javadoc comments, not just as plain text but in a structured manner.

Based on these characteristics and on the Java grammar, we defined xjava. To achieve extensibility we use an XML schema (XSD) for the definition. We based our design on Java 1.4. We do not include syntactic verification: we assume that the program being analyzed does not contain syntax errors. Given that XSD is inherently extensible, it will be easy to incorporate new features that appear in new versions of Java without having to change the representation.

The translator was constructed using publicly available Java grammars. Our representation supports all constructs in this grammar, so its completeness depends on the completeness of the published grammars. Note that we represent only static information. Though it is possible to extract information such as the class diagram through transformations, we have not researched how to deal with features such as class reflection.

Since xjava is used by other tools that analyze and apply transformations to extract information, it is important that it be generic and simple. Our design reflects this: the generated document's structure is data oriented and completely structured. We used the Venetian Blind design in most of the generated documents. Some structuring elements are not represented but can be obtained with an adequate transformation. Two examples of these simplifications follow. The first is Java's fully qualified names for packages, imports, or references to classes: instead of representing each component of the name, we always represent the name as a single string. The second is the case of modifiers for classes, attributes, or methods: instead of representing each modifier as a different element, the list of modifiers is represented as a single element that contains a string with all the modifiers separated by blanks. If any of these characteristics is needed, we can define transformations to obtain them.

We show a simple example of a single Java file with its corresponding translation to xjava. Figure 2 shows the sample code and Figure 3 the corresponding representation. We also define a representation for a Java project that includes references to all packages and files in a Java application. We used the XLink standard defined by the W3C to manage references and links within XML.
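The two simplifications described above are easy to undo when a consumer needs the finer structure. The paper proposes XSL transformations for this purpose; the following sketch swaps in plain Java string handling to show the same idea. The class and method names are hypothetical, and the input strings stand in for the text of an xjava modifier list or qualified name.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class XjavaSimplificationSketch {

    /** Recovers individual modifiers from the blank-separated string. */
    static List<String> splitModifiers(String modifiers) {
        String trimmed = modifiers.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        // One or more whitespace characters separate consecutive modifiers.
        return Arrays.asList(trimmed.split("\\s+"));
    }

    /** Recovers the components of a dotted qualified name stored as one string. */
    static List<String> splitQualifiedName(String qualifiedName) {
        return Arrays.asList(qualifiedName.trim().split("\\."));
    }

    public static void main(String[] args) {
        System.out.println(splitModifiers("public static final"));
        System.out.println(splitQualifiedName("java.util.List"));
    }
}
```

Keeping the compact string in the representation and expanding it on demand keeps the generated documents small, at the cost of a small amount of work in each consumer that needs the parts.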

Finally, as we mentioned above, given that the tool is going to be used for reverse engineering, it is important to represent comments. We defined a structured representation for comments that follow the Javadoc standard. Other comments are represented in plain text. A problem that arises when analyzing a comment is how to determine which programming construct it refers to. Does it refer to the construct on the same line, the previous line, or the following line? At times this depends on the software engineer's or the organization's standards. We could try to guess the corresponding construct through parameterized heuristics based on the number of blank lines before or after a comment. In our representation, we assume that a comment refers to the statement or declaration that appears after it. Figures 4 and 5 show an example of a Javadoc comment and the corresponding representation in xjava.

Figures 6, 7, 8 and 9 show the result of applying the XSLT that transforms the xjava representation of a more complex program into a series of HTML files that show only the Javadoc comments in an ordered and navigable manner. This view, though it works like the Javadoc tool or Doxygen, is easily modifiable and extensible by including customized tags. It is important to point out that our tool provides an alternative approach for managing comments: it would be used instead of the Javadoc tool. The interesting feature is that comments are analyzed together with the code itself. Though our tool would support new types of tags, we do not propose any methodology for writing comments, as is the case in [25].

Figure 2. Sample Program

Figure 3. xjava representation of sample program

Figure 4. Javadoc Comment

Figure 5. Javadoc Translation

Figure 6. Files list with links

Figure 7. Class summary

Figure 8. Class fields, constructors and methods list with links

Figure 9. Method detail
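To make the idea of a structured (rather than plain-text) Javadoc comment concrete, the sketch below splits a comment into its main description and its block tags. It is an illustration only: it uses simple line-based string processing rather than the lexer-based analysis used for xjava, and the class names (JavadocStructureSketch, Tag, Parsed) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class JavadocStructureSketch {

    /** One block tag, for example "@param amount the sum to add". */
    static final class Tag {
        final String name;
        final String text;
        Tag(String name, String text) { this.name = name; this.text = text; }
    }

    /** The structured form: a main description plus an ordered list of tags. */
    static final class Parsed {
        final String description;
        final List<Tag> tags;
        Parsed(String description, List<Tag> tags) {
            this.description = description;
            this.tags = tags;
        }
    }

    static Parsed parse(String comment) {
        // Remove the opening and closing comment delimiters.
        String body = comment.trim();
        if (body.startsWith("/**")) body = body.substring(3);
        if (body.endsWith("*/")) body = body.substring(0, body.length() - 2);

        StringBuilder description = new StringBuilder();
        List<Tag> tags = new ArrayList<>();
        String tagName = null;
        StringBuilder tagText = new StringBuilder();

        for (String rawLine : body.split("\\R")) {
            // Drop the leading asterisks that decorate each comment line.
            String line = rawLine.trim().replaceFirst("^\\*+\\s?", "").trim();
            if (line.isEmpty()) continue;
            if (line.startsWith("@")) {
                // A new tag starts: store the previous one, if any.
                if (tagName != null) tags.add(new Tag(tagName, tagText.toString().trim()));
                int space = line.indexOf(' ');
                tagName = space < 0 ? line.substring(1) : line.substring(1, space);
                tagText = new StringBuilder(space < 0 ? "" : line.substring(space + 1));
            } else if (tagName != null) {
                // Continuation line of the current tag.
                tagText.append(' ').append(line);
            } else {
                if (description.length() > 0) description.append(' ');
                description.append(line);
            }
        }
        if (tagName != null) tags.add(new Tag(tagName, tagText.toString().trim()));
        return new Parsed(description.toString(), tags);
    }

    public static void main(String[] args) {
        Parsed p = parse("/**\n * Adds money.\n * @param amount the sum to add\n */");
        System.out.println(p.description);
        for (Tag t : p.tags) System.out.println(t.name + " -> " + t.text);
    }
}
```

Once a comment is in this form, each tag can be emitted as its own XML element, which is what allows a stylesheet to select, reorder, or extend tags without reparsing text.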

4.2. Technical Information

The lexer and parser were generated using JavaCC [11], [24]. Many developers currently use JavaCC, and JavaCC grammar files for Java 1.4.2 can be found in the JavaCC Grammar Repository [12]. This grammar is based on the initial grammar provided by Sun. We also used JavaCC to analyze Javadoc comments, using only the lexer with lexical states. It is important to point out that, though there is a lot of documentation regarding the correct use of Javadoc [13], [9], we could not find a grammar that served our purposes.

We first generated the syntax tree using JTB (Java Tree Builder) [23], a tool that can be used with JavaCC for this purpose. Then, we used the visitor pattern to generate the corresponding representation in xjava. We then used JDOM [14] to generate the XML document. Once this document has been obtained, transformations can be applied to it to obtain different views.

This can all be done through the command line. However, the command line does not provide a very friendly interface. Therefore, we decided to integrate the workbench with Eclipse's graphical interface through contextual menus. This way a user can translate a Java project to its equivalent xjava representation and can obtain different views from an xjava file. Eclipse Wizards must be implemented for each transformation that requires additional user input.

To take advantage of Eclipse's architecture it is not enough to define new plug-ins with new functions. The real potential of the workbench is that it gives the opportunity to extend the plug-ins we have developed. Our plug-in allows developers to add new transformations, either as XSL transformations or using APIs for XML processing.

4.3. Using XScoRe

The set of tools that we developed was tested on the same code used in their development. This is a large project, and we were able to create the xjava representation and obtain the information given by the XSL transformations that we implemented. The representation of comments was used to create the implementation manual. It is important to point out that, once a representation in XML has been obtained, it can be visualized in many ways depending on the transformation we use.

We also tested it on various student projects. In particular, we used it to create a representation of an application used in a course project in which students have to implement an interface for an existing application. As a rule, students cannot modify the core of the application; they can only use the public methods of the main class. XScoRe aided in the understanding of the existing software.

5. SUMMARY AND FUTURE RESEARCH

In this section we summarize the current state of the XSCoRe workbench and give directions for future work and research. The extensible architecture of the workbench makes it well suited to adding features.

XSCoRe includes the following tools: a translator from Java code to xjava, and an integrated mechanism for applying XSL transformations to the generated documents. Two views were implemented: a documentation view and a hypertext code view. The first presents comments similarly to Javadoc's presentation scheme. The second presents the code using HTML, which allows straightforward navigation of the code. XSCoRe is currently being enhanced with a transformation that obtains XMI from xjava. Once the XMI representation has been obtained, UML diagrams can be generated.

The system can run as a standalone application through the command line. However, its functionality can be better exploited through its integration with the Eclipse platform. New transformations can easily be defined and integrated. The representation itself can be used to extract information through queries written in XPath, XQuery or XQL, which allow queries à la SQL over XML documents. Currently we are working on constructing queries that can be used to obtain information related to certain metrics that in turn can help us detect bad smells [6]. We hope to be able to define transformations that can be used for refactoring. Not only can the workbench be extended by adding transformations and query possibilities; its design also makes it easy to incorporate changes to the language's grammar, which are quite common across Java versions.

The preceding paragraphs mention some directions for future work that are facilitated by the design of the workbench. The basic idea is to include new transformations and support for query management that will add services to the workbench so that it covers more of the activities related to program comprehension and reverse engineering.
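A metric-oriented query of the kind mentioned above can be sketched with the JDK's `javax.xml.xpath` API. The example flags classes whose number of methods exceeds a threshold, a simple indicator often associated with the "large class" smell. The `package`, `class` and `method` elements, their `name` attributes, and the threshold are assumptions made for illustration; they are not taken from the xjava schema or from the queries under construction.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XjavaMetricSketch {

    // Hypothetical xjava-like document; element and attribute names are assumptions.
    public static final String XJAVA =
        "<package name=\"bank\">"
      + "<class name=\"Account\">"
      + "<method name=\"deposit\"/><method name=\"withdraw\"/><method name=\"close\"/>"
      + "</class>"
      + "<class name=\"Audit\"><method name=\"log\"/></class>"
      + "</package>";

    /** Returns the names of classes that declare more than 'threshold' methods. */
    public static List<String> classesWithMoreMethodsThan(String xml, int threshold) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Select the name attribute of every class with too many method children.
            String query = "//class[count(method) > " + threshold + "]/@name";
            NodeList hits = (NodeList) xpath.evaluate(
                query, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                names.add(hits.item(i).getNodeValue());
            }
            return names;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("query failed", e);
        }
    }

    public static void main(String[] args) {
        // With the sample document, only Account declares more than two methods.
        System.out.println(classesWithMoreMethodsThan(XJAVA, 2));
    }
}
```

The metric lives in the query string, so other counts (fields per class, parameters per method) would only need a different XPath expression over the same representation.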

In terms of the representation of comments, the current representation does not attempt to determine which program element a comment refers to. This would require the use of heuristics and probably some degree of human intervention. Evidently, a static representation is not enough to achieve the level of program understanding needed to carry out maintenance tasks and eventually migrate the system to an evolutionary model. Future research should try to define new frameworks in which dynamic information can be represented, preferably also using XML, so that the resulting tools can be integrated with the current system.

6. CONCLUSIONS

In this paper we described the results of a research project whose objective was to develop tools to ease the comprehension of Java programs. The main results of the research were the development of a workbench (called XSCoRe) and of the underlying representation of Java code (xjava). The workbench exhibits many of the characteristics that have been mentioned as vital in a reverse engineering environment: it provides mechanisms for information extraction, generation of multiple views, and navigation, and, most importantly, it provides means for extending it with new features.

The code representation is based on XML. Our approach differs from previous ones in that the representation was specifically designed for reverse engineering. It therefore offers many advantages over the code itself or classical textual code representations: it offers services that other reverse engineering tools can use to extract information and reason about the existing software system. The proposed representation facilitates obtaining program information in a more accessible and heterogeneous manner. Our representation models structures inherent to the object-oriented paradigm instead of only representing Java's grammar rules. Therefore, it is at a higher level of abstraction than the source code and even than the syntax tree.

We also represent comments that have been written following the Javadoc standard. It is not common for reverse engineering tools to provide support for managing comments. The representation of comments can also be extended if the user wants to define new tags.
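To make the idea of an extensible comment representation concrete, the sketch below splits a Javadoc comment into its description and its block tags. It is a hand-written line splitter, not the JavaCC lexer with lexical states used in the workbench, and the `Tag` class and the `@audit` tag are hypothetical. Unknown tags are kept instead of rejected, which is one simple way of letting users introduce new tags.

```java
import java.util.ArrayList;
import java.util.List;

public class JavadocTagSketch {

    /** One entry of the comment: the leading description or a block tag. */
    public static final class Tag {
        public final String name;
        public final String text;

        Tag(String name, String text) {
            this.name = name;
            this.text = text;
        }
    }

    /**
     * Splits a Javadoc comment into entries. The first entry is always named
     * "description" (possibly with empty text); every following entry is a
     * block tag such as "param", including tags the Javadoc tool does not define.
     */
    public static List<Tag> parse(String comment) {
        List<Tag> tags = new ArrayList<>();
        String name = "description";
        StringBuilder text = new StringBuilder();
        // Remove the opening and closing comment delimiters.
        String body = comment.replaceFirst("^\\s*/\\*\\*", "")
                             .replaceFirst("\\*/\\s*$", "");
        for (String raw : body.split("\\R")) {
            // Drop the leading asterisk that decorates each comment line.
            String line = raw.replaceFirst("^\\s*\\*?\\s?", "").trim();
            if (line.startsWith("@")) {
                // A new block tag starts: store the entry collected so far.
                tags.add(new Tag(name, text.toString().trim()));
                int space = line.indexOf(' ');
                name = space < 0 ? line.substring(1) : line.substring(1, space);
                text = new StringBuilder(space < 0 ? "" : line.substring(space + 1));
            } else {
                text.append(' ').append(line);
            }
        }
        tags.add(new Tag(name, text.toString().trim()));
        return tags;
    }

    public static void main(String[] args) {
        String comment = "/**\n * Adds money to the account.\n"
            + " * @param amount the sum to add\n * @audit reviewed\n */";
        for (Tag tag : parse(comment)) {
            System.out.println(tag.name + ": " + tag.text);
        }
    }
}
```

Each entry could then be written out as an XML element named after the tag, so a user-defined tag would appear in the representation without any change to the parser.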

Both XSCoRe and xjava suggest a new approach towards program understanding. The methodology we used during our research defines a framework for the definition of program understanding workbenches that can be used to automate this process. The basic idea is to first define a representation of the source code that can be easily manipulated. This representation, which contains more information than what is usually available in the abstract syntax tree, can then be used to construct different views directly.

Novel features of XSCoRe are the use of XML schemas, which makes the workbench more extensible than other approaches; the inclusion of Javadoc comments in the representation so that they can be analyzed automatically; and the integrated representation of all files in an application organized in packages.

The use of XML in a context different from that of data representation and exchange proposes a new approach that can be applied to reverse engineering and to any task that requires representing source code at a higher level of abstraction. This approach can be used in formal verification of programs, automatic code generation, or translation from one programming language to another. By the same token, it can also be used in forward engineering by aiding in the whole process of software development.

We believe that this research is a significant contribution to the problem of understanding source code. Moreover, this research gives new perspectives on the use of XML for a purpose different from that of data representation and exchange in multi-tier environments. Clearly the use of XML for program comprehension and development is just beginning. The results we have obtained so far may serve as a stepping stone for building more complex software comprehension tools.

References

[1] Antoniol, G., Di Penta, M., and Merlo, E., YAAB (Yet Another AST Browser): using OCL to navigate ASTs. In: 11th IEEE International Workshop on Program Comprehension, Portland, Oregon, USA, May 10-11, 2003, pp. 13-22. IEEE Computer Society Press, 2003.
[2] Badros, G., JavaML: a markup language for Java source code [online]. 9th International Conference on the World Wide Web, Amsterdam, Netherlands, 2000. Available from citeseer: <http://citeseer.ist.psu.edu/badros00javaml.html>.
[3] Bonneau, S., et al., XML Design Handbook. Wrox Press, 2003.
[4] Collard, M., Maletic, J., and Marcus, A., Source code files as structured documents. In: 10th International Workshop on Program Comprehension, Paris, France, 2002 [online]. Available from citeseer: <http://citeseer.ist.psu.edu/maletic02source.html>.
[5] Collard, M., Maletic, J., and Marcus, A., Supporting Document and Data Views of Source Code [online]. Available from citeseer: <http://citeseer.ist.psu.edu/collard02supporting.html>.
[6] Fowler, M., et al., Refactoring: Improving the Design of Existing Code. Addison-Wesley Pub Co, 28 June 1999.
[7] Friendly, L., The Design of Distributed Hyperlinked Programming Documentation. International Workshop on Hypermedia Design, Montpellier, France, 1995 [online]. Available from citeseer: <http://citeseer.ist.psu.edu/friendly95design.html>.
[8] Gardner, J., and Rendon, Z., XSLT and XPATH: A Guide to XML Transformations. Prentice Hall, 2001.
[9] How to Write Doc Comments for the Javadoc Tool [online]. Available from JavaSun: <http://java.sun.com/j2se/javadoc/writingdoccomments/index.html>.
[10] Jahnke, Jen, et al., Reverse Engineering: A Roadmap. Future of Software Engineering, Limerick, Ireland [online]. Available from citeseer: <http://citeseer.ist.psu.edu/muller00reverse.html>.
[11] JavaCC Documentation [online]. Available from JavaSun: <https://javacc.dev.java.net/doc/docindex.html>.
[12] JavaCC Grammar Repository [online]. Available from JavaSun: <http://www.cobase.cs.ucla.edu/pub/javacc/>.
[13] Javadoc Home Page [online]. Available from JavaSun: <http://java.sun.com/j2se/javadoc/>.
[14] JDOM Home Page [online]. <http://www.jdom.org/>.
[15] Kontogiannis, K., and Mamas, E., Towards Portable Source Code Representations Using XML. 7th Working Conference on Reverse Engineering, Brisbane, Australia, 2000. Available from csdl.computer.org: <http://csdl.computer.org/comp/proceedings/wcre/2000/0881/00/08810172abs.htm>.
[16] Kontogiannis, K., and Zou, Y., Towards a Portable XML-based Source Code Representation. XML Technologies and Software Engineering, 2001 [online]. Available from citeseer: <http://citeseer.ist.psu.edu/500234.html>.
[17] Montes, P., XSCoRe: Representación de Código Fuente basada en XML como una Herramienta de Asistencia al Proceso de Ingeniería en Reversa. Tesis de Maestría, Dept. Ingeniería de Sistemas y Computación, Universidad de los Andes, Bogotá, Colombia, 2004.
[18] Rada, R., Reengineering Software: How to Reuse Programming to Build New, State-Of-The-Art Software. Fitzroy Dearborn Publishers, 1999.
[19] Santanu, P., Smith, D., and Tilley, S., Towards a Framework for Program Understanding. 4th Workshop on Program Comprehension, 1996 [online]. Available from citeseer: <http://citeseer.ist.psu.edu/266671.html>.
[20] Simic, H., and Topolnik, M., Prospects of Encoding Java Source Code in XML. 7th International Conference on Telecommunications, Zagreb, Croatia, 2003 [online]. Available from citeseer: <http://citeseer.ist.psu.edu/simic03prospects.html>.
[21] Tilley, S., Perspectives on Legacy System Reengineering. Reengineering Centre, Software Engineering Institute, Carnegie Mellon University, 1995 [online]. Available from www.sei.edu: <http://www.sei.cmu.edu/reengineering/lsysree.html>.
[22] Tilley, S., A Reverse-Engineering Environment Framework. Reengineering Centre, Software Engineering Institute, Carnegie Mellon University, 1998 [online]. Available from www.sei.edu: <http://www.sei.cmu.edu/publications/documents/98.reports/98tr005/98tr005abstract.html>.
[23] The Java Tree Builder Homepage [online]. <http://www.cs.purdue.edu/jtb/index.html>.
[24] The JavaCC FAQ [online]. <http://www.engr.mun.ca/~theo/javacc-faq/>.
[25] Torchiano, M., Documenting Pattern Use in Java Programs. In: Proc. IEEE Int. Conf. on Software Maintenance (ICSM 2002), Montreal, Canada, October 3-6, 2002, pp. 230-233.