Aalborg University Department of Computer Science


Database and Programming Technologies

Title: The Elucidator for Java
Topic: Software Documentation for Developers
Project period: 2/ /
Project group: DAT5, E1-207A

Søren Staun-Pedersen
Max Rydahl Andersen
Vathanan Kumar
Kristian Lykkegaard Sørensen
Claus Nyhus Christensen

Supervisor: Kurt Nørmark

Number of appendixes: 3
Total number of pages: 109
Number of pages in report: 75
Number of reports printed: 12

Abstract: This project presents an implementation of a tool intended for a branch of literate programming called Elucidative Programming. The tool is written in Java for the Java programming language. Elucidative Programming focuses on maintaining the programmer's understanding of the program parts, in order to allow easy rediscovery. It keeps documentation and source code in separate files and provides mechanisms for linking between them. The Elucidator tool implemented in this project relies on documentation written in an XML-based language. This documentation and the source code are transformed into HTML and shown in a hypertext browser. Editor support has been implemented to allow easy linking between the written documentation and the source code. The main contribution is a framework for an Elucidator tool that uses modern standardized technologies. It furthermore provides the user with a plethora of navigation possibilities.

Copyright © 2000, DAT5, E1-207A.


Preface

This report documents the effort made during the first half of our master thesis at the Department of Computer Science, Aalborg University, Denmark. The work started on September 2, 1999, and lasted until January 17. The entire master thesis work lasts until June.

Chapter 1: Introduction
In this chapter we motivate our project and establish our initial problems for the whole master thesis.

Chapter 2: Analysis
Through a literature study, we provide a theoretical background for solving the initial problems presented in the Introduction.

Chapter 3: Project Focus
Since this report documents the first half of our master thesis, some of the initial problems have to be postponed to the second part of the project. In this chapter the focus of the report is narrowed down to the activities taking place in the first part of the thesis.

Chapter 4: Design
In this chapter we present the design of our current implementation of the Elucidator. Ideas are argued for with respect to a set of goals for good design.

Chapter 5: Conclusion
In this chapter we conclude our project.

Chapter 6: Perspective
As this report documents the first half of our thesis, further work has to be done in the second half. In this chapter we present our ideas for the second half of the master thesis.

Appendix A: Grammars
The first appendix lists a number of grammars used in the current implementation of the Elucidator.

Appendix B: Table design for the Data Model
This appendix contains a number of tables documenting the format of the tables used in the current implementation of the Data Model.

Appendix C: Example: Coffee Machine
The last appendix presents a sample Java project with corresponding elucidative documentation, abstractions and screen captures from the user interface. The example is used throughout the Design chapter.

Throughout the report, figures and tables are numbered consecutively within each chapter. When a figure is taken directly from a literature source, this is marked with a citation in the figure caption. The first time a word of special interest for the project is used, we emphasize it. Furthermore, we use a special typesetting of the words Elucidator, E-doc and E-doc language when we refer to our current implementation of the Elucidator scheme.

The literature referred to in the report is listed in the bibliography. References are given in the form [Nørmark, 1999], which means that the piece of literature marked by this label in the literature list was used.

We would like to thank the following people:

- Eastfork Object Space (EOS), especially Jørn Larsen and Kim Harding Christensen, for providing us with tickets for the JAOO Conference 1999, held in Aarhus, September 20-22. The conference was a great inspiration to our project.
- Dr. Johannes Sametinger for taking the time to talk with us in connection with his lecture at Aarhus University on November 5.
- Vincent Gay-Para and Thomas Graf for providing extensive support for the Kopi Java Compiler, and especially for devoting a whole weekend to incorporating our special wishes into the compiler.

Aalborg University, January 17

Søren Staun-Pedersen
Max Rydahl Andersen
Vathanan Kumar
Kristian Lykkegaard Sørensen
Claus Nyhus Christensen

Contents

1 Introduction
    Problemization
2 Analysis
    Elucidative programming
    The Object-Oriented Elucidator
    Multiuser support
    Motivation
    Immediate pay back
    Tool support
    Documentation structure
    Object-Oriented Documentation
    Cognitive models
    Templates and views
    Templates
    Views
    Summary
3 Project Focus
4 Design
    System Architecture
    Input Data
    Java Source Code
    E-Doc files
    Data Model
    Derived information
    Storage of derived information
    Current implementation
    Abstractor
    Purpose of the abstraction
    Abstraction of Java
    Abstraction of E-doc
    Query engine
    Current implementation
    Generator
    Responsibilities of the Generator
    Architectural considerations
    Current implementation
    The browser
    The interface
    Layout
    The editor
    Choice of editor
    Current implementation
    Environment
5 Conclusion
6 Perspective
Bibliography
A Grammars
    A.1 Document Type Definition for the EDoc language
    A.2 Document Type Definition for the JavaMarkupEngine
    A.3 Document Type Definition for the Navigation Window
B Table design for the Data Model
C Example: Coffee Machine
    C.1 Java source code
        C.1.1 EvaTrio.java
        C.1.2 CoffeeMachine.java
        C.1.3 CoffeeContainer.java
        C.1.4 WaterContainer.java
        C.1.5 Filter.java
        C.1.6 HeatingElement.java
        C.1.7 ElectricalAppliance.java
        C.1.8 ElecAppException.java
    C.2 E-doc documentation
    C.3 Derived information and Screen captures

1 Introduction

In 1984 Donald E. Knuth, Professor Emeritus of The Art of Computer Programming at Stanford University, suggested that the time was ripe for significantly better documentation [Knuth, 1984]. To achieve this, he argued, computer programs should be considered works of literature.

"Let us change our traditional attitude to the construction of programs. Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do." [Knuth, 1984]

Based on these thoughts, he developed Literate Programming and a set of tools known as the WEB tools. This new paradigm relied on the code residing in the documentation, rather than on the solution more commonly used at that time, where the program was documented via comments written in the source code. In this way, the program documentation followed the ways of the author rather than the programmer's code. This paradigm seems ideal for documenting algorithms or code fragments.

However, Kurt Nørmark, associate professor at the Department of Computer Science at Aalborg University, points out several problems in this approach [Nørmark, 1999]. Knuth's WEB tools use three languages: a documentation language, a programming language, and a language to bind the two in a literate document. Nørmark argues that the mental load of using a WEB system is high. Firstly, one has to master and use three languages while keeping the focus on problem solving. Secondly, the source code as seen by the programmer differs from the source code as seen by the compiler, causing problems when, e.g., syntax errors are to be located. Furthermore, the documentation for the code

"...is almost exclusively oriented towards a paper [article] representation. Using today's media, a more online-oriented representation using hypertext concepts would be a big gain." [Nørmark, 1999]

Nørmark argues that Literate Programming is well suited for producing publications of programs as technical literature, while the needs of the practical software engineer are not met. As a consequence, he introduces a branch of Literate Programming, called Elucidative Programming, for documenting the understanding of programs. To achieve this, Nørmark suggests that we keep the source code and documentation in separate files in order to remove the mental load experienced with the WEB tools. His primary concern is to maintain the program understanding for the current and future programmers. This process should be supported directly by the editor, which is used to bridge the gap between documentation and source code. Furthermore, the output is not directed towards paper, but towards an online representation suitable for web browsers.

1.1 Problemization

Elucidative Programming as a concept is based upon Literate Programming, but sees the problem from another perspective. Currently an Elucidator for the language Scheme has been implemented [Nørmark, 1999]. Scheme is a functional language with many advantages, e.g., it can be easily parsed due to its list-oriented syntax. Although Scheme has its advantages, it is not widely used for software engineering projects. Object-oriented programming, on the other hand, has gained attention lately. Due to the strength of the object-oriented paradigm with respect to modeling of real world problems and reuse of code, we feel that the object-oriented approach has the potential to become an important paradigm of the future.

The original Scheme Elucidator has no direct support for multiple developers, as it was not aimed at teams of software engineers. We see support for multiple developers as a must for real world software projects, since one-man teams are unlikely in the industry. Based on these observations we state our first question:

To which degree is it possible to create a tool which supports Elucidative Programming, multiple users and the object-oriented programming paradigm?

Usually programs are not documented for future programmers, so the understanding of the program that the current developer has is lost over time. When a software product has been released, it has to be maintained by (possibly) new programmers. According to [Brown et al., 1999], over 50% of the time used on maintenance is spent on rediscovery of the understanding. Rediscovery means the process of understanding old source code.

Despite these advantages, programmers still often neglect the documentation. It just never quite gets done, and the general quality is poor. We therefore need to find a way to further motivate the programmer to write documentation. This leads to our second question:

How can the tool be designed to motivate the programmer to write documentation?

The following two questions will not be the primary focus in this first half of the thesis. They remain interesting, however, as we intend to design for today as well as for the future.

When writing source code, the mental load is relatively high. If the burden of writing and structuring the documentation is added on top of this, it will often lead to the programmer refraining from documenting. A natural way to solve this problem would be to reduce the burden of structuring documentation. This could be realized by providing a set of templates to the programmer. By a template we mean a predefined way of structuring a piece of documentation. This leads us to our third question:

How can templates be created in a way that will support a programmer in structuring his Elucidative documentation?

The product of Literate Programming is typically a paper article, while Elucidative Programming produces hyperlinked documentation. Articles are meant to be read linearly, while the reader of hyperlinked documentation can choose his personal path. Documentation is usually written for several purposes. Therefore we would like to extend the notion of hyperlinked documentation, as seen in Elucidative Programming, with a set of views. A view can be seen as a structured subset of the documentation. In order to provide different views, we need to store the documentation in a way which makes it easy to extract parts of the documentation according to a specific view. We therefore pose our final question:

Given a structured documentation, to which degree is it possible to create views of the documentation which extend the value of the documentation?


2 Analysis

The main focus of this project is to create an Elucidator for an object-oriented programming language. By this we mean an Elucidator that enables a programmer to document programs written in an object-oriented language. We have specifically chosen to create an Elucidator for the Java language.

In this chapter we present concepts which will be useful when designing and implementing the Elucidator. This is done by elaborating on the four questions posed in the problemization. From the analysis we draw conclusions on the consequences that a given subject will have for the Elucidator.

First, we move towards an understanding of the basic concepts for an object-oriented Elucidator for multiple users. This is done by taking a look at the original Elucidator as proposed by Nørmark. Since we are going to create an Elucidator for Java, we investigate the characteristics of Java and the consequences these might have for the idea of Elucidative Programming. We also look at ways of introducing multiuser support in the Elucidator.

Secondly, we briefly examine the problems that programmers documenting source code may encounter, and we look at how to motivate them to write documentation. Then we consider how documentation for object-oriented programs could be structured. Two different schemes are examined. Finally, we argue that templates and views can indeed help the programmer in structuring his documentation and that views can be established.

2.1 Elucidative programming

The foundation and main source of inspiration for this project is Elucidative Programming. It was briefly introduced above, but since it is not yet widely known, a more in-depth introduction to the main concepts is presented in this section.

Elucidative programming was introduced by Nørmark in the article Requirements for an Elucidative Programming Environment [Nørmark, 1999] as a more practical variation of Literate Programming. Nørmark argues that the documentation written before the actual programming begins, such as analysis and design documents, is often not updated throughout the lifetime of the program (by lifetime we mean development, maintenance and further development), which results in the documentation losing its value. On the other hand, not much documentation is produced during the implementation phase, even though a substantial amount of system understanding is present there. Nørmark argues that it would in principle be a minor effort to write down the program understanding, but in real life many excuses exist for not doing so. He groups these excuses into four categories [Nørmark, 1999, p. 2]:

- The program comment problem: If documentation is written as comments, there will eventually be so many comments that the program is obscured by its documentation.
- The separate files problem: If the documentation and source code are written as separate files, it will be hard to keep the documentation up to date, unless some definition of relations between the program and documentation exists.
- The mental load problem: Programming is a mentally demanding activity. If the documentation tools require unnecessary mental overhead, the programmer will refrain from using them.
- The motivation problem: Most often the documentation effort should be seen as a long-term investment in relation to program maintenance. This leaves the programmer unmotivated for writing documentation, since his efforts will bring few immediate advantages.

Based on these observations and on experiences with Literate Programming, Nørmark introduces Elucidative Programming as a branch of Literate Programming suited for documenting the understanding of practical programs in a software development project. The concept is expressed via six requirements [Nørmark, 1999, p. 4]:

Requirement 1: The internal documentation is oriented towards current and future developers of the program.
Requirement 2: The internal documentation addresses explanation which serves to maintain the program understanding and to clarify the thoughts behind the program.
Requirement 3: The program source must be intact, without embedded or surrounding documentation.


Requirement 4: The programmer must experience support of the program documentation task in the program editing tool.
Requirement 5: The program chunking structure follows the main abstractions supported by the programming language.
Requirement 6: The documented program must be available in an attractive, on-line representation suitable for exposition via an Internet browser.

Based on these requirements Nørmark has developed a prototype Elucidator tool. The tool is for documenting programs written in the Scheme language, and is mainly written in Scheme. The tool has two interfaces: the editor, which is Emacs, and a hypertext browser, e.g., Netscape or Internet Explorer. The editor is used for editing both the documentation and the program source code, but also acts as an interface to the Elucidator. The browser is used for presentation of, and navigation in, the documentation. Figures 2.1 and 2.2 show screen captures of the browser and the editor.

Figure 2.1: Screen capture of the browser from the original Elucidator. The top frame contains links to indexes, while the left and right frames show the documentation and source code, respectively [Nørmark, 1999].

The Elucidator tool operates on documentation bundles. A documentation bundle is a collection of source code files, a single documentation unit and a setup description. When the Elucidator is started in Emacs on a documentation bundle, it will, as shown in Figure 2.2, create two frames: one for the documentation and one for the source code. The documentation contains, besides a header part, a number of sections and subsections, also known as entries. Each entry is supposed to explain and document one single aspect of the program.

Figure 2.2: Screen capture of the editor from the original Elucidator, during work on the example shown in Figure 2.1. The top frame contains the documentation while the bottom frame contains the source code [Nørmark, 1999].

The editor makes it possible to create relations from each documentation entry to a number of program units in the source code files and vice versa. The relations can either be strong, meaning that the documentation entry in the relation explains details of the program unit, or weak, which is used when a program unit is merely mentioned without being explained. Two further means of referencing exist. First, it is possible to create cross reference relationships internally in the documentation. Secondly, it is possible to create a relationship to a part of, or a point in, a program unit. This is done using source markers. Source markers are special marks that are placed in program comments in the source code and can be used as program units when creating a relationship. Figure 2.3 illustrates the relationship model.

From the bundle, the Elucidator tool can create a static HTML representation of its contents, which can be shown in a hypertext browser, as shown in Figure 2.1.

In order to make an object-oriented multiuser version of the Elucidator, we will have to expand the notion of the Elucidator. In the following two sections, requirements for object orientation and multiuser support will therefore be examined.
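The bundle and relation model described above can be pictured as a small data structure. The following is only a minimal sketch under our own assumptions: all type, field and method names (`RelationKind`, `ProgramUnit`, `DocumentationEntry`, `DocumentationBundle`, `explain`, `mention`, `unitsWith`) are hypothetical illustrations and are not taken from Nørmark's Elucidator or from the implementation described in this report.

```java
import java.util.ArrayList;
import java.util.List;

/** How a documentation entry relates to a program unit. */
enum RelationKind {
    STRONG, // the entry explains details of the program unit
    WEAK    // the entry only mentions the program unit
}

/** A program unit, or a source marker placed in a comment (hypothetical type). */
final class ProgramUnit {
    final String sourceFile;
    final String name;
    final boolean sourceMarker; // true when the unit is a marker inside a comment

    ProgramUnit(String sourceFile, String name, boolean sourceMarker) {
        this.sourceFile = sourceFile;
        this.name = name;
        this.sourceMarker = sourceMarker;
    }
}

/** One section or subsection of the documentation (hypothetical type). */
final class DocumentationEntry {
    /** A single relation from this entry to a program unit. */
    static final class Relation {
        final RelationKind kind;
        final ProgramUnit target;

        Relation(RelationKind kind, ProgramUnit target) {
            this.kind = kind;
            this.target = target;
        }
    }

    final String title;
    final List<Relation> relations = new ArrayList<>();
    // Cross references to other entries within the documentation.
    final List<DocumentationEntry> crossReferences = new ArrayList<>();

    DocumentationEntry(String title) {
        this.title = title;
    }

    /** Adds a strong relation: this entry explains the unit. */
    void explain(ProgramUnit unit) {
        relations.add(new Relation(RelationKind.STRONG, unit));
    }

    /** Adds a weak relation: this entry merely mentions the unit. */
    void mention(ProgramUnit unit) {
        relations.add(new Relation(RelationKind.WEAK, unit));
    }

    /** Returns the program units related to this entry with the given kind. */
    List<ProgramUnit> unitsWith(RelationKind kind) {
        List<ProgramUnit> result = new ArrayList<>();
        for (Relation relation : relations) {
            if (relation.kind == kind) {
                result.add(relation.target);
            }
        }
        return result;
    }
}

/** Source files, one documentation unit (its entries) and a setup description. */
final class DocumentationBundle {
    final List<String> sourceFiles = new ArrayList<>();
    final List<DocumentationEntry> entries = new ArrayList<>();
    String setupDescription = "";
}
```

In this sketch a source marker is treated as just another `ProgramUnit`, mirroring the remark that markers "can be used as program units when creating a relationship". A browser or index generator could then call `unitsWith(RelationKind.STRONG)` to find the units an entry actually explains.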

Figure 2.3: An illustration of the elucidative programming relation model [Nørmark, 1999].

The Object-Oriented Elucidator

When examining an object-oriented Elucidator for the Java programming language, there are several aspects to consider. In this section we present these aspects and the consequences they have for the object-oriented Elucidator.

Object-Oriented Modeling

Programs written in the object-oriented paradigm often closely model some aspect of the real world. The objects of the running program are a representation or simulation of objects in a real or imaginary world [Madsen et al., 1993]. In that sense the program code does not only contain instructions for the computer to interpret, but also information about the objects being modeled and different properties of these objects. By using the object-oriented modeling paradigm we write program code which is more comprehensible, and as Sametinger has suggested, this will lead to a need for less documentation [Sametinger, 1999].

At the same time, but not so obviously, the program code contains information about the choices made during the modeling. When modeling, many decisions are taken concerning which aspects of the world to emphasize and which to ignore. During a traditional design phase many design decisions are described, typically in a design language like UML. However, many decisions are first made when the design is to be implemented. These decisions could stem from the fact that design languages, like UML, and programming languages, like Java, do not always have the same modeling abilities.

High complexity in Java

Object-oriented languages have advanced abstraction features. Therefore object-oriented programs tend to be more complex than non-object-oriented programs [Sametinger, 1994]. In the following we examine some of the features of Java [Gosling et al., 1996] that can make Java programs complex.

Polymorphism: Several aspects of polymorphism make Java programs more complex and hence harder to comprehend. Firstly, the inheritance hierarchy makes it difficult to see immediately from a class definition which properties the class has, because it might inherit properties from its superclass. Secondly, the possibility of late binding makes it difficult to figure out the exact type of an object at hand. This obscures the exact semantics of a method call in situations where the method is overridden in a subclass.

Overloading: The use of overloading combined with polymorphism also increases the complexity of the program code. Since many methods can potentially be applicable at the calling place, the semantics of the method call becomes unclear, or at least difficult for the programmer to comprehend.

Anonymous classes: This is one of the more exotic features of Java, although it appears in other languages (e.g., BETA [Madsen et al., 1993]). Via anonymous classes it is possible to instantiate objects which have their class defined at the place of the instantiation. The anonymous class either extends another class or implements an interface, and only through reflection is it possible to call methods other than the ones defined on the extended class or implemented interface. The anonymous class cannot be accessed anywhere else, since it has no name. Anonymous classes can make it much harder to comprehend a program, since it is possible to create an object whose type declaration does not have a clear place of definition.

Initializers: Through initializers it is possible to initialize fields of a class, but the static initializers can be placed anywhere in the class declaration, and it can therefore be difficult to understand how the class is initialized. Furthermore, the initializers are executed in the order in which they appear in the class declaration, which makes it even more difficult to comprehend the class initialization.

Besides the object-oriented features of Java, the concurrency features also add to the complexity of Java programs. Using these features the programmer can have several threads running within the same address space. This adds to the complexity of the programs, since a single sequence of instructions does not alone control the execution of a program. This of course also makes it more difficult to comprehend a program and, it is our claim, thereby to document it. With concurrency it is very difficult to write one story in literate style that explains the program.
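To make the features listed above concrete, the following small Java sketch shows how overloading interacts with late binding, how an anonymous class appears only at its place of instantiation, and in which order initializers run. All class and method names (`Appliance`, `Kettle`, `Inspector`, `InitOrder`, `ComplexityDemo`) are hypothetical and invented for this illustration; they are not part of the Coffee Machine example in Appendix C.

```java
/** Base class with one overridable method. */
class Appliance {
    String describe() { return "appliance"; }
}

/** Subclass that overrides describe(); selected at run time (late binding). */
class Kettle extends Appliance {
    @Override
    String describe() { return "kettle"; }
}

/** Two overloads; the compiler picks one from the static type of the argument. */
class Inspector {
    static String inspect(Appliance a) { return "inspect(Appliance) on " + a.describe(); }
    static String inspect(Kettle k) { return "inspect(Kettle) on " + k.describe(); }
}

/** Records the order in which initializers and the constructor run. */
class InitOrder {
    static final StringBuilder TRACE = new StringBuilder();

    static { TRACE.append("static-1 "); }

    int value = note("field ");

    { TRACE.append("instance-block "); }

    InitOrder() { TRACE.append("constructor"); }

    // A second static initializer, textually after the constructor,
    // still runs before any instance is created.
    static { TRACE.append("static-2 "); }

    private static int note(String text) {
        TRACE.append(text);
        return 1;
    }
}

class ComplexityDemo {
    public static void main(String[] args) {
        // Static type Appliance selects the overload; dynamic type Kettle selects describe().
        Appliance a = new Kettle();
        System.out.println(Inspector.inspect(a)); // inspect(Appliance) on kettle

        // The class of this object has no name and is defined only here.
        Appliance anonymous = new Appliance() {
            @Override
            String describe() { return "anonymous appliance"; }
        };
        System.out.println(Inspector.inspect(anonymous)); // inspect(Appliance) on anonymous appliance
        System.out.println(anonymous.getClass().isAnonymousClass()); // true

        new InitOrder();
        System.out.println(InitOrder.TRACE); // static-1 static-2 field instance-block constructor
    }
}
```

Even in this tiny program, the reader must combine the static type of `a` (to find the overload) with its dynamic type (to find the overriding method), which is the kind of reasoning a documentation tool for Java would ideally help with.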

In summary, changing the original Elucidator idea to an object-oriented Elucidator for the Java programming language has the following consequences:

- We are dealing with a programming language with complex abstraction features (classes, methods, packages and the like), and these features play an important role in the documentation of the program. Therefore we need to understand them in detail.
- Since the structuring of the different abstraction features (class hierarchies, packages) is vital for the comprehension of the program, the Elucidator must have detailed knowledge of these structures and features in order to support browsing and documenting them.
- As we have seen, the object-oriented features of Java add some pitfalls that can make programs more complex and more difficult to comprehend. The Elucidator must have features to deal specifically with this.
- The model in an object-oriented program is an implementation of another model in some real or imaginary world, and hence it must be possible, in the description of the program, to refer easily to external documents that describe the domain.

Multiuser support

The target users of the Elucidator are software engineers [Nørmark, 1999]. This is also true for our Elucidator, and with the target users in mind, we find it natural to consider support for multiple users. This is especially so since almost all software projects involving software engineers take place in a multiuser environment. In this situation, it is our experience that without close coordination, several people end up doing the same job, possibly at the same time. It is frustrating to find out that somebody has already done the job you are doing, or has just overwritten one of your files.

Multiuser support can have many facets. There are basic multiuser features, such as file sharing and versioning, and more complex features, e.g., communication between users or mechanisms for project coordination. Furthermore, multiuser support can be realized either by interfacing to existing multiuser systems or by making it an integrated part of the Elucidator.

The thoughts presented here have the consequence that the Elucidator should, as a minimum, allow some level of multiuser support to be implemented at both a low and a high level:

- It should allow multiple users to work on the same set of files, either by direct support or by interfacing to existing multiuser/versioning systems.
- The Elucidator should support sharing of information, and have some awareness of possible concurrent documentation and development.

2.2 Motivation

Programmers do not care much for writing documentation, or as Sametinger and Pomberger put it:

"Development programmers hate documentation which, therefore, almost never is either complete or consistent." [Sametinger and Pomberger, 1992]

It therefore seems obvious to consider this when designing and implementing a program documentation tool such as the Elucidator. Furthermore, we must make sure that we motivate the programmer to use the tools. It is necessary to find out why programmers do not like to write documentation, and why the written documentation always ends up being inconsistent. It would therefore be useful to know how programmers think when they write code. Unfortunately, to the best of our knowledge, very little literature exists on this. We have chosen to use our own experience, as well as that of our surroundings, as an empirical knowledge base. We realize that this is a heuristic method, and that one could question how much programming experience five master students could possibly have, but we find this approach more appealing than ignoring the problem altogether.

Immediate pay back

In projects involving more than one programmer, a manager or supervisor is typically involved. The role of the manager is to make sure that time schedules are kept, resources are used appropriately, documentation is written and so on. It is our understanding that a typical reason given by the manager to the programmers for writing documentation is to make sure that the software project is maintainable. So the documentation is written with the purpose of easing the maintenance process. This is clearly a motivating factor for the software company or the manager, but hardly for the programmer. There are several reasons for this:

- Many companies have a special department for maintenance, so the programmer who originally wrote the code will probably not maintain it.
- The time from the development of a piece of software until the maintenance process starts is often long. Sametinger [Sametinger, 1991, p. 2] states that one problem of maintenance, among others, is the unavailability of the developers (in case questions arise). This means that the programmer could have gotten a new job or been reassigned to a new project.
- It requires strict discipline to spend extra effort on something when the results will only be visible after a long period of time.

- Maintenance as an activity is viewed negatively. It is often thought to be difficult, unfair (due to the lack of needed information), a dead-end job (no progress that can be seen), and a task that is not at the cutting edge of technology [Sametinger, 1991, p. 2].

Based on these observations the notion of immediate pay back is introduced. By immediate pay back we mean that the programmer must get something back for his efforts within a short time horizon, typically a week, but even better, right away.

An example of immediate pay back could be the notion of a smart info-box. This box could be displayed for every symbol known to the Elucidator, for example a method call, and would contain information that explains the use of the symbol. This could be information written at the place of the method declaration, e.g., the parameter list with documentation and pre- and postconditions, or, for a class with subclasses, a list of overriding methods.

Another example could be a smart browser. Advanced object-oriented browsers of today enable the programmer to browse all classes through the inheritance hierarchy and to see all available methods or fields. An intelligent code browser could sort the information based on hints and constraints in the documentation. It could, e.g., group together methods that are related to the same documentation fragments. It could sort the inheritance hierarchy so that classes that are explained together are shown together.

These considerations have the following consequences for the Elucidator:

- We should aim to provide as much immediate pay back as possible. We can come a long way in motivating the programmer to use the tool, and thereby to document, if he can immediately feel the benefits of his work.

Tool support

There are other means of motivating the programmer to write documentation: tools which support him in the documentation process could be provided. The tool support should be provided from an environment already known to the programmer. So instead of, e.g., providing the programmer with a tool like DOgMA [Sametinger, 1991], one should rather provide extended functionality in the programmer's own editor. The creator of DOgMA, Dr. Johannes Sametinger, associate professor at the Department of Business Information Systems, Johannes Kepler University Linz, states that one of the reasons DOgMA has not been widely used is that programmers do not like to switch to another tool just to make the creation of documentation easier [Sametinger, 1999].

One of the functionalities which could be provided in the editor is support for structuring the documentation, realized via a set of structuring guidelines. We go into greater detail on object-oriented documentation, documentation structuring and structures for guidelines in the next section.

As we intend to motivate the programmer through tool support we stress the following:

It is very important that the Elucidator resembles or resides in a programming environment known to, or at best chosen by, the programmer. This also emphasizes the Elucidator's ability to work together with other tools.

2.3 Documentation structure

As previously described, an object-oriented program models some part of the real or imaginary world. The programs can even be multithreaded and thereby do several things at the same time. How do we best document these programs? In Literate Programming the traditional target for the documentation is articles or other non-hyperlinked documentation (single-threaded documentation). Literate Programming presents the author's subjective view on what is going on. This might not be the best approach when documenting a system deeply rooted in a subjective and concurrent world. It should therefore not come as a big surprise if we have a hard time documenting object-oriented programs using the traditional means.

One could argue that any program, no matter how complex, can still be turned into a Turing machine, which is basically mathematical. It could also be argued that in a complex concurrent system the whole is greater than the sum of the parts. Even though the parts of the system are described in detail, the system, as an entity, is too complex to be fully comprehended or described. Describing a complex system is not a trivial task, but it can be done, as proved by many of today's great authors. The rest of us, being mere mortals, would probably do better with another approach. In the following sections different schemes and ideas will be examined closely in search of a new documentation approach.

Object-Oriented Documentation

Sametinger suggests an Object-Oriented Documentation scheme [Sametinger, 1994]. The idea is straightforward: if object-oriented programs have to be documented, why not document them in an object-oriented way?
He argues that object-oriented software is normally based on components or class libraries made by other people. The programmers using these need an external view of the documentation, while the maintainer needs an internal view. Sametinger also suggests an overview, which is needed to make a decision on whether to reuse existing software components and to ease the familiarization process for programmers [Sametinger, 1994, p. 4]. Furthermore, he divides the documentation into static (the system's architecture) and dynamic (e.g., the control flow). This in turn forms six different classes, as seen in Figure 2.4 on the facing page.

               overview           external view                  internal view
static view    static overview    class interface description    class implementation description
dynamic view   dynamic overview   task interface description     task implementation description

Figure 2.4: Documentation scheme for object-oriented software systems [Sametinger, 1994].

Inheritance in documentation

In object-oriented programming a class can inherit from other classes, as previously described in the section on page 9. Sametinger extends the notion of inheritance to apply to documentation too. In Figure 2.5 the methods of three classes are presented. The class Rectangle inherits from the class Shape, which inherits from the class Object. The boxes marked with a darker hue illustrate that the method is created or overridden in that class.

Figure 2.5: Inherited and overridden methods for the rectangle class [Sametinger, 1994, p. 8]. (Figure content: the classes Rectangle, Shape and Object; methods on class Rectangle: Compare, PrintOn, Draw, Outline, Move, Reuse.)

Sametinger uses the same idea as with inheritance of methods. In Figure 2.6 on the next page the scenario has changed a bit. The focus is no longer methods, but documentation elements. The description of the class Rectangle inherits the description of the class Shape. This description in turn inherits its documentation from the description of the class Object. That is, when the user reads the documentation for the class Rectangle, the following sections appear: Short description, Conditions for use, Storing on Files and Graphical Objects. Only the section Short description is actually written for the documentation of the class Rectangle.

The atoms in Sametinger's documentation are the sections. This means that in order for, e.g., the documentation for the Rectangle class to make sense, the sections must have low interdependency, or every section will potentially be out of context. The idea of inheritance allows the programmer to write less documentation.
For a programmer who is used to the object-oriented way of thinking, this way of writing makes it easy to reuse documentation. In this way the system reduces the amount of documentation that has to be written, which can motivate the programmer [Sametinger, 1999]. This form of reuse is a potentially powerful concept, since it makes it easier to propagate changes; the
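The section-level inheritance described above can be illustrated with a minimal Java sketch. It is not Sametinger's implementation; the class and method names (DocInheritance, ClassDoc, effectiveSections) are hypothetical, and which superclass contributes which section is an assumption made only for the illustration, since the text states just that Short description is written for Rectangle itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: documentation sections inherited along a class chain. */
final class DocInheritance {

    /** Documentation of one class: its own sections plus a link to the parent's documentation. */
    static final class ClassDoc {
        final String className;
        final ClassDoc parent; // null for the root of the hierarchy
        final Map<String, String> ownSections = new LinkedHashMap<>();

        ClassDoc(String className, ClassDoc parent) {
            this.className = className;
            this.parent = parent;
        }

        /** Adds (or overrides) a section written specifically for this class. */
        ClassDoc section(String title, String text) {
            ownSections.put(title, text);
            return this;
        }

        /** Sections a reader sees: inherited sections, overridden by more specific ones. */
        Map<String, String> effectiveSections() {
            Map<String, String> result = new LinkedHashMap<>();
            if (parent != null) {
                result.putAll(parent.effectiveSections());
            }
            result.putAll(ownSections); // same title: the subclass text replaces the inherited text
            return result;
        }
    }

    public static void main(String[] args) {
        // Assumed distribution of sections over the three classes (illustration only).
        ClassDoc object = new ClassDoc("Object", null)
            .section("Short description", "Root of the class hierarchy.")
            .section("Conditions for use", "...")
            .section("Storing on Files", "...");
        ClassDoc shape = new ClassDoc("Shape", object)
            .section("Short description", "An abstract graphical shape.")
            .section("Graphical Objects", "...");
        ClassDoc rectangle = new ClassDoc("Rectangle", shape)
            .section("Short description", "A rectangle with a position and a size.");

        rectangle.effectiveSections().forEach((title, text) ->
            System.out.println(title + ": " + text));
    }
}
```

Because each section is resolved independently, a section only reads well in the subclass documentation if it does not depend on sibling sections, which is the low-interdependency requirement noted above.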

documentation only has to be changed in one place.

Figure 2.6: Inherited and overridden documentation for the rectangle class [Sametinger, 1994, p. 9]. (Figure content: the classes Rectangle, Shape and Object; documentation on class Rectangle: Short description, Conditions for use, Storing on Files, Graphical Objects.)

On the other hand, inheritance can add a level of complexity to the process of writing documentation. The programmer has to concentrate on keeping the sections free of interdependencies with other sections, since these would make inheritance difficult. Furthermore, it is not clear whether simple hyperlinking could make the same level of reuse possible.

Besides inheritance, Sametinger has made a simple schema for the structure of documentation. This illustrates that it is possible to have some level of pre-made structure for the documentation that will apply in general. This moves against the general trend in traditional literate programming, where the documentation is a single thread of text. Sametinger presents a structure where the single elements are of equal importance. We see this attempt to make a structure in the documentation as important. Sametinger's documentation scheme has underlined the importance of finding a structure for the program documentation. Sametinger's concept of inheritance has the following consequence for the Elucidator:

The Elucidator could support reuse of documentation through some kind of inheritance.

Cognitive models

In this section we will look at an approach which eventually will lead us to another categorization of documentation. It is based on Storey, Fracchia, and Muller [Storey et al., 1997], who introduce three cognitive models for program comprehension. In this context, a cognitive model is a mental approach you adopt to understand already written source code. This is a reengineering approach, compared to Sametinger, who suggests a way to document code. Storey et al.
present the following three different cognitive models for program rediscovery:

Bottom-up comprehension: The programmer understands the code by reading the source code. Gradually, he understands the big picture by putting the fragments of understanding together.

Top-down comprehension: The source code is searched superficially for beacons, which indicate lower-level behavior.

Knowledge-based comprehension: The programmer can use both top-down and bottom-up approaches. This is supported via a knowledge base. The knowledge base consists of application and programming domain knowledge, program goals, a library of programming plans and rules of discourse. On top of this, the model relies on an evolving mental model and an assimilation process.

Every programmer relies, according to the article, more or less on one of these three models. To improve the degree of program comprehension, the authors have a number of suggestions, which can be seen as guidelines for creating templates and views to support the different cognitive models. To enhance top-down comprehension, they suggest that we provide an adequate overview of the system architecture at various levels of abstraction [Storey et al., 1997, p. 22]. We will elaborate on the difference between reading and writing documentation in the following sections. From the concept of cognitive models we conclude that:

There exist a number of different ways to read and comprehend source code and the corresponding documentation.

2.4 Templates and views

Storey et al. give suggestions on how information should be structured in order to make it easier for a programmer to comprehend a program. We do not wish to go into detail with these, but underline that a number of schemas exist and that they are relevant. From Sametinger we have seen a schema for structuring documentation [Sametinger and Stritzinger, 1993]. From our own experience as programmers we know that many different ways exist for writing documentation. This underlines the need for a number of different ways of structuring the writing of documentation (Templates) as well as a number of ways to extract and read documentation (Views).
We will deal with these topics in the following.

Templates

For writing the documentation, templates can be used. A template is a predefined guideline for structuring a piece of documentation (footnote 2). We divide templates into two forms:

Textual templates: A written guideline telling what to write and how to write the documentation.

Footnote 2: Notice that we now speak of pieces of documentation, and not documentation as an entity.

Structural templates: A structural guideline with key and value pairs for the programmer to fill out.

For the rest of this report we will, unless specially noted, always refer to structural templates as templates. Due to the non-linear nature of object-oriented programs it seems unlikely that a template of all templates would be useful. Instead we introduce the notion of sub-templates. By a sub-template we mean a small specific template, which can be combined with or nested within other templates to form the whole structure of the documentation. Furthermore, it should be possible for the programmer to create new templates himself, in order to provide maximum flexibility.

One of the strengths of templates is their flexibility. You can provide the programmer with a set of standard templates, while at the same time allowing him to produce his own, and even allow him to produce new templates by combining sub-templates. This seems like an appealing idea, but every rose has its thorns. We see three dangers with the introduced flexibility:

The possibility of a multitude of templates: By allowing the programmer to create his own templates we open up for the possibility of a vast number of templates.

Many possibilities for creating the documentation: By allowing the programmer to create new templates and combine sub-templates into new ones we make it possible to structure the same documentation in various ways.

Risk of unstructured documentation: With freedom comes responsibility. By giving the programmer a flexible template system we give away the opportunity to enforce a strict structure of the documentation, thus leaving this responsibility to the programmer.

Common to all three dangers is that, if the programmer is not careful when he writes his documentation, the introduction of templates may result in a somewhat messy and unstructured presentation of the documentation to the user.
This again underlines the need for a way of reading structured subsets of the documentation. If it is to be possible at all to extract subsets of the documentation, the documentation has to be attributed or annotated in some way. This tagging or attributing would serve as meta-information, explaining the kind or type of documentation. This has the following consequences for the Elucidator:

The Elucidator should have a framework for annotating or attributing the documentation with meta-information. This could be achieved through some kind of extensible markup language.

The Elucidator should supply some textual templates to aid the programmer in the writing of documentation.
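To make the idea of annotated documentation concrete, the following is a minimal Java sketch of documentation fragments tagged with a kind, and of extracting a structured subset by kind. It is only an illustration under assumptions: the names AnnotatedDocs, Fragment and extract, and the kinds "overview" and "rationale", are hypothetical and do not come from the report.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: documentation fragments annotated with meta-information. */
final class AnnotatedDocs {

    /** A piece of documentation together with an annotation naming its kind. */
    static final class Fragment {
        final String kind; // meta-information, e.g. "overview" (assumed vocabulary)
        final String text;

        Fragment(String kind, String text) {
            this.kind = kind;
            this.text = text;
        }
    }

    /** Returns the texts of all fragments whose kind is in the requested set, in document order. */
    static List<String> extract(List<Fragment> documentation, Set<String> kinds) {
        List<String> subset = new ArrayList<>();
        for (Fragment fragment : documentation) {
            if (kinds.contains(fragment.kind)) {
                subset.add(fragment.text);
            }
        }
        return subset;
    }

    public static void main(String[] args) {
        List<Fragment> documentation = Arrays.asList(
            new Fragment("overview", "The machine brews coffee in three steps."),
            new Fragment("rationale", "The heater is a separate class so it can be replaced."),
            new Fragment("overview", "Water flows from the tank through the heater."));

        for (String text : extract(documentation, Collections.singleton("overview"))) {
            System.out.println(text);
        }
    }
}
```

Without the kind annotation there would be nothing to select on, which is why the meta-information is a precondition for extracting subsets.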

Views

The recommendations from Storey et al. suggest that it will be necessary to work with different ways of viewing the documentation. We call such a subset of the documentation a view. Another important point when it comes to the presentation of the documentation is to be able to fit the presentation to the person who is viewing it. There are various reasons for this:

People do not need the same level of information: In a software development project, the involved persons have different roles. A role may dictate which level of documentation the person needs. An example could be two programmers with different levels of competence, one being a project expert, the other having just been assigned to the project. The expert would want an overview for easy navigation of the project, extra detailed information for new parts of the project and less detailed information for the parts he himself has been working on. The newcomer would want detailed overview information in order to understand the structure of the program, and detailed information for the parts of the project he has been assigned to.

People do not need all the information: Documentation plays an important role during maintenance and rediscovery. A potential problem is that programs tend to be so large that they cannot be comprehended in their entirety. Fortunately this is not necessary. According to Katalin Erdös and Harry M. Sneed [Erdös and Sneed, 1998] it is only necessary to comprehend the documentation connected to the parts of the program which are affected by the maintenance request. Practically, this means that we do not need to show the complete documentation at all times, but only parts of it.

Erdös and Sneed have identified a very small amount of information as sufficient for a maintainer to comprehend and change a program. They only speak about source code and comprehension through the reading of source code.
But the subset of information can be seen as a view of the documentation, a structured subset of documentation. Their approach is only one kind of view. From this and the cognitive models made by Storey et al. we extrapolate the usability of structured subsets of information or documentation. The multitude of possible views also calls for a possibility for the user to define views. If the Elucidator is to support various ways of writing documentation and a multitude of ways of reading documentation, it has the following consequences:

The Elucidator should support different views of the documentation.

It should be possible to define new views based on the structure of the documentation and existing views.

2.5 Summary

This section summarizes the consequences this chapter has for our Elucidator.

We rely on a programming language with advanced abstraction features. These play an important role when documentation is written. The features are vital to the comprehension of the program, so the Elucidator must be aware of them. The documentation produced has to be able to refer to external documentation. We would, furthermore, like the Elucidator to allow implementation of high- and low-level support for multiple users.

We should aim for providing as much immediate pay back as possible, as this will increase the programmer's motivation to use the tool. When we demand that the programming environment be known to or chosen by the programmer, this is also with motivation in mind.

The Elucidator should have a framework for annotating the documentation, as there exist numerous ways of comprehending documentation and the corresponding source code. The annotations would allow us to create different views on the documentation. It should be possible to define new views based on the structure of the documentation and existing views. Furthermore, the Elucidator should supply some guidelines or heuristics to aid the programmer in writing documentation.

3 Project Focus

In our Problemization we posed four questions we wanted to answer in this project. In the Analysis we have presented a theoretical foundation for the answers to these four questions. Next in the process is the actual design and implementation of the Elucidator. Since this report describes only the efforts made in the first half of the thesis year, it documents only the first half of the project. Some of the questions posed in the Problemization will therefore first be answered in the report for the second half of the thesis year. The purpose of this chapter is therefore to describe which questions will be dealt with in this report and which will be postponed to the next, thus stating our project focus for this semester.

The first question was:

To which degree is it possible to create a tool which supports Elucidative Programming, multiple users and the object-oriented programming paradigm?

This question is obviously important, and has to be dealt with in this semester. Our primary focus therefore is the creation of a prototype Elucidator. Furthermore, such a prototype would also provide us with a solid foundation for further research in the next semester. Since we are creating a prototype, some limitations will be placed on the implementation. The first limitation is that the prototype will be created for the Java language and not for the object-oriented paradigm as a whole. We will, though, endeavor to make parts of the Elucidator language independent whenever possible. The second limitation is that we will not spend too much energy on making a true multiuser Elucidator, since we estimate this to be a rather big task. As with the choice of language, we will try to prepare the implementation for multiuser support, so that we do not prevent ourselves from creating a multiuser version in the future.

The second question was:

How can the tool be designed to motivate the programmer in writing documentation?

We find this question to be central and a natural consideration, as observed in Section 2.2 on page 12, and it is therefore also considered a focus for this semester. As mentioned in our analysis there seems to be a lack of literature regarding this issue. Our solution to the problem therefore largely depends on our own experience as software developers.

The third question was:

How can templates be created in a way that will support a programmer in structuring his Elucidative documentation?

In Section 2.3 on page 14 of the Analysis chapter we describe some of the theoretical foundations for structuring documentation, and mention templates as a solution to this. It is our belief that, in order for templates to be useful in the Elucidator, it has to support them in a generic way, which makes it possible for the users of the system to, e.g., create their own templates. This will involve further literature studies and design, and we have chosen to postpone this question to the next semester.

The final question was:

Given a structured documentation, to which degree is it possible to create views of the documentation which extend the value of the documentation?

This question is connected to the previous one since, as described in the section on page 19, templates make the foundation for realizing views. Since we have chosen to postpone the question of templates, we also postpone this question.

To summarize, we have decided to focus our project on creating a prototype Elucidator for the Java language. This prototype will feature aspects of a multiuser environment, and the question of motivating the programmer to use the system will be a central one. We have chosen not to focus on creating a generic method of handling templates and views, thus postponing these subjects to the next semester.

4 Design

In this chapter the design of the current Elucidator prototype is presented. First, we present the general system architecture, and then we describe each of the components in detail.

During the design phase a number of goals for our design were emphasized in order to ensure a good quality of the design. The choice of goals was made by combining our experience with design and what we have learned about good design during our education. The most important goals for our design are: component/module based design, usage of standards, and use of interfaces/contracts. Furthermore, we choose solutions, whenever possible, that do not limit the Elucidator to any specific language. Some aspects, though, are targeted at an Elucidator supporting Java programming and documentation.

Throughout this chapter we use one main example to visualize aspects of our design. It is an example of how a coffee machine can be modeled in Java. The source code and E-doc for the example can be seen in Appendix C on page 89. In some places in the chapter the example is too big to be useful. In these cases we use a subset of the example. This subset will then be placed in a figure, and a reference to the context of the subset will be made.

4.1 System Architecture

We are inspired by the original Elucidator by Nørmark with regard to the primary functionality: to keep source code and documentation separate and to support the Elucidative Programming concept. The original Elucidator works by having a tool which receives source code and documentation files as input, parses them and produces a set of HTML pages that can be viewed in a web browser. We want to have similar features, and also wish to provide these via a modular, flexible and more dynamic architecture. This architecture is illustrated in Figure 4.1 on the following page. It consists of three conceptual layers, which are briefly described in the following.

Data: This layer contains all the information used in the Elucidator. It is primarily the source files and the accompanying documentation. Furthermore, it contains the Data Model, which is used to store information derived from both the source code and the documentation. These are described in Section 4.2 and Section 4.3.

Functionality: This layer provides the core functionality of the Elucidator. It consists of three components. The Abstractor extracts information from the source and documentation files and stores it in the Data Model. The information is used by the Generator, which uses the Query Engine to access the Data Model. The Generator can be seen as a server providing a set of services for the editor and browser. Data is generated on request, extracted dynamically from the Data Model, and delivered to either the editor or the browser. These components are described in more detail in Section 4.4 on page 38, Section 4.5 on page 47 and Section 4.6 on page 48.

User interface/interaction: The user interface consists of an editor and a web browser. Both rely on the Generator for services. The editor is used to create and edit both the source and the documentation. A hypertext version of the documentation and source can be viewed in the browser. It is used for reading the documentation, viewing the source code and exploring the cross references.

The following sections explain each component of the Elucidator system in more detail.

Figure 4.1: The Elucidator system architecture. The bold lines are data flow, and the thin are message passing. (Labels in the figure: Editor, Browser, Interface in the User interface/interaction layer; Generator, Abstractor, Query engine in the Functionality layer; .java, .edoc, Data model, Bundle in the Data layer.)
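The division of responsibilities between the three functional components can be sketched with small Java interfaces. This is only an illustration of the layering described above, not the report's actual interfaces: all type and method names (ArchitectureSketch, DataModel.store, Abstractor.abstractFile, QueryEngine.factsAbout, Generator.page) are assumptions, and the toy Abstractor merely records class declarations found by a regular expression.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of the Abstractor / Query Engine / Generator layering. */
final class ArchitectureSketch {

    /** Data layer: derived information, keyed by entity name. */
    static final class DataModel {
        private final Map<String, List<String>> facts = new HashMap<>();

        void store(String entity, String fact) {
            facts.computeIfAbsent(entity, key -> new ArrayList<>()).add(fact);
        }

        List<String> lookup(String entity) {
            return facts.getOrDefault(entity, new ArrayList<>());
        }
    }

    /** Extracts information from an input file and stores it in the Data Model. */
    interface Abstractor {
        void abstractFile(String fileName, String content, DataModel model);
    }

    /** The only way the Generator reads the Data Model. */
    interface QueryEngine {
        List<String> factsAbout(String entity);
    }

    /** Produces a page on request for the editor or browser. */
    interface Generator {
        String page(String entity);
    }

    private static final Pattern CLASS_DECLARATION = Pattern.compile("\\bclass\\s+(\\w+)");

    /** Wires the three components together and renders one page. */
    static String render(String fileName, String source, String entity) {
        DataModel model = new DataModel();

        // Toy Abstractor: records where each class is declared (illustration only).
        Abstractor abstractor = (file, content, target) -> {
            Matcher declaration = CLASS_DECLARATION.matcher(content);
            while (declaration.find()) {
                target.store(declaration.group(1), "declared in " + file);
            }
        };
        QueryEngine queries = model::lookup;
        Generator generator = name -> {
            StringBuilder html = new StringBuilder("<h1>" + name + "</h1><ul>");
            for (String fact : queries.factsAbout(name)) {
                html.append("<li>").append(fact).append("</li>");
            }
            return html.append("</ul>").toString();
        };

        abstractor.abstractFile(fileName, source, model);
        return generator.page(entity); // generated on request, not precomputed
    }

    public static void main(String[] args) {
        System.out.println(render("Tube.java", "class Tube { }", "Tube"));
    }
}
```

Keeping the Generator behind the Query Engine interface means the storage format of the Data Model can change without touching page generation, which matches the stated design goal of using interfaces/contracts.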

4.2 Input Data

The third requirement, by Nørmark, imposed on the Elucidator states the following (see Section 2.1 on page 5 for a listing of the Elucidator requirements):

Requirement 3: The program source must be intact, without embedded or surrounding documentation.

In order to fulfill this requirement, we choose to have two separate types of files in the Elucidator environment:

1. E-doc files.
2. Java source code files.

The Java source code files are ordinary source code files except for some markup, introduced by the Elucidator environment, placed as comments. The E-doc files hold the documentation. These files are written in the E-doc language, which is a markup language we designed for the purpose of writing elucidative documentation. The elucidative documentation is produced by processing these two types of files. By placing the documentation and the source code in separate files we also introduce a need for an effective linking mechanism in the E-doc language, since the elucidative documentation involves linking from documentation to source. Figure 2.1 on page 7 illustrates this concept. In this section we take a closer look at these files and the languages they are written in.

Java Source Code

The first type of input data is Java source code. It is ordinary Java source code, except that we introduce the concept of source markers, as in the original Elucidator, to be able to pinpoint more specific parts of the code. This section describes the issues concerning these markers and the naming scheme used for linking from the documentation to the source.

Source markers and regions

A vital part of the requirements for the Elucidator was that the source code be kept intact, without embedded or surrounding documentation. In general this requirement is respected, but we make a small deviation, since in some cases it is not enough to make links to language constructs.
This could be the case in large methods, where we would like to link to a subpart of the method (although one could argue that the method would have to be split up if it is that big). Another example could be anonymous inner classes, which do not have a name.

Nørmark solves this problem by using source markers to refer to a point in the source code [Nørmark, 1999]. The markers are placed in the comment areas of the source code. This means they do not disturb other applications reading the source, such as preprocessors and compilers. An alternative solution would be to store the positions of the locations externally and hence keep the source code intact. This would be a more ideal solution with regard to Requirement 3. Its disadvantage is the need for keeping the external data for the markers consistent with the source code. We choose to follow the pragmatic strategy where the markers are stored in comments. This solution does not interfere with any tools, since the markers are placed in comments, and it does not require much effort from the programmer to use.

Source markers

A source marker is placed inside comments in the Java files and should be as minimal as possible. Nørmark uses the form @x, where x is a single letter unique inside a function. Using the @ symbol in Java could create conflicts with the similar Java Doc (footnote 1) construct, so we choose to use a new syntax. We use an Extensible Markup Language (XML) [Bray et al., 1998] inspired syntax where the marker is enclosed in < and >. We have to prefix the marker with something to distinguish it from other possible HTML elements in the source (footnote 2); we have chosen to use e: as a prefix. An example of a source marker would be <e:x/> (see line 35 in Appendix C.1.2 on page 90). Note the postfix slash, which is required by XML when using single-tags.

Nørmark allowed only a single letter, to keep the comments related to the Elucidator at a minimum, but because names are easier to remember and refer to, we allow source marker names of any length. The name still has to be unique in some sense.
We have discussed two possibilities:

Global scope in a file
Unique inside the normal scope rules for the Java language

The idea of global scope in a file was quickly discarded, since we seldom refer to the file in the documentation. It is more often in a specific class or method that we want to pinpoint something. Therefore we have chosen to base the name scope for source markers on the scope rules for names in Java. Source markers inside, e.g., a class or method are required to be unique within that scope.

Footnote 1: Javadoc is a tool from Sun Microsystems for generating API documentation in HTML format from documentation comments in source code. The documentation produced by this method is called Java Doc.

Footnote 2: HTML is used in Java when documenting the interface via Java Doc.

Source regions

We introduce the concept of a source region, as we would like to address more specific regions/parts of the source than a mark/point. Source regions could be specified by referring to two source markers instead of just one. We would however like to have a more concrete concept of a region. Therefore we once again use the syntax of XML element tags. These tags always appear in pairs, e.g., <e:x>
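As an illustration of how such markers could be located, the following Java sketch scans the comments of a source text for markers. It assumes the marker form <e:name/> with a name made of a letter followed by letters or digits; the class and method names (SourceMarkerScanner, markers) are hypothetical, and the comment detection is deliberately naive (it does not account for comment-like text inside string literals), so it is a sketch rather than the report's Abstractor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: find source markers of the assumed form <e:name/> inside Java comments. */
final class SourceMarkerScanner {

    // Line comments and block comments; naive, ignores string literals containing "//" or "/*".
    private static final Pattern COMMENT =
        Pattern.compile("//[^\\n]*|/\\*.*?\\*/", Pattern.DOTALL);

    // A marker: prefix "e:", a name (letter followed by letters or digits), and the closing slash.
    private static final Pattern MARKER =
        Pattern.compile("<e:([A-Za-z][A-Za-z0-9]*)/>");

    /** Returns the names of all markers found in comments, in order of appearance. */
    static List<String> markers(String javaSource) {
        List<String> names = new ArrayList<>();
        Matcher comment = COMMENT.matcher(javaSource);
        while (comment.find()) {
            Matcher marker = MARKER.matcher(comment.group());
            while (marker.find()) {
                names.add(marker.group(1));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        String source =
            "int level = 0; // <e:init/>\n"
          + "String s = \"<e:notAMarker/>\";\n"
          + "/* heating starts here <e:heat/> */\n";
        System.out.println(markers(source));
    }
}
```

Because only comment text is searched, marker-like text in ordinary code, such as the string literal in the usage example, is ignored, which reflects the point that markers do not disturb compilers and other tools.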

</e:x> (see lines 64 and 72 in Appendix C.1.2 on page 90), and the area they enclose is then defined as a source region. As can be seen, we use the same notation as for a source marker, with e: as a prefix for the region name. As in XML, source regions are allowed to be nested, but they must not overlap.

Linking E-doc to source

When writing documentation in elucidative style, it is necessary to link from the documentation to the source code. The following describes the naming conventions used to uniquely identify the entities possible in Java, and the problems related to such a naming convention.

Java names and identifiers

The Java Language Specification (JLS) [Gosling et al., 1996] describes the declaration of packages, classes, interfaces etc. in relation to names and identifiers as follows:

A declaration introduces an entity into a Java program and includes an identifier (paragraph 3.8) that can be used in a name to refer to this entity. [Gosling et al., 1996]

Entities in Java are referred to by a simple name or a fully qualified name. A simple name is a single identifier (CoffeeMachine), while a fully qualified name is a name, a "." token and then a simple name (appliances.kitchen.CoffeeMachine). At first sight it would then be natural to follow the naming conventions stated by the JLS, but this is not possible in our case, as the meaning of a name is determined by its context. This is stated in the Java Language Specification, paragraph 6.2, as:

In determining the meaning of a name (paragraph 6.5), the Java language takes into account the context in which the name appears. It distinguishes among contexts where a name must denote (refer to) a package (paragraph 6.5.3), a type (paragraph 6.5.4), a variable or value in an expression (paragraph 6.5.5), or a method (paragraph 6.5.6).
[Gosling et al., 1996]

Given that the context plays a role in defining the meaning of a name, names can be ambiguous when referring to declarations from out of context, even when using fully qualified names. This is the case with E-doc files. The ambiguity arises because fields and inner classes/interfaces can have overlapping names, as illustrated in Figure 4.2 on the following page. The name Tube is defined multiple times, as a class, a field and a type, but this is valid Java because of its context rules for names. If we want to refer to the inner class Tube out of context (e.g., in the documentation) we can use the fully qualified name, appliances.parts.WeirdHeatingElement.Tube, and the class is uniquely qualified. The problem occurs when we want to refer to the

36 28 Design field, in this case called Tube. This cannot be done in Java without being in a context with a WeirdHeatingElement object. A solution would be to append the field name for its containing class, but this would create an ambiguity between the field and the class. package appliances. parts ; public class WeirdHeatingElement extends HeatingElement 5 class Tube // inner class named Tube... int Tube ; // field named Tube 10 Tube parts [ ] ; // field of type Tube public WeirdHeatingElement ( String name, int count )... Tube = count ; // refer to the field Tube 15 for( int count =0 ; count Tube ; count ++) // anonymous class extended from the inner class parts [ count ] = new Tube ( )... // methods and fields for the inner class 20 Figure 4.2: Overlapping names example. Conflicting names and the need to refer to source markers and variables inside methods leads us to the conclusion that another approach is needed to uniquely identify source entities from -doc files. Java source entity names The main problem with identifying Java names is that field and classes can have conflicting names. Thus we introduce a new separator character to distinguish fields, methods, parameters, variables and source markers from package, interface and class names. This character (at). By using this marker, we can distinguish between the field and class in Figure 4.2. The class will still have the name: appliances.parts.weirdheatingelement.tube, but the field will have the name: appliances.parts.weirdheatingelement@tube and hence be unambiguous. is also used to allow selecting parameters, variables and source-markers. A BNF grammar is shown in Figure 4.3. By using this naming standard the entities defined in the previous examples is as listed in Table 4.1 on the next page. Table 4.1 illustrates a few other issues of the naming of source entities. 
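The effect of the separator can be shown with a small Java sketch that classifies an entity name by looking at its @-separated parts. It is a simplification under assumptions: the names EntityNames, Kind and classify are hypothetical, the marker part is assumed to start with <e:, and the sketch does not validate names or tell parameters from variables, which share the same form.

```java
/** Hypothetical sketch: classify a source entity name by its "@"-separated parts. */
final class EntityNames {

    enum Kind { PACKAGE_OR_CLASS, FIELD, METHOD, PARAMETER_OR_VARIABLE, MARK }

    /** Classifies a name such as "appliances.parts.WeirdHeatingElement@Tube". */
    static Kind classify(String name) {
        String[] parts = name.split("@");
        String last = parts[parts.length - 1];
        if (parts.length == 1) {
            return Kind.PACKAGE_OR_CLASS;      // only dots: package, interface or class
        }
        if (last.startsWith("<e:")) {
            return Kind.MARK;                  // source marker inside a class or method
        }
        if (last.endsWith(")")) {
            return Kind.METHOD;                // method or constructor with parameter types
        }
        // class@name is a field; method@name is a parameter or a local variable
        return parts.length == 2 ? Kind.FIELD : Kind.PARAMETER_OR_VARIABLE;
    }

    public static void main(String[] args) {
        String owner = "appliances.parts.WeirdHeatingElement";
        System.out.println(classify(owner + ".Tube"));
        System.out.println(classify(owner + "@Tube"));
        System.out.println(classify(owner + "@WeirdHeatingElement(java.lang.String,int)"));
        System.out.println(classify(owner + "@WeirdHeatingElement(java.lang.String,int)@name"));
    }
}
```

The first two names in the usage example differ only in the separator, yet are classified as a class and a field respectively, which is the ambiguity the @ character is meant to remove.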
    <javaentityname> ::= <package> | <class> | <field> | <method> | <parameter> | <variable> | <mark>
    <package>        ::= <dotname>
    <class>          ::= [<package> "."] <dotname>
    <field>          ::= <class> "@" <name>
    <method>         ::= <class> "@" <methodname>
    <parameter>      ::= <method> "@" <name>
    <variable>       ::= <method> "@" <name>
    <mark>           ::= (<class> "@" <markname>) | (<method> "@" <markname>)
    <methodname>     ::= <name> "(" [ <params> ] ")"
    <params>         ::= <class> { "," <class> }
    <dotname>        ::= <name> { "." <name> }
    <markname>       ::= "<e:" <name> ">"
    <name>           ::= letter { letter | digit }

Figure 4.3: BNF for the naming of Java entities. Curly brackets, { and }, mean 0 or more elements. | means or. Square brackets, [ and ], mean the element is optional.

    Name                                                                                  Description
    appliances.parts.WeirdHeatingElement                                                  The class
    appliances.parts.WeirdHeatingElement.Tube                                             The inner Tube class
    appliances.parts.WeirdHeatingElement@Tube                                             The Tube field
    appliances.parts.WeirdHeatingElement@parts                                            The parts field
    appliances.parts.WeirdHeatingElement@WeirdHeatingElement(java.lang.String,int)        The constructor
    appliances.parts.WeirdHeatingElement@WeirdHeatingElement(java.lang.String,int)@name   parameter name
    appliances.parts.WeirdHeatingElement@WeirdHeatingElement(java.lang.String,int)@count  parameter count
    appliances.parts.WeirdHeatingElement@WeirdHeatingElement(java.lang.String,int)@count  variable count (conflict)

Table 4.1: Names for parameters, variables and source marks.

Long identifiers

The example is a small class with short names; despite that, the unique names get rather long and will be tedious for a human to write. The names get long because of the package hierarchy in Java, and because the parameter types are the only way to distinguish between overloaded methods in Java and are hence needed when referring to them by name. We have implemented mechanisms in the editor and the E-doc language to reduce the need for typing the complete names while editing. These are described in the section on page 32 and Section 4.8 on page 64.

Duplicate variables

We want our Elucidator to be able to document the internal parts of a system, and for this it can be useful to refer to variables. This is normally not a problem, but Java does allow duplicate variable names in a method/constructor. In the example, the identifier count appears both as a parameter of the constructor and as a variable in the for-loop. Java allows this, as the variable exists in another (unnamed) scope and thereby shadows the parameter. A possible solution is to distinguish the names by prefixing them with a number, e.g., naming the two counts 1count and 2count. This is not a real solution, however, as the names would then depend on the number of counts inside the method rather than on their given context. We have chosen not to support this, as a method for creating unique and persistent names cannot be found. Another, subjective, argument is that the use of duplicate identifiers inside a method can be considered harmful.

Source symbols without a name

We have a general rule for selecting which abstractions in the source code can be referenced by name: it should be possible to refer to the abstraction by some name in Java. This is not possible for many elements of the Java language, such as single statements, operators, unnamed blocks etc. A more important part of the language is the anonymous classes, which by definition do not have a name. We circumvent this problem by using source markers, as explained in the section on page 25. This ends our discussion of the issues concerning the naming scheme used to uniquely identify any Java entity.

E-doc files

In the following the E-doc language will be described. The second type of input data is the E-doc files. These files contain the documentation.
As mentioned earlier, these files are written in the E-doc language, which is a language we developed for writing elucidative documentation.

The E-doc language

The design goal for the E-doc language was to meet the basic requirements for the Elucidator. This means that we need elements for structuring the documentation and elements for creating links between the documentation and the source code, as well as internally in the documentation. When designing our E-doc language there are two possibilities for deciding the form and structure of the syntax:

- We can design our own syntax.
- We can base the syntax on an existing language.

The first choice leaves us with complete freedom to decide everything about the syntax (and the language). This means that we can target the language precisely to our needs. The second choice gives us nearly the same freedom if we choose our base language wisely. The base language can, of course, impose some limitations on our design of the E-doc language. More importantly, the second choice leaves us with the benefits of using an existing language. These benefits include:

- Our programmer already knows or can relate to the syntax.
- Tool support exists for working with the syntax, e.g. an editor or parser.

If the programmer already knows the syntax of the base language, it will be easier for him to adapt to our E-doc language. This will in the end reduce the mental overhead imposed on the programmer by making him use a new language. Based on these arguments we decided to base the syntax of the E-doc language on an existing language. XML seems an obvious choice: the language is fast becoming an industry standard, it is flexible and, for programmers with Hypertext Markup Language (HTML) experience, the syntax is familiar. Furthermore, many tools exist for various tasks involving XML. Examples are editor support in Emacs, parsers, and XSL processors which can transform the contents of an XML document using a style sheet. As an additional benefit we can express our E-doc language via a grammar such as a Document Type Definition (DTD), thus providing a formal definition of the language. The DTD for our current E-doc language can be seen in Appendix A on page 81.
Text structuring elements

Three basic text structuring elements exist in our E-doc language (the use of these elements can be seen in Appendix C.1.2 on page 90):

- edoc: This element is used to specify the beginning and end of an E-doc document (see line 7 and 136 in Appendix C.2 on page 93). This is the outermost element for textual contents. Between the begin tag (<edoc>) and the end tag (</edoc>) any number of chapter elements can be placed.
- chapter: The top level paragraph element for textual contents (see line 10 and 135 in Appendix C.2 on page 93).

- section: This element is used for a finer-grained structuring of the documentation (see line 26 and 60 in Appendix C.2 on page 93). The section element can be placed inside a chapter or another section, thus allowing for nested sections.

Common to these three elements is that they can all have a title and an author element in the body (see line 11 and 12 in Appendix C.2 on page 93). These are used for specifying the title and/or author of a particular piece of documentation. They could be seen as attributes of the three elements, but they are modeled as separate elements in order not to clutter the elements (see footnote 3). The chapter and section elements furthermore have an optional label attribute (see line 10 in Appendix C.2 on page 93). The attribute is used as an anchor when making references internally in the documentation.

Linking elements

One of the key properties of Elucidative Programming is the ability to create links between the documentation and the source code, as well as internally in the documentation. Furthermore, external documentation, such as Javadoc and others, may exist. It would be an advantage to allow linking to these. These parameters have to be considered when deciding on the syntax of linking. We see two approaches:

- Have only one element. The type of the target of the element will then decide which link type it is, e.g., HTML style, <a href="..."> and <a href="mailto:e1207a99@cs.auc.dk">.
- Have a set of elements, corresponding to the number of different link types.

The first approach seems to be the simplest, since we only need one link element. However, a closer look shows that this solution is not that simple. Since we only have one element, it will be the target of the link which specifies the type of the link (see footnote 4). This means that we will have to introduce an interpreter in order to decide the type of the link. Furthermore, we add mental load for the programmer, since he will have to parse the link to find out which type it has.
A solution to this could be to have an attribute on the element, stating the type of the link. While this solution would work, we do not see it as a usable solution, since too many attributes on an element add to the mental load, as stated in the preceding section. Instead we chose to take inspiration from the DocBook [Walsh and Muellner, 1999] and HyTime [ISO, 1997] projects. These use different elements for different kinds of links. This means that the programmer can easily distinguish between the different kinds of links, and we do not need the before-mentioned interpreter.

Footnote 3: Too many attributes on an element make it very difficult to comprehend while writing/editing the text.
Footnote 4: We need to know the type of the link since the action to be performed when using the link varies, depending on the type of the link.

We therefore introduce three kinds of linking elements, corresponding to the types of linking described above:

- slink: Used for making links between the documentation and the source code (see line 30 in Appendix C.2 on page 93).
- dlink: Used for making links internally in the documentation (see line 16 in Appendix C.2 on page 93).
- xlink: Used for making links to external documentation (see line 12 in Appendix C.2 on page 93).

All three elements have an href attribute which specifies the target of the link. The contents of the attribute vary, depending on the type of link. For xlinks the href attribute is a URL specifying where the external documentation is placed (it is then the responsibility of the browser to show the external documentation, be that in the browser or by launching an external program). For dlinks the href contains the label of a chapter or section. Finally, for slinks the href is the fully qualified name of a source entity in the source code (see the section on page 27 for information on fully qualified names).

Source base and aliases

As described in the section on page 27, the fully qualified names tend to be long. If the programmer has to write these long names each time he wants to make a link, odds are that he will not make the links. We therefore have to provide some kind of mechanism to help with this problem, and introduce the concept of a source base: the sbase attribute (see Figure 4.4 for an example). A source base works in a similar way to the base element in HTML. It is used as a prefix for all slinks, and it is valid inside the element that has the attribute (only chapter and section elements can have this attribute). Children of an element inherit the source base if they do not have a source base themselves.
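The inheritance rule for the source base can be sketched with the XML parser that ships with the JDK. This is only an illustration under stated assumptions: the class SbaseResolver and its methods are hypothetical, the sbase value is assumed to end with a "." and to be prepended to the href unchanged, and a // prefix (introduced in the next paragraph) is assumed to mean that the rest of the href is already a complete name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical sketch: resolves slink targets against the nearest enclosing sbase. */
final class SbaseResolver {
    private SbaseResolver() {}

    /** The source base in effect: the element's own sbase, else that of the nearest ancestor. */
    static String effectiveSbase(Element element) {
        for (Node n = element; n instanceof Element; n = n.getParentNode()) {
            Element e = (Element) n;
            if (e.hasAttribute("sbase")) {
                return e.getAttribute("sbase");
            }
        }
        return ""; // no source base anywhere above the element
    }

    /** Assumption: "//" marks a reference that bypasses the source base. */
    static String resolve(Element slink) {
        String href = slink.getAttribute("href");
        if (href.startsWith("//")) {
            return href.substring(2);
        }
        return effectiveSbase(slink) + href;
    }

    /** Parses an E-doc fragment and returns the resolved targets of all slinks in document order. */
    static List<String> resolveAll(String edocXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(edocXml.getBytes(StandardCharsets.UTF_8)));
            NodeList links = doc.getElementsByTagName("slink");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < links.getLength(); i++) {
                names.add(resolve((Element) links.item(i)));
            }
            return names;
        } catch (Exception e) {
            throw new IllegalArgumentException("not a well-formed E-doc fragment", e);
        }
    }
}
```

In this sketch a section with its own sbase overrides the one inherited from the chapter, while a section without the attribute falls back to the chapter's value.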
To allow for direct referencing of something outside the source base, the reference should start with the prefix // (see line 66 in Appendix C.2 on page 93).

    <section label="sec:specificparts" sbase="appliances.kitchen.">
      <title>Coffee machine parts</title>
      <p>
        The <slink role="weak" href="CoffeeMachine">CoffeeMachine</slink> class is
        realized through a number of subcomponents. These are:

Figure 4.4: Example of some E-doc that uses sbase to simplify writing of links to source. The sbase is used as a prefix in the slink tag to refer to appliances.kitchen.CoffeeMachine.

Since some classes are often used in the fully qualified names, e.g., java.lang.String as a parameter to a method, it would be an advantage to have some sort of aliasing mechanism. By using XML we get this for free, since XML provides us with the <!ENTITY> declaration. With this declaration we can make aliases and use them in our E-doc documentation (see line 4 and 5 in Appendix C.2 on page 93). The expansion of the aliases is then done by the XML processor.

Roles

The slink and dlink elements furthermore have a role attribute (see line 16 in Appendix C.1.2 on page 90). This is used for specifying whether the link is strong or weak. A strong link is used when the programmer wants to explain something in detail, while a weak link is used when he just wants to mention something without describing it in detail.

4.3 Data Model

The responsibility of the Data Model is to store the derived information provided by the Abstractor, and to allow the Query Engine to retrieve this derived information. In this section the considerations behind the design of the Data Model are presented, and our current implementation of the Data Model is described.

Derived information

Derived information is information which the Abstractor is able to derive from the Java source code and E-doc files (see Section 4.4 on page 38 for a detailed description of the Abstractor). In this section we will take a look at the nature of this information, and present a general model for structuring the information. Inspired by the Chava [Korn et al., 1999] and Ciao [Chen et al., 1995] projects, we have chosen to store the derived information in an entity/relationship model. One of the big advantages of this model is that it is language independent. This means that, although we use it for Java source code, we will not have to change the Data Model if we decide to change our programming language to, e.g., C++ or Pascal. This furthermore means that we can use the same model for our source code and E-doc documentation.
As the name of the model suggests, the derived information falls into two categories:

Entities: An entity is an abstraction of a language construct derived by the Abstractor. Since the Java language and the E-doc language have different rules, we introduce two types of entities: Source entities and Document entities. Source entities, for the Java language, are language elements such as classes, methods, fields etc., while Document entities are elements from the E-doc language, such as edoc, chapter and section. Extraction of information about the language constructs is done with the following purposes in mind:

1. We want to be able to represent relationships:

   (a) between source code and E-doc,
   (b) internally in the source code, and
   (c) internally in the E-doc.
2. We want to be able to search for a specific instance or instances.

This means that we must be able to make every instance of the entities unique through a key or unique name. Furthermore, in order to locate the entities in the source code or E-doc files we need their exact location in the file. This information, together with other relevant properties of an entity, is derived by the Abstractor and represented as attributes on the entities. For a source entity of type method, these are, e.g., parameters, return data type, start position in the source code file etc. The chosen attributes for the current implementation can be seen in the three tables: Table B.6 on page 88, Table B.2 on page 86 and Table B.5 on page 87.

Relationships: A relationship is a way of modeling that two entities can be related to each other. Each relationship has a kind attribute, to specify which kind of relationship exists between the two entities. Three types of relationships exist in our Data Model:

1. Source to source relationship: Typical kinds of source to source relationships could be inheritance, method invocation, method overloading etc.
2. Document to document relationship: A typical example of this type of relationship would be containment.
3. Document to source relationship: The kind of this relationship will always be slink, since this relationship expresses that you can create links from the documentation to the source code (see the section on page 32 for more information about slinks).

Like the entities, the relationships also have attributes. Here the attributes specify, e.g., the kind of the relationship, and where in the source code the relationship exists.
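The entity/relationship structure described above can be sketched in a few lines of Java. The class and member names below (DerivedInfoSketch, Entity, Relationship, targets) are hypothetical and chosen only for illustration; the attributes actually stored by the Data Model are those listed in Appendix B. The sketch deliberately uses the same two types for source entities and document entities, which is the language independence the model is chosen for.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory sketch of entities and typed relationships between them. */
final class DerivedInfoSketch {

    /** An abstraction of a language construct, e.g. a class, a field or a section. */
    static final class Entity {
        final int id;
        final String kind;
        final String name;

        Entity(int id, String kind, String name) {
            this.id = id;
            this.kind = kind;
            this.name = name;
        }
    }

    /** A directed relationship of a given kind between two entities. */
    static final class Relationship {
        final Entity from;
        final String kind;
        final Entity to;

        Relationship(Entity from, String kind, Entity to) {
            this.from = from;
            this.kind = kind;
            this.to = to;
        }
    }

    private final List<Entity> entities = new ArrayList<>();
    private final List<Relationship> relationships = new ArrayList<>();

    /** Creates an entity with a unique, sequentially assigned id. */
    Entity addEntity(String kind, String name) {
        Entity entity = new Entity(entities.size() + 1, kind, name);
        entities.add(entity);
        return entity;
    }

    void relate(Entity from, String kind, Entity to) {
        relationships.add(new Relationship(from, kind, to));
    }

    /** Names of all entities reachable from the given entity via one relationship of the given kind. */
    List<String> targets(Entity from, String kind) {
        List<String> names = new ArrayList<>();
        for (Relationship r : relationships) {
            if (r.from == from && r.kind.equals(kind)) {
                names.add(r.to.name);
            }
        }
        return names;
    }
}
```

A document to source relationship is then simply a relationship of kind slink whose from entity is a document entity and whose to entity is a source entity.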
The chosen attributes for the current implementation can be seen in the three tables: Table B.1 on page 85, Table B.4 on page 87 and Table B.3.

Storage of derived information

While the Abstractor is creating the derived information, our Data Model has to provide it with a way of storing the information. Different solutions are possible. In this section we will take a look at the most obvious ones, while the following section will present how the Data Model is currently implemented. In our opinion three storage possibilities come to mind:

Main memory

We could store the entities and relationships as objects in main memory. This would ensure fast access to the data, but on the other hand it might take up too much memory when working on a large project. Furthermore, main memory is not persistent, so if the program crashes we would have to run the Abstractor again in order to recreate the information. A dump of the objects to disk could, to some extent, solve this problem, but it would add a level of complexity to the method and furthermore undermine its main advantage by increasing the access time to the objects.

Files

Another solution could be to store the data in files. These could, e.g., be dumps of the objects from main memory, or plain text files with, e.g., a comma separated list for each entity and relationship. Compared to keeping the data in main memory this method is much slower but takes less memory. The biggest disadvantage of this method concerns querying. It is not an easy task to retrieve data, based on certain rules, from random locations in a plain text file.

Relational database

Our third storage option would be to use a relational database. This method ensures that the data is written to disk, while at the same time providing us with a strong and, compared to just using files, flexible method of querying for data (SQL). A relational database will, like plain text files, be slower than just keeping the entities and relationships in main memory, but it is not that difficult to introduce a cache layer between the Elucidator and the database in order to speed things up.

Looking at these arguments, it is clear that files are not very attractive, since a relational database can provide us with the same benefits when it comes to persistence, while at the same time providing a much better way of querying for data. The major downside of using a relational database is the requirement of installing and configuring the actual database.
We still choose to use a relational database for storage of our Data Model, because we think that the advantages outweigh the disadvantage.

Current implementation

As mentioned, we have chosen to store our Data Model in a relational database, but since this project is running for another semester we would like to make our design so flexible that the storage method can easily be changed, should that be necessary. The functionality of the Data Model falls into two categories:

1. Methods for creating entity and relationship objects.
2. Methods for committing these objects to persistent media.

The first category is independent of the storage model, while the second relies on it. We have created a primary Data Model class, called AbstractBundleController, which contains the implementation of the first category, as well as abstract methods of the second category. We then demand that a storage model specific class is made, which extends the AbstractBundleController and implements the abstract methods. In our implementation this class is called DBBundleController. An illustration of the design can be seen in Figure 4.5.
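The split between the two categories can be illustrated with a small sketch. It is not the actual AbstractBundleController: the class names BundleControllerSketch and InMemoryBundleController, the method names, and the use of plain strings as entities are all assumptions made to keep the example short. The in-memory subclass merely stands in for a storage specific class such as DBBundleController.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical base class: creation is storage independent, committing is left abstract. */
abstract class BundleControllerSketch {
    private final List<String> pending = new ArrayList<>();

    /** Category 1: create an object and queue it; no knowledge of the storage model is needed. */
    final String createEntity(String kind, String name) {
        String entity = kind + " " + name;
        pending.add(entity);
        return entity;
    }

    /** Category 2: writing one object to persistent media is decided by the subclass. */
    protected abstract void store(String entity);

    /** Hands every queued object to the storage specific subclass and returns how many were written. */
    final int commit() {
        int count = pending.size();
        for (String entity : pending) {
            store(entity);
        }
        pending.clear();
        return count;
    }
}

/** Hypothetical stand-in for a storage specific controller; it keeps the objects in a list. */
final class InMemoryBundleController extends BundleControllerSketch {
    final List<String> stored = new ArrayList<>();

    @Override
    protected void store(String entity) {
        stored.add(entity);
    }
}
```

Changing the storage method then amounts to supplying another subclass that implements store, while the code that creates entities stays untouched.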

[Figure 4.5 diagram: the interfaces RelationshipFactory and EntityFactory, the class AbstractBundleController and the class DBBundleController.]

Figure 4.5: Design of the Data Model. Storage model independent functionality is implemented in the AbstractBundleController class, while storage model dependent functionality is implemented in extended classes (in this case the DBBundleController class).

Choice of database

As for the choice of database to use for the implementation, we have a lot of options. At the department (see footnote 5) three relational databases are available: Oracle, Informix and MySQL. Other databases, such as PostgreSQL, could also be used; the only demand is that a Java DataBase Connectivity (JDBC) implementation must exist for that particular database, which is the case for all the before-mentioned databases. We chose to use MySQL [MySQL, 2000] since this is a free and, more importantly, a fast database. A switch to another database would however just be a matter of using another driver when connecting to the database.

Design of the database

The database has six tables: SourceEntity, SourceRelationship, DocEntity, DocRelationship, DocSourceRelationship and FileEntity. Each table represents an object type, meaning that the SourceEntity table stores SourceEntity objects, and so on. Each entity or relationship is represented as a tuple in the corresponding table. This means that each column in the tables corresponds to an attribute in the objects. The DocSourceRelationship table is a bit special, since it contains information derived from the DocEntity and SourceEntity tables by making a join. This table is therefore not strictly needed, since one could just make the join when querying, but we have chosen to make the table for three reasons: uniformity with respect to the types of relationships in our model, faster database queries, and to make it possible, some time in the future, to have slinks with multiple targets.
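The idea that the document to source relationships can be derived by a join can be sketched without a database. The sketch below is an assumption-laden illustration: the class DocSourceJoinSketch is hypothetical, the two maps stand in for the relevant columns of the DocEntity and SourceEntity tables (whose real layout is given in Appendix B), and slinks whose target has no matching source entity are simply skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: derives document to source pairs by matching slink targets on entity names. */
final class DocSourceJoinSketch {
    private DocSourceJoinSketch() {}

    /**
     * @param slinkTargetByDocId id of each slink document entity mapped to the name it refers to
     * @param sourceIdByName     unique name of each source entity mapped to its id
     * @return one "docId->sourceId" row per slink whose target exists, ordered by document id
     */
    static List<String> join(Map<Integer, String> slinkTargetByDocId, Map<String, Integer> sourceIdByName) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<Integer, String> slink : new TreeMap<>(slinkTargetByDocId).entrySet()) {
            Integer sourceId = sourceIdByName.get(slink.getValue());
            if (sourceId != null) {
                rows.add(slink.getKey() + "->" + sourceId);
            }
        }
        return rows;
    }
}
```

Storing the result of such a join in its own table trades some redundancy for the uniformity and query speed mentioned above.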
The layout of the six tables can be seen in Appendix B on page 85. The layout of the tables is, as mentioned earlier, inspired by the Chava [Korn et al., 1999] and Ciao [Chen et al., 1995] projects. Furthermore, an example of the information to be placed in the tables can be seen in Appendix C.3.

Footnote 5: Department of Computer Science at Aalborg University, Denmark.

4.4 Abstractor

The following section describes the design issues regarding the process of collecting the derived information needed for the Elucidator. This process is controlled by the Abstractor. The purpose of the Abstractor is described first, and then the two parts of the Abstractor, the EdocAbstractor and the JavaAbstractor, are discussed.

Purpose of the abstraction

The generator needs information about the structure of, and relationships between, elements to allow querying and browsing of both the documentation and the source. The Abstractor actually consists of two sub-abstractors, one for Java (JavaAbstractor) and another for E-doc (EdocAbstractor). Each sub-abstractor receives a list of files from which it extracts the needed information and stores it in the database. Besides extracting entities and their relationships, the Abstractor also has to extract the exact positions of their occurrences. This is to support markup of the source code and to allow positioning of query results in the browser and editor.

Abstraction of Java

Java source code has a higher complexity than the documentation files, which have a simple structure. The Abstractor needs to resolve the types of names, and there are more complex relationships in the source code than in the documentation.

Source entities

The reasoning behind the choice of source entities to be extracted from the Java source code is that the Elucidator should support documentation of all implementation details. To support this, the Abstractor extracts all elements in Java that can be uniquely identified by a name. These include all the major constructs in Java, but also parameters and variables, as these can be referred to by name as described in the section on page 27. The set of source entities extracted from the Java source code is inspired by the Chava tool [Korn et al., 1999].
The JavaAbstractor differs from the Chava tool by having to support the Java 1.1 language, whereas Chava only supported Java 1.0. The biggest difference between the two is the inclusion of inner classes/interfaces and anonymous classes in Java 1.1.

Chava extracts the major Java structures: packages, classes, interfaces, fields and methods. The JavaAbstractor also extracts inner classes/interfaces, anonymous classes, method parameters and variables. Furthermore, the JavaAbstractor has to extract information about source markers and regions. Information about the scope of an entity, its parameter list, a method's return type, etc. is also collected and stored as attributes. A complete list of the source entities found by the current implementation of the JavaAbstractor is shown in Table 4.2. Inner classes/interfaces are not distinguished from their normal counterparts by the Abstractor. Thus their kind is set to either class or interface respectively.

    Kind         Description
    package      A set of classes and interfaces.
    class        Contains declarations and definitions of methods and fields. Classes can extend one class and implement zero or more interfaces.
    interface    Similar to classes, but contains no definitions. Interfaces extend zero or more interfaces.
    method       A function that is part of a class or interface.
    constructor  The constructor of a class.
    field        Variable or constant that is part of a class or interface.
    parameter    A parameter listed in the parameter list of a method or constructor.
    variable     Variable declared inside a method or constructor.
    sourcemark   A source mark or the start of a source region.
    endmark      The end of a source region.

Table 4.2: List of the different kinds of entities found by the current implementation of the JavaAbstractor.

A subset of the data extracted when using the Coffee Machine example is shown in Table 4.3. The data is generated from the ElectricalAppliance class from Appendix C.1.7 on page 92.

    Id  Kind         Name                 Idname
    1   package      appliances.general   appliances.general
    ...
    5   class        ElectricalAppliance  appliances.general.ElectricalAppliance
    6   constructor  ElectricalAppliance  appliances.general.ElectricalAppliance@ElectricalAppliance(java.lang.String)
    7   parameter    worldpart            appliances.general.ElectricalAppliance@ElectricalAppliance(java.lang.String)@worldpart
    8   method       switchon             appliances.general.ElectricalAppliance@switchon()
    9   field        voltage              appliances.general.ElectricalAppliance@voltage
    ...

Table 4.3: Subset of the abstracted entities from the ElectricalAppliance class. The source code for the class can be found in Appendix C.1.7 on page 92.

Generation of id-names

As described in the section on page 27, a naming scheme has been defined to allow unique identification of all source entities. This scheme is followed by the Abstractor and is, in order to simplify lookups, used when storing each source entity.

Source relationships

As with the source entities, the extracted relationships are inspired by Chava. The JavaAbstractor extracts similar relationships, but uses a more varied set. Table 4.4 shows a list of the relationships found by the current implementation of the JavaAbstractor.

    Kind         Description
    containment  If a source entity A contains a definition of source entity B, then a containment relationship exists from A to B.
    extends      If a class/interface A extends class/interface B, then an extends relationship exists from A to B.
    implements   If a class A implements interface B, then an implements relationship exists from A to B.
    access       If a method/constructor A accesses field, variable or parameter B, then an access relationship exists from A to B.
    invoke       If a method/constructor A invokes a method B, then an invoke relationship exists from A to B.
    throws       If a method/constructor A throws a class B, then a throws relationship exists from A to B.
    typeof       If a field, parameter or variable A is of type B, then a typeof relationship exists from A to B.
    returntype   If a method A returns a source entity B, then a returntype relationship exists from A to B.

Table 4.4: List of the different kinds of relationships found by the current implementation of the JavaAbstractor.

Chava has containment, subclass (extends), implements, field read/write (access), and reference (invoke). The JavaAbstractor extends this list to give a more complete view of the relationships in Java. The extra relationships found by the JavaAbstractor are throws, typeof and returntype. With these relationships it is possible to categorize the use of a source entity and thereby get a better understanding of the source code. The sections on page 52 and page 57 illustrate how the Elucidator uses these relationships to allow browsing the source code via different categorizations of the relationships.

Static vs. dynamic relationships

Another aspect is the dynamic features of the Java language.
These features include reflection and polymorphism. Reflection makes it possible to dynamically load classes and execute methods whose names and types are not known at compile time. Using polymorphism, methods can be overridden, and fields/variables can refer to instances of subclasses of their type. This means that the actual method being invoked is not decided until runtime (late binding).
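Late binding can be shown with a small, self-contained example. The classes below are hypothetical and only loosely named after the Coffee Machine example; they are not taken from the report's appendix. The call in describeVia is written against the static type Appliance, yet the method that actually runs depends on the object passed in.

```java
/** Hypothetical example: the method invoked depends on the runtime type, not the declared type. */
final class LateBindingDemo {
    private LateBindingDemo() {}

    static class Appliance {
        String describe() {
            return "Appliance";
        }
    }

    static class CoffeeMachine extends Appliance {
        @Override
        String describe() {
            return "CoffeeMachine";
        }
    }

    /** Read statically, this is a call to Appliance.describe(). */
    static String describeVia(Appliance appliance) {
        return appliance.describe();
    }

    public static void main(String[] args) {
        System.out.println(describeVia(new Appliance()));
        System.out.println(describeVia(new CoffeeMachine()));
    }
}
```

An analysis that only reads the source of describeVia sees a single call target, Appliance.describe(), although the overriding method in CoffeeMachine may be the one executed.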

Such features imply that it is not possible to capture all the possible relationships in the source code by only performing a static analysis. Even access to dynamic runtime information would not give a precise result, as the bindings can change over time and between runs of the application. The JavaAbstractor only performs a static analysis of the source code; consequently the abstracted invoke relationships only represent the relationships defined via the static type information. A pragmatic remedy for this is to extract an override relationship between a method and its super method. This would make it possible to extract a list of the possible methods a given method call can invoke. This relationship is not currently extracted by the JavaAbstractor. As an example, Table 4.5 contains a subset of the abstracted relationships present in the ElectricalAppliance class from Appendix C.1.7 on page 92.

[Table 4.5, columns Id, Fromid, Toid and Kind. Kind values of the listed rows: containment, access, access, access, containment, throws, containment, extends.]

Table 4.5: Subset of the abstracted relationships from the ElectricalAppliance class. The source code for the class can be found in Appendix C.1.7 on page 92.

Current implementation

Java is an object oriented language with complex rules for its semantics and type system. These rules are defined in the Java Language Specification [Gosling et al., 1996] and implemented by Java compilers. The currently available compilers do not extract all the information needed for the Elucidator. They only generate a set of class files. These class files contain much information, but they contain neither the precise position of definitions and statements nor the comments in the source files, and both are needed for our purpose. The JavaAbstractor has to parse Java and therefore has to be more or less complete in its coverage of the language.
This is not a trivial task, and thus we tried to find an existing tool that could parse and extract the type information from Java files and at the same time was extensible and allowed us to modify it for our needs. Fortunately such a tool exists, called the Kopi Java Compiler (KJC) [Gay-Para et al., 1999].

It is a fully functional open source Java compiler released under the GNU General Public License [GPL, 1991]. It is implemented in pure Java and is based on an LL(k) parser grammar defined and implemented via a modified version of the ANTLR tool [Parr, 1999]. The KJC tool is highly modular and works by parsing Java files and building corresponding decorated abstract syntax trees (ASTs). These ASTs are traversed multiple times to perform different semantic and type checks. The byte code generation is also done via an AST traversal. The JavaAbstractor actually uses all of these traversals, except the code generation traversal, which is replaced with an abstraction traversal. To simplify the implementation we implemented the visitor pattern [Gamma et al., 1996] in KJC (see footnote 6). The visitor implementation allows traversal of an object structure without modifying the implementation of the classes being visited. Even with the visitor pattern we needed to modify some internal parts of the KJC tool, as it did not extract the exact positions of some parts of the needed source entities from the source code. Support for collecting source markers also needed to be implemented. The time spent on these modifications was minor compared to what developing a similar tool from scratch would have taken. An overview of the JavaAbstractor process is shown in Figure 4.6.

[Figure 4.6 diagram: .java files are read by KJC, which produces an AST with types; the KjcVisitor traverses the AST and stores the result in the Data Model.]

Figure 4.6: Data flow through the JavaAbstractor. The Java files are parsed by KJC, which results in an AST. This AST is traversed and thereby stored in the Data Model.

Limitations of the current implementation

The JavaAbstractor is fully functional in that it can parse and abstract a large amount of useful information from Java 1.1 files. The Abstractor is only a prototype, however, and we have identified some limitations that are worth mentioning.

Only source code

The Abstractor only creates source entities for the source code it actually parses.
Hence it does not store entities for, e.g., java.lang.string class, except if the source of java.lang.string is included in the parse. This also means that the relationships from a field of type java.lang.string to the class is not stored either. The Abstractor actually find the relationships, but because the Abstractor never creates the source entity it cannot refer to the matching entity. 6 Which has been included in the latest release of KJC.
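The visitor-based abstraction traversal described earlier in this section can be illustrated with a minimal sketch. The node classes (ClassNode, FieldNode) and the AbstractionVisitor below are hypothetical and heavily simplified; they are not KJC's actual classes, and the sketch only assumes that an AST node offers an accept method.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical visitor interface; KJC's real visitor has many more methods.
interface AstVisitor {
    void visitClass(ClassNode node);
    void visitField(FieldNode node);
}

interface AstNode {
    void accept(AstVisitor visitor);
}

final class FieldNode implements AstNode {
    final String name;
    final String typeName;

    FieldNode(String name, String typeName) {
        this.name = name;
        this.typeName = typeName;
    }

    @Override
    public void accept(AstVisitor visitor) {
        visitor.visitField(this);
    }
}

final class ClassNode implements AstNode {
    final String qualifiedName;
    final List<FieldNode> fields = new ArrayList<>();

    ClassNode(String qualifiedName) {
        this.qualifiedName = qualifiedName;
    }

    @Override
    public void accept(AstVisitor visitor) {
        // Visit the class itself first, then its children.
        visitor.visitClass(this);
        for (FieldNode field : fields) {
            field.accept(visitor);
        }
    }
}

// Plays the role of the abstraction traversal: it records entities
// instead of generating byte code. The node classes stay untouched.
final class AbstractionVisitor implements AstVisitor {
    final List<String> entities = new ArrayList<>();

    @Override
    public void visitClass(ClassNode node) {
        entities.add("class " + node.qualifiedName);
    }

    @Override
    public void visitField(FieldNode node) {
        entities.add("field " + node.name + " : " + node.typeName);
    }
}
```

The point of the design is that a new traversal, such as abstraction instead of code generation, is added as a new visitor class without changing the node classes.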

Arrays
Arrays are not completely supported. For example, the entity machines in CoffeeMachine machines[] will be abstracted and related to the entity representing the class appliances.kitchen.CoffeeMachine. It should actually be related to the array type named appliances.kitchen.CoffeeMachine[]. This is related to the previous limitation: no source code explicitly defines the array class entity, even if the source of appliances.kitchen.CoffeeMachine is included in the abstraction process.

Initializers
Initializers are anonymous blocks of code in the body of a class that are used for initialization and construction of the class and its instances (objects) [Gosling et al., 1996]. They are illustrated in Figure 4.7.

    class CountCoffeeMachine extends CoffeeMachine {
        static int instancecount;

        static {
            // this is a static initializer
            instancecount = 0;
        }

        {
            // this is a normal initializer
            instancecount++;
        }
        ...
    }

Figure 4.7: Example of static initializers.

A normal initializer is invoked when an object is instantiated, before the constructor body is executed. There can be multiple constructors in a class, and hence the block does not have a unique constructor that it can be related to. The static initializer is correctly associated with its defining class (CountCoffeeMachine), but the normal initializer does not have a single constructor to be associated with. If there had been constructors, a fully implemented Abstractor could relate the initializer to all of them via refersto relationships in order to be correct.

Incremental update
Efficient use is essential for any tool, and the Abstractor is no exception. Currently the Abstractor works as a batch job that must be run on all the source files, even if there has been only one minor change. This can be solved by implementing incremental abstraction. The Data Model does not limit this in any way, but we have chosen not to implement such an algorithm in our first prototype.

Abstraction of E-doc

E-doc files contain the documentation of the source. This documentation has a fairly simple structure and primarily consists of chapters, sections and links.

Document entities
The elements edoc, chapter, section and the three different kinds of link tags are the only elements stored as document entities. These are the primary elements of the E-doc language. The definition of the E-doc language allows the files to contain almost any kind of markup. This can be used to insert other information, for example HTML for rendering tables and pictures. These secondary elements are unknown to the Elucidator: the EdocAbstractor has no knowledge of their semantics and therefore only considers the E-doc element set as defined in appendix A.1 on page 81. All the elements of the E-doc language, and thereby the possible document entities, are listed in Table 4.6.

    Kind      Description
    edoc      Contains chapter and sections.
    chapter   Contains sections.
    section   Contains sections.
    xlink     Refers to an external document.
    dlink     Refers to an internal labeled chapter or section.
    slink     Refers to a source entity.

Table 4.6: List of the different kinds of entities found by the current implementation of the EdocAbstractor.

Elements and attributes (title, author, role etc.) associated with the primary elements by the E-doc grammar are stored as attributes in the Data Model. A complete list of the attributes can be seen in Table B.2 on page 86. Attributes of a primary element that are not defined in the grammar are stored as a comma separated list in the attlist. An example of the abstracted entities is shown in Table 4.7. The data is abstracted from the E-doc file in appendix C.2 on page 93. Note that not all data is present in the table, since the positioning data and some tuples have been removed for the sake of simplicity.

Labels
chapter and section elements can be given a label that can be used as the destination (href) of a dlink. These labels are optional, but must be unique within the whole document. This is controlled and validated by the EdocAbstractor, but because the labels are optional, an element does not always have a label that can be used as a unique key. A similar label/key is also needed for each link. Keys are primarily used for anchors in the documentation produced by the Generator, to allow for precise hyperlinking. The abstractor therefore generates a unique key for each non-labeled entity, and the same key generation scheme is used by the Generator when it produces the documentation.
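The exact key generation scheme is not spelled out here, but the keys listed in Table 4.7 (for example cha:main.dlink1 and sec:specificparts.slink2) suggest one plausible reading: the label of the enclosing labeled element, followed by the kind and a running number. The sketch below implements that assumed scheme; the class name KeyGenerator and its method are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Assumed scheme, inferred from the keys in Table 4.7:
// <label of enclosing element>.<kind><running number per enclosing element and kind>
final class KeyGenerator {
    // Counts how many keys were generated per "parentLabel.kind" prefix.
    private final Map<String, Integer> counters = new HashMap<>();

    String keyFor(String label, String parentLabel, String kind) {
        if (label != null && !label.isEmpty()) {
            return label; // an explicit label is already a unique key
        }
        String prefix = parentLabel + "." + kind;
        int next = counters.getOrDefault(prefix, 0) + 1;
        counters.put(prefix, next);
        return prefix + next;
    }
}
```

Because the numbering depends only on document order, running the same scheme in the abstractor and in the Generator would yield identical keys, which is what lets generated anchors and stored entities agree.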

    Id  Kind     Label                      Title
    1   dlink    cha:main.dlink1            architecture
    2   slink    cha:main.slink1            EvaTrio (TM)
    3   dlink    cha:main.dlink2            Coffee machine parts
    4   dlink    cha:main.dlink3            External parts
    5   dlink    cha:main.dlink4            error handling mechanisms
    6   slink    sec:specificparts.slink1   CoffeeMachine
    7   slink    sec:specificparts.slink2   CoffeeContainer
    8   slink    sec:specificparts.slink3   addfilter
    9   slink    sec:specificparts.slink4   filter
    18  section  sec:specificparts          Coffee machine parts
    23  section  sec:externalparts          External parts
    35  section  sec:architecture           Architecture
    37  section  sec:error                  Error handling
    38  chapter  cha:main                   Elucidative documentation for a Coffee Machine

Table 4.7: Subset of the abstracted entities from the CoffeeMachine.edoc E-doc file. The file can be seen in appendix C.2 on page 93.

Document relationships
E-doc documents have a simple set of relationships. The primary elements only contain or refer to other elements. The abstractor creates a containment relationship between an entity and its parent, e.g., a section has a chapter as its parent. A refersto relationship is created between a dlink element and the chapter or section it cross-references. Similarly, the abstractor stores a refersto relationship between an slink and its corresponding source entity. The relationships defined by xlink, which refers to external documentation, are not stored in the Data Model. The relationships found by the current implementation of the EdocAbstractor are listed in Table 4.8. Table 4.9 presents a subset of the information abstracted from the CoffeeMachine.edoc file, while Table 4.10 on page 47 shows a subset of the relationships found between the E-doc file and the source code. The CoffeeMachine.edoc file can be seen in appendix C.2 on page 93, and the source code is listed in appendix C.1 on page 89.
    Kind         Description
    containment  If a document entity A is defined in document entity B, then a containment relationship exists from A to B.
    refersto     If a dlink or slink A refers to a document entity B, then a refersto relationship exists from A to B.

Table 4.8: List of the different kinds of relationships found by the current implementation of the EdocAbstractor.

[Table 4.9 has the columns Id, Fromid, Toid and Kind; the rows shown are four containment and six refersto relationships.]

Table 4.9: Subset of the abstracted relationships from the CoffeeMachine.edoc file. The file can be found in appendix C.2 on page 93.

    Docid  Sourceid  Role
    25     5         strong
    7      11        strong
    8      13        strong
    6      21        weak
    9      48        weak

Table 4.10: A subset of the relationships between the CoffeeMachine.edoc file and the source from appendix C.1 on page 89.

Implementation
E-doc files are valid XML documents, and the free XML4J parser [AlphaWorks, 1998] from IBM is used to parse them. At the start of the Elucidator implementation this was the best supported parser available. It also supports the two widely used standard APIs for extracting information from XML: DOM and SAX. Because we follow one of these standards, it will not be hard to replace the parser if needed.

The DOM API is based on the Document Object Model [Apparao et al., 1998]. It works by building a tree of element nodes that an application can traverse. This gives the application maximum flexibility, as it can manipulate the tree as it pleases. The disadvantage is that the whole document must be read and stored in memory as a tree.

The Simple API for XML (SAX) [Megginson, 1998] is an event-based API that generates an event each time an element in the XML file starts or ends. This requires the application to keep track of how the elements are nested, but it also gives a much smaller memory footprint than the DOM API, as it does not build a complete DOM tree.

One goal is to minimize the time and memory used by the abstractor, which only needs to extract which attributes the elements have and how the elements are structurally related. For this the SAX API is suitable and sufficient. The abstractor thus registers for SAX events with the XML4J parser, which parses the E-doc file. The EdocAbstractor reacts to the events and thereby keeps track of the entities and relationships. It stores the collected data in the Data Model at the end of the abstraction. The process is illustrated in Figure 4.8.

Figure 4.8: Data flow through the EdocAbstractor. The E-doc file is parsed by the XML parser and the derived information is stored in the Data Model via the EdocAbstractor.
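As an illustration of the event-based approach, the following sketch tracks containment and refersto relationships with a stack of open elements. It uses the SAX parser bundled with the JDK rather than XML4J, and the handler class, the entity naming (kind plus a running number) and the relationship strings are assumptions made for this example, not the EdocAbstractor's actual data structures.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class EdocSketchHandler extends DefaultHandler {
    // The primary E-doc elements listed in Table 4.6.
    private static final Set<String> PRIMARY = new HashSet<>(
            Arrays.asList("edoc", "chapter", "section", "xlink", "dlink", "slink"));

    final List<String> relationships = new ArrayList<>();
    private final Deque<String> open = new ArrayDeque<>(); // currently open primary elements
    private int nextId = 1;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (!PRIMARY.contains(qName)) {
            return; // secondary markup such as HTML is ignored
        }
        String entity = qName + "#" + nextId++;
        if (!open.isEmpty()) {
            relationships.add("containment " + entity + " -> " + open.peek());
        }
        if (qName.equals("dlink") || qName.equals("slink")) {
            relationships.add("refersto " + entity + " -> " + atts.getValue("href"));
        }
        open.push(entity);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (PRIMARY.contains(qName)) {
            open.pop();
        }
    }

    // Parses the given E-doc text and returns the relationships found.
    static List<String> abstractDocument(String xml) {
        try {
            EdocSketchHandler handler = new EdocSketchHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.relationships;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }
}
```

Only the stack of open elements is kept in memory, which is the reason SAX was preferred over DOM above.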
4.5 Query engine

The purpose of the Query Engine is to provide information to the Generator, for use in layout and navigation. This information consists of specific subsets of the derived information stored by the Data Model (see Section 4.3 on page 34 for details concerning the Data Model and derived information). When designing the Query Engine, two different approaches were considered:

Provide a general query interface: The idea of this approach is to provide the user of the Query Engine with maximum flexibility by allowing him to make queries directly to the storage model. This approach has a major drawback. If we, as mentioned in the Data Model section (Section 4.3 on page 34), want to be independent of the storage model used, we will have to design and implement a query language. This is not a trivial task. A more pragmatic solution, seen in the light of our current implementation of the Data Model, would be to allow the user direct access to the database.

Provide specific methods for specific queries: Another approach is to identify the needs for querying and then implement specific methods for these. This approach ensures that the storage model stays hidden, since adjusting the Query Engine to a new storage model is just a matter of overriding the methods.

Current implementation

Our currently implemented Query Engine uses the second approach. We have therefore identified a number of queries needed by the Generator, and implemented these. We have tried to make the methods as flexible as possible by allowing the user of the Query Engine to control the query through its parameters. Examples of these methods are:

A method for finding all the document entities contained in another document entity.
A method for finding all the slinks which point to a given source entity.
A method for finding all the source entities used by another source entity.
A method for finding all the source entities which use a given source entity.

Many other methods exist, and since these methods are basically nothing more than a mapping to an SQL statement, the Query Engine can easily be extended with new methods should that be necessary.

4.6 Generator

In this section the architectural considerations for the design of the Generator are presented, as well as the current implementation.
The user interface is dealt with in Section 4.7 on page 58 and Section 4.8 on page 64.

Responsibilities of the Generator

The Generator provides the user interface layer with structured information and facilities for navigating the documentation, the source code, and the relationships between the two. Furthermore, it provides information needed by the editor, to allow easier linking between the documentation and the source code. The information is retrieved via the Query Engine and from the Java and E-doc files. Three types of information have to be delivered to the user interface layer:

Hypertext documentation and source code
The Generator has to present the documentation and source code files as hypertext. For every such file registered in the Data Model, the Generator must be able to supply a hypertext version of the file. The hypertext document should contain information that makes it possible to follow hyperlinks from a document entity, such as a source link, to the matching source entity. At the same time the hypertext should contain links from source relationships to the source entity in question. The hypertext documents are not only meant as a one-to-one mapping of the documentation and source code files. The Generator should also supply subsets of specific documents, e.g., a table of contents for the documentation, or an index of classes or methods in the source code files.

Navigation
Besides hyperlinking, the Generator has to supply hypertext documents for advanced navigation in the documentation and source code. From a source entity there is a potentially large number of relevant destinations for a hyperlink: one for every source relationship the source entity is part of, and one for every document-source relationship. For every document or source entity the Generator has to supply one or more hypertext documents that contain the hyperlinks to the relevant destinations. The idea of navigating in hypertext documents is illustrated in Figure 4.14 on page 59. The navigation window is explained further in the section on page 61.

Easy generation of links in the editor
The Generator should support the editor in making it easy for the programmer to generate links from the documentation to source entities. For this to be possible, the Generator has to generate lists of source entities containing their id and some kind of description. These lists are then presented to the programmer, who can choose the right source entity. The lists should be categorized by source entity kind, in order to reduce the amount of information presented to the programmer.

An alternative would be to allow the programmer to mark a source entity in a source code file and ask the editor to create a link to that source entity. This makes it easy for the programmer to create a link, but we do not consider this approach at this time. It would require the editor and the Generator to identify the marked source entity, which in turn would require the source code files to be parsed again if they have changed since the last abstraction. This approach is computationally heavy, and it is vulnerable to the fact that the source code must be free of compile errors, which is not always the case in the development stage of a project.

Architectural considerations

Before presenting the current architecture and implementation, we consider which architecture and technology to rely on. In this section we present and discuss the options.

Static vs. dynamic information
When deciding how the Generator is to produce the information for the user interface, two options are considered:

Statically generated
The first option is to statically generate and store all the needed hypertext files, i.e., just after the Abstractor is run. This is Nørmark's approach in his Elucidator. The presentation of the documentation and source code in both the browser and the editor would then be static, since the Generator needs no computation to deliver the hypertext to the user interface. We think this approach has two main disadvantages:

1. The amount of information to be generated would be very large compared to the size of the database, the documentation, and the source code files. We do not only need one file for each documentation and source code file; two or three files with cross references would have to be generated for each document and source entity.
2. It is likely that a large part of the generated information would never be viewed by the user. Generally only small parts of the documentation are browsed between abstractions, and the problem grows with the size of the software project. It is even more likely that very few of the source entities would be used for navigation, compared to the amount of information available.

Dynamically generated
The other option under consideration is to make the Generator work dynamically. It would then compute hypertext documents each time it gets a request for them, thus working as a server. Upon each request, the Generator would fetch the required information from the Data Model, retrieve the relevant documentation or source code, process these, and respond with the result in a format understandable by the user interface. This would eliminate the need for further processing after the Abstractor has finished. However, this approach also has disadvantages:

1. Some processing has to take place while the documentation and source code are being browsed, which can potentially result in a slower user interface.
2. The exact same information could be generated several times.
3. Dynamic generation can complicate static publishing of the documentation.

The first two problems can be limited somewhat by introducing a cache layer in the Generator. The third problem can be solved by using a robot that, starting from a hypertext document, follows all possible links recursively and saves the documents. On the other hand, the dynamic solution has a reduced need for processing, under the assumption that only a fraction of the available information is going to be seen by the user. It is also easier to change the online layout or form of the browsing, since the entire set of files does not have to be processed again.

In this project the advantages of the dynamic solution greatly outweigh the advantages of the static solution. The major reason is that there is no need to produce a lot of information which is never going to be seen. The flexibility of the server solution also appeals to us, because it is possible to change the way the documentation and source are presented. As mentioned, there could be a performance problem with dynamic generation compared to the static solution. We do not expect one, but it is left to testing to see whether the processing time of the dynamic solution is adequate.

Therefore we choose to build the Generator as a component that is accessible through an ordinary web server. A web server provides us with a standard protocol and framework for communication and for transferring data and documents. The need for a web server has the disadvantage of requiring its installation and configuration. We choose to see this as a minor disadvantage compared to the benefits gained by using a standard web server.

Data format
Information has to be transferred between the internal parts of the Generator, e.g., when retrieving data from the Data Model and during the processing of source code. We chose to use XML as the internal data format of the Generator. This has the same advantages as described in the section on page 30, i.e., it is a standard and a growing number of tools are available. Even more important is the possibility to easily transform structured XML into other formats. Such a transformation can be expressed via a stylesheet written in the Extensible Stylesheet Language (XSL) [7]. When transforming XML, an XSL processor is used to apply an XSL stylesheet to the XML document. The result is an output transformed according to the rules stated in the stylesheet. Because of XSL, it is easy to convert the internal data format of the Generator into HTML when delivering to the browser, and into another format when delivering to the editor. It is furthermore fairly easy to change the delivered format, e.g., if one wants to use another editor or browser which requires a different data format, since this is just a matter of writing a new stylesheet.

[7] Actually XSL contains both a transformation language called XSLT [XSLT, 1999] and a vocabulary for specifying formatting semantics, but we use only the transformation language. Even though XSLT is a subset of XSL it is commonly just referred to as XSL, and so it is in this report.
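The idea of delivering the same internal XML in different formats by swapping stylesheets can be sketched as follows. The sketch uses the JDK's standard javax.xml.transform API rather than the XSL processor used in this project, and both the input vocabulary (a list of entity elements with a name attribute) and the two stylesheets are invented for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class StylesheetSketch {
    private static final String HEADER =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">";

    // Hypothetical stylesheet for the browser: an HTML list of entity names.
    static final String TO_HTML = HEADER
        + "<xsl:output method=\"html\" indent=\"no\"/>"
        + "<xsl:template match=\"/list\"><ul><xsl:apply-templates select=\"entity\"/></ul></xsl:template>"
        + "<xsl:template match=\"entity\"><li><xsl:value-of select=\"@name\"/></li></xsl:template>"
        + "</xsl:stylesheet>";

    // Hypothetical stylesheet for the editor: a Lisp-style list of strings.
    static final String TO_ELISP = HEADER
        + "<xsl:output method=\"text\"/>"
        + "<xsl:template match=\"/list\"><xsl:text>(</xsl:text>"
        + "<xsl:for-each select=\"entity\"><xsl:text>\"</xsl:text>"
        + "<xsl:value-of select=\"@name\"/><xsl:text>\" </xsl:text></xsl:for-each>"
        + "<xsl:text>)</xsl:text></xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the given stylesheet to the given XML document.
    static String transform(String xml, String stylesheet) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }
}
```

Calling transform with the same XML and either TO_HTML or TO_ELISP changes only the output format, which mirrors the argument that supporting a new browser or editor is a matter of writing a new stylesheet.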

Choice of communication method
Since the Generator is to interact with the user interface, we need a method of communication. The browser and the web server already support communication via the HyperText Transfer Protocol (HTTP) [Fielding et al., 1999]. This is therefore the natural choice for the Generator when communicating with the browser. The editor is another issue, since the method of communication depends on which editor is used. We use the Emacs editor, see Section 4.8 on page 64. A typical way of communicating with Emacs is through sockets. Socket communication does not dictate a specific protocol, so we still need either a standard protocol or one of our own design. One of the advantages of Emacs is the vast number of external packages available, which all add to its functionality. One of these packages enables Emacs to communicate via HTTP. This gives the great advantage of using the same protocol for both components in the user interface layer, thereby providing a uniform method of communication. We therefore choose HTTP for communication between the Generator and the user interface layer.

Current implementation

Having described our main considerations on the choice of architecture and technologies, we move on to describe our current implementation of the Generator. Many technologies exist for server-side web applications. One of these is the Java Servlet API [Davidson and Ahmed, 1998], which enables server-side Java applications. Servlets let the programmer write classes to be instantiated by the web server. Communication with these objects takes place through HTTP. Other non-Java technologies with similar functionality are available. However, as the rest of the implementation is in Java and the project is focused on Java, we choose the Servlet API as the foundation for the Generator.

The main design of the Generator can be seen in Figure 4.9. The architecture consists of a number of components which work together in order to give an appropriate response to an incoming request. First we have the Resolver. The responsibility of the Resolver is to receive incoming requests and then, based on the type of the request, delegate the work to one of three servlets: the QueryServlet, the JMEServlet, and the EdocServlet. These servlets use one of the other tools and components to produce a response. In the following we take a look at these and describe their functionality in detail.

Figure 4.9: Main architecture of the Generator. The Resolver, which resides within the web server, accepts requests. These requests are then either answered or dispatched to one of the three other servlets. The three servlets use the tools and generate responses.

The Generator uses three tools:

XSLProcessor
As mentioned earlier, we use XSL stylesheets to transform the internal XML data format into the formats used by the user interface. For this we choose to use the Lotus XSL processor, provided by IBM via the AlphaWorks project [AlphaWorks, 1999]. Other XSL processors exist, but Lotus XSL was chosen as it is free, open source and mature.

Query engine
The Query engine provides data from the Data Model to the Generator. The Query engine is documented in Section 4.5 on page 47.

JavaMarkupEngine
The purpose of the JavaMarkupEngine (JME) is to change the Java source code into an XML compatible format [8]. This makes it possible to use the XSLProcessor to typeset the source code into HTML, as argued in the section on page 51. Currently three things are marked up in the source code:

1. Source entities: Language symbols, i.e., classes, methods, etc., and source markers.
2. Source relationships: Language constructs that use source entities.
3. Comments: Java comments.

The actual transformation to XML is achieved by asking the Query Engine for all the entities and relationships inside a specific file. This information contains buffer positions, as described in Section 4.3 on page 34, and these are used to mark up the entities and relationships. Comments are not abstracted into the Data Model, so these are marked up by explicit search and replace. The structure of the Java source code is not changed, and hence there is a one-to-one mapping between the source code and the HTML representation.

[8] A grammar (DTD) for the markup used can be seen in Appendix A.2 on page 82.
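Marking up source text from stored buffer positions can be sketched as below. The element names (entity, relation), the idname attribute and the Span class are hypothetical and do not reproduce the DTD from Appendix A.2; the sketch also assumes that spans do not overlap and that positions are zero-based offsets into the source text.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// A marked region of the source text (hypothetical representation).
final class Span {
    final int start;      // inclusive offset into the source text
    final int end;        // exclusive offset into the source text
    final String tag;     // assumed element name, e.g. "entity" or "relation"
    final String idname;  // identifier of the source entity concerned

    Span(int start, int end, String tag, String idname) {
        this.start = start;
        this.end = end;
        this.tag = tag;
        this.idname = idname;
    }
}

final class MarkupSketch {
    // Escapes the characters that are special in XML text and attribute values.
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;")
                   .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Wraps every span in an element; all other text is copied unchanged
    // (apart from escaping), so the structure of the source is preserved.
    static String markUp(String source, List<Span> spans) {
        List<Span> sorted = new ArrayList<>(spans);
        sorted.sort(Comparator.comparingInt((Span s) -> s.start));
        StringBuilder out = new StringBuilder();
        int pos = 0;
        for (Span s : sorted) {
            out.append(escape(source.substring(pos, s.start)));
            out.append('<').append(s.tag)
               .append(" idname=\"").append(escape(s.idname)).append("\">");
            out.append(escape(source.substring(s.start, s.end)));
            out.append("</").append(s.tag).append('>');
            pos = s.end;
        }
        out.append(escape(source.substring(pos)));
        return out.toString();
    }
}
```

Because only tags are inserted and nothing is reordered, a stylesheet can later turn each element into an HTML anchor or link without disturbing the layout of the code.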

The Resolver
All requests made to the Generator are handled via the Resolver. The Resolver interprets the request and dispatches it to the appropriate servlet, after which it is that servlet's task to complete the request. The decoding and dispatching done by the Resolver is based on parameters sent with the request. All of the servlets work by accepting HTTP GET requests. When written in the hypertext version of the documentation or source code, these requests take the form of a URL like this:

    servlet address?param1=value&param2=value#place in text

The servlet address is the URL of the Resolver object running on the web server. Everything between the ? and the # is parameters sent to the Resolver. The place in text is a named anchor that enables the hypertext browser to position itself at the right place in the hypertext. In general the parameters give information about the nature of the request, e.g., link=dlink specifies a request for a document entity, where the value of the parameter target contains the label of the entity. Therefore, if the Resolver receives a request like the following:

    link=slink&target=appliances.kitchen.CoffeeContainer@removeFilter()

it hands the request over to the JMEServlet, since the link is an slink (source link). It is then the responsibility of the JMEServlet to take the appropriate actions for the request, which in this situation means marking up and returning the file in which the method removeFilter() of the class appliances.kitchen.CoffeeContainer appears. Table 4.11 lists some of the most important requests accepted by the Generator.

The JMEServlet
The purpose of the JMEServlet is to handle output to the source frame of the browser. The work cycle of the JMEServlet is shown in Figure 4.10 on page 56.

    Step 1    A request is sent from the browser.
    Step 2    The JMEServlet gets the request from the Resolver.
    Step 3-4  It looks up the filename of the source code file requested.
    Step 5-6  This filename is passed to the JavaMarkupEngine, which is described above, where the file is marked up.
    Step 7-8  The marked-up source code is passed to the XSL processor, where it is transformed into HTML.
    Step 9    The HTML, with some header information, is passed to the browser as the response to the request.

    Request                    Argument       Description
    link=dlink                 target=label   Returns the hypertext version of the documentation containing the label=label.
    link=slink                 target=idname  Returns the hypertext version of the source code file that contains the source entity with idname=idname.
    query=finddocreldoc        origin=label   Finds all document entities that make a dlink to label.
    query=finddocindoc         origin=label   Finds all document entities contained in the document entity with label=label.
    query=findsrcrelfromsrc    origin=idname  Finds all source entities using the source entity with idname=idname.
    query=findsrcreltosrc      origin=idname  Finds all source entities used by the source entity with idname=idname.
    query=finddocreltosrc      origin=idname  Finds all document entities that make a slink to idname.
    query=prefixlookupsource   origin=string  Finds all idnames that start with string.
    query=prefixlookupdoc      origin=string  Finds all labels that start with string.

Table 4.11: This table contains all queries accepted by the generator. link=dlink requests are dispatched to the EdocServlet. link=slink requests are dispatched to the JMEServlet. All query= requests are dispatched to the QueryServlet. Some of these requests have more parameters, but these are omitted for the sake of simplicity.
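The dispatching rules stated in the caption of Table 4.11 can be sketched in a few lines. This is not the actual Resolver, which is a servlet; the sketch works on the raw query string only, returns the name of the target servlet instead of forwarding the request, and its class and method names are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ResolverSketch {
    // Splits "a=1&b=2" into a map. Percent-decoding is left out for brevity.
    static Map<String, String> parseQuery(String query) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return params;
    }

    // Returns the name of the servlet that should complete the request.
    static String dispatch(String query) {
        Map<String, String> params = parseQuery(query);
        String link = params.get("link");
        if ("dlink".equals(link)) {
            return "EdocServlet";   // request for a document entity
        }
        if ("slink".equals(link)) {
            return "JMEServlet";    // request for a source entity
        }
        if (params.containsKey("query")) {
            return "QueryServlet";  // navigation and lookup queries
        }
        throw new IllegalArgumentException("unsupported request: " + query);
    }
}
```

For instance, ResolverSketch.dispatch("query=prefixlookupsource&origin=appl") selects the QueryServlet, while any link=slink request is routed to the JMEServlet.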

When the JMEServlet transforms the marked-up source into HTML, it has to write the links and anchors that make further browsing and navigation possible. For each source relationship the JMEServlet inserts an HTML link that associates it with its relevant source entity. Each source entity is marked up with an HTML anchor that can be linked to from associated source relationships. This is used to link the use of a source name to its definition, e.g., a link is made from the invocation of a method to the definition of the method. Each source entity is also marked up with a link that activates the QueryServlet, which shows the navigation window and provides further navigation and information about this source entity. The anchors and links contain the address of the Resolver and parameters for the relevant servlets. A more detailed example of the generation of links can be found in the following section.

Figure 4.10: Work flow of the JMEServlet. When the JMEServlet gets a request from the Resolver, it goes through a number of steps to produce the HTML output, which is returned to the user interface.

The EdocServlet
The responsibilities of the EdocServlet are very much like those of the JMEServlet. The EdocServlet, however, handles the output to the documentation frame. The work cycle of the EdocServlet is somewhat simpler than that of the JMEServlet, since we only have one E-doc file. This means we do not have to look up the filename in the Data Model. On top of this, the documentation is already in an XML based format (E-doc).

Step 1: A request is sent from the browser.
Step 2: The EdocServlet gets the request from the Resolver.
Step 3-4: The file is opened.
Step 5-6: No further processing is needed and the XML file is transformed to HTML for the browser.
Step 7: The HTML, with some header information, is passed to the browser as the response to the request.

When the EdocServlet transforms the E-doc files into HTML, it is, besides the formatting, responsible for marking the documentation with tags that enable hyperlinking. In the E-doc files the slink tags carry an href attribute that contains a fully qualified name (or a part of a fully qualified name) for a source entity. The href attribute together with sbase determines the fully qualified name. This information is enough to find the source code file that contains the source entity. The EdocServlet has to generate a hyperlink in the format described in the section about the Resolver on page 53.

    <section label="sec:specificparts" sbase="appliances.kitchen.">
    <title>Coffee machine parts</title>
    <p>
    The <slink role="weak" href="CoffeeMachine">CoffeeMachine</slink>
    class is realized through a number of subcomponents. These are:

Figure 4.11: Example of some E-doc that makes a reference to a source entity in the coffee machine example. The fully qualified name of the source entity is generated by concatenating the value of the sbase attribute with the href attribute. The fully qualified name of the source entity is appliances.kitchen.CoffeeMachine.

Figure 4.11 shows lines from the coffee machine example in Appendix C.2 on page 93. From the information in the slink tag and the sbase parameter on the section, the EdocServlet generates an A tag in the HTML version of the E-doc document. This A tag is given an href attribute with the following value (see footnote 9):

    ...target=appliances.kitchen.CoffeeMachine#appliances.kitchen.CoffeeMachine

The part before the parameters is the address of the Resolver on the web server. The parameter link determines that this is a link to a source entity, and that the Resolver should redirect this request to the JMEServlet. The target parameter contains the fully qualified name of the source entity. The text after the hash mark is an anchor name that tells the browser where in the hypertext it should position itself [W3C, 1999] when the response comes from the web server. These anchor names are generated by the JMEServlet and EdocServlet when they generate the hypertext version of the document and source entities.

Footnote 9: The domain is non-existent and invented for this example.
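The link generation described above can be illustrated with a minimal Java sketch. It is not the EdocServlet's actual code: the Resolver address, the value `source` for the link parameter, and the class and method names are assumptions made for illustration only. The sketch also assumes that entity names contain only URL-safe characters.

```java
class SlinkHrefSketch {
    // Hypothetical Resolver address; the real address is not given in the text.
    static final String RESOLVER = "http://example.invalid/Resolver";

    /** Concatenates the section's sbase value with the slink's href value. */
    public static String qualifiedName(String sbase, String href) {
        return sbase + href;
    }

    /**
     * Builds the href value for the generated A tag: Resolver address,
     * a link parameter (the value "source" is an assumption), the target
     * parameter, and an anchor name after the hash mark.
     */
    public static String resolverHref(String sbase, String href) {
        String name = qualifiedName(sbase, href);
        return RESOLVER + "?link=source&target=" + name + "#" + name;
    }

    public static void main(String[] args) {
        // Values taken from the coffee machine example in Figure 4.11.
        System.out.println(resolverHref("appliances.kitchen.", "CoffeeMachine"));
    }
}
```

Keeping the concatenation in its own method mirrors the text: the fully qualified name is needed both as the target parameter and as the anchor name.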

Figure 4.12: Work flow of the EdocServlet. Upon request from the Resolver the EdocServlet loads the master E-doc file and hands it to the XSLProcessor for transformation to HTML, which is passed to the browser as the response to the request.

The QueryServlet

The final servlet is the QueryServlet. The servlet handles output to the Navigation Window and requests made from the editor regarding idnames of source entities and labels of document entities. The QueryServlet works like this:

Step 1: A request is sent from the browser or the editor.
Step 2: The QueryServlet gets the request from the Resolver.
Step 3-4: The QueryServlet makes an appropriate query to the data model and retrieves a list as response. The format and contents of this list depend on the query issued, but generally it contains source and document entities or relationships.
Step 5: Upon retrieving the list the QueryServlet makes an internal transformation of the format. The result is an XML representation of the list (the grammar can be seen in Appendix A.3 on page 83).
Step 6-7: The XML is passed to the XSL processor, where it is transformed to either HTML or elisp, depending on whether the browser or the editor is the receiving part of the UI.
Step 8: The HTML or the elisp is passed to the browser or editor respectively as the response to the request.

Figure 4.13: Work flow of the QueryServlet. When the QueryServlet gets a request from the Resolver it first issues a query to the data model. The result of this is internally transformed to XML, which is passed to the XSLProcessor. The result of this depends on the type of request, and is passed along to either the browser or the editor depending on the type.

4.7 The browser

The browser is one of the two components of the user interface. The purpose of the browser is to give the user of the Elucidator an attractive and functional way of reading the produced documentation and source code, as well as to provide him with ways of navigating both internally in the documentation and source code and between the documentation and the source code.

In this section we will describe how the browser is used in order to realize the features stated above. First we take a look at the combination of frames we provide through the browser and how these are connected to each other. Thereafter we describe which customization possibilities the current implementation provides for the user.

The interface

The setup of the browser of the Elucidator involves one window split in three frames and another single-framed window. This is shown in the accompanying figure. This interface layout is based upon the original interface design by Nørmark. We have added the idea of the navigation window for more complex navigation than simple correspondence between the documentation and the source. In the following we will describe each frame in detail.

The menu frame

The menu is used for providing functionality in connection with the layout and setup of the browser. In the current implementation it provides buttons for:

- Showing the main E-doc document
- Customization of the placement of the documentation and source frames

- Creating a table of contents for the E-doc document
- Showing a help window
- Resetting the Elucidator

Figure 4.14: A suggestion for an Elucidator setup with a documentation, source code and a menu frame with buttons at the top, and a separate navigation window for presenting queries.

The customization possibilities this menu panel provides are discussed in detail in a later section.

The documentation frame

The documentation frame is used for presenting the E-doc document. All slink, dlink and xlink elements in the E-doc file are rendered as hypertext links to their respective targets. When an slink typed link is clicked, the contents of the source frame change to show the source code for the chosen source entity. When a dlink typed link is clicked, the documentation frame jumps to an internal anchor in order to show the selected document entity. Finally, when an xlink typed link is clicked, a new separate browser frame is opened with the target of the link as its start URL.

In order to provide easy recognition of the different types of links, each type of link has its own color. Weak and strong links are furthermore distinguished by the hue of the color.

The titles of chapter and section elements are rendered with different font sizes in order to show the hierarchy of the document. Furthermore, the titles are marked up as hypertext links with the navigation window as their target. The navigation window will be dealt with in detail below.

The source code frame

The source code frame is used for presenting the source code. In the current implementation three things are marked up in the source code:

- Source entities: These are rendered as hypertext links to the navigation frame.
- Source relationships: These are marked up as hypertext links to the other source entity in the relationship. This is used to link the use of a source name to the definition of the source name, e.g., a link would be made from the invocation of a method to the definition of the method.
- Comments: These are rendered in a special color in order to make it easy to distinguish between source code and comments.

Further markup and rendering would be possible. One possibility is lexical coloring, meaning that language constructs such as public, private, extends and void would be rendered in different colors.

The navigation window

The navigation window is used for advanced navigation through the source code and the documentation. When the user, e.g., clicks on a source entity in the source frame, the navigation window appears with a hypertext document. This window contains a list of the document entities which describe (strong links) or mention (weak links) the source entity. In Figure 4.15(a) on the following page this window is shown for the source entity appliances.kitchen.CoffeeContainer from the example in Appendix C.1.3 on page 91. The list of document entities is given as pairs of a chapter or section and the source link in that chapter or section which links to the source entity (appliances.kitchen.CoffeeContainer). When the chapter/section is pressed, the documentation frame shows the specified chapter/section. When the slink is pressed, the documentation frame shows the specified source link.

Besides being able to navigate to the documentation, it is possible to navigate to other parts of the source code. This is logically done by following the source relationships between source entities. The notions of using and used by describe that two source entities are related through a source relationship. If a source entity uses another source entity, it means that there exists a source relationship from the former to the latter, e.g., a method A uses another method B by invoking it, and hence a source relationship of kind invokes exists from A to B. For used by it is vice versa, i.e., method B is used by A. See the section on page 40 for a complete list and description of the possible relationships.

In Figure 4.15(b) on the next page the navigation window is shown with hyperlinks to all source entities that are used by the source entity appliances.kitchen.CoffeeContainer. The elements in the column Source symbol are hyperlinks to the declarations of the source symbols that are used by the CoffeeContainer. The elements in the column Location are hyperlinks to the place in the source code where the relationship is defined, e.g., the place where a method is invoked as opposed to the declaration of the method. In Figure 4.15(c) on the following page the navigation window is shown with hyperlinks to all source entities that are using the CoffeeContainer.
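The notions of using and used by can be made concrete with a small Java sketch. It is only an illustration under stated assumptions: the class, field and method names are hypothetical, and the in-memory list stands in for the Data Model, which in the report is held in a database.

```java
import java.util.ArrayList;
import java.util.List;

class RelationshipSketch {
    /** One directed source relationship, e.g. kind "invokes" from A to B. */
    static final class Relationship {
        final String from;
        final String kind;
        final String to;

        Relationship(String from, String kind, String to) {
            this.from = from;
            this.kind = kind;
            this.to = to;
        }
    }

    /** Entities the given entity uses: targets of its outgoing relationships. */
    public static List<String> uses(List<Relationship> all, String entity) {
        List<String> result = new ArrayList<>();
        for (Relationship r : all) {
            if (r.from.equals(entity)) {
                result.add(r.to);
            }
        }
        return result;
    }

    /** Entities that use the given entity: sources of its incoming relationships. */
    public static List<String> usedBy(List<Relationship> all, String entity) {
        List<String> result = new ArrayList<>();
        for (Relationship r : all) {
            if (r.to.equals(entity)) {
                result.add(r.from);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Relationship> all = new ArrayList<>();
        all.add(new Relationship("A", "invokes", "B")); // method A invokes method B
        System.out.println("A uses " + uses(all, "A"));
        System.out.println("B is used by " + usedBy(all, "B"));
    }
}
```

Both directions are answered from the same stored relationship, which is why the navigation window can offer the two lists without any extra bookkeeping.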


Figure 4.15: The navigation window for the source entity appliances.kitchen.CoffeeContainer from the example in Appendix C.1.3 on page 91. (a) shows the places where the source entity is documented. (b) shows the source entities used by the CoffeeContainer. (c) shows the source entities using the CoffeeContainer.

Layout

Our current implementation of the browser as a user interface features two possibilities for changing the layout: the setup of the frames and the look of the presented text.

Frames setup

Inspired by the original Elucidator design by Nørmark, we provide both a horizontal and a vertical mode for the setup of the documentation and source code frames. The choice between the two frame modes is made in the menu frame, and the vertical mode is the default. The two modes are illustrated in Figure 4.16(a) and 4.16(b) on the next page.

Look of the text

Allowing users to change the look and layout of the browser requires a way of changing how the marked-up files are interpreted by the style sheet processor, and thereby presented as HTML. There seem to be two possibilities:

1. Teach the users how to understand and change XSL style sheets to do what they want.
2. Introduce Cascading Style Sheets (CSS) [Lie and Bos, 1996].


Figure 4.16: Screen capture of the two frame modes (horizontal and vertical).

Via CSS it is possible to define so-called classes, e.g., a class called sectionTitle is shown in Figure 4.17.

    .sectionTitle {
        font-weight: bold;
        font-size: 20pt;
        display: block;
        margin-top: 10pt;
        margin-bottom: 5pt;
    }

Figure 4.17: Example of a style class definition in CSS.

The figure expresses that we want tags with the .sectionTitle class to use bold fonts, have a font size of 20 points and display text in block mode with a top and bottom margin. An example of HTML using this class is shown in Figure 4.18.

    <A class="sectionTitle" href="sometarget" name="somesection">Some Section</A>

Figure 4.18: Example of HTML using the sectionTitle class.

This gives us the possibility to keep the XSL scripts free of typesetting information, and users can thereby change and use any style format they wish.

A disadvantage of CSS is that different browsers have a tendency to interpret and use the CSS specification differently. This can make it hard to create a CSS style sheet that works in all browsers. Fortunately the differences seem to be minor and will probably diminish as browsers evolve and start to conform.

It is our claim that using and writing CSS to do layout can be considered a somewhat easier task than understanding XSL and using this knowledge to modify the existing XSL style sheets. XSL contains some level of application-specific transformation rules, whereas CSS only contains relevant style information.

4.8 The editor

The second component in the user interface is the editor. The purpose of the editor is to support the programmer while writing elucidative documentation. The first versions of the E-doc files, which document the Elucidator, were written without any support from the editor. This was a learning experience, as it helped us identify the most tedious tasks and problems in writing elucidative documentation. Our editor support is inspired by this manual editing of E-doc files and by the editor support provided by the original Elucidator.

The editor provides a split view with the documentation and the source code in separate frames. Furthermore, it provides functionality for easy insertion of references and for running the application that generates the documentation from within the editor. The following presents our choice of editor and describes the main features implemented in the current version.

Choice of editor

For us to use an editor in our Elucidator, it must support some way of extending its functionality. Emacs is an example of such an editor, since it is very extensible through its scripting language elisp, a Lisp dialect. This extensibility and the wide selection of already existing extensions make it a popular tool for developers and ideal as a prototype editor for the Elucidator. Other editors can provide the same kind of extensibility as Emacs does. We chose not to look at other editors and just use Emacs, since the precise choice of editor is not crucial to the implementation of the prototype Elucidator.
Furthermore, all the members of the group have experience with Emacs, and it is widely used by programmers.

Current implementation

In our current implementation we utilize a number of the existing extensions for Emacs. We start and control these via an Elucidative minor mode we have implemented. When starting the editor in Elucidative mode a split view is presented with the documentation in the left frame and source code in the right frame. The Java Development Environment (JDE) [Kinnucan, 1999] mode is automatically loaded for the source code frame. This mode provides functionality such as lexical highlighting and compilation of the source code from within Emacs. A screen capture of this setup is seen in Figure 4.19.

Figure 4.19: Screen capture of Emacs in Elucidative mode. Documentation is shown in the right frame and source code in the left frame.

Structural editing

The textual editing of the E-doc files can be a mentally demanding activity for the author, since he has to know the textual possibilities provided by the E-doc language and keep track of which tags can be inserted at a given time. Furthermore, the usage of tags and attributes tends to clutter the document and thereby distract the author's focus from the actual documentation task.

An existing mode for Emacs called psgml [Staflin, 1994] provides a remedy, as it allows one to use Emacs as a structure-based editor for XML. psgml can be controlled by a grammar (DTD) to give support for inserting only valid tags and attributes. It also has a feature that can hide the attributes of the tags and outline the document to give an uncluttered view of the document. This is very powerful, and the mode still gives the programmer full control over the process; it is not a requirement to use the mode if one chooses to edit the files manually. We have therefore chosen to start psgml mode automatically for the documentation frame.

Insertion of links

Besides the editing of a structured document, the insertion of links between the documentation and source code can be rather cumbersome given the long identifier names used in the Elucidator (see Table C.2 on page 96 for an example of the long identifier names).

As a way of reducing this problem we have implemented functionality to support inserting the references used in slinks and dlinks. To do this we again use an existing extension to Emacs. We use the standard url package, which gives us the ability to communicate through the HTTP protocol. In that way we can contact the Generator with a request for a specific subset of the source entities and use this to support link insertion. See the section on page 57 for the component responsible for returning information to the editor. It is a great advantage to let the Generator provide this information, as it will always be consistent with the data model, and it reduces the complexity of adapting another editor.

Abstraction

While editing the documentation it is useful to be able to run the Abstractor on the updated documentation. This has been implemented like any other normal compilation from inside Emacs. By doing this we automatically get support for locating and looking up errors in the editor, which is a very helpful feature. When the Abstractor has completed, it is possible to browse the documentation via the browser.

4.9 Environment

Our current implementation of the Elucidator is based upon a set of tools which must be installed in order for the Elucidator to work. The choice of these tools is based on a combination of our prior knowledge of the tool and the functionality of the tool. We could have chosen other tools, but have tried, as previously mentioned, to choose tools with standardized interfaces. In this section we give a brief description of the chosen tools.

Database: We have chosen to use MySQL, as it is known to be a fast, reliable, and SQL-compliant database with a JDBC interface.

Editor: Emacs is our choice of editor. It provides a programmable interface, with the ability to extend the functionality of the editor as needed. It furthermore provides us with tools for writing structured documents compliant with some specified grammar.
Web browser: Netscape is the obvious choice, as it is the only stable browser available for the Sun Solaris platform. The Department of Computer Science at Aalborg University relies on Sun Solaris as its workstation operating system. Later in the process Microsoft Internet Explorer has been tested thoroughly too.

Web server: We have chosen the Apache web server, due to its stability, flexibility, and ease of configuration. With this server we use an extension called ApacheJServ-1.0 for running Java Servlets. Our web server runs on a medium-sized PC platform (see footnote 11), and even though it serves HTTP requests and runs the MySQL database as well, it is still used as a workstation with graphical console logins.

Footnote 11: Pentium III, 450 MHz, with 128 MB of RAM running the Sun Solaris 7 operating system.


Conclusion

This chapter concludes our project. Here we look back at the entire project process and conclude on all the parts in a larger context. Our conclusion has two parts. First, the four questions established in the Problemization are concluded upon. We do this by examining whether or not we fulfill the requirements posed by Nørmark for an Elucidator, and to what degree the consequences we established in the Analysis had an impact. This should be seen in the context of the whole master thesis. After this we describe and conclude upon the contributions we have provided in this project.

First, we look at our primary goal:

To which degree is it possible to create a tool which supports Elucidative Programming, multiple users and the object-oriented programming paradigm?

Nørmark's first two requirements are trivially accomplished, as the Elucidator does not change the rules for writing Elucidative documentation. Requirement three states that the source must be intact. We do not fully comply with this requirement, as we use source markers to denote special places in the source code, but the markers are necessary in order to be able to refer to special language constructs, e.g., anonymous inner classes. The fifth requirement asks us to allow references to the chunks of the language. Since we can refer to classes, methods, and source regions, this requirement is met. To fulfill the last requirement, we have to produce an online representation of the documentation. Since we output to hypertext, this is accomplished. The representation, furthermore, has to be attractive, but judging whether or not a representation is attractive is subjective. To cope with this, a system which allows users to easily change the appearance of the representation has been implemented. We therefore feel that this requirement is met.
In the Analysis we discovered the importance of understanding the relationships between the entities of the programming language, in order to provide facilities for navigation. We achieved this in our Elucidator, as our Abstractor locates the entities and the relationships, which are used for realizing advanced navigation via the navigation window. Furthermore, we discovered a need for referencing external documentation. This is realized through the xlink facilities.

Considering the aspect of multiple users, the current implementation does not provide any true multiuser support. This is not a problem, since we chose not to make this a focus for the first half of the thesis. However, the current choice of architecture does not prevent us from adding it in the future.

Considering the focus of the project, we find our prototype to be a proof of concept for an Elucidator for an object-oriented language.

Next, we look at the issue of motivation:

How can the tool be designed to motivate the programmer in writing documentation?

Finding out how to motivate programmers to write documentation is quite a task. To our knowledge, little literature exists in this field, so it would require gathering an amount of empirical data. We chose not to do so. Instead we had to use ourselves as a case study. We are, as programmers and students, a part of the target group, so this is not necessarily a bad choice, but we realize that we are biased in favor of the Elucidator.

We have found three areas where we as programmers have been motivated to use the Elucidator. Firstly, the editor uses the existing available packages and our own editor functionality to help and guide the programmer in the authoring process. This complies with requirement four as formulated by Nørmark. Secondly, the ability to change the appearance of the representation has ensured that none of us used a representation we found unappealing. Finally, the navigation window has become an added bonus given to us by the entity/relationship model. In our experience, it greatly adds to the value of the tool, and thereby to the motivation to use it, as the extra information produced by the processing adds an extra layer of functionality.

In short, we feel that the effort we spent on focusing on motivation was well spent. This has increased our belief in the importance of motivating the programmer when writing documentation.

Finally, we look at templates and views:

How can templates be created in a way that will support a programmer in structuring his Elucidative documentation?

In the Project Focus we chose not to deal with this question. However, our experiences during the development of the tool require us to state two points. We had to structure our text for it to be usable, so we indirectly had to create a simple, yet useful, template for our documentation. The experiences we got confirmed the points in the Analysis stating that we should have support for general and flexible templates. We can find nothing in the current implementation which prevents this from being implemented in the future.

Given a structured documentation, to which degree is it possible to create views of the documentation which extend the value of the documentation?

As with the previous question, we decided not to focus on this question, but we still gained some experience. We succeeded in making two views based on our simple template, namely our view of the documentation and our table of contents. We found these views limiting for the usability of browsing the documentation. This confirmed our belief in a more general scheme for views.

- o -

It is our belief that the work done in this project makes a number of contributions. In this part of the conclusion we list these and comment on them. The list is ordered with the contributions we find the most important at the top, and the minor contributions at the bottom.

- A prototype Elucidator for Java: We have managed to show that an Elucidator for Java can be realized. We have furthermore implemented a prototype, which we find promising.
- An architecture of an Elucidator: We have designed a modular architecture with well-defined standard interfaces. Among the strengths of this architecture is that it is very easy to change the Elucidator to use another language, including non-object-oriented languages, or even make the Elucidator use multiple languages.
- Easy navigation in Java source code: We have implemented and shown that when the Java source code is abstracted and stored in a data model, it is possible to provide the user with a plethora of navigation possibilities. We have furthermore shown how these can be integrated with documentation.
- Flexible/configurable user interface: The implementation of the Elucidator makes it easy to change the look and feel of the user interface. This ensures a flexible solution which can easily be adjusted to new environments.
- Usage of standardized technologies: This project shows that standardized technologies can be used when designing and implementing an Elucidator. It has furthermore been shown that the usage of these technologies has made it easy to use external tools in the realization of the implementation.
- Dynamic presentation of documentation and Java source code: Our implementation of the Elucidator shows that a dynamic approach to presenting the documentation and Java source code in the browser is possible. We have furthermore shown that this solution is not slow but, on the contrary, rather fast.
- Standard for Java entity names: As a side effect of implementing the Elucidator we have come up with a standard for the naming of entities in Java source code.

- Markup of Java source code in a browser: We have shown that it is not that difficult to mark up and present Java source code in a browser.

- o -

This conclusion has in several places shown the possibility of extending the current system. These possibilities will be discussed in the next chapter.

Perspective

In this section we present and reflect upon some of the ideas and thoughts that have evolved during the development of the Elucidator. We find the ideas interesting and in some way natural extensions to the ideas behind, and the architecture of, the current Elucidator.

Version control and change propagation

As mentioned in the Analysis, we see support for multiple users as a necessity for our tool. A possibility is to provide support for multiple users by integrating version control. By version control we mean that the Elucidator has access to and uses version information on a structural level. Structural version control (see footnote 1) can be applied to both the source code and the documentation. This can be used to track changes on a more fine-grained level than standard file-based version control. Furthermore, it can be used to support change propagation, which can help the programmer maintain the documentation by providing and visualizing the impact of changes (impact analysis) for both the source and the documentation.

Motivational factors

One of our primary concerns has been to create a tool which motivates programmers to write documentation. Several factors play an important role when motivating a programmer. One important aspect can be to provide an attractive tool. In our experience with the Elucidator, factors such as speed of abstraction and editor support have been important for us, but the current implementation is limited in these areas. This can be remedied by providing incremental update in the abstraction process and by developing a more supportive and maybe more visual editor.

These observations are based on our own opinion, and we therefore see the importance of performing a case study to gather information on motivational factors for programmers. The case study could be based on students or industrial programmers using our Elucidator tool.

Footnote 1: Structural version control is used in, e.g., Coop/Orm, CoEd and Ragnarok.

Templates and Views

As we argued in the Analysis, and as our experience has shown us, there exists a need for templates when writing documentation. We presented ideas on structural templates, which we found the most interesting as they can be used for extracting different views of the documentation. Views can be used to present information which is relevant in a given context. They can be used to view documentation and source code from different perspectives and thereby help to gain a better understanding of the design/implementation.

Templates and views can be used to express many different aspects of documentation. An example is to use templates to document changes in the source code with respect to the original documentation. This is convenient, as we find it tedious to constantly rewrite parts of the documentation instead of just documenting the relevant changes. This kind of writing can be seen as differential documentation. Documentation containing templates with differential documentation could then be used in relevant views, e.g., to give a list of the parts of the documentation that need to be updated because they have too many changes not yet incorporated.

This is one example of how the templates and views can be used. We could think of a dozen other uses, and we expect users to have just as many ideas. For each of these ideas a set of templates, views and their semantics needs to be defined. Therefore we feel that an environment, or framework, for general templates and views would be beneficial, as it would unify their implementation and reduce the need for redundant work.

Concepts and multiple languages

The Data Model in the Elucidator knows about entities and relationships in the documentation and source code. It has no real knowledge about the meaning of these entities or relationships. This is one of the strengths of the model, since it makes it very easy to change the Elucidator to work with languages other than Java.
The basic data model does not know of classes, methods and such. This is, as stated above, a strength, but at the same time also its weakness. It would be of great value if the Elucidator had knowledge of the concepts and terms used in the paradigms and/or programming language in question. This information could be used to provide heuristics and cognitive support to the programmer when documenting, e.g., when writing documentation for a class the programmer is encouraged to pay special attention to static initializers or the use of anonymous classes if these exist in the class. This could be implemented as a layer on top of the data model, without reducing the flexibility of the current Elucidator.

Another aspect of documentation is the problems related to the use of multiple programming languages in a single project. We experienced this during our project, as we use many languages, i.e., Java, XSL, XML, E-doc and Makefiles. The Elucidator only supports Java, so the implementations in the other languages were not documented in elucidative style. It would be valuable to extend the data model to function with multiple languages. This would allow us to browse all the different files and facilitate cross-referencing, allowing a possibly stronger coherence of the documentation.

API and advanced queries

Both the templates and the views would profit from an ability to interact with external programs. In this way it could be possible to create specific references to, e.g., a part of a UML [Booch et al., 1999] diagram. For this to be possible, an Application Programming Interface (API) is needed. In this way, a modular integration would be possible, and external programs would also be able to access the data model in the Elucidator.

The Data Model contains a vast amount of information about the relationships and entities in the source language, and the current Elucidator only utilizes a small part of this information via the Navigation window. The knowledge of the nature of the language constructs could be used in more advanced queries, e.g., to generate views for call graphs, class hierarchies, dependency graphs and the like.
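As an illustration of what such an advanced query could look like, the following Java sketch computes the set of methods transitively reachable from a given method by following invokes relationships, which is the raw material for a call-graph view. It is a sketch under stated assumptions only: the map-based representation, the class name and the method names are hypothetical and are not part of the implemented Data Model.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class CallGraphSketch {
    /**
     * Returns every method reachable from start by following "invokes"
     * relationships, in breadth-first discovery order.
     * The map goes from a caller to the methods it invokes directly.
     */
    public static Set<String> reachable(Map<String, List<String>> invokes, String start) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            for (String callee : invokes.getOrDefault(current, Collections.emptyList())) {
                // Set.add returns false for already-seen callees, so cycles terminate.
                if (seen.add(callee)) {
                    queue.add(callee);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Hypothetical relationships: A invokes B; B invokes C and D.
        Map<String, List<String>> invokes = new HashMap<>();
        invokes.put("A", Arrays.asList("B"));
        invokes.put("B", Arrays.asList("C", "D"));
        System.out.println(reachable(invokes, "A"));
    }
}
```

A view generator could render the returned set, together with the relationships between its members, as a graph instead of the flat lists shown in the navigation window.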


Bibliography

[AlphaWorks, 1998] AlphaWorks (1998). XML parser for Java. alphaworks.ibm.com/tech/xml4j.

[AlphaWorks, 1999] AlphaWorks (1999). Lotus XSL. ibm.com/tech/lotusxsl.

[Apparao et al., 1998] Apparao, V., Byrne, S., Champion, M., Isaacs, S., Hors, A. L., Nicol, G., Robie, J., Sharpe, P., Smith, B., Sorensen, J., Sutor, R., Whitmer, R., and Wilson, C. (1998). Document Object Model (DOM) level 1 specification, W3C recommendation.

[Booch et al., 1999] Booch, G., Rumbaugh, J., and Jacobson, I. (1999). The Unified Modeling Language User Guide. Object Technology Series. Addison Wesley Longman, Inc.

[Bray et al., 1998] Bray, T., Paoli, J., and Sperberg-McQueen, C. M. (1998). Extensible markup language (XML)

[Brown et al., 1999] Brown, W. J., III, W. H. S. M., and Thomas, S. W. (1999). AntiPatterns and Patterns in Software Configuration Management. Wiley Computer Publishing.

[Chen et al., 1995] Chen, Y.-F. R., Fowler, G. S., Koutsofios, E., and Wallach, R. S. (1995). Ciao: A graphical navigator for software and document repositories. In International Conference on Software Maintenance, Proceedings. AT&T Bell Laboratories, 600 Mountain Avenue, Murray Hill, NJ.

[Davidson and Ahmed, 1998] Davidson, J. D. and Ahmed, S. (1998). Java Servlet API specification, version 2.1a.

[Erdös and Sneed, 1998] Erdös, K. and Sneed, H. M. (1998). Partial comprehension of complex programs (enough to perform maintenance). In 6th International Workshop on Program Comprehension (IWPC 98). IEEE.

[Fielding et al., 1999] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P., and Berners-Lee, T. (1999). Hypertext transfer protocol http/ w3.org/protocols/.

[Gamma et al., 1996] Gamma, E., Helm, R., Johnson, R., and Vlissides, J. (1996). Design Patterns: Elements of Reusable Object-oriented Software. Addison Wesley, Reading.

[Gay-Para et al., 1999] Gay-Para, V., Divoky, W., Graf, T., Lemonnier, A. G., and Wais, E. (1999). Kopi Java compiler.

[Gosling et al., 1996] Gosling, J., Joy, B., and Steele, G. (1996). Java language specification.

[GPL, 1991] GPL (1991). GNU general public license. copyleft/gpl.html.

[ISO, 1997] ISO (1997). Hypermedia/time-based structuring language (HyTime). http://

[Kinnucan, 1999] Kinnucan, P. (1999). Java development environment for Emacs. http://sunsite.auc.dk/jde/.

[Knuth, 1984] Knuth, D. E. (1984). Literate programming. The Computer Journal, 27(2).

[Korn et al., 1999] Korn, J., Chen, Y.-F. R., and Koutsofios, E. (1999). Chava: Reverse engineering and tracking of Java applets.

[Lie and Bos, 1996] Lie, H. W. and Bos, B. (1996). REC-CSS1.

[Madsen et al., 1993] Madsen, O. L., Møller-Pedersen, B., and Nygaard, K. (1993). Object-Oriented Programming in the Beta Programming Language. ACM Press and Addison Wesley.

[Megginson, 1998] Megginson, D. (1998). Simple API for XML (SAX). megginson.com/sax/index.html.

[MySQL, 2000] MySQL (2000). The MySQL homepage.

[Nørmark, 1999] Nørmark, K. (1999). Requirements for an elucidative programming environment. About to be submitted for publication. Can be found via: normark/elucidative-programming/index.html.

[Parr, 1999] Parr, T. (1999). Another tool for language recognition. antlr.org/.

[Sametinger, 1991] Sametinger, J. (1991). DOgMA: A tool for the documentation and maintenance of software systems.

[Sametinger, 1994] Sametinger, J. (1994). Object-oriented documentation. Journal of Computer Documentation, 18(1):3-14.

[Sametinger, 1999] Sametinger, J. (1999). As little documentation as possible. Lecture held at Aarhus University, Denmark.

[Sametinger and Pomberger, 1992] Sametinger, J. and Pomberger, G. (1992). A hypertext system for literate C++ programming. Journal of Object Oriented Programming, 4(8).

[Sametinger and Stritzinger, 1993] Sametinger, J. and Stritzinger, A. (1993). A documentation scheme for object-oriented software systems. OOPS Messenger, 4(3):6-17.

[Staflin, 1994] Staflin, L. (1994). The psgml homepage. se/ lenst/about_psgml/.

[Storey et al., 1997] Storey, M., Fracchia, F., and Muller, H. (1997). Cognitive design elements to support the construction of a mental model during software visualization. In Fifth International Workshop on Program Comprehension (IWPC 97). IEEE.

[W3C, 1999] W3C (1999). HTML 4.01 specification. html401/.

[Walsh and Muellner, 1999] Walsh, N. and Muellner, L. (1999). DocBook: The Definitive Guide.

[XSLT, 1999] XSLT (1999). XSL transformations (XSLT) TR/xslt.


Appendix A: Grammars

This appendix contains three grammars (DTD) used in the current implementation of the Elucidator. First we list the grammar for the E-doc language. Next comes a grammar used by the JavaMarkupEngine to change the Java source code into an XML compatible format (see Section on page 52 for more information on the JavaMarkupEngine). Finally the grammar used by the Generator for the contents of the Navigation Window is shown (see Section on page 52 on page 57 for more information on this part of the Generator).

A.1 Document Type Definition for the EDoc language

    <?xml version="1.0" encoding="UTF-8"?>

    <!-- DTD for elucidative documentation to be used in a Java Elucidator -->
    <!-- See http://www.cs.auc.dk/santa/dat5. for further explanation. -->

    <!-- The edoc contains one or more chapters -->
    <!ELEMENT edoc (title, author?, chapter+)>

    <!-- The chapters contain one or more sections -->
    <!ELEMENT chapter (title, author?, (p | section)*)>
    <!ATTLIST chapter
        label ID    #IMPLIED
        sbase CDATA #IMPLIED>

    <!-- The sections can again in turn contain PCDATA or more sections -->
    <!ELEMENT section (title, author?, (p | section)*)>
    <!ATTLIST section
        label ID    #IMPLIED
        sbase CDATA #IMPLIED>

    <!-- The ordinary paragraphs -->
    <!ELEMENT p (#PCDATA | xlink | slink | dlink)*>

    <!-- The title also contains PCDATA -->
    <!ELEMENT title (#PCDATA)>

    <!-- The author contains PCDATA -->
    <!ELEMENT author (#PCDATA)>

    <!-- The xlink is a URL to an external document -->
    <!ELEMENT xlink (#PCDATA)>
    <!ATTLIST xlink
        href CDATA #REQUIRED>

    <!-- The slink is a link to some symbol in java source, included in
         the elucidator bundle. The attribute type determines if this
         link is a strong or weak link, see Normarks definition in the
         article about Elucidative Programming. -->
    <!ELEMENT slink (#PCDATA)>
    <!ATTLIST slink
        role (strong | weak) #REQUIRED
        href CDATA           #REQUIRED>

    <!-- The dlink is a link to some id in the edoc -->
    <!ELEMENT dlink (#PCDATA)>
    <!ATTLIST dlink
        role (strong | weak) #REQUIRED
        href IDREF           #REQUIRED>

A.2 Document Type Definition for the JavaMarkupEngine

    <?xml version="1.0" encoding="UTF-8"?>

    <!-- DTD for the xml markup of java code. -->
    <!-- See http://www.cs.auc.dk/santa/dat5. for further explanation. -->

    <!ELEMENT javamarkup (sourceentity | sourcerelationship | commententity)*>
    <!ATTLIST javamarkup
        filename CDATA #REQUIRED>

    <!ELEMENT sourceentity (#PCDATA)>
    <!ATTLIST sourceentity
        kind   CDATA #REQUIRED
        idname CDATA #REQUIRED
        local  CDATA #IMPLIED>

    <!ELEMENT sourcerelationship (#PCDATA)>
    <!ATTLIST sourcerelationship
        kind   CDATA #REQUIRED
        idname CDATA #REQUIRED
        relid  CDATA #REQUIRED
        local  CDATA #IMPLIED>

    <!ELEMENT commententity (#PCDATA)>
    <!ATTLIST commententity
        kind CDATA #REQUIRED>
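To make the JavaMarkupEngine format more concrete, the sketch below reads a small marked-up file with the standard Java DOM parser and collects the `idname` attributes. It is only an assumption-laden illustration: the class `JavaMarkupSketch`, the sample document and the comment kind `line` are hypothetical, and the element names are spelled without separators as in the grammar above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch: reads a document shaped like the javamarkup DTD.
final class JavaMarkupSketch {

    /** A tiny, made-up markup sample for one source file. */
    static final String SAMPLE =
        "<javamarkup filename=\"Filter.java\">"
        + "<sourceentity kind=\"class\" idname=\"appliances.kitchen.Filter\">Filter</sourceentity>"
        + "<commententity kind=\"line\">// a comment</commententity>"
        + "</javamarkup>";

    /** Parses the XML text into a DOM tree (no DTD validation is performed). */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse markup", e);
        }
    }

    /** Returns the idname attribute of every element with the given tag name. */
    static List<String> idnames(String xml, String tagName) {
        NodeList nodes = parse(xml).getElementsByTagName(tagName);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(((Element) nodes.item(i)).getAttribute("idname"));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(idnames(SAMPLE, "sourceentity"));
    }
}
```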

A.3 Document Type Definition for the Navigation Window

    <?xml version="1.0" encoding="UTF-8"?>

    <!-- This datatype definition contains a description of a result from
         the QueryServlet. -->

    <!-- All results are on the following form. Firstly every result
         contains a SearchOrigin, which is the base, origin or source of the
         search in question. Secondly the query result contains a number
         of destinations, which all contain a description and any number
         of related destinations. These destinations can be grouped into
         categories, which all have a name and a number of destinations. -->

    <!ELEMENT QueryResult (Description?, SearchOrigin?, Category*,
                           Destination*, RelatedDestinations?)>
    <!ATTLIST QueryResult
        direction CDATA #IMPLIED>

    <!ELEMENT SearchOrigin (Description?, Entity)>

    <!ELEMENT Category (Destination+)>
    <!ATTLIST Category
        name CDATA #REQUIRED>

    <!ELEMENT Destination ((Description?, Entity, RelatedDestinations?) |
                           (Description?, Query))>
    <!ATTLIST Destination
        role CDATA #IMPLIED>

    <!ELEMENT RelatedDestinations (Category*, Destination*)>

    <!ELEMENT Description (#PCDATA)>

    <!ELEMENT Query (#PCDATA)>
    <!ATTLIST Query
        query  CDATA #REQUIRED
        origin CDATA #REQUIRED>

    <!-- Here we need to refer to elements from the general datamodel
         dtd. -->
    <!ENTITY % datamodel SYSTEM "http://loda17.cs.auc.dk:8080/XMLSystem/datamodel.dtd">
    %datamodel;

    <!ELEMENT Entity (SourceEntity | DocumentEntity | SourceRelationship)>


Appendix B: Table design for the Data Model

This appendix contains the layout of the tables of the database in the currently implemented Data Model.

Column     Description
id         A numerical value to identify the entry. This is used to speed up the process of searching in the database and has no other special purpose.
fromid     A link to a SourceEntity, in the SourceEntity table, which has the role of from entity in the relationship.
toid       A link to a SourceEntity, in the SourceEntity table, which has the role of to entity in the relationship.
fileid     A link to a file in the FileEntity table, in which the relationship is present.
kind       Specifies the kind of relationship. Possible values are invokes, containment, access etc.
startpos   Specifies at what position in the file the relationship starts.
endpos     Specifies at what position in the file the relationship ends.

Table B.1: Layout of the SourceRelationship table.

Column     Description
id         A numerical value to identify the entry. This is used to speed up the process of searching in the database and has no other special purpose.
kind       The type of the entry, e.g., edoc, chapter or slink.
label      A unique identifier for each entity. The label is used for referencing internally in the documentation.
fileid     A link to a file in the FileEntity table, in which the entity can be found.
startpos   Specifies at what position in the file the entity starts.
endpos     Specifies at what position in the file the entity ends.
title      For edoc, chapter and section entities, this column contains the title of this entity. For other types of entities this column is empty.
author     For edoc, chapter and section entities, this column contains the author of this entity. For other types of entities this column is empty.
href       For slink, dlink and xlink entities this column contains a string representing the target of the link. For other types of entities this column is empty.
role       For slink, dlink and xlink entities this column contains the role of the link. Possible values are strong and weak. For other types of entities this column is empty.
attlist    This column is provided as an easy way of extending the number of attributes on an entity. The column contains a comma separated list of attribute and value pairs, with the format: attribute-name=value.

Table B.2: Layout of the DocEntity table.

Column     Description
docid      A link to a DocumentEntity in the DocEntity table.
sourceid   A link to a SourceEntity in the SourceEntity table.

Table B.3: Layout of the DocSourceRelationship table.
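The attlist column of the DocEntity table (Table B.2) stores extra attributes as a comma separated list of attribute-name=value pairs. A minimal Java sketch of how such a value could be decoded is shown here; the class `AttlistSketch` and the attribute names in the example are hypothetical, and the sketch assumes that neither names nor values contain a comma.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: decodes an attlist value such as "sbase=appliances.kitchen.,lang=en".
final class AttlistSketch {

    /** Splits the list on commas and each pair on its first '=' sign. */
    static Map<String, String> parse(String attlist) {
        Map<String, String> attributes = new LinkedHashMap<>();
        if (attlist == null || attlist.trim().isEmpty()) {
            return attributes;
        }
        for (String pair : attlist.split(",")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                continue; // skip entries that are not of the form name=value
            }
            attributes.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
        }
        return attributes;
    }

    public static void main(String[] args) {
        System.out.println(parse("sbase=appliances.kitchen.,lang=en"));
    }
}
```

The format is easy to extend, but a value that itself contains a comma cannot be represented without some escaping convention.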

Column     Description
id         A numerical value to identify the entry. This is used to speed up the process of searching in the database and has no other special purpose.
fromid     A link to a DocumentEntity, in the DocEntity table, which has the role of from entity in the relationship.
toid       A link to a DocumentEntity, in the DocEntity table, which has the role of to entity in the relationship.
fileid     A link to a file in the FileEntity table, in which the relationship is present.
kind       Specifies the kind of relationship. Possible values are containment etc.
startpos   Specifies at what position in the file the relationship starts.
endpos     Specifies at what position in the file the relationship ends.

Table B.4: Layout of the DocRelationship table.

Column     Description
id         A numerical value to identify the entry. This is used to speed up the process of searching in the database and has no other special purpose.
filename   The name of the file.

Table B.5: Layout of the FileEntity table.

Column     Description
id         A numerical value to identify the entry. This is used to speed up the process of searching in the database and has no other special purpose.
kind       The type of the entry, e.g., class, method or variable.
name       The name of the entity. Examples could be the name of a class or a method.
scope      The scope of the entity. This can be private, public or protected.
modifiers  A set of zero or more modifiers for a given object. For methods, modifiers include abstract, final, native, static, and synchronized. For fields, modifiers include final, static, transient, and volatile. For classes and interfaces, modifiers include abstract and final. Since an entity can have more than one modifier, the modifiers are represented by a bit vector in the database.
params     For method types, this column contains a list of parameters for the method. This column is not used by other entity types. It is primarily used to distinguish among overloaded methods.
datatype   For method types, this column contains the return type of the method. For field types, the column contains the type of the field. For classes, this column contains the name of the package the class is contained in. The full name of a class is constructed by combining the datatype and name columns of the entity. The datatype column is used in these two differing contexts to save space in the database. However, datatype is a package name in one context and a type name in the other, so there is no overlap in the possible set of values for these contexts.
idname     The fully qualified name for the entity.
fileid     A link to a file in the FileEntity table, in which the entity can be found.
startpos   Specifies at what position in the file the entity starts.
endpos     Specifies at what position in the file the entity ends.

Table B.6: Layout of the SourceEntity table.
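The modifiers and datatype columns of Table B.6 can be illustrated with a short Java sketch. The report does not state which bit is assigned to which modifier, so the sketch simply assumes the constants of `java.lang.reflect.Modifier`; the class `SourceEntitySketch`, its methods and the use of a dot when joining package and class name are likewise hypothetical.

```java
import java.lang.reflect.Modifier;

// Hypothetical sketch: a bit vector of modifiers and the full class name.
final class SourceEntitySketch {

    /** Packs modifier keywords into one int, one bit per modifier. */
    static int encode(String... names) {
        int bits = 0;
        for (String name : names) {
            switch (name) {
                case "abstract":     bits |= Modifier.ABSTRACT;     break;
                case "final":        bits |= Modifier.FINAL;        break;
                case "native":       bits |= Modifier.NATIVE;       break;
                case "static":       bits |= Modifier.STATIC;       break;
                case "synchronized": bits |= Modifier.SYNCHRONIZED; break;
                case "transient":    bits |= Modifier.TRANSIENT;    break;
                case "volatile":     bits |= Modifier.VOLATILE;     break;
                default: throw new IllegalArgumentException("unknown modifier: " + name);
            }
        }
        return bits;
    }

    /** Tests whether a given modifier bit is set in the vector. */
    static boolean has(int bits, int flag) {
        return (bits & flag) != 0;
    }

    /** For class entities, datatype holds the package name. */
    static String fullClassName(String datatype, String name) {
        return datatype + "." + name;
    }

    public static void main(String[] args) {
        int bits = encode("static", "final");
        System.out.println(has(bits, Modifier.STATIC));
        System.out.println(fullClassName("appliances.kitchen", "Filter"));
    }
}
```

A single integer column is enough for any combination of modifiers, which is the space argument given for the bit vector.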

Appendix C: Example: Coffee Machine

This appendix contains a small example which is used in the Design chapter to visualize aspects of the design. The example shows how a coffee machine could be modeled in Java. The appendix has four sections. In the first the Java source code for the example is listed. In the second the corresponding E-doc documentation is listed. The third section contains tables with the result of applying the Abstractor on the example. Finally the last section contains a set of screen captures showing the example in both the editor and the browser.

C.1 Java source code

This section contains the Java source code for the Coffee Machine example.

C.1.1 EvaTrio.java

    package appliances.kitchen;

    import appliances.general.ElecAppException;

    public class EvaTrio
    {
        public static void main(String[] Args)
        {
            EvaTrio et = new EvaTrio();
        }

        public EvaTrio()
        {
            CoffeeMachine cm = new CoffeeMachine("Eva", "trio", "eu");
            System.out.println("Let us make a happy cup of Eva Trio coffee...\n");
            cm.makeCoffeeMachineReady(5, 10);
            try
            {
                cm.switchOn();
            }
            catch (ElecAppException eae)
            {
                System.out.println(eae.getMessage());
            }
            cm.cleanupCoffeeMachine();
        }
    }

C.1.2 CoffeeMachine.java

    package appliances.kitchen;

    import appliances.general.ElectricalAppliance;
    import appliances.general.ElecAppException;
    import appliances.parts.HeatingElement;

    public class CoffeeMachine extends ElectricalAppliance
    {
        private HeatingElement he;
        private CoffeeContainer cc;
        private WaterContainer wc;
        private String producer;
        private String model;
        private int brewingtime;
        private boolean brewing = false;

        public CoffeeMachine(String producerparam, String modelparam,
                             String worldpart)
        {
            super(worldpart);
            producer = producerparam;
            model = modelparam;

            he = new HeatingElement(model);
            cc = new CoffeeContainer();
            wc = new WaterContainer();
        }

        public void switchOn() throws ElecAppException
        {
            System.out.println("Coffee machine turned on...");
            if (checkheatingelement())
            {
                brewingtime = cc.getcoffeeamount() * 3; // e:x/
                int counter = 0;
                brewing = true;
                while (brewing)
                {
                    System.out.println("Brewing...");
                    counter = counter + 1;
                    if (counter == brewingtime)
                        brewing = false;
                }
            }
        }

        private boolean checkheatingelement() throws ElecAppException
        {
            if (he.works())
                return true;
            else
                throw (new ElecAppException("The heating element is out of order..."));
        }

        public void makeCoffeeMachineReady(int spoonfuls, int water)
        {
            Filter filter;
            if (model == "trio")
                filter = new Filter("medium");
            else
                filter = new Filter("just take one");
            // e:y
            System.out.println("Filter found... (Found behind some jars in the right upper cupboard)");
            cc.addFilter(filter);
            System.out.println("Filter placed in coffee container...");
            cc.fill(spoonfuls);
            System.out.println("Coffee placed in filter...");
            wc.fill(water);
            System.out.println("Water placed in water container");
            // /e:y
        }

        public void cleanupCoffeeMachine()
        {
            cc.removeFilter();
            System.out.println("Filter removed... YIKES!...");
        }
    }

C.1.3 CoffeeContainer.java

    package appliances.kitchen;

    public class CoffeeContainer
    {
        private int coffeebeans;
        private Filter filter;

        public CoffeeContainer()
        {
        }

        public void fill(int spoonfuls)
        {
            coffeebeans = spoonfuls;
        }

        public void addFilter(Filter f)
        {
            filter = f;
        }

        public void removeFilter()
        {
            filter = null;
        }

        public int getcoffeeamount()
        {
            return coffeebeans;
        }
    }

C.1.4 WaterContainer.java

    package appliances.kitchen;

    public class WaterContainer
    {
        private int water;

        public WaterContainer()
        {
        }

        public void fill(int deciliters)
        {
            water = deciliters;
        }
    }

C.1.5 Filter.java

    package appliances.kitchen;

    public class Filter
    {
        public Filter(String type)
        {
        }
    }

C.1.6 HeatingElement.java

    package appliances.parts;

    public class HeatingElement
    {
        String type;
        private boolean works = true;

        public HeatingElement(String typeparam)
        {
            type = typeparam;
        }

        public boolean works()
        {
            return works;
        }
    }

C.1.7 ElectricalAppliance.java

    package appliances.general;

    public abstract class ElectricalAppliance
    {
        private int voltage;

        public ElectricalAppliance(String worldpart)
        {
            if (worldpart == "eu")
                this.voltage = 230;
            else
                this.voltage = 110;
        }

        public abstract void switchOn() throws ElecAppException;
    }

C.1.8 ElecAppException.java

    package appliances.general;

    public class ElecAppException extends Exception
    {
        public ElecAppException(String msg)
        {
            super(msg);
        }
    }

C.2 E-doc documentation

This section contains the E-doc documentation for the example.

    <?xml version="1.0" encoding="ISO"?>
    <?xml-stylesheet type="text/xsl" href="/user/santa/DAT5/apache/htdocs/XMLSystem/edoc.xsl"?>
    <!DOCTYPE edoc PUBLIC "-//CoffeeMachine edoc//EN"
        "/user/santa/DAT5/apache/htdocs/XMLSystem/edoc.dtd" [
      <!ENTITY String "java.lang.String">
      <!ENTITY filter "appliances.kitchen.Filter">
    ]>
    <edoc>
    <title>Elucidative documentation for a Coffee Machine</title>
    <chapter label="cha:main">
    <title>Elucidative documentation for a Coffee Machine</title>
    <author><xlink href="mailto:e1207a99@cs.auc.dk">Project group E1-207A</xlink></author>
    <p>
    This document documents the <dlink role="strong" href="sec:architecture">architecture</dlink>
    of the <slink role="weak" href="appliances.kitchen.EvaTrio">EvaTrio (TM)</slink> coffee machine.
    The EvaTrio coffee machine consists of a lot of parts, some of a very coffee machine specific
    kind (see Section <dlink role="strong" href="sec:specificparts">Coffee machine parts</dlink>)
    and some of a more general nature (see Section
    <dlink role="strong" href="sec:externalparts">External parts</dlink>).
    <br/><br/>
    The EvaTrio coffee machine furthermore provides sophisticated
    <dlink role="strong" href="sec:error">error handling mechanisms</dlink>.
    </p>

    <section label="sec:specificparts" sbase="appliances.kitchen.">
    <title>Coffee machine parts</title>
    <p>
    The <slink role="weak" href="CoffeeMachine">CoffeeMachine</slink> class is realized through a
    number of subcomponents. These are:
    <ul>
    <li>
    The <slink role="strong" href="CoffeeContainer">CoffeeContainer</slink>:<br/>
    This class models that every coffee machine has a container for storing the coffee while
    brewing. The class has a
    <slink role="strong" href="CoffeeContainer@addFilter(&filter;)">addFilter</slink> method for
    placing a <slink role="weak" href="Filter">filter</slink> in the coffee container and a
    <slink role="strong" href="CoffeeContainer@removeFilter()">removeFilter</slink> for removing it.
    The <slink role="weak" href="CoffeeContainer">CoffeeContainer</slink> class furthermore has a
    method called <slink role="strong" href="CoffeeContainer@fill(int)">fill</slink> which takes an
    amount of spoonfuls of coffee as its argument, and fills the CoffeeContainer with the coffee.
    </li>
    <li>
    The <slink role="strong" href="WaterContainer">WaterContainer</slink>:<br/>
    This class is very similar to the
    <slink role="weak" href="CoffeeContainer">CoffeeContainer</slink> class, except it does not have
    methods for adding and removing filters, and instead of a method for filling in coffee its
    <slink role="strong" href="WaterContainer@fill(int)">fill</slink> method takes centiliters of
    water to fill in the container.
    </li>
    <li>
    The <slink role="strong" href="Filter">Filter</slink>:<br/>
    This class models a coffee filter. The only property of the class is the type of filter which
    is given as a parameter to the
    <slink role="strong" href="Filter@Filter(&String;)">constructor</slink>.
    </li>
    </ul>
    </p>
    </section>

    <section label="sec:externalparts" sbase="appliances.parts.">
    <title>External parts</title>
    <p>
    The <slink role="weak" href="//appliances.kitchen.EvaTrio">EvaTrio (TM)</slink> line of coffee
    machines features a number of external parts which are also used in other household appliances.
    The most important is the <slink role="strong" href="HeatingElement">HeatingElement</slink>.
    Every heating element has a type which is specified in the
    <slink role="strong" href="HeatingElement@HeatingElement(&String;)">constructor</slink>, and a
    <slink role="strong" href="HeatingElement@works">flag</slink> which indicates if the element
    works or not.
    </p>
    </section>

    <section label="sec:architecture" sbase="appliances.kitchen.">
    <title>Architecture</title>
    <p>
    The main architecture of the EvaTrio (TM) and related coffee machines is the
    <slink role="weak" href="CoffeeMachine">CoffeeMachine</slink> class. This class extends the
    <slink role="strong" href="//appliances.general.ElectricalAppliance">ElectricalAppliance</slink>
    which takes care of tasks such as the voltage.
    <br/><br/>
    In order to create a new coffee machine one has to supply the
    <slink role="strong" href="CoffeeMachine@CoffeeMachine(&String;,&String;,&String;)">constructor</slink>
    with the name of the producer, the model and in which part of the world the coffee machine is
    to be used. In the example these values are: Eva, trio and eu.
    <br/><br/>
    When created the coffee machine has three functionalities:
    <ul>
    <li>
    By using the
    <slink role="strong" href="CoffeeMachine@makeCoffeeMachineReady(int,int)">makeCoffeeMachineReady</slink>
    method the coffee machine is loaded with coffee and water and is ready to start brewing. The
    method takes the number of spoonfuls of coffee beans and the amount of water as parameters.
    </li>
    <li>
    When the coffee machine is ready, you use the
    <slink role="strong" href="CoffeeMachine@switchOn()">switchOn</slink> method for switching the
    machine on. Unless something is wrong with the
    <dlink role="weak" href="sec:externalparts">heating element</dlink> this will cause the machine
    to start brewing coffee. If something is wrong with the heating element a
    <slink role="weak" href="//appliances.general.ElecAppException">ElecAppException</slink> will be
    thrown. See Section <dlink role="strong" href="sec:error">Error handling</dlink> for more
    information on this.
    </li>
    <li>
    The final functionality of the coffee machine is the
    <slink role="strong" href="CoffeeMachine@cleanupCoffeeMachine()">cleanupCoffeeMachine</slink>
    method which removes the <slink role="weak" href="Filter">filter</slink> from the
    <slink role="weak" href="CoffeeContainer">coffee container</slink>.
    </li>
    </ul>
    </p>
    </section>

    <section label="sec:error">
    <title>Error handling</title>
    <p>
    To provide maximum security for the users of the EvaTrio (TM) coffee machine we implement
    error handling. This is done in the
    <slink role="strong" href="appliances.general.ElecAppException">ElecAppException</slink>
    exception class. This class will, when thrown, contain information about what went wrong, so
    the customer at all times can feel secure.
    </p>
    </section>

    </chapter>
    </edoc>

C.3 Derived information and Screen captures

This section contains tables with the derived information as produced by the Abstractor. Some of the less important attributes have been left out in order to make the tables fit the paper. The section furthermore presents a number of screen captures, showing the example in the editor and the browser.

[Table C.1 has the columns Id, Fromid, Toid and Kind. The relationship kinds occurring in it are containment, access, throws, typeof, creation, invoke and extends.]

Table C.1: A complete list of the abstracted relationships from the Coffee Machine example. The source code for the example can be found in appendix C.1 on page 89.

| Id | Kind | Name | Idname |
|---|---|---|---|
| 1 | package | appliances.general | appliances.general |
| 2 | class | ElecAppException | appliances.general.elecappexception |
| 3 | constructor | ElecAppException | appliances.general.elecappexception@elecappexception(java.lang.string) |
| 4 | parameter | msg | appliances.general.elecappexception@elecappexception(java.lang.string)@msg |
| 5 | class | ElectricalAppliance | appliances.general.electricalappliance |
| 6 | constructor | ElectricalAppliance | appliances.general.electricalappliance@electricalappliance(java.lang.string) |
| 7 | parameter | worldpart | appliances.general.electricalappliance@electricalappliance(java.lang.string)@worldpart |
| 8 | method | switchon | appliances.general.electricalappliance@switchon() |
| 9 | field | voltage | appliances.general.electricalappliance@voltage |
| 10 | package | appliances.kitchen | appliances.kitchen |
| 11 | class | CoffeeContainer | appliances.kitchen.coffeecontainer |
| 12 | constructor | CoffeeContainer | appliances.kitchen.coffeecontainer@coffeecontainer() |
| 13 | method | addfilter | appliances.kitchen.coffeecontainer@addfilter(appliances.kitchen.filter) |
| 14 | parameter | f | appliances.kitchen.coffeecontainer@addfilter(appliances.kitchen.filter)@f |
| 15 | field | coffeebeans | appliances.kitchen.coffeecontainer@coffeebeans |
| 16 | method | fill | appliances.kitchen.coffeecontainer@fill(int) |
| 17 | parameter | spoonfuls | appliances.kitchen.coffeecontainer@fill(int)@spoonfuls |
| 18 | field | filter | appliances.kitchen.coffeecontainer@filter |
| 19 | method | getcoffeeamount | appliances.kitchen.coffeecontainer@getcoffeeamount() |
| 20 | method | removefilter | appliances.kitchen.coffeecontainer@removefilter() |
| 21 | class | CoffeeMachine | appliances.kitchen.coffeemachine |
| 22 | constructor | CoffeeMachine | appliances.kitchen.coffeemachine@coffeemachine(java.lang.string,java.lang.string,java.lang.string) |
| 23 | parameter | modelparam | appliances.kitchen.coffeemachine@coffeemachine(java.lang.string,java.lang.string,java.lang.string)@modelparam |
| 24 | parameter | producerparam | appliances.kitchen.coffeemachine@coffeemachine(java.lang.string,java.lang.string,java.lang.string)@producerparam |
| 25 | parameter | worldpart | appliances.kitchen.coffeemachine@coffeemachine(java.lang.string,java.lang.string,java.lang.string)@worldpart |
| 26 | field | brewing | appliances.kitchen.coffeemachine@brewing |
| 27 | field | brewingtime | appliances.kitchen.coffeemachine@brewingtime |
| 28 | field | cc | appliances.kitchen.coffeemachine@cc |
| 29 | method | checkheatingelement | appliances.kitchen.coffeemachine@checkheatingelement() |
| 30 | method | cleanupcoffeemachine | appliances.kitchen.coffeemachine@cleanupcoffeemachine() |
| 31 | field | he | appliances.kitchen.coffeemachine@he |
| 32 | method | makecoffeemachineready | appliances.kitchen.coffeemachine@makecoffeemachineready(int,int) |
| 33 | variable | filter | appliances.kitchen.coffeemachine@makecoffeemachineready(int,int)@filter |
| 34 | parameter | spoonfuls | appliances.kitchen.coffeemachine@makecoffeemachineready(int,int)@spoonfuls |
| 35 | parameter | water | appliances.kitchen.coffeemachine@makecoffeemachineready(int,int)@water |
| 36 | field | model | appliances.kitchen.coffeemachine@model |
| 37 | field | producer | appliances.kitchen.coffeemachine@producer |
| 38 | method | switchon | appliances.kitchen.coffeemachine@switchon() |
| 39 | variable | counter | appliances.kitchen.coffeemachine@switchon()@counter |
| 40 | field | wc | appliances.kitchen.coffeemachine@wc |
| 41 | class | EvaTrio | appliances.kitchen.evatrio |
| 42 | constructor | EvaTrio | appliances.kitchen.evatrio@evatrio() |
| 43 | variable | cm | appliances.kitchen.evatrio@evatrio()@cm |
| 44 | parameter | eae | appliances.kitchen.evatrio@evatrio()@eae |
| 45 | method | main | appliances.kitchen.evatrio@main(java.lang.string[]) |
| 46 | parameter | Args | appliances.kitchen.evatrio@main(java.lang.string[])@args |
| 47 | variable | et | appliances.kitchen.evatrio@main(java.lang.string[])@et |
| 48 | class | Filter | appliances.kitchen.filter |
| 49 | constructor | Filter | appliances.kitchen.filter@filter(java.lang.string) |
| 50 | parameter | type | appliances.kitchen.filter@filter(java.lang.string)@type |
| 51 | class | WaterContainer | appliances.kitchen.watercontainer |
| 52 | constructor | WaterContainer | appliances.kitchen.watercontainer@watercontainer() |
| 53 | method | fill | appliances.kitchen.watercontainer@fill(int) |
| 54 | parameter | deciliters | appliances.kitchen.watercontainer@fill(int)@deciliters |
| 55 | field | water | appliances.kitchen.watercontainer@water |
| 56 | package | appliances.parts | appliances.parts |
| 57 | class | HeatingElement | appliances.parts.heatingelement |
| 58 | constructor | HeatingElement | appliances.parts.heatingelement@heatingelement(java.lang.string) |
| 59 | parameter | typeparam | appliances.parts.heatingelement@heatingelement(java.lang.string)@typeparam |
| 60 | field | type | appliances.parts.heatingelement@type |
| 61 | method | works | appliances.parts.heatingelement@works() |

Table C.2: A list of all the abstracted entities from the Coffee Machine example. For simplicity, not all the attributes are shown. The source code used in the abstraction can be seen in appendix C.1 on page 89.
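The Idname column of Table C.2 follows a visible pattern: a lower-cased qualified name for packages and classes, `owner@member` for fields, `owner@name(parameter types)` for methods and constructors, and a further `@name` suffix for parameters and local variables. The Java sketch below reproduces that pattern as it appears in the table. The class `IdNames` and its method names are hypothetical and are not taken from the Elucidator implementation; the lower-casing is likewise only an assumption read off the table.

```java
import java.util.Locale;

/** Hypothetical helper that rebuilds the Idname strings seen in Table C.2. */
public final class IdNames {
    private IdNames() {}

    /** Package or class: the qualified name, lower-cased as in the table. */
    public static String type(String qualifiedName) {
        return qualifiedName.toLowerCase(Locale.ROOT);
    }

    /** Field: owner@field. */
    public static String field(String ownerClass, String fieldName) {
        return type(ownerClass) + "@" + fieldName.toLowerCase(Locale.ROOT);
    }

    /** Method or constructor: owner@name(type,type,...). */
    public static String method(String ownerClass, String name, String... paramTypes) {
        String params = String.join(",", paramTypes).toLowerCase(Locale.ROOT);
        return type(ownerClass) + "@" + name.toLowerCase(Locale.ROOT) + "(" + params + ")";
    }

    /** Parameter or local variable: idname of the enclosing method, then @name. */
    public static String local(String methodIdname, String name) {
        return methodIdname + "@" + name.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String fill = method("appliances.kitchen.CoffeeContainer", "fill", "int");
        System.out.println(fill);                      // compare with row 16
        System.out.println(local(fill, "spoonfuls"));  // compare with row 17
    }
}
```

Because the parameter types are part of the name, overloaded methods would receive distinct idnames under this scheme, and a local name only has to be unique within its enclosing method.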

| Id | Kind | Label | Title |
|---|---|---|---|
| 1 | dlink | cha:main.dlink1 | architecture |
| 2 | slink | cha:main.slink1 | EvaTrio (TM) |
| 3 | dlink | cha:main.dlink2 | Coffee machine parts |
| 4 | dlink | cha:main.dlink3 | External parts |
| 5 | dlink | cha:main.dlink4 | error handling mechanisms |
| 6 | slink | sec:specificparts.slink1 | CoffeeMachine |
| 7 | slink | sec:specificparts.slink2 | CoffeeContainer |
| 8 | slink | sec:specificparts.slink3 | addfilter |
| 9 | slink | sec:specificparts.slink4 | filter |
| 10 | slink | sec:specificparts.slink5 | removefilter |
| 11 | slink | sec:specificparts.slink6 | CoffeeContainer |
| 12 | slink | sec:specificparts.slink7 | fill |
| 13 | slink | sec:specificparts.slink8 | WaterContainer |
| 14 | slink | sec:specificparts.slink9 | CoffeeContainer |
| 15 | slink | sec:specificparts.slink10 | fill |
| 16 | slink | sec:specificparts.slink11 | Filter |
| 17 | slink | sec:specificparts.slink12 | constructor |
| 18 | section | sec:specificparts | Coffee machine parts |
| 19 | slink | sec:externalparts.slink1 | EvaTrio (TM) |
| 20 | slink | sec:externalparts.slink2 | HeatingElement |
| 21 | slink | sec:externalparts.slink3 | constructor |
| 22 | slink | sec:externalparts.slink4 | flag |
| 23 | section | sec:externalparts | External parts |
| 24 | slink | sec:architecture.slink1 | CoffeeMachine |
| 25 | slink | sec:architecture.slink2 | ElectricalAppliance |
| 26 | slink | sec:architecture.slink3 | constructor |
| 27 | slink | sec:architecture.slink4 | makecoffeemachineready |
| 28 | slink | sec:architecture.slink5 | switchon |
| 29 | dlink | sec:architecture.dlink1 | heating element |
| 30 | slink | sec:architecture.slink6 | ElecAppException |
| 31 | dlink | sec:architecture.dlink2 | Error handling |
| 32 | slink | sec:architecture.slink7 | cleanupcoffeemachine |
| 33 | slink | sec:architecture.slink8 | filter |
| 34 | slink | sec:architecture.slink9 | coffee container |
| 35 | section | sec:architecture | Architecture |
| 36 | slink | sec:error.slink1 | ElecAppException |
| 37 | section | sec:error | Error handling |
| 38 | chapter | cha:main | Elucidative documentation for a Coffee Machine |
| 39 | edoc | 0 | Elucidative documentation for a Coffee Machine |

Table C.3: A list of all the abstracted entities from the CoffeeMachine.edoc E-doc file. For simplicity, not all the found attributes are shown. The file can be seen in appendix C.2 on page 93.

[Table C.4 has the columns Id, Fromid, Toid and Kind, printed as three side-by-side column groups. The numeric Id, Fromid and Toid values are not present in this copy. The Kind column lists 38 containment relationships followed by 6 refersto relationships.]

Table C.4: A complete list of the abstracted relationships from the CoffeeMachine.edoc file. The file can be found in appendix C.2 on page 93.
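The Kind column of Table C.4 contains 38 containment entries and 6 refersto entries, and both numbers can be cross-checked against Table C.3. The sketch below does this under two assumptions that the report does not state explicitly: every entity except the single root (the edoc entity) is contained in exactly one parent, and every dlink gives rise to exactly one refersto relationship. The class `RelationshipCount` is hypothetical and is not part of the Elucidator.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical consistency check between Table C.3 and Table C.4. */
public final class RelationshipCount {
    private RelationshipCount() {}

    /** Entity counts per kind, tallied by hand from Table C.3. */
    public static Map<String, Integer> tableC3() {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("dlink", 6);
        counts.put("slink", 27);
        counts.put("section", 4);
        counts.put("chapter", 1);
        counts.put("edoc", 1);
        return counts;
    }

    /** Assumption: a containment tree with n entities has n - 1 containment edges. */
    public static int expectedContainment(Map<String, Integer> entitiesByKind) {
        int total = 0;
        for (int n : entitiesByKind.values()) {
            total += n;
        }
        return Math.max(0, total - 1);
    }

    /** Assumption: each dlink yields exactly one refersto relationship. */
    public static int expectedRefersTo(Map<String, Integer> entitiesByKind) {
        return entitiesByKind.getOrDefault("dlink", 0);
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = tableC3();
        System.out.println("containment: " + expectedContainment(counts));
        System.out.println("refersto:    " + expectedRefersTo(counts));
    }
}
```

Under these assumptions the 39 entities of Table C.3 give 38 containment and 6 refersto relationships, which agrees with the Kind column of Table C.4.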

Figure C.1: Screen captures of the E-doc and a Java source code file, loaded into the editor.

Figure C.2: Screen capture from the browser.

Figure C.3: Screen capture from the browser.

Panels: (a) Documentation, (b) Source symbols used, (c) Source symbols using.

Figure C.4: Screen capture of the Navigation Window after the CoffeeContainer class from Figure C.3 on the facing page has been clicked on. (a) shows the places where the class is documented. (b) shows the source entities used by the CoffeeContainer. (c) shows the source entities using the CoffeeContainer.
