Comments Adaptation. Diploma Thesis May 14, Adaptation of Source Comments and API Documentation When Source Code Changes. Edoardo P. J.


Diploma Thesis
May 14, 2007

Comments Adaptation
Adaptation of Source Comments and API Documentation When Source Code Changes

Edoardo P. J. Beutler
of Winterthur, Switzerland ( )

supervised by
Prof. Dr. Harald C. Gall
Beat Fluri

Department of Informatics
software evolution & architecture lab



Diploma Thesis
Author: Edoardo P. J. Beutler,
Project period:
Software Evolution & Architecture Lab
Department of Informatics, University of Zurich

Acknowledgements

First I would like to thank the whole S.E.A.L. team at the Department of Informatics of the University of Zurich for their advice, the comfortable working atmosphere and the many enjoyable lunch breaks. In particular I would like to thank Professor Harald Gall for giving me the opportunity to write this thesis. I also thank Beat Fluri, my supervising assistant, as well as Michael Würsch and Emanuel Giger, for their continuous introductions to and support with the existing systems. Next I would like to thank Beat and my mother Adriana Beutler Romanò for proofreading my thesis. Last but not least I would like to thank my parents Adriana and Werner Beutler for their support during my years of study.


Abstract

Source comments and API documentation are an important part of the source code of software systems. Proper documentation increases the readability as well as the maintainability of source code. In this diploma thesis we investigate whether and when source comments change. We want to know whether comment changes are based on a change of the source code or not. If they are, we are interested in whether the comments are changed together with the source code or only in a later revision. To investigate this, we take advantage of CVS versioning data. Based on CHANGEDISTILLER, an existing Eclipse plugin, we implemented a plugin to find matching source code and comment changes. We used the plugin to evaluate two mid-size Java projects.


Zusammenfassung (German summary, translated)

Comments and interface documentation are an important part of the source code of software systems. Good documentation increases both the readability and the maintainability of the source code. In this diploma thesis we investigate whether and when source comments change. We are interested in whether a comment change is based on a change of the source code and, if so, whether the comment change took place together with the source code change or only later. To investigate this, we use CVS versioning data. We implemented a plug-in for Eclipse that builds on CHANGEDISTILLER, an existing plug-in, and examines comment changes as well as source code changes for commonalities. We then used the plug-in to examine two mid-size projects for commonalities between their source code and comment changes.


Contents

1 Introduction
  Motivation
  Envisioned outcome
  Structure
2 Related Work
  Comment change detection
  Comment to source code matching
  Vector space based Information Retrieval model
  Recovering traceability links using Latent Semantic Indexing
  Applicability for our approach
3 Comments Adaptation
  Outline of process
  Process overview
  Terminology
  Numeric analysis of source code to source comment change couplings
  Computing the number of changes
  Retrieving the results
  Rating the computed rankings
  Comment extraction and block building for line comments
  Mapping source code nodes to source comments
  Types of source comment to source code matchings
  Preparing the data
  The source comment to source code mapping algorithm
  Shortcomings of the algorithm
  Tracking comment nodes over multiple revisions
  Comment similarity measure
  Possible changes and their recognition
  Interpretation of the changes
  Summary
4 Implementation
  Counting the number of comment changes in the simple analysis
  GNU diff
  Building blocks of line comments
  Adaptation of candidates as corresponding nodes
  Saving the results of the analysis
5 Evaluation
  Validation and verification
  Evaluation process
  Evaluation of the Azureus BitTorrent Client
  The data evaluated
  Numeric analysis
  Detailed analysis
  Evaluation of the Eclipse Java Development Tools Core Component
  The data evaluated
  Numeric analysis
  Detailed analysis
6 Conclusions
  Summary of contribution
  Lessons Learned
  Future Work
A Too specific nodes
B Contents of the CD-ROM

List of Figures

2.1 Process to recover the traceability
3.1 Outline of the procedure for our simple numeric analysis
3.2 The outline of the steps performed during an in depth analysis. The colors are used to highlight the different chapters
3.3 Overview over the rating of the ranking pairs
Possible approaches to analyze the comment tracks
Change recognition example. Evolution of a comment and its corresponding source code over several revisions

List of Tables

2.1 Example for a Term-Document-Matrix
3.1 The table shows the number of source code and comment changes. The rating in the last column is 1 if there is a coupling between the number of source code and comment changes, 0 if there is no coupling and -1 if the numbers cannot be rated
3.2 Results when analyzing the ArgoUML class Project. There are 172 revisions of the file (only 171 analyzed, since revision 1.1, as the first revision, has no preceding revision to compare with)
Examples for heavily changed classes
Example for a repeating pattern
Results from the analysis of Azureus. Unmodified basic version. All revisions are included
Results from the analysis of Azureus. Modified version: the not rateable revisions have been eliminated to get a meaningful result
Number of changes found in comments and their related source code nodes
Numbers of common changes and when they happened
Matching of change types for common changes
Results from the analysis of the JDT Core Components. Unmodified basic version. All revisions are included
Results from the analysis of the JDT Core Components. Modified version: the not rateable revisions have been eliminated to get a meaningful result
Number of changes found in comments and their related source code nodes
Numbers of common changes and when they happened
Matching of change types for common changes

List of Listings

1.1 Different comment types
3.1 Preceding, current and succeeding nodes
One comment for one source code statement. Project.java Revision
Three comment blocks describing one method. Project.java Revision
One line comment describing several source code nodes. Project.java Revision
3.6 Comment succeeding the source code, but still on the same line. Project.java Revision
A line comment, linking between the preceding and the succeeding source code. Project.java Revision
The method wins against the conditional
Javadoc example with different tags
Identical comment. Source code movement in relation to the comment. Version
Identical comment. Source code movement in relation to the comment. Version
Identical comment. Source code movement in relation to the comment. Alternate version
Six line comments belonging to one block. Project.java Revision
Javadoc with a succeeding modifier, being part of a field declaration

Chapter 1
Introduction

1.1 Motivation

By now it is commonplace that code is read much more often than it is written. Probably everybody who is used to programming is familiar with the challenges of keeping the documentation up to date. Before writing any code, the programmer does not feel like writing comments; after all, there is no code yet and he does not know what exactly the code will look like. While programming, when the mind is focused on writing code and on the many details to take care of, the programmer is reluctant to write comments. They can still be written when the code works and no longer changes that often. Once the code is written, the programmer knows what he has written and there is so much more to do, so why comment now? Some (probably not so long) time later the programmer no longer knows in detail why his solution to a problem looks the way it does, so serious commenting would take a lot of time, since the code has to be thought through again. Because this makes commenting much more laborious, the programmer does not want to comment his old code.

So the task of commenting the source code is often neglected, even though everybody who has been programming for longer than one or two weeks knows the value of good comments. Comments allow a fast overview and understanding of the code. Where there are multiple solutions, they can inform the programmer or his successors, maybe years after the code has been written, why a certain solution was preferred over another one. Comments are crucial to sustain good maintainability of source code. Maintenance can be nearly impossible if comments are outdated (e.g., not adapted after source code changes) or do not exist at all.

Given this situation, it seems interesting to know how this ambivalent relation between programmers and comments (loved to have, but not as much to write) influences actual projects. Are there often bugs due to outdated or missing comments and documentation (e.g., wrong use of a changed interface)? How much time is consumed by trying to reengineer source code which, with a few comments, would be easy to understand and maintain? Are there similarities between different projects?

Thanks to today's versioning and bug tracking techniques, used in most software development projects, there is plenty of historical data available. Nowadays there are lots of projects using

    /**
     * Javadoc with tags, documenting a method
     * @param nr the number to extract the root from
     * @return returns the result as a double precision number.
     */
    public double squareroot(double nr){
        //Variable for the result - described in a line comment
        double result;
        /* A block comment:
         * delegation of the actual computing */
        result = Math.sqrt(nr);
        return result;
    }

Listing 1.1: Different comment types.

advanced systems for versioning (e.g., Concurrent Versioning System (CVS)¹, Subversion², or Rational ClearCase³) or bug tracking (e.g., Bugzilla⁴ or Mantis⁵). Especially in the case of versioning, most of this data is rarely used because it is only seen as an insurance for the programmers. Yet there is much more potential in such databases. It would be interesting to use this memory of existing software systems. With it, it could become possible to make predictions on bug-prone parts of a system, or to give advice to developers about how to write, handle and maintain comments as simply as possible.

The first step in using such historical data to make statements on the usage of source code comments and documentation in software development projects is to extract the source comments and documentation from the source code. The next steps are to track them over time and to find out whether and when changes happen. These first steps are the goal of our work.

1.2 Envisioned outcome

In this diploma thesis we aim at an extension of the CHANGEDISTILLER plugin [FG06] for Eclipse⁶. We want to be able to track the changes of source comments, i.e., to know whether and when these changes are made. For example, are source code changes often made together with comment changes? Are the comments changed in the following revision, or maybe never at all? Source comments here include Javadoc API⁷ documentation as well as single line and block comments (e.g., see Listing 1.1). We aim at extracting the source comments and mapping them to their corresponding source code nodes retrieved by CHANGEDISTILLER.

Finally we aim at evaluating our work in a case study on two Java projects. The projects we are going to analyze are the Eclipse Java Development Tools Core Component (JDT Core)⁸ with around 1.4 thousand classes and 37 thousand revisions and the Azureus BitTorrent Client⁹ with about 2.8 thousand classes and 26 thousand revisions.

1.3 Structure

The remainder of the thesis is structured as follows: In Chapter 2 we present other work related to source code to documentation traceability. Chapter 3 introduces our approach to find changes in source comments. Furthermore, we show how we bind the source comments to their related source code and track them over the lifetime of a file. Finally we discuss the analysis and the interpretation of the results. In Chapter 4 we present details on several specific points of the implementation. Chapter 5 shows the validation and verification we have done. Then the evaluation process is introduced and the results of the evaluation are presented. In Chapter 6 conclusions are drawn and we show some ideas for future work.

¹ Concurrent Versioning System (CVS), last visited on May 4
² Subversion, last visited on May 4
³ Rational ClearCase, last visited on May 4
⁴ Bugzilla, last visited on May 4
⁵ Mantis, last visited on May 4
⁶ Eclipse, last visited on May 2
⁷ API: Application Programming Interface
⁸ Java Development Tools Core Component (JDT Core), last visited on May 2
⁹ Azureus BitTorrent Client, http://azureus.sourceforge.net/, last visited on May 2, 2007


Chapter 2
Related Work

In this chapter we look at related work in the field of comment change detection and comment to source code matching. We discuss the differences from our approach and the applicability of that work to it.

2.1 Comment change detection

We have found no work on change detection whose main focus is the detection of source comment changes. There is a lot of work on change detection, but only on the detection of source code changes. Changes of source comments are not treated; normally they are even explicitly excluded.

2.2 Comment to source code matching

In the field of source code to comment matching there is hardly any work available either. We found no work on the matching of source comments to source code. As in the field of source code change detection, comments are in general explicitly eliminated. However, there are papers on the matching of source code to free text documentation that is external to the code. The major part of these papers introduces techniques for linking source code to documentation during the development of a system. These techniques are mostly unsuited or even not applicable for recovering links in existing systems. They are usually based on certain grammars that the programmers have to use during development to establish and trace the couplings. The following subsections present two different approaches that are able to retrieve source code to documentation links for existing systems.

Vector space based Information Retrieval model

In their papers Recovering traceability links between code and documentation [ACC+02] and Information Retrieval Models for Recovering Traceability Links between Code and Documentation [ACCL00], G. Antoniol et al. introduce a method to recover traceability links based on vector space Information Retrieval (IR). A precondition of this method is that programmers use meaningful names in their source code.
The whole algorithm works as follows (see Figure 2.1 for a graphical overview):

[Figure 2.1: Process to recover the traceability. Elements shown: code extern documentation, text normalisation, indexer (upper path); source code, query extraction, indexer (lower path); document classifier; scored document list.]

1. In a first step the source code and the documentation are prepared:
   Code extern documentation (upper path in the figure): The whole documentation is normalized (i.e., all letters are converted to lower case; stop words such as articles, as well as punctuation, numbers, etc. are removed; and all plurals are converted into singular and all inflected verbs into the infinitive form).
   Source code (lower path in the figure): For each source code class a query is built. Identifiers consisting of several words are decomposed (e.g., printStackTrace into print, Stack, and Trace). Finally the words are normalized (same steps as for the code extern documentation).
2. The indexer builds an index, using a vocabulary built from the documentation (respectively the source code) itself.
3. Finally the document classifier computes the similarity between documentation and queries. As a result, a scored list of documents is returned for each class.

Recovering traceability links using Latent Semantic Indexing

This idea, introduced by Andrian Marcus and Jonathan I. Maletic in Recovering documentation-to-source-code traceability links using latent semantic indexing [MM03], is one of the few that are also applicable to existing (e.g., legacy) systems. In this approach the links between the code extern (prose) documentation and the source code are established using Latent Semantic Indexing (for a brief introduction see the next subsection). Marcus and Maletic use the source comments and the identifier names (e.g., method or field names) to produce semantics. These semantics are used to construct traceability links to the documentation external to the source code. Of course this only delivers useful results if the naming is meaningful in the source code components as well as in the documentation.
Many approaches use a predefined vocabulary (mostly for statistical analysis), which necessitates expensive preprocessing and complex string manipulations. Because this solution is based on Latent Semantic Indexing, it uses no such vocabulary. Marcus and Maletic state that their solution is faster than others because it saves the preprocessing and string manipulation.

Latent Semantic Indexing in a nutshell

Latent Semantic Indexing (LSI) was introduced by Scott C. Deerwester et al. in the article Indexing by Latent Semantic Analysis [DDL+90]. LSI can be used to identify main components (or concepts) of huge amounts of data. If, for example, the general expression Ship is identified, the expressions Boat, Cutter, or Shallop are covered as well. LSI can also help to identify places where Ship is mentioned but is not a relevant match (e.g., a contest where a boat trip can be won).

LSI works based on mathematical matrix operations. First a Term-Document-Matrix is built. In that matrix the number of occurrences of each term (i.e., word) per document is saved. In Table 2.1, for example, the term amazing appears three times in document 2, once in document 5, and never in documents 1, 3, 4, and 6. If necessary the table can also be weighted.

            doc 1   doc 2   doc 3   doc 4   doc 5   doc 6
this
term
is
amazing       0       3       0       0       1       0

Table 2.1: Example for a Term-Document-Matrix

In a next step, the singular values of the matrix $X$ are computed (singular-value decomposition (SVD), see the formula below). The resulting matrices $T$ and $D$ have orthonormal columns and $S$ is diagonal. Then, by omitting the lowest singular values, the dimension can be reduced down to a limit $k$ that is not fixed in advance (the reduced matrices are $S_k$ and, analogously, $U_k$). Finally queries $q$ can be transformed to the semantic space (they are seen as special documents of size $(m \times 1)$).

SVD: $X = T\,S\,D^{T}$
Query: $Q = q^{T}\,U_k\,\mathrm{diag}(S_k)$

After transforming all the documents to the semantic space (same procedure as transforming $q$), $q$ can be compared to the documents by using the inner product or the cosine similarity. The advantage of LSI is that it solves the synonym problem. The disadvantages are its rather poor treatment of polysemy (i.e., one word having different meanings) and the relatively high amount of computing power needed.

2.3 Applicability for our approach

Notwithstanding the apparent similarity of matching source code to source comments or to documentation external to the source, we found these tasks to be not as exchangeable as they seem.
Normally documentation external to the source code is much more detailed and contains more labeled connections to the source code than comments between the source code. By labeled connections we mean that there has to be a description of which part of the source code is documented. Connections between source comments and the source code, on the other hand, are defined mainly through their proximity. A source comment does not have to explain in words to which source code node it refers. So the position of a source comment inside the document is an important, not to say the most important, part of the information. It is crucial for recovering the link to the source code.

Based on these two points, we think that these approaches are not applicable in a first step, but can be interesting in the future, when it comes to optimizing the results and recognizing special types of source code to source comment couplings. What we doubt is that these advanced techniques are useful in a first phase. Due to the mentioned differences between the problems it is not possible to use the same algorithms (i.e., without major changes) to find source code to source comment couplings.
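The vector space idea described in this chapter (decompose identifiers, normalize the words, count term occurrences, compare by cosine similarity) can be illustrated with a minimal Java sketch. It is not the implementation of Antoniol et al. or of Marcus and Maletic: stop-word removal, stemming and the SVD step of LSI are left out, and the class and method names (VectorSpaceSketch, tokenize, termFrequencies, cosine) are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class VectorSpaceSketch {

    /** Splits text and camelCase identifiers (e.g. printStackTrace) into lower-cased words. */
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        // Insert a space at every lower-case/upper-case boundary, then split on non-letters.
        String spaced = text.replaceAll("([a-z])([A-Z])", "$1 $2");
        for (String word : spaced.split("[^A-Za-z]+")) {
            if (!word.isEmpty()) {
                tokens.add(word.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    /** Term frequencies of one text: one column of a term-document matrix. */
    public static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : tokenize(text)) {
            tf.merge(token, 1, Integer::sum);
        }
        return tf;
    }

    /** Cosine similarity of two term-frequency vectors; 0.0 if one of them is empty. */
    public static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map.Entry<String, Integer> entry : a.entrySet()) {
            normA += entry.getValue() * entry.getValue();
            Integer other = b.get(entry.getKey());
            if (other != null) {
                dot += entry.getValue() * other;
            }
        }
        for (int value : b.values()) {
            normB += value * value;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // The query stands in for a class, the two strings for documentation fragments.
        Map<String, Integer> query = termFrequencies("printStackTrace");
        Map<String, Integer> doc1 = termFrequencies("Print the stack trace of the exception");
        Map<String, Integer> doc2 = termFrequencies("Open a network connection");
        System.out.println("doc1: " + cosine(query, doc1));
        System.out.println("doc2: " + cosine(query, doc2));
    }
}
```

The sketch also shows why the precondition of meaningful names matters: the query shares terms only with the first document, so only that document gets a non-zero score.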

Chapter 3
Comments Adaptation

In this chapter we present the tools and algorithms we use to extract and analyze the source code to comment couplings.

3.1 Outline of process

Here we give an overview of the processes introduced in this chapter. Furthermore, we explain the terminology used.

Process overview

Simple numeric analysis of change couplings

In the first part we show a relatively simple tool to perform a quantitative comparison. This comparison is able to provide hints whether source code changes are coupled to changes in the source comments or not.

[Figure 3.1: Outline of the procedure for our simple numeric analysis. Elements shown: revisions of a class, comments of a revision, GNU diff, number of comment changes, data base, number of source code changes, add up occurrences.]

In Figure 3.1 this simple analysis is depicted. First we get all available revisions of the class we want to analyze. From the revisions we extract only the comments. Then each pair of consecutive comment extracts is compared and the number of changes is summed. The numbers of source code changes for that revision are fetched from the data base, and the resulting numbers are saved for further analysis.
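The comparison step of this numeric analysis can be sketched in Java as follows. The sketch assumes that the comment extracts of two consecutive revisions were compared with GNU diff in its normal output format, where every hunk starts with a command line such as 3a4, 5,6d4 or 7,8c9. Counting one insert, delete or update per hunk is an assumption made for this illustration; the class name DiffChangeCounter is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DiffChangeCounter {

    // Hunk command of GNU diff's normal format, e.g. "3a4", "5,6d4", "7,8c9,10".
    private static final Pattern HUNK =
            Pattern.compile("^\\d+(?:,\\d+)?([acd])\\d+(?:,\\d+)?$");

    /** Returns {inserts, deletes, updates}, counting one change per diff hunk. */
    public static int[] count(String diffOutput) {
        int[] counts = new int[3];
        for (String line : diffOutput.split("\\R")) {
            Matcher matcher = HUNK.matcher(line);
            if (!matcher.matches()) {
                continue; // content lines start with '<' or '>', separators are "---"
            }
            switch (matcher.group(1).charAt(0)) {
                case 'a': counts[0]++; break; // lines added: insert
                case 'd': counts[1]++; break; // lines deleted: delete
                default:  counts[2]++; break; // lines changed: update
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String diff = String.join("\n",
                "2a3",
                "> // new comment",
                "5d5",
                "< // obsolete comment",
                "8,9c8",
                "< // first half",
                "< // second half",
                "---",
                "> // merged comment");
        int[] counts = count(diff);
        System.out.println("inserts=" + counts[0]
                + " deletes=" + counts[1] + " updates=" + counts[2]);
    }
}
```

Summing these triples over all pairs of consecutive revisions gives the comment side of the comparison; the source code side comes from the data base.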

    public static void main(String[] args){
        int i = 0;
        //output
        System.out.println("i is " + i);
    }

Listing 3.1: Preceding, current and succeeding nodes.

In depth analysis process

For the in depth analysis more work has to be done. For a graphical overview see Figure 3.2. First the source code of all revisions of a file is fetched from the data base. Then all the comments are extracted for each revision. Several line comments have to be combined into one whenever they comment the same piece of source code. When this is done, the comments have to be mapped to their corresponding source code. After that these mappings can be tracked over multiple revisions, and the comment changes as well as the changes in their related source code can be analyzed. To do this, we take advantage of the source code changes stored in the database. At the end we write the results to a file for further analysis.

[Figure 3.2: The outline of the steps performed during an in depth analysis. The colors are used to highlight the different chapters. Elements shown: data base, revisions of a class (source code), comment extraction and block building for line comments, mapping source code nodes to source comments, tracking comment nodes over multiple revisions, source code change operations, result extraction.]

Terminology

Here we introduce several important expressions used in this and the following chapters.

Node, source node, source code node and source comment node. A source comment node is one comment, e.g., a block of Javadoc, a block comment, or a line comment. A source code node is, with few exceptions, a line of source code. The expressions node and source node include source code nodes as well as source comment nodes. We talk about nodes because we are working with Abstract Syntax Trees (AST). Their nodes are the source code or comment elements we analyze.

Preceding node, current node, and succeeding node. The current node is the one we are analyzing at the moment.
When talking about preceding (respectively succeeding) nodes, we mean the nodes immediately before (respectively after) the current node. In the example in Listing 3.1 the current comment node //output has the variable declaration for i as preceding code and the method invocation System.out.println(...) as succeeding code. The same expressions are used for whole revisions of a class: e.g., revision 1.34 is the preceding revision of 1.35, which in turn is succeeded by the following revision.

    //This line comments, as well as
    //this one, the following source code
    System.out.println("Success!");

Listing 3.2: A block, consisting of two line comments.

Track of nodes. A track of nodes, either source code or comment, is the sequence of one specific node, tracked over all revisions of a class.

Change type insert, delete, and update. We define the three operations insert, delete, and update of source code as change types. A source code insert is a source code part that has been added in the current revision; analogously, a delete is a source code part that has been removed, and an update is a source code part that has been changed.

Corresponding node or related node. The corresponding (or related) source code node for a certain comment node is the source code described by the comment. Analogously, the corresponding comment of a source code node is the comment describing that source code node.

Common change. A common change is a change of a source comment that happens at the same time, i.e., in the same revision, as a change of the corresponding source code node. Common changes can be split into two different types:
1. common changes of the same change type, e.g., code and comment are both updated.
2. common changes of a different type, e.g., the code has been updated while the comment has been deleted.

Comment block or block of comments. These expressions signify blocks of different comment types (not the same as a block comment). A comment block can be either a single Javadoc or block comment, or one or more line comments combined to a block. In Listing 3.2 we can see one block, consisting of two line comments.
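How line comments could be combined into comment blocks can be sketched in a few lines of Java. The rule used here, that line comments on directly consecutive source lines form one block, is an assumption for illustration only; the criteria actually used are the subject of Section 4.2. The types and names (LineCommentBlocks, LineComment, buildBlocks) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class LineCommentBlocks {

    /** A line comment together with the 1-based source line it appears on. */
    public static final class LineComment {
        public final int line;
        public final String text;

        public LineComment(int line, String text) {
            this.line = line;
            this.text = text;
        }
    }

    /**
     * Groups line comments into blocks. Assumes the input is sorted by line
     * number; comments on directly consecutive lines end up in the same block.
     */
    public static List<List<LineComment>> buildBlocks(List<LineComment> comments) {
        List<List<LineComment>> blocks = new ArrayList<>();
        List<LineComment> current = new ArrayList<>();
        for (LineComment comment : comments) {
            boolean gap = !current.isEmpty()
                    && comment.line != current.get(current.size() - 1).line + 1;
            if (gap) {
                blocks.add(current);
                current = new ArrayList<>();
            }
            current.add(comment);
        }
        if (!current.isEmpty()) {
            blocks.add(current);
        }
        return blocks;
    }

    public static void main(String[] args) {
        List<LineComment> comments = new ArrayList<>();
        comments.add(new LineComment(3, "//This line comments, as well as"));
        comments.add(new LineComment(4, "//this one, the following source code"));
        comments.add(new LineComment(7, "//output"));
        // Lines 3 and 4 form one block, line 7 a second one.
        System.out.println(buildBlocks(comments).size() + " blocks");
    }
}
```

With the two line comments of Listing 3.2 on consecutive lines, the sketch returns a single block for them, which matches the definition of a comment block given above.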
3.2 Numeric analysis of source code to source comment change couplings

In a first step we wanted to see, on a simple numeric basis, whether we can find clues that the adaptation of comments is related to changing source code. This is a first analysis of the relation between code and comments, and it is independent of the in depth analysis presented in the next sections. The results of this analysis can be used as a complement to the specific results, but the two computations do not rely on each other.

Computing the number of changes

For this analysis we implemented an Eclipse plugin. The plugin uses the database created by EVOLIZER¹ and CHANGEDISTILLER to get information on source code changes. For each revision of each class (i.e., .java file) we get the number of changes relative to the preceding revision, i.e., the number of insert, delete, and update operations. These numbers are compared to the numbers of inserts, deletes, and updates in the source comments. For implementation details on the computation of the number of comment changes see Section 4.1. As a result we get, for each revision of a class, the total number of inserts, deletes, and updates for all source code nodes as well as for all comment nodes.

Retrieving the results

A simple comparison of the computed total numbers does not deliver a useful result. The numbers themselves, as well as percentages, are not significant, because the relation between them is missing. Assume, for example, that there are three inserts in the source code but only one in the comments, so there are three times as many source code inserts as comment inserts. But what does that mean, and how can it be interpreted? It can mean that two source code inserts are uncommented, but it can just as well be one source comment commenting all three source code inserts. We have nothing to base a decision on.

We decided to do a comparison on a quantity basis. We rank the three change types by the frequency of their appearance: e.g., assuming three source code inserts, one delete, and five updates, the ranking is updates (5) before inserts (3) and deletes (1). This is done for the number of source code changes as well as for the number of source comment changes. Having computed the rankings, we can compare the order of the ranks. This allows us to make statements such as: in the source code as well as in the comments there were more updates than inserts or deletes.
This provides a hint whether there are common changes of source code and comments in a revision or not. The underlying assumption is that a high number of source code and comment changes of the same type makes it more probable that there are common changes in a revision. See Table 3.1 for an example output. We used the class org.argouml.kernel.Project from the ArgoUML Project² for the example.

Rating the computed rankings

As we can see in the example in Table 3.1, each revision gets a rating of either 1, 0, or -1. The list after this paragraph explains the different ratings. In each case the expected statistical probability is given at the end, because in a random distribution not every rating is equally probable. The probability is calculated by dividing the number of recognized cases by the number of all possible cases (see Figure 3.3 for more details).

A further distinction covers a special case: three identical values are treated differently, based on their value. When the total number of changes (for source code or comments) is zero, this is rated with a -1, otherwise with a 1. This fact is not reflected in our probability measure, thus the expected probability for a -1 is rated slightly too high and the one for a 1 accordingly too low. To take account of this special case, we reduce the probability for a rating of -1 by 3% and increase the one for a rating of 1 by the same amount of percentage points. The value of 3% is based on a number of tests where we compared how often the two cases really appear. Normally three identical values only occur when no changes to the source (code or comment) were applied, i.e., in 97% of the occurrences there were no changes.

¹ EVOLIZER, last visited on May 4
² ArgoUML, last visited on May 2, 2007

[Figure 3.3: Overview over the rating of the ranking pairs. The figure shows the mappings between two rankings (91 possible mappings); a matching results in a rating as indicated above the graphs. In each ranking the leftmost number indicates the change type which occurred most, the rightmost the least frequent one (1 = insert, 2 = delete, 3 = update and 0 = two or all three types were equally frequent).]

[Table 3.1 columns: revision | insert code | delete code | update code | insert comment | delete comment | update comment | rating]

Table 3.1: The table shows the number of source code and comment changes. The rating in the last column is 1 if there is a coupling between the number of source code and comment changes, 0 if there is no coupling, and -1 if the numbers cannot be rated.

A -1 is rated only if there were no changes, neither in the source code nor in the comments. Because there cannot be any common changes when there are no changes at all, such revisions are not rated. The expected probability is 13 out of 91 cases, thus 13/91 ≈ 0.1429 or 14.29%. If we reduce this value by 3% (to accommodate the distinction between zero and non-zero values), this results in a reduction of 0.43 percentage points and in a probability of 0.1386 or 13.86%.

There are two conditions that are rated with a 0:

1. The most frequent change type in the source code is not the same as the most frequent comment change type, and neither the first place of the source code nor that of the comment is shared. A shared first place means that the most frequent and the second most frequent change type have the same number of changes. Table 3.1 contains an example of such a decision: most changes in the source code were deletes, but most changes in the source comment were updates.
2. The most frequent change type of either source code or comment is the least frequent change type of the other, and these places are not shared first or third places, respectively, i.e., they do not have the same number of changes as the second most frequent change type. Table 3.1 also contains an example of this condition: the most frequent comment change type is delete, which is the least frequent source code change type.

36 of the 91 cases lead to a rating of 0. Accordingly, the probability is 36/91 ≈ 0.3956 or 39.56%.

All cases not yet mentioned result in a rating of 1. These are all the cases where the most frequent change types match, as well as most cases where two change types with the same number of changes are involved. The expected probability is 42/91 ≈ 0.4615 or 46.15% (42 out of the 91 cases). This value has to be increased by the 0.43 percentage points subtracted in the -1 case, resulting in a total of 0.4658 or 46.58%.
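The arithmetic behind the three expected percentages can be retraced with a few lines of code. This is only a sketch of the calculation described above; the class, constant and method names are illustrative and not part of the plugin.

```java
import java.util.Locale;

class ExpectedRatings {

    static final int TOTAL = 91;         // all possible mappings
    static final int NOT_RATEABLE = 13;  // rating -1
    static final int NO_MATCH = 36;      // rating 0
    static final int MATCH = 42;         // rating 1
    static final double REDUCTION = 0.03; // the 3% correction applied to the -1 case

    /** Share of the given number of cases among all mappings, in percent. */
    static double percent(int cases) {
        return 100.0 * cases / TOTAL;
    }

    /** Expected percentages in the order: match, no match, not rateable. */
    static double[] expectedPercentages() {
        // Percentage points moved from the -1 case to the 1 case.
        double shift = percent(NOT_RATEABLE) * REDUCTION;
        return new double[] {
            percent(MATCH) + shift,
            percent(NO_MATCH),
            percent(NOT_RATEABLE) - shift
        };
    }

    static String format(double value) {
        return String.format(Locale.ROOT, "%.2f%%", value);
    }

    public static void main(String[] args) {
        double[] expected = expectedPercentages();
        System.out.println("does match:      " + format(expected[0]));
        System.out.println("does not match:  " + format(expected[1]));
        System.out.println("is not rateable: " + format(expected[2]));
    }
}
```

The three values come out as 46.58%, 39.56% and 13.86%, the expected percentages used for comparison with the measured occurrences.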

As the last part of this section, we present the results we obtain when using our plugin to analyze our example class (i.e., ArgoUML's Project.java). We do not go into detail here, but only show Table 3.2. In the case studies (see Chapter 5), we give a detailed analysis and explanation of similar tables.

Ranking                    | does match | does not match | is not rateable | total
number of occurrences      |            |                |                 |
percentage of occurrences  | 32.75%     | 17.54%         | 49.71%          |
expected percentage        | 46.58%     | 39.56%         | 13.86%          |
occurrence / expectation   |            |                |                 |

Table 3.2: Results when analyzing the ArgoUML class Project. There are 172 revisions of the file (only 171 are analyzed because revision 1.1, as the first revision, has no preceding revision to compare with).

3.3 Comment extraction and block building for line comments

We use the ASTParser of Eclipse to extract the source comment nodes. The parser builds an Abstract Syntax Tree (AST) from the source code. Unfortunately, this AST contains only the Javadoc comments. Line and block comments are excluded because they lack a distinct link to the source code: the parser cannot be sure where to put these comments in the tree. Nevertheless, the comments are returned. The parser generates a List containing all the comments, including the Javadoc comments that are already contained in the AST. While Javadoc and block comments can be taken as they are, line comments have to be combined, where necessary, into a block of line comments. For more details see Section 4.2. Remember that we use the expressions comment block or block of comments (but not block comment) for a single Javadoc comment, a single block comment, or one or more line comments combined into a block.

3.4 Mapping source code nodes to source comments

Finding the corresponding source code node for a certain comment block is in most cases a trivial task for humans, but hard for a computer. Matching is difficult as long as the computer only does syntactical and no semantic matching, i.e., it matches formal things but no abstract things, e.g., words but not their meaning. This is probably one of the main reasons why comments are explicitly excluded in most of the work in this field (see Chapter 2). First we show the different possibilities for source comments and their mapping to source code nodes, i.e., the patterns to recognize. Then we present our matching algorithm, and finally we show the cases that are recognized by our algorithm and the ones that are not.

Types of source comment to source code matchings

All examples used in this section are taken from different revisions of ArgoUML's file Project.java (class org.argouml.kernel.Project).

One source comment to one source code node

In the majority of cases one source comment block can be matched to one source code node.

    /**
     * True if we are in the proces of making a project, otherwise false
     */
    private static boolean _creatingProject;

Listing 3.3: One comment for one source code statement. Project.java Revision 1.44

Normally the source code succeeds the comment. This case can be expected as the default. See Listing 3.3 for an example.

Multiple source comments to one source code node

There are also cases with several source comments for one source code node. Here it has to be decided whether all the comments match the succeeding source code node, or whether some of them match the preceding node. See Listing 3.4 for an example.

    /**
     * Moves some object to trash. This mechanisme must be rethought since
     * it only deletes an object completely from the project
     * @param obj The object to be deleted
     * @see org.argouml.kernel.Project#trashInternal
     */
    ////////////////////////////////////////////////////////////////
    // trash related methos
    //Attention: whole Trash mechanism should be rethought concerning nsuml
    public void moveToTrash(Object obj) {
        ...
    }

Listing 3.4: Three comment blocks describing one method. Project.java Revision 1.44

One source comment to multiple source code nodes

Just as there can be multiple comments for one source code node, it is also possible to have one source comment for multiple source code nodes. This can be, for example, a comment stating that constant declarations follow, as we can see in Listing 3.5. This is not trivial to recognize. The algorithm would have to understand the semantics of the source code to find such occurrences, especially for comments that are not very detailed (like the simple "constants" in the example).

    // constants
    public static final String SEPARATOR = "/";
    public final static String FILE_EXT = ".argo";
    public final static String TEMPLATES = "/org/argouml/templates/";

Listing 3.5: One line comment describing several source code nodes. Project.java Revision 1.1

Source comment succeeding a source code node

There can also be a source comment succeeding the corresponding source code. This comment can be either on the same line as the source code (as, e.g., in Listing 3.6), or on a succeeding line (e.g., see the block comment in Listing 3.7). This can be handled analogously to the occurrences of one comment succeeded by one source code node.

    class ResetStatsLater implements Runnable {
        public void run() {
            Project.resetStats();
        }
    } /* end class ResetStatsLater */

Listing 3.6: Comment succeeding the source code, but still on the same line. Project.java Revision 1.1

Source comment linking between two source code nodes

As a last type of matching there are source comments linking between two source code nodes. In Listing 3.7 we can see an example of a line comment that first comments on the decision taken in the preceding while loop and then states what is going to happen in the succeeding method invocation. As already seen for the mapping of one comment to several source code nodes, this is undecidable without semantic checking.

    while (iter.hasNext()) {
        currentMember = iter.next();
        if (currentMember instanceof ProjectMemberTodoList) {
            /* No need to have several of these */
            return;
        }
    }
    // got past the veto, add the member
    _members.addElement(pm);

Listing 3.7: A line comment, linking between the preceding and the succeeding source code. Project.java Revision 1.97

Preparing the data

Some preliminary work has to be done before the actual matching and mapping. Besides the obvious extraction of the comments described in Section 3.3, we have to find the candidates for the corresponding source code node of each source comment block. This task can be divided into two subtasks.

1. Generally we consider the comment as corresponding to the source code node preceding or succeeding it. We find these nodes by searching for the source code nodes with the closest start positions to the comment block.
2. In a second step we have to check the possible candidates for their plausibility and adapt them if needed. More details on this can be found in Section 4.3.
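The first subtask, the search for the two candidates, can be sketched as follows. This is a minimal illustration and not the plugin's code: the Node class is a hypothetical stand-in for an AST node that only carries a label and a start offset, and the offsets in the example are made up. The plausibility check of the second subtask is not covered.

```java
import java.util.Arrays;
import java.util.List;

class CandidateFinder {

    /** Hypothetical stand-in for an AST node: a label and a start offset. */
    static final class Node {
        final String label;
        final int start;

        Node(String label, int start) {
            this.label = label;
            this.start = start;
        }
    }

    /** Closest node starting before the comment, or null if there is none. */
    static Node preceding(List<Node> nodes, int commentStart) {
        Node best = null;
        for (Node node : nodes) {
            if (node.start < commentStart && (best == null || node.start > best.start)) {
                best = node;
            }
        }
        return best;
    }

    /** Closest node starting after the comment, or null if there is none. */
    static Node succeeding(List<Node> nodes, int commentStart) {
        Node best = null;
        for (Node node : nodes) {
            if (node.start > commentStart && (best == null || node.start < best.start)) {
                best = node;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up offsets: a method header, a comment inside it, a return statement.
        List<Node> nodes = Arrays.asList(
                new Node("method declaration", 0),
                new Node("return statement", 120));
        int commentStart = 40;
        System.out.println("preceding:  " + preceding(nodes, commentStart).label);
        System.out.println("succeeding: " + succeeding(nodes, commentStart).label);
    }
}
```

Both methods return null when no candidate exists on that side, for instance for a comment at the very beginning or end of a file, so a caller would have to handle that case.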

The source comment to source code mapping algorithm

Here we present the algorithm we use to decide which of the candidates found in the previous section is the corresponding source code node for a certain comment. The algorithm works with a rating system: conditions grant points to the candidates. Points are granted for the following:

0.5 points for the succeeding node. The source code node succeeding the source comment gets half a point. The reason is that a succeeding source code node is more probable than a preceding one, because comments normally precede the corresponding source code. We use only 0.5 instead of 1 point because this rule is only intended to decide an otherwise drawn rating, not to influence the result in other situations. Thus, if both candidates are equally probable, the succeeding node is taken as the corresponding node.

1 point if on the same line. If the source comment is on the same line as a source code node, one point is granted. Comments sharing a line with source code normally comment that code.

    int iter = 0; //Iterator for while loop

1 point if on the preceding or succeeding line. Source code nodes on the line immediately preceding or succeeding the source comment are granted a point. By immediately we mean that there is not more than one line break between the nodes. The reason is that comments are normally in direct proximity to their corresponding source code node, whereas there is normally more free space between a source code node and a comment related to another source code node.

1 point for each matching word. Each word appearing in the source comment as well as in the source code node grants a point. For example, the following listing grants two points, one for the int and one for the iter appearing both in the source code node and in the source comment.

    int iter = 0; //The int iterator iter is used for the next loop

After summing up the points for both candidates, three results are conceivable. As explained, a draw is not possible, so two possible results remain: either the preceding or the succeeding source code node has more points than the other. The node with more points wins and is saved as the corresponding source code node for the current comment.

After first tests, we recognized one important weakness of the algorithm: large source code structures always win and are thus seen as the corresponding node. This is because they often contain the source comment itself. Listing 3.8 shows an example comment with a preceding method node and a succeeding return statement.

    public boolean isTrue(String thesis){
        /*
         * magic box:
         * the return value indicates whether the thesis is true or not.
         */
        return secretMagicThesisTester(thesis);
    }

Listing 3.8: The method wins against the conditional

For a reader who understands the semantics, it is obvious that the comment describes what is returned by the return statement. Thus the corresponding node is the return statement. However, the first version of the algorithm comes to a different result. In its analysis, the return statement gets 1.5 points for its position (0.5 points for being the succeeding node and 1 point for only one line break in between) and 2 more points for the matches of return and thesis. This comes to a total of 3.5 points for the return statement. The method gets 1 point for the proximity, as well as 2 points for the return statement it contains (as calculated before). The matching of the header and the comment itself grants 16 additional points (i.e., 1 from the thesis in the header and 15 from the comment itself: 1 for each word plus 2 because "the" appears twice), resulting in a total of 19 points. So the preceding method node clearly wins with its 19 points against the 3.5 points of the return statement, which should win.

To counter those situations, we decided to work only with the header of larger structures. In the example case, we only take public boolean isTrue(String thesis) to calculate the points. This way, the comment itself and the succeeding node (i.e., the return statement) are no longer contained in the preceding node. The result in the example is as follows: the succeeding node, the return statement, still gets its 3.5 points; nothing changes for it. The method, however, is rated differently. It still gets 1 point for the proximity and 1 point for the match of thesis in the header. The method's total is 2 points, which loses against the 3.5 points of the return statement.

Shortcomings of the algorithm

Our algorithm is able to recognize most of the types of source comment to source code matchings described earlier in this section. However, some types are not recognizable. The ones the algorithm recognizes are:

One source comment to one source code node.
Multiple source comments to one source code node.
Source comment succeeding a source code node.

The algorithm does not recognize the following:

One source comment to multiple source code nodes.
Source comment linking between two source code nodes.

The reason we decided not to implement a recognition for them is the high complexity of the problem, as well as the much higher computing power that is needed to get at least some results. To get good recognition results in such situations, an algorithm inevitably has to understand the semantics of the given source code. Only then can it find reliable source comment to source code matchings in real-life environments (see footnote 3). A number of random samples of such comment nodes that are not recognizable by our algorithm showed that the comment normally contains no syntactic clues an algorithm could use to find out whether it describes one node or several nodes. Hence, a syntactical analysis (and analogously a statistical one) can deliver only few results, but needs a lot of computing power because of the large number of different possible solutions.

Footnote 3: By real-life environment we mean the source code of an arbitrary project, i.e., source code that is not explicitly designed to be understood by an algorithm, but is designed for humans.
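The rating system of the mapping algorithm can be illustrated with a small sketch. It is a simplified reading of the rules described above, not the plugin's implementation: the class and method names are made up, the caller is assumed to pass only the header for larger structures, words are assumed to be split at every non-alphanumeric character and compared case-sensitively, and each distinct shared word counts once.

```java
import java.util.HashSet;
import java.util.Set;

class CommentMatcher {

    /** Splits a text into its distinct words (letters, digits, underscores). */
    static Set<String> words(String text) {
        Set<String> result = new HashSet<>();
        for (String word : text.split("[^A-Za-z0-9_]+")) {
            if (!word.isEmpty()) {
                result.add(word);
            }
        }
        return result;
    }

    /**
     * Rates one candidate node for a comment.
     *
     * @param comment      text of the comment block
     * @param codeText     text of the candidate (header only for larger structures)
     * @param lineDistance 0 = same line, 1 = directly adjacent line, larger = further away
     * @param succeeding   true if the candidate follows the comment
     */
    static double rate(String comment, String codeText, int lineDistance, boolean succeeding) {
        // Tie-breaker in favour of the succeeding node.
        double points = succeeding ? 0.5 : 0.0;
        if (lineDistance == 0) {
            points += 1.0; // comment shares a line with the node
        } else if (lineDistance == 1) {
            points += 1.0; // node is on the immediately adjacent line
        }
        // One point per distinct word found in both comment and code.
        Set<String> shared = words(comment);
        shared.retainAll(words(codeText));
        points += shared.size();
        return points;
    }

    public static void main(String[] args) {
        String comment = "/* magic box: the return value indicates "
                + "whether the thesis is true or not. */";
        double method = rate(comment, "public boolean isTrue(String thesis)", 1, false);
        double ret = rate(comment, "return secretMagicThesisTester(thesis);", 1, true);
        System.out.println("method header:    " + method);
        System.out.println("return statement: " + ret);
        System.out.println(ret > method ? "return statement wins" : "method wins");
    }
}
```

For the example of Listing 3.8 this sketch gives 2.0 points for the method header and 3.5 points for the return statement, matching the header-only rating described above. Because only the succeeding node can receive the half point, the two totals can never be equal.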

In detail, recognizing these types would require checking, for each of the m comment nodes, every possible combination of source code nodes. For n source code nodes the number of combinations is:

\[ \binom{n}{0} + \binom{n}{1} + \dots + \binom{n}{n-1} + \binom{n}{n} = \sum_{k=0}^{n} \binom{n}{k} = 2^n \]

So the total runtime is $m \cdot 2^n$, thus exponential, which is really slow. Even if we add the constraint of only allowing connected blocks of code nodes (of which there are $n(n+1)/2$ for each comment node), the runtime is still of polynomial (i.e., cubic) order. By connected code blocks we mean that no gaps of non-corresponding nodes are allowed between corresponding ones.

3.5 Tracking comment nodes over multiple revisions

To find changes in the source comments and in the corresponding source code nodes (especially changes over several versions), it is necessary to track a comment over the whole chain of revisions. In the first subsection we discuss the source comment similarity measure we use to find changes (and their type) between two versions of a comment. In the next subsection we present our tracking algorithm as well as the algorithm that recognizes the changes (and change types) of a comment. Finally, we discuss the evaluation of the comment chains found in order to get meaningful results.

Comment similarity measure

When looking for changes in comment nodes we can distinguish between Javadoc and line or block comments. In contrast to the structureless line and block comments, Javadoc has a syntax, so it can be divided into different tags related to certain parts of, e.g., a method declaration, as in Listing 3.9.

    /**
     * Returns the character found at the specified position in the
     * given String.
     *
     * @param str The String to get the character from.
     * @param index An index, specifying a position within the String.
     * @return The character at the specified position.
     * @throws IndexOutOfBoundsException is thrown if the index is not
     * within the String.
     */
    public char charAt(String str, int i) throws IndexOutOfBoundsException{
        return str.charAt(i);
    }

Listing 3.9: Javadoc example with different tags.

We would like to add that our current analysis uses only a part of the information available. We only analyze whether there has been a change and, if so, what type of change it has been. Details such as which part of a comment node has been changed are not evaluated so far.
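To illustrate what dividing a Javadoc comment into its tags could look like, the following sketch splits a comment into the main description and one section per block tag. It is an illustration only and not part of the analysis described here, which does not evaluate individual parts of a comment. The class and method names are made up, and the sketch assumes that every block tag starts at the beginning of a line with an @ sign.

```java
import java.util.ArrayList;
import java.util.List;

class JavadocSections {

    /** Returns the main description followed by one entry per block tag. */
    static List<String> split(String javadoc) {
        List<String> sections = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String rawLine : javadoc.split("\n")) {
            // Strip the comment delimiters and the leading asterisk.
            String line = rawLine.trim();
            if (line.startsWith("/**")) {
                line = line.substring(3);
            }
            if (line.endsWith("*/")) {
                line = line.substring(0, line.length() - 2);
            }
            line = line.trim();
            if (line.startsWith("*")) {
                line = line.substring(1).trim();
            }
            if (line.isEmpty()) {
                continue;
            }
            if (line.startsWith("@")) {
                // A block tag closes the previous section and opens a new one.
                sections.add(current.toString());
                current = new StringBuilder(line);
            } else {
                // Continuation of the description or of the current tag.
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(line);
            }
        }
        sections.add(current.toString());
        return sections;
    }

    public static void main(String[] args) {
        String javadoc = "/**\n"
                + " * Adds two values.\n"
                + " *\n"
                + " * @param a the first value\n"
                + " * @param b the second value\n"
                + " * @return the sum of both values\n"
                + " */";
        for (String section : split(javadoc)) {
            System.out.println("[" + section + "]");
        }
    }
}
```

The first entry of the returned list is always the main description, which is an empty string if the comment starts directly with a tag. Comparing the sections of two versions of a comment would be one conceivable way to find out which part has changed.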


More information

CS 553 Compiler Construction Fall 2009 Project #1 Adding doubles to MiniJava Due September 8, 2009

CS 553 Compiler Construction Fall 2009 Project #1 Adding doubles to MiniJava Due September 8, 2009 CS 553 Compiler Construction Fall 2009 Project #1 Adding doubles to MiniJava Due September 8, 2009 In this assignment you will extend the MiniJava language and compiler to enable the double data type.

More information

CS101 Introduction to Programming Languages and Compilers

CS101 Introduction to Programming Languages and Compilers CS101 Introduction to Programming Languages and Compilers In this handout we ll examine different types of programming languages and take a brief look at compilers. We ll only hit the major highlights

More information

Lecture 6 Binary Search

Lecture 6 Binary Search Lecture 6 Binary Search 15-122: Principles of Imperative Computation (Spring 2018) Frank Pfenning One of the fundamental and recurring problems in computer science is to find elements in collections, such

More information

CS 2505 Computer Organization I

CS 2505 Computer Organization I Instructions: Print your name in the space provided below. This examination is closed book and closed notes, aside from the permitted one-page formula sheet. No calculators or other computing devices may

More information

C++ Style Guide. 1.0 General. 2.0 Visual Layout. 3.0 Indentation and Whitespace

C++ Style Guide. 1.0 General. 2.0 Visual Layout. 3.0 Indentation and Whitespace C++ Style Guide 1.0 General The purpose of the style guide is not to restrict your programming, but rather to establish a consistent format for your programs. This will help you debug and maintain your

More information

If Statements, For Loops, Functions

If Statements, For Loops, Functions Fundamentals of Programming If Statements, For Loops, Functions Table of Contents Hello World Types of Variables Integers and Floats String Boolean Relational Operators Lists Conditionals If and Else Statements

More information

CS106A, Stanford Handout #30. Coding Style

CS106A, Stanford Handout #30. Coding Style CS106A, Stanford Handout #30 Fall, 2004-05 Nick Parlante Coding Style When writing paper, you can have well-crafted, correctly spelled sentences and create "A" work. Or you can hack out the text in a hurry.

More information

Math Modeling in Java: An S-I Compartment Model

Math Modeling in Java: An S-I Compartment Model 1 Math Modeling in Java: An S-I Compartment Model Basic Concepts What is a compartment model? A compartment model is one in which a population is modeled by treating its members as if they are separated

More information

6.001 Notes: Section 8.1

6.001 Notes: Section 8.1 6.001 Notes: Section 8.1 Slide 8.1.1 In this lecture we are going to introduce a new data type, specifically to deal with symbols. This may sound a bit odd, but if you step back, you may realize that everything

More information

Reviewing for the Midterm Covers chapters 1 to 5, 7 to 9. Instructor: Scott Kristjanson CMPT 125/125 SFU Burnaby, Fall 2013

Reviewing for the Midterm Covers chapters 1 to 5, 7 to 9. Instructor: Scott Kristjanson CMPT 125/125 SFU Burnaby, Fall 2013 Reviewing for the Midterm Covers chapters 1 to 5, 7 to 9 Instructor: Scott Kristjanson CMPT 125/125 SFU Burnaby, Fall 2013 2 Things to Review Review the Class Slides: Key Things to Take Away Do you understand

More information

(Refer Slide Time: 02.06)

(Refer Slide Time: 02.06) Data Structures and Algorithms Dr. Naveen Garg Department of Computer Science and Engineering Indian Institute of Technology, Delhi Lecture 27 Depth First Search (DFS) Today we are going to be talking

More information

Unit-II Programming and Problem Solving (BE1/4 CSE-2)

Unit-II Programming and Problem Solving (BE1/4 CSE-2) Unit-II Programming and Problem Solving (BE1/4 CSE-2) Problem Solving: Algorithm: It is a part of the plan for the computer program. An algorithm is an effective procedure for solving a problem in a finite

More information

6.001 Notes: Section 15.1

6.001 Notes: Section 15.1 6.001 Notes: Section 15.1 Slide 15.1.1 Our goal over the next few lectures is to build an interpreter, which in a very basic sense is the ultimate in programming, since doing so will allow us to define

More information

(Refer Slide Time: 06:01)

(Refer Slide Time: 06:01) Data Structures and Algorithms Dr. Naveen Garg Department of Computer Science and Engineering Indian Institute of Technology, Delhi Lecture 28 Applications of DFS Today we are going to be talking about

More information

Chapter 3. Describing Syntax and Semantics

Chapter 3. Describing Syntax and Semantics Chapter 3 Describing Syntax and Semantics Chapter 3 Topics Introduction The General Problem of Describing Syntax Formal Methods of Describing Syntax Attribute Grammars Describing the Meanings of Programs:

More information

Course Outline. Introduction to java

Course Outline. Introduction to java Course Outline 1. Introduction to OO programming 2. Language Basics Syntax and Semantics 3. Algorithms, stepwise refinements. 4. Quiz/Assignment ( 5. Repetitions (for loops) 6. Writing simple classes 7.

More information

(Refer Slide Time: 00:02:00)

(Refer Slide Time: 00:02:00) Computer Graphics Prof. Sukhendu Das Dept. of Computer Science and Engineering Indian Institute of Technology, Madras Lecture - 18 Polyfill - Scan Conversion of a Polygon Today we will discuss the concepts

More information

Starting to Program in C++ (Basics & I/O)

Starting to Program in C++ (Basics & I/O) Copyright by Bruce A. Draper. 2017, All Rights Reserved. Starting to Program in C++ (Basics & I/O) On Tuesday of this week, we started learning C++ by example. We gave you both the Complex class code and

More information

Automatic Merging of Specification Documents in a Parallel Development Environment

Automatic Merging of Specification Documents in a Parallel Development Environment Automatic Merging of Specification Documents in a Parallel Development Environment Rickard Böttcher Linus Karnland Department of Computer Science Lund University, Faculty of Engineering December 16, 2008

More information

FROM 4D WRITE TO 4D WRITE PRO INTRODUCTION. Presented by: Achim W. Peschke

FROM 4D WRITE TO 4D WRITE PRO INTRODUCTION. Presented by: Achim W. Peschke 4 D S U M M I T 2 0 1 8 FROM 4D WRITE TO 4D WRITE PRO Presented by: Achim W. Peschke INTRODUCTION In this session we will talk to you about the new 4D Write Pro. I think in between everyone knows what

More information

Key Properties for Comparing Modeling Languages and Tools: Usability, Completeness and Scalability

Key Properties for Comparing Modeling Languages and Tools: Usability, Completeness and Scalability Key Properties for Comparing Modeling Languages and Tools: Usability, Completeness and Scalability Timothy C. Lethbridge Department of Electrical Engineering and Computer Science, University of Ottawa

More information

Programming in C++ Prof. Partha Pratim Das Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur

Programming in C++ Prof. Partha Pratim Das Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Programming in C++ Prof. Partha Pratim Das Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture 10 Reference and Pointer Welcome to module 7 of programming in

More information

LibRCPS Manual. Robert Lemmen

LibRCPS Manual. Robert Lemmen LibRCPS Manual Robert Lemmen License librcps version 0.2, February 2008 Copyright c 2004 2008 Robert Lemmen This program is free software; you can redistribute

More information

What do Compilers Produce?

What do Compilers Produce? What do Compilers Produce? Pure Machine Code Compilers may generate code for a particular machine, not assuming any operating system or library routines. This is pure code because it includes nothing beyond

More information

NOTE: Answer ANY FOUR of the following 6 sections:

NOTE: Answer ANY FOUR of the following 6 sections: A-PDF MERGER DEMO Philadelphia University Lecturer: Dr. Nadia Y. Yousif Coordinator: Dr. Nadia Y. Yousif Internal Examiner: Dr. Raad Fadhel Examination Paper... Programming Languages Paradigms (750321)

More information

Database Management System Prof. D. Janakiram Department of Computer Science & Engineering Indian Institute of Technology, Madras Lecture No.

Database Management System Prof. D. Janakiram Department of Computer Science & Engineering Indian Institute of Technology, Madras Lecture No. Database Management System Prof. D. Janakiram Department of Computer Science & Engineering Indian Institute of Technology, Madras Lecture No. # 20 Concurrency Control Part -1 Foundations for concurrency

More information

Expressions and Data Types CSC 121 Fall 2015 Howard Rosenthal

Expressions and Data Types CSC 121 Fall 2015 Howard Rosenthal Expressions and Data Types CSC 121 Fall 2015 Howard Rosenthal Lesson Goals Understand the basic constructs of a Java Program Understand how to use basic identifiers Understand simple Java data types and

More information

The Encoding Complexity of Network Coding

The Encoding Complexity of Network Coding The Encoding Complexity of Network Coding Michael Langberg Alexander Sprintson Jehoshua Bruck California Institute of Technology Email: mikel,spalex,bruck @caltech.edu Abstract In the multicast network

More information

CMPSCI 646, Information Retrieval (Fall 2003)

CMPSCI 646, Information Retrieval (Fall 2003) CMPSCI 646, Information Retrieval (Fall 2003) Midterm exam solutions Problem CO (compression) 1. The problem of text classification can be described as follows. Given a set of classes, C = {C i }, where

More information

Pace University. Fundamental Concepts of CS121 1

Pace University. Fundamental Concepts of CS121 1 Pace University Fundamental Concepts of CS121 1 Dr. Lixin Tao http://csis.pace.edu/~lixin Computer Science Department Pace University October 12, 2005 This document complements my tutorial Introduction

More information

Software Design and Analysis for Engineers

Software Design and Analysis for Engineers Software Design and Analysis for Engineers by Dr. Lesley Shannon Email: lshannon@ensc.sfu.ca Course Website: http://www.ensc.sfu.ca/~lshannon/courses/ensc251 Simon Fraser University Slide Set: 9 Date:

More information

CE221 Programming in C++ Part 1 Introduction

CE221 Programming in C++ Part 1 Introduction CE221 Programming in C++ Part 1 Introduction 06/10/2017 CE221 Part 1 1 Module Schedule There are two lectures (Monday 13.00-13.50 and Tuesday 11.00-11.50) each week in the autumn term, and a 2-hour lab

More information

CS112 Lecture: Primitive Types, Operators, Strings

CS112 Lecture: Primitive Types, Operators, Strings CS112 Lecture: Primitive Types, Operators, Strings Last revised 1/24/06 Objectives: 1. To explain the fundamental distinction between primitive types and reference types, and to introduce the Java primitive

More information

A Case Study on the Similarity Between Source Code and Bug Reports Vocabularies

A Case Study on the Similarity Between Source Code and Bug Reports Vocabularies A Case Study on the Similarity Between Source Code and Bug Reports Vocabularies Diego Cavalcanti 1, Dalton Guerrero 1, Jorge Figueiredo 1 1 Software Practices Laboratory (SPLab) Federal University of Campina

More information

Programming in C++ Prof. Partha Pratim Das Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur

Programming in C++ Prof. Partha Pratim Das Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Programming in C++ Prof. Partha Pratim Das Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture - 31 Static Members Welcome to Module 16 of Programming in C++.

More information

Lesson 1A - First Java Program HELLO WORLD With DEBUGGING examples. By John B. Owen All rights reserved 2011, revised 2015

Lesson 1A - First Java Program HELLO WORLD With DEBUGGING examples. By John B. Owen All rights reserved 2011, revised 2015 Lesson 1A - First Java Program HELLO WORLD With DEBUGGING examples By John B. Owen All rights reserved 2011, revised 2015 Table of Contents Objectives Hello World Lesson Sequence Compile Errors Lexical

More information

(Refer Slide Time: 1:43)

(Refer Slide Time: 1:43) (Refer Slide Time: 1:43) Digital Circuits and Systems Prof. S. Srinivasan Department of Electrical Engineering Indian Institute of Technology, Madras Lecture - 27 Pattern Detector So, we talked about Moore

More information

Summary: Open Questions:

Summary: Open Questions: Summary: The paper proposes an new parallelization technique, which provides dynamic runtime parallelization of loops from binary single-thread programs with minimal architectural change. The realization

More information

Nano-Lisp The Tutorial Handbook

Nano-Lisp The Tutorial Handbook Nano-Lisp The Tutorial Handbook Francis Sergeraert March 4, 2006 1 Various types of Nano-Lisp objects. There are several type notions and in this documentation only the notion of implementation type (itype

More information

Last Time. University of British Columbia CPSC 111, Intro to Computation Alan J. Hu. Readings

Last Time. University of British Columbia CPSC 111, Intro to Computation Alan J. Hu. Readings University of British Columbia CPSC 111, Intro to Computation Alan J. Hu Writing a Simple Java Program Intro to Variables Readings Your textbook is Big Java (3rd Ed). This Week s Reading: Ch 2.1-2.5, Ch

More information

Lecture 2: SML Basics

Lecture 2: SML Basics 15-150 Lecture 2: SML Basics Lecture by Dan Licata January 19, 2012 I d like to start off by talking about someone named Alfred North Whitehead. With someone named Bertrand Russell, Whitehead wrote Principia

More information

STUDENT LESSON A12 Iterations

STUDENT LESSON A12 Iterations STUDENT LESSON A12 Iterations Java Curriculum for AP Computer Science, Student Lesson A12 1 STUDENT LESSON A12 Iterations INTRODUCTION: Solving problems on a computer very often requires a repetition of

More information

Utilities (Part 3) Implementing static features

Utilities (Part 3) Implementing static features Utilities (Part 3) Implementing static features 1 Goals for Today learn about preconditions versus validation introduction to documentation introduction to testing 2 Yahtzee class so far recall our implementation

More information

CSE 413 Final Exam Spring 2011 Sample Solution. Strings of alternating 0 s and 1 s that begin and end with the same character, either 0 or 1.

CSE 413 Final Exam Spring 2011 Sample Solution. Strings of alternating 0 s and 1 s that begin and end with the same character, either 0 or 1. Question 1. (10 points) Regular expressions I. Describe the set of strings generated by each of the following regular expressions. For full credit, give a description of the sets like all sets of strings

More information

Software Testing Prof. Meenakshi D Souza Department of Computer Science and Engineering International Institute of Information Technology, Bangalore

Software Testing Prof. Meenakshi D Souza Department of Computer Science and Engineering International Institute of Information Technology, Bangalore Software Testing Prof. Meenakshi D Souza Department of Computer Science and Engineering International Institute of Information Technology, Bangalore Lecture 04 Software Test Automation: JUnit as an example

More information

Hardware versus software

Hardware versus software Logic 1 Hardware versus software 2 In hardware such as chip design or architecture, designs are usually proven to be correct using proof tools In software, a program is very rarely proved correct Why?

More information

Categorizing Migrations

Categorizing Migrations What to Migrate? Categorizing Migrations A version control repository contains two distinct types of data. The first type of data is the actual content of the directories and files themselves which are

More information

Guideline for the application of COSMIC-FFP for sizing Business applications Software

Guideline for the application of COSMIC-FFP for sizing Business applications Software Abstract: Guideline for the application of COSMIC-FFP for sizing Business applications Software Arlan Lesterhuis (Sogeti Nederland B.V.) arlan.lesterhuis@sogeti.nl The COSMIC-FFP functional sizing method

More information

5 The Control Structure Diagram (CSD)

5 The Control Structure Diagram (CSD) 5 The Control Structure Diagram (CSD) The Control Structure Diagram (CSD) is an algorithmic level diagram intended to improve the comprehensibility of source code by clearly depicting control constructs,

More information