Retaining Comments when Refactoring Code or

Similar documents
Automated Unit Testing A Practitioner's and Teacher's Perspective

C++- und Ruby- und... Refactoring für Eclipse

Retaining Comments when Refactoring Code

C++1z: Concepts-Lite !!! !!! !!!! Advanced Developer Conference C slides:

CUTE GUTs for GOOD Good Unit Tests drive Good OO Design

Semantic Analysis. Lecture 9. February 7, 2018

CS 360 Programming Languages Interpreters

Outline. Introduction Concepts and terminology The case for static typing. Implementing a static type system Basic typing relations Adding context

Semantic Analysis. Outline. The role of semantic analysis in a compiler. Scope. Types. Where we are. The Compiler Front-End

The role of semantic analysis in a compiler

Andy Clement, SpringSource/VMware SpringSource, A division of VMware. All rights reserved

Syntax Errors; Static Semantics

Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan

CS1 Lecture 5 Jan. 26, 2018

Project Compiler. CS031 TA Help Session November 28, 2011

CS 415 Midterm Exam Spring 2002

Dynamic Languages Toolkit. Presented by Andrey Tarantsov

CS152 Programming Language Paradigms Prof. Tom Austin, Fall Syntax & Semantics, and Language Design Criteria

Building Compilers with Phoenix

Semantic Analysis. Compiler Architecture

CSE450. Translation of Programming Languages. Lecture 11: Semantic Analysis: Types & Type Checking

CS1 Lecture 5 Jan. 25, 2019

Chapter 4 Defining Classes I

Decaf Language Reference Manual

Semantic Analysis. Outline. The role of semantic analysis in a compiler. Scope. Types. Where we are. The Compiler so far

C++ Test-driven Development Unit Testing, Code Assistance and Refactoring for Testability

Introduction to Lexical Analysis

CSE 413 Languages & Implementation. Hal Perkins Winter 2019 Structs, Implementing Languages (credits: Dan Grossman, CSE 341)

Introduction to Lexical Analysis

UNIVERSITY OF CALIFORNIA Department of Electrical Engineering and Computer Sciences Computer Science Division. P. N. Hilfinger

Outline. Java Models for variables Types and type checking, type safety Interpretation vs. compilation. Reasoning about code. CSCI 2600 Spring

Introduction to Programming Using Java (98-388)

Why are there so many programming languages? Why do we have programming languages? What is a language for? What makes a language successful?

Lecture 16: Static Semantics Overview 1

Seminar on Languages for Scientific Computing Aachen, 6 Feb Navid Abbaszadeh.

Functional Parsing. Languages for Lunch 10/14/08 James E. Heliotis & Axel T. Schreiner

Anatomy of a Compiler. Overview of Semantic Analysis. The Compiler So Far. Why a Separate Semantic Analysis?

CSE au Final Exam Sample Solution

Course introduction. Advanced Compiler Construction Michel Schinz

What is a compiler? var a var b mov 3 a mov 4 r1 cmpi a r1 jge l_e mov 2 b jmp l_d l_e: mov 3 b l_d: ;done

Static Semantics. Lecture 15. (Notes by P. N. Hilfinger and R. Bodik) 2/29/08 Prof. Hilfinger, CS164 Lecture 15 1

CSE 341: Programming Languages

CMPT 379 Compilers. Parse trees

CS1622. Semantic Analysis. The Compiler So Far. Lecture 15 Semantic Analysis. How to build symbol tables How to use them to find

Functions & First Class Function Values

G Programming Languages Spring 2010 Lecture 4. Robert Grimm, New York University

Index. Comma-separated values (CSV), 30 Conditional expression, 16 Continuous integration (CI), 34 35, 41 Conway s Law, 75 CurrencyPair class,

Syntax-Directed Translation. Lecture 14

CDT C++ Refactorings

Why are there so many programming languages?

ASTs, Objective CAML, and Ocamlyacc

THOMAS LATOZA SWE 621 FALL 2018 DESIGN ECOSYSTEMS

COP 3402 Systems Software. Lecture 4: Compilers. Interpreters

The Compiler So Far. Lexical analysis Detects inputs with illegal tokens. Overview of Semantic Analysis

An Overview to Compiler Design. 2008/2/14 \course\cpeg421-08s\topic-1a.ppt 1


5. Semantic Analysis!

QUIZ. 1. Explain the meaning of the angle brackets in the declaration of v below:

Writing Evaluators MIF08. Laure Gonnord

Exercise ANTLRv4. Patryk Kiepas. March 25, 2017

Interpreters. Prof. Clarkson Fall Today s music: Step by Step by New Kids on the Block

Scoping and Type Checking

Compilers. Prerequisites

CS 3304 Comparative Languages. Lecture 1: Introduction

Programming Languages & Translators PARSING. Baishakhi Ray. Fall These slides are motivated from Prof. Alex Aiken: Compilers (Stanford)

Syntax and Grammars 1 / 21

C++ Modern and Lucid C++ for Professional Programmers

Rust for high level applications. Lise Henry

Introducing Wybe a language for everyone

COMP 202 Java in one week

Python as a First Programming Language Justin Stevens Giselle Serate Davidson Academy of Nevada. March 6th, 2016

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!

Test Driven Development for Eclipse CDT

CS 553 Compiler Construction Fall 2007 Project #1 Adding floats to MiniJava Due August 31, 2005

Abstract Syntax Trees

COMPILER CONSTRUCTION LAB 2 THE SYMBOL TABLE. Tutorial 2 LABS. PHASES OF A COMPILER Source Program. Lab 2 Symbol table

Implementing Actions

Working of the Compilers

Programs as data first-order functional language type checking

CSE 413 Programming Languages & Implementation. Hal Perkins Winter 2019 Grammars, Scanners & Regular Expressions

CMSC 330: Organization of Programming Languages

Automatic Generation of Graph Models for Model Checking

Outline. 1 Scanning Tokens. 2 Regular Expresssions. 3 Finite State Automata

CS 231 Data Structures and Algorithms, Fall 2016

C++ Modern and Lucid C++ for Professional Programmers

What is a compiler? Xiaokang Qiu Purdue University. August 21, 2017 ECE 573

Technical Game Development II IMGD 4000 (D 11) 1. having agents follow pre-set actions rather than choosing them dynamically

What s new in CDT 4.0 and beyond. Doug Schaefer QNX Software Systems CDT Project Lead

RYERSON POLYTECHNIC UNIVERSITY DEPARTMENT OF MATH, PHYSICS, AND COMPUTER SCIENCE CPS 710 FINAL EXAM FALL 96 INSTRUCTIONS

Lexical Analysis 1 / 52

CONTENTS. PART 1 Structured Programming 1. 1 Getting started 3. 2 Basic programming elements 17

Principles of Programming Languages. Lecture Outline

Eclipse and Java 8. Daniel Megert Platform and JDT Lead Eclipse PMC Member IBM Rational Zurich Research Lab

Eclipse Support for Using Eli and Teaching Programming Languages


Software Engineering Principles

Programming Languages. Streams Wrapup, Memoization, Type Systems, and Some Monty Python

Programs as data Parsing cont d; first-order functional language, type checking

(800) Toll Free (804) Fax Introduction to Java and Enterprise Java using Eclipse IDE Duration: 5 days

Object-Relational Mapping

Transcription:

Retaining Comments when Refactoring Code or Why and how we build Refactoring Eclipse plug-ins for several non-java languages Prof. Peter Sommerlad IFS Institute for Software HSR Rapperswil, Switzerland

2 Peter Sommerlad peter.sommerlad@hsr.ch Work Areas o Refactoring tools (C++, Ruby, Python, Groovy, PHP, JavaScript,...) for Eclipse o o o Decremental Development (make SW 10% its size!) modern software engineering Patterns POSA 1 and Security Patterns Background o Diplom-Informatiker Univ. Frankfurt/M o o o Siemens Corporate Research Munich itopia corporate information technology, Zurich (Partner) Professor for Software HSR Rapperswil, Head Institute for Software Credo: People create Software o communication o feedback o courage Experience through Practice o programming is a trade o Patterns encapsulate practical experience Pragmatic Programming o test-driven development o o automated development Simplicity: fight complexity

3 Outline Why Refactoring Tools o and overall challenges Challenges o Comments o Syntax Representation o (Re-)generating Code Potential Solutions o Gathering Comments o Comment Categories o Associating Comments Conclusion Outlook: Decremental Development

4 Why? Refactoring is essential to Agile development o as is test automation Working code is used longer than intended o requires more and more maintenance Pro-active maintenance will prevail o fix and patch degenerates design -> design debt Design improvement is needed o need to pay back design debt! Decremental Development o need to do more with less code

5 Tools we ve created as Eclipse Plug-ins C++ Refactoring o now part of CDT 5.0 (more to come) o plus a unit testing framework and plug-in: CUTE Ruby Refactoring o part of Aptana s IDE and plug-in for DLTK-Ruby Groovy Refactoring o actively participating in ongoing improvement Python Refactoring o parts are integrated into PyDev (Open Source) PHP and JavaScript Refactoring o student projects requiring more work to be usable

6 Reasons for Challenges Some Eclipse language plug-ins started by copying (older versions) of JDT o CDT, JavaScript (JSDT) o obsolete code, structures not fitting the language Focus on editing and navigation not Refactoring o relationship between internal representation and program code only partial Language parsers often derived from compilers o abstract syntax tree is too abstract o relationship to source code is too coarse grained

7 Political Reasons Companies supporting Open Source Eclipsebased IDEs have own commercial versions o which might provide their own Refactoring support (i.e. Zend Studio vs. PHP-Eclipse plug-in, PyDev) o CDT and many companies (WindRiver, Nokia,...) o Resources of Open Source projects need to work on commercial parts Eclipse project members focus on other areas o esp. debugging for CDT, not UT and Refactoring Lack of funding or ongoing interest o e.g., student projects get abandoned Becoming an Eclipse committer is hard

8 Refactoring and Comments Improving the design of existing code without changing its external behavior. Comments do not contribute to external behavior. Removing comments is a valid Refactoring. Loosing comments while Refactoring is not wrong. o but annoying to developers!

9 Compiler s View original code (foo.c) (1) #define mul(a,b) a * b compiler s internal representation lacks detail needed for refactoring (3) int main() { (4) /* Comment1 */ (5) return mul(3 + 4, 5); // Comment2 (6) } after preprocessing Function main statement list foo.c:3 int main() { statement return expression } return 3 + 4 * 5; foo.c:5 constant compilation Abstract Syntax Tree type int value 23

10 IDE s view of the AST (neglecting pre-processor) original code (foo.c) IDE s AST abstracts away things (1)#define mul(a,b) a * b (3)int main() { Function main start end (4) /* Comment1 */ (5) return mul(3 + 4, 5); // Comment2 (6)} statement list foo.c line: 3 col: 0 foo.c line: 6 col: 2 parser statement return expression Abstract Syntax Tree foo.c line:5 col: 5 foo.c line: 5 col: 5 constant foo.c line:5 col: 11 foo.c line: 5 col: 24 positions might be wrong type int value 23

11 Problem: AST too abstract especially in Groovy, equivalent syntax elements map to identical AST structure def a=3,b=4;println a+b*5 semantically identical Groovy code def a = 3 /* use Object instead of def */ Object b = 4 println (a + (b * 5)) DeclarationExpression(a = 3) Left Expr a (Type Object) Right Expr 3 (Type Integer) DeclarationExpression(b = 4) Left Expr b (Type Object) Right Expr 4 (Type Integer) MethodCallExpression Method println Argument BinaryExpression(a + (b * 5)) Left Expr a Operation + Right Expr BinaryExprssion(b * 5) Left Expr b Operation * Right Expr 5 Groovy AST (simplified)

Re-Generating Code When extracting code o new AST nodes o existing AST nodes in new positions with new parents Structure changes o positions can be wrong o indentation? o which syntax? How to keep the comments? int main() { } int a = 5, b = 6; int c = ++a + b++; // calculate return a + b + c; int main() { } int a = 5, b = 6; int c = calculate(a, b); return a + b + c; int calculate(int & a, int & b) { } int c = ++a + b++; return c; extract method:calculate OOPSLA - Practitioner Report 12

13 Eclipse JDT loosing comments: Most Java Refactorings work fine with comments, however, some edge cases lose comments (e.g., extract super class, dragging along a field): class CommentDemo { } //leadingcomment public static final int THE_ANSWER = 42; //trailingcomment //freestandingcomment extract interface keeps the same comments! extract super class public class CommentLost { class CommentDemo extends CommentLost { } public static final int THE_ANSWER = 42; public CommentLost() { super(); } }

14 Comment types Line comments (lead token till EOL) o whole lines o line end comments, after a statement (part) //line comment in Java/C++ # a line comment in another language... Block comments (pair of tokens, nesting?) o not all languages provide these /* a block comment */ Special cases o comments with semantic relationship to the code e.g. JavaDoc like comments

15 Problem: How to get the comments? Scanner/Parser typically skips comments, as well as whitespace (often) o No longer present in token stream (Groovy): /* * This is a whitespace comment */ println "// This is not a comment /* this too */" // but this is a line comment Need to adjust scanner/parser o recognition already present o adjust scanner to collect comments with source positions

16 Where to put the comment information Option 1: (usually doesn t work) o store them within the AST o if the compiler s AST --> performance could suffer o if the IDE s AST --> needs invasive change Option 2: (worked for us) o collect and store them separately with positions o simple, could be easily turned on and off o re-associate with AST nodes later, only when needed Option 3: o ignore all comments and throw them away (not possible now but might be a good idea ;-)

17 Comment s relative posistions: Categories Likelihood of association of comment with an AST element must be determined to move elements with the code class Foo { //leading public: Foo(); virtual ~Foo(); /*trailing*/ /* leading */ void bar(); private: }; int i; //freestanding belongs to Foo() belongs to ~Foo() belongs to bar() belongs to class Foo Example Heuristic

18 Association algorithm For each range covered by an AST node: o check if there are adjacent comments assign leading/trailing comments o recurse to sub-nodes associate them to sub-nodes o associate remaining comments Algorithm relies on comment and node positions o which need to be exact to decide proximity

19 Representing comment association: Invasive First approach was to change AST node base class to keep associate comments o often required change to language s implementation or IDE s core data structures Drawbacks o non-trivial impact on foreign code hard to judge large patches necessary o could slow down parsing in IDE or even execution in stand-alone interpreter/compiler measurements proved that o comments blow up AST s size, even when never needed for Refactoring o no guard against unexpected positions/occurences

20 minimal invasive comment association Inspired by Java s reflection capabilities, dynamic proxy objects were created, wrapping existing AST nodes and added comments and comment handling methods where needed Drawbacks: loss of object identity, overhead o visitor infrastructure could not cope with proxy nodes o no means to navigate from non-proxy AST-node to proxy object o Algorithms failed

21 Separate comment association The ultimate solution is to associate comments with their AST-nodes by using a hash map, where the AST node is the key and a collection of comment-nodes is the value Advantages o association is only populated if really needed o no change of AST data structure required Disadvantage o parser still needs to collect all comments with their positions o AST iteration to associate comments still needed

22 Conclusion Refactoring tools require a more advanced syntax representation than compilers or navigation-only IDEs o exact source positions often a problem Handling comments in a conscious way by associating them with AST elements while refactoring can provide improved developer experience o however, cutting original source sections could be sufficient in many cases Maintaining a separate hash map from AST nodes to associated comments seems to be the best way to keep the information

23 Sidelook: Comments harmful? value(comment) < 0 (Alan Kelly) comments clutter code and can be wrong o Only the code tells the truth P.S. Suggestion: o Take each comment you write as a hint for required refactoring. idea: auto-refactoring when you type // or /* o Document behavior by corresponding unit tests. o Whenever you see a comment, try to refactor the code that the comment can be deleted.

24 Outlook: Decremental Development Software is used longer than intended. We need to change and evolve existing code. Simpler and less code is easier to maintain. Write less code and write simple code! and/or Refactor towards simpler code that is smaller! Decremental Development means reduce code size to 10% while keeping (essential) functionality.

Questions? Have Fun Refactoring! see also Demo on Groovy Refactoring: o Wed 10:15-11:00 in 107 OOPSLA - Practitioner Report 25