Project 1: Scheme Pretty-Printer

Similar documents
Project 2: Scheme Interpreter

SCHEME 7. 1 Introduction. 2 Primitives COMPUTER SCIENCE 61A. October 29, 2015

CS164: Programming Assignment 2 Dlex Lexer Generator and Decaf Lexer

10/4/18. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntactic Analysis

CS 4240: Compilers and Interpreters Project Phase 1: Scanner and Parser Due Date: October 4 th 2015 (11:59 pm) (via T-square)

Fall 2017 Discussion 7: October 25, 2017 Solutions. 1 Introduction. 2 Primitives

Summer 2017 Discussion 10: July 25, Introduction. 2 Primitives and Define

SCHEME 8. 1 Introduction. 2 Primitives COMPUTER SCIENCE 61A. March 23, 2017

10/5/17. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntax Analysis

Intro. Scheme Basics. scm> 5 5. scm>

flex is not a bad tool to use for doing modest text transformations and for programs that collect statistics on input.

Grammars and Parsing, second week

ML 4 A Lexer for OCaml s Type System

CS 426 Fall Machine Problem 1. Machine Problem 1. CS 426 Compiler Construction Fall Semester 2017

CSE 413 Languages & Implementation. Hal Perkins Winter 2019 Structs, Implementing Languages (credits: Dan Grossman, CSE 341)

CS 6353 Compiler Construction Project Assignments

MP 3 A Lexer for MiniJava

MP 3 A Lexer for MiniJava

SCHEME The Scheme Interpreter. 2 Primitives COMPUTER SCIENCE 61A. October 29th, 2012

A simple syntax-directed

Spring 2018 Discussion 7: March 21, Introduction. 2 Primitives

CS 6353 Compiler Construction Project Assignments

EECS 311: Data Structures and Data Management Program 1 Assigned: 10/21/10 Checkpoint: 11/2/10; Due: 11/9/10

Lexical and Syntax Analysis

Lexical Analysis. Lexical analysis is the first phase of compilation: The file is converted from ASCII to tokens. It must be fast!

CSE P 501 Compilers. Implementing ASTs (in Java) Hal Perkins Autumn /20/ Hal Perkins & UW CSE H-1

UNIVERSITY OF CALIFORNIA Department of Electrical Engineering and Computer Sciences Computer Science Division

Fall 2018 Discussion 8: October 24, 2018 Solutions. 1 Introduction. 2 Primitives

Lecture 09: Data Abstraction ++ Parsing is the process of translating a sequence of characters (a string) into an abstract syntax tree.

Today. Assignments. Lecture Notes CPSC 326 (Spring 2019) Quiz 2. Lexer design. Syntax Analysis: Context-Free Grammars. HW2 (out, due Tues)

CS164: Midterm I. Fall 2003

Programming Assignment I Due Thursday, October 9, 2008 at 11:59pm

The PCAT Programming Language Reference Manual

Scheme. Functional Programming. Lambda Calculus. CSC 4101: Programming Languages 1. Textbook, Sections , 13.7

Tail Calls. CMSC 330: Organization of Programming Languages. Tail Recursion. Tail Recursion (cont d) Names and Binding. Tail Recursion (cont d)

A Simple Syntax-Directed Translator

CMSC 330: Organization of Programming Languages. Formal Semantics of a Prog. Lang. Specifying Syntax, Semantics

CS 360 Programming Languages Interpreters

Left to right design 1

1. Lexical Analysis Phase

Syntactic Analysis. The Big Picture Again. Grammar. ICS312 Machine-Level and Systems Programming

CSE P 501 Compilers. Implementing ASTs (in Java) Hal Perkins Winter /22/ Hal Perkins & UW CSE H-1

A lexical analyzer generator for Standard ML. Version 1.6.0, October 1994

Jim Lambers ENERGY 211 / CME 211 Autumn Quarter Programming Project 2

Computer Science 21b (Spring Term, 2015) Structure and Interpretation of Computer Programs. Lexical addressing

Functional Programming. Pure Functional Programming

1/16/2013. Program Structure. Language Basics. Selection/Iteration Statements. Useful Java Classes. Text/File Input and Output.

The SPL Programming Language Reference Manual

Stating the obvious, people and computers do not speak the same language.

Compiler principles, PS1

CSSE 304 Assignment #13 (interpreter milestone #1) Updated for Fall, 2018

KU Compilerbau - Programming Assignment

Programming Assignment I Due Thursday, October 7, 2010 at 11:59pm

Peace cannot be kept by force; it can only be achieved by understanding. Albert Einstein

Project 2: Scheme Lexer and Parser

CSE 413 Programming Languages & Implementation. Hal Perkins Winter 2019 Grammars, Scanners & Regular Expressions

ECS 142 Spring Project (part 2): Lexical and Syntactic Analysis Due Date: April 22, 2011: 11:59 PM.

LECTURE 7. Lex and Intro to Parsing

Pace University. Fundamental Concepts of CS121 1

CS 11 Ocaml track: lecture 6

Scheme Quick Reference

CS143 Handout 05 Summer 2011 June 22, 2011 Programming Project 1: Lexical Analysis

Why do we need an interpreter? SICP Interpretation part 1. Role of each part of the interpreter. 1. Arithmetic calculator.

Semantic Analysis. Lecture 9. February 7, 2018

An ANTLR Grammar for Esterel

CMSC 201 Fall 2016 Lab 09 Advanced Debugging

Programming Assignment III

String Computation Program

CSE 413 Programming Languages & Implementation. Hal Perkins Autumn 2012 Grammars, Scanners & Regular Expressions

Defining Program Syntax. Chapter Two Modern Programming Languages, 2nd ed. 1

Project 5 Due 11:59:59pm Wed, Nov 25, 2015 (no late submissions)

Project 4 Due 11:59:59pm Thu, May 10, 2012

CS 415 Midterm Exam Fall 2003

CS 415 Midterm Exam Spring SOLUTION

CS321 Languages and Compiler Design I. Winter 2012 Lecture 4

Interpreter. Scanner. Parser. Tree Walker. read. request token. send token. send AST I/O. Console

9. The Compiler I Syntax Analysis 1

ASTs, Objective CAML, and Ocamlyacc

Recognizer/Parser Module

CSE 401 Midterm Exam Sample Solution 2/11/15

2.8. Decision Making: Equality and Relational Operators

CS 330 Homework Comma-Separated Expression

CSc 453 Compilers and Systems Software

Syntax and Grammars 1 / 21

CS131 Compilers: Programming Assignment 2 Due Tuesday, April 4, 2017 at 11:59pm

Programming Assignment II

Briefly describe the purpose of the lexical and syntax analysis phases in a compiler.

Java How to Program, 10/e. Copyright by Pearson Education, Inc. All Rights Reserved.

Principles of Programming Languages 2017W, Functional Programming

Examples of attributes: values of evaluated subtrees, type information, source file coordinates,

Defining syntax using CFGs

Introduction to Scheme

SFU CMPT 379 Compilers Spring 2018 Milestone 1. Milestone due Friday, January 26, by 11:59 pm.

COP 3402 Systems Software Top Down Parsing (Recursive Descent)

Documentation for LISP in BASIC

Compiler Theory. (Semantic Analysis and Run-Time Environments)

Case Study: Undefined Variables

The syntax and semantics of Beginning Student

The syntax and semantics of Beginning Student

Implementing Programming Languages

Transcription:

Project 1: Scheme Pretty-Printer CSC 4101, Fall 2017 Due: 7 October 2017 For this programming assignment, you will implement a pretty-printer for a subset of Scheme in either C++ or Java. The code should be developed in a team of two using pair programming. Your pretty-printer will read a Scheme program from standard input, parse it, and print it back out onto standard output. The output program will be indented in a uniform style. (It won t really be all that pretty.) You should use an object-oriented programming style using inheritance and virtual functions where appropriate. Structure of the Pretty-Printer Like any tool that processes text, the pretty-printer needs to parse the input first and store it in an internal data structure, before it can print it in some indented style. The pretty-printer consists of the following parts: 1. a lexical analyzer that splits the input text into tokens, 2. a recursive-descent parser that analyses the structure of the input program and builds a parse tree, 3. a parse-tree traversal that pretty-prints the input program. Lexical Analysis In lexical analysis, the input file is broken into individual words or symbols. These words or symbols are then represented as tokens and passed to the parser. Your lexical analyzer needs to read ASCII characters from standard input and do the following: 1. discard white space (space, tab, newline, carriage-return, and form-feed characters), 2. discard comments (everything from a semicolon to the end of the line), 3. recognize quotes, dots, and open and closing parentheses, 4. recognize the boolean constants #t and #f, 5. recognize integer constants (for simplicity only decimal digits without a sign), 6. recognize string constants (anything between double quotes), 7. recognize identifiers. For the precise definition of identifiers, see the Revised 5 Scheme Report and follow that specification: http://people.csail.mit.edu/jaffer/r5rs_4.html 1

http://people.csail.mit.edu/jaffer/r5rs_9.html Scheme does not distinguish between uppercase and lowercase characters in identifiers. It is, therefore, easiest for later parts of the pretty printer or the interpreter to convert all letters to lowercase. The typical structure of a lexical analyzer is to write a function getnexttoken(), that reads a character at a time from the input and returns the next token that it finds. Use a program structure like this (in C# syntax): enum TokenType { QUOTE, LPAREN, RPAREN, DOT, TRUE, FALSE, INT, STRING, IDENT class Token { TokenType tt; class IntToken : Token { private int intval; class StringToken : Token { private string stringval; class IdentToken : Token { private string name; class Scanner { public Token getnexttoken(); If a special character or a boolean constant is recognized in the input, the method getnexttoken() returns an object of class Token with the appropriate token type set. If an integer constant, string constant, or identifier is recognized, the method getnexttoken() returns an object of class IntToken, StringToken, and IdentToken, respectively. These tokens contain the integer value, string value, and the name of an identifier, respectively, so that the values are available to the printing routine. Parser The parser gets tokens from the scanner and analyzes the syntactic structure of the token stream. Since Scheme has a very simple grammar, parsing using a recursive-descent parser is not difficult. The subset of Scheme that we will be working with for testing and for the interpreter in Project 2 is defined by the following grammar: 2

exp -> () #f #t exp integer_constant string_constant identifier ( list ) list -> quote exp lambda ( [ parm ] ) exp+ lambda identifier exp+ begin exp+ if exp exp [ exp ] let ( bind* ) exp+ cond case+ define def set! identifier exp exp+ [. exp ] case -> ( test exp* ) test -> else exp bind -> ( identifier exp ) def -> identifier exp ( parm ) exp+ parm -> identifier+ [. identifier ] For details of the syntax of Scheme and the meaning of these constructs, you can refer to the Revised 5 Scheme Report, http://people.csail.mit.edu/jaffer/r5rs_9.html but just for parsing, you don t need to worry about the meaning of these constructs yet. Since Scheme allows constructing code as data, we need to make the parser more general so that it doesn t complain about incorrect code inside a list. We, therefore, need two parsing steps. The recursive descent parser for Project 1 will only need to recognize the much simpler language exp -> ( rest #f #t exp integer_constant string_constant identifier rest -> ) exp+ [. exp ] ) 3

and then build a parse tree. The second parse step, were we check the detailed syntax of the special forms will happen in the form of error checks in the interpreter we ll write for Project 2. When the main program requests an expression to be parsed, the parser needs to make parsing decisions without lookahead. This grammar is designed so you can easily build a recursive descent parser without lookahead. Inside the parser, for parsing rest, you will need a token of lookahead to make parsing decisions. For the recursive descent parser, define a class Parser as follows: class Parser { public Parser(Scanner s) {... public Node parseexp() {... where Node is the root of the parse tree node class hierarchy. The pretty printer is simple enough that you could print the input program to the output during parsing, i.e., without constructing a parse tree. However, we will extend the pretty-printer into an interpreter in Project 2. For the interpreter we will need the parse tree. These parse trees will be our internal list representation. We will later use the pretty-printer in the interpreter as our printing routine for printing the result of interpreting an expression (which will be a parse tree again). Parse Tree For Scheme, the parse trees are really just regular Scheme lists. However, since we are not programming in Scheme, we need to implement the list data structure. The typical way to implement such a data structure in an object-oriented language is as a class hierarchy: class Node {... class Ident : Node {... class BoolLit : Node {... class IntLit : Node {... class StringLit : Node {... class Nil : Node {... class Cons : Node {... where the class Node is an abstract class. Build your parse tree such that only a single object of class Nil and only two objects of class BoolLit get created. E.g., multiple occurrences of Nil should be pointers pointing to the same object. This will simplify the implementation of some of the built-in functions in Project 2. To make the code for cons cells more modular and more object-oriented, we will further specialize the data structure by factoring out the printing code (and later the evaluation code) that is specific to a special form. This results in the following class hierarchy: class Special {... class Quote : Special {... class Lambda : Special {... class Begin : Special {... class If : Special {... class Let : Special {... class Cond : Special {... 4

class Define : Special {... class Set : Special {... class Regular : Special {... A cons cell (an object of class Cons) will then contain an object of class Special. When constructing a cons cell, the constructor of class Cons will parse the list, build an object of the appropriate subclass of Special, and keep a pointer to that object in the cons cell. Any code that is in common between all special forms can be kept in class Special. Any code for regular function applications or lists as data structures will be in class Regular. These data structures are designed using the following Design Patterns. The Node class hierarchy is an instance of the Interpreter Pattern. The classes Nil and BoolLit are instances of the Singleton Pattern. And the Special hierarchy is an instance of the Strategy Pattern. In a production-quality interpreter, we would further implement class Ident using the Flyweight Pattern and the print() methods using the Visitor Pattern, but for our purposes that would add too much complexity. The printing code for the Token data structure is not implemented in an object-oriented style to get you started faster. Pretty Printing Print the output according to the following rules: constants and identifiers are printed directly, Cons expressions that are not special forms (i.e., regular lists) as well as the set! special form and the variable definition syntax of define are printed in the style: (+ 2 3) (define x 0) (set! x (+ 2 3)) The elements of a list are separated by a single space. a quoted expression is printed with a quote character followed by the printed representation of its argument, e.g., x or (1 2 3) Quoted special forms are printed as regular lists. the special forms begin, let, and cond are printed with the keyword immediately following the left parenthesis and with subsequent lines indented by four spaces each, e.g., (begin (set! x 6) (set! y 7) (* x y) ) Subexpressions of special forms, are printed as regular lists. the special forms if and lambda, as well as the function definition syntax of define are printed with the first two list elements on the same line and subsequent lines indented by four spaces each: 5

(define (fac n) (if (= n 0) 1 (* n (fac (- n 1))) ) ) Ok, this printing style isn t exactly pretty, but it s reasonably easy to generate. A better pretty-printer would get a lot more complicated. The object-oriented style of structuring the pretty-printing code is to include a virtual method print() in each class of the parse tree node hierarchy as well as in each class of the Special hierarchy. I suggest you pass the current position on the output line as argument to print(). Also, for printing lists, you will need a boolean parameter indicating whether the opening parenthesis has been printed already or not. You will likely need additional information for getting all the printing styles right. For any expressions for which more than one of the above printing rules applies, the printing style is undefined. This means you can decide how to format it. Administrative Stuff Program this project in groups of two. I strong suggest that you work in a Pair Programming style, i.e., that you always sit in front of the screen together and that one of you types. Turn in a directory containing all the files needed to build your program as well as a README.txt file. If you are programming in C++, include a Makefile that produces an executable called spp. If you are programming in Java, make sure the main class is called Main and is in the default package (i.e., not inside a subdirectory). Put your files into the directory /prog1 in your cs4101xx account and submit using /classes/cs4101/cs4101_bau/bin/p_copy 1 (or using the command make submit ). Do not submit executables, object files, or byte code files. In the README.txt file, briefly explain how you designed your program and what works and what doesn t. If there are problems with your submission, include any information that would make it easier for the grader to give you partial credit. Make sure that you mention who the group members are. You can find skeleton code in the directory cs4101_bau/pub/ as well as on the course web page. There is also a reference implementation that was written in C#. You can run it using cat./spp.exe The hardest parts of this project are understanding the data structures and understanding object-oriented programming. To make this process easier, look at the scanner and token data structure first and implement the scanner. Later add the parse tree data structure and write the parser and the pretty printer in parallel for one language construct at a time. 6