EECS 311: Data Structures and Data Management Program 1 Assigned: 10/21/10 Checkpoint: 11/2/10; Due: 11/9/10

1 Project: Scheme Parser

In many respects, the ultimate program is an interpreter. Why? Because given the right input (i.e., a program) it can do anything. Any application that lets the user define macros to manipulate the application has a built-in interpreter. In fact, building an interpreter into an application is a common way to make the application extremely extensible. Popular examples of these applications include text editors such as Emacs, spreadsheet programs such as Excel, and web browsers (e.g., using AppleScript). While you will not write a Scheme interpreter in this project, you will get pretty close. Your first project is to write a Scheme parser.

1.1 Scheme Review

Scheme is a popular programming language that is especially easy to interpret. One of the reasons for this is that Scheme data types look just like Scheme programs. Scheme supports standard data objects such as strings and numbers as well as lists of data objects (note the recursive definition). Consider:

(+ 1 (* 2 3))

This is a list with three elements. The first element is the symbol +. The second element is the number 1. The third element is the three-element list (* 2 3). Note that this Scheme data looks exactly like a Scheme program, the one that adds 1 to the product of 2 and 3.

Let us take a more detailed look at Scheme lists. The building blocks of Scheme lists are the empty list, represented as (), and the pair of a and b, represented as (a . b). A list is either the empty list or a pair whose second field is a list (note the recursive definition). Thus, (+ . (1 . (2 . ()))) is a list, though it is a bit hard to read. Since lists are so commonly used in Scheme, a special list syntax has been adopted, e.g., (+ 1 2) instead of (+ . (1 . (2 . ()))).
Obviously the special list syntax is easier for a human to write and read, but the pair syntax is closer to what is going on internally in a Scheme data structure. The list and pair syntaxes can be combined in arbitrary ways, e.g., (a b . (c d)) is equivalent to (a b c d).

1.2 Scheme Objects in Java

To represent Scheme data you will create a class hierarchy of Scheme objects. The parent class is SchemeObject. The subclasses you will implement are SchemePair(a,b), where member fields a and b are themselves SchemeObjects; SchemeNull, which represents both the empty list and the end of a non-empty list; and SchemeSymbol(x), where member field x is a string.

The very first step of writing an interpreter is parsing. That is, reading from input the string (+ (- 1) (* 2 3)) and turning it into a SchemeObject; in particular, the following list of lists:

[Figure: diagram of the list of lists for (+ (- 1) (* 2 3)).]

For this programming assignment you are to write such a parser. We will divide this task into two steps, tokenizing and parsing.

1.3 Tokenizing

The first step (commonly called "tokenizing") is turning a sequence of characters into a sequence of tokens. For instance, punctuation such as the "(", ")", and "." characters are special tokens. Symbols such as "+", "car", and "25" are tokens. [1] As an example, the sequence of characters ")hello 25 . ( (" should be converted into the sequence of tokens CloseParen, TokenSymbol("hello"), TokenSymbol("25"), Period, OpenParen, OpenParen. Notice that whitespace helps delineate symbols but is not itself a token. Also notice that this particular sequence of tokens does not correspond to a well-formed Scheme object.

Tokenizing can be quite complicated; fortunately, the subset of Scheme we are parsing is quite simple. To aid in this task, java.io.BufferedReader provides a readLine() method that returns a String object. The String class provides a split() method. For instance, if String s = ")hello 25 . ( (" then s.split("\\s+") returns the five-element array [")hello", "25", ".", "(", "("]. While this does not quite do the trick, because it does not separate the string ")hello", it is a good start.

1.4 Parsing

The second step (commonly called "parsing") is to convert the sequence of tokens into a sequence of SchemeObjects. Here the most important task is to match up parentheses. This is actually very easy using a stack. The approach we outline here is shift-reduce parsing. Shift-reduce parsing simply repeats the following steps:

1. While the top of the stack matches any of a set of patterns, reduce (i.e., simplify).
2. Shift (i.e., push) a new token (from the input) onto the top of the stack.
3. Repeat.

Our stack needs to hold both Tokens and SchemeObjects. While shift only pushes new tokens onto the stack, reduce will replace combinations of Tokens and SchemeObjects with SchemeObjects that combine them.
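The tokens that the shift step pushes have to come from somewhere. One way to finish the split() idea from Section 1.3 is to pad every parenthesis with spaces before splitting, so that ")hello" falls apart into ")" and "hello". The following is only a minimal sketch: the class name TokenizerSketch and the method tokenize are hypothetical, the result is an array of raw strings rather than Token objects, and it assumes that a period counts as a token only when it stands alone between whitespace.

```java
// Hypothetical helper, not part of the handout: splits one input line
// into raw token strings that could later be wrapped in Token objects.
class TokenizerSketch {

    static String[] tokenize(String line) {
        // Surround each parenthesis with spaces so split() isolates it.
        String padded = line.replace("(", " ( ").replace(")", " ) ").trim();
        if (padded.isEmpty()) {
            return new String[0];          // blank line: no tokens
        }
        // Any run of whitespace separates two tokens.
        return padded.split("\\s+");
    }

    public static void main(String[] args) {
        for (String t : tokenize(")hello 25 . ( (")) {
            System.out.println(t);         // one token per line
        }
    }
}
```

Each resulting string can then be mapped to a token: "(" to OpenParen, ")" to CloseParen, "." to Period, and anything else to TokenSymbol.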
A natural approach would be to check whether the top of the stack looks like

..., OpenParen, SchemeObject 1, ..., SchemeObject k, CloseParen

and then reduce it to the appropriate Scheme list object; however, this requires looking at k+2 stack elements at once. In general, it would be costly (in runtime) to pattern match large numbers of stack elements at once; instead you should use a reduce procedure that only ever operates on the few elements at the very top of the stack.

The idea behind this reduce technique is a syntactic equivalence, e.g., between (+ 1 2) and (+ . (1 . (2 . ()))). As discussed in Section 1.1, the former is easier for a human to read and write, but the latter is closer to what is going on internally in a data structure. This motivates the transformation from what a human would like to write to what a computer can understand. To illustrate this, consider taking the expression (+ 1 2) and transforming it, starting from the tail of the list, from human-readable notation to the internal data representation. Below, the part already in the internal data representation is noted beside each step.

(+ 1 2)                  internal part: none yet
(+ 1 2 . ())             internal part: ()
(+ 1 . (2 . ()))         internal part: (2 . ())
(+ . (1 . (2 . ())))     internal part: (1 . (2 . ()))
(+ . (1 . (2 . ())))     internal part: the whole expression

This suggests a reduce technique that builds up the list backwards, where each reduce uses at most the top four elements of the stack. Consider, below, the patterns that reduce should process and what the stack contents should look like after the reduce.

1. Convert a symbol token to a symbol object:
   Before: ..., TokenSymbol(x).
   After: ..., SchemeSymbol(x).

2. Make a long list smaller by contracting the last two elements into a pair: [2]
   Before: ..., SchemeObject a, Period, SchemeObject b, CloseParen.
   After: ..., Period, SchemePair(SchemeObject a, SchemeObject b), CloseParen.

3. To finish a list, simply remove the opening parenthesis, period, and closing parenthesis: [3]
   Before: ..., OpenParen, Period, SchemeObject, CloseParen.
   After: ..., SchemeObject.

4. If the top item on the stack is a closing parenthesis and none of the above patterns matched, then this is the end of a long list. Convert the closing parenthesis to a period preceding an empty list: [4]
   Before: ..., CloseParen.
   After: ..., Period, SchemeNull, CloseParen.

Notice that all of these patterns involve matching only the top several elements on the stack. This is an important aspect of the shift-reduce parser! A successful parse should result in a stack containing only SchemeObjects. See Section 4 for an example.

[1] Real Scheme interpreters treat numbers and other symbols differently. For simplicity, we will treat numbers as symbols.
[2] This is appropriate because (a ... x y . z) is equivalent to (a ... x . (y . z)).
[3] This is appropriate because (. x) is equivalent to x.
[4] This is appropriate because (a ... z) is equivalent to (a ... z . ()).
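The four patterns and the shift-reduce loop of Section 1.4 can be sketched in Java roughly as follows. This is an illustration under stated assumptions, not a reference solution: the common supertype Item, the class ItemStack, the field names, and the methods reduceOnce and parse are all hypothetical, and the input is taken as an already-built Token array. One deliberate deviation is marked in the code: when a dotted tail has nothing valid in front of it, the sketch stops reducing instead of applying Pattern 4, to avoid looping on malformed input.

```java
import java.util.Arrays;

// Minimal stand-ins for the handout's classes; Item and the field names are assumptions.
abstract class Item {}                       // lets one stack hold tokens and objects
abstract class Token extends Item {}
class OpenParen extends Token {}
class CloseParen extends Token {}
class Period extends Token {}
class TokenSymbol extends Token { final String name; TokenSymbol(String n) { name = n; } }
abstract class SchemeObject extends Item {}
class SchemeNull extends SchemeObject {}
class SchemeSymbol extends SchemeObject { final String name; SchemeSymbol(String n) { name = n; } }
class SchemePair extends SchemeObject {
    final SchemeObject first, rest;
    SchemePair(SchemeObject a, SchemeObject b) { first = a; rest = b; }
}

// Array-backed stack; peek(0) is the top, and peek(i) is null if the stack is too shallow.
class ItemStack {
    private Item[] items = new Item[16];
    private int size = 0;
    void push(Item x) {
        if (size == items.length) items = Arrays.copyOf(items, 2 * size);
        items[size++] = x;
    }
    Item pop() { return items[--size]; }
    Item peek(int i) { return i < size ? items[size - 1 - i] : null; }
    int size() { return size; }
}

class ReduceSketch {
    // Applies at most one pattern; returns true if the stack changed.
    static boolean reduceOnce(ItemStack s) {
        Item top = s.peek(0);
        if (top instanceof TokenSymbol) {                    // Pattern 1
            s.pop();
            s.push(new SchemeSymbol(((TokenSymbol) top).name));
            return true;
        }
        if (!(top instanceof CloseParen)) return false;      // nothing to reduce
        if (s.peek(1) instanceof SchemeObject && s.peek(2) instanceof Period) {
            if (s.peek(3) instanceof SchemeObject) {         // Pattern 2
                s.pop();                                     // CloseParen
                SchemeObject b = (SchemeObject) s.pop();
                s.pop();                                     // Period
                SchemeObject a = (SchemeObject) s.pop();
                s.push(new Period());
                s.push(new SchemePair(a, b));
                s.push(new CloseParen());
                return true;
            }
            if (s.peek(3) instanceof OpenParen) {            // Pattern 3
                s.pop();                                     // CloseParen
                SchemeObject x = (SchemeObject) s.pop();
                s.pop();                                     // Period
                s.pop();                                     // OpenParen
                s.push(x);
                return true;
            }
            return false;  // assumption: malformed dotted tail, stop rather than loop
        }
        s.pop();                                             // Pattern 4
        s.push(new Period());
        s.push(new SchemeNull());
        s.push(new CloseParen());
        return true;
    }

    // Reduce while possible, then shift, until one SchemeObject remains.
    static SchemeObject parse(Token[] input) {
        ItemStack s = new ItemStack();
        int next = 0;
        while (true) {
            while (reduceOnce(s)) { }
            if (s.size() == 1 && s.peek(0) instanceof SchemeObject) {
                return (SchemeObject) s.pop();
            }
            if (next == input.length) {
                throw new IllegalStateException("incomplete or malformed input");
            }
            s.push(input[next++]);
        }
    }

    public static void main(String[] args) {
        Token[] input = { new OpenParen(), new TokenSymbol("+"), new TokenSymbol("1"),
                          new TokenSymbol("2"), new CloseParen() };
        SchemePair p = (SchemePair) parse(input);
        System.out.println(((SchemeSymbol) p.first).name);   // head of the parsed list
    }
}
```

Every pattern check looks at no more than peek(3), so each reduce touches at most the top four stack elements, which is the property the text asks for.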

1.5 Interpreting

If we were implementing an interpreter, the next step would be to take the output of the parser (a list of SchemeObjects) and interpret them one at a time, that is, treat them as Scheme code and evaluate them. We will not complete the interpreter for this homework. Instead we will just write a routine to print out each object (recursively).

2 Tasks

1. Implement the SchemeObject class hierarchy: SchemePair(a,b), SchemeNull, and SchemeSymbol(x).

2. Implement a method on each class in the SchemeObject hierarchy that (recursively) prints out the object in proper Scheme syntax, i.e., the same syntax your parser reads. Your lists should be printed out in the special list syntax, e.g., (a b c).

3. Implement the Stack ADT. You may wish to add, to the standard stack operations, the operation peek, where peek(i) returns the ith element from the top of the stack. This will be useful in your shift-reduce parser. You can assume that i is a constant, i.e., it is OK for peek(i) to take O(i) time.

4. Implement the Token class hierarchy: CloseParen, OpenParen, Period, and TokenSymbol(x).

5. Implement a class that allows tokens to be read from an input stream. For instance, the constructor for such a class might take an input stream (such as System.in) as input. The class might provide a method nextToken() that returns the next token on the input stream. Repeated calls to nextToken() would return the sequence of tokens on the input stream.

6. Implement the shift-reduce parser as discussed above. For instance, you might implement a class with a constructor that takes your tokenizing class from Task 5. The class might provide a method nextObject() that operates the shift-reduce parser, calling nextToken() as needed, until there is a single SchemeObject on the stack. The routine should then pop this object off the stack and return it.

7. Implement an interactive program that reads in Scheme expressions and prints them out.
Each expression read in should be printed on a new line. In the example below, the input is the text that follows each parse> prompt:

parse> 1 (2) hello ()
1
(2)
hello
()
parse> (* (+ 2 3 ) (- 4 5))
(* (+ 2 3) (- 4 5))
parse> ( hello world )
(hello world)
parse>

This example demonstrates how whitespace in the input can be arbitrary, while the whitespace in the output is always in a standard form.

3 Logistics

This assignment will be graded out of a total of 40 points. The program will be completed in two parts. The first part is due on Thursday, 10/28/10 (at midnight) and will be graded out of 20 points. The second part is due on 11/04/10 (at midnight) and will be graded out of 20 points. If you fix any problems with the first part when you turn in the second part, you can regain up to half the points lost. For example, suppose you made two mistakes on the first part and lost two points for each. If you fix one of the mistakes for the final submission you will get one point back. If you fix both mistakes for the final submission you will get two points back.

Part 1: Complete Tasks 1-3. You should also write and turn in a test program that thoroughly verifies that all tasks for this part are properly completed.

Part 2: Complete Tasks 4-7.

Submitting your code. The T-Lab (Tech F252) is available for your programming needs, and the TA will hold extra office hours (TBA) in the T-Lab to assist you. Your programs should compile using javac on the command line. To submit your program, send the zipped source code by email to the TA. Use the subject line SUBMIT PROG1 PART1 or SUBMIT PROG1 PART2 for the respective part. Do not submit executables. If you use a makefile, include the makefile in your submission. If you compile your program directly on the command line, specify the command you use for compilation in your email. Anything more complicated should be explained in a README file.

Grading guidelines.
You will be graded on your ability to write efficient code.
You will be graded on your ability to write reasonable Java code.
You will be graded on your ability to implement the required tasks.
You will be graded on your ability to implement and use data structures from class.

Resources.
You may consult your textbook or other books on Java and data structures. You must build your own advanced data structures; you may not use data structure implementations in the Java class library. You must not copy code from anywhere. You may talk with your classmates about the project at a high level, but your implementations must be 100% original. You may consult with the instructor and TA on any aspect of the project.

4 Example: Shift-Reduce in Action

Suppose we read the input "(+ 1 2)" into the token sequence OpenParen, TokenSymbol("+"), TokenSymbol("1"), TokenSymbol("2"), and CloseParen. The shift-reduce parser takes the following steps. Initially the stack is empty.

0. Initially empty stack.
   Stack: (empty)

1. Nothing to reduce, shift token OpenParen onto stack.
   Stack: OpenParen.

2. Nothing to reduce, shift token TokenSymbol("+") onto stack.
   Stack: OpenParen, TokenSymbol("+").

3. Reduce using Pattern 1.
   Stack: OpenParen, SchemeSymbol("+").

4. Nothing to reduce, shift token TokenSymbol("1") onto stack.
   Stack: OpenParen, SchemeSymbol("+"), TokenSymbol("1").

5. Reduce using Pattern 1.
   Stack: OpenParen, SchemeSymbol("+"), SchemeSymbol("1").

6. Nothing to reduce, shift token TokenSymbol("2") onto stack.
   Stack: OpenParen, SchemeSymbol("+"), SchemeSymbol("1"), TokenSymbol("2").

7. Reduce using Pattern 1.
   Stack: OpenParen, SchemeSymbol("+"), SchemeSymbol("1"), SchemeSymbol("2").

8. Nothing to reduce, shift token CloseParen onto stack.
   Stack: OpenParen, SchemeSymbol("+"), SchemeSymbol("1"), SchemeSymbol("2"), CloseParen.

9. Reduce using Pattern 4.
   Stack: OpenParen, SchemeSymbol("+"), SchemeSymbol("1"), SchemeSymbol("2"), Period, SchemeNull, CloseParen.

10. Reduce using Pattern 2.
    Stack: OpenParen, SchemeSymbol("+"), SchemeSymbol("1"), Period, SchemePair(SchemeSymbol("2"), SchemeNull), CloseParen.

11. Reduce using Pattern 2.
    Stack: OpenParen, SchemeSymbol("+"), Period, SchemePair(SchemeSymbol("1"), SchemePair(SchemeSymbol("2"), SchemeNull)), CloseParen.

12. Reduce using Pattern 2.
    Stack: OpenParen, Period, SchemePair(SchemeSymbol("+"), SchemePair(SchemeSymbol("1"), SchemePair(SchemeSymbol("2"), SchemeNull))), CloseParen.

13. Reduce using Pattern 3.
    Stack: SchemePair(SchemeSymbol("+"), SchemePair(SchemeSymbol("1"), SchemePair(SchemeSymbol("2"), SchemeNull))).

14. Nothing to reduce, single Scheme object on stack, pop it off stack and return it.
    Stack: (empty)
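The object returned in step 14 can be checked by printing it back in the special list syntax, as Task 2 asks. The sketch below is one possible shape for that printing method, not the required design: the method name show and the field names first and rest are hypothetical. Elements are rendered recursively, while the spine of the list is walked with a loop so that nested pairs print as (+ 1 2) and not as (+ . (1 . (2 . ()))).

```java
// Hypothetical minimal hierarchy; show() and the field names are assumptions.
abstract class SchemeObject {
    abstract String show();                 // text of this object in Scheme syntax
}

class SchemeNull extends SchemeObject {
    String show() { return "()"; }
}

class SchemeSymbol extends SchemeObject {
    final String name;
    SchemeSymbol(String name) { this.name = name; }
    String show() { return name; }
}

class SchemePair extends SchemeObject {
    final SchemeObject first, rest;
    SchemePair(SchemeObject first, SchemeObject rest) {
        this.first = first;
        this.rest = rest;
    }
    String show() {
        StringBuilder sb = new StringBuilder("(").append(first.show());
        SchemeObject tail = rest;
        // Follow the chain of pairs, printing each element after a single space.
        while (tail instanceof SchemePair) {
            SchemePair p = (SchemePair) tail;
            sb.append(' ').append(p.first.show());
            tail = p.rest;
        }
        // A tail that is not the empty list needs the explicit pair syntax.
        if (!(tail instanceof SchemeNull)) {
            sb.append(" . ").append(tail.show());
        }
        return sb.append(')').toString();
    }
}

class PrintSketch {
    public static void main(String[] args) {
        // The object left on the stack after step 13 of the example.
        SchemeObject result = new SchemePair(new SchemeSymbol("+"),
                new SchemePair(new SchemeSymbol("1"),
                        new SchemePair(new SchemeSymbol("2"), new SchemeNull())));
        System.out.println(result.show());  // prints (+ 1 2)
    }
}
```

Because the output always uses single spaces between elements, it matches the standard form shown in the interactive example regardless of the whitespace in the input.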