Lisp: Lab Information
Donald F. Ross
General Model
program source text (stream) → lexical analysis (lexemes → tokens) → syntax analysis → true or false
What you need to write: is-id, is-number etc. + grammar rules + symbol table functions
What can we reuse from the earlier labs?
Conceptual model: parser = <grammar rules> + get_token + match + lookahead
Lisp programming clichés:
  conditional expressions (if / cond)
  recursion (tail recursion), no iteration!!!
  no (or few) variables
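As an illustration of these clichés, here is a hypothetical help function (all-p is not part of the lab code) that uses cond, tail recursion instead of a loop, and no local variables. It is a minimal sketch; note that the Common Lisp standard does not require tail-call elimination, although compiled code in most implementations performs it.

```lisp
;; ALL-P (hypothetical): T if PRED holds for every element of LST,
;; NIL as soon as one element fails.  The empty list gives T.
(defun all-p (pred lst)
  (cond ((null lst) t)                      ; nothing left to check
        ((funcall pred (first lst))         ; head passes:
         (all-p pred (rest lst)))           ;   recurse on the tail (tail call)
        (t nil)))                           ; head fails
```

For example, (all-p #'digit-char-p (coerce "123" 'list)) returns T.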
What code is provided?
get-lex (state)
  returns a (c lexeme) list; extracts a lexeme from the input stream (ip)
get-token (state)
  the state (the parse descriptor) is updated
map-lexeme (lexeme) ;; partial code!
  returns a (token lexeme) list
create-parser-state (stream)
  constructor for the structure pstate
  pstate (stream, lookahead, nextchar, status, symtab) is initialised to (ip, ( ), #\Space, OK, ( ))
What code is provided? (continued)
match (state symbol)
  compares symbol (the expected token) with the current token in state (the token stream input)
parse (filename)
  the driver function for the system; requires the input file name
  checks whether the input is a program, i.e. according to the grammar G = (S, P, NT, T), program is S (the start symbol)
What code is required?
is-id (lexeme), is-number (lexeme): predicates that test a lexeme
  these may require some help functions (you write these!)
  THINK ABOUT WHAT IS REQUIRED!!!
  the input is a string, and a string may be thought of as a list of characters
  (check the character and string handling functions in Lisp, plus some other functions;
  hint: "for each" and "for all", check for suitable functions)
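The hint points at Common Lisp's sequence functions: a string is a sequence of characters, so every ("for all") and some ("there exists") can be applied to it directly, and coerce turns it into an explicit list for first/rest recursion. The helper names below are hypothetical examples, not the required predicates.

```lisp
;; ALL-ALPHA-P (hypothetical): T if every character is alphabetic.
;; NB: EVERY over an empty string returns T, so "" passes this test.
(defun all-alpha-p (str)
  (every #'alpha-char-p str))

;; HAS-DIGIT-P (hypothetical): true if at least one character is a digit.
;; SOME returns the first true value, here the digit's weight (an integer).
(defun has-digit-p (str)
  (some #'digit-char-p str))

;; STRING-TO-CHARS (hypothetical): the string as a list of characters.
(defun string-to-chars (str)
  (coerce str 'list))
```

For example, (string-to-chars "x1") returns (#\x #\1). The empty-string case is worth remembering when you decide what a legal number or identifier is.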
What code is required? (continued)
map-lexeme must be completed (lexeme → token conversion)
The code for the grammar is required
  program has been provided as an example
parse is the driver program
  you may want to make modifications here
  e.g. think about how you are going to run the test suite
What to think about
The grammar code should be simple
  Ts are handled using match
  NTs require a function (applied to state)
  conditional expressions are required
  (tail) recursion must (!) be used
  you may need to define some help predicates
  your code should be easy to upgrade
Use your knowledge from labs 1 & 2!!!
Ts = Terminal Symbols; NTs = Non-Terminal Symbols
The code: defstruct (define a structure)
(defstruct pstate
  (stream)      ;; input stream (ip): parameter
  (lookahead)   ;; (token lexeme): a list
  (nextchar)    ;; next char after the lexeme: a char
  (status)      ;; parse status OK / NOTOK: a symbol
  (symtab))     ;; the symbol table: a list
This is a STATE DESCRIPTOR, since the parser process is actually a state process.
The code: defstruct (define a structure)
defstruct defines a structure with fields and creates:
a constructor: make-pstate (make- + the structure name pstate)
  (make-pstate :stream ip          ;; input stream
               :lookahead ()       ;; empty list
               :nextchar #\Space   ;; space char
               :status 'OK         ;; symbol
               :symtab ())         ;; empty list
readers for the fields: (pstate-stream state), (pstate-lookahead state), etc.
writers: (setf (pstate-stream state) x)      ;; state is an instance
         (setf (pstate-lookahead state) x)   ;; of the structure pstate
a predicate: (pstate-p x)
  tests an object to see whether it is of the defstruct-defined data type
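A short, self-contained session may make the four kinds of generated functions concrete. This sketch repeats the pstate definition from the lab, with the stream left as NIL so that no file is needed; demo-state is a hypothetical name.

```lisp
(defstruct pstate
  (stream)      ; input stream (ip)
  (lookahead)   ; (token lexeme)
  (nextchar)    ; next char after the lexeme
  (status)      ; OK / NOTOK
  (symtab))     ; symbol table

;; DEMO-STATE (hypothetical): build a state with the constructor,
;; then update one field with a writer.
(defun demo-state ()
  (let ((state (make-pstate :stream nil :lookahead () :nextchar #\Space
                            :status 'OK :symtab ())))   ; constructor
    (setf (pstate-lookahead state) (list 'ID "x"))      ; writer
    state))

;; Readers and the predicate:
;; (pstate-lookahead (demo-state))  => (ID "x")
;; (pstate-p (demo-state))          => T
;; (pstate-p 42)                    => NIL
```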
The code: parser
(defun parse (filename)
  (format t "~% ")
  (format t "~% Parsing program: ~S " filename)
  (format t "~% ~%")
  (let ((state nil))                                  ;; local parser state
    (with-open-file (ip filename :direction :input)   ;; opens and closes the file
      (setf state (create-parser-state ip))           ;; constructor
      (setf (pstate-nextchar state) (read-char ip nil 'EOF)) ;; writer
      (get-token state)                               ;; get first token
      (program state)                                 ;; parse the program
      (check-end state)                               ;; check for extra symbols
      (symtab-display state))                         ;; display symbol table
    (if (eq (pstate-status state) 'OK)                ;; reader
        (format t "Parse Successful. ~%")
        (format t "Parse Fail. ~%"))
    (format t " ~%")))
The code: parser
Parameters: filename
Write a message to the screen (standard output)
Open the file for reading, using the filename
If the input program is legal (program), output Parse Successful, else output Parse Fail
NB the parser descriptor is initialised by create-parser-state:
(defun create-parser-state (ip)
  (make-pstate :stream    ip
               :lookahead ()        ;; () = empty list
               :nextchar  #\Space
               :status    'OK
               :symtab    ()))
The code: get-token
(defun get-token (state)
  (let ((result (get-lex state)))                ;; (c lexeme)
    (setf (pstate-nextchar state) (first result))
    (setf (pstate-lookahead state)               ;; (token lexeme)
          (map-lexeme (second result)))))        ;; return value is (token lexeme)
get-lex returns a list: result is set to (c lexeme)
the state's nextchar is set to c, the next character AFTER the lexeme
the state's lookahead is set to map-lexeme applied to the lexeme, giving a (token lexeme) list, i.e. lexeme → (token lexeme)
The code: map-lexeme
(defun map-lexeme (lexeme)
  (format t "Symbol: ~S ~%" lexeme)
  (list (cond
          ((string= lexeme "program") 'PROGRAM)
          ((string= lexeme "var")     'VAR)
          ;; ...
          ((string= lexeme "(")       'LP)
          ((string= lexeme ")")       'RP)
          ((string= lexeme "")        'EOF)   ;; NB!
          ;; ((is-id lexeme)     'ID)
          ;; ((is-number lexeme) 'NUM)
          (t 'UNKNOWN))
        lexeme))
;; NB the result is a list: (token lexeme), e.g. (VAR "var")
The code: map-lexeme
Parameters: lexeme (a string)
Write the lexeme to the screen
Return a list (of two elements) with the token and the corresponding lexeme, e.g. (PROGRAM "program")
Use string= to compare the lexeme with a pattern:
  keywords ("program", "var", …)
  symbols ("(", ")", …)
is-id (a predicate for identifiers): alphanumeric strings beginning with an alphabetic character
is-number (a predicate for numbers): numeric strings
You have to write is-id and is-number
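One possible way to write the two predicates, following the definitions above and assuming that the empty string is neither an identifier nor a number. It is a sketch, not the reference solution; alpha-char-p and alphanumericp may also accept non-ASCII letters, depending on the implementation. If explicit recursion is expected here, the every calls can be replaced by a tail-recursive walk over (coerce lexeme 'list).

```lisp
;; IS-NUMBER: a non-empty string made up only of decimal digits.
(defun is-number (lexeme)
  (and (plusp (length lexeme))
       (every #'digit-char-p lexeme)))

;; IS-ID: a non-empty string that begins with a letter and
;; continues with letters or digits.
(defun is-id (lexeme)
  (and (plusp (length lexeme))
       (alpha-char-p (char lexeme 0))
       (every #'alphanumericp lexeme)))
```

Because keywords such as "program" also satisfy is-id, the keyword clauses in map-lexeme's cond must come before the is-id clause, as they do in the provided code.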
The code: match
(defun match (state symbol)
  (if (eq symbol (token state))
      (get-token state)            ;; get next token
      (synerr1 state symbol)))     ;; error message
NB: identify the reader here
How to test
Think about what you want to do FIRST!
Stepwise development
  top down
  slowly (festina lente!)
  add one construct at a time: program header; var part; stat part
Test a correct Pascal program first (test case #1)
Run a clisp window + an editor window
Decide which error conditions you can test, and the corresponding error messages
Test each error condition separately as you add the code
Decide how you are going to run the test suite
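For the last point, one option is a small tail-recursive driver that applies parse to each test file in turn. run-suite and the file names in the usage line are hypothetical; adapt them to wherever your test programs live. (mapc #'parse files) would do the same job without the count.

```lisp
;; RUN-SUITE (hypothetical): call PARSER on each file name in FILES,
;; in order, and return how many files were processed.
(defun run-suite (parser files &optional (count 0))
  (cond ((null files) count)
        (t (funcall parser (first files))
           (run-suite parser (rest files) (1+ count)))))

;; Usage (hypothetical file names):
;; (run-suite #'parse '("testfiles/testok1.pas" "testfiles/testa.pas"))
```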
Parser summary
Use what you know from lab 1 and lab 2
NB: as in Prolog, the character after the lexeme must be kept
Recall that parsing is a linear process (reading the ip stream)
Reader + Lexer: read lexemes & return a (c lexeme) list
pstate defines a descriptor for the parser state at each stage
  it is initialised by make-pstate & updated by parse (ip, nextchar), get-token (lookahead, nextchar), the error functions (status) and the symbol table functions (symtab)
Look for a constructor, readers & writers, and a predicate:
  a constructor: make-pstate
  a reader: (pstate-<fieldname> state)
  a writer: (setf (pstate-<fieldname> state) x)
  a predicate: (pstate-p x)
Parser functional description
Reader & Lexer
  get-lex: state → char x lexeme
Help functions
  ctos: char → string
  str-con: string x char → string
  whitespace: character → Boolean
  get-name: ip x lexeme x char → char x lexeme
  get-number: ip x lexeme x char → char x lexeme
  get-symbol: ip x lexeme x char → char x lexeme
x is the cross product
Recall that the last character read must be kept and passed forward
(get-lex state) returns a pair (char lexeme)
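The signatures above can be read as ordinary Lisp functions. The sketch below shows one way ctos, str-con, whitespace and get-name might be written; it illustrates those signatures and is not a copy of the provided lexer. It assumes that end of file is signalled by the symbol EOF, as in the read-char call in parse, and returns the result of get-name as a (char lexeme) list, the same shape as get-lex.

```lisp
;; CTOS: char -> string
(defun ctos (c)
  (string c))

;; STR-CON: string x char -> string  (append C to the end of STR)
(defun str-con (str c)
  (concatenate 'string str (ctos c)))

;; WHITESPACE: char -> Boolean  (#\Tab and #\Return are semi-standard names)
(defun whitespace (c)
  (if (member c '(#\Space #\Tab #\Newline #\Return)) t nil))

;; GET-NAME: ip x lexeme x char -> char x lexeme
;; While C is alphanumeric, append it to LEXEME and read the next char.
;; The first char that is not part of the name is kept and returned.
(defun get-name (ip lexeme c)
  (if (and (characterp c) (alphanumericp c))
      (get-name ip (str-con lexeme c) (read-char ip nil 'EOF))
      (list c lexeme)))
```

For example, (with-input-from-string (ip "bc1 := 2") (get-name ip "" #\a)) returns (#\Space "abc1"): the space is the character passed forward to the next call of the lexer.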
Parser state descriptor (state)
state (a record)
  lookahead: (token lexeme)
  stream: ip, the input stream pointer
  nextchar: the last char read (after the token/lexeme)
  status: the parse status, OK / NOTOK
  symtab: the Symbol Table
state describes the current state of the parser
lookahead has now become a pair of values (token lexeme)
Parser (the driver: parse)
parse: filename → Boolean
Description
  Print a header
  Open the input file & read the first character
  Set state to the input stream (ip) and the first character
  Get the first token: (get-token state) → (get-lex state)
  Call program (start of the parse: (program state))
  Check for extra characters after the program text
  Print the Symbol Table (symtab-display state)
  Print the parse result: Parse Successful / Parse Fail
  Print a footer
Parser help functions
get-token: state → state
  lookahead is set to (token lexeme)
match: state x symbol → state
  if token = symbol then (get-token state), else an error message
  application: (match state 'BEGIN)
map-lexeme: lexeme → (token lexeme)
  returns a (token lexeme) pair from a lexeme
grammar functions for non-terminals, for example:
(defun stat-part (state)
  (match state 'BEGIN)
  (stat-list state)
  (match state 'END)
  (match state 'FSTOP))
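To see how terminals (match) and non-terminals (functions) fit together, the sketch below runs a stat-part function like the one above on a list of tokens instead of a file. Everything except stat-part is hypothetical: toy-state replaces pstate, token/get-token/match are simplified list-based stand-ins, and the rules for stat and stat-list, with the tokens ASSIGN and SCOLON, are invented for illustration rather than taken from the lab grammar.

```lisp
;; Hypothetical toy state: the remaining tokens and a parse status.
(defstruct toy-state tokens (status 'OK))

;; Reader for the current lookahead token.
(defun token (state)
  (first (toy-state-tokens state)))

;; Advance to the next token (a stand-in for the lab's get-token).
(defun get-token (state)
  (setf (toy-state-tokens state) (rest (toy-state-tokens state))))

;; match: consume SYMBOL if it is the lookahead, else flag an error.
(defun match (state symbol)
  (if (eq symbol (token state))
      (get-token state)
      (setf (toy-state-status state) 'NOTOK)))   ; instead of synerr1

;; stat ::= ID ASSIGN NUM                          (toy rule)
(defun stat (state)
  (match state 'ID)
  (match state 'ASSIGN)
  (match state 'NUM))

;; stat-list ::= stat { SCOLON stat }              (toy rule)
(defun stat-list (state)
  (stat state)
  (if (eq (token state) 'SCOLON)
      (progn (match state 'SCOLON)
             (stat-list state))                    ; tail recursion, no loop
      state))

;; stat-part, as on the slide
(defun stat-part (state)
  (match state 'BEGIN)
  (stat-list state)
  (match state 'END)
  (match state 'FSTOP))
```

The { ... } repetition in stat-list is handled by a tail call rather than a loop, which is the pattern the lab asks for in the grammar code.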