Parser Generators. Mark Boady. August 14, 2013

Similar documents
Python Lex-Yacc. Language Tool for Python CS 550 Programming Languages. Alexander Gutierrez May 12, 2016

Parsing lexical analysis: parsing: 2.1. Lexical analysis

Intermediate Representation (IR)

A Problem Course in Compilation: From Python to x86 Assembly. Jeremy G. Siek

A Problem Course in Compilation: From Python to x86 Assembly. Jeremy G. Siek Bor-Yuh Evan Chang

Project 2 Interpreter for Snail. 2 The Snail Programming Language

A Problem Course in Compilation: From Python to x86 Assembly. Jeremy G. Siek

Syntax Analysis The Parser Generator (BYacc/J)

Abstract Syntax Trees

Introduction to Compiler Design

CS152 Programming Language Paradigms Prof. Tom Austin, Fall Syntax & Semantics, and Language Design Criteria

Last Time. What do we want? When do we want it? An AST. Now!

Preparing for the ACW Languages & Compilers

Today. Elements of Programming Languages. Concrete vs. abstract syntax. L Arith. Lecture 1: Abstract syntax

CS360: Mini Language. Michael Conway Original slides by Mark Boady. March 11, 2015

Simple LR (SLR) LR(0) Drawbacks LR(1) SLR Parse. LR(1) Start State and Reduce. LR(1) Items 10/3/2012

LR Parsing. Leftmost and Rightmost Derivations. Compiler Design CSE 504. Derivations for id + id: T id = id+id. 1 Shift-Reduce Parsing.

Interpreter. Scanner. Parser. Tree Walker. read. request token. send token. send AST I/O. Console

ASTs, Objective CAML, and Ocamlyacc

CS 415 Midterm Exam Spring 2002

Left to right design 1

Project 1: Scheme Pretty-Printer

Lecture 12: Parser-Generating Tools

CS 11 Ocaml track: lecture 6

10/5/17. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntax Analysis

MP 3 A Lexer for MiniJava

Defining syntax using CFGs

A clarification on terminology: Recognizer: accepts or rejects strings in a language. Parser: recognizes and generates parse trees (imminent topic)

sly Documentation Release 0.0 David Beazley

Programming Language Syntax. CSE 307 Principles of Programming Languages Stony Brook University

A Simple Syntax-Directed Translator

CSE 401 Midterm Exam Sample Solution 11/4/11

10/4/18. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntactic Analysis

Structure of a compiler. More detailed overview of compiler front end. Today we ll take a quick look at typical parts of a compiler.

Topic 3: Syntax Analysis I

The Structure of a Syntax-Directed Compiler

LECTURE 11. Semantic Analysis and Yacc

CUP. Lecture 18 CUP User s Manual (online) Robb T. Koether. Hampden-Sydney College. Fri, Feb 27, 2015

How To Create Your Own Freaking Awesome Programming Language

CSCI 2041: Advanced Language Processing

Introduc)on. Tristan Naumann. Sahar Hasan. Steve Hehl. Brian Lu

Semantic Analysis. Lecture 9. February 7, 2018

Today. Assignments. Lecture Notes CPSC 326 (Spring 2019) Quiz 2. Lexer design. Syntax Analysis: Context-Free Grammars. HW2 (out, due Tues)

CSC 306 LEXICAL ANALYSIS PROJECT SPRING 2011

Syntax Analysis MIF08. Laure Gonnord

ffl assign The assign has the form: identifier = expression; For example, this is a valid assign : var1 = 20-3*2; In the assign the identifier gets th

CSEP 501 Compilers. Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter /8/ Hal Perkins & UW CSE B-1

CSE 413 Programming Languages & Implementation. Hal Perkins Winter 2019 Grammars, Scanners & Regular Expressions

COP 3402 Systems Software Syntax Analysis (Parser)

What is a compiler? var a var b mov 3 a mov 4 r1 cmpi a r1 jge l_e mov 2 b jmp l_d l_e: mov 3 b l_d: ;done

Semantic Analysis Attribute Grammars

Programming Languages (CS 550) Lecture 4 Summary Scanner and Parser Generators. Jeremy R. Johnson

Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

Chapter 4. Lexical analysis. Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

Exercise ANTLRv4. Patryk Kiepas. March 25, 2017

CS 536 Midterm Exam Spring 2013

Syntactic Analysis. The Big Picture Again. Grammar. ICS312 Machine-Level and Systems Programming

What is a compiler? Xiaokang Qiu Purdue University. August 21, 2017 ECE 573

Chapter 27 Putting it all together. Practical C++ Programming Copyright 2003 O'Reilly and Associates Page1

ffl assign : The assign has the form: identifier = expression ; For example, this is a valid assign : var1 = 20-3*2 ; In the assign the identifier get

Compiler Construction D7011E

COLLEGE OF ENGINEERING, NASHIK. LANGUAGE TRANSLATOR

Simple Lexical Analyzer

Regular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications

Lab 2. Lexing and Parsing with ANTLR4

Last time. What are compilers? Phases of a compiler. Scanner. Parser. Semantic Routines. Optimizer. Code Generation. Sunday, August 29, 2010

Lexical Analysis. Chapter 2

I/O and Parsing Tutorial

LECTURE 3. Compiler Phases

This class is about understanding how programs work

Compilers. Bottom-up Parsing. (original slides by Sam

Lab 2 Tutorial (An Informative Guide)

Lexical Analysis. Lecture 2-4

The Structure of a Syntax-Directed Compiler

Lexical Analysis. Chapter 1, Section Chapter 3, Section 3.1, 3.3, 3.4, 3.5 JFlex Manual

CSE 401/M501 Compilers

Compilers. Compiler Construction Tutorial The Front-end

Compiler Lab. Introduction to tools Lex and Yacc

Optimizing Finite Automata

CSc 453 Compilers and Systems Software

A Pascal program. Input from the file is read to a buffer program buffer. program xyz(input, output) --- begin A := B + C * 2 end.

Syntax and Grammars 1 / 21

RYERSON POLYTECHNIC UNIVERSITY DEPARTMENT OF MATH, PHYSICS, AND COMPUTER SCIENCE CPS 710 FINAL EXAM FALL 96 INSTRUCTIONS

An ANTLR Grammar for Esterel

Using an LALR(1) Parser Generator

CS164: Programming Assignment 2 Dlex Lexer Generator and Decaf Lexer

CSE 130 Programming Language Principles & Paradigms Lecture # 5. Chapter 4 Lexical and Syntax Analysis

Software II: Principles of Programming Languages

Properties of Regular Expressions and Finite Automata

Syntax and Parsing COMS W4115. Prof. Stephen A. Edwards Fall 2003 Columbia University Department of Computer Science

Lexical Analysis. Lecture 3. January 10, 2018

CSCI312 Principles of Programming Languages!

Compiler Design (40-414)

Compilation 2014 Warm-up project

Abstract Syntax. Mooly Sagiv. html://

A program that performs lexical analysis may be termed a lexer, tokenizer, or scanner, though scanner is also a term for the first stage of a lexer.

CSE450. Translation of Programming Languages. Lecture 11: Semantic Analysis: Types & Type Checking

CMSC 330: Organization of Programming Languages. Context Free Grammars

Introduction to Lexical Analysis

Over-simplified history of programming languages

Transcription:

Parser Generators Mark Boady August 14, 2013

nterpreters We have seen many interpreters for different programming languages Every programming language has a grammar Math using integers, addition, and multiplication is a simple grammar We will use a parser generator to automatically generate a parser based on a grammar n PA3, we will see a full language implemented with a parser generator

Context Free Grammar What is a grammar for math with integers, addition, and multiplication expr -> term + expr term term -> factor * term factor factor -> NUMBER ( expr ) NUMBER -> [0-9]+ How do we parse an expression? Example: 5+7*9 expr -> term + expr (term=5, expr=7*9) Path 1: term=5 term -> factor (factor=5) factor -> NUMBER (number=5)

Context Free Grammar Grammar expr -> term + expr term term -> factor * term factor factor -> NUMBER ( expr ) NUMBER -> [0-9]+ Example: 5+7*9 expr -> term + expr (term=5, expr=7*9) Path 2: expr=7*9 expr -> term (term=7*9) term -> factor * term (factor=7, term=9) factor -> NUMBER (number=7) term -> factor (factor 9) factor -> NUMBER (number=9)

Parser Generator Parser Generators for many languages exist Python: PLY C++: YACC/LEXX Java: JFLEX We will be using PLY in this lab This gives us a way to quickly make a parser for our CFG n Lab 5, we will use math_example.py written using PLY The python file needs to be defined in a very specifc way.

PLY File nitialize Libraries At the top of the file we need to import the PLY libraries import ply.lex as lex import ply.yacc as yacc import sys

Tokens The first thing to define are the tokens Tokens: Single characters or base symbols. nitially we just list the names we give out tokens. tokens =( NUMBER, PLUS, TMES, LPAREN, RPAREN )

Tokens After listing the tokens, we need to define what each token means. n PLY, we can use regular expressions to define tokens. t_plus= r \+ t_tmes= r \* t_lparen= r \( t_rparen= r \) t_ignore= \t #gnore whitespace def t_number( t) : r [0-9]+ t.value = int(t.value) return t;

Special Tokens There are two additional special tokens we need to define Error tells PLY what to do if it finds an illegal character def t_error(t): print "llegal character %s on line %d" % (t.value[0],t.lexer.lineno) return t; Newline tells PLY what you want your newline character to be. We can also could line numbers here for errors def t_newline( t ): r \n+ t.lexer.lineno += len(t.value)

Tokens After all tokens have been defined, we need to tell PLY to build part of the parser. The lexer uses a parse table to break the input string up into tokens (a tokenizer) lex.lex()

Grammar After Defining the tokens, we can give the grammar We need to define each rule in our grammar expr -> term + expr term This requires 2 function definitions in PLY def p_expr_a( p ): expr : term PLUS expr p[0] =p[1] + p[3] p_expr_b( p ): expr : term p[0] = p[1]

Naming The grammar definition follows strict rules def p_expr_a( p ): expr : term PLUS expr p[0] =p[1] + p[3] The function name must start with a p_ The name you give the function must match the name used in the CFG string f you need multiple definitions, you can append _a, _b, etc

Definitions The grammar definition follows strict rules def p_expr_a( p ): expr : term PLUS expr p[0] =p[1] + p[3] The first line of the definition gives the CFG to match The second line gives the action to perform on a match n this example p[0] = the return value expr p[1] = the value from term p[2] = the string for + p[3] = the value from expr

Multiplication term -> factor * term factor The rules for multiplication in PLY def p_term_a( p ): term : factor TMES term p[0] = p[1] * p[3] def p_term_b( p ): term : factor p[0] = p[1]

Factors The factor symbol is used to make expressions in parenthesis act like numbers factor -> NUMBER ( expr ) The rules for factor in PLY def p_factor_a( p ): factor : NUMBER p[0] = p[1] def p_factor_b( p ): factor : LPAREN expr RPAREN p[0] = p[2] Reminder: We know NUMBER returns an int from the token definition

Parser We need two more commands to finish the parser First we define how to handle errors def p_error( p ): print "Syntax error in input!", str(p) sys.exit( 2 ) Lastly, we build the actual parser yacc.yacc()

Using the Parser The final part of the file defines the main function t reads from stdin and then calls the parser we generated if name == main : print "... print "..."; expression = sys.stdin.readline() while expression!= "": result = yacc.parse(expression) print "The Answer is "+str(result) expression = sys.stdin.readline()

Lab 5 Test the parser to see how it works for simple expressions Extend the Parser to handle exponents 2^5 Change the grammar mplement the new grammar you came up with f you don t finish you can submit by email