Parsing CSCI-400. Principles of Programming Languages.

Similar documents
Syntax-Directed Translation. Lecture 14

10/5/17. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntax Analysis

10/4/18. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntactic Analysis

Syntax Analysis. Chapter 4

Syntax and Grammars 1 / 21

A Simple Syntax-Directed Translator

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

CSE 12 Abstract Syntax Trees

A simple syntax-directed

Context-Free Grammar. Concepts Introduced in Chapter 2. Parse Trees. Example Grammar and Derivation

2.2 Syntax Definition

Programming Languages & Translators PARSING. Baishakhi Ray. Fall These slides are motivated from Prof. Alex Aiken: Compilers (Stanford)

CSE 130 Programming Language Principles & Paradigms Lecture # 5. Chapter 4 Lexical and Syntax Analysis

Lexical and Syntax Analysis

1. The output of lexical analyser is a) A set of RE b) Syntax Tree c) Set of Tokens d) String Character

Chapter 4. Lexical and Syntax Analysis

A programming language requires two major definitions A simple one pass compiler

4. Lexical and Syntax Analysis

4. Lexical and Syntax Analysis

CSE450 Translation of Programming Languages. Lecture 4: Syntax Analysis

Chapter 4 :: Semantic Analysis

Building Compilers with Phoenix

Course Overview. Introduction (Chapter 1) Compiler Frontend: Today. Compiler Backend:

Semantic Analysis. Lecture 9. February 7, 2018

Comp 411 Principles of Programming Languages Lecture 3 Parsing. Corky Cartwright January 11, 2019

Theoretical Part. Chapter one:- - What are the Phases of compiler? Answer:

MIT Top-Down Parsing. Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology

Public-Service Announcement

CS 406: Syntax Directed Translation

Syntax Analysis. COMP 524: Programming Language Concepts Björn B. Brandenburg. The University of North Carolina at Chapel Hill

61A Lecture 26. Monday, October 31

Introduction to Parsing. Lecture 8

EDAN65: Compilers, Lecture 04 Grammar transformations: Eliminating ambiguities, adapting to LL parsing. Görel Hedin Revised:

CSE 3302 Programming Languages Lecture 2: Syntax

Syntax Analysis. The Big Picture. The Big Picture. COMP 524: Programming Languages Srinivas Krishnan January 25, 2011

Derivations vs Parses. Example. Parse Tree. Ambiguity. Different Parse Trees. Context Free Grammars 9/18/2012

RYERSON POLYTECHNIC UNIVERSITY DEPARTMENT OF MATH, PHYSICS, AND COMPUTER SCIENCE CPS 710 FINAL EXAM FALL 96 INSTRUCTIONS

Chapter 3: CONTEXT-FREE GRAMMARS AND PARSING Part 1

Chapter 3. Describing Syntax and Semantics ISBN

Review main idea syntax-directed evaluation and translation. Recall syntax-directed interpretation in recursive descent parsers

([1-9] 1[0-2]):[0-5][0-9](AM PM)? What does the above match? Matches clock time, may or may not be told if it is AM or PM.

CMSC 330: Organization of Programming Languages. Context Free Grammars

COMPILER DESIGN LECTURE NOTES

Introduction to Parsing Ambiguity and Syntax Errors

Topic 3: Syntax Analysis I

Syntax Analysis Part I

CPS 506 Comparative Programming Languages. Syntax Specification

Grammars and Parsing. Paul Klint. Grammars and Parsing

Anatomy of a Compiler. Overview of Semantic Analysis. The Compiler So Far. Why a Separate Semantic Analysis?

Semantic Analysis. Compiler Architecture

CSE 341 Section 7. Eric Mullen Spring Adapted from slides by Nicholas Shahan, Dan Grossman, and Tam Dang

Project 1: Scheme Pretty-Printer

Using an LALR(1) Parser Generator

Introduction to Parsing Ambiguity and Syntax Errors

CSE 341 Section 7. Ethan Shea Autumn Adapted from slides by Nicholas Shahan, Dan Grossman, Tam Dang, and Eric Mullen

Compiling expressions

COLLEGE OF ENGINEERING, NASHIK. LANGUAGE TRANSLATOR

COP 3402 Systems Software Syntax Analysis (Parser)

Compilers and computer architecture From strings to ASTs (2): context free grammars

Chapter 4. Lexical and Syntax Analysis. Topics. Compilation. Language Implementation. Issues in Lexical and Syntax Analysis.

ICOM 4036 Spring 2004

Introduction to Syntax Analysis Recursive-Descent Parsing

The role of semantic analysis in a compiler

Homework & Announcements

Section A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous.

Today. Assignments. Lecture Notes CPSC 326 (Spring 2019) Quiz 5. Exam 1 overview. Type checking basics. HW4 due. HW5 out, due in 2 Tuesdays

CMSC 330: Organization of Programming Languages

4. An interpreter is a program that

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

programming languages need to be precise a regular expression is one of the following: tokens are the building blocks of programs

Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan

CS152 Programming Language Paradigms Prof. Tom Austin, Fall Syntax & Semantics, and Language Design Criteria

COP4020 Programming Languages. Syntax Prof. Robert van Engelen

COMPILERS BASIC COMPILER FUNCTIONS

1 Lexical Considerations

Semantic Analysis. Outline. The role of semantic analysis in a compiler. Scope. Types. Where we are. The Compiler so far

Implementing a VSOP Compiler. March 12, 2018

Outline. Limitations of regular languages. Introduction to Parsing. Parser overview. Context-free grammars (CFG s)

CS453 : JavaCUP and error recovery. CS453 Shift-reduce Parsing 1

Syntax Analysis. Amitabha Sanyal. ( as) Department of Computer Science and Engineering, Indian Institute of Technology, Bombay

Wednesday, September 9, 15. Parsers

Parsers. What is a parser. Languages. Agenda. Terminology. Languages. A parser has two jobs:

Chapter 3: Describing Syntax and Semantics. Introduction Formal methods of describing syntax (BNF)

Semantic Analysis. Outline. The role of semantic analysis in a compiler. Scope. Types. Where we are. The Compiler Front-End

Lecture 4: Syntax Specification

Building Compilers with Phoenix

Introduction to Lexical Analysis

Lexical Analysis. COMP 524, Spring 2014 Bryan Ward

Decaf Language Reference

Abstract Syntax Trees & Top-Down Parsing

Abstract Syntax Trees & Top-Down Parsing

CSE 401 Midterm Exam Sample Solution 11/4/11

Regular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications

Programming Language Specification and Translation. ICOM 4036 Fall Lecture 3

afewadminnotes CSC324 Formal Language Theory Dealing with Ambiguity: Precedence Example Office Hours: (in BA 4237) Monday 3 4pm Wednesdays 1 2pm

Formal Semantics. Chapter Twenty-Three Modern Programming Languages, 2nd ed. 1

RYERSON POLYTECHNIC UNIVERSITY DEPARTMENT OF MATH, PHYSICS, AND COMPUTER SCIENCE CPS 710 FINAL EXAM FALL 97 INSTRUCTIONS

A lexical analyzer generator for Standard ML. Version 1.6.0, October 1994

Chapter 4. Abstract Syntax

Introduction to Automata Theory. BİL405 - Automata Theory and Formal Languages 1

Transcription:

Parsing Principles of Programming Languages https://lambda.mines.edu

Activity & Overview

Review the learning group activity with your group. Compare your solutions to the practice problems. Did anyone have any issues with the problems? Then, as a learning group, work on a regular expression to match double-quoted string literals: print("hello, World!") if (strcspn(cmdline, "'\"`")!= strlen(cmdline)) { printf("<text:p text:style-name=\"glossary\">"); escape("\"1 < 2\"") What you should match is shown in bold. Try your regular expressions at the Python REPL.

Suppose we have the following source code (which, presumably, might be for some programming language): alpha = beta + gamma * 4 How does our language implementation know what to do with this code? How do we determine the order of operations on this expression so that we compute beta + (gamma * 4) rather than (beta + gamma) * 4? How can we represent this code in memory in a way that makes it easy to evaluate or compile? How do we handle cases where programmers write the same expression but with different spacing or style, like this: alpha=beta+gamma *4

The goal of parsing is to convert textual source code into a representation that makes it easy to interpret compile. id: alpha Assign Sum We typically represent this using an abstract syntax tree. The abstract syntax tree for alpha = beta + gamma * 4 is shown. Value Product Conveys order of operation and nesting of parentheses: Product is a child of Sum here. id: beta Value id: gamma Value int: 4

Parsers are typically implemented using two stages: Lexical Analysis During lexical analysis, the input is tokenized to produce a sequence of tokens from the input. Syntactic Analysis During syntactic analysis, the tokens from lexical analysis are formed into an abstract syntax tree.

Lexical Analysis

During lexical analysis, we tokenize the input into a list tokens consisting of two fields: Token Type Data (optional) alpha=beta+gamma*4 LA Id(alpha), Equals, Id(beta), Plus, Id(gamma), Times, Int(4) Tokens which won t appear in the AST are called control tokens: these control the operation of the parser.

tokens_p = re.compile(r''' \s*(?: (=) (\+) (\*) # operators (-?\d+) # integers (\w+) # identifiers (.) # error )\s*''', re.verbose) def tokenize(code): for m in tokens_p.finditer(code): if m.group(1): yield Equals()... elif m.group(5): yield Id(m.group(5)) elif m.group(6): raise SyntaxError

Syntactic Analysis

During syntactic analysis, we turn the token stream from the lexical analysis into an abstract syntax tree. Id(alpha), Equals, Id(beta), Plus, Id(gamma), Times, Int(4) SA AST In general, there s two ways to parse a stream of tokens: Top-Down: form the node at the root of the syntax tree, then recursively form the children. Bottom-Up: start by forming the leaf nodes, then forming their parents.

In order to parse a language, we need a notation to formalize the constructs of our language. We define a set of production rules that state what the various constructs are formed of: Assign IdEqualsSum Sum SumPlusProduct Sum Product Product ProductTimesValue Product Value Value Int Value Id This is actually a specific kind of context-free grammar called a LR (left-recursive) grammar. It makes it convenient for using shift-reduce parsers (coming up!)

Shift-reduce is a type of bottom-up parser. We place a cursor at the beginning of the token stream, and parse each step using one of two transitions: Shift: move the cursor to the next token to the right. Reduce: match a production rule to the tokens directly to the left of the cursor, reducing them to the LHS of the production rule. We refer to the token just to the right of the cursor as the lookahead token. We use the lookahead token to determine that the left of the cursor can unambiguously be reduced, otherwise we will shift. Example on Whiteboard Example shown on whiteboard of using our grammar to create an AST using shift-reduce.