# LECTURE 7. Lex and Intro to Parsing

Size: px
Start display at page:

Transcription

1 LECTURE 7 Lex and Intro to Parsing

2 LEX Last lecture, we learned a little bit about how we can take our regular expressions (which specify our valid tokens) and create real programs that can recognize them. We ended the lecture with a little introduction to Lex and a sample Lex file. Today, we re going to create a Lex file together.

3 LEX Let s say I want to create a scanner which matches the following tokens: Integers any whole number and its negation. Reals in decimal format. Identifiers any sequence of letters, digits, and underscores which start with a letter. Operators any of +, -, *, /, = Whitespace I m going to print out the tokens that I match and count how many of each I encounter.

4 LEX Let s review the format of a Lex file. {definitions} %% {rules} %% {user subroutines} Definitions: Can include global C code as well as macro definitions for regular expressions. Rules: Regular expressions of tokens to match and their corresponding actions. User Subroutines: For our purposes, just contains an optional main function.

5 LEX Let s start by creating a Lex file which simply matches our integers. We defined integers as being any whole number and its negation (i.e. -3, -2, -1, 0, 1, 2, 3 ). So what s an appropriate regular expression to match this regular set?

6 LEX Let s start by creating a Lex file which simply matches our integers. We defined integers as being any whole number and its negation (i.e. -3, -2, -1, 0, 1, 2, 3 ). So what s an appropriate regular expression to match this regular set? (-?[1-9][0-9]*) 0 - is optional

7 LEX Let s start by creating a Lex file which simply matches our integers. We defined integers as being any whole number and its negation (i.e. -3, -2, -1, 0, 1, 2, 3 ). So what s an appropriate regular expression to match this regular set? (-?[1-9][0-9]*) 0 We can match zero or more numbers from the range 0-9, but it must be preceded by a number from the range 1-9.

8 LEX Let s start by creating a Lex file which simply matches our integers. We defined integers as being any whole number and its negation (i.e. -3, -2, -1, 0, 1, 2, 3 ). So what s an appropriate regular expression to match this regular set? (-?[1-9][0-9]*) 0 Alternatively, just match zero.

9 LEX simple.l: %{ int numints = 0; %} inttoken (-?[1-9][0-9]*) 0 %% {inttoken} %% {printf("matched integer: %s\n", yytext); numints++;} int main(){ yylex(); printf("number of integer tokens: %d \n", numints); return 0; }

10 LEX simple.l: %{ int numints = 0; %} inttoken (-?[1-9][0-9]*) 0 Macro created for integers Lex creates a pointer to the matched string, yytext. We ll use it to print what we found. %% {inttoken} %% {printf("matched integer: %s\n", yytext); numints++;} int main(){ yylex(); printf("number of integer tokens: %d \n", numints); return 0; }

11 LEX Let s give this a little test before we move on. What s with the output? < test_nums.txt Matched integer: 12 Matched integer: Matched integer: 1 Matched integer: Matched integer: 456 Matched integer: -4.Matched integer: 567 Matched integer: Matched integer: 3 test_nums.txt Matched integer: -45 Matched integer: 4862 Number of integer tokens: 11

12 LEX Now, let s take care of the reals (e.g. 6.7, -3.0, -.54). How about this regular expression? (- (-?[1-9][0-9]*) 0))?. [0-9]+ We already defined inttoken to be (-?[1-9][0-9]*) 0, so we can also do this: (- {inttoken})?. [0-9]+

13 LEX Now, let s take care of the reals (e.g. 6.7, -3.0, -.54). How about this regular expression? (- (-?[1-9][0-9]*) 0))?. [0-9]+ We already defined inttoken to be (-?[1-9][0-9]*) 0, so we can also do this: (- {inttoken})?. [0-9]+ - We either allow inttokens before the decimal, a single negative sign, or nothing. - Followed by the decimal itself. - Followed by at least one digit in the range What are our limitations? What do we not allow?

14 LEX simple.l: %{ int numints = 0, numdoubles = 0; %} inttoken (-?[1-9][0-9]*) 0 %% {inttoken} {printf("matched integer: %s\n", yytext); numints++;} (- {inttoken})?. [0-9]+ {printf("matched real: %s\n", yytext); numdoubles++;} %% int main(){ yylex(); printf("number of integer tokens: %d \n", numints); printf("number of real tokens: %d \n", numdoubles); return 0; }

15 LEX < test_nums.txt Matched integer: 12 Matched integer: Matched integer: 1 Matched integer: Matched real:.456 Matched real: Matched real: Matched integer: -45 test_nums.txt Matched integer: 4862 Number of integer tokens: 6 Number of real tokens: 3

16 LEX Now, we ll do the next three all at once (check out simple.l on the website): Identifiers: any sequence of letters, digits, and underscores which start with a letter. [a-za-z][a-za-z_0-9]* Operators: [+\-/*=] Whitespace: [ \n\t] (Note: we have to escape - it has special meaning in the brackets.)

17 LEX Okay, so let s try this with a new test file. < test_all.txt Matched identifier: my_int1 Matched operator: = Matched integer: 1 Matched identifier: my_int2 Matched operator: = Matched integer: 3 Matched operator: + Matched identifier: my_int1 Matched identifier: Myreal1 Matched operator: = Matched real: -3.4 Matched operator: - Matched real: 2.0 Matched identifier: Myreal2 Matched operator: = Matched identifier: Myreal1 Matched operator: / Matched real: -2.5 Number of integer tokens: 2 Number of real tokens: 3 Number of identifiers: 6 Number of operators: 7 Number of whitespace characters: 17 test_all.txt my_int1 = 1 my_int2 = 3 + my_int1 Myreal1 = Myreal2 = Myreal1/-2.5

18 LEX Source:

19 LEX Source:

20 LEX There are some excellent Lex references out there! Go read about it. We will do a little project on Lex Lex is available on all linprog machines, so you can start playing with it! Just create a simple.l file and try to make it more and more detailed. Remember to compile with the lfl flag on linprog (e.g. gcc lex.yy.c -lfl).

21 PARSING So now that we know the ins-and-outs of how compilers determine the valid tokens of a program, we can talk about how they determine valid patterns of tokens. A parser is the part of the compiler which is responsible for serving as the recognizer of the programming language, in the same way that the scanner is the recognizer for the tokens.

22 PARSING Even though we typically picture parsing as the stage that comes after scanning, this isn t really the case. In a real scenario, the parser will generally call the scanner as needed to obtain input tokens. It creates a parse tree out of the tokens and passes the tree to the later stages of the compiler. This style of compilation is known as syntax-directed translation.

23 PARSING Let s review context-free grammars. Each context-free grammar has four components: A finite set of tokens (terminal symbols), denoted T. A finite set of non-terminals, denoted N. A finite set of productions N (T N)* A special nonterminal called the start symbol. The idea is similar to regular expressions, except that we can create recursive definitions. Therefore, context-free grammars are more expressive.

24 PARSING Given a context-free grammar, parsing is the process of determining whether the start symbol can derive the program. If successful, the program is a valid program. If failed, the program is invalid.

25 PARSING We can derive parse trees from context-free grammars given some input string. expr id number - expr ( expr ) expr op expr op + - * / expr expr expr op expr expr op expr op expr expr op expr op number expr op expr / number expr op id / number expr * id / number ( expr ) * id / number ( expr op expr ) * id / number ( expr op id ) * id / number ( expr + id ) * id / number ( id + id ) * id / number expr ( ) expr expr id op + expr id op * expr expr op expr id / num

26 PARSING There are two classes of grammars for which linear-time parsers can be constructed: LL Left-to-right, leftmost derivation Input is read from left to right. Derivation is left-most, meaning the left-most nonterminal is replaced at every step. Can be hand-written or generated by a parser generator. Use top-down or predictive parsers. LR Left-to-right, rightmost derivation Input is read from left to right. Derivation is right-most, meaning the right-most nonterminal is replaced at every step. More common, larger class of grammars. Almost always automatically generated. Use bottom-up parsers.

27 PARSING LL parsers are Top-Down ( Predictive ) parsers. Construct the parse tree from the root down, predicting the production used based on some lookahead. LL parsers are easier to understand, but the grammars are less intuitive. LR parsers are Bottom-Up parsers. Construct the parse tree from the leaves up, joining nodes together under single parents. LR parsers can parse more intuitive grammars, but are harder to create. When you see a () suffix with a number (e.g. LL(1) ), that indicates how many tokens of look-ahead the parser requires. We will be focusing on LL parsers in this class.

28 RECURSIVE DESCENT PARSING Recursive descent parsers are an LL parser in which every non-terminal in the grammar corresponds to a subroutine of the parser. Typically hand-written but can be automatically generated. Used when a language is relatively simple.

29 RECURSIVE DESCENT PARSER Let s look at an example. Take the following context-free grammar. It has certain properties (notably, the absence of left recursion) that make it a good candidate to be parsed by a recursive descent parser. program expr expr term expr_tail expr_tail + term expr_tail ε term factor term_tail term_tail * factor term_tail ϵ factor ( expr ) int Note: a grammar is left-recursive if nonterminal A can derive to a sentential form A Aw where w is a string of terminals and nonterminals. In other words, the nonterminal appears on the left-most side of the replacement. LL grammars cannot be left-recursive!

30 RECURSIVE DESCENT PARSER Some strings we can derive from this grammar include: (1 * 2) (1 * 2) * 4 etc! program expr expr term expr_tail expr_tail + term expr_tail ε term factor term_tail term_tail * factor term_tail ϵ factor ( expr ) int

31 RECURSIVE DESCENT PARSER In order to create a parser for this grammar, all we have to do is create appropriate subroutines for each nonterminal. Let s start with program. Because program is our starting nonterminal, it s always the first function called. Now, inside of program, let s think about what we want to do! We ll probably want to call the expr() function because it is the only production for program. But in which cases do we make this call? program expr expr term expr_tail expr_tail + term expr_tail ε term factor term_tail term_tail * factor term_tail ϵ factor ( expr ) int

32 RECURSIVE DESCENT PARSER In order to create a parser for this grammar, all we have to do is create appropriate subroutines for each nonterminal. Let s start with program. procedure program case input of: '(', int: else error expr() match('\$') Note: \$ is a symbol meaning end-of-input. Typically, this would be the EOF character. It is the last thing we should consume to know we re done parsing. We use match() calls to consume terminal tokens. program expr expr term expr_tail expr_tail + term expr_tail ε term factor term_tail term_tail * factor term_tail ϵ factor ( expr ) int

33 RECURSIVE DESCENT PARSER Now let s look at expr. procedure expr case input of: '(', int: term() expr_tail() else error program expr expr term expr_tail expr_tail + term expr_tail ε term factor term_tail term_tail * factor term_tail ϵ factor ( expr ) int

34 RECURSIVE DESCENT PARSER Now let s look at term. procedure term case input of: '(', int: factor() term_tail() else error program expr expr term expr_tail expr_tail + term expr_tail ε term factor term_tail term_tail * factor term_tail ϵ factor ( expr ) int

35 RECURSIVE DESCENT PARSER Now let s look at factor. procedure factor case input of: '(': match('(') expr() match(')') int: match(int) else error program expr expr term expr_tail expr_tail + term expr_tail ε term factor term_tail term_tail * factor term_tail ϵ factor ( expr ) int

36 RECURSIVE DESCENT PARSER Now let s look at expr_tail procedure expr_tail case input of: '+': match('+') term() expr_tail() '\$, ) : skip else error program expr expr term expr_tail expr_tail + term expr_tail ε term factor term_tail term_tail * factor term_tail ϵ factor ( expr ) int

37 RECURSIVE DESCENT PARSER Now let s look at expr_tail procedure expr_tail case input of: '+': match('+') term() expr_tail() '\$, ) : skip else error This is where it gets a little tricky notice, we check for an input of \$ or ). This is how we handle the case where expr_tail is the empty string. The only thing that could follow expr_tail in that case is \$ or ). program expr expr term expr_tail expr_tail + term expr_tail ε term factor term_tail term_tail * factor term_tail ϵ factor ( expr ) int

38 RECURSIVE DESCENT PARSER Now let s look at term_tail procedure term_tail case input of: '*': match('*') factor() term_tail() '+', '\$, ) : skip else error Again notice that we check for +, ) and \$. These are the only possible valid tokens that could follow term_tail if it were the empty string. program expr expr term expr_tail expr_tail + term expr_tail ε term factor term_tail term_tail * factor term_tail ϵ factor ( expr ) int

39 RECURSIVE DESCENT PARSER Putting all of these subroutines together would give us a nice little recursive descent parser for our grammar. But this code only verifies that the program is syntactically correct. We know that parsers must create a parse tree for the next step in the compilation process. Basically, we can build in the construction of a parse tree by creating and linking new nodes as we encounter the terminal or non-terminal symbols. But nodes created for non-terminal symbols must be expanded further.

40 RECURSIVE DESCENT PARSER Some recursive descent parsers require backtracking. The grammar we used was an LL(1) grammar it requires a look-ahead of only one character. This allowed us to create a predictive parser, which does not require backtracking. Any LL(k) grammar can be recognized by a predictive parser.

41 NEXT LECTURE More LL parsing.

### 10/5/17. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntax Analysis

Lexical and Syntactic Analysis Lexical and Syntax Analysis In Text: Chapter 4 Two steps to discover the syntactic structure of a program Lexical analysis (Scanner): to read the input characters and output

### 10/4/18. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntactic Analysis

Lexical and Syntactic Analysis Lexical and Syntax Analysis In Text: Chapter 4 Two steps to discover the syntactic structure of a program Lexical analysis (Scanner): to read the input characters and output

### COMPILER CONSTRUCTION LAB 2 THE SYMBOL TABLE. Tutorial 2 LABS. PHASES OF A COMPILER Source Program. Lab 2 Symbol table

COMPILER CONSTRUCTION Lab 2 Symbol table LABS Lab 3 LR parsing and abstract syntax tree construction using ''bison' Lab 4 Semantic analysis (type checking) PHASES OF A COMPILER Source Program Lab 2 Symtab

### COP4020 Programming Languages. Syntax Prof. Robert van Engelen

COP4020 Programming Languages Syntax Prof. Robert van Engelen Overview Tokens and regular expressions Syntax and context-free grammars Grammar derivations More about parse trees Top-down and bottom-up

### CSC 467 Lecture 3: Regular Expressions

CSC 467 Lecture 3: Regular Expressions Recall How we build a lexer by hand o Use fgetc/mmap to read input o Use a big switch to match patterns Homework exercise static TokenKind identifier( TokenKind token

### COP 3402 Systems Software Top Down Parsing (Recursive Descent)

COP 3402 Systems Software Top Down Parsing (Recursive Descent) Top Down Parsing 1 Outline 1. Top down parsing and LL(k) parsing 2. Recursive descent parsing 3. Example of recursive descent parsing of arithmetic

### Top down vs. bottom up parsing

Parsing A grammar describes the strings that are syntactically legal A recogniser simply accepts or rejects strings A generator produces sentences in the language described by the grammar A parser constructs

### Lexical and Syntax Analysis

Lexical and Syntax Analysis In Text: Chapter 4 N. Meng, F. Poursardar Lexical and Syntactic Analysis Two steps to discover the syntactic structure of a program Lexical analysis (Scanner): to read the input

### COP4020 Programming Languages. Syntax Prof. Robert van Engelen

COP4020 Programming Languages Syntax Prof. Robert van Engelen Overview n Tokens and regular expressions n Syntax and context-free grammars n Grammar derivations n More about parse trees n Top-down and

### Parsing III. CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones

Parsing III (Top-down parsing: recursive descent & LL(1) ) (Bottom-up parsing) CS434 Lecture 8 Spring 2005 Department of Computer Science University of Alabama Joel Jones Copyright 2003, Keith D. Cooper,

### flex is not a bad tool to use for doing modest text transformations and for programs that collect statistics on input.

flex is not a bad tool to use for doing modest text transformations and for programs that collect statistics on input. More often than not, though, you ll want to use flex to generate a scanner that divides

### MIT Top-Down Parsing. Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology

MIT 6.035 Top-Down Parsing Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology Orientation Language specification Lexical structure regular expressions Syntactic structure

### LECTURE 11. Semantic Analysis and Yacc

LECTURE 11 Semantic Analysis and Yacc REVIEW OF LAST LECTURE In the last lecture, we introduced the basic idea behind semantic analysis. Instead of merely specifying valid structures with a context-free

### Parsing. Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice.

Parsing Note by Baris Aktemur: Our slides are adapted from Cooper and Torczon s slides that they prepared for COMP 412 at Rice. Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students

### TDDD55 - Compilers and Interpreters Lesson 3

TDDD55 - Compilers and Interpreters Lesson 3 November 22 2011 Kristian Stavåker (kristian.stavaker@liu.se) Department of Computer and Information Science Linköping University LESSON SCHEDULE November 1,

### Introduction to Parsing. Lecture 8

Introduction to Parsing Lecture 8 Adapted from slides by G. Necula Outline Limitations of regular languages Parser overview Context-free grammars (CFG s) Derivations Languages and Automata Formal languages

### 8 Parsing. Parsing. Top Down Parsing Methods. Parsing complexity. Top down vs. bottom up parsing. Top down vs. bottom up parsing

8 Parsing Parsing A grammar describes syntactically legal strings in a language A recogniser simply accepts or rejects strings A generator produces strings A parser constructs a parse tree for a string

### COP4020 Programming Assignment 2 - Fall 2016

COP4020 Programming Assignment 2 - Fall 2016 To goal of this project is to implement in C or C++ (your choice) an interpreter that evaluates arithmetic expressions with variables in local scopes. The local

### Parsing. Roadmap. > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing

Roadmap > Context-free grammars > Derivations and precedence > Top-down parsing > Left-recursion > Look-ahead > Table-driven parsing The role of the parser > performs context-free syntax analysis > guides

### Lexical and Syntax Analysis. Top-Down Parsing

Lexical and Syntax Analysis Top-Down Parsing Easy for humans to write and understand String of characters Lexemes identified String of tokens Easy for programs to transform Data structure Syntax A syntax

### CS1622. Today. A Recursive Descent Parser. Preliminaries. Lecture 9 Parsing (4)

CS1622 Lecture 9 Parsing (4) CS 1622 Lecture 9 1 Today Example of a recursive descent parser Predictive & LL(1) parsers Building parse tables CS 1622 Lecture 9 2 A Recursive Descent Parser. Preliminaries

### Part 5 Program Analysis Principles and Techniques

1 Part 5 Program Analysis Principles and Techniques Front end 2 source code scanner tokens parser il errors Responsibilities: Recognize legal programs Report errors Produce il Preliminary storage map Shape

### Parsing II Top-down parsing. Comp 412

COMP 412 FALL 2018 Parsing II Top-down parsing Comp 412 source code IR Front End Optimizer Back End IR target code Copyright 2018, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled

### Compiler Construction: Parsing

Compiler Construction: Parsing Mandar Mitra Indian Statistical Institute M. Mitra (ISI) Parsing 1 / 33 Context-free grammars. Reference: Section 4.2 Formal way of specifying rules about the structure/syntax

### LL(k) Parsing. Predictive Parsers. LL(k) Parser Structure. Sample Parse Table. LL(1) Parsing Algorithm. Push RHS in Reverse Order 10/17/2012

Predictive Parsers LL(k) Parsing Can we avoid backtracking? es, if for a given input symbol and given nonterminal, we can choose the alternative appropriately. his is possible if the first terminal of

### COP4020 Programming Assignment 2 Spring 2011

COP4020 Programming Assignment 2 Spring 2011 Consider our familiar augmented LL(1) grammar for an expression language (see Syntax lecture notes on the LL(1) expression grammar): ->

### Recursive Descent Parsers

Recursive Descent Parsers Lecture 7 Robb T. Koether Hampden-Sydney College Wed, Jan 28, 2015 Robb T. Koether (Hampden-Sydney College) Recursive Descent Parsers Wed, Jan 28, 2015 1 / 18 1 Parsing 2 LL Parsers

### CS415 Compilers. Syntax Analysis. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

CS415 Compilers Syntax Analysis These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University Limits of Regular Languages Advantages of Regular Expressions

### UNIT III & IV. Bottom up parsing

UNIT III & IV Bottom up parsing 5.0 Introduction Given a grammar and a sentence belonging to that grammar, if we have to show that the given sentence belongs to the given grammar, there are two methods.

### Lexical Analysis. Introduction

Lexical Analysis Introduction Copyright 2015, Pedro C. Diniz, all rights reserved. Students enrolled in the Compilers class at the University of Southern California have explicit permission to make copies

### CSE 3302 Programming Languages Lecture 2: Syntax

CSE 3302 Programming Languages Lecture 2: Syntax (based on slides by Chengkai Li) Leonidas Fegaras University of Texas at Arlington CSE 3302 L2 Spring 2011 1 How do we define a PL? Specifying a PL: Syntax:

### CS 230 Programming Languages

CS 230 Programming Languages 10 / 16 / 2013 Instructor: Michael Eckmann Today s Topics Questions/comments? Top Down / Recursive Descent Parsers Top Down Parsers We have a left sentential form xa Expand

### Outline. Limitations of regular languages. Introduction to Parsing. Parser overview. Context-free grammars (CFG s)

Outline Limitations of regular languages Introduction to Parsing Parser overview Lecture 8 Adapted from slides by G. Necula Context-free grammars (CFG s) Derivations Languages and Automata Formal languages

### TDDD55- Compilers and Interpreters Lesson 2

TDDD55- Compilers and Interpreters Lesson 2 November 11 2011 Kristian Stavåker (kristian.stavaker@liu.se) Department of Computer and Information Science Linköping University PURPOSE OF LESSONS The purpose

### Lexical and Syntax Analysis

Lexical and Syntax Analysis (of Programming Languages) Top-Down Parsing Lexical and Syntax Analysis (of Programming Languages) Top-Down Parsing Easy for humans to write and understand String of characters

### Context-free grammars

Context-free grammars Section 4.2 Formal way of specifying rules about the structure/syntax of a program terminals - tokens non-terminals - represent higher-level structures of a program start symbol,

### CSCI312 Principles of Programming Languages

Copyright 2006 The McGraw-Hill Companies, Inc. CSCI312 Principles of Programming Languages! LL Parsing!! Xu Liu Derived from Keith Cooper s COMP 412 at Rice University Recap Copyright 2006 The McGraw-Hill

### Introduction to Parsing

Introduction to Parsing The Front End Source code Scanner tokens Parser IR Errors Parser Checks the stream of words and their parts of speech (produced by the scanner) for grammatical correctness Determines

### COMPILERS AND INTERPRETERS Lesson 4 TDDD16

COMPILERS AND INTERPRETERS Lesson 4 TDDD16 Kristian Stavåker (kristian.stavaker@liu.se) Department of Computer and Information Science Linköping University TODAY Introduction to the Bison parser generator

### CSE 130 Programming Language Principles & Paradigms Lecture # 5. Chapter 4 Lexical and Syntax Analysis

Chapter 4 Lexical and Syntax Analysis Introduction - Language implementation systems must analyze source code, regardless of the specific implementation approach - Nearly all syntax analysis is based on

### Syntax Analysis, III Comp 412

COMP 412 FALL 2017 Syntax Analysis, III Comp 412 source code IR Front End Optimizer Back End IR target code Copyright 2017, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp

### Chapter 4. Lexical analysis. Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

Chapter 4 Lexical analysis Lexical scanning Regular expressions DFAs and FSAs Lex Concepts CMSC 331, Some material 1998 by Addison Wesley Longman, Inc. 1 CMSC 331, Some material 1998 by Addison Wesley

### Syntax. Syntax. We will study three levels of syntax Lexical Defines the rules for tokens: literals, identifiers, etc.

Syntax Syntax Syntax defines what is grammatically valid in a programming language Set of grammatical rules E.g. in English, a sentence cannot begin with a period Must be formal and exact or there will

### Chapter 2 :: Programming Language Syntax

Chapter 2 :: Programming Language Syntax Michael L. Scott kkman@sangji.ac.kr, 2015 1 Regular Expressions A regular expression is one of the following: A character The empty string, denoted by Two regular

### PRACTICAL CLASS: Flex & Bison

Master s Degree Course in Computer Engineering Formal Languages FORMAL LANGUAGES AND COMPILERS PRACTICAL CLASS: Flex & Bison Eliana Bove eliana.bove@poliba.it Install On Linux: install with the package

### CS 2210 Sample Midterm. 1. Determine if each of the following claims is true (T) or false (F).

CS 2210 Sample Midterm 1. Determine if each of the following claims is true (T) or false (F). F A language consists of a set of strings, its grammar structure, and a set of operations. (Note: a language

### Compilers. Yannis Smaragdakis, U. Athens (original slides by Sam

Compilers Parsing Yannis Smaragdakis, U. Athens (original slides by Sam Guyer@Tufts) Next step text chars Lexical analyzer tokens Parser IR Errors Parsing: Organize tokens into sentences Do tokens conform

### LECTURE 6 Scanning Part 2

LECTURE 6 Scanning Part 2 FROM DFA TO SCANNER In the previous lectures, we discussed how one might specify valid tokens in a language using regular expressions. We then discussed how we can create a recognizer

### Concepts. Lexical scanning Regular expressions DFAs and FSAs Lex. Lexical analysis in perspective

Concepts Lexical scanning Regular expressions DFAs and FSAs Lex CMSC 331, Some material 1998 by Addison Wesley Longman, Inc. 1 CMSC 331, Some material 1998 by Addison Wesley Longman, Inc. 2 Lexical analysis

### Syntax Analysis, III Comp 412

Updated algorithm for removal of indirect left recursion to match EaC3e (3/2018) COMP 412 FALL 2018 Midterm Exam: Thursday October 18, 7PM Herzstein Amphitheater Syntax Analysis, III Comp 412 source code

### Concepts Introduced in Chapter 3. Lexical Analysis. Lexical Analysis Terms. Attributes for Tokens

Concepts Introduced in Chapter 3 Lexical Analysis Regular Expressions (REs) Nondeterministic Finite Automata (NFA) Converting an RE to an NFA Deterministic Finite Automatic (DFA) Lexical Analysis Why separate

### Prelude COMP 181 Tufts University Computer Science Last time Grammar issues Key structure meaning Tufts University Computer Science

Prelude COMP Lecture Topdown Parsing September, 00 What is the Tufts mascot? Jumbo the elephant Why? P. T. Barnum was an original trustee of Tufts : donated \$0,000 for a natural museum on campus Barnum

### CS 314 Principles of Programming Languages

CS 314 Principles of Programming Languages Lecture 5: Syntax Analysis (Parsing) Zheng (Eddy) Zhang Rutgers University January 31, 2018 Class Information Homework 1 is being graded now. The sample solution

### Introduction to Lex & Yacc. (flex & bison)

Introduction to Lex & Yacc (flex & bison) Lex & Yacc (flex & bison) lexical rules (regular expression) lexical rules (context-free grammar) lex (flex) yacc (bison) Input yylex() yyparse() Processed output

### Parsing. Lecture 11: Parsing. Recursive Descent Parser. Arithmetic grammar. - drops irrelevant details from parse tree

Parsing Lecture 11: Parsing CSC 131 Fall, 2014 Kim Bruce Build parse tree from an expression Interested in abstract syntax tree - drops irrelevant details from parse tree Arithmetic grammar ::=

### 3. Parsing. Oscar Nierstrasz

3. Parsing Oscar Nierstrasz Thanks to Jens Palsberg and Tony Hosking for their kind permission to reuse and adapt the CS132 and CS502 lecture notes. http://www.cs.ucla.edu/~palsberg/ http://www.cs.purdue.edu/homes/hosking/

### Parsing Algorithms. Parsing: continued. Top Down Parsing. Predictive Parser. David Notkin Autumn 2008

Parsing: continued David Notkin Autumn 2008 Parsing Algorithms Earley s algorithm (1970) works for all CFGs O(N 3 ) worst case performance O(N 2 ) for unambiguous grammars Based on dynamic programming,

### 4. Lexical and Syntax Analysis

4. Lexical and Syntax Analysis 4.1 Introduction Language implementation systems must analyze source code, regardless of the specific implementation approach Nearly all syntax analysis is based on a formal

### Building a Parser III. CS164 3:30-5:00 TT 10 Evans. Prof. Bodik CS 164 Lecture 6 1

Building a Parser III CS164 3:30-5:00 TT 10 Evans 1 Overview Finish recursive descent parser when it breaks down and how to fix it eliminating left recursion reordering productions Predictive parsers (aka

### The Parsing Problem (cont d) Recursive-Descent Parsing. Recursive-Descent Parsing (cont d) ICOM 4036 Programming Languages. The Complexity of Parsing

ICOM 4036 Programming Languages Lexical and Syntax Analysis Lexical Analysis The Parsing Problem Recursive-Descent Parsing Bottom-Up Parsing This lecture covers review questions 14-27 This lecture covers

### Lecture Outline. COMP-421 Compiler Design. What is Lex? Lex Specification. ! Lexical Analyzer Lex. ! Lex Examples. Presented by Dr Ioanna Dionysiou

Lecture Outline COMP-421 Compiler Design! Lexical Analyzer Lex! Lex Examples Presented by Dr Ioanna Dionysiou Figures and part of the lecture notes taken from A compact guide to lex&yacc, epaperpress.com

### 4. Lexical and Syntax Analysis

4. Lexical and Syntax Analysis 4.1 Introduction Language implementation systems must analyze source code, regardless of the specific implementation approach Nearly all syntax analysis is based on a formal

### programming languages need to be precise a regular expression is one of the following: tokens are the building blocks of programs

Chapter 2 :: Programming Language Syntax Programming Language Pragmatics Michael L. Scott Introduction programming languages need to be precise natural languages less so both form (syntax) and meaning

### COMPILER CONSTRUCTION Seminar 02 TDDB44

COMPILER CONSTRUCTION Seminar 02 TDDB44 Martin Sjölund (martin.sjolund@liu.se) Adrian Horga (adrian.horga@liu.se) Department of Computer and Information Science Linköping University LABS Lab 3 LR parsing

### Syntax Analysis. COMP 524: Programming Language Concepts Björn B. Brandenburg. The University of North Carolina at Chapel Hill

Syntax Analysis Björn B. Brandenburg The University of North Carolina at Chapel Hill Based on slides and notes by S. Olivier, A. Block, N. Fisher, F. Hernandez-Campos, and D. Stotts. The Big Picture Character

### Chapter 3. Parsing #1

Chapter 3 Parsing #1 Parser source file get next character scanner get token parser AST token A parser recognizes sequences of tokens according to some grammar and generates Abstract Syntax Trees (ASTs)

### THE COMPILATION PROCESS EXAMPLE OF TOKENS AND ATTRIBUTES

THE COMPILATION PROCESS Character stream CS 403: Scanning and Parsing Stefan D. Bruda Fall 207 Token stream Parse tree Abstract syntax tree Modified intermediate form Target language Modified target language

### Figure 2.1: Role of Lexical Analyzer

Chapter 2 Lexical Analysis Lexical analysis or scanning is the process which reads the stream of characters making up the source program from left-to-right and groups them into tokens. The lexical analyzer

### TDDD55- Compilers and Interpreters Lesson 3

TDDD55- Compilers and Interpreters Lesson 3 Zeinab Ganjei (zeinab.ganjei@liu.se) Department of Computer and Information Science Linköping University 1. Grammars and Top-Down Parsing Some grammar rules

Downloaded from http://himadri.cmsdu.org Page 1 LR Parsing We first understand Context Free Grammars. Consider the input string: x+2*y When scanned by a scanner, it produces the following stream of tokens:

### Programming Languages (CS 550) Lecture 4 Summary Scanner and Parser Generators. Jeremy R. Johnson

Programming Languages (CS 550) Lecture 4 Summary Scanner and Parser Generators Jeremy R. Johnson 1 Theme We have now seen how to describe syntax using regular expressions and grammars and how to create

### Types of parsing. CMSC 430 Lecture 4, Page 1

Types of parsing Top-down parsers start at the root of derivation tree and fill in picks a production and tries to match the input may require backtracking some grammars are backtrack-free (predictive)

### Introduction to Parsing. Comp 412

COMP 412 FALL 2010 Introduction to Parsing Comp 412 Copyright 2010, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled in Comp 412 at Rice University have explicit permission to make

### Syntax-Directed Translation

Syntax-Directed Translation ALSU Textbook Chapter 5.1 5.4, 4.8, 4.9 Tsan-sheng Hsu tshsu@iis.sinica.edu.tw http://www.iis.sinica.edu.tw/~tshsu 1 What is syntax-directed translation? Definition: The compilation

### COP 3402 Systems Software Syntax Analysis (Parser)

COP 3402 Systems Software Syntax Analysis (Parser) Syntax Analysis 1 Outline 1. Definition of Parsing 2. Context Free Grammars 3. Ambiguous/Unambiguous Grammars Syntax Analysis 2 Lexical and Syntax Analysis

### CS 403: Scanning and Parsing

CS 403: Scanning and Parsing Stefan D. Bruda Fall 2017 THE COMPILATION PROCESS Character stream Scanner (lexical analysis) Token stream Parser (syntax analysis) Parse tree Semantic analysis Abstract syntax

### Chapter 3. Describing Syntax and Semantics ISBN

Chapter 3 Describing Syntax and Semantics ISBN 0-321-49362-1 Chapter 3 Topics Introduction The General Problem of Describing Syntax Formal Methods of Describing Syntax Copyright 2009 Addison-Wesley. All

### CS 315 Programming Languages Syntax. Parser. (Alternatively hand-built) (Alternatively hand-built)

Programming languages must be precise Remember instructions This is unlike natural languages CS 315 Programming Languages Syntax Precision is required for syntax think of this as the format of the language

### CD Assignment I. 1. Explain the various phases of the compiler with a simple example.

CD Assignment I 1. Explain the various phases of the compiler with a simple example. The compilation process is a sequence of various phases. Each phase takes input from the previous, and passes the output

### EXPERIMENT NO : M/C Lenovo Think center M700 Ci3,6100,6th Gen. H81, 4GB RAM,500GB HDD

GROUP - B EXPERIMENT NO : 07 1. Title: Write a program using Lex specifications to implement lexical analysis phase of compiler to total nos of words, chars and line etc of given file. 2. Objectives :

### LL(1) predictive parsing

LL(1) predictive parsing Informatics 2A: Lecture 11 Mary Cryan School of Informatics University of Edinburgh mcryan@staffmail.ed.ac.uk 10 October 2018 1 / 15 Recap of Lecture 10 A pushdown automaton (PDA)

### Programming Language Syntax and Analysis

Programming Language Syntax and Analysis 2017 Kwangman Ko (http://compiler.sangji.ac.kr, kkman@sangji.ac.kr) Dept. of Computer Engineering, Sangji University Introduction Syntax the form or structure of

### A programming language requires two major definitions A simple one pass compiler

A programming language requires two major definitions A simple one pass compiler [Syntax: what the language looks like A context-free grammar written in BNF (Backus-Naur Form) usually suffices. [Semantics:

### Ambiguity, Precedence, Associativity & Top-Down Parsing. Lecture 9-10

Ambiguity, Precedence, Associativity & Top-Down Parsing Lecture 9-10 (From slides by G. Necula & R. Bodik) 9/18/06 Prof. Hilfinger CS164 Lecture 9 1 Administrivia Please let me know if there are continued

### EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

EDAN65: Compilers, Lecture 06 A LR parsing Görel Hedin Revised: 2017-09-11 This lecture Regular expressions Context-free grammar Attribute grammar Lexical analyzer (scanner) Syntactic analyzer (parser)

### CSE 401 Midterm Exam Sample Solution 2/11/15

Question 1. (10 points) Regular expression warmup. For regular expression questions, you must restrict yourself to the basic regular expression operations covered in class and on homework assignments:

### Revisit the example. Transformed DFA 10/1/16 A B C D E. Start

Revisit the example ε 0 ε 1 Start ε a ε 2 3 ε b ε 4 5 ε a b b 6 7 8 9 10 ε-closure(0)={0, 1, 2, 4, 7} = A Trans(A, a) = {1, 2, 3, 4, 6, 7, 8} = B Trans(A, b) = {1, 2, 4, 5, 6, 7} = C Trans(B, a) = {1,

### Introduction to Syntax Analysis Recursive-Descent Parsing

Introduction to Syntax Analysis Recursive-Descent Parsing CS F331 Programming Languages CSCE A331 Programming Language Concepts Lecture Slides Friday, February 10, 2017 Glenn G. Chappell Department of

### LECTURE 3. Compiler Phases

LECTURE 3 Compiler Phases COMPILER PHASES Compilation of a program proceeds through a fixed series of phases. Each phase uses an (intermediate) form of the program produced by an earlier phase. Subsequent

### Formal Languages and Compilers Lecture VII Part 3: Syntactic A

Formal Languages and Compilers Lecture VII Part 3: Syntactic Analysis Free University of Bozen-Bolzano Faculty of Computer Science POS Building, Room: 2.03 artale@inf.unibz.it http://www.inf.unibz.it/

### Syntax Analysis, V Bottom-up Parsing & The Magic of Handles Comp 412

Midterm Exam: Thursday October 18, 7PM Herzstein Amphitheater Syntax Analysis, V Bottom-up Parsing & The Magic of Handles Comp 412 COMP 412 FALL 2018 source code IR Front End Optimizer Back End IR target

### CSEP 501 Compilers. Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter /8/ Hal Perkins & UW CSE B-1

CSEP 501 Compilers Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter 2008 1/8/2008 2002-08 Hal Perkins & UW CSE B-1 Agenda Basic concepts of formal grammars (review) Regular expressions

### Parser. Larissa von Witte. 11. Januar Institut für Softwaretechnik und Programmiersprachen. L. v. Witte 11. Januar /23

Parser Larissa von Witte Institut für oftwaretechnik und Programmiersprachen 11. Januar 2016 L. v. Witte 11. Januar 2016 1/23 Contents Introduction Taxonomy Recursive Descent Parser hift Reduce Parser

### CS 406/534 Compiler Construction Parsing Part I

CS 406/534 Compiler Construction Parsing Part I Prof. Li Xu Dept. of Computer Science UMass Lowell Fall 2004 Part of the course lecture notes are based on Prof. Keith Cooper, Prof. Ken Kennedy and Dr.

### Comp 411 Principles of Programming Languages Lecture 3 Parsing. Corky Cartwright January 11, 2019

Comp 411 Principles of Programming Languages Lecture 3 Parsing Corky Cartwright January 11, 2019 Top Down Parsing What is a context-free grammar (CFG)? A recursive definition of a set of strings; it is

### Top-Down Parsing and Intro to Bottom-Up Parsing. Lecture 7

Top-Down Parsing and Intro to Bottom-Up Parsing Lecture 7 1 Predictive Parsers Like recursive-descent but parser can predict which production to use Predictive parsers are never wrong Always able to guess

### Regular Expressions. Agenda for Today. Grammar for a Tiny Language. Programming Language Specifications

Agenda for Today Regular Expressions CSE 413, Autumn 2005 Programming Languages Basic concepts of formal grammars Regular expressions Lexical specification of programming languages Using finite automata