Lexical and Syntax Analysis

Similar documents
Lexical and Syntax Analysis

Syntax Analysis Part IV

Lex & Yacc (GNU distribution - flex & bison) Jeonghwan Park

Using an LALR(1) Parser Generator

Introduction to Lex & Yacc. (flex & bison)

An Introduction to LEX and YACC. SYSC Programming Languages

COMPILER CONSTRUCTION LAB 2 THE SYMBOL TABLE. Tutorial 2 LABS. PHASES OF A COMPILER Source Program. Lab 2 Symbol table

Syntax Analysis Part VIII

CSCI Compiler Design

PRACTICAL CLASS: Flex & Bison

flex is not a bad tool to use for doing modest text transformations and for programs that collect statistics on input.

Introduction to Yacc. General Description Input file Output files Parsing conflicts Pseudovariables Examples. Principles of Compilers - 16/03/2006

LECTURE 11. Semantic Analysis and Yacc

Compiler Construction Assignment 3 Spring 2018

Yacc Yet Another Compiler Compiler

Lex and Yacc. A Quick Tour

Yacc: A Syntactic Analysers Generator

CSE302: Compiler Design

Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan

HW8 Use Lex/Yacc to Turn this: Into this: Lex and Yacc. Lex / Yacc History. A Quick Tour. if myvar == 6.02e23**2 then f(..!

Principles of Programming Languages

Compiler Lab. Introduction to tools Lex and Yacc

Yacc. Generator of LALR(1) parsers. YACC = Yet Another Compiler Compiler symptom of two facts: Compiler. Compiler. Parser

Compiler construction in4020 lecture 5

Parsing and Pattern Recognition

Syntax-Directed Translation

Compiler Construction

Preparing for the ACW Languages & Compilers

Lab 2. Lexing and Parsing with Flex and Bison - 2 labs

Chapter 3: Describing Syntax and Semantics. Introduction Formal methods of describing syntax (BNF)

Big Picture: Compilation Process. CSCI: 4500/6500 Programming Languages. Big Picture: Compilation Process. Big Picture: Compilation Process.

Lexical and Parser Tools

Compiler construction 2002 week 5

TDDD55 - Compilers and Interpreters Lesson 3

Parsing How parser works?

Compiler Design 1. Yacc/Bison. Goutam Biswas. Lect 8

Lex & Yacc. By H. Altay Güvenir. A compiler or an interpreter performs its task in 3 stages:

Introduction to Compilers and Language Design

COMPILERS AND INTERPRETERS Lesson 4 TDDD16

Parsing. COMP 520: Compiler Design (4 credits) Professor Laurie Hendren.

Lex & Yacc. by H. Altay Güvenir. A compiler or an interpreter performs its task in 3 stages:

Introduction to Compiler Design

Compiler Construction

RYERSON POLYTECHNIC UNIVERSITY DEPARTMENT OF MATH, PHYSICS, AND COMPUTER SCIENCE CPS 710 FINAL EXAM FALL 96 INSTRUCTIONS

Lexical and Syntax Analysis. Abstract Syntax

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

CS143 Handout 12 Summer 2011 July 1 st, 2011 Introduction to bison

THE COMPILATION PROCESS EXAMPLE OF TOKENS AND ATTRIBUTES

CS 403: Scanning and Parsing

Compilers. Compiler Construction Tutorial The Front-end

CPS 506 Comparative Programming Languages. Syntax Specification

Compiler construction 2005 lecture 5

Syntax and Grammars 1 / 21

Lexical and Syntax Analysis

COP 3402 Systems Software Syntax Analysis (Parser)

Semantic actions for declarations and expressions

Lesson 10. CDT301 Compiler Theory, Spring 2011 Teacher: Linus Källberg

COMPILER CONSTRUCTION Seminar 02 TDDB44

L L G E N. Generator of syntax analyzier (parser)

Big Picture: Compilation Process. CSCI: 4500/6500 Programming Languages. Big Picture: Compilation Process. Big Picture: Compilation Process

Lexical Analysis. Introduction

As we have seen, token attribute values are supplied via yylval, as in. More on Yacc s value stack

CS 426 Fall Machine Problem 1. Machine Problem 1. CS 426 Compiler Construction Fall Semester 2017

Module 8 - Lexical Analyzer Generator. 8.1 Need for a Tool. 8.2 Lexical Analyzer Generator Tool

TDDD55- Compilers and Interpreters Lesson 3

A simple syntax-directed

Lex and Yacc. More Details

Programming Languages (CS 550) Lecture 4 Summary Scanner and Parser Generators. Jeremy R. Johnson

Syntax Directed Translation

LR Parsing. Leftmost and Rightmost Derivations. Compiler Design CSE 504. Derivations for id + id: T id = id+id. 1 Shift-Reduce Parsing.

Automated Tools. The Compilation Task. Automated? Automated? Easier ways to create parsers. The final stages of compilation are language dependant

Syntax-Directed Translation. Introduction

Syntax Analysis The Parser Generator (BYacc/J)

A Bison Manual. You build a text file of the production (format in the next section); traditionally this file ends in.y, although bison doesn t care.

1 Lexical Considerations

Sometimes an ambiguous grammar can be rewritten to eliminate the ambiguity.

COMP-421 Compiler Design. Presented by Dr Ioanna Dionysiou

Syntax. A. Bellaachia Page: 1

A Simple Syntax-Directed Translator

Specifying Syntax. An English Grammar. Components of a Grammar. Language Specification. Types of Grammars. 1. Terminal symbols or terminals, Σ

Applications of Context-Free Grammars (CFG)

CSC 467 Lecture 3: Regular Expressions

Structure of a compiler. More detailed overview of compiler front end. Today we ll take a quick look at typical parts of a compiler.

Lexical Considerations

Context-free grammars

Syntax Analysis Check syntax and construct abstract syntax tree

Lexical and Syntax Analysis. Top-Down Parsing

LECTURE 3. Compiler Phases

Semantic actions for declarations and expressions

CSEP 501 Compilers. Languages, Automata, Regular Expressions & Scanners Hal Perkins Winter /8/ Hal Perkins & UW CSE B-1

Lecture 14 Sections Mon, Mar 2, 2009

Chapter 2: Syntax Directed Translation and YACC

Semantic actions for declarations and expressions. Monday, September 28, 15

Simple LR (SLR) LR(0) Drawbacks LR(1) SLR Parse. LR(1) Start State and Reduce. LR(1) Items 10/3/2012

Compiler Construction: Parsing

Gechstudentszone.wordpress.com

Topic 5: Syntax Analysis III

COMP 181. Agenda. Midterm topics. Today: type checking. Purpose of types. Type errors. Type checking

SEMANTIC ANALYSIS TYPES AND DECLARATIONS

Principles of Programming Languages COMP251: Syntax and Grammars

Transcription:

Lexical and Syntax Analysis (of Programming Languages) Bison, a Parser Generator

Lexical and Syntax Analysis (of Programming Languages) Bison, a Parser Generator

Bison: a parser generator Bison Specification of a parser C function called yyparse() Context-free grammar with a C action for each production. Match the input string and execute the actions of the productions used.

Bison: a parser generator Bison Specification of a parser C function called yyparse() Context-free grammar with a C action for each production. Match the input string and execute the actions of the productions used.

Input to Bison The structure of a Bison (.y) file is as follows. /* Declarations */ /* Grammar rules */ /* C Code (including main function) */ Any text enclosed in /* and */ is treated as a comment.

Input to Bison The structure of a Bison (.y) file is as follows. /* Declarations */ /* Grammar rules */ /* C Code (including main function) */ Any text enclosed in /* and */ is treated as a comment.

Grammar rules Let α be any sequence of terminals and non-terminals. A grammar rule defining nonterminal n is of the form: n : α 1 action 1 α 2 action 2 α n action n ; Each action is a C statement, or a block of C statements of the form { }.

Grammar rules Let α be any sequence of terminals and non-terminals. A grammar rule defining nonterminal n is of the form: n : α 1 action 1 α 2 action 2 α n action n ; Each action is a C statement, or a block of C statements of the form { }.

Example 1 /* No declarations */ expr1.y e : 'x' 'y' '(' e '+' e ')' '(' e '*' e ')' /* No actions */ Terminal /* No main function */ Nonterminal

Example 1 /* No declarations */ expr1.y e : 'x' 'y' '(' e '+' e ')' '(' e '*' e ')' /* No actions */ Terminal /* No main function */ Nonterminal

Output of Bison Bison generates a C function int yyparse() { } Takes as input a stream of tokens. Returns zero if input conforms to grammar, and non-zero otherwise. Calls yylex() to get the next token. Stops when yylex() returns zero. When a grammar rule is used, that rule s action is executed.

Output of Bison Bison generates a C function int yyparse() { } Takes as input a stream of tokens. Returns zero if input conforms to grammar, and non-zero otherwise. Calls yylex() to get the next token. Stops when yylex() returns zero. When a grammar rule is used, that rule s action is executed.

Example 1, revisted /* No declarations */ e expr1.y : 'x' 'y' '(' e '+' e ')' '(' e '*' e ')' int yylex() { char c = getchar(); if (c == '\n') return 0; else return c; } void main() { printf( %i\n, yyparse()); } /* No actions */

Example 1, revisted /* No declarations */ e expr1.y : 'x' 'y' '(' e '+' e ')' '(' e '*' e ')' int yylex() { char c = getchar(); if (c == '\n') return 0; else return c; } void main() { printf( %i\n, yyparse()); } /* No actions */

Running Example 1 At a command prompt '>': > bison -o expr1.c expr1.y > gcc -o expr1 expr1.c -ly > expr1 (x+(y*x)) 0 Input Important! Output (0 means successful parse)

Running Example 1 At a command prompt '>': > bison -o expr1.c expr1.y > gcc -o expr1 expr1.c -ly > expr1 (x+(y*x)) 0 Input Important! Output (0 means successful parse)

Example 2 Terminals can be declared using a %token declaration, for example, to represent arithmetic variables: %token VAR e expr2.y : VAR '(' e '+' e ')' '(' e '*' e ')' /* main() and yylex() */

Example 2 Terminals can be declared using a %token declaration, for example, to represent arithmetic variables: %token VAR e expr2.y : VAR '(' e '+' e ')' '(' e '*' e ')' /* main() and yylex() */

Example 2 (continued) int yylex() { int c = getchar(); expr2.y /* Ignore white space */ while (c == ' ') c = getchar(); } if (c == '\n') return 0; if (c >= 'a' && c <= 'z') return VAR; return c; Return a VAR token void main() { printf( %i\n, yyparse()); }

Example 2 (continued) int yylex() { int c = getchar(); expr2.y /* Ignore white space */ while (c == ' ') c = getchar(); } if (c == '\n') return 0; if (c >= 'a' && c <= 'z') return VAR; return c; Return a VAR token void main() { printf( %i\n, yyparse()); }

Example 2 (continued) Alternatively, the yylex() function can be generated by Flex. %{ #include "expr2.h" %} " " /* Ignore spaces */ \n return 0; [a-z] return VAR;. return yytext[0]; expr2.lex Generated by Bison

Example 2 (continued) Alternatively, the yylex() function can be generated by Flex. %{ #include "expr2.h" %} " " /* Ignore spaces */ \n return 0; [a-z] return VAR;. return yytext[0]; expr2.lex Generated by Bison

Running Example 2 At a command prompt '>': > bison --defines -o expr2.c expr2.y > flex -o expr2lex.c expr2.lex Generate expr2.h > gcc -o expr2 expr2.c expr2lex.c ly -lfl > expr2 (a + ( b * c )) 0 Parser Lexer Output (0 means successful parse)

Running Example 2 At a command prompt '>': > bison --defines -o expr2.c expr2.y > flex -o expr2lex.c expr2.lex Generate expr2.h > gcc -o expr2 expr2.c expr2lex.c ly -lfl > expr2 (a + ( b * c )) 0 Parser Lexer Output (0 means successful parse)

Example 3 Adding numeric literals: %token VAR %token NUM e expr3.y : VAR NUM '(' e '+' e ')' '(' e '*' e ')' void main() { printf( %i\n, yyparse()); } Numeric Literal

Example 3 Adding numeric literals: %token VAR %token NUM e expr3.y : VAR NUM '(' e '+' e ')' '(' e '*' e ')' void main() { printf( %i\n, yyparse()); } Numeric Literal

Example 3 (continued) Adding numeric literals: %{ #include "expr3.h" %} " " /* Ignore spaces */ \n return 0; [a-z] return VAR; [0-9]+ return NUM;. return yytext[0]; expr3.lex Numeric Literal

Example 3 (continued) Adding numeric literals: %{ #include "expr3.h" %} " " /* Ignore spaces */ \n return 0; [a-z] return VAR; [0-9]+ return NUM;. return yytext[0]; expr3.lex Numeric Literal

Semantic values of tokens A token can have a semantic value associated with it. A NUM token contains an integer. A VAR token contains a variable name. Semantic values are returned via the yylval global variable.

Semantic values of tokens A token can have a semantic value associated with it. A NUM token contains an integer. A VAR token contains a variable name. Semantic values are returned via the yylval global variable.

Example 3 (revisited) Returning values via yylval: %{ #include "expr3.h" %} " " /* Ignore spaces */ \n return 0; [a-z] { yylval = yytext[0]; return VAR; } [0-9]+ { yylval = atoi(yytext); return NUM; }. return yytext[0]; expr3.lex

Example 3 (revisited) Returning values via yylval: %{ #include "expr3.h" %} " " /* Ignore spaces */ \n return 0; [a-z] { yylval = yytext[0]; return VAR; } [0-9]+ { yylval = atoi(yytext); return NUM; }. return yytext[0]; expr3.lex

Type of yylval Problem: different tokens may have semantic values of different types. So what is type of yylval? Solution: a union type, which can be specified using the %union declaration, e.g. %union{ char var; int num; } yylval is either a char or an int

Type of yylval Problem: different tokens may have semantic values of different types. So what is type of yylval? Solution: a union type, which can be specified using the %union declaration, e.g. %union{ char var; int num; } yylval is either a char or an int

Example 3 (revisted) Returning values via yylval: %{ #include "expr3.h" %} " " /* Ignore spaces */ \n return 0; [a-z] { yylval.var = yytext[0]; return VAR; } [0-9]+ { yylval.num = atoi(yytext); return NUM; }. return yytext[0]; expr3.lex

Example 3 (revisted) Returning values via yylval: %{ #include "expr3.h" %} " " /* Ignore spaces */ \n return 0; [a-z] { yylval.var = yytext[0]; return VAR; } [0-9]+ { yylval.num = atoi(yytext); return NUM; }. return yytext[0]; expr3.lex

Tokens have types The type of token s semantic value can be specified in a %token declaration. %union{ char var; int num; } %token <var> VAR; %token <num> NUM;

Tokens have types The type of token s semantic value can be specified in a %token declaration. %union{ char var; int num; } %token <var> VAR; %token <num> NUM;

Semantic values of non-terminals A non-terminal can also have a semantic value associated with it. In the action of a grammar rule: $n refers to the semantic value of the n th symbol in the rule; $$ refers to the semantic value of the result of the rule. The type can be specified in a %type declaration, e.g. %type <num> e;

Semantic values of non-terminals A non-terminal can also have a semantic value associated with it. In the action of a grammar rule: $n refers to the semantic value of the n th symbol in the rule; $$ refers to the semantic value of the result of the rule. The type can be specified in a %type declaration, e.g. %type <num> e;

Example 4 %{ int env[256]; /* Variable environment */ %} %union{ int num; char var; } %token <num> NUM %token <var> VAR %type <num> e s : e { printf( %i\n, $1); } e : VAR { $$ = env[$1]; } NUM { $$ = $1; } '(' e '+' e ')' { $$ = $2 + $4; } '(' e '*' e ')' { $$ = $2 * $4; } expr4.y void main() { env['x'] = 100; yyparse(); }

Example 4 %{ int env[256]; /* Variable environment */ %} %union{ int num; char var; } %token <num> NUM %token <var> VAR %type <num> e s : e { printf( %i\n, $1); } e : VAR { $$ = env[$1]; } NUM { $$ = $1; } '(' e '+' e ')' { $$ = $2 + $4; } '(' e '*' e ')' { $$ = $2 * $4; } expr4.y void main() { env['x'] = 100; yyparse(); }

Exercise 1 Consider the following abstract syntax. typedef enum { Add, Mul } Op; struct expr { enum { Var, Num, App } tag; union { char var; int num; struct { struct expr* e1; Op op; struct expr* e2; } app; }; }; typedef struct expr Expr; Modify Example 4 so that yyparse() constructs an abstract syntax tree.

Exercise 1 Consider the following abstract syntax. typedef enum { Add, Mul } Op; struct expr { enum { Var, Num, App } tag; union { char var; int num; struct { struct expr* e1; Op op; struct expr* e2; } app; }; }; typedef struct expr Expr; Modify Example 4 so that yyparse() constructs an abstract syntax tree.

Precedence and associativity The associativity of an operator can be specified using a %left, %right, or %nonassoc directive. %left '+ %left '*' %right '&' %nonassoc '=' Operators specified in increasing order of precedence, e.g. '*' has higher precedence than '+'.

Precedence and associativity The associativity of an operator can be specified using a %left, %right, or %nonassoc directive. %left '+ %left '*' %right '&' %nonassoc '=' Operators specified in increasing order of precedence, e.g. '*' has higher precedence than '+'.

%token VAR %token NUM %left '+' %left '*' e Example 5 : VAR NUM e '+' e e '*' e ( e ) expr5.y void main() { printf( %i\n, yyparse()); }

%token VAR %token NUM %left '+' %left '*' e Example 5 : VAR NUM e '+' e e '*' e ( e ) expr5.y void main() { printf( %i\n, yyparse()); }

Conflicts Sometimes Bison cannot deduce that a grammar is unambiguous, even if it is *. In such cases, Bison will report: a shift-reduce conflict; or a reduce-reduce conflict. * Not surprising: ambiguity detection is undecidable in general!

Conflicts Sometimes Bison cannot deduce that a grammar is unambiguous, even if it is *. In such cases, Bison will report: a shift-reduce conflict; or a reduce-reduce conflict. * Not surprising: ambiguity detection is undecidable in general!

Shift-Reduce Conflicts Bison does not know whether to consume more tokens (shift) or to match a production (reduce), e.g. stmt : IF expr THEN stmt IF expr THEN stmt ELSE stmt Bison defaults to shift.

Shift-Reduce Conflicts Bison does not know whether to consume more tokens (shift) or to match a production (reduce), e.g. stmt : IF expr THEN stmt IF expr THEN stmt ELSE stmt Bison defaults to shift.

Reduce-Reduce Conflicts Bison does not know which production to choose, e.g. expr : functioncall arraylookup ID functioncall : ID '(' ID ')' arraylookup : ID '(' expr ')' Bison defaults to using the first matching rule in the file.

Reduce-Reduce Conflicts Bison does not know which production to choose, e.g. expr : functioncall arraylookup ID functioncall : ID '(' ID ')' arraylookup : ID '(' expr ')' Bison defaults to using the first matching rule in the file.

Variants of Bison There are Bison variants available for many languages: Language Tool Java JavaCC, CUP Haskell Happy Python PLY C# Grammatica * ANTLR

Variants of Bison There are Bison variants available for many languages: Language Tool Java JavaCC, CUP Haskell Happy Python PLY C# Grammatica * ANTLR

Summary Bison converts a context-free grammar to a parsing function called yyparse(). yyparse() calls yylex() to obtain next token, so easy to connect Flex and Bison.

Summary Bison converts a context-free grammar to a parsing function called yyparse(). yyparse() calls yylex() to obtain next token, so easy to connect Flex and Bison.

Summary Each grammar production may have an action. Terminals and non-terminals have semantic values. Easy to construct abstract syntax trees inside actions using semantic values. Gives a declarative (high level) way to define parsers.

Summary Each grammar production may have an action. Terminals and non-terminals have semantic values. Easy to construct abstract syntax trees inside actions using semantic values. Gives a declarative (high level) way to define parsers.

Announcement Summer internship in PLASMA group: Compiling Haskell to Microblaze. Salary: 250 pounds per week. Part of the Reduceron project. Ask if interested. More details during Monday s practicals.