COMPILER CONSTRUCTION Seminar 02 TDDB44

Similar documents
COMPILER CONSTRUCTION LAB 2 THE SYMBOL TABLE. Tutorial 2 LABS. PHASES OF A COMPILER Source Program. Lab 2 Symbol table

TDDD55 - Compilers and Interpreters Lesson 3

COMPILERS AND INTERPRETERS Lesson 4 TDDD16

TDDD55- Compilers and Interpreters Lesson 3

COMPILER CONSTRUCTION Seminar 03 TDDB

Yacc: A Syntactic Analysers Generator

COMPILER CONSTRUCTION Seminar 01 TDDB

Syntax Analysis Part IV

Using an LALR(1) Parser Generator

Context-free grammars

CSE302: Compiler Design

Compiler Construction: Parsing

PRACTICAL CLASS: Flex & Bison

Structure of a compiler. More detailed overview of compiler front end. Today we ll take a quick look at typical parts of a compiler.

LECTURE 11. Semantic Analysis and Yacc

Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan

Yacc Yet Another Compiler Compiler

Compiler Lab. Introduction to tools Lex and Yacc

Syntax Analysis The Parser Generator (BYacc/J)

Syntax-Directed Translation

Compiler Design. Computer Science & Information Technology (CS) Rank under AIR 100

Syntax-Directed Translation. Lecture 14

Compiler Construction

flex is not a bad tool to use for doing modest text transformations and for programs that collect statistics on input.

Parsing How parser works?

Compiler Construction

LECTURE 7. Lex and Intro to Parsing

TDDD55- Compilers and Interpreters Lesson 2

Programming Project II

Lex & Yacc (GNU distribution - flex & bison) Jeonghwan Park

Preparing for the ACW Languages & Compilers

Lexical and Syntax Analysis

Automatic Scanning and Parsing using LEX and YACC

Lex & Yacc. By H. Altay Güvenir. A compiler or an interpreter performs its task in 3 stages:

10/4/18. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntactic Analysis

Compiler construction in4020 lecture 5

A Bison Manual. You build a text file of the production (format in the next section); traditionally this file ends in.y, although bison doesn t care.

Compil M1 : Front-End

Semantic Analysis. Lecture 9. February 7, 2018

Principles of Programming Languages

UNIT III & IV. Bottom up parsing

Introduction to Lex & Yacc. (flex & bison)

Compiler Design 1. Yacc/Bison. Goutam Biswas. Lect 8

An Introduction to LEX and YACC. SYSC Programming Languages

Compiler construction 2002 week 5

Crafting a Compiler with C (II) Compiler V. S. Interpreter

Lecture Outline. COMP-421 Compiler Design. What is Lex? Lex Specification. ! Lexical Analyzer Lex. ! Lex Examples. Presented by Dr Ioanna Dionysiou

Lex & Yacc. by H. Altay Güvenir. A compiler or an interpreter performs its task in 3 stages:

Lexical and Parser Tools

CSCI Compiler Design

10/5/17. Lexical and Syntactic Analysis. Lexical and Syntax Analysis. Tokenizing Source. Scanner. Reasons to Separate Lexical and Syntax Analysis

Compiler Design (40-414)

Decaf PP2: Syntax Analysis

LECTURE 3. Compiler Phases

Lexical and Syntax Analysis

As we have seen, token attribute values are supplied via yylval, as in. More on Yacc s value stack

Software II: Principles of Programming Languages

THE COMPILATION PROCESS EXAMPLE OF TOKENS AND ATTRIBUTES

RYERSON POLYTECHNIC UNIVERSITY DEPARTMENT OF MATH, PHYSICS, AND COMPUTER SCIENCE CPS 710 FINAL EXAM FALL 96 INSTRUCTIONS

The Structure of a Syntax-Directed Compiler

Part III : Parsing. From Regular to Context-Free Grammars. Deriving a Parser from a Context-Free Grammar. Scanners and Parsers.

UNIVERSITY OF CALIFORNIA

CS 403: Scanning and Parsing

Properties of Regular Expressions and Finite Automata

Introduction to Compiler Design

A Simple Syntax-Directed Translator

Section A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous.

Big Picture: Compilation Process. CSCI: 4500/6500 Programming Languages. Big Picture: Compilation Process. Big Picture: Compilation Process.

What do Compilers Produce?

Syntax Analysis. Chapter 4

CS 426 Fall Machine Problem 1. Machine Problem 1. CS 426 Compiler Construction Fall Semester 2017

Introduction to Yacc. General Description Input file Output files Parsing conflicts Pseudovariables Examples. Principles of Compilers - 16/03/2006

Programming Language Syntax and Analysis

Compiler construction 2005 lecture 5

Semantic actions for declarations and expressions

9/5/17. The Design and Implementation of Programming Languages. Compilation. Interpretation. Compilation vs. Interpretation. Hybrid Implementation

PRINCIPLES OF COMPILER DESIGN UNIT I INTRODUCTION TO COMPILERS

EXPERIMENT NO : M/C Lenovo Think center M700 Ci3,6100,6th Gen. H81, 4GB RAM,500GB HDD

Semantic actions for declarations and expressions. Monday, September 28, 15

EDAN65: Compilers, Lecture 06 A LR parsing. Görel Hedin Revised:

CS606- compiler instruction Solved MCQS From Midterm Papers

1. The output of lexical analyser is a) A set of RE b) Syntax Tree c) Set of Tokens d) String Character

LR Parsing. Leftmost and Rightmost Derivations. Compiler Design CSE 504. Derivations for id + id: T id = id+id. 1 Shift-Reduce Parsing.

COMP 181. Prelude. Prelude. Summary of parsing. A Hierarchy of Grammar Classes. More power? Syntax-directed translation. Analysis

Lab 2. Lexing and Parsing with Flex and Bison - 2 labs

LECTURE NOTES ON COMPILER DESIGN P a g e 2

Compiler Construction Assignment 3 Spring 2018

1. INTRODUCTION TO LANGUAGE PROCESSING The Language Processing System can be represented as shown figure below.

CSE 130 Programming Language Principles & Paradigms Lecture # 5. Chapter 4 Lexical and Syntax Analysis

Syn S t yn a t x a Ana x lysi y s si 1

COMPILER DESIGN. For COMPUTER SCIENCE

A programming language requires two major definitions A simple one pass compiler

Compilers and computer architecture From strings to ASTs (2): context free grammars

(F)lex & Bison/Yacc. Language Tools for C/C++ CS 550 Programming Languages. Alexander Gutierrez

Compiler Design Overview. Compiler Design 1

Introduction to Syntax Analysis. Compiler Design Syntax Analysis s.l. dr. ing. Ciprian-Bogdan Chirila

Chapter 2: Syntax Directed Translation and YACC

Compiler Design Prof. Y. N. Srikant Department of Computer Science and Automation Indian Institute of Science, Bangalore

CST-402(T): Language Processors

CS143 Handout 04 Summer 2011 June 22, 2011 flex In A Nutshell

Transcription:

COMPILER CONSTRUCTION Seminar 02 TDDB44 Martin Sjölund (martin.sjolund@liu.se) Adrian Horga (adrian.horga@liu.se) Department of Computer and Information Science Linköping University

LABS Lab 3 LR parsing and abstract syntax tree construction using bison Lab 4 Semantic analysis (type checking)

PHASES OF A COMPILER Source Program Lab 2 Symtab administrates the symbol table Symbol Table Manager Error Handler Lexical Analysis Syntax Analyser Semantic Analyzer Intermediate Code Generator Lab 1 Scanner manages lexical analysis Lab 3 Parser manages syntactic analysis, build internal form Lab 4 Semantics checks static semantics Lab 6 Quads generates quadruples from the internal form Code Optimizer Code Generator Target Program Lab 5 Optimizer optimizes the internal form Lab 7 Codegen expands quadruples to assembler

LAB 3 PARSING

SYNTAX ANALYSIS The parser accepts tokens from the scanner and verifies the syntactic correctness of the program specification Along the way, it also derives information about the program and builds a fundamental data structure known as [abstract] syntax tree The syntax tree is an internal representation of the program and augments the symbol table The parse tree is a concrete syntax tree and is not produced by the parser

PURPOSE To verify the syntactic correctness of the input token stream, reporting any errors and to produce a syntax tree and certain tables for use by later phases Syntactic correctness is judged by verification against a formal grammar which specifies the language to be recognized Error messages are important and should be as meaningful as possible Parse tree and tables will vary depending on compiler

METHOD Match token stream using manually or automatically generated parser

PARSING STRATEGIES Two categories of parsers: Top-down parsers Bottom-up parsers Within each of these broad categories are a number of sub strategies depending on whether leftmost or rightmost derivations are used

TOP-DOWN PARSING Start with a goal symbol and recognize it in terms of its constituent symbols Example: recognize a procedure in terms of its sub-components (header, declarations, and body) The parse tree is then built from the top (root) and down (leaves), hence the name

TOP-DOWN PARSING (cont'd) X := ( a + b ) * c; := x * + c a b

BOTTOM-UP PARSING Recognize the components of a program and then combine them to form more complex constructs until a whole program is recognized Example: recognize a procedure from its subcomponents (header, declarations, and body) The parse tree is then built from the bottom and up, hence the name

BOTTOM-UP PARSING (cont'd) X := ( a + b ) * c; := x * + c a b

PARSING TECHNIQUES A number of different parsing techniques are commonly used for syntax analysis, including: Recursive-descent parsing LR parsing Operator precedence parsing Many more

LR PARSING A specific bottom-up technique LR stands for Left->right scan, Rightmost derivation Probably the most common & popular parsing technique yacc, bison, and many other parser generation tools utilize LR parsing Great for machines, not so cool for humans

+ AND FOR LR Advantages of LR: Accept a wide range of grammars/languages Well suited for automatic parser generation Very fast Generally easy to maintain Disadvantages of LR: Error handling can be tricky Difficult to use manually

bison AND yacc USAGE bison is a general-purpose parser generator that converts a grammar description for an LALR(1) context-free grammar into a C program to parse that grammar

bison AND yacc USAGE One of many parser generator packages Yet Another Compiler Compiler Really a poor name, is more of a parser compiler Can specify actions to be performed when each construct is recognized and thereby make a full fledged compiler but its the user of bison that specify the rest of the compilation process Designed to work with flex or other automatically or hand generated lexers

bison USAGE Bison source program parser.y Bison Compiler y.tab.c y.tab.c C Compiler a.out Token stream a.out Parse tree

bison SPECIFICATION A bison specification is composed of 4 parts %{ /* C declarations */ %} /* Bison declarations */ %% %% /* Grammar rules */ /* Additional C code */ Looks like flex specification, doesn t it? Similar function, tools, look and feel

C DECLARATIONS Contains macro definitions and declarations of functions and variables that are used in the actions in the grammar rules Copied to the beginning of the parser file so that they precede the definition of yyparse Use #include to get the declarations from a header file. If C declarations isn t needed, then the %{ and %} delimiters that bracket this section can be omitted

bison DECLARATIONS Contains declarations that define terminal and non-terminal symbols, and specify precedence

GRAMMAR RULES Contains one or more bison grammar rules, and nothing else There must always be at least one grammar rule, and the first %% (which precedes the grammar rules) may never be omitted even if it is the first thing in the file

ADITIONAL C CODE Copied verbatim to the end of the parser file, just as the C declarations section is copied to the beginning This is the most convenient place to put anything that should be in the parser file but isn t need before the definition of yyparse The definitions of yylex and yyerror often go here

bison EXAMPLE %{ #include <ctype.h> /* standard C declarations here */ }% %token DIGIT /* bison declarations */ %% /* Grammar rules */ line : expr \n { printf { %d\n, $1 }; } ; expr : expr + term { $$ = $1 + $3; } term ; term : term * factor { $$ = $1 * $3; } factor ;

bison EXAMPLE factor : ( expr ) { $$ = $2; } DIGIT ; %% /* Additional C code */ void yylex () { /* A really simple lexical analyzer */ int c; c = getchar (); if ( isdigit (c) ) { yylval = c - 0 ; return DIGIT; } return c; } Note: bison uses yylex, yylval, etc - designed to be used with flex

bison EXAMPLE expr ::= term ::= expr + term ::= expr + term + term ::= term +... + term + term + term term ::= factor ::= term * factor ::= term * factor * factor ::= factor *... * factor * factor * factor factor ::= DIGIT ::= ( expr ) ::= ( term + term +... + term ) ::= ( factor *... factor + term +... term ) ::=... DIGIT ::= [0-9]

bison EXAMPLE '(' '1' '*' '3' '+' '2' ')' '*' '5' '\n' line ::= expr '\n'

bison EXAMPLE '(' '1' '*' '3' '+' '2' ')' '*' '5' '\n' line ::= expr '\n' expr ::= term term ::= factor factor ::= '(' expr ')'

bison EXAMPLE '(' '1' '*' '3' '+' '2' ')' '*' '5' '\n' line ::= expr '\n' expr ::= term term ::= factor factor ::= '(' expr ')' expr ::= term term ::= factor factor ::= DIGIT

bison EXAMPLE '(' '1' '*' '3' '+' '2' ')' '*' '5' '\n' line ::= expr '\n' expr ::= term term ::= factor factor ::= '(' expr ')' expr ::= term term ::= term '*' factor

bison EXAMPLE '(' '1' '*' '3' '+' '2' ')' '*' '5' '\n' line ::= expr '\n' expr ::= term term ::= factor factor ::= '(' expr ')' expr ::= term term ::= term '*' factor factor ::= DIGIT

bison EXAMPLE '(' '1' '*' '3' '+' '2' ')' '*' '5' '\n' line ::= expr '\n' expr ::= term term ::= factor factor ::= '(' expr ')' expr ::= expr '+' term term ::= term '*' factor

bison EXAMPLE '(' '1' '*' '3' '+' '2' ')' '*' '5' '\n' line ::= expr '\n' expr ::= term term ::= factor factor ::= '(' expr ')' expr ::= expr '+' term term ::= factor factor ::= DIGIT

bison EXAMPLE '(' '1' '*' '3' '+' '2' ')' '*' '5' '\n' line ::= expr '\n' expr ::= term term ::= factor factor ::= '(' expr ')' expr ::= expr '+' term term ::= factor

bison EXAMPLE '(' '1' '*' '3' '+' '2' ')' '*' '5' '\n' line ::= expr '\n' expr ::= term term ::= term '*' factor factor ::= '(' expr ')'

bison EXAMPLE '(' '1' '*' '3' '+' '2' ')' '*' '5' '\n' line ::= expr '\n' expr ::= term term ::= term '*' factor factor ::= DIGIT

bison EXAMPLE '(' '1' '*' '3' '+' '2' ')' '*' '5' '\n' line ::= expr '\n' expr ::= term term ::= term '*' factor

USING bison WITH flex bison and flex are obviously designed to work together bison produces a driver program called yylex() (actually its included in the lex library -ll) #include lex.yy.c in the third part of bison specification this gives the program yylex access to bisons token names

USING BISON WITH FLEX Thus do the following: % flex scanner.l % bison parser.y % cc y.tab.c -ly -ll This will produce an a.out which is a parser with an integrated scanner included

ERROR HANDLING IN bison Error handling in bison is provided by error productions An error production has the general form non-terminal: error synchronizing-set non-terminal where did it occur error a keyword synchronizing-set possible empty subset of tokens When an error occurs, bison pops symbols off the stack until it finds a state for which there exists an error production which may be applied

FILES TO BE CHANGED parser.y is the input file to bison. This is the only file you will do most of editing in. scanner.l need a small, but important change. The file scanner.hh is no longer needed since there is a file parser.hh, which will contain (among other things) the same declarations. parser.hh will be generated automatically by bison. Add (in this order): #include "ast.h" #include "parser.hh" and comment out #include "scanner.hh" at the top of scanner.l to reflect this.

OTHER FILES OF INTEREST error.h, error.cc, symtab.hh, symbol.cc, symtab.cc Use your completed versions from the earlier labs. ast.hh contains the definitions for the AST nodes. You ll be reading this file a lot. ast.cc contains the implementations of the AST nodes. semantic.hh and semantic.cc contain type checking code. optimize.hh and optimize.cc contain optimization code. quads.hh and quads.cc contain quad generation code. codegen.hh and codegen.cc contain assembler generation code.

OTHER FILES OF INTEREST main.cc this is the compiler wrapper, parsing flags and the like. Makefile this is not the same as the last labs. It generates a file called compiler which will take various arguments (see main.cc for information). It also takes source files as arguments, so you can start using diesel files to test your compiler-in-the-making. diesel this is a shell script which works as a wrapper around the binary compiler file, handling flags, linking, and such things. Use it when you want to compile a diesel file. At the top of this file is a list of all flags you can send to the compiler, for debugging, printouts, symbolic compilation and the like.

LAB 4 SEMANTICS

PURPOSE To verify the semantic correctness of the program represented by the parse tree, reporting any errors, possibly, to produce an intermediate form and certain tables for use by later compiler phases Semantic correctness the program adheres to the rules of the type system defined for the language (plus some other rules ) Error messages should be as meaningful as possible In this phase, there is sufficient information to be able to generate a number of tables of semantic information identifier, type and literal tables

METHOD Ad-hoc confirmation of semantic rules

IMPLEMENTATION Semantic analyzer implementations are typically syntax directed More formally, such techniques are based on attribute grammars In practice, the evaluation of the attributes is done manually

MATHEMATICAL CHECKS Divide by zero Zero must be compile-time determinable constant zero, or an expression which symbolically evaluates to zero at runtime Overflow Constant which exceeds representation of target machine language arithmetic which obviously leads to overflow Underflow Same as for overflow

UNIQUENESS TESTS In certain situations it is important that particular constructs occur only once Declarations within any given scope, each identifier must be declared only once Case statements each case constant must occur only once in the switch

TYPE CONSISTENCY Some times it is also necessary to ensure that a symbol that occurs in one place occurs in others as well. Such consistency checks are required whenever matching is required and what must be matched is not specified explicitly (i.e as a terminal string) in the grammar This means that the check cannot be done by the parser

TYPE CHECKS These checks form the bulk of semantic checking and certainly account for the majority of the overhead of this phase of compilation In general the types across any given operator must be compatible The meaning of compatible may be: the same two different sizes of the same basic type

TYPE CHECKS Must execute the same steps as for expression evaluation Effectively we are executing the expression at compile time for type information only This is a bottom-up procedure in the parse tree We know the type of things at the leaves of a parse tree corresponding to an expression (associated types stored in literal table for literals and symbol table for identifiers) When we encounter a parse tree node corresponding to some operator if the operand sub-trees are leaves we know their type and can check that the types are valid for the given operator.

FILES TO BE CHANGED semantic.hh and semantic.cc contains type checking code implementation for the AST nodes as well as the declaration and implementation of the semantic class. These are the files you're going to edit in this lab. They deal with type checking, type synthesizing, and parameter checking.

OTHER FILES OF INTEREST All these files are the same as in lab 3: parser.y is the input file to bison. This is the file you edited in the last lab, and all you should need to do now is uncomment a couple of calls to: do_typecheck(). ast.hh contains the definitions for the AST nodes. ast.cc contains (part of) the implementations of the AST nodes. optimize.hh and optimize.cc contains optimizing code. quads.hh and quads.cc contains quad generation code. codegen.hh and codegen.cc contains assembler generation code.

OTHER FILES OF INTEREST error.hh, error.cc, symtab.hh, symbol.cc, symtab.cc, scanner.l use your versions from the earlier labs. main.cc this is the compiler wrapper, parsing flags and the like. Makefile and diesel use the same files as in the last lab.