Compiler construction in4020 course 2006


Compiler construction in4020 course 2006 Koen Langendoen Delft University of Technology The Netherlands

Goals
  understand the structure of a compiler
  understand how the components operate
  understand the tools involved
  practice makes perfect
  understanding means
    [theory] be able to read source code
    [practice] be able to adapt/write source code

Format: working lectures (werkcollege) + lab work
  input
    14 interactive lectures (40 hrs)
    book: Modern Compiler Design
    schedule: see Blackboard
    handouts: see Blackboard
    2 assignments: modify reference compiler
    written exam
  workload and grading (slide figure): 80 hrs, 40 hrs, 11 pts; output: 70 pts, 30 pts

Practical work
  computer science: front-end (language)
  computer engineering: back-end (code generation)
  Wednesday + Thursday, 8:45-12:45
  lab rooms: Drebbelweg DW_1-010, DW_1-170
  instructors: Bas, Leon, Peterpaul, Gertjan
  starting September 20th

Collaboration
  division of assignments is NOT allowed
    learning by doing
    unbalanced assignments
  copying/modifying code from other groups is NOT allowed
    similarity-checking software is used (and is effective)

What is a compiler?
  program in some source language
    -> front-end (analysis)
    -> semantic representation
    -> back-end (synthesis)
    -> executable code for target machine
  compiler = front-end + semantic representation + back-end

Why study compiler construction?
  curiosity
  better understanding of programming language concepts
  wide applicability
    transforming data is very common
    many useful data structures and algorithms
    practical application of theory

Compiler structure
  program in some source language -> front-end (analysis) -> semantic representation -> back-end (synthesis) -> executable code for target machine
  several front-ends and several back-ends can share one semantic representation:
  L+M modules = LxM compilers
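As a worked instance of the modules-versus-compilers count on this slide, take the purely illustrative figures of L = 4 source languages and M = 3 target machines:

```latex
\[
  L + M = 4 + 3 = 7 \ \mathrm{modules},
  \qquad
  L \times M = 4 \times 3 = 12 \ \mathrm{compilers}.
\]
```

Four front-ends and three back-ends cover all twelve language/machine pairs, provided they agree on one semantic representation.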

Limitations of modular approach
  performance
    generic vs specific
    loss of information
  variations must be small
    same programming paradigm
    similar processor architecture
  [figure: several front-ends and back-ends sharing one semantic representation]

Semantic representation
  heart of the compiler
  intermediate code
    linked lists of pseudo instructions
    abstract syntax tree (AST)
  [figure: front-end (analysis) -> semantic representation -> back-end (synthesis)]

AST example
  expression grammar
    expression -> expression '+' term | expression '-' term | term
    term       -> term '*' factor | term '/' factor | factor
    factor     -> identifier | constant | '(' expression ')'
  example expression
    b*b - 4*a*c

parse tree: b*b - 4*a*c
  expression
    expression
      term
        term
          factor
            identifier 'b'
        '*'
        factor
          identifier 'b'
    '-'
    term
      term
        term
          factor
            constant '4'
        '*'
        factor
          identifier 'a'
      '*'
      factor
        identifier 'c'

AST: b*b - 4*a*c
  -
    *
      b
      b
    *
      *
        4
        a
      c
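A minimal sketch in C of how such an AST might be represented and built by hand. The node layout and the helper names (Node, leaf, binop, build_example, to_prefix) are assumptions made for illustration; they are not taken from the course's reference compiler.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical AST node: a binary operator with two children, or a leaf. */
typedef struct Node {
    char op;                    /* '+', '-', '*', '/' for operators; 0 for a leaf */
    const char *name;           /* identifier or constant text (leaves only) */
    struct Node *left, *right;  /* children (operators only) */
} Node;

Node *new_node(char op, const char *name, Node *left, Node *right) {
    Node *n = malloc(sizeof *n);
    assert(n != NULL);          /* sketch only: no error recovery */
    n->op = op; n->name = name; n->left = left; n->right = right;
    return n;
}

Node *leaf(const char *name) { return new_node(0, name, NULL, NULL); }
Node *binop(char op, Node *l, Node *r) { return new_node(op, NULL, l, r); }

/* Build the AST for b*b - 4*a*c. The grammar makes '*' left-associative,
 * so 4*a*c is grouped as (4*a)*c. */
Node *build_example(void) {
    return binop('-',
                 binop('*', leaf("b"), leaf("b")),
                 binop('*', binop('*', leaf("4"), leaf("a")), leaf("c")));
}

/* Append the tree to buf in parenthesized prefix form.
 * buf must hold a NUL-terminated string on entry. */
void to_prefix(const Node *n, char *buf, size_t size) {
    size_t used = strlen(buf);
    if (n->op == 0) {
        snprintf(buf + used, size - used, "%s", n->name);
        return;
    }
    snprintf(buf + used, size - used, "(%c ", n->op);
    to_prefix(n->left, buf, size);
    used = strlen(buf);
    snprintf(buf + used, size - used, " ");
    to_prefix(n->right, buf, size);
    used = strlen(buf);
    snprintf(buf + used, size - used, ")");
}
```

Calling to_prefix(build_example(), buf, sizeof buf) on an empty buffer writes the tree in prefix notation, which makes the grouping explicit without any of the term/factor chain nodes of the parse tree. The sketch never frees its nodes.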

annotated AST: b*b - 4*a*c
  -  (type: real, loc: reg1)
    *  (type: real, loc: reg1)
      b  (type: real, loc: sp+16)
      b  (type: real, loc: sp+16)
    *  (type: real, loc: reg2)
      *  (type: real, loc: reg2)
        4  (type: real, loc: const)
        a  (type: real, loc: sp+8)
      c  (type: real, loc: sp+24)

AST exercise (5 min.)
  expression grammar
    expression -> expression '+' term | expression '-' term | term
    term       -> term '*' factor | term '/' factor | factor
    factor     -> identifier | constant | '(' expression ')'
  example expression
    b*b - (4*a*c)
  draw parse tree and AST

Answers

front-end: from program text to AST
  program text -> lexical analysis -> tokens -> syntax analysis -> AST -> context handling -> annotated AST
  token description -> scanner generator -> lexical analysis
  language grammar -> parser generator -> syntax analysis

Lexical analysis
  convert stream of characters to stream of tokens
  what is a token?
    sequence of characters with a semantic notion, see language definition
    rule of thumb: two characters belong to the same token if inserting white space between them changes the meaning
      digit = *ptr++ - '0';
      digit = *ptr+ + - '0';

Tokens
  attributes
    type
    lexeme
    value
    file position

    typedef struct {
        int class;
        char *repr;
        file_pos position;
    } Token_Type;

  examples
    type         lexeme
    IDENTIFIER   foo, t3, ptr
    NUMBER       15, 082, 666
    REAL         1.2, .002, 1e6
    IF           if

Non-tokens
  white space: spaces, tabs, newlines
  comments
    /* a C-style comment */
    // a C++ comment
  preprocessor directives
    #include "lex.h"
    #define is_digit(d) ('0' <= (d) && (d) <= '9')

Regular expressions
  Basic patterns           Matching
    x                      the character x
    .                      any character, usually except a newline
    [abcA-Z]               any of the characters a, b, c and the range A-Z
  Repetition operators
    R?                     an R or nothing (= optionally an R)
    R*                     zero or more occurrences of R
    R+                     one or more occurrences of R
  Composition operators
    R1 R2                  an R1 followed by an R2
    R1 | R2                either an R1 or an R2
  Grouping
    ( R )                  R itself

Examples of regular expressions
  an integer is a sequence of digits:
    [0-9]+
  an identifier is a sequence of letters and digits; the first character must be a letter:
    [a-z][a-z0-9]*
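The two patterns can be tried out directly. The sketch below assumes a POSIX system, since it uses regcomp/regexec from <regex.h>; the function names matches, is_integer and is_identifier are hypothetical. The patterns are anchored with ^ and $ so that the whole string has to match, which the slide leaves implicit.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if text matches the POSIX extended regular expression
 * pattern. Callers anchor the pattern with ^ and $ themselves. */
bool matches(const char *pattern, const char *text) {
    regex_t re;
    bool ok;

    assert(pattern != NULL && text != NULL);
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0) {
        return false;           /* treat an invalid pattern as "no match" */
    }
    ok = (regexec(&re, text, 0, NULL, 0) == 0);
    regfree(&re);
    return ok;
}

/* an integer is a sequence of digits */
bool is_integer(const char *s) { return matches("^[0-9]+$", s); }

/* an identifier starts with a letter, followed by letters and digits */
bool is_identifier(const char *s) { return matches("^[a-z][a-z0-9]*$", s); }
```

With these definitions is_integer accepts 082 but rejects the empty string, and is_identifier accepts t3 but rejects 3t. Compiling the pattern on every call keeps the sketch short; a real scanner would compile each pattern once.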

Regular descriptions
  structuring regular expressions by introducing named sub-expressions
    letter          -> [a-zA-Z]
    digit           -> [0-9]
    letter_or_digit -> letter | digit
    identifier      -> letter letter_or_digit*
  define before use

Exercise (5 min.)
  write down regular descriptions for the following descriptions:
    an integral number is a non-zero sequence of digits optionally followed by a letter denoting the base class (b for binary and o for octal).
    a fixed-point number is an (optional) sequence of digits followed by a dot (.) followed by a sequence of digits.
    an identifier is a sequence of letters and digits; the first character must be a letter. The underscore _ counts as a letter, but may not be used as the first or last character.

Answers

Lexical analysis
  convert stream of characters to stream of tokens
  tokens are defined by a regular description
  tokens are demanded one-by-one by the syntax analyzer: get_next_token()
  program text -> lexical analyzer -> tokens -> syntax analyzer -> AST

interface

  extern Token_Type Token;
      /* Global variable that holds the current token. */

  void start_lex(void);
      /* Must be called before the first call to
       * get_next_token(). */

  void get_next_token(void);
      /* Load the next token into the global
       * variable Token. */

lexical analysis by hand
  read complete program text into memory for simplicity
    avoids buffering and arbitrary limits
    variable length tokens
  get_next_token() dispatches on the next character
  input: main() { printf("hello world\n");}
    dot marks the position of the next character in the input

void get_next_token(void) {
    int start_dot;

    skip_layout_and_comment();
    /* now we are at the start of a token or at end-of-file, so: */
    note_token_position();

    /* split on first character of the token */
    start_dot = dot;
    if (is_end_of_input(input_char)) {
        Token.class = EoF; Token.repr = "<EoF>"; return;
    }
    if (is_letter(input_char)) {recognize_identifier();}
    else if (is_digit(input_char)) {recognize_integer();}
    else if (is_operator(input_char) || is_separator(input_char)) {
        Token.class = input_char; next_char();
    }
    else {Token.class = ERRONEOUS; next_char();}

    /* copy the characters of the token just recognized */
    Token.repr = input_to_zstring(start_dot, dot-start_dot);
}

Character classification & token recognition

#define is_end_of_input(ch)    ((ch) == '\0')
#define is_layout(ch)          (!is_end_of_input(ch) && (ch) <= ' ')
#define is_uc_letter(ch)       ('A' <= (ch) && (ch) <= 'Z')
#define is_lc_letter(ch)       ('a' <= (ch) && (ch) <= 'z')
#define is_letter(ch)          (is_uc_letter(ch) || is_lc_letter(ch))
#define is_digit(ch)           ('0' <= (ch) && (ch) <= '9')
#define is_letter_or_digit(ch) (is_letter(ch) || is_digit(ch))
#define is_underscore(ch)      ((ch) == '_')
#define is_operator(ch)        (strchr("+-*/", (ch)) != NULL)
#define is_separator(ch)       (strchr(";,(){}", (ch)) != NULL)

void recognize_integer(void) {
    Token.class = INTEGER; next_char();
    while (is_digit(input_char)) {next_char();}
}
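The slides show get_next_token() and the classification macros but not the helpers they rely on. The following is a minimal self-contained sketch that fills those gaps under stated assumptions: the token class values, the definitions of input, dot, input_char and next_char(), the body of input_to_zstring(), and the string-based start_lex_on() (standing in for start_lex()) are all guesses made for illustration. Comment skipping and file positions are left out.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Token classes above the character range, so that single-character
 * tokens can use the character itself as class (values are assumed). */
enum { EoF = 256, IDENTIFIER, INTEGER, ERRONEOUS };

typedef struct { int class; char *repr; } Token_Type;
Token_Type Token;               /* holds the current token */

const char *input;              /* complete program text, NUL-terminated */
int dot;                        /* index of the next unread character */
#define input_char  (input[dot])
#define next_char() (dot++)

#define is_end_of_input(ch) ((ch) == '\0')
#define is_layout(ch)    (!is_end_of_input(ch) && (ch) <= ' ')
#define is_letter(ch)    (('A' <= (ch) && (ch) <= 'Z') || ('a' <= (ch) && (ch) <= 'z'))
#define is_digit(ch)     ('0' <= (ch) && (ch) <= '9')
#define is_operator(ch)  (strchr("+-*/", (ch)) != NULL)
#define is_separator(ch) (strchr(";,(){}", (ch)) != NULL)

/* Assumed stand-in for start_lex(): scan a string instead of a file. */
void start_lex_on(const char *text) { input = text; dot = 0; }

/* Copy len characters starting at index start into a fresh string. */
char *input_to_zstring(int start, int len) {
    char *s = malloc((size_t)len + 1);
    assert(s != NULL);          /* sketch only: no error recovery */
    memcpy(s, input + start, (size_t)len);
    s[len] = '\0';
    return s;
}

void get_next_token(void) {
    int start_dot;

    while (is_layout(input_char)) next_char();  /* comments not handled */
    start_dot = dot;
    if (is_end_of_input(input_char)) {
        Token.class = EoF; Token.repr = "<EoF>";
        return;
    }
    if (is_letter(input_char)) {
        Token.class = IDENTIFIER;
        while (is_letter(input_char) || is_digit(input_char)) next_char();
    } else if (is_digit(input_char)) {
        Token.class = INTEGER;
        while (is_digit(input_char)) next_char();
    } else if (is_operator(input_char) || is_separator(input_char)) {
        Token.class = input_char; next_char();
    } else {
        Token.class = ERRONEOUS; next_char();
    }
    Token.repr = input_to_zstring(start_dot, dot - start_dot);
}
```

After start_lex_on("  a1 + 42;"), successive calls to get_next_token() yield an IDENTIFIER a1, the operator +, an INTEGER 42, the separator ; and finally EoF. The end-of-input test has to come before the is_operator/is_separator tests, because strchr also finds the terminating NUL of its search string. The sketch never frees Token.repr.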