Compiler construction lecture 1


Compiler construction
in4020 course 2006
Koen Langendoen
Delft University of Technology
The Netherlands

Goals
- understand the structure of a compiler
- understand how the components operate
- understand the tools involved
- practice makes perfect

Understanding means
- [theory] be able to read code
- [practice] be able to adapt/write code

Format: tutorial (werkcollege) + lab work
- input
  - 14 interactive lectures (40 hrs)
    - book: Modern Compiler Design
    - schedule: see blackboard
    - handouts: see blackboard
  - 2 assignments (80 hrs): modify reference compiler
  - written exam (40 hrs)
- points shown in the slide figure: 11 pts; output: 70 pts, 30 pts

Practical work
- computer science, computer engineering: Wednesday + Thursday, 8:45-12:45
- lab rooms: Drebbelweg DW_1-010, DW_1-170
- instructors: Bas, Leon, Peterpaul, Gertjan
- starting September 20th: code generation

Collaboration
- division of assignments is NOT allowed
  - learning by doing
  - unbalanced assignments
- copying/modifying code from other groups is NOT allowed
  - similarity checking software (is effective)

What is a compiler?

Why study compiler construction?
- curiosity
- better understanding of programming concepts
- wide applicability
  - transforming data is very common
  - many useful data structures and algorithms
- practical application of theory

Compiler structure
- L+M modules = L x M compilers

Limitations of modular approach
- performance
  - generic vs specific
  - loss of information
- variations must be small
  - same programming paradigm
  - similar processor architecture

Semantic

Intermediate code: the heart of the compiler
- linked lists of pseudo instructions
- abstract syntax tree (AST)

AST example
- expression grammar

    expression -> expression '+' term | expression '-' term | term
    term       -> term '*' factor | term '/' factor | factor
    factor     -> identifier | constant | '(' expression ')'

- example expression: b*b - 4*a*c
- parse tree: b*b - 4*a*c (figure)
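As a rough illustration of the AST as intermediate code, the example expression b*b - 4*a*c can be built as a small tree in C. This is only a sketch under stated assumptions: the `Node` layout, the constructor names (`identifier`, `constant`, `binary`), `build_example`, and the `evaluate` helper with its one-letter variable environment are all hypothetical and do not come from the course material.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical node layout: leaves are identifiers or constants,
 * inner nodes are binary operators with two operands. */
typedef enum { NODE_IDENTIFIER, NODE_CONSTANT, NODE_BINARY } NodeKind;

typedef struct Node {
    NodeKind kind;
    char name;                  /* NODE_IDENTIFIER: one-letter variable name */
    int value;                  /* NODE_CONSTANT: the literal value */
    char op;                    /* NODE_BINARY: '+', '-', '*' or '/' */
    struct Node *left, *right;  /* NODE_BINARY: the two operands */
} Node;

static Node *new_node(NodeKind kind) {
    Node *n = calloc(1, sizeof *n);
    if (n == NULL) { perror("calloc"); exit(EXIT_FAILURE); }
    n->kind = kind;
    return n;
}

Node *identifier(char name) {
    Node *n = new_node(NODE_IDENTIFIER);
    n->name = name;
    return n;
}

Node *constant(int value) {
    Node *n = new_node(NODE_CONSTANT);
    n->value = value;
    return n;
}

Node *binary(char op, Node *left, Node *right) {
    Node *n = new_node(NODE_BINARY);
    n->op = op;
    n->left = left;
    n->right = right;
    return n;
}

/* AST for b*b - 4*a*c: the grouping is encoded in the tree shape,
 * so no parentheses or grammar non-terminals are stored. */
Node *build_example(void) {
    return binary('-',
                  binary('*', identifier('b'), identifier('b')),
                  binary('*',
                         binary('*', constant(4), identifier('a')),
                         identifier('c')));
}

/* Evaluate the tree; env[0] holds the value of a, env[1] of b, etc. */
int evaluate(const Node *n, const int env[26]) {
    switch (n->kind) {
    case NODE_IDENTIFIER: return env[n->name - 'a'];
    case NODE_CONSTANT:   return n->value;
    case NODE_BINARY: {
        int l = evaluate(n->left, env);
        int r = evaluate(n->right, env);
        switch (n->op) {
        case '+': return l + r;
        case '-': return l - r;
        case '*': return l * r;
        case '/': return l / r;
        }
    }
    }
    return 0; /* unknown node kind or operator */
}
```

Unlike the parse tree, this structure has no nodes for `term` or `factor`; only the operators and operands survive, which is what later compiler phases work on.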

AST: b*b - 4*a*c (figure)

Annotated AST: b*b - 4*a*c (figure; nodes carry location annotations such as loc: reg1, loc: reg2, loc: sp+8, loc: sp+16, loc: sp+24, loc: const)

AST exercise (5 min.)
- expression grammar

    expression -> expression '+' term | expression '-' term | term
    term       -> term '*' factor | term '/' factor | factor
    factor     -> identifier | constant | '(' expression ')'

- example expression: b*b - (4*a*c)
- draw parse tree and AST

Answers

Compiler: from text to AST
- program text -> lexical analysis -> tokens -> syntax analysis -> AST -> context handling -> annotated AST
- token description -> scanner generator
- grammar -> parser generator

Lexical analysis
- convert stream of characters to stream of tokens
- what is a token?
  - sequence of characters with a notion, see definition
  - rule of thumb: two characters belong to the same token if inserting white space changes the meaning.

    digit = *ptr++ - '0';
    digit = *ptr+ + - '0';
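The rule of thumb can be checked directly in C. In the sketch below, the function names `digit_fused` and `digit_split` are hypothetical. Both statements compile, but they mean different things: `++` is a single token (post-increment), whereas `+ +` is two tokens, so the second form parses as `*ptr + (+(-'0'))` and never advances the pointer.

```c
/* "++" is one token: read the digit under ptr, then advance ptr. */
int digit_fused(const char **pp) {
    const char *ptr = *pp;
    int digit = *ptr++ - '0';
    *pp = ptr;              /* report the (advanced) pointer to the caller */
    return digit;
}

/* "+ +" is two tokens: a binary plus followed by a unary plus.
 * The digit value is the same, but ptr is NOT advanced. */
int digit_split(const char **pp) {
    const char *ptr = *pp;
    int digit = *ptr+ + - '0';
    *pp = ptr;              /* unchanged pointer */
    return digit;
}
```

Because inserting the space changes the meaning, the two `+` characters of `++` belong to the same token.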

Tokens
- attributes
  - type
  - lexeme
  - value
  - file position

    typedef struct {
        int class;
        char *repr;
        file_pos position;
    } Token_Type;

- examples

    type         lexeme
    IDENTIFIER   foo, t3, ptr
    NUMBER       15, 082, 666
    REAL         1.2, .002, 1e6
    IF           if

Non-tokens
- white space: spaces, tabs, newlines
- comments

    /* a C-style comment */
    // a C++ comment

- preprocessor directives

    #include "lex.h"
    #define is_digit(d) ('0' <= (d) && (d) <= '9')

Regular expressions

    Basic patterns          Matching
    x                       the character x
    .                       any character, usually except a newline
    [abcA-Z]                any of the characters a, b, c and the range A-Z

    Repetition operators
    R?                      an R or nothing (= optionally an R)
    R*                      zero or more occurrences of R
    R+                      one or more occurrences of R

    Composition operators
    R1 R2                   an R1 followed by an R2
    R1 | R2                 either an R1 or an R2

    Grouping
    ( R )                   R itself

Examples of regular expressions
- an integer is a sequence of digits:

    [0-9]+

- an identifier is a sequence of letters and digits; the first character must be a letter:

    [a-z][a-z0-9]*

Regular descriptions
- structuring regular expressions by introducing named sub-expressions

    letter          -> [a-zA-Z]
    digit           -> [0-9]
    letter_or_digit -> letter | digit
    identifier      -> letter letter_or_digit*

- define before use

Exercise (5 min.)
Write down regular descriptions for the following descriptions:
- an integral number is a non-empty sequence of digits, optionally followed by a letter denoting the base class (b for binary and o for octal).
- a fixed-point number is an (optional) sequence of digits followed by a dot (.) followed by a sequence of digits.
- an identifier is a sequence of letters and digits; the first character must be a letter. The underscore _ counts as a letter, but may not be used as the first or last character.
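To try out the two example patterns from the slides, one option is the POSIX `<regex.h>` interface (`regcomp`, `regexec`, `regfree`) with extended syntax. This is a stand-in chosen for illustration, not the scanner generator used in the course, and the helper name `matches` is hypothetical. The sketch assumes a POSIX system and anchors the pattern so that the whole string must match.

```c
#include <regex.h>
#include <stdio.h>

/* Return 1 if the whole of text matches pattern (POSIX extended syntax),
 * 0 if it does not, and -1 if the pattern is too long or fails to compile. */
int matches(const char *pattern, const char *text) {
    char anchored[256];
    regex_t re;
    int found;

    /* Wrap the pattern as ^(...)$ so partial matches are rejected. */
    int n = snprintf(anchored, sizeof anchored, "^(%s)$", pattern);
    if (n < 0 || (size_t)n >= sizeof anchored)
        return -1;

    if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    found = (regexec(&re, text, 0, NULL, 0) == 0);
    regfree(&re);
    return found;
}
```

With this helper, `matches("[0-9]+", "082")` and `matches("[a-z][a-z0-9]*", "t3")` succeed, while an empty string is not an integer and `3t` is not an identifier because its first character is not a letter.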

Answers

Lexical analysis
- convert stream of characters to stream of tokens
- tokens are defined by a regular description
- tokens are demanded one-by-one by the syntax analyzer
- program text -> lexical analysis -> tokens (via get_next_token()) -> syntax analysis -> AST

Interface

    extern Token_Type Token;
    /* Global variable that holds the current token. */

    void start_lex(void);
    /* Must be called before the first call to
     * get_next_token().
     */

    void get_next_token(void);
    /* Load the next token into the global
     * variable Token.
     */

Lexical analysis by hand
- read complete program text into memory for simplicity
  - avoids buffering and arbitrary limits
  - variable length tokens
- get_next_token() dispatches on the next character
- figure: the input buffer holds the program text, with dot marking the current position

    main() { printf("hello world\n"); }

    void get_next_token(void) {
        int start_dot;

        skip_layout_and_comment();
        /* now we are at the start of a token or at end-of-file, so: */
        note_token_position();

        /* split on first character of the token */
        start_dot = dot;
        if (is_end_of_input(input_char)) {
            Token.class = EoF; Token.repr = "<EoF>"; return;
        }
        if (is_letter(input_char)) {recognize_identifier();}
        else if (is_digit(input_char)) {recognize_integer();}
        else if (is_operator(input_char) || is_separator(input_char)) {
            Token.class = input_char; next_char();
        }
        else {Token.class = ERRONEOUS; next_char();}

        Token.repr = input_to_zstring(start_dot, dot-start_dot);
    }

Character classification & token recognition

    #define is_end_of_input(ch)     ((ch) == '\0')
    #define is_layout(ch)           (!is_end_of_input(ch) && (ch) <= ' ')
    #define is_uc_letter(ch)        ('A' <= (ch) && (ch) <= 'Z')
    #define is_lc_letter(ch)        ('a' <= (ch) && (ch) <= 'z')
    #define is_letter(ch)           (is_uc_letter(ch) || is_lc_letter(ch))
    #define is_digit(ch)            ('0' <= (ch) && (ch) <= '9')
    #define is_letter_or_digit(ch)  (is_letter(ch) || is_digit(ch))
    #define is_underscore(ch)       ((ch) == '_')
    #define is_operator(ch)         (strchr("+-*/", (ch)) != NULL)
    #define is_separator(ch)        (strchr(";,(){}", (ch)) != NULL)

    void recognize_integer(void) {
        Token.class = INTEGER; next_char();
        while (is_digit(input_char)) {next_char();}
    }
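The fragments above can be assembled into a small self-contained lexer. The following is a minimal sketch under several assumptions that are not in the slides: `start_lex` takes the program text as a parameter instead of reading a file, the token class values are an arbitrary enum, file positions and comment skipping are omitted, and `recognize_identifier` simply follows the regular description `letter letter_or_digit*`.

```c
#include <stdlib.h>
#include <string.h>

/* Assumed class codes; single-character tokens use their character value. */
enum { EoF = 256, IDENTIFIER, INTEGER, ERRONEOUS };

typedef struct {
    int class;
    char *repr;     /* heap-allocated copy of the lexeme; not freed in this sketch */
} Token_Type;

Token_Type Token;                   /* the current token */

static const char *input;           /* the complete program text */
static int dot;                     /* index of the next character */
#define input_char   (input[dot])
#define next_char()  (dot++)

#define is_end_of_input(ch)     ((ch) == '\0')
#define is_layout(ch)           (!is_end_of_input(ch) && (ch) <= ' ')
#define is_uc_letter(ch)        ('A' <= (ch) && (ch) <= 'Z')
#define is_lc_letter(ch)        ('a' <= (ch) && (ch) <= 'z')
#define is_letter(ch)           (is_uc_letter(ch) || is_lc_letter(ch))
#define is_digit(ch)            ('0' <= (ch) && (ch) <= '9')
#define is_letter_or_digit(ch)  (is_letter(ch) || is_digit(ch))
#define is_operator(ch)         (strchr("+-*/", (ch)) != NULL)
#define is_separator(ch)        (strchr(";,(){}", (ch)) != NULL)

/* Copy len characters starting at start into a zero-terminated string. */
static char *input_to_zstring(int start, int len) {
    char *s = malloc((size_t)len + 1);
    if (s == NULL) abort();
    memcpy(s, input + start, (size_t)len);
    s[len] = '\0';
    return s;
}

static void recognize_identifier(void) {
    Token.class = IDENTIFIER; next_char();
    while (is_letter_or_digit(input_char)) { next_char(); }
}

static void recognize_integer(void) {
    Token.class = INTEGER; next_char();
    while (is_digit(input_char)) { next_char(); }
}

/* Assumption: the text is passed in directly rather than read from a file. */
void start_lex(const char *text) {
    input = text;
    dot = 0;
}

void get_next_token(void) {
    int start_dot;

    while (is_layout(input_char)) { next_char(); }  /* skip layout only */

    /* split on the first character of the token */
    start_dot = dot;
    if (is_end_of_input(input_char)) {
        Token.class = EoF; Token.repr = "<EoF>"; return;
    }
    if (is_letter(input_char)) { recognize_identifier(); }
    else if (is_digit(input_char)) { recognize_integer(); }
    else if (is_operator(input_char) || is_separator(input_char)) {
        Token.class = input_char; next_char();
    }
    else { Token.class = ERRONEOUS; next_char(); }

    Token.repr = input_to_zstring(start_dot, dot - start_dot);
}
```

After `start_lex("ab + 12;")`, successive calls to `get_next_token()` yield an IDENTIFIER with representation `ab`, the operator `+`, an INTEGER with representation `12`, the separator `;`, and finally EoF. A character outside the recognized sets, such as `=`, comes back as ERRONEOUS.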