Compiler I: Syntax Analysis

Similar documents
Chapter 10: Compiler I: Syntax Analysis

Compiler I: Sytnax Analysis

9. The Compiler I Syntax Analysis 1

Chapter 11: Compiler II: Code Generation

Compiler II: Code Generation Human Thought

Virtual Machine. Part I: Stack Arithmetic. Building a Modern Computer From First Principles.

Virtual Machine Where we are at: Part I: Stack Arithmetic. Motivation. Compilation models. direct compilation:... 2-tier compilation:

Assembler Human Thought

Virtual Machine. Part II: Program Control. Building a Modern Computer From First Principles.

Motivation. Compiler. Our ultimate goal: Hack code. Jack code (example) Translate high-level programs into executable code. return; } } return

Assembler. Building a Modern Computer From First Principles.

Introduction: Hello, World Below

Chapter 6: Assembler

Introduction: From Nand to Tetris

CS412/413. Introduction to Compilers Tim Teitelbaum. Lecture 2: Lexical Analysis 23 Jan 08

Course overview. Introduction to Computer Yung-Yu Chuang. with slides by Nisan & Schocken (

An Introduction to LEX and YACC. SYSC Programming Languages

Structure of a compiler. More detailed overview of compiler front end. Today we ll take a quick look at typical parts of a compiler.

Yacc: A Syntactic Analysers Generator

Chapter 3: Describing Syntax and Semantics. Introduction Formal methods of describing syntax (BNF)

Course overview. Introduction to Computer Yung-Yu Chuang. with slides by Nisan & Schocken (

Course overview. Introduction to Computer Yung-Yu Chuang. with slides by Nisan & Schocken (

Lexical Analysis. Introduction

Chapter 8: Virtual Machine II: Program Control

A programming language requires two major definitions A simple one pass compiler

CS 314 Principles of Programming Languages. Lecture 3

COMPILER CONSTRUCTION LAB 2 THE SYMBOL TABLE. Tutorial 2 LABS. PHASES OF A COMPILER Source Program. Lab 2 Symbol table

CS Lecture 2. The Front End. Lecture 2 Lexical Analysis

Formal Languages and Compilers Lecture VI: Lexical Analysis

Lexical analysis. Syntactical analysis. Semantical analysis. Intermediate code generation. Optimization. Code generation. Target specific optimization

Introduction to Compiler Design

Compiling Regular Expressions COMP360

Machine (Assembly) Language

Syntax-Directed Translation. Lecture 14

Compilers and Interpreters

Lexical and Syntax Analysis. Top-Down Parsing

for (i=1; i<=100000; i++) { x = sqrt (y); // square root function cout << x+i << endl; }

CS415 Compilers. Lexical Analysis

DVA337 HT17 - LECTURE 4. Languages and regular expressions

CD Assignment I. 1. Explain the various phases of the compiler with a simple example.

CS 314 Principles of Programming Languages

COMPILER DESIGN. For COMPUTER SCIENCE

Lexical Analysis (ASU Ch 3, Fig 3.1)

Compiler Design. Computer Science & Information Technology (CS) Rank under AIR 100

Context-Free Languages. Wen-Guey Tzeng Department of Computer Science National Chiao Tung University

What is a compiler? var a var b mov 3 a mov 4 r1 cmpi a r1 jge l_e mov 2 b jmp l_d l_e: mov 3 b l_d: ;done

PESIT Bangalore South Campus Hosur road, 1km before Electronic City, Bengaluru -100 Department of Computer Science and Engineering

for (i=1; i<=100000; i++) { x = sqrt (y); // square root function cout << x+i << endl; }

Context-Free Languages. Wen-Guey Tzeng Department of Computer Science National Chiao Tung University

Project 1: Scheme Pretty-Printer

Context-Free Languages. Wen-Guey Tzeng Department of Computer Science National Chiao Tung University

Context-Free Languages. Wen-Guey Tzeng Department of Computer Science National Chiao Tung University

SYED AMMAL ENGINEERING COLLEGE (An ISO 9001:2008 Certified Institution) Dr. E.M. Abdullah Campus, Ramanathapuram

CMSC 330: Organization of Programming Languages

Compiler Code Generation COMP360

Chapter 10 Language Translation


CS143 Handout 20 Summer 2011 July 15 th, 2011 CS143 Practice Midterm and Solution

Virtual Machine (Part II)

Machine (Assembly) Language Human Thought

Lexical Analysis. Dragon Book Chapter 3 Formal Languages Regular Expressions Finite Automata Theory Lexical Analysis using Automata

1. The output of lexical analyser is a) A set of RE b) Syntax Tree c) Set of Tokens d) String Character

Lexical and Syntax Analysis

The Structure of a Syntax-Directed Compiler

IWKS 3300: NAND to Tetris Spring John K. Bennett. Assembler

TDDD55 - Compilers and Interpreters Lesson 3

The Parsing Problem (cont d) Recursive-Descent Parsing. Recursive-Descent Parsing (cont d) ICOM 4036 Programming Languages. The Complexity of Parsing

COMPILER CONSTRUCTION Seminar 02 TDDB44

If you are going to form a group for A2, please do it before tomorrow (Friday) noon GRAMMARS & PARSING. Lecture 8 CS2110 Spring 2014

Compiler Design (40-414)

CS 403 Compiler Construction Lecture 3 Lexical Analysis [Based on Chapter 1, 2, 3 of Aho2]

Theory and Compiling COMP360

CS 4240: Compilers and Interpreters Project Phase 1: Scanner and Parser Due Date: October 4 th 2015 (11:59 pm) (via T-square)

CSE 3302 Programming Languages Lecture 2: Syntax

Chapter 9: High Level Language

Defining Program Syntax. Chapter Two Modern Programming Languages, 2nd ed. 1

LECTURE 11. Semantic Analysis and Yacc

Compiler Optimisation

Compiler Structure. Lexical. Scanning/ Screening. Analysis. Syntax. Parsing. Analysis. Semantic. Context Analysis. Analysis.

TDDD55- Compilers and Interpreters Lesson 3

MidTerm Papers Solved MCQS with Reference (1 to 22 lectures)

CMSC 350: COMPILER DESIGN

What is a compiler? Xiaokang Qiu Purdue University. August 21, 2017 ECE 573

CSE450 Translation of Programming Languages. Lecture 4: Syntax Analysis

Non-deterministic Finite Automata (NFA)

Lexical Scanning COMP360

COMPILER DESIGN LECTURE NOTES

PRINCIPLES OF COMPILER DESIGN UNIT I INTRODUCTION TO COMPILING

COMPILERS AND INTERPRETERS Lesson 4 TDDD16

Compiler Construction D7011E

Computer Architecture

COP 3402 Systems Software. Lecture 4: Compilers. Interpreters

Syntax Analysis/Parsing. Context-free grammars (CFG s) Context-free grammars vs. Regular Expressions. BNF description of PL/0 syntax

Appendix A The DL Language

ADTS, GRAMMARS, PARSING, TREE TRAVERSALS

Introduction to Lexical Analysis

Syntax Analysis. Chapter 4


1. Lexical Analysis Phase

Section A. A grammar that produces more than one parse tree for some sentences is said to be ambiguous.

Transcription:

Compiler I: Syntax Analysis Building a Modern Computer From First Principles www.nand2tetris.org Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 10: Compiler I: Syntax Analysis slide 1

Course map Human Thought Abstract design Chapters 9, 12 abstract interface H.L. Language & Operating Sys. Compiler Chapters 10-11 abstract interface Virtual Machine Software hierarchy VM Translator Chapters 7-8 abstract interface Assembly Language Assembler Chapter 6 abstract interface Machine Language Computer Architecture Chapters 4-5 Hardware hierarchy abstract interface Hardware Platform Gate Logic Chapters 1-3 abstract interface Chips & Logic Gates Electrical Engineering Physics Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 10: Compiler I: Syntax Analysis slide 2

Motivation: Why study about compilers? Because Compilers Are an essential part of applied computer science Are very relevant to computational linguistics Are implemented using classical programming techniques Employ important software engineering principles Train you in developing software for transforming one structure to another (programs, files, transactions, ) Train you to think in terms of description languages. Parsing files of some complex syntax is very common in many applications. Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 10: Compiler I: Syntax Analysis slide 3

The big picture Modern compilers are two-tiered: Front-end: from high-level language to some intermediate language Back-end: from the intermediate language to binary code. VM implementation over CISC platforms Some... language Some compiler VM imp. over RISC platforms Some Other language Some Other compiler Intermediate code... VM emulator Jack language Jack compiler VM imp. over the Hack platform Compiler lectures (Projects 10,11) VM lectures (Projects 7-8) CISC machine language RISC machine language......... written in a high-level language Hack machine language HW lectures (Projects 1-6) CISC machine RISC machine other digital platforms, each equipped with its VM implementation Any computer Hack computer Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 10: Compiler I: Syntax Analysis slide 4

Compiler architecture (front end) Some... language Some compiler Some Other language Some Other compiler... Jack compiler Jack language Intermediate code Jack Compiler (Chapter 10) XML code VM implementation VM imp. over CISC over RISC platforms platforms CISC RISC machine machine... language language VM emulator VM imp. over the Hack platform written in a high-level language Hack machine language...... Syntax Analyzer Jack Program Tokenizer Parser Code Gene -ration (Chapter 11) VM code (source) scanner (target) Syntax analysis: understanding the structure of the source code Tokenizing: creating a stream of atoms Parsing: matching the atom stream with the language grammar XML output = one way to demonstrate that the syntax analyzer works Code generation: reconstructing the semantics using the syntax of the target code. Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 10: Compiler I: Syntax Analysis slide 5

Tokenizing / Lexical analysis / scanning Remove white space Construct a token list (language atoms) Things to worry about: Language specific rules: e.g. how to treat ++ Language-specific classifications: keyword, symbol, identifier, integercconstant, stringconstant,... While we are at it, we can have the tokenizer record not only the token, but also its lexical classification (as defined by the source language grammar). Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 10: Compiler I: Syntax Analysis slide 6

C function to split a string into tokens char* strtok ( char* str, const char* delimiters ); str: string to be broken into tokens delimiters: string containing the delimiter characters Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 10: Compiler I: Syntax Analysis slide 7

Jack Tokenizer Source code if (x < 153) {let city = Paris ;} Tokenizer Tokenizer s output <tokens> <keyword> if </keyword> <symbol> ( </symbol> <identifier> x </identifier> <symbol> < </symbol> <integerconstant> 153 </integerconstant> <symbol> ) </symbol> <symbol> { </symbol> <keyword> let </keyword> <identifier> city </identifier> <symbol> = </symbol> <stringconstant> Paris </stringconstant> <symbol> ; </symbol> <symbol> } </symbol> </tokens> Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 10: Compiler I: Syntax Analysis slide 8

Parsing The tokenizer discussed thus far is part of a larger program called parser Each language is characterized by a grammar. The parser is implemented to recognize this grammar in given texts The parsing process: A text is given and tokenized The parser determines weather or not the text can be generated from the grammar In the process, the parser performs a complete structural analysis of the text The text can be in an expression in a : Natural language (English, ) Programming language (Jack, ). Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 10: Compiler I: Syntax Analysis slide 9

Parsing examples English He ate an apple on the desk. Jack (5+3)*2 sqrt(9*4) parse - ate + * 2 sqrt * he an apple 5 3 9 4 on the desk Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 10: Compiler I: Syntax Analysis slide 10

Regular expressions a b* {, a, b, bb, bbb, } (a b)* {, a, b, aa, ab, ba, bb, aaa, } ab*(c ) {a, ac, ab, abc, abb, abbc, } Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 10: Compiler I: Syntax Analysis slide 11

Context-free grammar S () S (S) S SS S a as bs strings ending with a S x S y S S+S S S-S S S*S S S/S S (S) (x+y)*x-x*y/(x+x) Simple (terminal) forms / complex (non-terminal) forms Grammar = set of rules on how to construct complex forms from simpler forms Highly recursive. Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 10: Compiler I: Syntax Analysis slide 12

Recursive descent parser A=bB cc A=(bB)* A() { if (next()== b ) { eat( b ); B(); } else if (next()== c ) { eat( c ); C(); } } A() { while (next()== b ) { eat( b ); B(); } } Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 10: Compiler I: Syntax Analysis slide 13

A typical grammar of a typical C-like language Grammar program: statement; statement: whilestatement ifstatement // other statement possibilities... '{' statementsequence '}' whilestatement: 'while' '(' expression ')' statement ifstatement: simpleif ifelse simpleif: 'if' '(' expression ')' statement ifelse: 'if' '(' expression ')' statement 'else' statement statementsequence: '' // null, i.e. the empty sequence statement ';' statementsequence expression: // definition of an expression comes here // more definitions follow Code sample while (expression) { if (expression) statement; while (expression) { statement; if (expression) statement; } while (expression) { statement; statement; } } if (expression) { statement; while (expression) statement; statement; } if (expression) if (expression) statement; } Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 10: Compiler I: Syntax Analysis slide 14

Parse tree Input Text: while (count<=100) { /** demonstration */ count++; //... Tokenized: while ( count <= 100 ) { count ++ ;... statement whilestatement expression program: statement; statement: whilestatement ifstatement // other statement possibilities... '{' statementsequence '}' whilestatement: 'while' '(' expression ')' statement... statement statementsequence statement statementsequence while ( count <= 100 ) { count ++ ;... Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 10: Compiler I: Syntax Analysis slide 15

Recursive descent parsing code sample while (expression) { statement; statement; while (expression) { while (expression) statement; statement; } } Highly recursive LL(0) grammars: the first token determines in which rule we are In other grammars you have to look ahead 1 or more tokens Jack is almost LL(0). Parser implementation: a set of parsing methods, one for each rule: parsestatement() parsewhilestatement() parseifstatement() parsestatementsequence() parseexpression(). Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 10: Compiler I: Syntax Analysis slide 16

The Jack grammar x : x appears verbatim x: x is a language construct x?: x appears 0 or 1 times x*: x appears 0 or more times x y: either x or y appears (x,y): x appears, then y. Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 10: Compiler I: Syntax Analysis slide 17

The Jack grammar (cont.) x : x appears verbatim x: x is a language construct x?: x appears 0 or 1 times x*: x appears 0 or more times x y: either x or y appears (x,y): x appears, then y. Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 10: Compiler I: Syntax Analysis slide 18

Jack syntax analyzer in action Class Bar { method Fraction foo(int y) { var int temp; // a variable let temp = (xxx+12)*-63;...... Syntax analyzer Using the language grammar, a programmer can write a syntax analyzer program (parser) Syntax analyzer The syntax analyzer takes a source text file and attempts to match it on the language grammar If successful, it can generate a parse tree in some structured format, e.g. XML. The syntax analyzer s algorithm shown in this slide: If xxx is non-terminal, output: <xxx> Recursive code for the body of xxx </xxx> If xxx is terminal (keyword, symbol, constant, or identifier), output: <xxx> xxx value </xxx> <vardec> <keyword> var </keyword> <keyword> int </keyword> <identifier> temp </identifier> <symbol> ; </symbol> </vardec> <statements> <letstatement> <keyword> let </keyword> <identifier> temp </identifier> <symbol> = </symbol> <expression> <term> <symbol> ( </symbol> <expression> <term> <identifier> xxx </identifier> </term> <symbol> + </symbol> <term> <int.const.> 12 </int.const.> </term> </expression>... Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 10: Compiler I: Syntax Analysis slide 19

JackTokenizer: a tokenizer for the Jack language (proposed implementation) Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 10: Compiler I: Syntax Analysis slide 20

JackTokenizer (cont.) Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 10: Compiler I: Syntax Analysis slide 21

CompilationEngine: a recursive top-down parser for Jack The CompilationEngine effects the actual compilation output. It gets its input from a JackTokenizer and emits its parsed structure into an output file/stream. The output is generated by a series of compilexxx() routines, one for every syntactic element xxx of the Jack grammar. The contract between these routines is that each compilexxx() routine should read the syntactic construct xxx from the input, advance() the tokenizer exactly beyond xxx, and output the parsing of xxx. Thus, compilexxx()may only be called if indeed xxx is the next syntactic element of the input. In the first version of the compiler, which we now build, this module emits a structured printout of the code, wrapped in XML tags (defined in the specs of project 10). In the final version of the compiler, this module generates executable VM code (defined in the specs of project 11). In both cases, the parsing logic and module API are exactly the same. Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 10: Compiler I: Syntax Analysis slide 22

CompilationEngine (cont.) Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 10: Compiler I: Syntax Analysis slide 23

CompilationEngine (cont.) Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 10: Compiler I: Syntax Analysis slide 24

CompilationEngine (cont.) Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 10: Compiler I: Syntax Analysis slide 25

Summary and next step Syntax analysis: understanding syntax Code generation: constructing semantics Jack Compiler (Chapter 10) XML code Syntax Analyzer Jack Program Tokenizer Parser Code Gene -ration (Chapter 11) VM code The code generation challenge: Extend the syntax analyzer into a full-blown compiler that, instead of generating passive XML code, generates executable VM code Two challenges: (a) handling data, and (b) handling commands. Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 10: Compiler I: Syntax Analysis slide 26

Perspective The parse tree can be constructed on the fly Syntax analyzers can be built using: Lex tool for tokenizing (flex) Yacc tool for parsing (bison) Do everything from scratch (our approach...) The Jack language is intentionally simple: Statement prefixes: let, do,... No operator priority No error checking Basic data types, etc. Richer languages require more powerful compilers The Jack compiler: designed to illustrate the key ideas that underlie modern compilers, leaving advanced features to more advanced courses Industrial-strength compilers: (LLVM) Have good error diagnostics Generate tight and efficient code Support parallel (multi-core) processors. Elements of Computing Systems, Nisan & Schocken, MIT Press, www.nand2tetris.org, Chapter 10: Compiler I: Syntax Analysis slide 27