Course Overview. Levels of Programming Languages. Compilers and other translators. Tombstone Diagrams. Syntax Specification


Course Overview

PART I: overview material
1 Introduction
2 Language processors (tombstone diagrams, bootstrapping)
3 Architecture of a compiler

PART II: inside a compiler
4 Syntax analysis
5 Contextual analysis
6 Runtime organization
7 Code generation

PART III: conclusion
8 Interpretation
9 Review

Levels of Programming Languages

High-level program:

  class Triangle {
    ...
    float area() { return b*h/2; }
  }

Low-level program:

  LOAD r1,b
  LOAD r2,h
  MUL r1,r2
  DIV r1,#2
  RET

Executable machine code:

  0001001001000101 0010010011101100 10101101001...

Compilers and Other Translators

Examples: Chinese => English, Java => JVM byte codes, Scheme => C, C => Scheme, x86 assembly language => x86 binary codes. Other non-traditional examples: disassembler, decompiler (e.g. JVM => Java).

Tombstone Diagrams

What are they? Diagrams consisting of a set of puzzle pieces that we can use to reason about language processors and programs. There are different kinds of pieces and combination rules (not all diagrams are well formed). The basic pieces, drawn as tombstone shapes in the figure, are:
- a program P implemented in language L
- a translator from S to T, implemented in L
- a machine M, implemented in hardware
- an interpreter for language M, implemented in L

Syntax Specification

Syntax is specified using context-free grammars (CFGs):
- a finite set of terminal symbols
- a finite set of non-terminal symbols
- a start symbol
- a finite set of production rules

CFGs are often written in Backus-Naur Form (BNF) notation. Each production rule in BNF notation is written as N ::= α, where N is a non-terminal and α a sequence of terminals and non-terminals. N ::= α | β | ... is an abbreviation for several rules with N as left-hand side.

Concrete and Abstract Syntax

The grammar specifies the concrete syntax of a programming language. The concrete syntax is important for the programmer, who needs to know exactly how to write syntactically well-formed programs. The abstract syntax omits irrelevant syntactic details and only specifies the essential structure of programs.

Example: different concrete syntaxes for an assignment: v := e, (set! v e), e -> v, v = e.
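To make the distinction concrete, here is a minimal sketch of how abstract syntax can be represented as AST node classes. It is written in Java (the implementation language used elsewhere in these slides); the class and field names are illustrative and not those of the actual Triangle compiler.

  // Illustrative AST node classes for assignments and expressions.
  // Whatever the concrete syntax (v := e, (set! v e), e -> v, v = e),
  // the parser builds the same AssignCommand node.
  abstract class Expression { }

  class IntegerExpression extends Expression {
      final int value;
      IntegerExpression(int value) { this.value = value; }
  }

  class VnameExpression extends Expression {
      final String name;                     // a simple variable name
      VnameExpression(String name) { this.name = name; }
  }

  class BinaryExpression extends Expression {
      final Expression left, right;
      final String operator;                 // e.g. "+", "*"
      BinaryExpression(Expression left, String operator, Expression right) {
          this.left = left; this.operator = operator; this.right = right;
      }
  }

  class AssignCommand {
      final String variable;                 // left-hand side V-name
      final Expression expression;           // right-hand side expression
      AssignCommand(String variable, Expression expression) {
          this.variable = variable; this.expression = expression;
      }
  }

With classes like these, the tree for d := d+10*n discussed in the next section is simply an AssignCommand whose expression field is a nested BinaryExpression.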

Context-Free Grammars

[figure: an example grammar, a string generated by it, and two different parse trees (concrete syntax trees) for that string, built from + and * productions]

The given string has two parse trees (concrete syntax trees), so the grammar is ambiguous.

Abstract Syntax Trees

Abstract syntax tree for d := d+10*n:

[figure: an AssignmentCmd node whose children are the V-name d (a SimpleVName) and a BinaryExpression; the outer BinaryExpression applies Op * to an inner BinaryExpression (VnameExpression d, Op +, IntegerExpression 10) and to VnameExpression n, i.e. the expression is grouped as (d + 10) * n]

Note: Triangle does not have precedence levels like C++.

Contextual Constraints

Syntax rules alone are not enough to specify the format of well-formed programs.

Example 1 (scope rules):

  let const m ~ 2
  in putint(m + x)        ! error: x is undefined

Example 2 (type rules):

  let const m ~ 2;
      var n: Boolean
  in begin
    n := m < 4;
    n := n + 1            ! type error
  end

Semantics

The specification of semantics is concerned with specifying the meaning of well-formed programs. Terminology:
- Expressions are evaluated and yield values (and may or may not perform side effects).
- Commands are executed and perform side effects.
- Declarations are elaborated to produce bindings.

Side effects: change the values of variables, perform input/output.

Phases of a Compiler

A compiler's phases are steps in transforming source code into object code. The different phases correspond roughly to the different parts of the language specification:
- syntax analysis <--> syntax
- contextual analysis <--> contextual constraints
- code generation <--> semantics
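A compact sketch of how these three phases line up in a compiler driver; the class and method names are illustrative, not those of any particular compiler.

  // Hypothetical driver: one method per phase.
  class Compiler {
      byte[] compile(String sourceText) {
          Ast ast = syntaxAnalysis(sourceText);   // syntax                 <--> syntax analysis
          contextualAnalysis(ast);                // contextual constraints <--> contextual analysis
          return codeGeneration(ast);             // semantics              <--> code generation
      }

      private Ast syntaxAnalysis(String source) { /* scanner + parser, builds the AST */ return new Ast(); }
      private void contextualAnalysis(Ast ast)  { /* identification and type checking; decorates the AST */ }
      private byte[] codeGeneration(Ast ast)    { /* applies code templates */ return new byte[0]; }
  }

  class Ast { }   // stand-in for the real abstract syntax tree type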

Compiler Passes

A pass is a complete traversal of the source program, or a complete traversal of some internal representation of the source program (such as the syntax tree). A pass can correspond to a phase, but it does not have to! Sometimes a single pass corresponds to several phases that are interleaved in time. What and how many passes a compiler does over the source program is an important design decision.

Syntax Analysis

Dataflow chart: the source program (a stream of characters) goes into the scanner, which produces a stream of tokens (and error reports); the parser consumes the tokens and produces the abstract syntax tree (and error reports).

Regular Expressions

REs are a notation for expressing a set of strings of terminal symbols. The different kinds of RE:
- ε      the empty string
- t      generates only the string t
- X Y    generates any string xy such that x is generated by X and y is generated by Y
- X | Y  generates any string generated either by X or by Y
- X*     the concatenation of zero or more strings generated by X
- (X)    for grouping

Language Defined by a Regular Expression

Recall: a language is a set of strings. The language defined by a regular expression is the set of strings that match the expression.

  Regular expression    Corresponding set of strings
  ε                     {""}
  a                     {"a"}
  a b c                 {"abc"}
  a | b | c             {"a", "b", "c"}
  (a | b | c)*          {"", "a", "b", "c", "aa", "ab", ..., "bccabb", ...}

FSM and the Implementation of Scanners

Regular expressions, NFSMs, and DFSMs are all equivalent formalisms in terms of what languages can be defined with them. Regular expressions are a convenient notation for describing the tokens of programming languages. Regular expressions can be converted into NFSMs (and the algorithm for conversion into a DFSM is straightforward). DFSMs can be easily implemented as computer programs.

DFSM Example: Integer Literals

Here is a DFSM that accepts integer literals with an optional + or - sign:

[figure: start state S; a + or - leads to state B; a digit from S or B leads to the accepting state A; further digits stay in A]
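The DFSM translates almost directly into a program, which is the point of the last bullet above. A minimal sketch: the state names follow the figure, everything else is illustrative.

  // Direct encoding of the integer-literal DFSM: S --(+|-)--> B, S/B --digit--> A, A --digit--> A.
  class IntegerLiteralRecognizer {
      public static boolean accepts(String s) {
          int state = 0;                       // 0 = S (start), 1 = B (sign seen), 2 = A (accepting)
          for (char c : s.toCharArray()) {
              switch (state) {
                  case 0:
                      if (c == '+' || c == '-') state = 1;
                      else if (Character.isDigit(c)) state = 2;
                      else return false;
                      break;
                  case 1:
                  case 2:
                      if (Character.isDigit(c)) state = 2;
                      else return false;
                      break;
              }
          }
          return state == 2;                   // only A is an accepting state
      }

      public static void main(String[] args) {
          System.out.println(accepts("+42"));  // true
          System.out.println(accepts("-"));    // false: a sign alone is not a literal
          System.out.println(accepts("7"));    // true
      }
  }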

Parsing

Parsing == recognition + determining the syntax structure (for example by generating an AST). There are different types of parsing strategies: bottom-up and top-down. Recursive descent parsing: what it is, and how to implement one given a BNF specification.

Top-Down Parsing

[figure: a top-down parse of "The cat sees a rat.", starting from Sentence and expanding to Subject Verb Object '.', with Subject and Object each expanding to a Noun phrase]

Bottom-Up Parsing

[figure: a bottom-up parse of "The cat sees a rat.": the words are first grouped into Noun, Verb, and Noun, then into Subject and Object, and finally combined into a Sentence]

Development of a Recursive Descent Parser

(1) Express the grammar in BNF.
(2) Grammar transformations: left factorization and left recursion elimination.
(3) Create a parser class with
    - a private variable currentToken
    - methods to call the scanner: accept and acceptIt.
(4) Implement a public method for the main function to call: a public parse method that
    - fetches the first token from the scanner
    - calls parseS (where S is the start symbol of the grammar)
    - verifies that the scanner next produces the end-of-file token.
(5) Implement private parsing methods: add a private parseN method for each non-terminal N.

(A skeleton following steps (3)-(5), and a sketch of the left-recursion transformation from step (2), appear after the decorated-AST example below.)

LL(1) Grammars

The presented algorithm to convert BNF into a parser does not work for all possible grammars. It only works for so-called LL(1) grammars. Basically, an LL(1) grammar is a grammar which can be parsed by a top-down parser with a lookahead (in the input stream of tokens) of one token. What grammars are LL(1)? How can we recognize that a grammar is (or is not) LL(1)? => We can deduce the necessary conditions from the parser generation algorithm.

Contextual Analysis --> Decorated AST

Annotations: links from applied occurrences of identifiers to their declarations are the result of identification; :type annotations are the result of type checking.

[figure: the decorated AST of a Program whose LetCommand declares var n: Integer and var c: Char (a SequentialDeclaration of two VarDecls), and whose SequentialCommand contains the AssignCommands c := '&' and n := n + 1; expression and V-name nodes carry type annotations such as :char]
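Here is a minimal sketch of steps (3)-(5) for a toy sentence grammar like the one parsed above; the grammar, the string-based token representation, and the error handling are deliberately simplified and illustrative.

  // Recursive-descent parser skeleton for a toy grammar:
  //   Sentence ::= Subject Verb Object '.'   Subject ::= 'the' Noun   Object ::= 'a' Noun
  import java.util.LinkedList;
  import java.util.Queue;

  class MicroEnglishParser {
      private String currentToken;
      private final Queue<String> scanner;          // stands in for the real scanner

      MicroEnglishParser(String... tokens) {
          scanner = new LinkedList<>(java.util.Arrays.asList(tokens));
      }

      // step (3): methods to call the scanner
      private void acceptIt() { currentToken = scanner.poll(); }
      private void accept(String expected) {
          if (!expected.equals(currentToken)) throw new RuntimeException("expected " + expected);
          acceptIt();
      }

      // step (4): the public parse method
      public void parse() {
          acceptIt();                               // fetch the first token
          parseSentence();                          // Sentence is the start symbol
          if (currentToken != null) throw new RuntimeException("tokens left after the sentence");
      }

      // step (5): one private parseN method per non-terminal N
      private void parseSentence() { parseSubject(); parseVerb(); parseObject(); accept("."); }
      private void parseSubject()  { accept("the"); parseNoun(); }
      private void parseObject()   { accept("a"); parseNoun(); }
      private void parseNoun()     { acceptIt(); }  // any single word accepted as a noun here
      private void parseVerb()     { acceptIt(); }

      public static void main(String[] args) {
          new MicroEnglishParser("the", "cat", "sees", "a", "rat", ".").parse();
          System.out.println("parsed OK");
      }
  }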

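Step (2) and the LL(1) condition are connected: a left-recursive rule such as Expression ::= Expression + Term | Term makes parseExpression call itself before consuming any token, so a recursive-descent parser loops forever. The usual fix is to rewrite the rule as Expression ::= Term (+ Term)*. A sketch of the resulting method, with illustrative names in the style of the skeleton above:

  // Left recursion elimination for recursive descent:
  //   before:  Expression ::= Expression + Term | Term   (parseExpression would recurse forever)
  //   after:   Expression ::= Term (+ Term)*             (the repetition becomes a loop)
  class ExpressionParsingSketch {
      private String currentToken;                 // maintained by the scanner-calling methods

      void parseExpression() {
          parseTerm();
          while ("+".equals(currentToken)) {       // a one-token lookahead decides: LL(1)
              acceptIt();                          // consume '+'
              parseTerm();
          }
      }

      private void parseTerm() { acceptIt(); }     // placeholder: a real parser would parse a Term here
      private void acceptIt()  { currentToken = nextTokenFromScanner(); }
      private String nextTokenFromScanner() { return null; }   // stand-in for the real scanner
  }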
Nested Block Structure

A language exhibits nested block structure if blocks may be nested one within another (typically with no upper bound on the level of nesting that is allowed). There can be any number of scope levels (depending on the level of nesting of blocks).

Typical scope rules:
- no identifier may be declared more than once within the same block (at the same level)
- for any applied occurrence there must be a corresponding declaration, either within the same block or in a block in which it is nested

(An identification-table sketch that enforces these rules appears at the end of this section.)

Type Checking

For most statically typed programming languages, type checking is a bottom-up algorithm over the AST:
- the types of expression AST leaves are known immediately: literals => obvious; variables => from the ID table; named constants => from the ID table
- the types of internal nodes are inferred from the types of the children and the type rule for that kind of expression

(A minimal bottom-up checker is sketched at the end of this section.)

Runtime Organization

- Data representation: how to represent values of the source language on the target machine. Primitives, arrays, structures, unions, pointers.
- Expression evaluation: how to organize computing the values of expressions (taking care of intermediate results). Register machine vs. stack machine.
- Storage allocation: how to organize storage for variables (considering the various lifetimes of global, local, and heap variables). Activation records, static/dynamic links, dynamic allocation.
- Routines: how to implement procedures and functions (and how to pass their parameters and return values). Value vs. reference parameters, closures, recursion.
- Object orientation: runtime organization for OO languages. Method tables.

Java Virtual Machine

External representation (platform independent): .class files, which are loaded into the JVM. Internal representation (implementation dependent): classes, objects, arrays, methods, primitive types, strings.

The JVM is an abstract machine in the truest sense of the word. The JVM specification does not give implementation details (these can depend on the target OS/platform, performance requirements, etc.). The JVM specification defines a machine-independent class file format that all JVM implementations must support.

Inspecting JVM Code

Compiling and disassembling:

  % javac Factorial.java
  % javap -c -verbose Factorial
  Compiled from Factorial.java
  class Factorial extends java.lang.Object {
      Factorial();        /* Stack=1, Locals=1, Args_size=1 */
      int fac(int);       /* Stack=2, Locals=4, Args_size=2 */
  }

  Method Factorial()
    0 aload_0
    1 invokespecial #1 <Method java.lang.Object()>
    4 return

  Method int fac(int)          // local variables 0..3 hold: this, n, result, i
    0 iconst_1                 // operand stack: 1
    1 istore_2                 // result := 1
    2 iconst_2                 // operand stack: 2
    3 istore_3                 // i := 2
    4 goto 14
    7 iload_2                  // operand stack: result
    8 iload_3                  // operand stack: result, i
    9 imul                     // operand stack: result*i
   10 istore_2                 // result := result*i
   11 iinc 3 1                 // i := i+1
   14 iload_3                  // operand stack: i
   15 iload_1                  // operand stack: i, n
   16 if_icmplt 7              // loop while i < n
   19 iload_2                  // operand stack: result
   20 ireturn
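As promised above, a sketch of an identification table that enforces the two scope rules under nested block structure; the API shape (openScope, closeScope, enter, retrieve) is the usual textbook one, but the details here are illustrative.

  // Hypothetical identification table with one entry map per nesting level.
  import java.util.ArrayDeque;
  import java.util.Deque;
  import java.util.HashMap;
  import java.util.Map;

  class IdTable {
      private final Deque<Map<String, String>> scopes = new ArrayDeque<>();

      void openScope()  { scopes.push(new HashMap<>()); }      // entering a block
      void closeScope() { scopes.pop(); }                      // leaving a block

      // Rule 1: no identifier may be declared twice at the same level.
      void enter(String id, String attribute) {
          if (scopes.peek().containsKey(id))
              throw new IllegalStateException(id + " already declared in this block");
          scopes.peek().put(id, attribute);
      }

      // Rule 2: an applied occurrence must be declared in this block or an enclosing one.
      String retrieve(String id) {
          for (Map<String, String> level : scopes)             // innermost level first
              if (level.containsKey(id)) return level.get(id);
          throw new IllegalStateException(id + " is undeclared");
      }

      public static void main(String[] args) {
          IdTable table = new IdTable();
          table.openScope();
          table.enter("m", "const 2");
          table.openScope();
          table.enter("n", "var Boolean");
          System.out.println(table.retrieve("m"));   // found in the enclosing block
          table.closeScope();
          table.closeScope();
      }
  }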

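And a minimal sketch of the bottom-up type-checking idea for expressions (assuming a recent JDK for records); the node classes and the two type rules shown are illustrative placeholders, not the Triangle compiler's.

  // Bottom-up type checking over a tiny expression AST.
  // Leaves get their type immediately; internal nodes combine child types via a type rule.
  class TypeChecker {
      enum Type { INT, BOOL }

      interface Expr { }
      record IntLit(int value) implements Expr { }
      record Var(String name, Type declaredType) implements Expr { }   // type from the ID table
      record BinOp(String op, Expr left, Expr right) implements Expr { }

      static Type check(Expr e) {
          if (e instanceof IntLit) return Type.INT;                    // literal => obvious
          if (e instanceof Var v) return v.declaredType();             // variable => ID table
          BinOp b = (BinOp) e;
          Type l = check(b.left()), r = check(b.right());
          switch (b.op()) {                                            // type rule per operator
              case "+": requireBoth(l, r, Type.INT); return Type.INT;
              case "<": requireBoth(l, r, Type.INT); return Type.BOOL;
              default:  throw new IllegalArgumentException("unknown operator " + b.op());
          }
      }

      static void requireBoth(Type l, Type r, Type expected) {
          if (l != expected || r != expected) throw new IllegalStateException("type error");
      }

      public static void main(String[] args) {
          Var n = new Var("n", Type.BOOL);
          // n + 1, as in Example 2 earlier: reported as a type error.
          try { check(new BinOp("+", n, new IntLit(1))); }
          catch (IllegalStateException ex) { System.out.println(ex.getMessage()); }
      }
  }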
Code Generation

The source and target program must be semantically equivalent.

Source program:

  let var n: Integer;
      var c: Char
  in begin
    c := '&';
    n := n+1
  end

Target program:

  PUSH 2
  LOADL 38
  STORE 1[SB]
  LOAD 0[SB]
  LOADL 1
  CALL add
  STORE 0[SB]
  POP 2
  HALT

The semantic specification of the source language is structured in terms of phrases in the SL: expressions, commands, etc. => Code generation follows the same inductive structure.

Specifying Code Generation with Code Templates

The code generation functions for Mini-Triangle:

  Syntax class   Function       Effect of the generated code
  Program        run P          Run program P then halt. Start and finish with an empty stack.
  Command        execute C      Execute command C. May update variables but does not shrink or grow the stack!
  Expression     evaluate E     Evaluate expression E. The net result is pushing the value of E onto the stack.
  V-name         fetch V        Push the value of the constant or variable V onto the stack.
  V-name         assign V       Pop a value from the stack and store it in variable V.
  Declaration    elaborate D    Elaborate declaration D. Make space on the stack for the constants and variables in D.

Code Generation with Code Templates

The while command:

  execute [while E do C] =
        JUMP h
  g:    execute [C]
  h:    evaluate [E]
        JUMPIF(1) g

(An emitter sketch for this template appears at the end of this section.)

Two Kinds of Interpreters

- Iterative interpretation: well suited for quite simple languages, and fast (at most 10 times slower than compiled languages).
- Recursive interpretation: well suited for more complex languages, but slower (up to 100 times slower than compiled languages).

Hypo: a Hypothetical Abstract Machine

- 4096-word code store and 4096-word data store
- PC: program counter (register), initially 0
- ACC: general-purpose accumulator (register), initially 0
- 4-bit opcode and 12-bit operand

Instruction set:

  Opcode  Instruction  Meaning
  0       STORE d      word at address d := ACC
  1       LOAD d       ACC := word at address d
  2       LOADL d      ACC := d
  3       ADD d        ACC := ACC + word at address d
  4       SUB d        ACC := ACC - word at address d
  5       JUMP d       PC := d
  6       JUMPZ d      if ACC = 0 then PC := d
  7       HALT         stop execution

Mini-Basic Interpreter

The Mini-Basic abstract machine:
- data store: an array of 26 floating-point values
- code store: an array of commands

Possible representations for each command:
- character string (yields the slowest execution)
- sequence of tokens (a good compromise)
- AST (yields the longest response time)
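Here is the promised sketch of an emitter for the while-command template; the instruction strings and the backpatching scheme are illustrative, not TAM's actual instruction encoding.

  // Illustrative emitter for the while-command template:
  //   execute [while E do C] = JUMP h; g: execute[C]; h: evaluate[E]; JUMPIF(1) g
  import java.util.ArrayList;
  import java.util.List;

  class WhileTemplateSketch {
      private final List<String> code = new ArrayList<>();

      void generateWhile(Runnable evaluateE, Runnable executeC) {
          int jumpToH = emit("JUMP ?");          // forward jump, address patched later
          int g = code.size();                   // label g: start of the loop body
          executeC.run();                        //   execute [C]
          int h = code.size();                   // label h: the loop test
          patch(jumpToH, "JUMP " + h);
          evaluateE.run();                       //   evaluate [E]
          emit("JUMPIF(1) " + g);                // jump back to g while E is true
      }

      private int emit(String instr) { code.add(instr); return code.size() - 1; }
      private void patch(int addr, String instr) { code.set(addr, instr); }

      public static void main(String[] args) {
          WhileTemplateSketch gen = new WhileTemplateSketch();
          gen.generateWhile(() -> gen.emit("LOAD 0[SB]"),        // stands in for evaluate[E]
                            () -> gen.emit("CALL putint"));      // stands in for execute[C]
          gen.code.forEach(System.out::println);
      }
  }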

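The Hypo instruction set above is small enough that an iterative (fetch-analyze-execute) interpreter fits in a handful of lines. A sketch, assuming each code-store word packs the 4-bit opcode in the high bits and the 12-bit operand in the low bits (the packing is an assumption; the description above only gives the field widths):

  // Iterative (fetch-analyze-execute) interpreter for the Hypo machine described above.
  class HypoInterpreter {
      final int[] codeStore = new int[4096];
      final int[] dataStore = new int[4096];
      int pc = 0, acc = 0;

      void run() {
          while (true) {
              int instr = codeStore[pc++];          // fetch
              int opcode = (instr >> 12) & 0xF;     // analyze
              int d = instr & 0xFFF;
              switch (opcode) {                     // execute
                  case 0: dataStore[d] = acc;        break;   // STORE d
                  case 1: acc = dataStore[d];        break;   // LOAD d
                  case 2: acc = d;                   break;   // LOADL d
                  case 3: acc = acc + dataStore[d];  break;   // ADD d
                  case 4: acc = acc - dataStore[d];  break;   // SUB d
                  case 5: pc = d;                    break;   // JUMP d
                  case 6: if (acc == 0) pc = d;      break;   // JUMPZ d
                  case 7: return;                             // HALT
              }
          }
      }

      public static void main(String[] args) {
          HypoInterpreter m = new HypoInterpreter();
          // A tiny program: LOADL 5; ADD 100; STORE 101; HALT   (dataStore[100] preset to 3)
          m.dataStore[100] = 3;
          m.codeStore[0] = (2 << 12) | 5;      // LOADL 5
          m.codeStore[1] = (3 << 12) | 100;    // ADD 100
          m.codeStore[2] = (0 << 12) | 101;    // STORE 101
          m.codeStore[3] = (7 << 12);          // HALT
          m.run();
          System.out.println(m.dataStore[101]); // prints 8
      }
  }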
Recursive Interpretation

Recursively defined languages cannot be interpreted iteratively (fetch-analyze-execute), because each command can contain any number of other commands. Both analysis and execution must be recursive (similar to the parsing phase when compiling a high-level language). Hence, the entire analysis must precede the entire execution:
- Step 1: fetch and analyze (recursively)
- Step 2: execute (recursively)

Execution is a traversal of the decorated AST, hence we can use a new visitor. Values (variables and constants) are handled internally.

Code Optimization (Improvement)

The code generated by our compiler is not efficient:
- it computes some values at runtime that could be known at compile time
- it computes some values more times than necessary

We can do better:
- constant folding
- common sub-expression elimination
- code motion
- dead code elimination

Optimization Implementation

- Is the optimization correct or safe?
- Is the optimization really an improvement?
- What sort of analyses do we need to perform to get the required information? Local or global.
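As a small illustration of the first improvement in the list above, here is a sketch of constant folding over an expression AST (assuming a recent JDK for records; the node classes are illustrative):

  // Constant folding: compute at compile time what would otherwise be computed at runtime.
  // E.g. the tree for 2 * 3 + x is rewritten to 6 + x.
  class ConstantFolding {
      interface Expr { }
      record Num(int value) implements Expr { }
      record Var(String name) implements Expr { }
      record Bin(char op, Expr left, Expr right) implements Expr { }

      static Expr fold(Expr e) {
          if (!(e instanceof Bin b)) return e;                 // leaves are already folded
          Expr l = fold(b.left()), r = fold(b.right());        // fold bottom-up
          if (l instanceof Num a && r instanceof Num c) {      // both operands known => fold
              switch (b.op()) {
                  case '+': return new Num(a.value() + c.value());
                  case '*': return new Num(a.value() * c.value());
              }
          }
          return new Bin(b.op(), l, r);                        // otherwise keep the node
      }

      public static void main(String[] args) {
          Expr e = new Bin('+', new Bin('*', new Num(2), new Num(3)), new Var("x"));
          System.out.println(fold(e));   // Bin[op=+, left=Num[value=6], right=Var[name=x]]
      }
  }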