Week 3: Compilers and Interpreters

CS320 Principles of Programming Languages
Week 3: Compilers and Interpreters
Jingke Li
Portland State University
Fall 2017

PSU CS320 Fall 17 Week 3: Compilers and Interpreters 1/ 52

Programming Language Implementation Framework

[Diagram: high-level programs → compiler/interpreter → low-level execution]

- Programming languages enable people to express tasks at a high level.
- However, to perform a program's actions, we need to execute it on a low-level machine.
- Compilers and interpreters provide the bridge between these two parts.

High-Level vs. Low-Level

High-level language features:
- Declarations and nested scopes
- Many data types
- Many forms of expressions and statements
- Many levels of program abstraction
- Type inference, exceptions, concurrency, ...

Some other forms of high-level descriptions can also be included in this framework:
- Speeches, written texts, images, videos, ...

Low-level language characteristics:
- Explicit registers, explicit memory management
- Limited operation forms: machine instructions
- Limited control mechanism: only labels and conditional branches

Compiler vs. Interpreter

An interpreter implements a program in one single step. It runs the program directly:

    Source program → Interpreter → execution

A compiler implements a program in two steps. It translates the program first; the target program is then executed:

    Source program → Compiler → Target program → execution

A JIT (Just-In-Time) compiler performs compilation after the source program is loaded into memory for execution. (To the user, it appears like an interpreter, since there is no explicit target program.)

Compiler vs. Interpreter

Any programming language can be implemented through either interpretation or compilation. However, because languages differ in their features:

- Some languages are more suitable for compilation, e.g. languages with many static features: Fortran, C, Ada, ... They are sometimes referred to as compiled languages.
- Some are more suitable for interpretation. They are sometimes referred to as interpreted languages. Examples:
  - Very simple languages: BASIC, Logo, ...
  - Scripting languages: PHP, Python, Ruby, Perl, JavaScript, ...
  - Declarative languages: Lisp, Scheme, ML, Haskell, Prolog, ...

Compiler vs. Interpreter

- Some languages use a combined compilation-interpretation approach, e.g. through an intermediate representation: Pascal (p-code), Java (bytecode), VB (p-code), C#, ...
- Some languages have both forms of implementation: Pascal, Lisp, C/C++, ...

Compiler vs. Interpreter

Since a compiler is a language-to-language translator, compilers can be used in a chain:

    L1 program → L1 compiler → L2 program → L2 compiler → L3 program

Example: The classical Unix C compiler, cc,

    proc.c → cc → proc.o

is in fact three compilers chained together.

Compiler vs. Interpreter

    cc:  proc.c → cpp → proc.i → cc1 → proc.s → as → proc.o

- cpp: the C preprocessor, which expands macros and compiler directives in the source program
- cc1: the main C compiler, which translates C code to the assembly language for a particular machine
- as: the assembler, which translates assembly language programs into machine code

Compiler Overview

[Diagram: source program → compiler → target program; the compiler also emits diagnostics]

A compiler translates a program: it reads a source program as input, analyzes it, and then outputs a semantically equivalent target program.

In a typical setting, the source language is high-level, while the target language is low-level.

Compiler Overview

[Diagram: source program → front-end → AST → back-end → target program; the compiler also emits diagnostics]

- Front-end: its main task is to understand the input program's syntax and validate its static semantics.
- Back-end: its main task is to synthesize a semantically equivalent target program.
- AST: the internal program representation, carrying the essential syntax information.

Basic Requirement for a Compiler

A compiler needs to ensure that the source program's semantics is preserved in the target program.

- In today's practice, a compiler's correctness is largely established through informal validation approaches.
- Provably correct compilers are still an active research topic.

Desirable Properties of a Compiler

- Performance: of both the compiler itself and the compiled code
- Diagnostics: high-quality error messages and warnings enable early diagnosis and resolution of programming errors
- Convenient development environment: IDEs, tools for profiling and debugging, etc.
- Separate compilation

The Compiler Pipeline

The compilation process is typically broken down into a sequence of phases:

Front-end:
    Source program → Lexical Analysis (Lexer) → Syntax Analysis (Parser) → Static Analysis (Checker) → Abstract Syntax Tree

Back-end:
    Abstract Syntax Tree → IR Code Generator → IR Code Optimizer → Target Code Generator → Target Program

There are many variations on the phase sequence of the back-end, e.g. extra phases or iterated phases.

Lexical Analysis

    character stream → Lexer → token stream

Tasks:
- Looking for patterns in the input and converting them to tokens
- Skipping comments and white-space characters
- Detecting lexical errors

Lexer Implementation

A lexer is basically a finite automaton.

Every step in converting an RE to a DFA to a lexer can be automated. As such, many lexer generators exist:
- lex
- flex, flex++, jflex
- JavaCC
- ...

"mini.jflex":

    // minijava keywords and ID
    // in jflex specification
    %%
    %%
    "class"     |
    "extends"   |
    "static"    |
    "public"    |
    "void"      |
    "int"       |
    "boolean"   |
    "new"       |
    "if"        |
    "else"      |
    "while"     |
    "return"    |
    "main"      |
    "true"      |
    "false"     |
    "String"    |
    "System"    |
    "out"       |
    "println"   |
    [A-Za-z]*   |
    [ \t\n]+    { /* ignore */ }

Lexer Implementation

It's possible to manually implement a lexer by following the RE-to-DFA conversion steps. But the process can be very tedious, and the resulting DFA can be very large:

    linux> jflex mini.jflex
    Reading "mini.jflex"
    Constructing NFA : 118 states in NFA
    Converting NFA to DFA :
    ...
    92 states before minimization, 91 states in minimized DFA
    Writing code to "Yylex.java"

Alternative manual approaches exist. They generally process token RE patterns directly, without converting them to DFAs. They use techniques such as buffering, lookahead, and post-processing to simplify the task.

Lexer Implementation

Sample manual code (for keywords and IDs):

    // Treat keywords as IDs first, then distinguish them afterwards
    int c = nextchar();
    int c2 = peeknextchar();
    if (isletter(c)) {
      // identifying an ID token: keep consuming letters and digits
      while (isletter(c2) || isdigit(c2)) {
        c = nextchar();
        c2 = peeknextchar();
      }
      // assume lexeme is buffered in a String
      if (lexeme.equals("class"))
        return CLASS;
      else if (lexeme.equals("extends"))
        return EXTENDS;
      ...
      else
        return ID;
    }
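
The same idea can be written as a small self-contained program. This is only a sketch under assumptions: the class name `MiniLexer`, the method `tokenize`, the string-valued tokens, and the five-keyword subset are all hypothetical and chosen for illustration; they are not taken from the course code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: scans identifiers first, then post-processes them into keywords.
class MiniLexer {
    // A small keyword subset, chosen only for illustration.
    static final Set<String> KEYWORDS = Set.of("class", "extends", "if", "else", "while");

    static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (c == ' ' || c == '\t' || c == '\n') {
                i++;  // skip white space
            } else if (Character.isLetter(c)) {
                int start = i;
                // lookahead: keep consuming while the next char is a letter or digit
                while (i < src.length() && Character.isLetterOrDigit(src.charAt(i))) i++;
                String lexeme = src.substring(start, i);
                // treat the lexeme as an ID first, then check whether it is a keyword
                tokens.add(KEYWORDS.contains(lexeme) ? lexeme.toUpperCase() : "ID(" + lexeme + ")");
            } else {
                throw new IllegalArgumentException("lexical error at position " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("class Foo extends Bar2"));
    }
}
```

Looking keywords up in a set after scanning keeps the scanning loop independent of how many keywords the language has.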

Syntax Analysis

    token stream → Parser → abstract syntax tree

Tasks:
- Recognizing the hierarchical syntactic structure of the input program and representing it in an internal data structure, typically a syntax tree
- Detecting syntax errors

Parser Implementation

A parser is basically a push-down automaton, i.e. an automaton with stack storage. On each input token, it can not only transition from one state to another, but also store information for later use.

Steps to implement a parser:
1. Describe the input language's syntax by a context-free grammar.
2. Convert the grammar into a form that is suitable for parsing, e.g. unambiguous, with a restricted form of recursion.
3. Build a parser based on the transformed grammar.

A Context-Free Grammar Example

    Program    → "begin" StmtList "end"
    StmtList   → Stmt {Stmt}
    Stmt       → Assignment | ReadStmt | WriteStmt
    Assignment → id ":=" Expr ";"
    ReadStmt   → "read" "(" IdList ")" ";"
    WriteStmt  → "write" "(" ExprList ")" ";"
    IdList     → id {"," id}
    ExprList   → Expr {"," Expr}
    Expr       → Expr Op Expr | "(" Expr ")" | id | intlit
    Op         → "+" | "-" | "*" | "/"

Grammar Transformation

A programming language's official grammar is not always suitable for use as the base for parser construction, e.g.
- it might be ambiguous
- it might contain the wrong forms of recursion
- it might require multiple lookahead tokens

Transformation is often required to prepare a grammar for parsing.

Example: Eliminating ambiguity in an expression grammar.

Original:

    Expr → Expr Op Expr | "(" Expr ")" | id | intlit
    Op   → "+" | "-" | "*" | "/"

Transformed:

    Expr    → Expr ("+" | "-") Factor | Factor
    Factor  → Factor ("*" | "/") Primary | Primary
    Primary → "(" Expr ")" | id | intlit
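
To see what the layered Expr/Factor/Primary structure buys, here is a rough sketch of a hand-written evaluator that follows it. Several things are assumptions rather than part of the slides: the class name `ExprParser` and method `eval` are hypothetical, identifiers (`id`) are left out so the result can be computed directly, and the left-recursive rules are written as loops (Expr → Factor {("+" | "-") Factor}), which is a further transformation that a top-down parser would need.

```java
// Hypothetical sketch: evaluates integer expressions using the layered grammar
//   Expr    -> Factor  {("+" | "-") Factor}
//   Factor  -> Primary {("*" | "/") Primary}
//   Primary -> "(" Expr ")" | intlit
class ExprParser {
    private final String src;
    private int pos = 0;

    ExprParser(String src) { this.src = src.replace(" ", ""); }

    static int eval(String s) {
        ExprParser p = new ExprParser(s);
        int v = p.expr();
        if (p.pos != p.src.length()) throw new IllegalArgumentException("syntax error at " + p.pos);
        return v;
    }

    // Returns the next character without consuming it ('\0' at end of input).
    private char peek() { return pos < src.length() ? src.charAt(pos) : '\0'; }

    private int expr() {
        int v = factor();
        while (peek() == '+' || peek() == '-') {
            char op = src.charAt(pos++);
            int r = factor();
            v = (op == '+') ? v + r : v - r;  // folding left to right gives left associativity
        }
        return v;
    }

    private int factor() {
        int v = primary();
        while (peek() == '*' || peek() == '/') {
            char op = src.charAt(pos++);
            int r = primary();
            v = (op == '*') ? v * r : v / r;
        }
        return v;
    }

    private int primary() {
        if (peek() == '(') {
            pos++;
            int v = expr();
            if (peek() != ')') throw new IllegalArgumentException("expected ) at " + pos);
            pos++;
            return v;
        }
        int start = pos;
        while (Character.isDigit(peek())) pos++;
        if (start == pos) throw new IllegalArgumentException("expected intlit at " + pos);
        return Integer.parseInt(src.substring(start, pos));
    }

    public static void main(String[] args) {
        System.out.println(eval("2 + 3 * 4"));   // "*" sits lower in the grammar, so it binds tighter
        System.out.println(eval("10 - 4 - 3"));  // subtraction groups to the left
    }
}
```

Because each precedence level has its own nonterminal, an input such as 2 + 3 * 4 has only one possible grouping, which is the point of the transformation.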

Parsing Techniques

Top-Down Parsing

Build a syntax tree from the top down. Use lookahead to predict the next production to apply.

Grammar:

    1. S → aBcDe
    2. B → Bb
    3. B → b
    4. D → Dd
    5. D → d

Input: abbcde

Parsing steps (the number on each arrow is the production applied):

    S ⇒(1) aBcDe ⇒(2) aBbcDe ⇒(3) abbcDe ⇒(5) abbcde

[Figure: the syntax tree after each step. It starts as S with children a, B, c, D, e; then B expands to B b; then the inner B expands to b; finally D expands to d.]
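
A predictive recognizer for this grammar might look like the sketch below. It rests on an assumption that goes beyond the slide: the left-recursive productions B → Bb | b and D → Dd | d are rewritten as B → b {b} and D → d {d}, because a recursive-descent routine for B → Bb would call itself without consuming input. The class name `TopDown` and its methods are hypothetical.

```java
// Hypothetical sketch: a predictive (top-down) recognizer for the example grammar,
// with the left-recursive rules rewritten as B -> b {b} and D -> d {d}.
class TopDown {
    private final String in;
    private int pos = 0;

    TopDown(String in) { this.in = in; }

    static boolean accepts(String input) {
        TopDown p = new TopDown(input);
        return p.s() && p.pos == input.length();
    }

    // One character of lookahead ('\0' at end of input).
    private char peek() { return pos < in.length() ? in.charAt(pos) : '\0'; }

    // Consumes c if it is the next input character.
    private boolean match(char c) {
        if (peek() != c) return false;
        pos++;
        return true;
    }

    // S -> a B c D e
    private boolean s() {
        return match('a') && repeat('b') && match('c') && repeat('d') && match('e');
    }

    // X -> x {x}: the lookahead character decides whether to continue the repetition.
    private boolean repeat(char c) {
        if (!match(c)) return false;
        while (peek() == c) pos++;
        return true;
    }

    public static void main(String[] args) {
        System.out.println(accepts("abbcde"));
        System.out.println(accepts("acde"));
    }
}
```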

Parsing Techniques

Bottom-Up Parsing

Build a syntax tree from the bottom up: find a sequence on the stack that matches a production's right-hand side.

Grammar:

    1. S → aBcDe
    2. B → Bb
    3. B → b
    4. D → Dd
    5. D → d

Input: abbcde

Parsing steps ("in" reads the next input symbol onto the stack; a number reduces by that production; the right column shows the stack):

    in   a
    in   ab
    3    aB
    in   aBb
    2    aB
    in   aBc
    in   aBcd
    5    aBcD
    in   aBcDe
    1    S

[Figure: the syntax tree fragments after each step, built upward from the leaves a, b, b, c, d, e until they are joined under the root S.]
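
The stack discipline can be sketched in a few lines. Note the assumptions: this is not a table-driven LR parser; the reduction choice is a naive "first matching right-hand side, longest first" rule that happens to work for this small grammar, the input is assumed to contain lowercase terminals only, and the names `ShiftReduce`, `parse`, and `reduceOnce` are hypothetical.

```java
// Hypothetical sketch of shift-reduce parsing for the example grammar.
// Reductions are chosen by a naive suffix match, not by LR parsing tables.
class ShiftReduce {
    // Productions as {left-hand side, right-hand side}, longest right-hand sides first.
    static final String[][] PRODUCTIONS = {
        {"S", "aBcDe"}, {"B", "Bb"}, {"D", "Dd"}, {"B", "b"}, {"D", "d"}
    };

    static boolean parse(String input) {
        StringBuilder stack = new StringBuilder();
        for (char c : input.toCharArray()) {
            stack.append(c);                      // shift the next input symbol
            System.out.println("shift   " + stack);
            while (reduceOnce(stack)) {           // reduce while a handle is on top
                System.out.println("reduce  " + stack);
            }
        }
        return stack.toString().equals("S");      // accept if only the start symbol remains
    }

    // Replaces a right-hand side found on top of the stack by its left-hand side.
    static boolean reduceOnce(StringBuilder stack) {
        String top = stack.toString();
        for (String[] p : PRODUCTIONS) {
            if (top.endsWith(p[1])) {
                stack.setLength(stack.length() - p[1].length());
                stack.append(p[0]);
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(parse("abbcde"));
    }
}
```

For the input abbcde, the printed stack contents follow the same sequence as the parsing steps listed above.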

Static Analysis

  abstract syntax tree → Static Checker → validated abstract syntax tree

Task: Check that the input program is valid according to the language's static semantics.

Static Analysis Implementation

Traverse the AST and validate every node with respect to the semantic rules.

Non-local information, such as variables' types, is often needed for the validation; this information is typically maintained in global data structures (i.e., environments).

Example:

    // Make sure the operands' types are legal with respect to the operator.
    static Ast.Type check(Ast.Binop n) throws Exception {
      Ast.Type t1 = check(n.e1);
      Ast.Type t2 = check(n.e2);
      if (n.op == Ast.BOP.ADD || n.op == Ast.BOP.SUB ||
          n.op == Ast.BOP.MUL || n.op == Ast.BOP.DIV) {
        if ((t1 instanceof Ast.IntType) && (t2 instanceof Ast.IntType))
          return Ast.IntType;
      } else if (n.op == Ast.BOP.AND || n.op == Ast.BOP.OR) {
        if ((t1 instanceof Ast.BoolType) && (t2 instanceof Ast.BoolType))
          return Ast.BoolType;
      }
      ...
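To show where the environment comes in, here is a minimal self-contained sketch of the same kind of check. It does not use the course's `Ast` classes: `TypeCheckSketch`, its node classes, and the string-valued operators are all hypothetical stand-ins, and the environment is assumed to be a simple map from variable names to types.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TypeCheckSketch {
    enum Type { INT, BOOL }

    // Hypothetical AST node classes, much smaller than a real compiler's.
    interface Exp { }
    static class IntLit implements Exp {
        final int value;
        IntLit(int value) { this.value = value; }
    }
    static class Id implements Exp {
        final String name;
        Id(String name) { this.name = name; }
    }
    static class Binop implements Exp {
        final String op; final Exp e1, e2;
        Binop(String op, Exp e1, Exp e2) { this.op = op; this.e1 = e1; this.e2 = e2; }
    }

    static final List<String> ARITH = Arrays.asList("+", "-", "*", "/");
    static final List<String> LOGIC = Arrays.asList("&&", "||");

    // The environment supplies the non-local information: each variable's declared type.
    static Type check(Exp e, Map<String, Type> env) {
        if (e instanceof IntLit) return Type.INT;
        if (e instanceof Id) {
            String name = ((Id) e).name;
            Type t = env.get(name);
            if (t == null) throw new IllegalStateException("undeclared variable: " + name);
            return t;
        }
        Binop b = (Binop) e;
        Type t1 = check(b.e1, env);
        Type t2 = check(b.e2, env);
        if (ARITH.contains(b.op) && t1 == Type.INT && t2 == Type.INT) return Type.INT;
        if (LOGIC.contains(b.op) && t1 == Type.BOOL && t2 == Type.BOOL) return Type.BOOL;
        throw new IllegalStateException("ill-typed operands for " + b.op);
    }

    public static void main(String[] args) {
        Map<String, Type> env = new HashMap<>();
        env.put("a", Type.INT);
        env.put("flag", Type.BOOL);
        System.out.println(check(new Binop("+", new Id("a"), new IntLit(1)), env));
    }
}
```

With `a` declared as an integer, `a + 1` checks as `INT`; replacing `a` with `flag` makes the same call throw, because the operand type found in the environment no longer fits the operator.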

IR Code Generation

  validated abstract syntax tree → IR Code Generator → IR code

Task: Translate the input program into IR code.

An IR (Intermediate Representation) is an internal program representation used by a compiler.

Reasons for using an IR:
  - It enables a compiler to analyze and manipulate a program independently of both input-language and target-language constraints.
  - It provides a favorable environment for optimization.

IR Code Generation Implementation

Traverse the AST and generate IR code for every node. The rules for IR code generation can be formally specified with attribute grammars.

Example:

  Exp → (Binop OP Exp1 Exp2)
  OP  → + | - | * | /

  NewTemp t
  Exp.c := Exp1.c || Exp2.c || "t = Exp1.v OP Exp2.v"
  Exp.v := t

  Exp.c: the generated code
  Exp.v: Exp's value, or the temp or id holding the value

The || operator denotes IR code concatenation.
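The attribute rule for binary operations can be sketched as a recursive function that returns both attributes at once. Everything named here (`IrGenSketch`, `Leaf`, `Attr`, the `t1, t2, ...` temp naming) is a hypothetical stand-in rather than the course's code; the only thing taken from the rule is its shape: children's code first, then one new instruction, with a fresh temp holding the value.

```java
import java.util.ArrayList;
import java.util.List;

class IrGenSketch {
    interface Exp { }
    static class Leaf implements Exp {            // a constant or an identifier
        final String text;
        Leaf(String text) { this.text = text; }
    }
    static class Binop implements Exp {
        final String op; final Exp e1, e2;
        Binop(String op, Exp e1, Exp e2) { this.op = op; this.e1 = e1; this.e2 = e2; }
    }

    // The two attributes from the rule: c (generated code) and v (where the value lives).
    static class Attr {
        final List<String> c;
        final String v;
        Attr(List<String> c, String v) { this.c = c; this.v = v; }
    }

    private int temps = 0;

    Attr gen(Exp e) {
        if (e instanceof Leaf) {
            // A leaf needs no code; its value is the constant or id itself.
            return new Attr(new ArrayList<>(), ((Leaf) e).text);
        }
        Binop b = (Binop) e;
        Attr a1 = gen(b.e1);
        Attr a2 = gen(b.e2);
        String t = "t" + (++temps);               // NewTemp t
        List<String> code = new ArrayList<>(a1.c);
        code.addAll(a2.c);                        // Exp1.c followed by Exp2.c ...
        code.add(t + " = " + a1.v + " " + b.op + " " + a2.v);   // ... then the new instruction
        return new Attr(code, t);                 // Exp.v := t
    }

    public static void main(String[] args) {
        Exp aa = new Binop("*", new Leaf("a"), new Leaf("a"));
        Exp bb = new Binop("*", new Leaf("b"), new Leaf("b"));
        Attr r = new IrGenSketch().gen(new Binop("+", aa, bb));
        r.c.forEach(System.out::println);
    }
}
```

For the expression `a*a + b*b`, `main` emits three instructions: one per multiplication and a final addition of the two temps.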

IR Code Optimization

  IR code → Optimizer → improved IR code

Task: Transform IR code into a functionally equivalent but more efficient form.

Techniques:
  - Pattern matching for local optimization
  - Dataflow analysis for global optimization

Target Code Generation

  IR code → Code Generator → target code

Tasks:
  - Generating target machine code
  - Performing machine-specific optimization

Compilation Process Example

The source program (toy.c):

    /* A toy C program */
    int main(void) {
      int a, b, s;
      printf("Enter two integers: ");
      scanf("%d %d", &a, &b);
      s = a*a + b*b;
      printf("%d^2 + %d^2 = %d\n", a, b, s);
    }

Actual Memory Content of toy.c

In units of bytes (dumped via the Linux od utility), shown here with one source line per row:

  2F 2A 20 41 20 74 6F 79 20 43 20 70 72 6F 67 72 61 6D 20 2A 2F 0A
  69 6E 74 20 6D 61 69 6E 28 76 6F 69 64 29 20 7B 0A
  20 20 69 6E 74 20 61 2C 20 62 2C 20 73 3B 0A
  20 20 70 72 69 6E 74 66 28 22 45 6E 74 65 72 20 74 77 6F 20 69 6E 74 65 67 65 72 73 3A 20 22 29 3B 0A
  20 20 73 63 61 6E 66 28 22 25 64 20 25 64 22 2C 20 26 61 2C 20 26 62 29 3B 0A
  20 20 73 20 3D 20 61 2A 61 20 2B 20 62 2A 62 3B 20 20 0A
  20 20 70 72 69 6E 74 66 28 22 25 64 5E 32 20 2B 20 25 64 5E 32 20 3D 20 25 64 5C 6E 22 2C 20 61 2C 20 62 2C 20 73 29 3B 0A
  7D 0A

Binary sequences are used to represent all types of information in a computer. We have to assume an encoding scheme in order to interpret the content of any file.
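As a small illustration of "assuming an encoding scheme", the sketch below reads a run of hex bytes and interprets it as ASCII. The class and method names are hypothetical; the only library calls used are `Integer.parseInt` with radix 16 and the `String(byte[], Charset)` constructor.

```java
import java.nio.charset.StandardCharsets;

class AsciiDecode {
    // Interprets whitespace-separated hex byte values as ASCII characters.
    static String decode(String hexBytes) {
        String[] parts = hexBytes.trim().split("\\s+");
        byte[] bytes = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            bytes[i] = (byte) Integer.parseInt(parts[i], 16);
        }
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // The first eight bytes of the dump above.
        System.out.println(decode("2F 2A 20 41 20 74 6F 79"));
    }
}
```

The same eight bytes would mean something else entirely under a different encoding, which is the point of the remark above.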

Interpreting the Content as ASCII Characters

  / * ␣ A ␣ t o y ␣ C ␣ p r o g r a m ␣ * / \n
  i n t ␣ m a i n ( v o i d ) ␣ { \n
  ␣ ␣ i n t ␣ a , ␣ b , ␣ s ; \n
  ␣ ␣ p r i n t f ( " E n t e r ␣ t w o ␣ i n t e g e r s : ␣ " ) ; \n
  ␣ ␣ s c a n f ( " % d ␣ % d " , ␣ & a , ␣ & b ) ; \n
  ␣ ␣ s ␣ = ␣ a * a ␣ + ␣ b * b ; ␣ ␣ \n
  ␣ ␣ p r i n t f ( " % d ^ 2 ␣ + ␣ % d ^ 2 ␣ = ␣ % d \ n " , ␣ a , ␣ b , ␣ s ) ; \n
  } \n

(␣ marks a space character; \n at the end of a row marks a newline character.)

This is the actual input to a compiler. The compiler reads the input program file one character at a time.

Lexing the Toy Program

Input:

  / * ␣ A ␣ t o y ␣ C ␣ p r o g r a m ␣ * / \n
  i n t ␣ m a i n ( v o i d ) ␣ { \n
  ␣ ␣ i n t ␣ a , ␣ b , ␣ s ; \n
  ␣ ␣ p r i n t f ( " E n t e r ␣ t w o ␣ i n t e g e r s : ␣ " ) ; \n
  ␣ ␣ s c a n f ( " % d ␣ % d " , ␣ & a , ␣ & b ) ; \n
  ␣ ␣ s ␣ = ␣ a * a ␣ + ␣ b * b ; ␣ ␣ \n
  ␣ ␣ p r i n t f ( " % d ^ 2 ␣ + ␣ % d ^ 2 ␣ = ␣ % d \ n " , ␣ a , ␣ b , ␣ s ) ; \n
  } \n

Processing:

  Input chars                 Action
  ---------------------------------------------
  / * ␣ A ␣ t o y ... * /     skip
  \n                          skip
  i n t                       return token INT
  ␣                           skip
  m a i n                     return token MAIN
  (                           return token LPAREN
  v o i d                     return token VOID
  )                           return token RPAREN
  ␣                           skip
  {                           return token LBRACE
  \n                          skip
  ␣                           skip
  ␣                           skip
  ...

Output:

  INT MAIN ( VOID ) {
  INT ID(a) , ID(b) , ID(s) ;
  ID(printf) ( STRLIT("Ent..") ) ;
  ID(scanf) ( STRLIT("%d %d") , & ID(a) , & ID(b) ) ;
  ID(s) = ID(a) * ID(a) + ID(b) * ID(b) ;
  ID(printf) ( STRLIT("%d^2..") , ID(a) , ID(b) , ID(s) ) ;
  }
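The processing table can be approximated by a small hand-written scanner. This is a sketch under assumptions, not the course's lexer: `ToyLexer` is a hypothetical name, tokens are represented as plain strings, punctuation and operators are emitted as the character itself (as in the output listing), and only the lexical forms that occur in toy.c are handled.

```java
import java.util.ArrayList;
import java.util.List;

class ToyLexer {
    static List<String> lex(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;                                          // skip blanks and newlines
            } else if (src.startsWith("/*", i)) {
                int end = src.indexOf("*/", i + 2);           // skip a block comment
                i = (end < 0) ? src.length() : end + 2;
            } else if (Character.isLetter(c) || c == '_') {
                int j = i;
                while (j < src.length()
                        && (Character.isLetterOrDigit(src.charAt(j)) || src.charAt(j) == '_')) {
                    j++;
                }
                tokens.add(wordToken(src.substring(i, j)));
                i = j;
            } else if (c == '"') {
                int j = i + 1;
                while (j < src.length() && src.charAt(j) != '"') {
                    if (src.charAt(j) == '\\') j++;           // step over an escaped character
                    j++;
                }
                tokens.add("STRLIT(" + src.substring(i, Math.min(j + 1, src.length())) + ")");
                i = j + 1;
            } else {
                tokens.add(String.valueOf(c));                // single-character token
                i++;
            }
        }
        return tokens;
    }

    // Words named as tokens in the table get their own token; everything else is an ID.
    static String wordToken(String w) {
        switch (w) {
            case "int":  return "INT";
            case "void": return "VOID";
            case "main": return "MAIN";
            default:     return "ID(" + w + ")";
        }
    }

    public static void main(String[] args) {
        System.out.println(lex("/* A toy C program */\nint main(void) {\n  int a, b, s;\n}\n"));
    }
}
```

A production lexer would usually be generated from regular-expression specifications or driven by an explicit state machine; the hand-written loop is used here only because it mirrors the "read characters, then skip or return a token" rows one for one.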

Parsing the Toy Program

Input:

  INT MAIN ( VOID ) {
  INT ID(a) , ID(b) , ID(s) ;
  ID(printf) ( STRLIT("Ent..") ) ;
  ID(scanf) ( STRLIT("%d %d") , & ID(a) , & ID(b) ) ;
  ID(s) = ID(a) * ID(a) + ID(b) * ID(b) ;
  ID(printf) ( STRLIT("%d^2..") , ID(a) , ID(b) , ID(s) ) ;
  }

Output (children are indented under their parent):

  program
    decls
      func-decl
        INT
        main
        null
        decls
          var-decl
            ...
        stmts
          call-stmt
            ID(printf)
            args
              ...
          call-stmt
            ID(scanf)
            args
              ...
          assign
            lvalue
              ...
            expr
              ...
          call-stmt
            ID(printf)
            args
              STRLIT(...)
              expr
                ID(a)
              expr
                ID(b)
              expr
                ID(s)

Performing Static Analysis

Input (the AST; children are indented under their parent):

  program
    decls
      func-decl
        INT
        main
        null
        decls
          var-decl
            ...
        stmts
          call-stmt
            ID(printf)
            args
              ...
          call-stmt
            ID(scanf)
            args
              ...
          assign
            lvalue
              ...
            expr
              ...
          call-stmt
            ID(printf)
            args
              STRLIT(...)
              expr
                ID(a)
              expr
                ID(b)
              expr
                ID(s)

Result: Verified!

IR Code Example

Register-machine IR code:

      t1 = malloc (8)
      a = t1
      t2 = -2
      t3 = t2 * 3
      t4 = 1 + t3
      t5 = 0 * 4
      t6 = a + t5
      [t6] = t4
      t7 = false
      if true == false goto L0
      t8 = 0 * 4
      t9 = a + t8
      t10 = [t9]
      t11 = true
      if t10 < 0 goto L1
      t11 = false
  L1:
      if t11 == false goto L0
      t7 = true
  L0:
      flag = t7
      if flag == false goto L2
      t12 = 0 * 4
      t13 = a + t12
      t14 = [t13]
      t15 = 1 * 4
      t16 = a + t15
      [t16] = t14
      goto L3
  L2:
      t17 = 1 * 4
      t18 = a + t17
      [t18] = 0
  L3:
      t19 = 1 * 4
      t20 = a + t19
      t21 = [t20]
      print (t21)

IR Code Example

Stack-machine IR code:

   0. CONST 2       14. CONST 0       27. LOAD 0
   1. NEWARRAY      15. ALOAD         28. CONST 0
   2. STORE 0       16. CONST 0       29. ALOAD
   3. LOAD 0        17. IFLT +3       30. ASTORE
   4. CONST 0       18. CONST 0       31. GOTO +5
   5. CONST 1       19. GOTO +2       32. LOAD 0
   6. CONST 2       20. CONST 1       33. CONST 1
   7. NEG           21. AND           34. CONST 0
   8. CONST 3       22. STORE 1       35. ASTORE
   9. MUL           23. LOAD 1        36. LOAD 0
  10. ADD           24. IFZ +8        37. CONST 1
  11. ASTORE        25. LOAD 0        38. ALOAD
  12. CONST 1       26. CONST 1       39. PRINT
  13. LOAD 0

Local Optimization Example

Analyze and transform a few adjacent IR instructions at a time.

Example:

  Original:              Optimized:
    t1 = malloc (8)        a = malloc (8)
    a = t1                 [a] = -5
    t2 = -2
    t3 = t2 * 3
    t4 = 1 + t3
    t5 = 0 * 4
    t6 = a + t5
    [t6] = t4

Optimizations performed: constant folding, constant propagation, copy instruction elimination
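Two of the optimizations named above, constant folding and constant propagation, can be sketched over straight-line code as follows. This is a simplified illustration, not the course's optimizer: `ConstFoldSketch` and its instruction encoding (`{dst, left, op, right}` string arrays) are hypothetical, memory instructions are left out, and copy instruction elimination is not attempted.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ConstFoldSketch {
    // Each instruction is {dst, left, op, right}; a plain copy "dst = left" has op == null.
    // Returns the variables whose values are known constants after the last instruction.
    static Map<String, Integer> fold(String[][] code) {
        Map<String, Integer> known = new LinkedHashMap<>();
        for (String[] ins : code) {
            Integer l = value(ins[1], known);
            Integer r = (ins[2] == null) ? null : value(ins[3], known);
            known.remove(ins[0]);                 // dst is being redefined
            if (l == null) continue;
            if (ins[2] == null) { known.put(ins[0], l); continue; }
            if (r == null) continue;
            switch (ins[2]) {                     // constant folding
                case "+": known.put(ins[0], l + r); break;
                case "-": known.put(ins[0], l - r); break;
                case "*": known.put(ins[0], l * r); break;
                case "/": if (r != 0) known.put(ins[0], l / r); break;
                default:  break;
            }
        }
        return known;
    }

    // Constant propagation: an operand is constant if it is a literal or a known variable.
    static Integer value(String operand, Map<String, Integer> known) {
        if (known.containsKey(operand)) return known.get(operand);
        try {
            return Integer.valueOf(operand);
        } catch (NumberFormatException e) {
            return null;                          // a variable with no known value
        }
    }

    public static void main(String[] args) {
        String[][] code = {
            {"t2", "-2", null, null},
            {"t3", "t2", "*", "3"},
            {"t4", "1", "+", "t3"},
            {"t5", "0", "*", "4"},
            {"t6", "a", "+", "t5"},
        };
        System.out.println(fold(code));
    }
}
```

On the arithmetic part of the original code, the sketch finds that `t4` is the constant -5 and `t5` is 0, while `t6` stays unknown because it depends on `a`. Reaching the two-instruction optimized form additionally needs the algebraic simplification `a + 0 = a` and the copy elimination step.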

Global Optimization Example

Perform dataflow analysis over the program's control-flow graph.

Blocks B1 to B4 are the same before and after optimization:

  B1: i := m-1
      j := n
      t1 := 4*n
      v := a[t1]
  B2: i := i+1
      t2 := 4*i
      t3 := a[t2]
      if t3<v goto B2
  B3: j := j-1
      t4 := 4*j
      t5 := a[t4]
      if t5>v goto B3
  B4: if i>=j goto B6

Before:

  B5: t6 := 4*i
      x := a[t6]
      t8 := 4*j
      t9 := a[t8]
      a[t6] := t9
      a[t8] := x
      goto B2
  B6: t11 := 4*i
      x := a[t11]
      t13 := 4*n
      t14 := a[t13]
      a[t11] := t14
      a[t13] := x

After:

  B5: x := t3
      a[t2] := t5
      a[t4] := x
      goto B2
  B6: x := t3
      t14 := a[t1]
      a[t2] := t14
      a[t1] := x

Target Code for the Toy Program (SPARC)

          .file "toy.c"
  gcc2_compiled.:
          .section ".rodata"
          .align 8
  .LLC0:
          .asciz "Enter two integers: "
          .align 8
  .LLC1:
          .asciz "%d %d"
          .global .umul
          .align 8
  .LLC2:
          .asciz "%d^2 + %d^2 = %d\n"
          .section ".text"
          .align 4
          .global main
          .type main,#function
          .proc 04
  main:
          !#prologue# 0
          save %sp, -128, %sp
          !#prologue# 1
          sethi %hi(.LLC0), %o1
          or %o1, %lo(.LLC0), %o0
          call printf, 0
          nop
          add %fp, -20, %o1
          add %fp, -24, %o2
          sethi %hi(.LLC1), %o3
          or %o3, %lo(.LLC1), %o0
          call scanf, 0
          nop
          ld [%fp-20], %o0
          ld [%fp-20], %o1
          call .umul, 0
          nop
          mov %o0, %l0
          ld [%fp-24], %o0
          ld [%fp-24], %o1
          call .umul, 0
          nop
          add %l0, %o0, %o1
          st %o1, [%fp-28]
          sethi %hi(.LLC2), %o1
          or %o1, %lo(.LLC2), %o0
          ld [%fp-20], %o1
          ld [%fp-24], %o2
          ld [%fp-28], %o3
          call printf, 0
          nop
  .ll2:
          ret
          restore
  .llfe1:
          .size main,.llfe1-main
          .ident "GCC: (GNU) 2.95.2 19991024 (release)"

Target Code for the Toy Program (IA32)

          .file "toy.c"
          .def ___main; .scl 2; .type 32; .endef
          .section .rdata,"dr"
  LC0:
          .ascii "Enter two integers: \0"
  LC1:
          .ascii "%d %d\0"
  LC2:
          .ascii "%d^2 + %d^2 = %d\12\0"
          .text
  .globl _main
          .def _main; .scl 2; .type 32; .endef
  _main:
          pushl %ebp
          movl %esp, %ebp
          subl $40, %esp
          andl $-16, %esp
          movl $0, %eax
          addl $15, %eax
          addl $15, %eax
          shrl $4, %eax
          sall $4, %eax
          movl %eax, -16(%ebp)
          movl -16(%ebp), %eax
          call __alloca
          call ___main
          movl $LC0, (%esp)
          call _printf
          leal -8(%ebp), %eax
          movl %eax, 8(%esp)
          leal -4(%ebp), %eax
          movl %eax, 4(%esp)
          movl $LC1, (%esp)
          call _scanf
          movl -4(%ebp), %eax
          movl %eax, %edx
          imull -4(%ebp), %edx
          movl -8(%ebp), %eax
          imull -8(%ebp), %eax
          leal (%edx,%eax), %eax
          movl %eax, -12(%ebp)
          movl -12(%ebp), %eax
          movl %eax, 12(%esp)
          movl -8(%ebp), %eax
          movl %eax, 8(%esp)
          movl -4(%ebp), %eax
          movl %eax, 4(%esp)
          movl $LC2, (%esp)
          call _printf
          leave
          ret
          .def _scanf; .scl 3; .type 32; .endef
          .def _printf; .scl 3; .type 32; .endef

Final Executable Code

  7F45 4C46 0102 0100 0000 0000 0000 0000
  0002 0002 0000 0001 0001 04A0 0000 0034
  0000 1474 0000 0000 0034 0020 0005 0028
  001B 0019 0000 0006 0000 0034 0001 0034
  0000 0000 0000 00A0 0000 00A0 0000 0005
  0000 0000 0000 0003 0000 00D4 0000 0000
  0000 0000 0000 0011 0000 0000 0000 0004
  0000 0000 0000 0001 0000 0000 0001 0000
  0000 0000 0000 0782 0000 0782 0000 0005
  0001 0000 0000 0001 0000 0784 0002 0784
  0000 0000 0000 0188 0000 01A4 0000 0007
  0001 0000 0000 0002 0000 0838 0002 0838
  0000 0000 0000 00B8 0000 0000 0000 0007
  0000 0000 2F75 7372 2F6C 6962 2F6C 642E
  736F 2E31 0000 0000 0000 0017 0000 0016
  0000 0000 0000 0001 0000 0002 0000 0000
  0000 0003 0000 0004 0000 0006 0000 0007
  0000 0009 0000 0000 0000 000A 0000 000C
  0000 000D 0000 000E 0000 0000 0000 0000
  0000 000F 0000 0010 0000 0000 0000 0011
  0000 0012 0000 0013 0000 0015 0000 0000
  0000 0000 0000 0000 0000 0000 0000 0005
  0000 0000 0000 0000 0000 0008 0000 0000
  0000 0000 0000 000B 0000 0000 0000 0000
  0000 0000 0000 0000 0000 0000 0000 0000
  0000 0000 0000 0000 0000 0014 0000 0000
  0000 0000 0000 0000 0000 0000 0000 0000
  0000 0000 0000 0001 0002 0924 0000 0004
  1100 0014 0000 000A 0001 0782 0000 0000
  1100 000C 0000 0011 0002 07A4 0000 0000
  1100 000E 0000 002B 0002 07D4 0000 0000
  1200 0000 0000 0032 0002 081C 0000 0000
  1200 0000 0000 0038 0000 0000 0000 0000
  2000 0000 0000 004E 0002 0928 0000 0000
  1100 0014 0000 0053 0002 0838 0000 0000
  1100 000F 0000 005C 0002 0924 0000 0004
  2100 0014 0000 0064 0002 0810 0000 0000
  1200 0000 0000 006B 0002 0784 0000 0000
  1100 000D 0000 0081 0002 0828 0000 0000
  1200 0000 0000 0087 0002 07E0 0000 0000
  1200 0000 0000 008C 0002 090C 0000 0000
  1100 0013 0000 0093 0002 07EC 0000 0000
  1200 0000 0000 0099 0001 0714 0000 001C
  1200 000A 0000 009F 0001 0730 ......

A full cycle: the executable file's content is just another binary sequence.

Back to the Top-Level...

  Source Program → Compiler → Target Program (plus diagnostics)

Question: How is the compiler program itself written and compiled? In particular, in what language?

Writing and Compiling the Compiler

Approach 1. Use an existing language and compiler.

  Source Program (in L) → L Compiler → Target Program

  L Compiler (written in C) → GCC → L Compiler.exe

Writing and Compiling the Compiler

Approach 2. Cross Compiling

Use an existing compiler to generate executable code for a different target machine.

  Source Program (in L) → L Compiler x86-64.exe → Target Program x86-64.exe

  L Compiler (written in L) → L Compiler IA-32.exe → L Compiler x86-64.exe

Writing and Compiling the Compiler

Approach 3. Bootstrapping

Use an existing compiler for a simpler version of the source language.

  Source Program (in L) → L Compiler → Target Program

  L Compiler (written in L') → L' Compiler → L Compiler.exe
  L' is a simpler version of L.

  L' Compiler (written in L'') → L'' Compiler → L' Compiler.exe
  L'' is a simpler version of L'.

Bootstrapping (cont.)

Following the chain of languages and compilers, L, L', L'', ..., the compiler for the first version of the language (i.e., a minimal core) is then written in a different language, such as assembly.

Many programming languages' compilers are bootstrapped: BASIC, Lisp, Algol, C, Pascal, PL/I, Scheme, Java, Python, Modula-2, Oberon, Haskell, OCaml, Go, Rust, Scala, ...

Interpreter Overview

  Source Program → Interpreter → execution (plus diagnostics)

An interpreter runs a program. It reads and analyzes a source program, then performs the operations implied by the program.

Interpreter Overview

For simple languages, an interpreter reads and executes one statement at a time.

  Source Program → Interpreter → execution

    while (more stmts) {
      read next stmt;
      execute this stmt;
    }

For complex languages, an interpreter may read and convert the source program into an internal AST, then execute from the AST.

  Source Program → Interpreter [Parser → AST] → execution
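Executing from an AST can be sketched as a tree walk in which every node knows how to evaluate or execute itself. The tiny language below (integer expressions, assignment, print) and all of its class names are hypothetical, chosen only to keep the example short; the parser is omitted and the AST is built by hand.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AstInterpSketch {
    // Expressions evaluate to an int in a given variable environment.
    interface Exp { int eval(Map<String, Integer> env); }
    static class Num implements Exp {
        final int n;
        Num(int n) { this.n = n; }
        public int eval(Map<String, Integer> env) { return n; }
    }
    static class Var implements Exp {
        final String name;
        Var(String name) { this.name = name; }
        public int eval(Map<String, Integer> env) {
            Integer v = env.get(name);
            if (v == null) throw new IllegalStateException("undefined variable: " + name);
            return v;
        }
    }
    static class Binop implements Exp {
        final char op; final Exp e1, e2;
        Binop(char op, Exp e1, Exp e2) { this.op = op; this.e1 = e1; this.e2 = e2; }
        public int eval(Map<String, Integer> env) {
            int a = e1.eval(env), b = e2.eval(env);
            switch (op) {
                case '+': return a + b;
                case '-': return a - b;
                case '*': return a * b;
                default:  throw new IllegalStateException("unknown operator: " + op);
            }
        }
    }

    // Statements update the environment or append to the program's output.
    interface Stmt { void exec(Map<String, Integer> env, List<String> out); }
    static class Assign implements Stmt {
        final String name; final Exp e;
        Assign(String name, Exp e) { this.name = name; this.e = e; }
        public void exec(Map<String, Integer> env, List<String> out) { env.put(name, e.eval(env)); }
    }
    static class Print implements Stmt {
        final Exp e;
        Print(Exp e) { this.e = e; }
        public void exec(Map<String, Integer> env, List<String> out) {
            out.add(String.valueOf(e.eval(env)));
        }
    }

    // The interpreter loop: take the next statement, execute it, repeat.
    static List<String> run(List<Stmt> program) {
        Map<String, Integer> env = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (Stmt s : program) s.exec(env, out);
        return out;
    }

    public static void main(String[] args) {
        List<Stmt> program = Arrays.asList(
            new Assign("x", new Num(3)),
            new Assign("y", new Binop('*', new Var("x"), new Var("x"))),
            new Print(new Var("y")));
        System.out.println(run(program));
    }
}
```

The `run` loop has the same shape as the statement-at-a-time loop; the difference is that the statements have already been parsed into tree nodes rather than being read from the source text on demand.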

Common Interpreter Characteristics

  - Compared with compilers, interpreters are generally easier to write and more portable, while programs generally run faster as compiled code than under interpretation.
  - Interpreters put more emphasis on interactive use. Most interpreters support a read-eval-print loop (REPL).
  - Interpreters can be used to specify programming language semantics.

The REPL Environment

Example: A Haskell interpreter session:

    linux> ghci
    GHCi, version 7.10.3: http://www.haskell.org/ghc/  :? for help
    Prelude> 1+2*3
    7
    Prelude> let x = 5
    Prelude> x + 1
    6
    Prelude> let x = 3
    Prelude> x + 1
    4
    Prelude> let y = x * x
    Prelude> y
    9
    Prelude> reverse "abcd"
    "dcba"
    Prelude> :q
    Leaving GHCi.
    linux>