Compilers and interpreters


Lecture 5: Compilers and interpreters. FAMNIT, March 2018.

Slides based on the lecture by Jan Karabaš, Compilers and interpreters, Programming II, FAMNIT, 2015. Additional literature: Michael L. Scott, Programming Language Pragmatics (3rd ed.), Elsevier, 2009 (Chapters 1 and 2).

Topics
- What are compilers and interpreters?
- History
- Architecture of a compiler
- Lexical analysis
- Grammars
- Parsing
- Semantic analysis
- Data and flow analysis
- Intermediate optimisation
- Code generation
- Linking and archiving

Why compilers? Why interpreters?
Source code: a program is usually represented by an ordinary text file, which cannot be executed directly by the target machine.
Executable code: the form of the program which can be executed directly by a computer.
Usual targets of compilers/interpreters:
- machine code (itself, in fact, a form of intermediate code);
- bytecode;
- a runnable file;
- another programming language;
- cross-compiling.

Targets
- Machine code: bound to a particular architecture and host operating system; C, C++ (gcc), Fortran (gfortran), Pascal (fpc), OCaml (ocamlopt), ...
- Bytecode: prepared for running in the environment of a specific virtual machine, hence (mostly) machine and operating system independent; Java (javac), Python (python), Erlang (erl), C# (.NET), but also C and C++ (LLVM or .NET), ...
- Runnable source: the source code is executed directly; Unix shells (bash, csh, zsh), BASIC (many dialects, but not Visual Basic), WEB (tangle).

Targets
- Programming languages: Fortran to C (f2c), Python to C (Cython);
- Cross-compilers: applications for mobile phones and other gadgets are compiled this way; Objective-C for iOS, C++ for Android and Windows Phone, NVIDIA CUDA, C for OpenCL, ...

Bits of history
- 1930s-40s: single-purpose computers, hard-wired programs and data (Zuse, ENIAC);
- 1952: G. Hopper, A-0 compiler, rather a linker and loader than a compiler;
- 1952: A. Glennie, Autocode, a kind of simple assembler;
- 1957: J. Backus, Fortran, the first real compiler; its executable code was competitive with the same program written in assembly;
- 1958: Bauer et al., Algol 58, the first modern compiler;
- 1960s: IBM's cross-compilers for the IBM 7xx architecture (e.g. from UNIVAC);
- 1962: Hart & Levin, LISP, the first self-hosting compiler;

Bits of history
- 1968-1972: UNIX, C, lex, yacc, also make, sh, grep, ...
- 1970s: Pascal (Niklaus Wirth, Zürich, CDC Pascal);
- 1970s: Smalltalk, the first just-in-time compilers;
- 1980s: Erlang, C++, Modula, Ada, Perl, Objective-C;
- 1990s: Haskell, Python, Ruby, Java;
- 2000s: mobile architectures (ARM, Intel, ...).

Structure and action of a compiler
The job of a compiler usually consists of:
i. Front-end: line reconstruction, lexical analysis, preprocessing, syntax analysis, semantic analysis;
ii. Back-end: flow (data and execution) analysis, intermediate optimisation, code generation + target-dependent optimisation;
iii. Linking (optional).

Phases of compilation

Line reconstruction
- Unicode characters are substituted by escape sequences, e.g. č → \x10d;
- tabulators are substituted by delimiters (usually spaces);
- end-of-line characters are substituted by delimiters;
- adjacent string literals are merged;
- empty lines are removed.

Lexical analysis
The normalised text form of the program is divided into a list of tokens, the lexical units of the language. The usual types of tokens are:
- literals: 12345, 0xAB, "hello world", 'c', true, ...
- keywords: if, else, for, function, return, ...
- identifiers: var1, a$, MyClass, ...
- type names: int, char*, void, ...
- operators: =, <-, +, :=, ==, >=, ...
- parentheses, braces, and brackets;
- indentation (old Fortran, Python).
The sequence of tokens is produced; the delimiters are removed.

Lexical analysis: Rationale
- The lexical analyser reads the source character by character, and when a token is recognised it is appended to the sequence of tokens;
- writing this by hand is tedious and complicated, so regular expressions are used instead;
- regular expressions correspond to abstract machines (DFAs) which recognise prescribed patterns in the text;
- regular expressions are greedy: they match the longest possible text;
- lexical analysers are constructed using specialised compiler generators, lexers (lex, flex);
- lexers produce the source code of the lexical analyser in a high-level programming language (C, Java, OCaml, Pascal, Python).
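As an illustration of the hand-written, character-by-character approach, here is a minimal tokenizer sketch in C. It is illustrative only: the token set (numbers, identifiers, single-character operators) and the printed token names are invented for this example, not taken from the slides.

    #include <stdio.h>
    #include <ctype.h>

    /* Minimal hand-written lexer sketch: splits the input into number,
       identifier, and single-character operator tokens. */
    static void tokenize(const char *p) {
        while (*p) {
            if (isspace((unsigned char)*p)) {          /* skip delimiters */
                p++;
            } else if (isdigit((unsigned char)*p)) {   /* number literal */
                const char *start = p;
                while (isdigit((unsigned char)*p)) p++;
                printf("NUMBER(%.*s)\n", (int)(p - start), start);
            } else if (isalpha((unsigned char)*p)) {   /* identifier or keyword */
                const char *start = p;
                while (isalnum((unsigned char)*p)) p++;
                printf("IDENT(%.*s)\n", (int)(p - start), start);
            } else {                                   /* single-char operator */
                printf("OP(%c)\n", *p++);
            }
        }
    }

    int main(void) {
        tokenize("x1 = y * 42");   /* IDENT(x1) OP(=) IDENT(y) OP(*) NUMBER(42) */
        return 0;
    }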

Lexical analysis: Examples
Regular expressions are greedy: the RE 0x[0-9a-fA-F]+ matches the whole string 0x109AB as a single hexadecimal constant, rather than splitting it into a shorter constant followed by an identifier (<const><id>).
(Figure: recognising tokens.)

Preprocessing
#ifndef SET_H
#define SET_H
#define max(a,b) ((a) > (b) ? (a) : (b))
- Comment blocks and comment lines are thrown away;
- many languages use preprocessor macros, a special little language for text processing;
- macros are expanded at this moment;
- conditional compilation is evaluated, and omitted blocks of tokens are removed;
- auxiliary files (header files in C, C++) are inserted in place of the corresponding directives (#include);
- the lexical analyser is run again.
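Macro expansion is purely textual and happens before compilation proper. A minimal sketch, with invented values, showing what the preprocessor does with the max macro above:

    #include <stdio.h>
    #define max(a,b) ((a) > (b) ? (a) : (b))

    int main(void) {
        int i = 3, j = 4;
        /* The preprocessor rewrites the next line, before compilation, to:
           printf("%d\n", ((i) > (j + 1) ? (i) : (j + 1)));           */
        printf("%d\n", max(i, j + 1));   /* prints 5 */
        return 0;
    }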

Grammars of programming languages
Programming languages have a simple syntax which can be described in terms of context-free grammars. A context-free grammar is a mathematical structure of the form G = (V, T, R, S), where V is the set of non-terminals (variables), T is the set of tokens (terminal symbols), R is the set of production rules, and S is the starting non-terminal.

Grammars of programming languages
Example: the language of non-empty binary words consisting of an equal number of 0s and 1s can be expressed by the grammar G = ({S}, {0, 1}, R, S), where R is the set of rules
S → 01 | 10 | 0S1 | 1S0 | SS
For example, 0110 is derived as S ⇒ SS ⇒ 01S ⇒ 0110.

Backus-Naur form
A convenient way of expressing (context-free) grammars of programming languages. Example: a simple language of mathematical expressions:
<expression> ::= <term> | <term> "+" <expression>
<term>       ::= <factor> | <term> "*" <factor>
<factor>     ::= <constant> | <variable> | "(" <expression> ")"
<variable>   ::= "x" | "y" | "z"
<constant>   ::= <digit> | <digit> <constant>

Syntax analysis - parsing
The parser is the algorithm which takes the ordered list of tokens and recognises which rules of the grammar produce that list. The parse tree of the input is created: the top (root) vertex corresponds to the starting non-terminal (<expression>), internal nodes correspond to applied rules, and leaves correspond to terminals (tokens). Every expression in the source yields a parse tree.

Syntax analysis - parsing
The construction of a parse tree can be ambiguous, since context-free grammars may be ambiguous. There are several methods to get rid of ambiguities:
- ordering of the rules in the grammar (precedence);
- a look-ahead buffer for tokens (LL parsers);
- reading from the left while building a rightmost derivation in reverse (LR, LALR parsers).
A minimal recursive-descent parser for the expression grammar above is sketched after the parse-tree example below.

Example: Parse tree
For the expression x*10+1, the grammar above yields the parse tree:
<expression>
├── <term>
│   ├── <term> ── <factor> ── <variable> ── "x"
│   ├── "*"
│   └── <factor> ── <constant> ── "10"
├── "+"
└── <expression> ── <term> ── <factor> ── <constant> ── "1"

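The grammar maps directly onto a recursive-descent parser: one function per non-terminal. Below is a minimal sketch in C that parses and directly evaluates an expression. It is illustrative only: the left-recursive rules are rewritten as loops (recursive descent cannot handle left recursion directly), error handling is rudimentary, and the variables x, y, z are bound to invented test values instead of a symbol table.

    #include <stdio.h>
    #include <ctype.h>
    #include <stdlib.h>

    static const char *p;            /* cursor into the input string */
    static int expression(void);     /* forward declaration */

    static int factor(void) {
        if (*p == '(') {                          /* "(" <expression> ")" */
            p++;
            int v = expression();
            if (*p == ')') p++;
            return v;
        }
        if (isdigit((unsigned char)*p)) {         /* <constant> */
            int v = 0;
            while (isdigit((unsigned char)*p)) v = v * 10 + (*p++ - '0');
            return v;
        }
        switch (*p++) {                           /* <variable>, test bindings */
            case 'x': return 2;
            case 'y': return 3;
            case 'z': return 5;
            default:  fprintf(stderr, "parse error\n"); exit(1);
        }
    }

    static int term(void) {                       /* <factor> { "*" <factor> } */
        int v = factor();
        while (*p == '*') { p++; v *= factor(); }
        return v;
    }

    static int expression(void) {                 /* <term> { "+" <term> } */
        int v = term();
        while (*p == '+') { p++; v += term(); }
        return v;
    }

    int main(void) {
        p = "x*10+1";
        printf("x*10+1 = %d\n", expression());    /* with x = 2, prints 21 */
        return 0;
    }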

Semantic analysis
- Semantic analysis uses the parse tree as input;
- additional information is added to the parse tree to get an annotated parse tree;
- parse tree = concrete syntax tree;
- semantic analysis transforms the concrete syntax tree into an abstract syntax tree (AST).

Semantic analysis
Type checking:
- the type of every (terminal) token is inferred and memory requirements are computed;
- the type of every subtree is inferred, and temporary variables are created for every subexpression.
Symbol table:
- maps each identifier to the information known about it;
- the identifier's type, internal structure (if any), and scope (the portion of the program in which it is valid).
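A symbol table can be as simple as a list of (name, type, scope) entries searched from the newest entry backwards, so that inner declarations shadow outer ones. A minimal sketch; the linear-list layout and all names are illustrative assumptions (real compilers typically use hash tables with one scope per nesting level):

    #include <stdio.h>
    #include <string.h>

    enum type { T_INT, T_STRING };

    struct symbol {
        const char *name;
        enum type   type;
        int         scope;    /* nesting depth at declaration */
    };

    static struct symbol table[64];
    static int nsyms = 0;

    static void declare(const char *name, enum type t, int scope) {
        table[nsyms].name  = name;
        table[nsyms].type  = t;
        table[nsyms].scope = scope;
        nsyms++;
    }

    /* Innermost declaration wins: search newest entries first. */
    static struct symbol *lookup(const char *name) {
        for (int i = nsyms - 1; i >= 0; i--)
            if (strcmp(table[i].name, name) == 0)
                return &table[i];
        return NULL;
    }

    int main(void) {
        declare("x", T_INT, 0);      /* global int x */
        declare("x", T_STRING, 1);   /* inner declaration shadows it */
        struct symbol *s = lookup("x");
        printf("x resolves to scope %d\n", s->scope);   /* prints 1 */
        return 0;
    }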

Semantic analysis
The semantic analyzer enforces a large variety of rules not captured by the hierarchical structure of context-free grammars and parse trees. Semantic rules are static or dynamic:
- static rules are checked at compile time;
- for dynamic rules, checking code is generated.

Semantic analysis
Typical static rules covered by the semantic analyser:
- every identifier is declared before it is used;
- no identifier is used in an inappropriate context (calling an integer as a subroutine, adding a string to an integer, referencing a field of the wrong type of struct, etc.);
- subroutine calls provide the correct number and types of arguments;
- labels on the arms of a switch statement are distinct constants;
- any function with a non-void return type returns a value explicitly.
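A few concrete violations of such static rules, written in C; the snippet deliberately does not compile, and a conforming compiler diagnoses each marked line (the function names are invented for the example):

    int twice(int n) { return 2 * n; }

    int f(void) {
        undeclared = 1;        /* identifier used without a declaration */
        int *p = 3.5;          /* floating constant assigned to a pointer: type error */
        return twice(1, 2);    /* wrong number of arguments for twice() */
    }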

Semantic analysis
Examples of rules enforced at run time include the following:
- variables are never used in an expression unless they have been given a value;
- pointers are never dereferenced unless they refer to a valid object;
- array subscript expressions lie within the bounds of the array;
- arithmetic operations do not overflow.

Semantic analysis
- Every node of the parse tree (a rule of the CFG) is represented by a code template;
- the parse tree is transformed into a linear sequence of instructions;
- the templates are filled with actual variables and values;
- the code for all expressions in the program is merged into the resulting sequence of instructions.
(Figure: annotated abstract syntax tree → pre-intermediate code, a sequence of instructions.)
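A sketch of this template idea in C: each node kind has a fixed instruction pattern with a fresh temporary, and a bottom-up walk linearises the tree. The AST layout, the three-address instruction format, and all names are invented for illustration:

    #include <stdio.h>

    struct node {
        char op;                        /* '+', '*', or 0 for a leaf */
        const char *leaf;               /* variable name or constant */
        struct node *left, *right;
    };

    static int ntemp = 0;

    /* Emit instructions bottom-up; return the temporary holding the result. */
    static int emit(struct node *n) {
        if (n->op == 0) {               /* leaf template: load into a temporary */
            int t = ++ntemp;
            printf("t%d = %s\n", t, n->leaf);
            return t;
        }
        int l = emit(n->left);          /* fill the binary-operator template */
        int r = emit(n->right);
        int t = ++ntemp;
        printf("t%d = t%d %c t%d\n", t, l, n->op, r);
        return t;
    }

    int main(void) {
        /* Hard-coded AST for x*10+1. */
        struct node x   = {0, "x",  NULL, NULL};
        struct node ten = {0, "10", NULL, NULL};
        struct node one = {0, "1",  NULL, NULL};
        struct node mul = {'*', NULL, &x,   &ten};
        struct node add = {'+', NULL, &mul, &one};
        emit(&add);
        /* Prints the linear sequence:
           t1 = x
           t2 = 10
           t3 = t1 * t2
           t4 = 1
           t5 = t3 + t4                  */
        return 0;
    }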

Data and flow analysis
From this point on, the architecture of the target determines the result of each step.
- Temporary variables are collected, and it is decided whether there are duplicate requests for memory;
- consecutive sequences of instructions are analysed, and if recognised as repeating, the duplicates are removed and substituted, e.g. by a reference (jump, loop);
- the relative addresses in the code and in the program's memory are computed;
- debug hooks and profiling information are recorded.
The resulting code is called the intermediate code and can be run in an interpreter.

Intermediate optimisation
The intermediate code is transformed into functionally equivalent but faster (or smaller) forms. Some kinds of optimisations:
- inline expansion: inlined code is cheaper than a function call;
- dead code elimination: e.g. always-false branches of conditionals;
- constant propagation: it is cheaper to allocate a fixed block of memory for constants at the beginning;
- loop transformations: interchange, vectorisation, reversal, removing loop-invariant code;
- register allocation: frequently used variables are held in registers, if possible;
- automatic parallelisation: processors have several cores, so independent code segments can be executed at once.
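A small before/after illustration of constant folding, constant propagation, and dead code elimination, written as C source for readability (the optimiser actually works on intermediate code, and the function is invented for the example):

    /* Before optimisation. */
    int f(void) {
        int x = 3 * 4;          /* constant expression: folded to 12 */
        if (0) {                /* always-false branch: dead code */
            x = 1000;
        }
        return x + 1;           /* x is known to be 12 here */
    }

    /* What the optimiser effectively reduces it to. */
    int f_optimised(void) {
        return 13;
    }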

Code generation
The previous back-end steps are applied repeatedly to obtain an optimal form of the intermediate code. The result of the procedure is translated to object code. The object code can be processed as follows:
- the compiler checks whether the code refers to existing resources and writes bytecode for a virtual machine;
- it can be translated to a different programming language;
- it can be written out as assembly code for the target;
- it is sent to the linker to become part of a library or an executable.

Linking and archiving: creating libraries
A library is a collection of functions, data types, and/or resources for reuse when executables are created. We recognise (usually) two types:
- static library: an archive of the object code of functions, data types, and resources, together with an index. It is used by the linker when referred to; the object code (of the requested functions) and the resources are put directly into the executable file. Suffixes: on Unix systems *.a, *.la; on Windows *.lib.
- dynamic library: besides the index it contains loader code and initialisation routines. It is loaded (linked) while a dependent executable is running. Suffixes: on Linux and BSD *.so; on macOS *.dylib; on Windows *.dll.
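With a Unix toolchain, both kinds of library are built from ordinary object files; a minimal command-line sketch (the file names are invented):

    cc -c set.c                      # compile to object code: set.o
    ar rcs libset.a set.o            # static library: archive plus index

    cc -c -fPIC set.c                # position-independent object code
    cc -shared -o libset.so set.o    # dynamic (shared) library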

Linking: creating executables
An executable is the form of the program which can be loaded into memory and executed. The final stage of linking consists of the following steps:
- the object code of all corresponding files is merged into one block;
- the object code of library functions is added;
- loading code for dynamic libraries is added;
- all relative addresses in the object code are recomputed to absolute (final) values;
- loading code for the executable is added;
- the result is written into a file (on Windows usually with the suffix *.exe).
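Continuing the Unix toolchain sketch above, the final link step might look like this (names invented); the linker resolves references against libset and emits a runnable file:

    cc main.o -L. -lset -o program   # merge objects and libraries into an executable
    ./program                        # the loader maps it into memory and runs it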