Compilers and computer architecture: introduction

Similar documents
Finding People and Information (1) G53CMP: Lecture 1. Aims and Motivation (1) Finding People and Information (2)

CA Compiler Construction

G53CMP: Lecture 1. Administrative Details 2016 and Introduction to Compiler Construction. Henrik Nilsson. University of Nottingham, UK

COMP 3002: Compiler Construction. Pat Morin School of Computer Science

Lecture 1: Introduction

Compilers and computer architecture From strings to ASTs (2): context free grammars

CSE4305: Compilers for Algorithmic Languages CSE5317: Design and Construction of Compilers

Philadelphia University Faculty of Information Technology Department of Computer Science --- Semester, 2007/2008. Course Syllabus

Compiler Design. Dr. Chengwei Lei CEECS California State University, Bakersfield

Compiler Theory Introduction and Course Outline Sandro Spina Department of Computer Science

Compiling Techniques

Who are we? Andre Platzer Out of town the first week GHC TAs Alex Crichton, senior in CS and ECE Ian Gillis, senior in CS

Compiler Construction LECTURE # 1

General Course Information. Catalogue Description. Objectives

CSCI 565 Compiler Design and Implementation Spring 2014

Course introduction. Advanced Compiler Construction Michel Schinz

Compiler Construction

CS415 Compilers Overview of the Course. These slides are based on slides copyrighted by Keith Cooper, Ken Kennedy & Linda Torczon at Rice University

CS 4120 and 5120 are really the same course. CS 4121 (5121) is required! Outline CS 4120 / 4121 CS 5120/ = 5 & 0 = 1. Course Information

Compiler Construction D7011E

Project Compiler. CS031 TA Help Session November 28, 2011

Compiler Construction

G.PULLAIH COLLEGE OF ENGINEERING & TECHNOLOGY

CMPE 152 Compiler Design

ECE573 Introduction to Compilers & Translators

Compiler Construction 2010 (6 credits)

CS426 Compiler Construction Fall 2006

CS Compiler Construction West Virginia fall semester 2014 August 18, 2014 syllabus 1.0

Welcome to CSE131b: Compiler Construction

CMPE 152 Compiler Design

Introduction to Compilers and Language Design Copyright (C) 2017 Douglas Thain. All rights reserved.

Compilers and computer architecture: Semantic analysis

CS 432 Fall Mike Lam, Professor. Compilers. Advanced Systems Elective

CS/SE 153 Concepts of Compiler Design

CS153: Compilers Lecture 1: Introduction

Administrivia. Compilers. CS143 10:30-11:50TT Gates B01. Text. Staff. Instructor. TAs. The Purple Dragon Book. Alex Aiken. Aho, Lam, Sethi & Ullman

Compiler Construction D7011E

15-411/ Compiler Design

PLAGIARISM. Administrivia. Compilers. CS143 11:00-12:15TT B03 Gates. Text. Staff. Instructor. TAs. Office hours, contact info on 143 web site

Compilers and Code Optimization EDOARDO FUSELLA

CSE4305: Compilers for Algorithmic Languages CSE5317: Design and Construction of Compilers

COMP 181 Compilers. Administrative. Last time. Prelude. Compilation strategy. Translation strategy. Lecture 2 Overview

CS 432 Fall Mike Lam, Professor. Compilers. Advanced Systems Elective

Introduction. Compiler Design CSE Overview. 2 Syntax-Directed Translation. 3 Phases of Translation

CSE4305: Compilers for Algorithmic Languages CSE5317: Design and Construction of Compilers

Introduction to Syntax Analysis. Compiler Design Syntax Analysis s.l. dr. ing. Ciprian-Bogdan Chirila

LECTURE NOTES ON COMPILER DESIGN P a g e 2

Compiler Construction Lecture 1: Introduction Winter Semester 2018/19 Thomas Noll Software Modeling and Verification Group RWTH Aachen University

CMPE 152 Compiler Design

Working of the Compilers

LECTURE 3. Compiler Phases

Compiler Construction

What is a Compiler? Compiler Construction SMD163. Why Translation is Needed: Know your Target: Lecture 8: Introduction to code generation

! Broaden your language horizons! Different programming languages! Different language features and tradeoffs. ! Study how languages are implemented

What do Compilers Produce?

INTRODUCTION PRINCIPLES OF PROGRAMMING LANGUAGES. Norbert Zeh Winter Dalhousie University 1/10

What is a compiler? Xiaokang Qiu Purdue University. August 21, 2017 ECE 573

ST. XAVIER S COLLEGE

Compiler Construction Lecture 1: Introduction Summer Semester 2017 Thomas Noll Software Modeling and Verification Group RWTH Aachen University

Undergraduate Compilers in a Day

Compiler Construction

Compiler Construction

CS/SE 153 Concepts of Compiler Design

The Environment Model. Nate Foster Spring 2018

Crafting a Compiler with C (II) Compiler V. S. Interpreter

Compiler construction

CS131: Programming Languages and Compilers. Spring 2017

Compilers - Chapter 2: An introduction to syntax analysis (and a complete toy compiler)

Compiler Design (40-414)

High Performance Computing MPI and C-Language Seminars 2009

Introduction to Compilers and Language Design

CSCI 2041: Advanced Language Processing

What is a compiler? var a var b mov 3 a mov 4 r1 cmpi a r1 jge l_e mov 2 b jmp l_d l_e: mov 3 b l_d: ;done

Compilers Crash Course

CST-402(T): Language Processors

Compiler Optimisation

Structure of a compiler. More detailed overview of compiler front end. Today we ll take a quick look at typical parts of a compiler.

CS 406/534 Compiler Construction Putting It All Together

Compilers and Interpreters

Introduction - CAP Course

C and Programming Basics

CSE 504: Compiler Design

What is concurrency? Concurrency. What is parallelism? concurrency vs parallelism. Concurrency: (the illusion of) happening at the same time.

CSC488S/CSC2107S - Compilers and Interpreters. CSC 488S/CSC 2107S Lecture Notes

CSE 501: Compiler Construction. Course outline. Goals for language implementation. Why study compilers? Models of compilation

9/5/17. The Design and Implementation of Programming Languages. Compilation. Interpretation. Compilation vs. Interpretation. Hybrid Implementation

HOLY ANGEL UNIVERSITY COLLEGE OF INFORMATION AND COMMUNICATIONS TECHNOLOGY COMPILER THEORY COURSE SYLLABUS

CS 321 IV. Overview of Compilation

SRM UNIVERSITY FACULTY OF ENGINEERING AND TECHNOLOGY SCHOOL OF COMPUTING DEPARTMENT OF CSE COURSE PLAN

CSE 401/M501 Compilers

COMP3131/9102: Programming Languages and Compilers

DOID: A Lexical Analyzer for Understanding Mid-Level Compilation Processes

The Environment Model

Earlier edition Dragon book has been revised. Course Outline Contact Room 124, tel , rvvliet(at)liacs(dot)nl

Compiler Construction D7011E

Interpreters. Prof. Clarkson Fall Today s music: Step by Step by New Kids on the Block

Introduction (1) COMP 520: Compiler Design (4 credits) Alexander Krolik MWF 13:30-14:30, MD 279

Formal Languages and Compilers Lecture I: Introduction to Compilers

Prof. Mohamed Hamada Software Engineering Lab. The University of Aizu Japan

CHAPTER 3. Register allocation

Transcription:

1 / 41 Compilers and computer architecture: introduction Martin Berger September 2018

2 / 41 Administrative matters: lecturer Name: Martin Berger Email: M.F.Berger@sussex.ac.uk Web: http://users.sussex.ac.uk/~mfb21/compilers Lecture notes etc: http://users.sussex.ac.uk/ ~mfb21/compilers/material.html Linked from Canvas Office hour: after the Thursday lectures, and on request (please arrange by email, see http://users.sussex.ac.uk/~mfb21/cal for available time-slots) My room: Chichester II, 312

3 / 41 Administrative matters: dates, times and assessment Lectures: Two lectures per week, Tue 1700-1800 FULTON B, and Thu 0900-1000 FULTON B. Tutorials: please see your timetables. The TA is Jim Fielding. There will (probably) be PAL sessions, more soon. Assessment: coursework (50%) and by unseen examination (50%). Both courseworks involve writing parts of a compiler. Due dates for courseworks: Fri, 2 Nov 2018, and Fri, 14 Dec 2018, 18:00, both 18:00.

4 / 41 Questions welcome! Please, ask questions... during the lesson at the end of the lesson in my office hours (see http://users.sussex.ac.uk/~mfb21/cal for available time-slots) by email M.F.Berger@sussex.ac.uk on Canvas in the tutorials any other channels (e.g. Twitter, Whatsapp...)? Please, don t wait until the end of the course to tell me about any problems you may encounter.

5 / 41 Prerequisites Good Java programming skills are indispensable. This course is not about teaching you how to program. It helps if you have already seen e.g. regular expressions, FSMs etc. But we will cover all this from scratch. It helps if you have already seen a CPU, e.g. know what a register is or a stack pointer.

6 / 41 Course content I m planning to give a fairly orthodox compilers course that show you all parts of a compiler. At the end of this course you should be able to write a fully blown compiler yourself and implement programming languages. We will also look at computer architecture, although more superficially. This will take approximately 9 weeks, so we have time at the end for some advanced material. I m happy to tailor the course to your interest, so please let me know what you want to hear about.

7 / 41 Coursework Evaluation of assessed courseworks will (largely) be by automated tests. This is quite different from what you ve seen so far. The reason for this new approach is threefold. Compilers are complicated algorithms and it s beyond human capabilities to find subtle bugs. Realism. In industry you don t get paid for being nice, or for having code that almost works. Fairness. Automatic testing removes subjective element. Note that if you make a basic error in your compiler then it is quite likely that every test fails and you will get 0 points. So it is really important that you test your code before submission thoroughly. I encourage you to share tests and testing frameworks with other students: as tests are not part of the deliverable, you make share them. Of course the compiler must be written by yourself.

8 / 41 Plan for today s lecture Whirlwind overview of the course. Why study compilers? What is a compiler? Compiler structure Lexical analysis Syntax analysis Semantic analysis, type-checking Code generation

9 / 41 Why study compilers? To become a good programmer, you need to understand what happens under the hood when you write programs in a high-level language. To understand low-level languages (assembler, C/C++, Rust, Go) better. Those languages are of prime importance, e.g. for writing operating systems, embedded code and generally code that needs to be really fast (e.g. computer games, big data). Most large programs have a tendency to embed a programming language. The skill quickly to write an interpreter or compiler for such embedded languages is invaluable. But most of all: compilers are extremely amazing, beautiful and one of the all time great examples of human ingenuity. After 70 years of refinement compilers are a paradigm case of beautiful software structure (modularisation). I hope it inspires you.

Overview: what is a compiler? 10 / 41

Overview: what is a compiler? A compiler is a program that translates programs from one programming language to programs in another programming language. The translation should preserve meaning (what does preserve and meaning mean in this context?). Source program Compiler Target program Error messages Typically, the input language (called source language) is more high-level than the output language (called target language) Examples Source: Java, target: JVM bytecode. Source: Swift, target: ARM machine code. Source: JVM bytecode, target: ARM/x86 machine code 11 / 41

12 / 41 Example translation: source program Here is a little program. (What does it do?) int testfun( int n ){ int res = 1; while( n > 0 ){ n--; res *= 2; } return res; } Using clang -S this translates to the following x86 machine code...

13 / 41 Example translation: target program _testfun: ## @testfun.cfi_startproc ## BB#0: pushq %rbp Ltmp0:.cfi_def_cfa_offset 16 Ltmp1:.cfi_offset %rbp, -16 movq %rsp, %rbp Ltmp2:.cfi_def_cfa_register %rbp movl %edi, -4(%rbp) movl $1, -8(%rbp) LBB0_1: ## =>This Inner Loop Header: Depth=1 cmpl $0, -4(%rbp) jle LBB0_3 ## BB#2: ## in Loop: Header=BB0_1 Depth=1 movl -4(%rbp), %eax addl $4294967295, %eax ## imm = 0xFFFFFFFF movl %eax, -4(%rbp) movl -8(%rbp), %eax shll $1, %eax movl %eax, -8(%rbp) jmp LBB0_1 LBB0_3: movl -8(%rbp), %eax popq %rbp retq.cfi_endproc

14 / 41 Compilers have a beautifully simple structure Source program Analysis phase Code generation In the analysis phase two things happen: Analysing if the program is well-formed (e.g. checking for syntax and type errors). Creating a convenient (for a computer) representation of the source program structure for further processing. (Abstract syntax tree (AST), symbol table). The executable program is then generated from the AST in the code generation phase. Let s refine this. Generated program

Compiler structure Compilers have a beautifully simple structure. This structure was arrived at by breaking a hard problem (compilation) into several smaller problems and solving them separately. This has the added advantage of allowing to retarget compilers (changing source or target language) quite easily. Source program Lexical analysis Intermediate code generation Syntax analysis Optimisation Semantic analysis, e.g. type checking Code generation Translated program 15 / 41

Compiler structure Source program Lexical analysis Intermediate code generation Syntax analysis Optimisation Semantic analysis, e.g. type checking Code generation Translated program Interesting question: when do these phases happen? In the past, all happend at... compile-time. Now some happen at run-time in Just-in-time compilers (JITs). This has profound influences on choice of algorithms and performance. 16 / 41

The phases are purely functional, in that they take one input, and return one output. Functional programming languages like Haskell, Ocaml, F#, Rust or Scala are ideal for writing compilers. 17 / 41 Compiler structure Source program Lexical analysis Intermediate code generation Syntax analysis Optimisation Semantic analysis, e.g. type checking Code generation Translated program Another interesting question: do you note some thing about all these phases?

18 / 41 Phases: Overview Lexical analysis Syntactic analysis (parsing) Semantic analysis (type-checking) Intermediate code generation Optimisation Code generation

19 / 41 Phases: Lexical analysis Source program Lexical analysis Intermediate code generation Syntax analysis Optimisation Semantic analysis, e.g. type checking Code generation Translated program

20 / 41 Phases: Lexical analysis What is the input to a compiler? A (often long) string, i.e. a sequence of characters. Strings are not an efficient data-structure for a compiler to work with (= generate code from). Instead, compilers generate code from a more convenient data structure called abstract syntax trees (ASTs). We construct the AST of a program in two phases: Lexical anlysis. Where the input string is converted into a list of tokens. Parsing. Where the AST is constructed from a token list.

21 / 41 Phases: Lexical analysis In the lexical analysis, a string is converted into a list of tokens. Example: The program int testfun( int n ){ int res = 1; while( n > 0 ){ n--; res *= 2; } return res; } Is (could be) represented as the list T_int, T_ident ( "testfun" ), T_left_brack, T_int, T_ident ( "n" ), T_rightbrack, T_left_curly_brack, T_int, T_ident ( "res" ), T_eq, T_num ( 1 ), T_semicolon, T_while,...

22 / 41 Phases: Lexical analysis T_int, T_ident ( "testfun" ), T_left_brack, T_int, T_ident ( "n" ), T_rightbrack, T_left_curly_brack, T_int, T_ident ( "res" ), T_eq, T_num ( 1 ), T_semicolon, T_while,... Why is this interesting? Abstracts from irrelevant detail (e.g. syntax of keywords, whitespace). Makes the next phase (parsing) much easier.

23 / 41 Phases: syntax analysis (parsing) Source program Lexical analysis Intermediate code generation Syntax analysis Optimisation Semantic analysis, e.g. type checking Code generation Translated program

24 / 41 Phases: syntax analysis (parsing) This phase converts the program (list of tokens) into a tree, the AST of the program (compare to the DOM of a webpage). This is a very convenient data structure because syntax-checking (type-checking) and code-generation can be done by walking the AST (cf visitor pattern). But how is a program a tree? T_while T_greater T_semicolon while( n > 0 ){ n--; res *= 2; } T_var ( n ) T_num ( 0 ) T_decrement T_update T_var ( n ) T_var ( res ) T_mult T_var ( res ) T_num ( 2 )

Phases: syntax analysis (parsing) T_while T_greater T_semicolon T_var ( n ) T_num ( 0 ) T_decrement T_update T_var ( n ) T_var ( res ) T_mult T_var ( res ) T_num ( 2 ) The AST is often implemented as a tree of linked objects. The compiler writer must design the AST data structure carefully so that it is easy to build (during syntax analysis), and easy to walk (during code generation). The performance of the compiler strongly depends on the AST, so a lot of optimisation goes here for instustrial strength compilers. 25 / 41

This dual role is because the rules for constructing the AST are essentially exactly the rules that determine the set of syntactically valid programs. Here the theory of formal languages (context free, context sensitive, and finite automata) is of prime importance. We will study this in detail. 26 / 41 Phases: syntax analysis (parsing) T_while T_greater T_semicolon T_var ( n ) T_num ( 0 ) T_decrement T_update T_var ( n ) T_var ( res ) T_mult T_var ( res ) T_num ( 2 ) The construction of the AST has another important role: syntax checking, i.e. checking if the program is syntactically valid!

Phases: syntax analysis (parsing) T_while T_greater T_semicolon T_var ( n ) T_num ( 0 ) T_decrement T_update T_var ( n ) T_var ( res ) T_mult T_var ( res ) T_num ( 2 ) Great news: the generation of lexical analysers and parsers can be automated by using parser generators (e.g. lex, yacc). Decades of research have gone into parser generators, and in practise they generate better lexers and parsers than most programmers would be able to. Alas, parser generators are quite complicated beasts, and in order to understand them, it is helpful to understand formal languages and lexing/parsing. The best way to understand this is to write a toy lexer and parser. 27 / 41

28 / 41 Phases: semantic analysis Source program Lexical analysis Intermediate code generation Syntax analysis Optimisation Semantic analysis, e.g. type checking Code generation Translated program

29 / 41 Phases: semantic analysis While parsing can reject syntactically invalid programs, it cannot reject semantically invalid programs, e.g. programs with more complicated semantic mistakes are harder to catch. Examples. void main() { i = 7 int i = 7... if ( 3 + true ) > "hello" then... They are caught with semantic analysis. The key technology are types. Modern languages like Scala, Rust, Haskell, Ocaml, F# employ type inference.

30 / 41 Phases: intermediate code generation Source program Lexical analysis Intermediate code generation Syntax analysis Optimisation Semantic analysis, e.g. type checking Code generation Translated program

Phases: intermediate code generation There are many different CPUs with different machine languages. Often the machine language changes subtly from CPU version to CPU version. It would be annoying if we had to rewrite large parts of the compiler. Fortunately, most machine languages are rather similar. This helps us to abstract almost the whole compiler from the details of the target language. The way we do this is by using in essence two compilers. Develop an intermediate language that captures the essence of almost all machine languages. Compile to this intermediate language. Do compiler optimisations in the intermediate language. Translate the intermediate representation to the target machine language. This step can be seen as a mini-compiler. If we want to retarget the compiler to a new machine language, only this last step needs to be rewritten. Nice data abstraction. 31 / 41

32 / 41 Phases: optimiser Source program Lexical analysis Intermediate code generation Syntax analysis Optimisation Semantic analysis, e.g. type checking Code generation Translated program

Phases: optimiser Translating a program often introduces various inefficiencies, make the program e.g. run slow, or use a lot of memories, or use a lot of power (important for mobile phones). Optimisers try to remove these inefficiencies, by replacing the inefficient program with a more efficient version (without changing the meaning of the program). Most code optimisations are problems are difficult (NP complete or undecidable), so optimisers are expensive to run, often (but not always) lead to modest improvements only. They are also difficult algorithmically. These difficulties are exacerbate for JITs because the are executed at program run-time. However, some optimisations are easy, e.g. inlining of functions: if a function is short (e.g. computing sum of two numbers), replacing the call to the function with its code, can lead to faster code. (What is the disadvantage of this?) 33 / 41

34 / 41 Phases: code generation Source program Lexical analysis Intermediate code generation Syntax analysis Optimisation Semantic analysis, e.g. type checking Code generation Translated program

35 / 41 Phases: code generation This straighforward phase translates the generated intermediate code to machine code. As machine code and intermediate code are much alike, this mini-compiler is simple and fast.

36 / 41 Compilers vs interpreters Interpreters are a second way to run programs. Source program Source program Compiler Syntax error? Data Executable Output At runtime. Data Interpreter Output The advantage of compilers is that generated code is faster, because a lot of work has to be done only once (e.g. lexing, parsing, type-checking, optimisation). And the results of this work are shared in every execution. The interpreter has to redo this work everytime. Syntax error? The advantage of interpreters is that they are much simpler than compilers. We won t say much more about interpreters in this course.

37 / 41 Literature Compilers are among the most studied and most well understood parts of informatics. Many good books exist. Here are some of my favourites, although I won t follow any of them closely. Modern Compiler Implementation in Java (second edition) by Andrew Appel and Jens Palsberg. Probably closest to our course. Moves quite fast. Compilers - Principles, Techniques and Tools (second edition) by Alfred V. Aho, Monica Lam, Ravi Sethi, and Jeffrey D. Ullman. The first edition of this book is is the classic text on compilers, known as the Dragon Book, but its first edition is a bit obsolete. The second edition is substantially expanded and goes well beyond the scope of our course. For my liking, the book is a tad long.

38 / 41 Literature Some other material: Engineering a Compiler, by Keith Cooper, Linda Torczon. The Alex Aiken s Stanford University online course on compilers. This course coveres similar ground as ours, but goes more in-depth. I was quite influenced by Aiken s course when I designed our s. Computer Architecture - A Quantitative Approach (sixth edition) by John Hennessey and David Patterson. This is the bible for computer architecture. It goes way beyond what is required for our course, but very well written by some of the world s leading experts on computer architecture. Well worth studying. This book contains a great deal of information about the MIPS architecture used in this course.

39 / 41 How to enjoy and benefit from this course Assessed coursework is designed to reinforce and integrate lecture material; it s designed to help you pass the exam Go look at the past papers - now. Use the tutorials to get feedback on your solutions Substantial lab exercise should bring it all together Ask questions, in the lectures, in the labs, on Study Direct or in person! Design your own mini-languages and write compilers for them. Have a look at real compilers. There are many free, open-source compilers, g.g. GCC, LLVM, TCC, MiniML, Ocaml, the Scala compiler, GHC, the Haskell compiler.

40 / 41 Feedback In this module, you will receive feedback through: The mark and comments on your assessment Feedback to the whole class on assessment and exams Feedback to the whole class on lecture understanding Model solutions Worked examples in class and lecture Verbal comments and discussions with tutors in class Discussions with your peers on problems Online discussion forums One to one sessions with the tutors The more questions you ask, the more you participate in discussions, the more you engage with the course, the more feedback you get.

Questions? 41 / 41