Compilers. Lecture 2: Overview. Yannis Smaragdakis, U. Athens (original slides by Sam Guyer@Tufts)

Last time: the compilation problem.
- Source language: high-level abstractions; easy to understand and maintain.
- Target language: very low-level, close to the machine; few abstractions.
- Concerns: systematic, correct translation; high-quality translation.

Translation strategy (diagram): on the source side, letters form words, words form sentences, and sentences carry meaning; translation maps that meaning back down into sentences, words, and letters on the target side. High-level language in, assembly/machine code out.

Compilation strategy. Follows directly from the translation strategy: for example, grouping letters into words is called lexical analysis. (Diagram: letters to words to sentences to meaning, from high-level language down to assembly code.) The compiler is a series of passes; each pass performs one step and transforms the program representation.

Basic compiler structure: the traditional two-pass compiler (source code -> Front End -> IR -> Back End -> assembly code).
- The front end reads in the source code.
- The internal representation (IR) captures the program's meaning.
- The back end generates assembly.
- Advantages? It decouples the input language from the target machine.

Modern optimizing compiler (source code -> Front End -> IR -> Optimizer -> IR -> Back End -> assembly code):
- Front end: 1. lexical analysis, 2. parsing, 3. semantic analysis
- Middle end: 4. optimization
- Back end: 5. instruction selection, 6. register allocation, 7. instruction scheduling

The front end (text chars -> lexical analyzer -> tokens -> parser -> IR, reporting errors along the way). Responsibilities?
- Recognize legal (and reject illegal) programs.
- Report errors in a useful way.
- Generate the internal representation.
How it works: good news, it runs in linear time and is mostly generated automatically. We will explain it by analogy to natural languages.

Lexical analysis. First step: recognize words, the smallest unit above letters. Consider: "This is a sentence." Some lexical rules:
- Capital T (start-of-sentence symbol)
- Blank (word separator)
- Period (end-of-sentence symbol)

More lexical analysis. Lexical analysis is not trivial. Consider: "ist his ase nte nce". Often a key question: what is the role of white space in the language? Plus, programming languages are typically more cryptic than English: *p->f ++ = -.12345e-5

Early compilers had strict formatting rules:

C     AREA OF THE TRIANGLE
  799 S = (IA + IB + IC) / 2.0
      AREA = SQRT( S * (S - IA) * (S - IB) *
     +     (S - FLOATF(IC)))
      WRITE OUTPUT TAPE 6, 601, IA, IB, IC, AREA

Why? Punch cards! And it's easier.

Lexical analysis. Another example:

void func(float * ptr, float val)
{
    float result;
    result = val/*ptr;
}

Why is this case interesting? /* is the comment delimiter, so "val/*ptr" starts a comment instead of dividing val by *ptr.

Lexical analysis. The lexical analyzer divides program text into words, or tokens. Example: if x == y then z = 1; else z = 2; Tokens have a value and a type: <if, keyword>, <x, identifier>, <==, operator>, etc.

Specification. How do we specify tokens? A keyword is an exact string, but what about an identifier? A floating-point number? Regular expressions, just like the Unix tools grep, awk, sed, etc. Identifier: [a-zA-Z_][a-zA-Z_0-9]*. There are algorithms for matching regexps; in practice, we generate code that does the matching. This code is often called a scanner.
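The regular-expression approach can be sketched in a few lines; this is a minimal illustration in Python (the token names, keyword set, and toy operator list are assumptions for illustration, not from the slides):

```python
import re

# Token classes as (name, regex) pairs; order matters (NUMBER before ID).
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("ID",     r"[a-zA-Z_][a-zA-Z_0-9]*"),
    ("OP",     r"==|[+\-*/=]"),
    ("PUNCT",  r"[;(){}]"),
    ("SKIP",   r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))
KEYWORDS = {"if", "then", "else"}   # assumed toy keyword set

def scan(text):
    """Divide program text into <value, type> tokens."""
    tokens = []
    for m in MASTER.finditer(text):
        kind, value = m.lastgroup, m.group()
        if kind == "SKIP":
            continue                       # discard white space
        if kind == "ID" and value in KEYWORDS:
            kind = "KEYWORD"               # keywords are exact strings
        tokens.append((value, kind))
    return tokens

print(scan("if x == y then z = 1;"))
# [('if', 'KEYWORD'), ('x', 'ID'), ('==', 'OP'), ('y', 'ID'), ...]
```

Real scanners generated by tools work the same way in spirit: the regular expressions are compiled into a single automaton that classifies each lexeme.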

Parsing. Once words are understood, the next step is to understand sentence structure. Parsing = diagramming sentences; the diagram is a tree.

Diagramming a sentence: "This line is a longer sentence." "This" is an article, "line" a noun, "is" a verb, "a" an article, "longer" an adjective, "sentence" a noun. "This line" forms the subject, "a longer sentence" the object, and together they form the sentence.

Diagramming programs. Diagramming program expressions is the same. Consider: if x == y then z = 1; else z = 2; Diagrammed: "x == y" is a relation forming the predicate; "z = 1" is an assignment forming the then-statement; "z = 2" is an assignment forming the else-statement; together they form the if-then-else.

Specification. How do we describe the language? The same way as in English: using grammar rules.

English:
1. sentence -> subject verb object
2. subject -> noun-phrase
3. noun-phrase -> article noun-phrase
4. noun-phrase -> adjective noun-phrase
5. noun-phrase -> noun
etc.

Expressions:
1. goal -> expr
2. expr -> expr op term
3. expr -> term
4. term -> number
5. term -> id
6. op -> +
7. op -> -

These are formal grammars (see the Chomsky hierarchy), specifically context-free grammars. Each rule is called a production; the terminal symbols are the tokens from the scanner.

Using grammars. Given a grammar, we can derive sentences by repeated substitution. Parsing is the reverse process: given a sentence, find a derivation (the same as diagramming).

Grammar:
1. goal -> expr
2. expr -> expr op term
3. expr -> term
4. term -> number
5. term -> id
6. op -> +
7. op -> -

Derivation of "x + 2 - y" (production applied at each step):
   goal
1  expr
2  expr op term
5  expr op y
7  expr - y
2  expr op term - y
4  expr op 2 - y
6  expr + 2 - y
3  term + 2 - y
5  x + 2 - y
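The derivation table can be reproduced mechanically. This sketch applies each numbered production to the rightmost matching nonterminal; the explicit replacement strings (for productions 4 and 5 they are the actual lexemes rather than "number"/"id") are an assumption made for illustration:

```python
# Productions from the slide, numbered as (left-hand side, right-hand side).
productions = {
    1: ("goal", "expr"),
    2: ("expr", "expr op term"),
    3: ("expr", "term"),
    4: ("term", "number"),
    5: ("term", "id"),
    6: ("op", "+"),
    7: ("op", "-"),
}

def derive(steps, start="goal"):
    """Apply (production number, replacement) steps to the rightmost match."""
    sentence = start
    trace = [sentence]
    for num, replacement in steps:
        lhs, _ = productions[num]
        i = sentence.rfind(lhs)            # rightmost derivation, as on the slide
        sentence = sentence[:i] + replacement + sentence[i + len(lhs):]
        trace.append(sentence)
    return trace

steps = [(1, "expr"), (2, "expr op term"), (5, "y"), (7, "-"),
         (2, "expr op term"), (4, "2"), (6, "+"), (3, "term"), (5, "x")]
print("\n".join(derive(steps)))   # ends with: x + 2 - y
```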

Representation. The diagram is called a parse tree or syntax tree. For "x + 2 - y": goal derives expr; that expr derives expr op term, where op is "-" and the term is <id,y>; the inner expr derives expr op term, where the inner expr is the term <id,x>, op is "+", and the term is <number,2>. Notice: the tree contains a lot of unneeded information.

Representation. Compilers often use an abstract syntax tree (AST). For "x + 2 - y": a "-" node whose children are a "+" node (over <id, x> and <num, 2>) and <id, y>. The AST is more concise and convenient: it summarizes the grammatical structure without including all the details of the derivation. ASTs are one kind of intermediate representation (IR).
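A parser for the slide's expression grammar can be sketched as recursive descent, with the left-recursive rule expr -> expr op term turned into a loop so it builds a left-associative AST directly; the nested-tuple AST encoding is an assumption for illustration:

```python
def parse_expr(tokens):
    """tokens: list of (value, type) pairs; returns a nested-tuple AST."""
    def term(pos):
        value, kind = tokens[pos]
        if kind in ("NUMBER", "ID"):
            return (kind.lower(), value), pos + 1   # leaf node
        raise SyntaxError(f"expected number or id at position {pos}")

    node, pos = term(0)
    # expr -> expr op term, flattened into a loop (left-associative).
    while pos < len(tokens) and tokens[pos][0] in ("+", "-"):
        op = tokens[pos][0]
        right, pos = term(pos + 1)
        node = (op, node, right)   # interior node: operator with two children
    return node

ast = parse_expr([("x", "ID"), ("+", "OP"), ("2", "NUMBER"),
                  ("-", "OP"), ("y", "ID")])
print(ast)   # ('-', ('+', ('id', 'x'), ('number', '2')), ('id', 'y'))
```

Note how the AST drops the goal/expr/term bookkeeping of the parse tree and keeps only operators and operands.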

Semantic analysis. Once sentence structure is understood, we can try to understand meaning. What would the ideal situation be? Formally check the program against a specification (this capability is coming). In practice, compilers perform limited analysis to catch inconsistencies; some do more analysis to improve the performance of the program.

Semantic analysis in English. Example: "Jack said Jerry left his assignment at home." What does "his" refer to, Jack or Jerry? Even worse: "Jack said Jack left his assignment at home." How many Jacks are there? Which one left the assignment?

Semantic analysis in programs. Programming languages define strict rules to avoid such ambiguities. What does this code print? Why?

{
    int Jack = 3;
    {
        int Jack = 4;
        System.out.print(Jack);
    }
}

This Java code prints 4; the innermost declaration is used.

More semantic analysis. Compilers perform many semantic checks besides variable bindings. Example: "Jack left her homework at home." There is a type mismatch between "her" and "Jack"; we know they are different people (I'm assuming Jack is male).

Where are we? (Source code -> Front End -> IR -> Optimizer -> IR -> Back End -> assembly code, with errors reported along the way.) The front end produces a fully checked AST. Problem: the AST still represents source-level semantics.

Intermediate representations. There are many different kinds of IRs.
- High-level IR (e.g. an AST): closer to the source code; hides implementation details.
- Low-level IR: closer to the machine; exposes details (registers, instructions, etc.).
There are many tradeoffs in IR design. Most compilers have one or maybe two IRs, typically closer to the low-level IR, which is better for optimization and code generation.

IR lowering. Preparing for optimization and code generation: dismantle complex structures into simple ones. The process is called lowering; the result is an IR called three-address code. For example:

if (x == y) z = 1; else z = 2;

becomes:

        t0 = x == y
        br t0 label1
        goto label2
label1: z = 1
        goto label3
label2: z = 2
label3:
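Lowering an expression can be sketched as a post-order walk that emits one three-address instruction per operator, generating temporaries t0, t1, ... on the fly. The nested-tuple AST shape here is an assumed illustrative encoding, not from the slides:

```python
def lower(node, code, counter):
    """Emit three-address instructions into `code`; return the operand name."""
    if node[0] in ("id", "number"):
        return node[1]                     # leaves are already simple operands
    op, left, right = node
    a = lower(left, code, counter)         # post-order: children first
    b = lower(right, code, counter)
    tmp = f"t{counter[0]}"
    counter[0] += 1
    code.append(f"{tmp} = {a} {op} {b}")   # one operator per instruction
    return tmp

code = []
lower(("-", ("+", ("id", "x"), ("number", "2")), ("id", "y")), code, [0])
print("\n".join(code))
# t0 = x + 2
# t1 = t0 - y
```

Control flow (the if/else above) is lowered similarly, with fresh labels in place of fresh temporaries.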

Optimization. (IR -> Opt 1 -> Opt 2 -> Opt 3 -> Opt 2 -> IR.) A series of passes, often repeated. Goal: reduce some cost (run faster, use less memory, conserve some other resource, like power) while preserving program semantics. Optimization is the dominant cost in most modern compilers.

Optimization. The general scheme:
- Analysis phase: pass over the code looking for opportunities, often using a formal analysis framework.
- Transformation phase: modify the code to exploit the opportunity.
Classic optimizations: dead-code elimination, common sub-expression elimination, loop-invariant code motion, strength reduction. This class will cover them, time permitting.
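As a tiny illustration of the analysis/transformation split, here is a sketch of dead-code elimination over straight-line three-address code. The "dst = a op b" instruction format, the single non-iterating pass, and the choice of "z" as the stand-in for observable output are all simplifying assumptions; real passes iterate and handle control flow:

```python
def eliminate_dead_code(code):
    # Analysis phase: which names are ever read?
    used = set()
    for instr in code:
        _, rhs = instr.split(" = ")
        used.update(rhs.split())           # operands (and operators, harmlessly)
    # Transformation phase: drop assignments whose destination is never read,
    # keeping stores to the observable variable 'z'.
    return [i for i in code if i.split(" = ")[0] in used or i.startswith("z")]

code = ["t0 = x + 2", "t1 = x * 4", "z = t0 - y"]
print(eliminate_dead_code(code))   # ['t0 = x + 2', 'z = t0 - y']
```

t1 is computed but never read, so the transformation removes it; the program's observable behavior is unchanged, which is the correctness requirement for any optimization.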

Optimization example: array accesses.

Original:
for (i = 0; i < N; i++)
    for (j = 0; j < M; j++)
        A[i][j] = A[i][j] + C;

After lowering the address computation:
for (i = 0; i < N; i++)
    for (j = 0; j < M; j++) {
        t0 = &A + (i * M) + j;
        (*t0) += C;
    }

After hoisting the loop-invariant i * M:
for (i = 0; i < N; i++) {
    t1 = i * M;
    for (j = 0; j < M; j++) {
        t0 = &A + t1 + j;
        (*t0) += C;
    }
}

After strength reduction (replacing the multiply with an addition):
t1 = 0;
for (i = 0; i < N; i++) {
    for (j = 0; j < M; j++) {
        t0 = &A + t1 + j;
        (*t0) += C;
    }
    t1 = t1 + M;
}

Optimization. Optimizations often contain assumptions about performance tradeoffs of the underlying machine. Like what?
- Relative speed of arithmetic operations: plus versus times.
- Possible parallelism in the CPU; for example, multiple additions can go on concurrently.
- Cost of memory versus computation: should I save values I've already computed, or recompute them?
- Size of various caches, in particular the instruction cache.

Where are we? (Source code -> Front End -> IR -> Optimizer -> IR -> Back End -> assembly code, with errors reported along the way.) The optimizer's output is the transformed program, typically at the same level of abstraction.

Back end. (IR -> instruction selection -> register allocation -> instruction scheduling -> assembly.) Responsibilities:
- Map abstract instructions to the real machine architecture.
- Allocate storage for variables in registers.
- Schedule instructions (often to exploit parallelism).
How it works: bad news, it is very expensive, poorly understood, and only partly automated.

Instruction selection. Example (RISC instructions):

Three-address code:     Selected instructions:
t1 = b * c              load @b => r1
y = a + t1              load @c => r2
z = d + t1              mult r1, r2 => r3
                        load @a => r1
                        add r3, r1 => r1
                        store r1 => @y
                        load @d => r1
                        add r3, r1 => r1
                        store r1 => @z

Notice: explicit loads and stores, and lots of registers (virtual registers).
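Naive instruction selection can be sketched as a per-statement mapping into the RISC-style syntax used on the slide. Treating every operand as a memory location and burning fresh virtual registers for each statement is a deliberate simplification; real selectors share registers and match larger tree patterns:

```python
def select(instr, next_reg):
    """Map one 'dst = a op b' statement to loads, an ALU op, and a store."""
    dst, rhs = instr.split(" = ")
    a, op, b = rhs.split()
    opcode = {"+": "add", "-": "sub", "*": "mult"}[op]
    r1, r2, r3 = (f"r{next_reg[0] + i}" for i in range(3))
    next_reg[0] += 3                       # fresh virtual registers each time
    return [f"load @{a} => {r1}",
            f"load @{b} => {r2}",
            f"{opcode} {r1}, {r2} => {r3}",
            f"store {r3} => @{dst}"]

print("\n".join(select("y = a + t1", [1])))
# load @a => r1
# load @t1 => r2
# add r1, r2 => r3
# store r3 => @y
```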

Register allocation. Goals:
- Have each value in a register when it is used.
- Manage a limited set of resources.
- Often need to insert loads and stores.

Why it matters (memory latencies):
                        Intel Nehalem           Intel Penryn
L1 size / latency       64KB / 4 cycles         64KB / 3 cycles
L2 size / latency       256KB / 11 cycles       6MB* / 15 cycles
L3 size / latency       8MB / 39 cycles         N/A
Main memory (DDR3)      107 cycles (33.4 ns)    160 cycles (50.3 ns)

Algorithms: optimal allocation is NP-complete; many back-end algorithms compute approximate solutions to NP-complete problems.
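Register allocation is commonly modeled as graph coloring: virtual registers that are live at the same time interfere and must get different physical registers. This greedy sketch is one simple approximation of the NP-complete problem the slide mentions; the interference graph below is an assumed example:

```python
def color(interference, k):
    """interference: {vreg: set of conflicting vregs}; k physical registers."""
    assignment = {}
    for v in sorted(interference):         # fixed order for determinism
        taken = {assignment[u] for u in interference[v] if u in assignment}
        free = [r for r in range(k) if r not in taken]
        # If no physical register is free, the value must be spilled to memory.
        assignment[v] = free[0] if free else "spill"
    return assignment

# r1, r2, r3 all conflict pairwise; r4 conflicts only with r3.
graph = {"r1": {"r2", "r3"}, "r2": {"r1", "r3"},
         "r3": {"r1", "r2"}, "r4": {"r3"}}
print(color(graph, 2))   # with only 2 registers, r3 spills
```

Spilling inserts the extra loads and stores mentioned above; production allocators choose spill candidates by estimated cost rather than encounter order.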

Instruction scheduling. Change the order of instructions. Why would that matter? Even single-core CPUs have parallelism:
- Multiple functional units ("superscalar"): group together different kinds of operations, e.g. integer vs. floating point.
- Parallelism in the memory subsystem: initiate a load from memory, then do other work while waiting.

Instruction scheduling. Example: move loads early to avoid waiting. BUT: this often creates extra register pressure.

Original:               Scheduled:
load @b => r1           load @b => r1
load @c => r2           load @c => r2
mult r1, r2 => r3       load @a => r4
load @a => r1           load @d => r5
add r3, r1 => r1        mult r1, r2 => r3
store r1 => @y          add r3, r4 => r4
load @d => r1           store r4 => @y
add r3, r1 => r1        add r3, r5 => r5
store r1 => @z          store r5 => @z

The original may stall on loads; the scheduled version starts loads early and hides their latency, but needs 5 registers instead of 3.

Finished program. What else does the code need to run? Programs need support at run time: start-up code, an interface to the OS, and libraries. This varies significantly between languages: C is fairly minimal, while Java requires the Java virtual machine.

Run-time system.
- Memory management services: heap allocation, garbage collection.
- Run-time type checking.
- Error processing (exception handling).
- Interface to the operating system.
- Support for parallelism: parallel thread initiation, communication and synchronization.