What is a compiler?

What is a compiler? Traditionally: a program that analyzes and translates from a high-level language (e.g., C++) to low-level assembly language that can be executed by hardware.

    int a, b;
    a = 3;
    if (a < 4) {
        b = 2;
    } else {
        b = 3;
    }

becomes

    var a
    var b
         mov 3 a
         mov 4 r1
         cmpi a r1
         jge l_e
         mov 2 b
         jmp l_d
    l_e: mov 3 b
    l_d: ;done

Compilers are translators. They translate Fortran, C, C++, Java, text-processing languages, HTML/XML, command and scripting languages, natural language, and domain-specific languages into machine code, virtual machine code, transformed source code, augmented source code, low-level commands, semantic components, or another language.

Compilers are optimizers. They can perform optimizations to make a program more efficient.

    int a, b, c;
    b = a + 3;
    c = a + 3;

naive translation:

    var a
    var b
    var c
    mov a r1
    addi 3 r1
    mov r1 b
    mov a r2
    addi 3 r2
    mov r2 c

optimized:

    var a
    var b
    var c
    mov a r1
    addi 3 r1
    mov r1 b
    mov r1 c

Why do we need compilers? Compilers provide portability. Old days: whenever a new machine was built, programs had to be rewritten to support the new instruction set. IBM System/360 (1964): a common Instruction Set Architecture (ISA) meant programs could run on any machine that supported the ISA. A common ISA is a huge deal (note the continued existence of x86). But still a problem: when a new ISA is introduced (EPIC) or new extensions are added (x86-64), programs have to be rewritten. Compilers bridge this gap: write a new compiler for an ISA, and then simply recompile programs!

Why do we need compilers? (II) Compilers enable high performance and productivity. Old: programmers wrote in assembly language and architectures were simple (no pipelines, caches, etc.), so there was a close match between programs and machines, making it easier to achieve performance. New: programmers write in high-level languages (Ruby, Python) and architectures are complex (superscalar, out-of-order execution, multicore). Compilers are needed to bridge this semantic gap. Compilers let programmers write in high-level languages and still get good performance on complex architectures.

Semantic Gap. Snippet of Ruby-on-Rails code:

    def index
      @posts = Post.find(:all)
      respond_to do |format|
        format.html  # index.html.erb
        format.xml   { render :xml => @posts }
      end
    end

[figure: AMD Barcelona architecture] This would be impossible without compilers!

Some common compiler types:
1. High-level language → assembly language (e.g., gcc)
2. High-level language → machine-independent bytecode (e.g., javac)
3. Bytecode → native machine code (e.g., Java's JIT compiler)
4. High-level language → high-level language (e.g., domain-specific languages, many research languages including mine!)

HLL to Assembly: Program → Compiler → Assembly → Assembler → Machine Code. The compiler converts the program into assembly. The assembler is a machine-specific translator which converts assembly into machine code:

    add $7 $8 $9    ($7 = $8 + $9)
    000000 01000 01001 00111 00000 100000

The conversion is usually one-to-one, with some exceptions: program locations and variable names.
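For concreteness, here is a small C sketch of that encoding step: packing the MIPS R-type fields (opcode, rs, rt, rd, shamt, funct) into one 32-bit word. The helper name is an illustrative assumption, not part of any real assembler.

    #include <stdint.h>
    #include <stdio.h>

    /* Pack MIPS R-type fields: op(6) rs(5) rt(5) rd(5) shamt(5) funct(6). */
    uint32_t encode_rtype(unsigned op, unsigned rs, unsigned rt,
                          unsigned rd, unsigned shamt, unsigned funct) {
        return (op << 26) | (rs << 21) | (rt << 16) |
               (rd << 11) | (shamt << 6) | funct;
    }

    int main(void) {
        /* add $7, $8, $9  ($7 = $8 + $9): opcode 0, funct 0x20 */
        uint32_t word = encode_rtype(0, 8, 9, 7, 0, 0x20);
        printf("0x%08x\n", word);   /* prints 0x01093820 */
        return 0;
    }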

HLL to Bytecode: Program → Compiler → Bytecode → Interpreter → Execute! The compiler converts the program into machine-independent bytecode; e.g., javac generates Java bytecode, the C# compiler generates CIL. An interpreter then executes the bytecode on the fly. Bytecode instructions are executed by invoking methods of the interpreter, rather than directly executing on the machine. Aside: what are the pros and cons of this approach?
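A minimal sketch of that interpreter idea, using a made-up stack bytecode (not real Java bytecode or CIL): each opcode is handled by the interpreter's own code in a dispatch loop rather than executing directly on the machine.

    #include <stdio.h>

    /* Hypothetical opcodes for illustration only. */
    enum { OP_PUSH, OP_ADD, OP_PRINT, OP_HALT };

    int main(void) {
        /* Bytecode for: print (3 + 4) */
        int code[] = { OP_PUSH, 3, OP_PUSH, 4, OP_ADD, OP_PRINT, OP_HALT };
        int stack[16], sp = 0, pc = 0;

        for (;;) {                                    /* dispatch loop */
            switch (code[pc++]) {
            case OP_PUSH:  stack[sp++] = code[pc++]; break;
            case OP_ADD:   sp--; stack[sp - 1] += stack[sp]; break;
            case OP_PRINT: printf("%d\n", stack[sp - 1]); break;
            case OP_HALT:  return 0;
            }
        }
    }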

Bytecode to Assembly: Program → Compiler → Bytecode → JIT Compiler → Machine Code. The compiler converts the program into machine-independent bytecode; e.g., javac generates Java bytecode, the C# compiler generates CIL. A just-in-time compiler then compiles the code while the program executes to produce machine code. Is this better or worse than a compiler which generates machine code directly from the program?

Structure of a Compiler

Scanner. The compiler starts by seeing only the program text:

    if (a < 4) { b = 5 }

that is, a raw stream of characters:

    i f ( a < 4 ) { \n \t b = 5 \n }

The scanner converts the program text into a string of tokens:

    if ( ID(a) OP(<) LIT(4) ) { ID(b) = LIT(5) }

But we still don't know what the syntactic structure of the program is.
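A hand-written scanner for this fragment might look roughly like the C sketch below. The token names follow the slide, but the function itself is an illustrative assumption, not a production lexer (which, as discussed later, is usually generated from regular expressions).

    #include <ctype.h>
    #include <stdio.h>

    /* Print one token per lexeme in the input string. */
    void scan(const char *p) {
        while (*p) {
            if (isspace((unsigned char)*p)) { p++; continue; }
            if (isalpha((unsigned char)*p)) {          /* identifier (keywords like "if" would normally be split out) */
                const char *start = p;
                while (isalnum((unsigned char)*p)) p++;
                printf("ID(%.*s) ", (int)(p - start), start);
            } else if (isdigit((unsigned char)*p)) {   /* integer literal */
                const char *start = p;
                while (isdigit((unsigned char)*p)) p++;
                printf("LIT(%.*s) ", (int)(p - start), start);
            } else if (*p == '<' || *p == '=') {       /* operators */
                printf("OP(%c) ", *p++);
            } else {                                   /* punctuation: ( ) { } */
                printf("PUNCT(%c) ", *p++);
            }
        }
        printf("\n");
    }

    int main(void) {
        scan("if (a < 4) { b = 5 }");
        /* prints: ID(if) PUNCT(() ID(a) OP(<) LIT(4) PUNCT()) PUNCT({) ID(b) OP(=) LIT(5) PUNCT(}) */
        return 0;
    }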

Parser. Converts the string of tokens into a parse tree or an abstract syntax tree, which captures the syntactic structure of the code (i.e., "this is an if statement, with a then-block"):

    if ( ID(a) OP(<) LIT(4) ) { ID(b) = LIT(5) }

becomes

    if-stmt
      cond-expr:   <  (lhs: a, rhs: 4)
      then-clause: stmt_list
        assign_stmt  (lhs: b, rhs: 5)
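The same tree can be sketched as a C data structure. The node kinds and field names below are made up for illustration; they are not the representation of any particular compiler.

    #include <stdlib.h>

    /* Hypothetical AST node kinds and layout, for illustration only. */
    typedef enum { N_IF, N_BINOP, N_ASSIGN, N_VAR, N_LIT } NodeKind;

    typedef struct Node {
        NodeKind kind;
        char op;                             /* for N_BINOP, e.g. '<'   */
        const char *name;                    /* for N_VAR               */
        int value;                           /* for N_LIT               */
        struct Node *lhs, *rhs;              /* operands / assign sides */
        struct Node *cond, *then_clause, *else_clause;   /* for N_IF    */
    } Node;

    static Node *node(NodeKind kind) {
        Node *n = calloc(1, sizeof *n);
        n->kind = kind;
        return n;
    }

    /* Build the tree for: if (a < 4) { b = 5 } */
    Node *build_example(void) {
        Node *cond = node(N_BINOP);
        cond->op  = '<';
        cond->lhs = node(N_VAR); cond->lhs->name = "a";
        cond->rhs = node(N_LIT); cond->rhs->value = 4;

        Node *assign = node(N_ASSIGN);
        assign->lhs = node(N_VAR); assign->lhs->name = "b";
        assign->rhs = node(N_LIT); assign->rhs->value = 5;

        Node *ifs = node(N_IF);
        ifs->cond = cond;
        ifs->then_clause = assign;
        return ifs;
    }

    int main(void) {
        Node *tree = build_example();
        (void)tree;   /* a real compiler would now run semantic analysis on it */
        return 0;
    }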

Semantic actions. Interpret the semantics of syntactic constructs. Note that up until now we have only been concerned with what the syntax of the code is. What's the difference?

Syntax vs. Semantics. Syntax: the grammatical structure of the language. What symbols, in what order, are a legal part of the language? But something that is syntactically correct may mean nothing! ("Colorless green ideas sleep furiously.") Semantics: the meaning of the language. What does a particular set of symbols, in a particular order, mean? What does it mean to be an if statement? Evaluate the conditional; if the conditional is true, execute the then clause, otherwise execute the else clause.
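That last sentence can be written down directly as interpreter code. The sketch below is a toy, self-contained take on it, with the same hypothetical node layout as the earlier AST sketch and just two variables of state; it is meant to make "semantics as execution" concrete, not to define any real language.

    #include <stdio.h>
    #include <string.h>

    typedef enum { N_IF, N_BINOP, N_ASSIGN, N_VAR, N_LIT } NodeKind;
    typedef struct Node {
        NodeKind kind;
        char op;
        const char *name;
        int value;
        struct Node *lhs, *rhs;
        struct Node *cond, *then_clause, *else_clause;
    } Node;

    int a = 3, b = 0;                       /* toy program state */
    static int *lookup(const char *n) { return strcmp(n, "a") == 0 ? &a : &b; }

    /* The meaning of an expression: the value it evaluates to. */
    int eval(Node *e) {
        switch (e->kind) {
        case N_LIT:   return e->value;
        case N_VAR:   return *lookup(e->name);
        case N_BINOP: return eval(e->lhs) < eval(e->rhs);   /* only '<' handled here */
        default:      return 0;
        }
    }

    /* The meaning of a statement: its effect on the program state. */
    void exec(Node *s) {
        switch (s->kind) {
        case N_ASSIGN:
            *lookup(s->lhs->name) = eval(s->rhs);
            break;
        case N_IF:                                           /* evaluate the conditional... */
            if (eval(s->cond)) exec(s->then_clause);         /* ...then clause */
            else if (s->else_clause) exec(s->else_clause);   /* ...else clause */
            break;
        default:
            break;
        }
    }

    int main(void) {
        /* if (a < 4) { b = 5 } */
        Node va   = { N_VAR,    0,  "a", 0, 0,   0,     0,     0,     0 };
        Node four = { N_LIT,    0,  0,   4, 0,   0,     0,     0,     0 };
        Node cond = { N_BINOP, '<', 0,   0, &va, &four, 0,     0,     0 };
        Node vb   = { N_VAR,    0,  "b", 0, 0,   0,     0,     0,     0 };
        Node five = { N_LIT,    0,  0,   5, 0,   0,     0,     0,     0 };
        Node asgn = { N_ASSIGN, 0,  0,   0, &vb, &five, 0,     0,     0 };
        Node ifs  = { N_IF,     0,  0,   0, 0,   0,     &cond, &asgn, 0 };
        exec(&ifs);
        printf("b = %d\n", b);   /* a is 3, so 3 < 4 holds and this prints: b = 5 */
        return 0;
    }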

A note on semantics. How do you define semantics? Static semantics: properties of programs, e.g., all variables must have a type, expressions must use consistent types; these can be defined using attribute grammars. Execution semantics: how does a program execute? One can define an operational or denotational semantics for a language, which is well beyond the scope of this class! For many languages, the compiler is the specification.

Semantic actions. Actions taken by the compiler based on the semantics of program statements: building a symbol table, generating intermediate representations.

Symbol tables. A list of every declaration in the program, along with other information. Variable declarations: types, scope. Function declarations: return types, number and types of arguments.

Program example:

    Integer ii;
    ...
    ii = 3.5;
    ...
    print ii;

Symbol table:

    Name   Type   Scope
    ii     int    global
    ...
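In code, a symbol table can be sketched as a small lookup structure. The array-with-linear-search layout below is an illustrative assumption; real compilers typically use hash tables and handle nested scopes.

    #include <stdio.h>
    #include <string.h>

    /* One entry per declaration: name, type, scope (all strings for simplicity). */
    typedef struct {
        const char *name;
        const char *type;
        const char *scope;
    } Symbol;

    static Symbol table[256];
    static int nsymbols = 0;

    void declare(const char *name, const char *type, const char *scope) {
        table[nsymbols++] = (Symbol){ name, type, scope };
    }

    const Symbol *lookup(const char *name) {
        for (int i = nsymbols - 1; i >= 0; i--)   /* most recent declaration wins */
            if (strcmp(table[i].name, name) == 0)
                return &table[i];
        return NULL;                              /* use of an undeclared name */
    }

    int main(void) {
        declare("ii", "int", "global");
        const Symbol *s = lookup("ii");
        if (s) printf("%s : %s (%s)\n", s->name, s->type, s->scope);
        return 0;
    }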

Intermediate representation. Also called IR. A (relatively) low-level representation of the program, but not machine-specific! One example: three-address code. Each instruction can take at most three operands (variables, literals, or labels). Note: no registers!

    bge a, 4, done
    mov 5, b
    done:        // done!
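Inside the compiler, three-address code is just data. Below is one made-up way to lay it out in C (an opcode plus up to three operands); real IRs differ in the details.

    #include <stdio.h>

    /* Hypothetical three-address-code representation, for illustration only. */
    typedef enum { OP_MOV, OP_ADD, OP_BGE, OP_LABEL } Opcode;
    typedef enum { OPND_NONE, OPND_VAR, OPND_LIT, OPND_LABEL } OperandKind;

    typedef struct {
        OperandKind kind;
        const char *name;    /* variable or label name */
        int value;           /* literal value          */
    } Operand;

    typedef struct {
        Opcode op;
        Operand a, b, c;     /* at most three operands per instruction */
    } Instr;

    /* The example above:  bge a, 4, done / mov 5, b / done: */
    Instr example[] = {
        { OP_BGE,   { OPND_VAR,   "a",    0 }, { OPND_LIT, 0,   4 }, { OPND_LABEL, "done", 0 } },
        { OP_MOV,   { OPND_LIT,   0,      5 }, { OPND_VAR, "b", 0 }, { OPND_NONE,  0,      0 } },
        { OP_LABEL, { OPND_LABEL, "done", 0 }, { OPND_NONE, 0,  0 }, { OPND_NONE,  0,      0 } },
    };

    int main(void) {
        printf("%zu instructions\n", sizeof example / sizeof example[0]);
        return 0;
    }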

Optimizer. Transforms code to make it more efficient. Different kinds, operating at different levels: High-level optimizations operate at the level of the AST, or even source code (e.g., loop interchange, parallelization). Scalar optimizations operate on the IR (e.g., dead code elimination, common sub-expression elimination). Local optimizations operate on small sequences of instructions (e.g., strength reduction, constant folding).
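One of the local optimizations named above, constant folding, can be sketched in a few lines: if both operands of an addition are literals, replace the operation with its result at compile time. The expression layout here is hypothetical.

    #include <stdio.h>

    /* A toy constant-folding pass over a tiny expression tree. */
    typedef enum { E_LIT, E_VAR, E_ADD } ExprKind;

    typedef struct Expr {
        ExprKind kind;
        int value;               /* for E_LIT */
        const char *name;        /* for E_VAR */
        struct Expr *lhs, *rhs;  /* for E_ADD */
    } Expr;

    /* Fold constants bottom-up: rewrite (LIT + LIT) into a single LIT node. */
    void fold(Expr *e) {
        if (e->kind != E_ADD) return;
        fold(e->lhs);
        fold(e->rhs);
        if (e->lhs->kind == E_LIT && e->rhs->kind == E_LIT) {
            e->value = e->lhs->value + e->rhs->value;
            e->kind = E_LIT;     /* the children are no longer needed */
        }
    }

    int main(void) {
        Expr three = { E_LIT, 3, 0, 0, 0 }, four = { E_LIT, 4, 0, 0, 0 };
        Expr sum   = { E_ADD, 0, 0, &three, &four };
        fold(&sum);
        printf("%d\n", sum.value);   /* prints 7: the addition happened at "compile time" */
        return 0;
    }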

Code generation. Generate assembly from the intermediate representation: select which instructions to use, schedule instructions, decide which registers to use.

    bge a, 4, done
    mov 5, b
    done:        // done!

one possible result:

    ld a, r1
    mov 4, r2
    cmp r1, r2
    bge done
    mov 5, r3
    st r3, b
    done:

another, with a different schedule and register assignment:

    mov 4, r1
    ld a, r2
    cmp r1, r2
    blt done
    mov 5, r1
    st r1, b
    done:
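A naive code generator can be sketched as a walk over the IR that emits a fixed assembly pattern per operation, handing out registers in order. The mnemonics and helper names below are made up for illustration; a real back end would do proper instruction selection, scheduling, and register allocation.

    #include <stdio.h>

    /* Registers handed out in order, with no reuse or real allocation. */
    static int next_reg = 1;
    static int fresh_reg(void) { return next_reg++; }

    /* Emit assembly for the IR statement "dst = src + k". */
    void gen_add_const(const char *dst, const char *src, int k) {
        int r = fresh_reg();
        printf("    ld   %s, r%d\n", src, r);   /* load the variable */
        printf("    addi %d, r%d\n", k, r);     /* add the constant  */
        printf("    st   r%d, %s\n", r, dst);   /* store the result  */
    }

    int main(void) {
        /* b = a + 3; c = a + 3;  (compare with the optimizer example earlier) */
        gen_add_const("b", "a", 3);
        gen_add_const("c", "a", 3);
        return 0;
    }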

Overall structure of a compiler: Source code → Scanner → Tokens → Parser → Syntax tree → Semantic Routines → IR → Optimizer → IR → Code Generation → Executable. Scanner: use regular expressions to define tokens; can then use scanner generators such as lex or flex. Parser: define the language using context-free grammars; can then use parser generators such as yacc or bison. Semantic routines: done by hand, but can be formalized. Optimizer: written manually; automation is an active research area (e.g., dataflow analysis frameworks). Code generation: written manually. Many of these passes can be combined.

Front-end vs. Back-end. Scanner + Parser + Semantic actions + (high-level) optimizations are called the front-end of a compiler. IR-level optimizations and code generation (instruction selection, scheduling, register allocation) are called the back-end of a compiler. Can build multiple front-ends for a particular back-end: e.g., gcc & g++, or the many front-ends which generate CIL. Can build multiple back-ends for a particular front-end: e.g., gcc allows targeting different architectures. In the pipeline above, Source code → Scanner → Tokens → Parser → Syntax tree → Semantic Routines → IR make up the front end; Optimizer → IR → Code Generation → Executable make up the back end.

Design considerations (I). Compiler and language designs influence each other. Higher-level languages are harder to compile: more work to bridge the gap between the language and assembly. Flexible languages are often harder to compile: dynamic typing (Ruby, Python) makes a language very flexible, but it is hard for a compiler to catch errors (in fact, many simply won't).

Design considerations (II). Compiler design is influenced by architectures: CISC vs. RISC. CISC was designed for the days when programmers wrote in assembly; for a compiler to take advantage of string-manipulation instructions, it must be able to recognize them. RISC has a much simpler instruction model and is easier to compile for.