Compiler Construction
Virendra Singh
Computer Architecture and Dependable Systems Lab
Department of Electrical Engineering
Indian Institute of Technology Bombay
http://www.ee.iitb.ac.in/~viren/
E-mail: viren@ee.iitb.ac.in
Advanced Topics in Computing @ MNIT
Lecture 1 (27 Aug 2015)

Problem Solving: Main Steps
1. Problem definition
2. Algorithm design / algorithm specification
3. Algorithm analysis
4. Implementation
5. Testing
6. [Maintenance]
27 Aug 2015 virendra@mnit

From Source to Executable
source program (foo.c: main(), sub1(), data)
-> Compiler
-> object modules (foo.o: main, sub1, data)
-> Linkage Editor, which also reads the static library (libc.a: printf, scanf, gets, fopen, exit, data, ...)
-> load module (a.out: main, sub1, data, printf, exit, data)
-> Loader, at load time (run time)
-> machine memory (other programs ...; main, sub1, data, printf, exit, data; other ...; kernel (system calls))
Dynamic library case not shown.

Running Program on Processor
$$
\text{Processor Performance} = \frac{\text{Time}}{\text{Program}} = \underbrace{\frac{\text{Instructions}}{\text{Program}}}_{\text{code size}} \times \underbrace{\frac{\text{Cycles}}{\text{Instruction}}}_{\text{CPI}} \times \underbrace{\frac{\text{Time}}{\text{Cycle}}}_{\text{cycle time}}
$$
Architecture (Compiler Designer) -> Implementation (Processor Designer) -> Realization (Chip Designer)

Iron Law
Instructions/Program
  - Instructions executed, not static code size
  - Determined by algorithm, compiler, ISA
Cycles/Instruction
  - Determined by ISA and CPU organization
  - Overlap among instructions reduces this term
Time/Cycle
  - Determined by technology, organization, clever circuit design

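The three factors of the Iron Law multiply directly, so the effect of changing one of them is easy to check numerically. The sketch below is only an illustration: the function name `execution_time` and all the numbers (instruction counts, CPI, cycle time) are made-up assumptions, not measurements from the lecture.

```python
def execution_time(instructions, cpi, cycle_time_s):
    """Iron Law: time/program = instructions/program * cycles/instruction * time/cycle."""
    return instructions * cpi * cycle_time_s


# Hypothetical baseline: 1,000,000 executed instructions, CPI of 2.0, 1 ns cycle.
baseline = execution_time(1_000_000, 2.0, 1e-9)

# Hypothetical compiler improvement: 20% fewer executed instructions,
# with CPI and cycle time left unchanged.
optimized = execution_time(800_000, 2.0, 1e-9)

speedup = baseline / optimized
print(f"baseline  = {baseline:.6f} s")
print(f"optimized = {optimized:.6f} s")
print(f"speedup   = {speedup:.2f}x")
```

In this sketch only the first term changes, which is the term the slide attributes to the algorithm, the compiler, and the ISA.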
What is a compiler?
A program that reads a program written in one language and translates it into another language.
Source language -> Target language
Traditionally, compilers go from high-level languages to low-level languages.

Compilers
Common compilation tasks
  - Language translation
  - Error checking and reporting
  - Performance improvement
Fundamental compilation principles
  - The compiler must preserve the meaning of the source program
  - The compiler must improve the source program in some discernible way

Compilers Evolution
In the beginning, there was machine language
  - Ugly: writing code, debugging
Then came textual assembly
  - Still used on DSPs
High-level languages
  - Fortran, Pascal, C, C++
  - Machine structures became too complex and software management too difficult to continue with low-level languages

Why are Compilers Important?
Computer architecture
  - Build processors that software can be automatically mapped to efficiently
  - Exploiting hardware features
CAD tools
  - Behavioral synthesis / C-to-gates tools are hardware compilers
  - Use program analysis/optimization to generate cheaper hardware
Software developers
  - How do I create a compiler?
  - How does it map my code to the hardware?

Compiler Architecture
In more detail:
Source Language -> Front End (language specific) -> Intermediate Language -> Back End (machine specific) -> Target Language
  - Separation of concerns
  - Retargeting

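Retargeting can be illustrated with one intermediate program fed to two different back ends. The following is a minimal sketch, not a real compiler: the tuple-based IR, the two function names, and both "target languages" (a three-operand register machine and a stack machine) are hypothetical and chosen only to show that the front end's output does not change when the target does.

```python
# Hypothetical intermediate code for: index := start + step * 20
IR = [
    ("mul", "t1", "step", "20"),
    ("add", "t2", "start", "t1"),
    ("copy", "index", "t2"),
]


def emit_register_machine(ir):
    """Back end 1: made-up three-operand register-style assembly."""
    out = []
    for instr in ir:
        if instr[0] == "copy":
            _, dst, src = instr
            out.append(f"MOV {dst}, {src}")
        else:
            op, dst, a, b = instr
            out.append(f"{op.upper()} {dst}, {a}, {b}")
    return out


def emit_stack_machine(ir):
    """Back end 2: made-up stack-machine code for the same IR."""
    out = []
    for instr in ir:
        if instr[0] == "copy":
            _, dst, src = instr
            out += [f"push {src}", f"pop {dst}"]
        else:
            op, dst, a, b = instr
            out += [f"push {a}", f"push {b}", op, f"pop {dst}"]
    return out


register_code = emit_register_machine(IR)
stack_code = emit_stack_machine(IR)
print("\n".join(register_code))
print("\n".join(stack_code))
```

Adding a new target in this sketch means writing one more `emit_*` function, while everything that produces `IR` stays untouched.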
Compiler Architecture
Source language
-> Scanner (lexical analysis) -> tokens
-> Parser (syntax analysis) -> syntactic structure
-> Semantic Analysis (IC generator) -> Intermediate Language
-> Code Optimizer -> Intermediate Language
-> Code Generator
-> Target language
Symbol Table (used across the phases)

Translation of an Assignment Statement
[Figure: translation of an assignment statement]

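As a rough illustration of what translating an assignment can look like, the sketch below lowers a syntax tree for `index := start + step * 20` (the statement used on the scanning and parsing slides) into three-address code. The nested-tuple tree encoding, the function name `lower_assignment`, and the temporary names `t1`, `t2` are assumptions made for this example.

```python
import itertools


def lower_assignment(target, expr):
    """Lower an expression tree to three-address code.

    expr is a nested tuple: ("id", name), ("num", value), or (op, left, right).
    """
    code = []
    temps = (f"t{i}" for i in itertools.count(1))  # fresh temporary names

    def visit(node):
        if node[0] in ("id", "num"):
            return str(node[1])
        op, left, right = node
        left_place = visit(left)
        right_place = visit(right)
        temp = next(temps)
        code.append(f"{temp} = {left_place} {op} {right_place}")
        return temp

    result = visit(expr)
    code.append(f"{target} = {result}")
    return code


tree = ("+", ("id", "start"), ("*", ("id", "step"), ("num", 20)))
tac = lower_assignment("index", tree)
print("\n".join(tac))
```

Each interior node of the tree yields one instruction with at most one operator, which is the defining property of three-address code.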
Lexical Analysis
Character stream -> token stream
  - Recognize words of a language
  - Theoretical problem: specify and recognize patterns in strings
  - Scanner as a practical application
  - Regular expressions, finite automata
  - Tools that automatically generate scanners are commonly used
Input: index := start + step * 20
After scanning: index := start + step * 20, with each token classified as identifier, operator, or number

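A scanner for the example statement can be sketched with regular expressions. This is a hand-written illustration using Python's standard `re` module, not the output of a scanner generator; the token class names follow the slide (identifier, operator, number), while `TOKEN_SPEC`, `MASTER`, and `scan` are names invented here.

```python
import re

# One regular expression per token class; "skip" covers whitespace.
TOKEN_SPEC = [
    ("number", r"\d+"),
    ("identifier", r"[A-Za-z_]\w*"),
    ("operator", r":=|[+\-*/]"),
    ("skip", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def scan(text):
    """Turn a character stream into a list of (token_class, lexeme) tuples."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "skip":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


tokens = scan("index := start + step * 20")
for kind, lexeme in tokens:
    print(f"{lexeme:6} {kind}")
```

Generated scanners typically compile such patterns into a finite automaton ahead of time; the sketch leaves that work to the regular-expression engine.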
Syntactical Analysis
Token stream -> syntax tree
  - Recognize sentences of a language
  - Grammars and parsers: context-free grammars (CFG)
  - Parsers can be automatically generated
  - Top-down and bottom-up parsing
  - Predictive parsing
  - The driving process of compiler front ends
After scanning: index := start + step * 20
After parsing:
Assign
  ID index
  :=
  Exp
    Exp
      ID start
    +
    Exp
      Exp
        ID step
      *
      Exp
        Num 20

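The step from token stream to syntax tree can be sketched as a small top-down (recursive-descent, predictive) parser. The grammar assumed here is a guess that fits the example statement: `Assign -> ID := Exp`, `Exp -> Term (+ Term)*`, `Term -> Factor (* Factor)*`, `Factor -> ID | Num`. The tuple-based tree and all function names are illustrative assumptions.

```python
def parse_assignment(tokens):
    """Parse a list of (token_class, lexeme) tuples into a nested-tuple syntax tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ("eof", "")

    def expect(kind, text=None):
        nonlocal pos
        tok = peek()
        if tok[0] != kind or (text is not None and tok[1] != text):
            raise SyntaxError(f"expected {text or kind}, got {tok[1]!r}")
        pos += 1
        return tok

    def factor():
        if peek()[0] == "identifier":
            return ("ID", expect("identifier")[1])
        return ("Num", int(expect("number")[1]))

    def term():
        node = factor()
        while peek() == ("operator", "*"):
            expect("operator", "*")
            node = ("*", node, factor())
        return node

    def expr():
        node = term()
        while peek() == ("operator", "+"):
            expect("operator", "+")
            node = ("+", node, term())
        return node

    target = expect("identifier")[1]
    expect("operator", ":=")
    tree = ("Assign", ("ID", target), expr())
    expect("eof")  # reject trailing tokens
    return tree


tokens = [
    ("identifier", "index"), ("operator", ":="), ("identifier", "start"),
    ("operator", "+"), ("identifier", "step"), ("operator", "*"), ("number", "20"),
]
tree = parse_assignment(tokens)
print(tree)
```

Splitting `Exp` into `term` and `factor` is what makes `*` bind tighter than `+`, so `step * 20` ends up as a subtree of the addition, as in the tree on the slide.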
Semantic Analysis
The semantic analyzer uses the syntax tree and the information in the symbol table to check the source program for semantic consistency with the language definition.
It gathers type information and saves it in either the syntax tree or the symbol table, for subsequent use during intermediate-code generation.
An important part of semantic analysis is type checking, where the compiler checks that each operator has matching operands.
  - For example, many programming language definitions require an array index to be an integer; the compiler must report an error if a floating-point number is used to index an array.

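The array-index rule can be sketched as a check against declared types. In this illustration the symbol table is a plain dictionary and the type names (`"array"`, `"int"`, `"float"`) and the function `check_array_index` are hypothetical; a real compiler would attach richer type information to the syntax tree or symbol table.

```python
def check_array_index(array_name, index_name, symbols):
    """Return a list of semantic error messages for the access array_name[index_name]."""
    errors = []
    if symbols.get(array_name) != "array":
        errors.append(f"'{array_name}' is not an array")
    if symbols.get(index_name) != "int":
        errors.append(
            f"array index '{index_name}' has type {symbols.get(index_name)}, expected int"
        )
    return errors


# Hypothetical declarations gathered earlier in compilation.
symbols = {"a": "array", "i": "int", "x": "float"}

ok_errors = check_array_index("a", "i", symbols)   # a[i]: integer index
bad_errors = check_array_index("a", "x", symbols)  # a[x]: floating-point index
print(ok_errors)
print(bad_errors)
```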
Semantic Analysis
The language specification may permit some type conversions called coercions.
For example, a binary arithmetic operator may be applied to either a pair of integers or to a pair of floating-point numbers.
If the operator is applied to a floating-point number and an integer, the compiler may convert or coerce the integer into a floating-point number.

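A coercion rule of this kind can be sketched as a function that types a binary arithmetic expression and wraps the integer operand in an explicit conversion when the operand types are mixed. The function name `type_binary`, the textual expression encoding, and the conversion name `inttofloat` are assumptions made for illustration.

```python
def type_binary(op, left, right):
    """Type a binary arithmetic expression.

    left and right are (expression_text, type) pairs; the result is a pair of
    the same shape, with an explicit int-to-float conversion inserted if needed.
    """
    (l_expr, l_type), (r_expr, r_type) = left, right
    if l_type == r_type and l_type in ("int", "float"):
        return (f"({l_expr} {op} {r_expr})", l_type)
    if {l_type, r_type} == {"int", "float"}:
        # Mixed operands: coerce the integer side to floating point.
        if l_type == "int":
            l_expr = f"inttofloat({l_expr})"
        else:
            r_expr = f"inttofloat({r_expr})"
        return (f"({l_expr} {op} {r_expr})", "float")
    raise TypeError(f"operator {op} cannot be applied to {l_type} and {r_type}")


same = type_binary("+", ("i", "int"), ("j", "int"))
mixed = type_binary("*", ("step", "float"), ("20", "int"))
print(same)
print(mixed)
```

Making the conversion an explicit node keeps later phases simple: after semantic analysis, every arithmetic operator sees operands of a single type.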
Semantic Analysis
Understand/annotate meaning of the program
  - Syntax-directed translation
Check semantic errors
  - Inconsistent variable definitions and uses
  - Type systems
Collect knowledge of the input program
  - Symbol tables
  - Scopes

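Symbol tables and scopes are often implemented as a stack of dictionaries, one per open scope. The class below is a minimal sketch under that assumption; the class and method names are invented for this example, and it reports inconsistent definitions and uses by raising `NameError`.

```python
class SymbolTable:
    """A stack of scopes; the innermost scope is the last list element."""

    def __init__(self):
        self.scopes = [{}]  # start with the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, type_name):
        if name in self.scopes[-1]:
            raise NameError(f"'{name}' is already declared in this scope")
        self.scopes[-1][name] = type_name

    def lookup(self, name):
        # Search from the innermost scope outward.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise NameError(f"'{name}' is used but not declared")


table = SymbolTable()
table.declare("index", "int")
table.enter_scope()
table.declare("index", "float")      # shadows the outer declaration
inner_type = table.lookup("index")
table.exit_scope()
outer_type = table.lookup("index")
print(inner_type, outer_type)
```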
Compiler Architecture
Source language
-> Scanner (lexical analysis) -> tokens
-> Parser (syntax analysis) -> syntactic structure
-> Semantic Analysis (IC generator) -> Intermediate Language
-> Code Optimizer -> Intermediate Language
-> Code Generator
-> Target language
Symbol Table (used across the phases)

General Structure of a Modern Compiler
Source Program
-> Lexical Analysis (Scanner)
-> Syntax Analysis (Parser)
-> Semantic Analysis (build high-level IR)
-> High-level IR to low-level IR conversion
-> Optimization (control flow / dataflow)
-> Code Generation (machine-independent asm to machine-dependent)
-> Assembly Code
Front end: lexical, syntax, and semantic analysis
Back end: optimization and code generation
Side annotations: Context, Symbol Table, CFG

Multiple IRs
Most compilers use 2 IRs:
  - High-level IR (HIR): language independent but closer to the language
  - Low-level IR (LIR): machine independent but closer to the machine
A significant part of the compiler is both language and machine independent!
Source languages (C++, C, Fortran) -> AST (optimize) -> HIR (optimize) -> LIR (optimize) -> targets (Pentium, Java bytecode, Itanium, TI C5x, ARM)

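One example of an optimization that depends on neither the source language nor the target machine is constant folding. The sketch below applies it to a tree-shaped IR encoded as nested tuples; the encoding, the `FOLDABLE` table, and the function name `fold` are illustrative assumptions, and real compilers run many such passes at each IR level.

```python
import operator

# Operators the pass knows how to evaluate at compile time.
FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(node):
    """Replace operator nodes whose operands are both constants with the result."""
    if node[0] in ("id", "num"):
        return node
    op, left, right = node
    left, right = fold(left), fold(right)
    if op in FOLDABLE and left[0] == "num" and right[0] == "num":
        return ("num", FOLDABLE[op](left[1], right[1]))
    return (op, left, right)


# start + 4 * 5: the multiplication has only constant operands.
before = ("+", ("id", "start"), ("*", ("num", 4), ("num", 5)))
after = fold(before)
print(after)
```

The folded tree computes the same value as the original, which is the "preserve the meaning" requirement from the fundamental compilation principles, while doing less work at run time.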
Thank You