COP4020 Programming Languages
Compilers and Interpreters
Prof. Robert van Engelen

Overview
  Common compiler and interpreter configurations
  Virtual machines
  Integrated development environments
  Compiler phases
    Lexical analysis
    Syntax analysis
    Semantic analysis
    Intermediate (machine-independent) code generation
    Intermediate code optimization
    Target (machine-dependent) code generation
    Target code optimization
COP4020 Fall 2016

Compilers versus Interpreters
  The distinction between a compiler and an interpreter implementation is often fuzzy
    One can view an interpreter as a virtual machine that executes high-level code
    Java is compiled to bytecode
    Java bytecode is interpreted by the Java virtual machine (JVM) or translated to machine code by a just-in-time compiler (JIT)
    A processor (CPU) can be viewed as an implementation in hardware of a virtual machine (e.g. bytecode can be executed in hardware)
  Some programming languages cannot be purely compiled into machine code alone
    Some languages allow programs to rewrite/add code to the code base dynamically
    Some languages allow programs to translate data to code for execution (interpretation)

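As a rough illustration of "translating data to code for execution", the sketch below uses Python, which itself compiles source to bytecode for a virtual machine. The expression string and the variable name x are made up for the example; compile, eval, and dis.get_instructions are standard Python facilities.

```python
import dis

# A string is plain data until the program asks for it to be translated.
source = "x * 2 + 1"                          # hypothetical expression held as data

# Translate the data to bytecode at run time, then let the VM interpret it.
code = compile(source, "<string>", "eval")
result = eval(code, {"x": 20})

# The opcode names the virtual machine executes (they vary by Python version).
opnames = [ins.opname for ins in dis.get_instructions(code)]

print(result)
print(opnames)
```

A purely ahead-of-time compiled program could only do this by shipping a translator or interpreter alongside the machine code.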
Compilers versus Interpreters
  Compilers try to be as smart as possible, fixing every decision that can be made at compile time to avoid generating code that makes the decision at run time
    Type checking at compile time vs. run time
    Static allocation
    Static linking
    Code optimization
  Compilation leads to better performance in general
    Allocation of variables without variable lookup at run time
    Aggressive code optimization to exploit hardware features
  Interpretation facilitates interactive debugging and testing
    Interpretation leads to better diagnostics of a programming problem
    Procedures can be invoked from the command line by a user
    Variable values can be inspected and modified by a user

Compilation
  Compilation is the conceptual process of translating source code into a CPU-executable binary target code
  Compiler runs on the same platform X as the target code
  Diagram: Source -> Compiler -> Target (compile on X, debug on X); Input -> Target -> Output (run on X)

Cross Compilation
  Compiler runs on platform X, target code runs on platform Y
  Diagram: Source -> Cross Compiler -> Target (compile on X; debug on X = emulate Y); copy Target to Y; Input -> Target -> Output (run on Y)

Interpretation
  Interpretation is the conceptual process of running high-level code by an interpreter
  Diagram: Source and Input -> Interpreter -> Output

Virtual Machines
  A virtual machine executes an instruction stream in software
  Adopted by Pascal, Java, Smalltalk-80, C#, functional and logic languages, and some scripting languages
    Pascal compilers generate P-code that can be interpreted or compiled into object code
    Java compilers generate bytecode that is interpreted by the Java virtual machine (JVM)
    The JVM may translate bytecode into machine code by just-in-time (JIT) compilation

Compilation and Execution on Virtual Machines
  Compiler generates intermediate program
  Virtual machine interprets the intermediate program
  Diagram: Source -> Compiler -> Intermediate (compile on X); Input -> Virtual Machine running Intermediate -> Output (run on VM; run on X, Y, Z, ...)

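To make "a virtual machine interprets the intermediate program" concrete, here is a minimal sketch of a stack machine. The instruction set (push, load, store, read, write, sub, gt, ne, jump, jump_if_false) is invented for this example and is not P-code or JVM bytecode; the GCD listing is a hand translation of the gcd program used later in these slides.

```python
def run(program, inputs):
    """Interpret a list of (opcode, operand) pairs on a small stack machine."""
    stack, variables, output = [], {}, []
    inputs = list(inputs)
    pc = 0                                    # index of the next instruction
    while pc < len(program):
        op, arg = program[pc]
        pc += 1
        if op == "push":
            stack.append(arg)
        elif op == "load":
            stack.append(variables[arg])
        elif op == "store":
            variables[arg] = stack.pop()
        elif op == "read":
            variables[arg] = inputs.pop(0)
        elif op == "write":
            output.append(stack.pop())
        elif op in ("sub", "gt", "ne"):
            right, left = stack.pop(), stack.pop()
            if op == "sub":
                stack.append(left - right)
            elif op == "gt":
                stack.append(left > right)
            else:
                stack.append(left != right)
        elif op == "jump":
            pc = arg
        elif op == "jump_if_false":
            if not stack.pop():
                pc = arg
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return output


# Hypothetical intermediate program for the gcd example (subtraction method).
GCD = [
    ("read", "i"), ("read", "j"),                    # 0-1
    ("load", "i"), ("load", "j"), ("ne", None),      # 2-4   while i <> j
    ("jump_if_false", 20),                           # 5
    ("load", "i"), ("load", "j"), ("gt", None),      # 6-8   if i > j
    ("jump_if_false", 15),                           # 9
    ("load", "i"), ("load", "j"), ("sub", None),     # 10-12 i := i - j
    ("store", "i"), ("jump", 2),                     # 13-14
    ("load", "j"), ("load", "i"), ("sub", None),     # 15-17 j := j - i
    ("store", "j"), ("jump", 2),                     # 18-19
    ("load", "i"), ("write", None),                  # 20-21 writeln (i)
]

print(run(GCD, [12, 18]))
```

The same GCD list runs unchanged wherever the run function is available, which is the portability argument behind "Run on X, Y, Z".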
Pure Compilation and Static Linking
  Adopted by typical Fortran systems
  Library routines are separately linked (merged) with the object code of the program
  Diagram: Source -> Compiler -> Incomplete Object Code (extern printf();); Incomplete Object Code + Static Library Object Code (_printf, _fget, _fscan) -> Linker -> Binary Executable

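The merging step can be pictured as symbol resolution. The sketch below is a toy model, not how any particular linker is implemented: object code is represented as dictionaries of symbol names, the code strings are placeholders, and the _fputc routine is a hypothetical dependency added only to show transitive resolution.

```python
def link(program_obj, library):
    """Merge library routines into the program until no symbol is undefined."""
    defined = dict(program_obj["defined"])
    pending = list(program_obj["undefined"])
    while pending:
        symbol = pending.pop()
        if symbol in defined:
            continue                          # already resolved
        if symbol not in library:
            raise NameError(f"undefined symbol: {symbol}")
        routine = library[symbol]
        defined[symbol] = routine["code"]
        pending.extend(routine["undefined"])  # routines may need other routines
    return defined


# Incomplete object code: _main is defined, _printf is only declared (extern).
program_obj = {"defined": {"_main": "<main code>"}, "undefined": ["_printf"]}

# Toy static library; _fputc is a made-up helper used by _printf.
library = {
    "_printf": {"code": "<printf code>", "undefined": ["_fputc"]},
    "_fputc": {"code": "<fputc code>", "undefined": []},
    "_fscan": {"code": "<fscan code>", "undefined": []},
}

executable = link(program_obj, library)
print(sorted(executable))
```

In this model only the routines that are actually referenced end up in the executable, so _fscan is left out.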
Compilation, Assembly, and Static Linking
  Facilitates debugging of the compiler
  Diagram: Source -> Compiler -> Assembly (extern printf();) -> Assembler -> Linker -> Binary Executable; Static Library Object Code (_printf, _fget, _fscan) also feeds the Linker

Compilation, Assembly, and Dynamic Linking
  Dynamic libraries (DLL, .so, .dylib) are linked at run time by the OS (via stubs in the executable)
  Diagram: Source (extern printf();) -> Compiler -> Assembly -> Assembler -> Incomplete Executable; Shared Dynamic Libraries (_printf, _fget, _fscan, ...) are attached at run time; Input -> Incomplete Executable -> Output

Preprocessing
  Most C and C++ compilers use a preprocessor to import header files and expand macros
  Diagram: Source -> Preprocessor -> Modified Source -> Compiler -> Assembly or Object Code
    Source: #include <stdio.h>  #define N 99  for (i=0; i<N; i++)
    Modified Source: for (i=0; i<99; i++)

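Macro expansion is a purely textual rewrite, which the following sketch mimics for the slide's example. It is a simplification under stated assumptions: only object-like #define macros with a value are handled, other directives such as #include are dropped rather than expanded, and string literals are not treated specially. The function name preprocess is made up for the example.

```python
import re


def preprocess(source):
    """Expand object-like #define macros; other directives are dropped."""
    macros, output = {}, []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("#define"):
            _, name, value = stripped.split(None, 2)
            macros[name] = value
        elif stripped.startswith("#"):
            continue      # e.g. #include: a real preprocessor pastes the header here
        else:
            # Replace whole identifiers only, so N does not match inside NAME.
            output.append(re.sub(r"[A-Za-z_]\w*",
                                 lambda m: macros.get(m.group(0), m.group(0)),
                                 line))
    return "\n".join(output)


print(preprocess("#include <stdio.h>\n#define N 99\nfor (i=0; i<N; i++)"))
```

The compiler proper never sees N; it only receives the modified source with 99 in its place.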
The CPP Preprocessor
  Early C++ compilers used the CPP preprocessor to generate C code for compilation
  Diagram: C++ Source Code -> C++ Preprocessor -> C Source Code -> C Compiler -> Assembly or Object Code

Integrated Development Environments
  Programming tools function together in concert
    Editors
    Compilers/preprocessors/interpreters
    Debuggers
    Emulators
    Assemblers
    Linkers
  Advantages
    Tools and compilation stages are hidden
    Automatic source-code dependency checking
    Debugging made simpler
    Editor with search facilities
  Examples
    Smalltalk-80, Eclipse, MS Visual Studio, Borland

Compilation Phases and Passes
  Compilation of a program proceeds through a fixed series of phases
    Each phase uses an (intermediate) form of the program produced by an earlier phase
    Subsequent phases operate on lower-level code representations
  Each phase may consist of a number of passes over the program representation
    Pascal, FORTRAN, and C were designed for one-pass compilation, which explains the need for function prototypes
    Single-pass compilers need less memory to operate
    Java and Ada are multi-pass

Compiler Front- and Back-end
  Front end (analysis)
    Source program (character stream)
    -> Scanner (lexical analysis) -> Tokens
    -> Parser (syntax analysis) -> Parse tree
    -> Semantic analysis and intermediate code generation -> Abstract syntax tree or other intermediate form
  Back end (synthesis)
    Abstract syntax tree or other intermediate form
    -> Machine-independent code improvement -> Modified intermediate form
    -> Target code generation -> Assembly or object code
    -> Machine-specific code improvement -> Modified assembly or object code

Scanner: Lexical Analysis
  Lexical analysis breaks up a program into tokens
  Source program:
    program gcd (input, output);
    var i, j : integer;
    begin
      read (i, j);
      while i <> j do
        if i > j then i := i - j
        else j := j - i;
      writeln (i)
    end.
  Tokens:
    program gcd ( input , output ) ;
    var i , j : integer ;
    begin read ( i , j ) ;
    while i <> j do if i > j then i := i - j else j := j - i ;
    writeln ( i ) end .

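A scanner of this kind can be sketched with regular expressions. The version below is a minimal illustration that covers only what the gcd program needs: the keyword set, the token kinds (keyword, id, num, sym), and the function name scan are choices made for this example, and comments, strings, and real numbers are not handled.

```python
import re

# Keywords of the gcd example; a full Pascal scanner would list many more.
KEYWORDS = {"program", "var", "begin", "end", "while", "do", "if", "then", "else"}

TOKEN_RE = re.compile(r"""
    (?P<space>\s+)                          # whitespace separates tokens
  | (?P<num>\d+)                            # integer literal
  | (?P<word>[A-Za-z_]\w*)                  # keyword or identifier
  | (?P<sym>:=|<>|<=|>=|[-+*/<>=();:,.])    # operators and punctuation
""", re.VERBOSE)


def scan(text):
    """Break a character stream into (kind, text) tokens."""
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"illegal character {text[pos]!r} at offset {pos}")
        pos = match.end()
        kind, lexeme = match.lastgroup, match.group()
        if kind == "space":
            continue                          # whitespace produces no token
        if kind == "word":
            kind = "keyword" if lexeme in KEYWORDS else "id"
        tokens.append((kind, lexeme))
    return tokens


GCD_SOURCE = """program gcd (input, output);
var i, j : integer;
begin
  read (i, j);
  while i <> j do
    if i > j then i := i - j
    else j := j - i;
  writeln (i)
end."""

print(" ".join(lexeme for _, lexeme in scan(GCD_SOURCE)))
```

Listing the two-character symbols (:= and <>) before the single-character class matters: otherwise <> would be split into < and >.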
Context-Free Grammars
  A context-free grammar defines the syntax of a programming language
  The syntax defines the syntactic categories for language constructs
    Statements
    Expressions
    Declarations
  Categories are subdivided into more detailed categories
    A Statement is a
      For-statement
      If-statement
      Assignment
  <statement> ::= <for-statement> | <if-statement> | <assignment>
  <for-statement> ::= for ( <expression> ; <expression> ; <expression> ) <statement>
  <assignment> ::= <identifier> := <expression>

Example: Micro Pascal
  <Program> ::= program <id> ( <id> <More_ids> ) ; <Block> .
  <Block> ::= <Variables> begin <Stmt> <More_Stmts> end
  <More_ids> ::= , <id> <More_ids> | ε
  <Variables> ::= var <id> <More_ids> : <Type> ; <More_Variables> | ε
  <More_Variables> ::= <id> <More_ids> : <Type> ; <More_Variables> | ε
  <Stmt> ::= <id> := <Exp>
           | if <Exp> then <Stmt> else <Stmt>
           | while <Exp> do <Stmt>
           | begin <Stmt> <More_Stmts> end
  <Exp> ::= <num> | <id> | <Exp> + <Exp> | <Exp> - <Exp>

Parser: Syntax Analysis
  Parsing organizes tokens into a hierarchy called a parse tree (more about this later)
  Essentially, a grammar of a language defines the structure of the parse tree, which in turn describes the program structure
  A syntax error is produced by a compiler when the parse tree cannot be constructed for a program

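As a small illustration, the sketch below parses the <Exp> category of Micro Pascal from a list of token strings and raises a syntax error when no tree can be built. Two choices here are assumptions of the example, not part of the grammar as given: the left-recursive, ambiguous rule is read as left-associative and handled with a loop, and trees are represented as nested tuples. Keywords are not distinguished from identifiers.

```python
def parse_exp(tokens):
    """Parse <Exp> ::= <num> | <id> | <Exp> + <Exp> | <Exp> - <Exp>."""
    pos = 0

    def operand():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("expected <num> or <id>, found end of input")
        token = tokens[pos]
        if token.isdigit():
            pos += 1
            return ("num", int(token))
        if token.isidentifier():
            pos += 1
            return ("id", token)
        raise SyntaxError(f"expected <num> or <id>, found {token!r}")

    tree = operand()
    # Each further "+ operand" or "- operand" becomes the new root, so the
    # tree grows to the left: i - j + 1 is read as (i - j) + 1.
    while pos < len(tokens) and tokens[pos] in ("+", "-"):
        operator = tokens[pos]
        pos += 1
        tree = (operator, tree, operand())
    if pos < len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree


print(parse_exp(["i", "-", "j", "+", "1"]))
```

An input such as i - with nothing after the operator has no parse tree under the grammar, so parse_exp reports a syntax error instead of returning a tree.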
Semantic Analysis
  Semantic analysis is applied by a compiler to discover the meaning of a program by analyzing its parse tree or abstract syntax tree
  Static semantic checks are performed at compile time
    Type checking
    Every variable is declared before it is used
    Identifiers are used in appropriate contexts
    Check subroutine call arguments
    Check labels
  Dynamic semantic checks are performed at run time, and the compiler produces code that performs these checks
    Array subscript values are within bounds
    Arithmetic errors, e.g. division by zero
    Pointers are not dereferenced unless pointing to a valid object
    A variable is used but hasn't been initialized
    When a check fails at run time, an exception is raised

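The difference between the two kinds of checks can be sketched as follows. Both functions are illustrative assumptions: check_declared walks an expression tree of nested tuples (a representation chosen for this example) before the program runs, while checked_index stands in for the code a compiler might emit around an array access.

```python
def check_declared(tree, declared):
    """Static check: list identifiers used in the tree but never declared."""
    kind = tree[0]
    if kind == "num":
        return []
    if kind == "id":
        return [] if tree[1] in declared else [tree[1]]
    errors = []
    for child in tree[1:]:                    # operator node: check operands
        errors += check_declared(child, declared)
    return errors


def checked_index(array, index):
    """Dynamic check: performed at run time, every time the access executes."""
    if not 0 <= index < len(array):
        raise IndexError(f"subscript {index} out of bounds 0..{len(array) - 1}")
    return array[index]


# Static: k was never declared, which is known without running the program.
print(check_declared(("-", ("id", "i"), ("id", "k")), {"i", "j"}))

# Dynamic: whether the subscript is valid depends on the run-time value.
print(checked_index([10, 20, 30], 2))
```

The static check costs nothing at run time; the dynamic check costs a comparison on each access, which is the performance trade-off behind strong typing.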
Semantic Analysis and Strong Typing
  A language is strongly typed "if (type) errors are always detected"
    Errors are either detected at compile time or at run time
    Examples of such errors are listed on the previous slide
    Languages that are strongly typed are Ada, Java, ML, Haskell
    Languages that are not strongly typed are Fortran, Pascal, C/C++, Lisp
  Strong typing makes a language safe and easier to use, but potentially slower because of dynamic semantic checks
  In some languages, most (type) errors are detected late at run time, which is detrimental to reliability, e.g. early Basic, Lisp, Prolog, some scripting languages

Code Generation and Intermediate Code Forms
  A typical intermediate form of code produced by the semantic analyzer is an abstract syntax tree (AST)
  The AST is annotated with useful information such as pointers to the symbol table entry of identifiers
  Figure: example AST for the gcd program in Pascal

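One way to picture an annotated AST is shown below for the single statement i := i - j. The class names (Symbol, Var, BinOp, Assign) and the dictionary used as a symbol table are assumptions of this sketch, not a prescribed layout; the point is only that every occurrence of an identifier refers to the same symbol table entry.

```python
from dataclasses import dataclass


@dataclass
class Symbol:
    name: str
    type: str                 # filled in from the declaration


@dataclass
class Var:
    symbol: Symbol            # pointer to the symbol table entry


@dataclass
class BinOp:
    op: str
    left: object
    right: object


@dataclass
class Assign:
    target: Var
    value: object


# Symbol table built from: var i, j : integer;
symbols = {name: Symbol(name, "integer") for name in ("i", "j")}

# AST for: i := i - j
stmt = Assign(Var(symbols["i"]),
              BinOp("-", Var(symbols["i"]), Var(symbols["j"])))

print(stmt.target.symbol is stmt.value.left.symbol)
print(stmt.value.right.symbol.type)
```

Because both uses of i share one Symbol object, later phases can look up the type or storage location of a variable directly from any node that mentions it.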
Target Code Generation and Optimization
  The AST with the annotated information is traversed by the compiler to generate a low-level intermediate form of code, close to assembly
  This machine-independent intermediate form is optimized
  From the machine-independent form, assembly or object code is generated by the compiler
  This machine-specific code is optimized to exploit specific hardware features
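The traversal and one machine-independent improvement can be sketched together. Assumptions of this example: expressions are nested tuples, the improvement shown is constant folding (one of many possible optimizations, and applied here to the tree rather than to the low-level form), and the emitted instructions (push, load, add, sub) belong to a made-up stack machine rather than to real assembly.

```python
def fold(tree):
    """Machine-independent improvement: evaluate constant subexpressions."""
    if tree[0] in ("num", "id"):
        return tree
    op, left, right = tree[0], fold(tree[1]), fold(tree[2])
    if left[0] == "num" and right[0] == "num":
        value = left[1] + right[1] if op == "+" else left[1] - right[1]
        return ("num", value)
    return (op, left, right)


def generate(tree):
    """Traverse the tree in postorder and emit stack-machine instructions."""
    if tree[0] == "num":
        return [("push", tree[1])]
    if tree[0] == "id":
        return [("load", tree[1])]
    opcode = {"+": "add", "-": "sub"}[tree[0]]
    return generate(tree[1]) + generate(tree[2]) + [(opcode, None)]


# i - (2 + 3): the constant part can be computed at compile time.
tree = ("-", ("id", "i"), ("+", ("num", 2), ("num", 3)))

print(generate(tree))          # five instructions without the improvement
print(generate(fold(tree)))    # three instructions after folding
```

Folding moves the addition from run time to compile time, so the generated code is shorter and does less work each time it executes.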