COMP455: COMPILER AND LANGUAGE DESIGN Dr. Alaa Aljanaby University of Nizwa Spring 2013
Chapter 1: Introduction
Compilers draw together the theory and techniques you have learned in most of your previous computer science courses. You will gain a deeper understanding of how compilers work and become able to write better code. We will focus on a small language: you will write a simple compiler, or perhaps parts of one.
Chapter 1: introduction ٢
Compilers and Interpreters
Compiler: Source program -> Compiler -> Target program; then Input -> Target program -> Output
Interpreter: Source program + Input -> Interpreter -> Output
The compiler
A compiler is a program that reads a program written in one language (the source language) and translates it into an equivalent program in another language (the target language). An important role of the compiler is to report any errors it detects during the translation process.
The interpreter
An interpreter is another kind of language processor. Instead of producing a target program, it directly executes the operations specified in the source program on inputs supplied by the user.
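The contrast with a compiler can be made concrete with a minimal interpreter sketch in Python. It assumes a hypothetical toy language in which every statement has the form "name = operand op operand" with op being + or *; the function name interpret is illustrative only. Each statement is executed as soon as it is read, using values the user supplies, and no target program is ever produced.

```python
def interpret(source: str, inputs: dict[str, float]) -> dict[str, float]:
    """Execute each statement as soon as it is read; no target program is built."""
    env = dict(inputs)  # variable values supplied by the user at run time
    for line in source.strip().splitlines():
        target, expr = (part.strip() for part in line.split("="))
        left, op, right = expr.split()
        # An operand is either a known variable or a numeric constant.
        a = env[left] if left in env else float(left)
        b = env[right] if right in env else float(right)
        if op == "+":
            env[target] = a + b
        elif op == "*":
            env[target] = a * b
        else:
            raise SyntaxError(f"unsupported operator {op!r}")
    return env


# Example: the user supplies speed and time when the program runs.
program = """
t = 10 * time
speed = speed + t
"""
print(interpret(program, {"speed": 2.0, "time": 3.0})["speed"])  # prints 32.0
```

A compiler would instead translate the same statements once into target code that can then be run on many different inputs.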
Language Processing System
The structure of the compiler
A compiler operates in a sequence of phases; each phase transforms the source program from one representation to another.
Structure of a Compiler
character stream -> Lexical Analysis -> token stream -> Parsing -> syntax tree -> Semantic Analysis -> syntax tree -> Intermediate Code Generation -> intermediate code -> Optimization -> intermediate code -> Code Generation -> target machine code
The Symbol Table is used throughout these phases, which are grouped into a Front End and a Back End.
These steps are often done in phases or passes. This structure is very common. Each step is a set of algorithms we'll explore.
Analysis Synthesis Model
Lexical Analysis
character stream -> Lexical Analysis -> token stream
The lexical analyzer reads the character stream and converts it into a stream of tokens. A sequence of characters forming a meaningful unit, called a lexeme, becomes a token: we are recognizing the substrings that are meaningful. What is meaningful in this statement?
speed = speed + 10 * time
This is similar to recognizing the words in a sentence.
Lexical Analysis
The lexical analyzer determines the lexemes and their tokens. Things that become lexemes: punctuation, symbols, keywords, constants, etc.
The tool lex generates lexical analyzers.
Lexemes for this string: speed = speed + 10 * time
We'll convert each of these into a token of the form <name, value>. Sometimes the value is omitted.
speed becomes <id, 1>, where id means this is a symbol (identifier) and 1 is its location in the symbol table.
10 becomes <constant, 10> (or just <10>).
Symbol Table:
Location | Name
1 | speed
Lexemes for this string: speed = speed + 10 * time
Lexical analysis produces the token stream:
<id, 1> <=> <id, 1> <+> <10> <*> <id, 2>
Symbol Table:
Location | Name
1 | speed
2 | time
Lexical Table:
Lexeme | Token | Symbol Table Entry
speed | id | 1
= | ass | -
speed | id | 1
+ | opr | -
10 | num | -
* | opr | -
time | id | 2
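The scanning step can be sketched in Python with a regular-expression lexer, loosely in the spirit of the patterns a lex specification lists. This is a minimal sketch under assumptions: only identifiers, integer constants and the operators =, + and * are recognized; operators become tokens named after themselves (matching the <=>, <+>, <*> notation rather than the ass/opr names in the table); and the names TOKEN_SPEC and tokenize are hypothetical.

```python
import re

# Token patterns, tried in order; the names are illustrative.
TOKEN_SPEC = [
    ("num", r"\d+"),
    ("id", r"[A-Za-z_]\w*"),
    ("op", r"[=+*]"),
    ("ws", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Return (tokens, symbol_table); each token is a (name, value) pair."""
    symtab = []  # symtab[i] holds the name stored at location i + 1
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "ws":
            continue  # whitespace separates lexemes but yields no token
        if kind == "id":
            if lexeme not in symtab:
                symtab.append(lexeme)  # first occurrence: new symbol-table entry
            tokens.append(("id", symtab.index(lexeme) + 1))
        elif kind == "num":
            tokens.append(("num", int(lexeme)))
        else:
            tokens.append((lexeme, None))  # operators carry no value
    return tokens, symtab


tokens, symtab = tokenize("speed = speed + 10 * time")
print(tokens)  # [('id', 1), ('=', None), ('id', 1), ('+', None), ('num', 10), ('*', None), ('id', 2)]
print(symtab)  # ['speed', 'time']
```

Note that both occurrences of speed map to the same symbol-table entry, which is why they share location 1.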
Syntax Analysis
token stream -> Parsing -> syntax tree
The parser converts the token stream into a syntax tree. In a syntax tree, the interior nodes are operations and the children of a node are the arguments of that operation. What are the operations and arguments here?
<id, 1> <=> <id, 1> <+> <10> <*> <id, 2>
This is similar to diagramming a sentence in English class.
Grammar Rules
Parse Tree
Syntax Trees
<id, 1> <=> <id, 1> <+> <10> <*> <id, 2>
Here's the assignment operation: the root is <=>, its left child is <id, 1>, and its right child is the expression <id, 1> <+> <10> <*> <id, 2>, which still has to be parsed.
A complete syntax tree
<id, 1> <=> <id, 1> <+> <10> <*> <id, 2>  -> Parsing ->
<=>
    <id, 1>
    <+>
        <id, 1>
        <*>
            <10>
            <id, 2>
Symbol Table:
Location | Name
1 | speed
2 | time
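A parser that builds this tree can be sketched as a small recursive-descent parser in Python. It assumes the grammar assign -> id = expr, expr -> term { + term }, term -> factor { * factor }, factor -> id | num, which may differ from the grammar rules on the earlier slide. Tokens are assumed to be (name, value) tuples and tree nodes nested tuples (operator, left, right); this representation and the function name parse are hypothetical. Because * is handled in term, one level below + in expr, it binds more tightly, producing the tree shown above.

```python
def parse(tokens):
    """Parse one assignment into a nested-tuple syntax tree."""
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():
        if peek() in ("id", "num"):
            return advance()  # a leaf: ("id", n) or ("num", value)
        raise SyntaxError(f"expected id or num at token {pos}")

    def term():  # handles *, so * binds tighter than +
        node = factor()
        while peek() == "*":
            advance()
            node = ("*", node, factor())
        return node

    def expr():  # handles +, left-associative
        node = term()
        while peek() == "+":
            advance()
            node = ("+", node, term())
        return node

    if peek() != "id":
        raise SyntaxError("assignment must start with an identifier")
    target = advance()
    if peek() != "=":
        raise SyntaxError("expected '='")
    advance()
    tree = ("=", target, expr())
    if pos != len(tokens):
        raise SyntaxError("unexpected tokens after the expression")
    return tree


tokens = [("id", 1), ("=", None), ("id", 1), ("+", None), ("num", 10), ("*", None), ("id", 2)]
print(parse(tokens))  # ('=', ('id', 1), ('+', ('id', 1), ('*', ('num', 10), ('id', 2))))
```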
Semantic Analysis
Semantics is the meaning of the constructs of a programming language. Now we analyze our syntax tree to see whether it is semantically meaningful, or can be converted into a tree that is. Common checks are valid arguments and type checking.
syntax tree -> Semantic Analysis -> syntax tree, applied to the tree for <id, 1> <=> <id, 1> <+> <10> <*> <id, 2>
Symbol Table:
Location | Name | Type
1 | speed | float
2 | time | float
Type Checking
We modify the syntax tree to fix semantic issues that are fixable. What if they are not fixable? What's an example of something that is not fixable?
Coercion: speed and time are float, so the integer constant 10 is converted by inserting an <inttofloat> node:
<=>
    <id, 1>
    <+>
        <id, 1>
        <*>
            <inttofloat>
                <10>
            <id, 2>
Symbol Table:
Location | Name | Type
1 | speed | float
2 | time | float
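The coercion step can be sketched as a recursive type-checking pass in Python that returns a rewritten tree together with its type. Assumptions made for this sketch: trees are nested tuples, integer literals have type int, the symbol-table types are given as a dict from location to "int" or "float", and rejecting the assignment of a float to an int variable is a rule invented here to illustrate an unfixable error, not one stated on the slides. The function name check is hypothetical.

```python
def check(node, types):
    """Return (annotated_tree, type); types maps symbol-table location -> type."""
    kind = node[0]
    if kind == "id":
        return node, types[node[1]]
    if kind == "num":
        return node, "int"  # integer literals are int in this sketch
    if kind in ("+", "*"):
        left, lt = check(node[1], types)
        right, rt = check(node[2], types)
        if lt != rt:  # mixed int/float: coerce the int operand to float
            if lt == "int":
                left = ("inttofloat", left)
            else:
                right = ("inttofloat", right)
            return (kind, left, right), "float"
        return (kind, left, right), lt
    if kind == "=":
        target, tt = check(node[1], types)
        value, vt = check(node[2], types)
        if tt == "float" and vt == "int":
            value = ("inttofloat", value)  # fixable: widen the value
        elif tt == "int" and vt == "float":
            # Assumed rule for this sketch: narrowing is reported, not fixed.
            raise TypeError("cannot assign a float value to an int variable")
        return ("=", target, value), tt
    raise ValueError(f"unknown node {kind!r}")


tree = ("=", ("id", 1), ("+", ("id", 1), ("*", ("num", 10), ("id", 2))))
typed, _ = check(tree, {1: "float", 2: "float"})
print(typed)
# ('=', ('id', 1), ('+', ('id', 1), ('*', ('inttofloat', ('num', 10)), ('id', 2))))
```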
Semantic Analysis
Before:
<=>
    <id, 1>
    <+>
        <id, 1>
        <*>
            <10>
            <id, 2>
After semantic analysis:
<=>
    <id, 1>
    <+>
        <id, 1>
        <*>
            <inttofloat>
                <10>
            <id, 2>
Intermediate Code Generator
syntax tree -> Intermediate Code Generation -> intermediate code
Most compilers convert the syntax tree into some intermediate code, which is then optimized and translated into the final machine code. Why use an intermediate code?
Intermediate code example
The syntax tree with the <inttofloat> coercion becomes:
t1 = inttofloat(10)
t2 = t1 * id2
t3 = id1 + t2
id1 = t3
Each operation becomes one line of intermediate code. The t values are temporary variables.
The textbook calls this three-address code: each instruction has at most three operands (some have fewer). Can you see the three operands in each of these statements?
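Three-address code can be produced by a post-order walk of the tree: generate code for the operands first, then emit one instruction whose result goes into a fresh temporary. This Python sketch assumes nested-tuple trees, prints symbol-table entry n as idn, and numbers temporaries in order of creation; the function name gen_tac is hypothetical.

```python
def gen_tac(tree):
    """Flatten a ('=', target, value) tree into three-address code strings."""
    code = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"t{counter}"

    def gen(node):
        kind = node[0]
        if kind == "id":
            return f"id{node[1]}"      # symbol-table entry n is written idn
        if kind == "num":
            return str(node[1])
        if kind == "inttofloat":       # unary conversion node
            arg = gen(node[1])
            temp = new_temp()
            code.append(f"{temp} = inttofloat({arg})")
            return temp
        left = gen(node[1])            # binary operator: operands first,
        right = gen(node[2])           # then one instruction for the operation
        temp = new_temp()
        code.append(f"{temp} = {left} {kind} {right}")
        return temp

    _, target, value = tree
    result = gen(value)
    code.append(f"{gen(target)} = {result}")
    return code


tree = ("=", ("id", 1), ("+", ("id", 1), ("*", ("inttofloat", ("num", 10)), ("id", 2))))
print("\n".join(gen_tac(tree)))
# t1 = inttofloat(10)
# t2 = t1 * id2
# t3 = id1 + t2
# id1 = t3
```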
Intermediate code example
t1 = inttofloat(10)
t2 = t1 * id2
t3 = id1 + t2
id1 = t3
t2 = t1 * id2: the operands are t2, t1 and id2. This is like the assembly instruction mult t1, id2, t2.
t1 = inttofloat(10): the operands are t1 and 10.
Optimization
intermediate code -> Optimization -> intermediate code
Optimization means making the code more efficient. Any optimization ideas here?
t1 = inttofloat(10)
t2 = t1 * id2
t3 = id1 + t2
id1 = t3
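Optimization
t1 = inttofloat(10)
t2 = t1 * id2
t3 = id1 + t2
id1 = t3
-> Optimization ->
t2 = 10.0 * id2
id1 = id1 + t2
The conversion of 10 is done once at compile time (giving 10.0), and the temporary t3 is eliminated by assigning the sum directly to id1.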
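Both improvements can be sketched as two small passes in Python: constant folding, which performs inttofloat on a literal at compile time, and copy merging, which folds "v = tX" into the instruction that computed tX. Assumptions: instructions are encoded as hypothetical 4-tuples (dest, op, arg1, arg2), temporaries are named t1, t2, ..., and the merge is only safe because each temporary is used exactly once, as in code generated from a tree. Real optimizers establish such facts with data-flow analysis.

```python
import re


def is_temp(name):
    return name is not None and re.fullmatch(r"t\d+", name) is not None


def optimize(code):
    """code: list of (dest, op, arg1, arg2); op is 'inttofloat', 'copy', '+' or '*'."""
    consts = {}  # temporaries whose value is a known float constant
    folded = []
    for dest, op, a1, a2 in code:
        a1, a2 = consts.get(a1, a1), consts.get(a2, a2)  # propagate constants
        if op == "inttofloat" and a1.isdigit():
            consts[dest] = str(float(a1))  # convert now: "10" -> "10.0"
            continue                       # the run-time instruction disappears
        folded.append((dest, op, a1, a2))
    # Merge "tX = a op b" directly followed by "v = tX" into "v = a op b".
    result = []
    for dest, op, a1, a2 in folded:
        if op == "copy" and result and is_temp(a1) and result[-1][0] == a1:
            _, prev_op, p1, p2 = result.pop()
            result.append((dest, prev_op, p1, p2))
        else:
            result.append((dest, op, a1, a2))
    return result


tac = [
    ("t1", "inttofloat", "10", None),
    ("t2", "*", "t1", "id2"),
    ("t3", "+", "id1", "t2"),
    ("id1", "copy", "t3", None),
]
print(optimize(tac))  # [('t2', '*', '10.0', 'id2'), ('id1', '+', 'id1', 't2')]
```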
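Code Generation
intermediate code -> Code Generation -> target machine code
The code generator translates the intermediate code into target code:
t2 = 10.0 * id2
id1 = id1 + t2
-> Code Generation ->
LDF R2, id2
MULF R2, #10.0
LDF R1, id1
ADDF R1, R2
STF id1, R1
LDF, MULF, ADDF and STF load, multiply, add and store floating-point values, and # marks an immediate constant. The temporary t2 stays in register R2, so it is never stored in memory.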
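A simple code generator for this two-address instruction style can be sketched in Python. Assumptions for this sketch: instructions are 4-tuples (dest, op, arg1, arg2) with op limited to the commutative + and *; temporaries are named t1, t2, ... and are kept in registers rather than stored; registers are handed out from a fixed list, R2 then R1, chosen only so that the output matches the slide; and there is no register spilling. The names codegen and OPCODE are hypothetical.

```python
import re

OPCODE = {"+": "ADDF", "*": "MULF"}  # both operators are commutative


def is_const(x):
    return re.fullmatch(r"\d+(\.\d+)?", x) is not None


def is_temp(x):
    return re.fullmatch(r"t\d+", x) is not None


def codegen(code, registers=("R2", "R1")):
    """Translate (dest, op, a1, a2) tuples into two-address float assembly."""
    free = list(registers)  # no spilling: pop(0) raises if registers run out
    in_reg = {}             # temporary -> register currently holding it
    asm = []

    def operand_reg(x):
        # Reuse the register holding a temporary; otherwise load the variable.
        if x in in_reg:
            return in_reg.pop(x)
        r = free.pop(0)
        asm.append(f"LDF {r}, {x}")
        return r

    for dest, op, a1, a2 in code:
        if is_const(a1):  # swap so the constant becomes the immediate operand
            a1, a2 = a2, a1
        r = operand_reg(a1)
        if is_const(a2):
            asm.append(f"{OPCODE[op]} {r}, #{a2}")
        else:
            r2 = operand_reg(a2)
            asm.append(f"{OPCODE[op]} {r}, {r2}")
            free.append(r2)  # the second operand is no longer needed
        if is_temp(dest):
            in_reg[dest] = r  # keep temporaries in registers
        else:
            asm.append(f"STF {dest}, {r}")
            free.append(r)
    return asm


print("\n".join(codegen([("t2", "*", "10.0", "id2"), ("id1", "+", "id1", "t2")])))
# LDF R2, id2
# MULF R2, #10.0
# LDF R1, id1
# ADDF R1, R2
# STF id1, R1
```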
Cousins of the Compiler
1. Preprocessors
A preprocessor produces the input to the compiler. It may perform the following functions:
Macro processing: #define ---------
File inclusion: #include -------
Rational preprocessors: augment older languages with modern control structures.
Language extensions: e.g., a language that embeds database queries in C can use ## to mark the database-access statements within a C program.
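Macro processing of #define can be sketched as textual substitution performed before the compiler sees the program. This minimal Python sketch handles only object-like macros ("#define NAME replacement"), with no parameters and no #include; unlike a real C preprocessor it also substitutes inside string literals and comments. The function name preprocess is hypothetical.

```python
import re


def preprocess(source):
    """Expand object-like '#define NAME replacement' macros line by line."""
    macros = {}
    out = []
    for line in source.splitlines():
        directive = re.match(r"\s*#\s*define\s+(\w+)\s+(.*)", line)
        if directive:
            name, body = directive.groups()
            macros[name] = body.strip()
            continue  # the directive itself is not passed on to the compiler
        for name, body in macros.items():
            # \b ensures SIZE is replaced, but SIZE2 or MAXSIZE is not.
            line = re.sub(rf"\b{re.escape(name)}\b", lambda _m: body, line)
        out.append(line)
    return "\n".join(out)


print(preprocess("#define SIZE 10\nint a[SIZE];"))  # int a[10];
```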
Cousins of the Compiler
2. Assemblers
Assembly code is a mnemonic version of machine code. For example, b := a + 2 can be written as:
Load R1, a
ADD R1, #2
Store b, R1
Two-Pass Assembler
The simplest form of assembler makes two passes over the input.
In the first pass, the identifiers are found and stored in a symbol table, each with an assigned storage address.
In the second pass, the assembler translates operations into binary codes and identifiers into addresses.
load: memory to register; store: register to memory.
Example
Consider a hypothetical machine with a 4-bit instruction code, where 0001, 0010 and 0011 stand for load, store and add. Address modes:
00: ordinary address mode; the next 8 bits are a memory address.
10: immediate mode; the next 8 bits are a constant.
Example
The equivalent machine code might be:
Instruction | Inst. code | Reg. no. | Address mode | Address or value
Load | 0001 | 01 | 00 | 00000000
Add | 0011 | 01 | 10 | 00000010
Store | 0010 | 01 | 00 | 00000100
Here a is at address 0 and b at address 4.
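The two passes for this hypothetical machine can be sketched in Python. Assumptions for this sketch: each variable occupies 4 bytes and is allocated in order of first appearance (which reproduces a at 0 and b at 4), the register number is a 2-bit field, and each instruction is written as "Mnemonic reg, operand" or "Store operand, reg". The names OPCODES, WORD_SIZE, parse_line and assemble are hypothetical.

```python
import re

OPCODES = {"load": "0001", "store": "0010", "add": "0011"}
WORD_SIZE = 4  # assumed bytes per variable; gives a -> 0 and b -> 4


def parse_line(line):
    """Split 'Load R1, a' or 'Store b, R1' into (mnemonic, register, operand)."""
    mnemonic, rest = line.split(None, 1)
    operands = [o.strip() for o in rest.split(",")]
    reg = next(o for o in operands if re.fullmatch(r"R\d+", o))
    operand = next(o for o in operands if o != reg)
    return mnemonic.lower(), int(reg[1:]), operand


def assemble(lines):
    # Pass 1: find the identifiers and assign each a memory address.
    symtab = {}
    for line in lines:
        _, _, operand = parse_line(line)
        if not operand.startswith("#") and operand not in symtab:
            symtab[operand] = len(symtab) * WORD_SIZE
    # Pass 2: translate operations to binary codes and identifiers to addresses.
    machine_code = []
    for line in lines:
        mnemonic, reg, operand = parse_line(line)
        if operand.startswith("#"):
            mode, value = "10", int(operand[1:])  # immediate constant
        else:
            mode, value = "00", symtab[operand]   # ordinary address
        if not 0 <= value < 256:
            raise ValueError(f"{value} does not fit in 8 bits")
        machine_code.append(f"{OPCODES[mnemonic]} {reg:02b} {mode} {value:08b}")
    return machine_code, symtab


machine_code, symbols = assemble(["Load R1, a", "ADD R1, # 2", "Store b, R1"])
print(symbols)  # {'a': 0, 'b': 4}
print("\n".join(machine_code))
# 0001 01 00 00000000
# 0011 01 10 00000010
# 0010 01 00 00000100
```

The second pass needs the first because an instruction may refer to an identifier whose address has not yet been assigned when that instruction is first read.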
Cousins of the Compiler
3. Loaders and Link-Editors
Loading means taking relocatable machine code, adjusting its relocatable addresses, and placing the instructions and data in memory at the proper locations.
The link-editor makes a single program from several files of relocatable machine code.
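Relocation, the core of both jobs, can be sketched in Python under a deliberately simplified model: a relocatable module is a list of integer words plus a list of the positions of words that hold addresses relative to the module's start, and addresses are counted in words. The link-editor shifts each module's addresses by that module's offset in the combined program; the loader then shifts them again by the load address. This covers relocation only, not the resolution of references between modules, and the names link and load are hypothetical.

```python
def link(modules):
    """Combine relocatable modules, each given as (words, relocation_positions)."""
    words, relocs = [], []
    for mod_words, mod_relocs in modules:
        offset = len(words)  # where this module begins in the combined program
        for i, w in enumerate(mod_words):
            words.append(w + offset if i in mod_relocs else w)
        relocs.extend(offset + i for i in mod_relocs)
    return words, relocs


def load(program, memory, base):
    """Place a relocatable program in memory starting at address base."""
    words, relocs = program
    for i, w in enumerate(words):
        memory[base + i] = w + base if i in relocs else w


# Two modules; word 1 of each holds an address relative to its module's start.
program = link([([10, 0], [1]), ([20, 1], [1])])
print(program)  # ([10, 0, 20, 3], [1, 3])
memory = [0] * 200
load(program, memory, 100)
print(memory[100:104])  # [10, 100, 20, 103]
```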
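The Grouping of Phases
Front end: consists of the phases that depend on the source language and are largely independent of the target machine (the first four phases: lexical analysis, syntax analysis, semantic analysis and intermediate code generation).
Back end: consists of the phases that depend on the target machine (code optimization and code generation).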
Front end vs. Back end
Reducing the Number of Passes
Pass: several phases are usually implemented in a single pass, which reads an input file and writes an output file. It is desirable to have relatively few passes, since reading and writing intermediate files takes time.
Compiler Construction Tools
Scanner generators: produce lexical analyzers.
Parser generators: produce syntax analyzers.
Syntax-directed translation engines: produce intermediate code.
Automatic code generators: produce machine code.
Data-flow engines: support code optimization.